WEBVTT

00:00.080 --> 00:00.920
Welcome back.

00:00.960 --> 00:06.560
We succeeded in splitting the data into test and training sets.

00:06.600 --> 00:12.360
Now we're going to introduce a very important concept called feature scaling.

00:12.480 --> 00:20.160
Feature scaling is especially crucial for neural networks and many other machine learning algorithms.

00:20.520 --> 00:27.120
It standardizes the range of independent features in the data to a common scale.

00:27.160 --> 00:28.840
Back to our model.

00:28.840 --> 00:30.320
We have the input.

00:30.360 --> 00:34.920
In the input layer we have nine parameters: cylinders,

00:34.920 --> 00:35.880
displacement,

00:35.880 --> 00:42.360
horsepower, weight, acceleration, model year, and origin one, two, and three.

00:42.480 --> 00:52.080
So if we look at the cylinders parameter, we have numbers and the range is between 3 and 9, while the displacement

00:52.080 --> 01:02.480
is on a different scale, which is CC, and it ranges from 68 to 455, while the horsepower is another

01:02.570 --> 01:12.410
measurement and another feature, in horsepower (HP), ranging from 46 to 230, while the weight is in pounds,

01:12.450 --> 01:18.490
ranging from 1613 to 5140, and so on.

01:18.490 --> 01:21.130
And the other features are the same.

01:21.130 --> 01:27.890
So those features have different scales and different units.

01:27.890 --> 01:31.090
How do we measure their effect

01:31.090 --> 01:44.290
on one scale? We need a technique called feature scaling, which puts them all on one common

01:44.290 --> 01:44.970
scale.

01:44.970 --> 01:54.970
So feature scaling standardizes the range of the independent features in the data to a common scale.

01:54.970 --> 02:04.370
So here we need to create a common scale to measure the features because they are different.

02:04.370 --> 02:10.660
Why is this very important for neural networks? For faster convergence.

02:10.820 --> 02:17.540
Gradient descent converges much faster when features are on similar scales.

02:17.540 --> 02:25.660
So when we give this data to the machine learning model, it will learn faster

02:25.820 --> 02:29.020
when the features are on similar scales.

02:29.020 --> 02:34.180
Without scaling, the algorithm zigzags towards the optimum.

02:34.180 --> 02:36.380
It may take much longer to reach the optimum, or not reach it at all.
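
NOTE
A minimal sketch of the slow, zigzagging convergence described above. It assumes
a toy two-parameter quadratic loss, not the course's MPG model: the "curvatures"
list is a hypothetical stand-in for features on different scales, and the
function name and learning rates are chosen only for illustration.
```python
def steps_to_converge(curvatures, lr, tol=1e-6, max_steps=100_000):
    """Run gradient descent on f(w) = 0.5 * sum(c * w**2) and count the steps."""
    w = [1.0 for _ in curvatures]  # start away from the optimum at w = 0
    for step in range(1, max_steps + 1):
        # The gradient of 0.5 * c * w**2 with respect to w is c * w.
        w = [wi - lr * c * wi for wi, c in zip(w, curvatures)]
        if max(abs(wi) for wi in w) < tol:
            return step
    return max_steps
# Unequal curvatures mimic unscaled features. The steep direction overshoots
# and flips sign on every step (the zigzag), while the flat direction crawls.
unscaled_steps = steps_to_converge([1.0, 100.0], lr=0.019)
# Equal curvatures mimic scaled features, so a much larger rate is safe.
scaled_steps = steps_to_converge([1.0, 1.0], lr=0.5)
print(unscaled_steps, scaled_steps)
```
In this toy setting the run with equal curvatures needs far fewer steps, which
is the effect feature scaling aims for.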

02:36.380 --> 02:39.540
Scaling also prevents dominance

02:39.580 --> 02:43.300
by large-scale features like the weight.

02:43.340 --> 02:44.540
Look at the weight.

02:44.540 --> 02:57.340
We have values in the thousands of pounds in our example, and the range is between 1613 and 5140.

02:57.580 --> 03:06.580
And this is a large scale compared to the cylinders, displacement, horsepower, model year, origin,

03:06.580 --> 03:07.470
and so on.

03:07.470 --> 03:10.510
So the dominant feature here is weight.

03:10.550 --> 03:21.830
We don't want to predict the MPG based only on the weight just because the weight is the

03:21.870 --> 03:23.110
dominant feature here.

03:23.110 --> 03:32.590
So here we need to prevent large scale features like weight from dominating small scale features like

03:32.630 --> 03:39.190
acceleration, cylinders, displacement, horsepower, model year, origin, and so on.

03:39.350 --> 03:48.790
With scaling, all features can contribute on an equal footing to the learning process, so acceleration, model

03:48.790 --> 03:54.190
year, weight, horsepower, displacement, and cylinders are all treated on comparable terms.

03:54.190 --> 04:01.830
All features contribute equally to the learning process. Scaling also reduces numerical

04:01.950 --> 04:10.160
precision issues during calculations and helps prevent exploding or vanishing gradient problems

04:10.160 --> 04:12.200
that we will see in the next videos.

04:12.240 --> 04:19.920
Okay, so the main concept here is to standardize all the features to one scale.

04:19.960 --> 04:24.240
To do that we need to create a scaler object.

04:24.440 --> 04:30.080
So here we use the StandardScaler.

04:30.120 --> 04:32.920
What does StandardScaler do?

04:33.080 --> 04:37.280
StandardScaler transforms... or, StandardScaler...

04:37.320 --> 04:37.840
Sorry.

04:38.000 --> 04:41.160
What does StandardScaler do?

04:41.360 --> 04:50.760
It transforms the data to have a mean of zero and a standard deviation of one,

04:50.960 --> 05:01.520
with the formula x minus the mean, over the standard deviation, where x is the feature value: x minus the mean, over

05:01.560 --> 05:03.200
the standard deviation.
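
NOTE
A small worked example of the formula above, z = (x - mean) / standard deviation.
The weights are hypothetical values picked for round numbers, not rows from the
course dataset. The population standard deviation (pstdev) is used here on the
assumption that it should match how scikit-learn's StandardScaler computes it.
```python
from statistics import mean, pstdev
# Hypothetical car weights in pounds, chosen only for illustration.
weights = [1500.0, 2500.0, 3000.0, 3500.0, 4500.0]
mu = mean(weights)       # 3000.0
sigma = pstdev(weights)  # population standard deviation, 1000.0
# Apply the formula to every value: subtract the mean, divide by the std.
scaled = [(x - mu) / sigma for x in weights]
print(scaled)  # [-1.5, -0.5, 0.0, 0.5, 1.5]
# After scaling, the values have mean 0 and standard deviation 1.
print(mean(scaled), pstdev(scaled))
```
A weight equal to the mean maps to 0, and each unit of the scaled value
corresponds to one standard deviation in pounds.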

05:03.360 --> 05:07.000
For example, here we have the weight feature.

05:07.000 --> 05:13.600
Before scaling we have 1000 for the first car, 3000 for the second car, and so on.

05:13.640 --> 05:22.760
After scaling, we need to create a scale that allows us to measure the weight according to the mean

05:22.760 --> 05:24.400
and the standard deviation.

05:24.400 --> 05:27.960
So the mean equals zero and the standard deviation equals one.

05:28.000 --> 05:39.920
We use this formula to transform 1500 to -1.2, 3000 to 1.5, 2500 to 0.8, and so on.

05:39.960 --> 05:40.480
Okay.

05:40.760 --> 05:43.200
So this is the scaling.

05:43.200 --> 05:46.800
This is before scaling and this is after scaling.

05:46.920 --> 05:55.040
Now we need to use the fit_transform function to get X_train_scaled.

05:55.280 --> 06:02.320
So this is the blank I want you to fill in with scaler.fit_transform.

06:02.320 --> 06:05.240
And here we pass X_train.

06:05.360 --> 06:14.290
Here we are calculating the mean and standard deviation from the training data only, then transforming the

06:14.290 --> 06:15.690
training data.

06:15.690 --> 06:23.970
So here we use the training data, not the testing data, because we need to test our model in a reliable

06:23.970 --> 06:26.370
way.

06:26.570 --> 06:30.450
We must not pass any test data into the training beforehand.

06:30.690 --> 06:37.850
Also, we used X_train in order to calculate the mean and the standard deviation from the training data

06:37.850 --> 06:38.290
only.

06:38.530 --> 06:49.730
Then let's use the transform function: X_test_scaled equals scaler.transform(X_test).

06:49.770 --> 06:58.450
This uses the same mean and standard deviation from the training data; it doesn't fit on the test data.
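
NOTE
A hedged sketch of the fit_transform / transform pattern described above, using
scikit-learn's StandardScaler. The arrays are hypothetical stand-ins for the
course's X_train and X_test (columns: weight, cylinders), and the names
X_train_scaled and X_test_scaled are assumed from the narration.
```python
import numpy as np
from sklearn.preprocessing import StandardScaler
# Hypothetical training and test rows: [weight in pounds, cylinders].
X_train = np.array([[1500.0, 4.0], [2500.0, 4.0], [3000.0, 6.0],
                    [3500.0, 6.0], [4500.0, 8.0]])
X_test = np.array([[2000.0, 4.0], [4000.0, 8.0]])
scaler = StandardScaler()
# fit_transform learns the mean and standard deviation from the training data
# only, then scales the training data with them.
X_train_scaled = scaler.fit_transform(X_train)
# transform reuses those training statistics; nothing is refit on the test data.
X_test_scaled = scaler.transform(X_test)
print(scaler.mean_, scaler.scale_)  # per-column mean and std learned from X_train
print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```
The scaled training columns come out with mean 0 and standard deviation 1. The
scaled test columns generally will not, because they are measured against the
training statistics, and that is the intended behavior.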

06:58.570 --> 07:01.290
So those are very important.

07:01.290 --> 07:05.770
Here we are creating the scaler by using this formula.

07:05.810 --> 07:09.410
x minus mu, over the standard deviation,

07:09.410 --> 07:14.010
giving a mean of zero and a standard deviation of one.

07:14.010 --> 07:21.020
So for the scaled data, if we calculate, we get a mean of zero and a standard

07:21.020 --> 07:21.980
deviation of one.

07:22.180 --> 07:24.780
Before scaling, the weight was this.

07:24.780 --> 07:31.860
And after scaling, the weight becomes this. Okay, the problems without scaling:

07:31.900 --> 07:34.100
neural networks train very slowly.

07:34.100 --> 07:41.020
So without scaling the training features, that is, calculating the mean and standard deviation from the training

07:41.020 --> 07:46.540
data only and then transforming the data using the fit_transform and transform functions,

07:46.580 --> 07:49.500
neural networks train very slowly.

07:49.780 --> 07:57.020
Some features dominate others, the model may not converge properly, and performance is poor.

07:57.300 --> 08:07.220
Feature scaling is a critical pre-processing step that often makes the difference between a model that

08:07.220 --> 08:10.540
works well and one that doesn't.

08:10.540 --> 08:13.820
So this is a very, very important thing.