WEBVTT

00:00.080 --> 00:01.080
Welcome back.

00:01.120 --> 00:04.800
We trained our model and got the learned equation.

00:04.800 --> 00:09.040
Price equals 2.9 times size plus 2.7.

00:09.320 --> 00:17.280
This is the equation that the model learned during training.

00:17.320 --> 00:18.120
Go up.

00:18.120 --> 00:21.080
If you remember, we split the data.

00:21.120 --> 00:29.320
20% of the data will be used for testing and 80% of the data will be used for training.

00:29.360 --> 00:32.000
We used this 80% for training.

00:32.000 --> 00:37.080
Now we're going to use this 20% of the data for testing.

00:37.080 --> 00:41.120
So, step number three: make predictions on the test data.

00:41.320 --> 00:54.280
So here, let me start testing: y_pred = model.predict(X_test). model.predict uses the

00:54.280 --> 00:59.080
trained model to make predictions on the test data.
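
NOTE A minimal sketch of what the predict step does, assuming the model is a
single-feature linear regression with the learned equation quoted earlier
(price = 2.9 * size + 2.7). The predict function and the X_test values are
hypothetical stand-ins for illustration, not the code from the lesson.
```python
# Hypothetical stand-in for model.predict on a one-feature linear model.
SLOPE = 2.9      # coefficient from the learned equation quoted in the lesson
INTERCEPT = 2.7  # intercept from the learned equation quoted in the lesson
def predict(sizes):
    """Apply price = SLOPE * size + INTERCEPT to every input value."""
    return [SLOPE * size + INTERCEPT for size in sizes]
X_test = [0.0, 1.0, 2.0]   # made-up test inputs, not the lesson's data
y_pred = predict(X_test)   # one predicted price per test input
print(y_pred)
```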

00:59.120 --> 01:03.480
X_test contains the input features for testing.

01:03.680 --> 01:07.960
X_test contains the input features for testing.

01:08.200 --> 01:13.000
y_pred stores the model's predicted values.

01:13.160 --> 01:16.400
If we run it, everything works fine.

01:16.440 --> 01:20.320
Now let's calculate the evaluation metrics.

01:20.320 --> 01:26.880
Start with mse = mean_squared_error(y_test,

01:27.160 --> 01:32.880
y_pred). The MSE, mean squared error,

01:32.920 --> 01:39.640
measures the average squared difference between actual and predicted values.
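
NOTE A small sketch of the MSE definition above, written by hand so the
arithmetic is visible. The function is a hypothetical stand-in that takes its
arguments in the same order as the call in the lesson (actual values first,
then predictions); the numbers are made up for illustration.
```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)
y_test = [3.0, 5.0, 8.0]   # made-up actual values
y_pred = [2.5, 5.5, 8.0]   # made-up predictions
mse = mean_squared_error(y_test, y_pred)   # (0.25 + 0.25 + 0.0) / 3
print(mse)
```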

01:39.840 --> 01:43.000
Lower MSE means better performance.

01:43.000 --> 01:49.240
So when you get a lower MSE, you're getting better performance.

01:49.280 --> 01:58.520
Now let's calculate another metric that we're going to use to evaluate our model, which is the

01:58.520 --> 02:08.160
R-squared score. The R-squared score measures how well the model explains the variance in

02:08.160 --> 02:08.720
the data.

02:08.760 --> 02:12.680
A higher R-squared means better performance.

02:12.840 --> 02:16.320
The range is from minus infinity to one.
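
NOTE A hand-written sketch of the R-squared formula, 1 minus the residual sum
of squares over the total sum of squares, to show why the score tops out at one
and can go negative. The function is a hypothetical stand-in, the data is made
up, and it assumes the actual values are not all identical.
```python
def r2_score(y_true, y_pred):
    """R-squared = 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot
y_test = [1.0, 2.0, 3.0]                        # made-up actual values
perfect = r2_score(y_test, [1.0, 2.0, 3.0])     # predictions match exactly
baseline = r2_score(y_test, [2.0, 2.0, 2.0])    # always predicting the mean
reversed_fit = r2_score(y_test, [3.0, 2.0, 1.0])  # worse than predicting the mean
print(perfect, baseline, reversed_fit)
```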

02:16.520 --> 02:20.680
So r2 = r2_score.

02:20.880 --> 02:25.520
What data do we need to pass? y_test and y_pred.

02:25.520 --> 02:31.640
So we need to measure the difference between the predicted and the test values,

02:31.640 --> 02:35.040
so the actual and the predicted ones.

02:35.200 --> 02:42.440
You see, guys, how we passed the test data and the predictions,

02:42.480 --> 02:48.600
how to measure the difference with MSE, and how to calculate a measure of

02:48.640 --> 02:52.360
how well the model explains the variance in the data.

02:52.360 --> 02:59.200
So we want a lower MSE and a higher R-squared.

02:59.240 --> 03:06.970
Don't worry, we're going to clarify everything in the chart in the next couple of minutes.

03:06.970 --> 03:12.930
Here we need to print the mean squared error, MSE, and the R-squared score.

03:12.970 --> 03:15.130
Let's run our application.

03:15.530 --> 03:17.450
Here we get an error.

03:17.490 --> 03:21.530
The name "model" is not defined.

03:21.530 --> 03:24.090
So let me go up.

03:24.130 --> 03:26.530
That's because I restarted the session.

03:26.650 --> 03:31.730
So I need to run this, run this, run this,

03:37.970 --> 03:40.170
and then run this. Okay.

03:40.490 --> 03:44.130
So we get the mean squared error.

03:44.170 --> 03:48.490
The MSE is 0.163.

03:48.530 --> 03:54.170
The R-squared score is 0.9507.

03:54.210 --> 04:03.850
Whether this is a good result or not depends on the data and its scale.

04:03.850 --> 04:08.770
So in order to determine whether this MSE is good or not,

04:08.810 --> 04:15.530
we go up and check how many data points we have.

04:15.850 --> 04:21.370
So in our data set we have around 100.

04:21.370 --> 04:29.930
So 100 times 0.16 equals 16, a 16% error, which is good.

04:30.050 --> 04:34.770
To see if the MSE is good or not,

04:34.930 --> 04:39.690
we need to check the target values and the data scale.

04:39.690 --> 04:47.650
If we go up, we have 100 rows of data that are going to be tested.

04:47.650 --> 04:51.530
So here we have 100 rows.

04:51.530 --> 04:54.730
We are generating 100 random numbers.

04:54.730 --> 04:56.170
This is our data.

04:56.170 --> 04:59.130
So our data is being displayed.

04:59.410 --> 05:03.330
And here we are displaying only the first 10.

05:03.570 --> 05:10.330
If I need to see the whole data, I just run the application, and you can see that it goes up to 99,

05:10.530 --> 05:11.490
starting from zero,

05:11.490 --> 05:12.930
so it's 100.

05:12.970 --> 05:17.690
Okay, so guys, we are testing on 100.

05:17.730 --> 05:25.770
And we get this amazing result, 0.16, which is outstanding.

05:25.770 --> 05:32.450
So for data from 0 to 100 we get 0.16, which is good.

05:32.490 --> 05:40.410
But I want you to focus with me: 0.163 for a data range

05:40.650 --> 05:45.330
between 0 and 1

05:45.530 --> 05:54.650
is quite poor; 0.163 for data between 0 and 10

05:54.810 --> 05:55.650
is excellent;

05:55.650 --> 05:58.930
for data ranging between 0 and 100, it is outstanding.
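
NOTE One hedged way to make the range comparison above concrete. RMSE, the
square root of MSE, is not used in the lesson; it is brought in here only
because it is in the same units as the target. The relative_error helper is
hypothetical, and the three ranges are the ones mentioned in the lesson.
```python
import math
mse = 0.163              # MSE reported in the lesson
rmse = math.sqrt(mse)    # typical error size, in the same units as the target
def relative_error(rmse_value, y_min, y_max):
    """RMSE as a fraction of the target range (assumes y_max > y_min)."""
    return rmse_value / (y_max - y_min)
for y_min, y_max in [(0, 1), (0, 10), (0, 100)]:
    # The same MSE is a smaller share of the range as the range grows.
    print(y_min, y_max, round(relative_error(rmse, y_min, y_max), 4))
```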

05:58.930 --> 06:00.290
So this is our case.

06:00.290 --> 06:02.850
It's very good. Okay, guys.

06:02.850 --> 06:11.530
So this is a very important thing you should learn: the metrics, the mean squared error and the R-squared.
