WEBVTT

00:00.120 --> 00:01.040
Welcome back.

00:01.080 --> 00:04.080
We looked at the distribution of MPG using a histogram.

00:04.200 --> 00:13.480
Now let's create a heatmap visualization to show the relationships between the numerical features in the dataset.

00:13.520 --> 00:15.840
Create a new code cell.

00:15.920 --> 00:19.000
Start with plt.figure.

00:19.200 --> 00:25.560
Set figsize to ten by eight, and create the heatmap using sns.

00:25.600 --> 00:34.040
This is the library we imported and used before, which is seaborn, imported as sns.

00:34.160 --> 00:41.040
And as I told you, Seaborn is a data visualization library built on top of Matplotlib.

00:41.080 --> 00:47.880
It provides a high-level interface for creating attractive statistical graphics.

00:47.880 --> 00:54.240
So we're going to use this library to create our heatmap with sns.heatmap.

00:54.240 --> 00:56.040
Creating the heatmap.

00:56.080 --> 00:58.560
Passing the parameters.

00:58.720 --> 00:59.910
dataset.

00:59.910 --> 01:02.630
corr(), annot=True.

01:02.790 --> 01:03.790
cmap.

01:03.790 --> 01:04.230
Cool.

01:04.230 --> 01:04.710
Warm.

01:04.910 --> 01:08.270
coolwarm, and linewidths is 0.5.

01:08.630 --> 01:09.350
The title.

01:09.350 --> 01:13.910
Call it correlation heatmap, and then plt.show.

01:13.950 --> 01:16.950
I'm using the Matplotlib library.

01:17.070 --> 01:23.150
We can also name it feature correlation heatmap, or matrix.

01:23.190 --> 01:25.870
Okay, let's run and see.
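
NOTE
A minimal sketch of the cell dictated above, not the instructor's exact notebook. The name `dataset` and the arguments annot, cmap and linewidths come from the lesson, and center=0 is taken from the later explanation. The small DataFrame holds made-up values so the snippet runs on its own, and `corr_matrix` and `ax` are names chosen for illustration.
```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# Hypothetical stand-in for the lesson's vehicle data (values are made up).
dataset = pd.DataFrame({
    "mpg": [18.0, 15.0, 26.0, 32.0, 24.0, 14.0],
    "cylinders": [8, 8, 4, 4, 4, 8],
    "weight": [3504, 3693, 2234, 1985, 2430, 4312],
})
# Pairwise correlation of the columns; every value falls between -1 and 1.
# If a DataFrame also has text columns, recent pandas versions need
# dataset.corr(numeric_only=True) instead.
corr_matrix = dataset.corr()
plt.figure(figsize=(10, 8))
ax = sns.heatmap(
    corr_matrix,
    annot=True,        # write each coefficient inside its cell
    cmap="coolwarm",   # blue for negative values, red for positive values
    center=0,          # place the neutral color at zero
    linewidths=0.5,    # thin separator lines between cells
)
plt.title("Feature Correlation Heatmap")
plt.show()
```
Keeping the matrix in its own variable is only a convenience: it can be printed and inspected before it is drawn.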

01:26.070 --> 01:26.950
And here we go.

01:26.990 --> 01:29.030
This is our heatmap.

01:29.070 --> 01:33.790
It's really amazing. I love this type of heatmap.

01:33.790 --> 01:34.750
And here we go.

01:34.950 --> 01:41.390
Let me explain what we've done here and what this heatmap means.

01:41.430 --> 01:42.830
Back to our code.

01:42.830 --> 01:48.110
Here we have the correlation, dataset.corr.

01:48.150 --> 01:55.110
corr calculates the correlation values between the numerical features.

01:55.350 --> 02:03.370
Correlation values range from -1 to 1, so please write these notes down.

02:03.410 --> 02:12.730
dataset.corr calculates the correlation matrix between all numerical columns. Minus one to

02:13.010 --> 02:13.890
plus one.

02:13.930 --> 02:20.090
The correlation values range from -1 to 1. One:

02:20.130 --> 02:23.650
perfect positive correlation. Zero:

02:23.690 --> 02:27.050
no correlation. Minus one:

02:27.090 --> 02:29.290
Perfect negative correlation.
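
NOTE
To make the -1, 0 and +1 cases concrete, here is a small sketch of the Pearson coefficient, which is the default method of pandas `DataFrame.corr`. The helper name `pearson` and the three toy series are assumptions made for illustration and are not part of the lesson's notebook.
```python
import math
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of deviations from the means (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations of each variable.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)
x = [1, 2, 3, 4, 5]
rising = pearson(x, [2, 4, 6, 8, 10])   # y rises with x along a straight line
falling = pearson(x, [10, 8, 6, 4, 2])  # y falls as x rises along a straight line
flat = pearson(x, [2, 1, 3, 1, 2])      # no linear trend between x and y
print(rising, falling, flat)
```
The three results land at about 1, about -1 and about 0, up to floating-point rounding, which are the three reference points named in the lesson.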

02:29.290 --> 02:32.730
Now we have the heatmap.

02:32.730 --> 02:35.490
And this is the main heatmap creation.

02:35.690 --> 02:39.650
As I told you, we passed the correlation matrix.

02:39.850 --> 02:44.330
annot=True displays the actual correlation values on the heatmap.

02:44.330 --> 02:50.090
Then the coolwarm colormap, where blue represents negative correlation.

02:50.090 --> 02:54.330
So the negative correlation would be in blue.

02:54.570 --> 02:59.400
So the blue color represents negative correlation.

02:59.680 --> 03:04.280
The red color represents positive correlation.

03:04.520 --> 03:13.880
Okay, and center=0 centers the colormap at zero, which is neutral, and the title is feature

03:13.920 --> 03:15.760
correlation heatmap.

03:15.920 --> 03:20.520
And let's display the final result like this.

03:20.520 --> 03:22.440
So we're going to interpret it.

03:22.480 --> 03:30.840
The diagonal always shows 1.0, since each variable is perfectly correlated with itself.

03:30.840 --> 03:41.000
So MPG is perfectly correlated with MPG, and origin is perfectly correlated with origin. The matrix is symmetric, mirrored across

03:41.000 --> 03:42.040
the diagonal.

03:42.040 --> 03:43.600
So we have symmetry.
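
NOTE
The two structural properties just described, a diagonal of ones and mirror symmetry, can be checked directly. This is a sketch under stated assumptions: the `pearson` helper and the three columns of made-up values are illustrative and are not the lesson's data.
```python
import math
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den
# Hypothetical columns (made-up values).
columns = {
    "mpg": [18.0, 15.0, 26.0, 32.0, 24.0, 14.0],
    "cylinders": [8, 8, 4, 4, 4, 8],
    "weight": [3504, 3693, 2234, 1985, 2430, 4312],
}
names = list(columns)
# Full correlation matrix stored as a dict of dicts: matrix[row][column].
matrix = {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
# Diagonal: every variable against itself gives 1, up to rounding.
diagonal_ok = all(math.isclose(matrix[n][n], 1.0) for n in names)
# Symmetry: corr(a, b) equals corr(b, a), so one triangle mirrors the other.
symmetric_ok = all(math.isclose(matrix[a][b], matrix[b][a]) for a in names for b in names)
print(diagonal_ok, symmetric_ok)
```
Because of the symmetry, only one triangle of the heatmap carries new information; the other triangle repeats it.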

03:43.920 --> 03:47.840
Color intensity indicates correlation strength.

03:47.840 --> 03:58.940
As I told you, the more intense the blue, the closer it is to a perfectly negative correlation,

03:58.940 --> 04:02.820
and positive correlation is in red.

04:02.820 --> 04:08.340
So MPG is perfectly correlated with MPG, with itself.

04:08.380 --> 04:13.420
The numbers show the exact correlation coefficients.

04:13.460 --> 04:21.900
Okay, so this correlation heatmap shows relationships between automotive features from what appears

04:21.900 --> 04:24.340
to be a vehicle dataset.

04:24.340 --> 04:33.780
So here we have eight columns: MPG, cylinders, displacement, horsepower, weight, acceleration, model year,

04:33.780 --> 04:35.220
and origin.

04:35.340 --> 04:43.860
So this heatmap is a very useful way to display the feature correlations.

04:44.180 --> 04:47.780
Strong positive correlations: red, high values.

04:47.780 --> 04:52.680
So here we have this square, this red square.

04:52.720 --> 04:57.040
So for example cylinders and displacement.

04:57.040 --> 04:58.240
So this is the cylinders.

04:58.240 --> 05:03.520
And this is displacement: 0.95, very strong. Cylinders

05:03.520 --> 05:08.240
and weight is 0.93. Cylinders,

05:08.360 --> 05:09.920
uh, sorry, cylinders

05:09.920 --> 05:14.000
and weight is 0.90. 0.93 is

05:14.040 --> 05:15.680
Displacement and weight.

05:15.680 --> 05:17.440
They are very, very strong.

05:17.720 --> 05:23.080
Displacement and horsepower is 0.9.

05:23.080 --> 05:25.040
Also very strong here.

05:25.080 --> 05:34.320
As an interpretation, we conclude that vehicles with more cylinders tend to have larger engine displacement,

05:34.320 --> 05:38.160
more horsepower, and more weight.

05:38.320 --> 05:43.760
Also, we have strong negative correlations: blue, high negative values.

05:43.760 --> 05:46.520
Here we have this blue box.

05:46.520 --> 05:57.510
And here we have a blue box for MPG and weight, so MPG and weight is -0.83.

05:57.750 --> 06:14.150
That is the strongest negative. MPG and displacement is -0.81, MPG and cylinders is -0.78, and MPG and horsepower is -0.78.

06:14.270 --> 06:24.350
Fuel efficiency, MPG, decreases as vehicles get heavier, have larger engines, more cylinders, and more

06:24.510 --> 06:25.590
horsepower.

06:25.590 --> 06:32.270
So this is the correlation that this heatmap shows us.

06:32.270 --> 06:36.990
And it allows us to draw this conclusion.

06:36.990 --> 06:43.790
There are also positive relationships: MPG and model year is 0.58.

06:43.830 --> 06:52.580
You notice that MPG and model year is 0.58, and MPG and origin is 0.57.

06:52.620 --> 07:00.100
Newer vehicles and certain origins tend to have better fuel efficiency, but this is not a very big

07:00.100 --> 07:00.540
deal.
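
NOTE
One way to compare the MPG coefficients read out above is to rank them by absolute value, so that strong negative and strong positive relationships appear side by side. The numbers are the ones spoken in the lesson; the cylinders value is taken as -0.78, which is an interpretation of a garbled reading. The names `mpg_corr` and `ranked` are illustrative.
```python
# Correlation of MPG with each other feature, as read out in the lesson.
mpg_corr = {
    "weight": -0.83,
    "displacement": -0.81,
    "cylinders": -0.78,   # assumption: the spoken value is interpreted as -0.78
    "horsepower": -0.78,
    "model year": 0.58,
    "origin": 0.57,
}
# Sort by magnitude; the sign only tells the direction of the relationship.
ranked = sorted(mpg_corr.items(), key=lambda item: abs(item[1]), reverse=True)
for feature, r in ranked:
    direction = "negative" if r < 0 else "positive"
    print(f"{feature:12s} {r:+.2f}  {direction}")
```
Ranking by magnitude puts weight first and origin last, which matches the order in which the lesson moves from the deep blue cells to the paler red ones.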

07:00.540 --> 07:05.260
So I want you to understand the colors and the coefficients.

07:05.500 --> 07:12.060
Vehicles with more horsepower tend to have lower acceleration values, which means quicker cars if this column is a 0-to-60 time in seconds.

07:12.220 --> 07:15.340
Let's look at horsepower and acceleration.

07:15.340 --> 07:21.500
So here, acceleration and horsepower is -0.69.

07:21.540 --> 07:26.500
Vehicles with more horsepower tend to have lower acceleration values.

07:26.740 --> 07:29.620
Those are the negative correlations.

07:29.660 --> 07:30.140
Okay.

07:30.380 --> 07:39.580
So this heatmap reveals the classic automotive engineering trade-off between performance and fuel economy.
