WEBVTT

00:00.080 --> 00:01.080
Welcome back.

00:01.120 --> 00:04.240
Let's import the necessary libraries.

00:04.240 --> 00:11.080
So here create a new cell and let's import our necessary libraries.

00:11.080 --> 00:13.960
So this is the first section: importing libraries.

00:14.000 --> 00:26.320
Let's start with import tensorflow as tf, import pandas as pd, import numpy as np, and import

00:26.800 --> 00:28.680
matplotlib.pyplot as plt.

00:28.880 --> 00:35.320
For better data visualization, we're going to use a new library called Seaborn.

00:35.560 --> 00:43.280
Seaborn is a Python data visualization library built on top of Matplotlib.

00:43.320 --> 00:49.240
It provides a high-level interface for creating attractive statistical graphics.

00:49.440 --> 00:57.480
We're going to use it for heatmaps and other more advanced graphics.

00:57.480 --> 01:03.920
So for this we're going to use our lovely library called Seaborn.

01:04.040 --> 01:07.070
import seaborn as sns.

01:07.350 --> 01:17.590
If you are using a Python IDE, you should start with, for example, pip install tensorflow,

01:17.990 --> 01:24.790
pip install seaborn, pip install matplotlib, pandas and numpy.

01:25.150 --> 01:34.030
And if you are using tools like PyCharm or Jupyter, you should run those pip install commands.

01:34.030 --> 01:43.590
Also, to check whether they are already installed in Google Colab, you can run those commands,

01:43.630 --> 01:50.150
adding an exclamation mark to run pip, the Python package installer, from a cell.

01:50.150 --> 01:56.190
So you'll notice that Google Colab reports "Requirement already satisfied."

01:56.190 --> 02:03.750
So all of those libraries are available directly, without running pip install.

02:03.910 --> 02:07.150
They come preinstalled in the notebooks.

02:07.190 --> 02:15.470
Okay, so there is no need to install any of those libraries because they are already installed and

02:15.470 --> 02:16.270
satisfied.

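NOTE
Editor's sketch, not part of the spoken lesson: one way to check whether the
libraries named above are importable, using only the standard library.
The helper name check_installed and the PACKAGES list are illustrative assumptions.
```python
from importlib import util
# Import names for the libraries mentioned in this lesson.
# "sklearn" is the import name of the scikit-learn package.
PACKAGES = ["tensorflow", "pandas", "numpy", "matplotlib", "seaborn", "sklearn"]
def check_installed(names):
    """Return a dict mapping each top-level import name to True if it can be found."""
    return {name: util.find_spec(name) is not None for name in names}
if __name__ == "__main__":
    for name, found in check_installed(PACKAGES).items():
        status = "installed" if found else "missing (install it with pip)"
        print(f"{name}: {status}")
```
In a Colab cell the same check is usually done with an exclamation mark in front of the pip command, as described in the lesson.
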
02:16.430 --> 02:18.110
So post this.

02:18.150 --> 02:20.910
Remove this cell and here we go.

02:21.190 --> 02:24.910
Okay, so we imported this new library.

02:24.950 --> 02:29.070
Seaborn, as I told you, is used for data visualization.

02:29.270 --> 02:36.070
It's built on top of Matplotlib for advanced visualization.

02:36.070 --> 02:43.190
We use Seaborn because it turns complex statistical relationships into beautiful, intuitive visualizations

02:43.390 --> 02:48.910
that help students and professionals understand their data at a glance.

02:48.950 --> 02:54.910
Also, we're going to use our old friend called scikit-learn.

02:55.070 --> 03:03.550
We talked about the scikit-learn library in the previous sections and in the introductory sections on

03:03.550 --> 03:06.950
data visualization and creating the models.

03:06.990 --> 03:13.110
In this model, we're going to use scikit-learn together with TensorFlow.

03:13.110 --> 03:20.860
So TensorFlow is great for building and training neural networks, deep learning architectures, GPU

03:20.900 --> 03:24.700
acceleration, and production deployment, and scikit-learn

03:24.740 --> 03:31.460
is great for data preprocessing and feature engineering, traditional machine learning algorithms,

03:31.500 --> 03:37.260
model evaluation metrics, data splitting and cross-validation.

03:37.260 --> 03:44.500
So we're going to use scikit-learn with this model because we have nine input parameters.

03:44.500 --> 03:47.820
We need a simple API: one line to split the data.

03:47.860 --> 03:54.540
Also, we're going to split the data into training and testing sets, with a flexible test size and random state.

03:54.540 --> 04:02.340
It's used in thousands of projects, and it fits well with TensorFlow because it works with NumPy arrays.

04:02.340 --> 04:09.980
So we use scikit-learn for data preprocessing because preventing data leakage is critical for valid

04:09.980 --> 04:10.500
results.

04:10.540 --> 04:19.220
It also means simpler code, industry-standard tools, fewer errors, and it's model-agnostic: the same preprocessing works with

04:19.220 --> 04:20.860
any ML library.

04:20.980 --> 04:26.580
TensorFlow excels at what it's designed for: building and training neural networks.

04:26.620 --> 04:30.980
Let each library do what it does best.

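NOTE
Editor's sketch, not part of the spoken lesson: a minimal illustration of the
"one line to split" idea and of avoiding data leakage by fitting the scaler on the
training split only. The synthetic data, test_size=0.2 and random_state=42 are
assumptions for illustration; the course dataset is not shown here.
```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Synthetic stand-in data (an assumption): 100 rows, 9 features on very different scales.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 9)) * np.arange(1, 10) * 100.0
y = rng.normal(size=100)
# One line to split; test_size and random_state are illustrative choices.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit the scaler on the training split only, so no test statistics leak into training.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the training mean and standard deviation for the test split.
X_test_scaled = scaler.transform(X_test)
print(X_train_scaled.mean(axis=0).round(6))  # close to 0 for every feature
print(X_train_scaled.std(axis=0).round(6))   # close to 1 for every feature
```
The scaled arrays are plain NumPy arrays, which a TensorFlow model can typically consume directly.
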
04:31.020 --> 04:31.460
Okay.

04:31.820 --> 04:39.780
So we're going to use it for the standard metrics, additional metrics, and clear interpretations.

04:39.780 --> 04:41.420
From sklearn.model_selection,

04:42.260 --> 04:53.940
import train_test_split. From sklearn.metrics, import mean_squared_error (MSE),

04:53.980 --> 04:58.580
mean_absolute_error, and r2_score.

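NOTE
Editor's sketch, not part of the spoken lesson: what the three metrics named above
compute, written in plain Python so the arithmetic is visible. The helper names
mse, mae and r2 and the sample values are hypothetical; in the lesson the
scikit-learn functions are used instead.
```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
def r2(y_true, y_pred):
    """R squared: 1 minus residual sum of squares over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
# Made-up target values and predictions, for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
print(mse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```
Lower MSE and MAE mean smaller errors, while an R squared closer to 1 means the predictions explain more of the variation in the targets.
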
04:58.620 --> 05:03.900
Also, we're going to import our friend StandardScaler.

05:04.100 --> 05:08.540
And this is the main reason I use sklearn.

05:08.580 --> 05:14.980
So from sklearn.preprocessing import StandardScaler.

05:15.020 --> 05:25.100
StandardScaler is a data preprocessing tool that transforms your features to have a mean of zero and

05:25.140 --> 05:28.290
standard deviation of one.

05:28.330 --> 05:35.010
This is also called standardization or z-score normalization.

05:35.010 --> 05:43.050
And we're going to use sklearn.preprocessing and import StandardScaler from this package in

05:43.050 --> 05:45.930
order to use it with our data.

05:45.970 --> 05:56.730
The general formula of standardization, or the z-score, is z = (x − μ) / σ, where x is the original

05:56.730 --> 06:06.250
feature value, μ is the mean of the feature, and σ is the standard deviation of the feature.

06:06.290 --> 06:09.210
z is the scaled value, the z-score.

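NOTE
Editor's sketch, not part of the spoken lesson: the z-score formula applied by hand
to one feature, using only the standard library. The function name standardize and
the sample values are hypothetical. It uses the population standard deviation,
which is also what scikit-learn's StandardScaler uses.
```python
from statistics import fmean, pstdev
def standardize(values):
    """Apply z = (x - mean) / std to every value of one feature."""
    mu = fmean(values)      # mean of the feature
    sigma = pstdev(values)  # population standard deviation of the feature
    # Assumes the values are not all identical; otherwise sigma would be zero.
    return [(x - mu) / sigma for x in values]
# Made-up feature values, for illustration only.
feature = [20.0, 30.0, 40.0, 50.0, 60.0]
z = standardize(feature)
print(z)
print(fmean(z), pstdev(z))  # mean close to 0, standard deviation close to 1
```
The middle value equals the mean, so its z-score is zero, and values above or below the mean get positive or negative z-scores.
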
06:09.330 --> 06:16.130
Okay, so I want you to understand this and why we're going to use StandardScaler in order

06:16.130 --> 06:22.410
to transform your features, to have a mean of zero and standard deviation of one.

06:22.450 --> 06:22.890
Okay.

06:23.090 --> 06:28.090
Use StandardScaler when features have different units and scales.

06:28.090 --> 06:29.610
In this example,

06:29.650 --> 06:38.730
we have nine features and every feature has a different scale or unit. Use it also with algorithms sensitive

06:38.730 --> 06:47.810
to feature scales, like neural networks, SVM, and KNN, or when you assume features are roughly normally distributed.

06:47.810 --> 06:58.610
So the data is very important, and the type of data is crucial in determining which

06:58.650 --> 07:03.770
algorithms, which libraries, and which scalers you should use.

07:03.770 --> 07:12.290
In this case, we're going to use StandardScaler, which makes features compatible

07:12.290 --> 07:14.850
by giving them the same scale.

07:14.850 --> 07:23.210
So we have nine different scales with nine different inputs, nine different variables and features.

07:23.290 --> 07:26.050
We need to normalize them.

07:26.050 --> 07:31.130
So StandardScaler makes features compatible by giving the.