WEBVTT

00:00.120 --> 00:01.040
Welcome back.

00:01.080 --> 00:04.520
We learned about pandas and Matplotlib.

00:04.560 --> 00:06.960
Now let's do a mini project.

00:06.960 --> 00:08.800
Analyze a real data set.

00:09.000 --> 00:14.840
Let's load and analyze a real CSV file from an online resource.

00:14.840 --> 00:18.640
By the way, you can upload any Excel sheet or spreadsheet

00:18.800 --> 00:27.160
here to the files and link to it, but I prefer using online URLs.

00:27.200 --> 00:32.320
Here we're going to load data from a URL, explore it, and visualize the results.

00:32.360 --> 00:36.440
Go to Google and type "demo CSV file online".

00:36.600 --> 00:42.480
There are a lot of websites that give you free CSV files, like this website.

00:42.520 --> 00:45.440
Don't click on the file, because that will download it.

00:45.640 --> 00:50.520
We need to copy the link, so copy the link from here.

00:50.720 --> 00:55.680
Go to this cell and paste the link.

00:55.720 --> 01:02.210
You can also check other sample files, but I prefer this one on GitHub.

01:02.330 --> 01:05.930
In order to do that, we need an online data set.

01:05.930 --> 01:13.570
So data_url equals this string, and df, the DataFrame, equals pd, that is pandas,

01:13.770 --> 01:19.090
a powerful data analysis and manipulation library for Python,

01:19.290 --> 01:21.970
as we talked about in the previous videos,

01:22.090 --> 01:23.330
dot read_csv.

01:24.370 --> 01:29.050
So here we are reading a CSV from a data URL.
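
NOTE
A minimal sketch of the loading step described above, not the exact notebook cell. The inline CSV text and its column names are made-up stand-ins so the example runs offline. With a real file you would pass the copied link (for example a data_url string) to pd.read_csv instead.
```python
import io
import pandas as pd
# Hypothetical sample data standing in for the remote CSV file.
csv_text = "Index,Name,Price,Stock\n1,Widget,10.5,100\n2,Gadget,20.0,50\n"
# pd.read_csv accepts a URL string, a local path, or a file-like object.
df = pd.read_csv(io.StringIO(csv_text))
# For a file without a header row, header=None keeps the first row as data
# and numbers the columns 0, 1, ... instead of treating row one as names.
df_no_header = pd.read_csv(io.StringIO("John,Doe\nJane,Roe\n"), header=None)
print(df.head())
print(df_no_header)
```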

01:29.490 --> 01:32.890
Then we need to use this to display it.

01:33.090 --> 01:35.170
Run the cell and here we go.

01:35.410 --> 01:41.730
We have John Doe and others, but there are no column names,

01:41.730 --> 01:46.810
because this file doesn't contain any column names.

01:46.850 --> 01:48.170
Go to GitHub.

01:48.170 --> 01:50.410
Search for sample CSV files.

01:50.410 --> 01:54.050
This is a good repository, github.com slash data.

01:55.210 --> 01:56.810
Sample CSV files.

01:56.810 --> 01:58.570
This is a good one.

01:58.730 --> 02:04.570
You can get the description for the files.

02:04.570 --> 02:12.890
For example: index, country, email. I need CSV files with numeric values in order to calculate the

02:12.890 --> 02:16.050
mean and do other operations on them.

02:16.050 --> 02:18.610
So here we have the people schema.

02:18.610 --> 02:23.130
We have organizations, products, and others. Okay.

02:23.410 --> 02:27.890
So you can select products and copy the link from here.

02:27.890 --> 02:30.730
You can also download it by clicking on it.

02:30.770 --> 02:32.650
Open it with Excel.

02:32.650 --> 02:33.690
And here we go.

02:33.970 --> 02:42.850
We have index, name, description, brand, category, price, currency, stock, and others. Okay, we need to calculate

02:42.850 --> 02:44.450
the average price.

02:44.450 --> 02:50.970
So let me copy this link address and paste it here.

02:51.170 --> 02:56.130
Then run this cell, and the data is displayed. Okay.

02:56.290 --> 03:01.210
So this is how to load data from an online CSV file.

03:01.250 --> 03:09.820
Again, guys, you can upload your CSV files here and copy the link to visualize them directly.

03:09.860 --> 03:16.140
Now let's visualize the data with the Matplotlib library.

03:16.380 --> 03:21.700
Here we have many columns, and we need to get the numeric ones.

03:21.740 --> 03:28.660
I like using the price, and you can also use the stock and the index.

03:28.700 --> 03:31.860
Okay, so you can use the stock.

03:32.020 --> 03:35.700
Let's start with print average stock.

03:35.900 --> 03:38.620
Print df dot mean.

03:38.780 --> 03:43.380
And we need to use the numeric values only.

03:43.420 --> 03:48.540
Actually, this is not visualizing the data, it's more like statistics. Run it.

03:48.540 --> 03:49.740
And here we go.

03:49.780 --> 03:56.860
We have the index, the price, the stock, the internal ID, and the data type.

03:56.860 --> 03:59.740
So those are the statistics of the data.
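
NOTE
A small sketch of the statistics step, assuming the call was df.mean restricted to numeric columns. The DataFrame below is made-up illustration data, and the column names only mirror the ones read out in the video.
```python
import pandas as pd
# Hypothetical stand-in for the products table loaded earlier.
df = pd.DataFrame({
    "Index": [1, 2, 3],
    "Name": ["Widget", "Gadget", "Gizmo"],
    "Price": [10.0, 20.0, 30.0],
    "Stock": [5, 15, 25],
})
# numeric_only=True skips text columns such as Name, so only numeric
# columns (Index, Price, Stock) appear in the resulting Series.
means = df.mean(numeric_only=True)
print(means)
# describe() gives a broader summary: count, mean, std, min, quartiles, max.
print(df.describe())
```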

03:59.780 --> 04:08.030
You can get the average of the numeric values. Index is included because it is an integer, but you don't need it.

04:08.070 --> 04:10.630
The stock, the price.

04:10.630 --> 04:12.430
Those are very useful.

04:12.510 --> 04:15.630
Now let's visualize the data.

04:15.670 --> 04:19.350
Let's plot the stock price.

04:19.590 --> 04:24.870
So here we have plt dot figure with figsize.

04:24.870 --> 04:27.950
We set the figure size to ten by six.

04:27.990 --> 04:30.510
That's good. Then plt dot plot.

04:30.550 --> 04:38.310
Here we need to define the numeric values that we need to display.

04:38.310 --> 04:42.950
So df, and which column we need to display.

04:42.950 --> 04:48.350
We need to display the stock price. Okay.

04:48.470 --> 04:54.070
marker equals "o", color blue, linewidth equals two.

04:54.230 --> 04:59.150
The title would be Stock Price, and the y label is Price.

04:59.470 --> 05:04.630
Then plt dot grid set to True, and plt dot show.

05:04.790 --> 05:07.110
We need to call this function.
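
NOTE
A rough sketch of the plotting calls just described: figure size ten by six, circle markers, blue line, line width two, a title, a y label, a grid, and show. The Price column name and the five values are assumptions for illustration only. The Agg backend line is there so the sketch also runs without a display, and it can be dropped in a notebook.
```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe on machines with no display
import matplotlib.pyplot as plt
import pandas as pd
# Hypothetical prices standing in for the real column.
df = pd.DataFrame({"Price": [163, 420, 75, 910, 300]})
fig = plt.figure(figsize=(10, 6))
# Plot the column against its row position (the default x axis).
(line,) = plt.plot(df["Price"], marker="o", color="blue", linewidth=2)
plt.title("Stock Price")
plt.ylabel("Price")
plt.grid(True)
plt.show()  # displays the figure in a notebook, does nothing visible under Agg
```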

05:07.110 --> 05:09.190
Let's run and here we go.

05:09.430 --> 05:11.350
This is the stock price.

05:11.350 --> 05:19.590
You can notice that we have 100 companies, and the stock price ranges from 0 to 1000.

05:19.630 --> 05:27.630
You see that, for example, for company number 20 the stock price is below 200.

05:27.670 --> 05:33.990
If we go up to here, to the tabular format,

05:33.990 --> 05:36.430
and look at company number 20,

05:36.670 --> 05:42.990
the stock price is $163. Okay.

05:43.030 --> 05:45.590
So it's below 200.

05:45.750 --> 05:48.670
So the visualization matches the real data.

05:48.870 --> 05:58.270
And it's very helpful for statistics, for data analysis, for machine learning, and more.

05:58.270 --> 06:02.590
So we're going to use those libraries in the next videos.
