WEBVTT

00:00.080 --> 00:01.160
Welcome back.

00:01.200 --> 00:07.280
We visualized our data frame and we succeeded in creating this table.

00:07.320 --> 00:12.760
Now let's explore our data and use built-in methods to quickly understand it.

00:12.800 --> 00:17.760
Again guys, trust me, when working with simple tables it's very clear.

00:17.760 --> 00:22.280
It's obvious that Diana has the highest grade.

00:22.280 --> 00:30.440
But when working with hundreds or thousands of rows and columns, it would be a very complicated

00:30.440 --> 00:31.080
step.

00:31.200 --> 00:35.760
So what can we do with our data?

00:35.760 --> 00:38.560
Let me display the first two rows.

00:38.560 --> 00:40.520
So I use df.

00:40.760 --> 00:45.360
This is the DataFrame that we created before, followed by .head().

00:45.400 --> 00:46.600
And we pass 2.

00:46.800 --> 00:49.360
This returns the first two rows.

00:49.400 --> 00:55.680
Let me run it, and we see the first two rows: Alice and Bob.

00:55.960 --> 01:00.000
Those are the first two rows, with index 0 and 1.
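
NOTE
A minimal sketch of the head call described above. The ages, cities, and
grades are assumed placeholder values: the lesson only names the rows (Alice,
Bob, Charlie, Diana) and the columns (name, age, city, grade). The capitalized
column labels are also an assumption.
```python
import pandas as pd
# Hypothetical data: only the names and the column meanings come from the lesson.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Age": [19, 20, 21, 22],
    "City": ["Paris", "London", "Berlin", "Madrid"],
    "Grade": [85, 90, 78, 95],
})
# head(2) returns a new DataFrame holding the first two rows (index 0 and 1).
first_two = df.head(2)
print(first_two)
```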

01:00.000 --> 01:09.780
If I need the last two rows, I can use df.tail(2), and these are the last two rows.
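
NOTE
A matching sketch for the tail call, under the same assumptions as before:
the ages, cities, grades, and capitalized column labels are placeholders,
and only the four names come from the lesson.
```python
import pandas as pd
# Hypothetical data: only the names and the column meanings come from the lesson.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Age": [19, 20, 21, 22],
    "City": ["Paris", "London", "Berlin", "Madrid"],
    "Grade": [85, 90, 78, 95],
})
# tail(2) returns the last two rows and keeps their original index labels (2 and 3).
last_two = df.tail(2)
print(last_two)
```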

01:09.940 --> 01:14.500
Charlie and Diana. Next, let's get the dataset info.

01:14.820 --> 01:25.500
What does this mean? df.info() provides a concise summary of the DataFrame, including

01:25.660 --> 01:34.620
data types of each column, number of non-null values, memory usage, and number of rows and columns.
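
NOTE
A hedged sketch of the summary described here. The sample values and column
labels are assumptions, not the lesson's actual data. Because info() prints
instead of returning a value, this sketch captures the text in a buffer; the
exact dtype names and byte counts shown can vary with the pandas version.
```python
import io
import pandas as pd
# Hypothetical data: only the names and the column meanings come from the lesson.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Age": [19, 20, 21, 22],
    "City": ["Paris", "London", "Berlin", "Madrid"],
    "Grade": [85, 90, 78, 95],
})
# info() writes its summary to stdout by default; buf= redirects it to a buffer.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
# The same facts are also available one at a time.
print(df.shape)                  # (number of rows, number of columns)
print(df.dtypes)                 # data type of each column
print(df.notna().sum())          # non-null count per column
print(df.memory_usage().sum())   # total bytes, index included
```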

01:34.780 --> 01:39.700
Okay, so let me run and see the output.

01:39.740 --> 01:46.740
Here we have the dataset info: class pandas DataFrame, RangeIndex.

01:46.740 --> 01:51.300
Four entries, 0 to 3: we have four rows.

01:51.500 --> 01:54.260
Data columns: four columns in total.

01:54.260 --> 01:56.820
We have name, age, city, and grade.

01:56.820 --> 02:05.700
And you can notice columns zero, one, two, and three (name, age, city, and grade), each with four non-null

02:05.700 --> 02:06.340
values.

02:06.540 --> 02:14.450
And here we have the data type: for name it is object, which means string, and int64 for age.

02:14.610 --> 02:20.570
Object for city, which is string, and int64 for grade.

02:20.770 --> 02:21.690
Data types.

02:21.690 --> 02:26.810
Two int64 columns and two object columns, and the memory usage:

02:27.090 --> 02:29.370
260 bytes.

02:29.650 --> 02:35.530
Okay, this is how we get the information about the data set.

02:35.530 --> 02:40.810
So the RangeIndex: four entries, 0 to 3.

02:41.210 --> 02:47.890
We have four rows with index 0 to 3. Data columns: four columns in total.

02:48.050 --> 02:52.010
And for the rest, we don't have any null values.

02:52.050 --> 02:55.450
They are all non-null values.

02:55.450 --> 02:58.370
No column has null values.

02:58.410 --> 03:08.890
Now let me use another ready-made method included in the DataFrame, which is the describe method.

03:08.890 --> 03:12.770
We get basic statistics with df.describe().

03:13.090 --> 03:18.090
This will generate descriptive statistics for numerical columns.

03:18.290 --> 03:27.030
Count, mean, standard deviation, minimum and maximum values, and the 25%, 50%, and 75% percentiles.
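
NOTE
A small sketch of the describe call and the statistics it lists. The grades
and cities are assumed placeholder values; the ages are chosen to be
consistent with the figures read out later in the lesson, but they are not
shown in the transcript.
```python
import pandas as pd
# Hypothetical data: only the names and the column meanings come from the lesson.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Diana"],
    "Age": [19, 20, 21, 22],
    "City": ["Paris", "London", "Berlin", "Madrid"],
    "Grade": [85, 90, 78, 95],
})
# By default describe() summarizes only the numeric columns (Age and Grade here).
stats = df.describe()
print(stats)
# Each row of the result is one statistic; each column is one numeric column.
print(list(stats.index))
print(stats.loc["mean", "Age"])
```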

03:27.030 --> 03:30.390
Scroll down to get those basic statistics.

03:30.430 --> 03:36.230
We have two columns that work with integers: age and grade.

03:36.270 --> 03:42.350
This will generate statistical data for those numerical columns.

03:42.590 --> 03:47.870
Age and grade. For age: count 4, mean 20.5.

03:47.950 --> 03:50.590
Standard deviation 1.29.

03:50.630 --> 03:59.270
Minimum 19, 25% 19.75, 50%, 75%, and maximum 22.
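
NOTE
The age figures read out above (count 4, mean 20.5, standard deviation 1.29,
minimum 19, maximum 22) are consistent with the ages 19, 20, 21, and 22. That
list is an inference, not something shown in the transcript. The sketch below
recomputes the same numbers with Python's standard library, using the sample
standard deviation (n - 1 in the denominator) and linearly interpolated
quartiles, which are the describe() defaults.
```python
import statistics
# Inferred ages: an assumption that matches the statistics quoted in the lesson.
ages = [19, 20, 21, 22]
mean = statistics.mean(ages)
# stdev() is the sample standard deviation (divides by n - 1).
std = statistics.stdev(ages)
# method="inclusive" interpolates linearly between the min and max of the data.
q1, median, q3 = statistics.quantiles(ages, n=4, method="inclusive")
print(mean, round(std, 2), min(ages), q1, median, q3, max(ages))
# prints: 20.5 1.29 19 19.75 20.5 21.25 22
```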

03:59.470 --> 04:02.390
Also, this is for the grade.

04:02.430 --> 04:07.350
Okay, so you can see how powerful this library is.

04:07.470 --> 04:14.990
It calculates all those statistics with one click and one command, df.describe().

04:15.030 --> 04:15.510
Okay.

04:15.670 --> 04:25.710
So this is how we use built-in methods to quickly understand our data and get statistics about it.