WEBVTT

00:00.080 --> 00:05.640
Let's learn about the third type of machine learning, which is unsupervised learning.

00:05.880 --> 00:14.800
Unsupervised learning is a type of machine learning that analyzes and models data without labeled responses

00:14.960 --> 00:17.640
or predefined categories.

00:17.840 --> 00:25.880
Unlike supervised learning, where the algorithm learns from input-output pairs, unsupervised learning

00:25.880 --> 00:36.000
algorithms work solely with input data and aim to discover hidden patterns, structures, or relationships

00:36.000 --> 00:43.880
within the data set independently, without any human intervention or prior knowledge of that data.

00:43.880 --> 00:46.080
Meaning, it's very clear here.

00:46.080 --> 00:48.160
We have a training data set.

00:48.200 --> 00:54.760
This training data set, with no labels, is passed to the algorithm, which creates the model.

00:54.760 --> 01:01.920
And then we get the prediction. Under unsupervised learning, we have clustering, dimension reduction, and

01:01.920 --> 01:03.760
association rule mining.

01:03.920 --> 01:10.360
Clustering: a clustering technique groups similar data points based on defined criteria.

01:10.560 --> 01:15.660
It's useful for segmenting data and finding patterns in each group.
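
NOTE
Editor's sketch of the clustering idea just described, not taken from the lecture. It is a minimal k-means loop in plain Python. The sample points, the choice of two clusters, and the starting centroids are all assumptions made for illustration.
```python
# Minimal k-means sketch in plain Python (standard library only).
# The points, the number of clusters, and the starting centroids are made-up values.
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))
#
def mean_point(points):
    return tuple(sum(axis) / len(points) for axis in zip(*points))
#
def kmeans(points, centroids, iterations=10):
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [mean_point(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, groups
#
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, groups = kmeans(points, centroids=[points[0], points[3]])
print(groups)
```
No labels are passed in anywhere: the two groups come only from the distances between points.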

01:15.780 --> 01:17.220
Dimension reduction.

01:17.220 --> 01:23.740
To keep the most useful information, dimension reduction reduces the number of variables considered.
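
NOTE
Editor's sketch of dimension reduction, not taken from the lecture. It uses PCA, which the lecture names later, computed here with power iteration in plain Python. The four sample rows and the iteration count are assumptions made for illustration.
```python
# PCA sketch: project 2-D points onto their first principal component.
# The data and the iteration count are made-up values for illustration.
import math
#
def first_component(rows, iterations=100):
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[v - m for v, m in zip(row, means)] for row in rows]
    dims = len(means)
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)] for i in range(dims)]
    # Power iteration converges to the direction of largest variance.
    vec = [1.0] * dims
    for _ in range(iterations):
        vec = [sum(cov[i][j] * vec[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return means, centered, vec
#
rows = [(2.0, 4.1), (3.0, 6.0), (4.0, 7.9), (5.0, 10.0)]
means, centered, component = first_component(rows)
# Each 2-D row becomes a single number: its coordinate along the component.
reduced = [sum(c * w for c, w in zip(row, component)) for row in centered]
print(component, reduced)
```
Two variables per row go in and one number per row comes out, while the ordering of the rows along the main trend is preserved.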

01:24.020 --> 01:25.900
And association rule

01:25.940 --> 01:34.420
mining discovers relationships between seemingly independent databases or other data repositories

01:34.420 --> 01:36.500
through association rules.
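
NOTE
Editor's sketch of association rule mining, not taken from the lecture. It computes the two standard measures of a rule, support and confidence, in plain Python. The shopping baskets are invented data.
```python
# Association rule sketch: support and confidence for the rule "bread implies butter".
# The shopping baskets are made-up data for illustration.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]
#
def support(itemset):
    # Fraction of baskets that contain every item in the itemset.
    return sum(1 for basket in baskets if itemset.issubset(basket)) / len(baskets)
#
def confidence(antecedent, consequent):
    # Of the baskets containing the antecedent, the fraction that also contain the consequent.
    return support(antecedent | consequent) / support(antecedent)
#
print(support({"bread", "butter"}))
print(confidence({"bread"}, {"butter"}))
```
A rule with high support and high confidence is a candidate relationship worth a closer look.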

01:36.540 --> 01:45.460
I prepared a good example of unsupervised learning. This image shows sets of animals like

01:45.460 --> 01:53.420
elephants, camels, and cows that represent raw data that the unsupervised learning algorithm will

01:53.420 --> 01:54.300
process.

01:54.460 --> 01:57.060
Here we have the input raw data.

01:57.100 --> 02:05.940
The interpretation stage signifies that the algorithm doesn't have predefined labels or categories for

02:05.940 --> 02:06.620
that data.

02:06.740 --> 02:14.500
It needs to figure out how to group or organize the data based on its inherent patterns.

02:14.660 --> 02:23.800
The algorithm stage represents the unsupervised learning process, which can be clustering,

02:23.800 --> 02:29.440
dimensionality reduction, or anomaly detection to identify patterns in the data.

02:29.480 --> 02:33.400
So based on this algorithm, the model will be trained.

02:33.640 --> 02:38.520
The process stage shows the algorithm working on the data.

02:38.800 --> 02:43.120
The output shows the result of the unsupervised learning process.

02:43.120 --> 02:50.200
In this case, the algorithm might have grouped the animals into clusters based on their species, for

02:50.200 --> 02:55.160
example, elephants, camels, and cows.

02:55.200 --> 03:03.520
Here, for example, you can count the number of legs and the number of eyes, or compare the necks.

03:03.680 --> 03:11.520
So those are some variables that may be used by the algorithm in order

03:11.520 --> 03:13.080
to process the data.
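
NOTE
Editor's sketch of how such variables could be turned into numbers, not taken from the lecture. Each animal becomes a row of measurements, and the code finds the most similar other animal for each row. The features (height, weight, hump count) and all values are hypothetical. They replace the leg and eye counts mentioned in the lecture, which would be the same for all three species.
```python
# Feature sketch: describe each unlabeled animal by numbers an algorithm can compare.
# The measurements (height in m, weight in kg, hump count) are invented for illustration.
import math
#
animals = [
    (3.2, 5000.0, 0), (3.0, 4500.0, 0),
    (2.0, 500.0, 1), (2.1, 600.0, 1),
    (1.4, 700.0, 0), (1.5, 750.0, 0),
]
#
# Min-max scale every column to the range 0..1 so weight does not dominate the distance.
columns = list(zip(*animals))
lows = [min(col) for col in columns]
highs = [max(col) for col in columns]
scaled = [tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs)) for row in animals]
#
# For each animal, find the index of the most similar other animal.
nearest = [
    min((j for j in range(len(scaled)) if j != i), key=lambda j: math.dist(scaled[i], scaled[j]))
    for i in range(len(scaled))
]
print(nearest)
```
With these invented values, rows that describe similar animals end up paired with each other, even though no species name is ever given to the code.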

03:13.360 --> 03:18.920
The working of unsupervised machine learning can be explained in several steps.

03:18.920 --> 03:23.960
The first step is collecting the data, the raw input data.

03:23.960 --> 03:30.160
Gather a data set without predefined labels, categories, or tags, exactly like this image.

03:30.200 --> 03:33.260
Go and get those images.

03:33.260 --> 03:39.300
We don't tell the machine that those are cows or camels or elephants.

03:39.300 --> 03:41.220
Select the algorithm.

03:41.340 --> 03:46.220
Choose a suitable unsupervised algorithm, such as a clustering algorithm

03:46.260 --> 03:47.660
like k-means.

03:47.860 --> 03:51.140
This is a very popular algorithm.

03:51.300 --> 03:58.300
Besides k-means, there is association rule learning, or dimensionality reduction like PCA, depending on the goal.

03:58.340 --> 04:06.780
Next, train the model on raw data: feed the unlabeled data set to the algorithm.

04:06.780 --> 04:12.100
We give all of that data to the algorithm.

04:12.140 --> 04:19.860
The algorithm looks for similarities, relationships, or hidden structure within the data. Then group

04:20.140 --> 04:21.780
or transform the data.

04:21.980 --> 04:30.980
The algorithm organizes the data into groups or clusters. Then analyze the discovered groups, rules, or

04:30.980 --> 04:39.820
features to gain insight or use them for further tasks like visualization, anomaly detection, or as

04:39.820 --> 04:41.820
an input for other models.
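
NOTE
Editor's sketch of the final analysis step, not taken from the lecture. It reuses cluster centroids for anomaly detection by flagging points that lie far from every centroid. The centroids, the points, and the distance threshold are assumed values standing in for the output of an earlier clustering run.
```python
# Analysis sketch: reuse cluster centroids to flag unusual points.
# The centroids, points, and threshold are assumed values for illustration.
import math
#
centroids = [(1.0, 1.0), (8.0, 8.0)]
points = [(1.2, 0.9), (7.8, 8.3), (4.5, 4.5), (0.8, 1.1)]
threshold = 2.0
#
def nearest_centroid_distance(point):
    # Distance from the point to whichever centroid is closest.
    return min(math.dist(point, c) for c in centroids)
#
anomalies = [p for p in points if nearest_centroid_distance(p) > threshold]
print(anomalies)
```
The threshold is a judgment call: in practice it would be chosen by looking at how spread out the points in each cluster are.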