WEBVTT

00:00.080 --> 00:00.960
Welcome back.

00:01.000 --> 00:05.280
We now understand our model's hierarchy and shapes.

00:05.560 --> 00:10.040
Now let's continue with the object detector class.

00:10.040 --> 00:19.520
Here, under those variables, let's start creating the SSD MobileNet output structure.

00:19.560 --> 00:29.160
Start with private val output locations, equal to an array of size one.

00:29.400 --> 00:33.160
And what does this array contain? Another array.

00:33.480 --> 00:37.680
Its size is the maximum detection number.

00:37.840 --> 00:42.920
And each element is a float array of size four.

00:43.040 --> 00:46.880
Then private val output classes.

00:46.880 --> 00:57.280
It's an array of one float array of size maximum detection number. Then private val output scores: an array of one float

00:57.320 --> 01:00.800
array of size maximum detection number.

01:00.840 --> 01:03.240
Don't worry, we're going to clarify everything.

01:03.440 --> 01:08.290
Then output detection number, which is a float array of size one.
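
NOTE
Here is a minimal Kotlin sketch of the four output buffers just described. The class name SsdOutputBuffers is hypothetical. The maximum detection number is passed in as a constructor parameter, and the default of 10 is only a placeholder because the real value depends on the model.
```kotlin
// Hypothetical holder for the SSD MobileNet output buffers described above.
class SsdOutputBuffers(maxDetections: Int = 10) {
    // Shape [1, maxDetections, 4]: four box values per detection.
    val outputLocations = Array(1) { Array(maxDetections) { FloatArray(4) } }
    // Shape [1, maxDetections]: class index per detection, stored as a float.
    val outputClasses = Array(1) { FloatArray(maxDetections) }
    // Shape [1, maxDetections]: confidence score per detection.
    val outputScores = Array(1) { FloatArray(maxDetections) }
    // Shape [1]: how many of the maxDetections slots hold valid results.
    val outputNumDetections = FloatArray(1)
}
```
For example, SsdOutputBuffers(10).outputLocations[0][3] would hold the four box values of the fourth detection.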

01:08.330 --> 01:18.610
These are the variables used to handle the object detection model's output.

01:18.730 --> 01:27.730
This is the classic output structure for an object detection model specifically designed to handle the

01:27.730 --> 01:31.410
float32 output we discussed.
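
NOTE
This sketch shows how these buffers might be handed to the model. It assumes the detector runs on TensorFlow Lite, whose Interpreter.runForMultipleInputsOutputs takes a map from output index to buffer. The helper buildOutputMap and the output order (locations, classes, scores, count) are assumptions, so check them against your model.
```kotlin
// Hypothetical helper: collects the four output buffers into an index-keyed map.
// ASSUMPTION: output order is locations, classes, scores, count; verify for your model.
fun buildOutputMap(
    locations: Array<Array<FloatArray>>,
    classes: Array<FloatArray>,
    scores: Array<FloatArray>,
    count: FloatArray
): Map<Int, Any> = mapOf(
    0 to locations,
    1 to classes,
    2 to scores,
    3 to count
)
```
The map would then go in as the second argument, for example interpreter.runForMultipleInputsOutputs(arrayOf(inputBuffer), outputs). Here interpreter and inputBuffer are assumed names.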

01:31.450 --> 01:42.730
Okay, so let me write a comment here: the output structure for object detection, which is float32.

01:42.970 --> 01:51.970
Output locations: the bounding box coordinates, normalized from 0 to 1.

01:52.170 --> 01:57.250
So the shape would be [1, maximum detection number, 4].

01:57.570 --> 02:04.250
Its purpose is to store the bounding box coordinates for each detected object.

02:04.290 --> 02:15.370
Each box has four values, typically y minimum, x minimum, y maximum, x maximum, normalized to [0, 1],

02:15.610 --> 02:20.410
or x center, y center, width, and height, also normalized.
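
NOTE
If a model returns the center format instead, a small conversion puts the box into the corner order. This is a sketch with a hypothetical helper, centerToCorners, and it assumes all values are already normalized.
```kotlin
// Hypothetical helper: converts a normalized (xCenter, yCenter, width, height) box
// into the normalized [yMin, xMin, yMax, xMax] order.
fun centerToCorners(xCenter: Float, yCenter: Float, width: Float, height: Float): FloatArray {
    val halfW = width / 2f
    val halfH = height / 2f
    return floatArrayOf(yCenter - halfH, xCenter - halfW, yCenter + halfH, xCenter + halfW)
}
```
For example, a box centered at (0.5, 0.5) with width 0.25 and height 0.5 becomes [0.25, 0.375, 0.75, 0.625].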

02:20.450 --> 02:20.930
Okay.

02:21.210 --> 02:25.810
So this is the bounding box in normalized coordinates.

02:26.010 --> 02:31.010
Here we have, for example, y minimum.

02:31.010 --> 02:37.250
We start with y minimum, then x minimum, y max, and x max.
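
NOTE
This sketch reads one box in that order from the [1, maximum detection number, 4] buffer. NormalizedBox and boxAt are hypothetical names.
```kotlin
// Hypothetical data class naming the four values of one box, in the order
// y minimum, x minimum, y maximum, x maximum (all normalized to 0..1).
data class NormalizedBox(val yMin: Float, val xMin: Float, val yMax: Float, val xMax: Float)
// Reads detection i from the [1, N, 4] output locations buffer.
fun boxAt(outputLocations: Array<Array<FloatArray>>, i: Int): NormalizedBox {
    val b = outputLocations[0][i]
    return NormalizedBox(yMin = b[0], xMin = b[1], yMax = b[2], xMax = b[3])
}
```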

02:37.370 --> 02:37.890
Okay.

02:38.050 --> 02:41.130
This is the first variable: output locations.

02:41.130 --> 02:43.570
It's for the bounding box.

02:43.890 --> 02:44.210
Okay.

02:44.250 --> 02:46.250
We need to display the box.

02:46.530 --> 02:49.650
So we have to create the boundaries.

02:49.810 --> 02:59.770
The boundaries go in a four-element array, that is, an array containing four elements.
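
NOTE
To draw the box, the normalized values have to be scaled by the image size. This is a minimal sketch that assumes the [yMin, xMin, yMax, xMax] order. PixelRect and toPixelRect are hypothetical names that stand in for something like android.graphics.RectF.
```kotlin
// Hypothetical pixel-space rectangle used for drawing.
data class PixelRect(val left: Float, val top: Float, val right: Float, val bottom: Float)
// Scales a normalized [yMin, xMin, yMax, xMax] box to pixel coordinates.
fun toPixelRect(box: FloatArray, imageWidth: Int, imageHeight: Int): PixelRect {
    require(box.size == 4) { "expected four values: yMin, xMin, yMax, xMax" }
    return PixelRect(
        left = box[1] * imageWidth,
        top = box[0] * imageHeight,
        right = box[3] * imageWidth,
        bottom = box[2] * imageHeight
    )
}
```
For a 640x480 frame, the box [0.25, 0.5, 0.75, 1.0] becomes left 320, top 120, right 640, bottom 360.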

02:59.810 --> 03:00.210
Okay.

03:00.410 --> 03:03.130
Now the output classes.

03:03.170 --> 03:10.410
Class IDs: the shape would be [1, maximum detection number].

03:10.410 --> 03:18.330
It stores the class index for each detection.

03:18.530 --> 03:26.260
This is a very important lesson, because we also need to understand the output: why we need it

03:26.260 --> 03:32.820
as float32, and why we need to get those arrays and analyze them.

03:33.020 --> 03:40.100
So the first array represents y minimum, x minimum, y maximum, x maximum.

03:40.100 --> 03:41.820
Those are for the bounding boxes.

03:42.060 --> 03:49.700
The second array stores the class index for each detection. Okay.

03:49.900 --> 03:55.220
So the integer class IDs are stored as floats.

03:55.340 --> 04:02.060
For example, 0.0f, 1.0f, 2.0f, and so on.
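
NOTE
This sketch turns the float class IDs back into labels. classLabel and the label list are hypothetical. Some label files need an index offset, so check yours.
```kotlin
// Converts a float class ID (0.0f, 1.0f, 2.0f, ...) to an Int index and looks it up.
// Falls back to "unknown" when the index is outside the label list.
fun classLabel(classId: Float, labels: List<String>): String {
    val index = classId.toInt()
    return labels.getOrElse(index) { "unknown" }
}
```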

04:02.180 --> 04:06.780
Okay, this is for the output classes.

04:06.780 --> 04:12.620
Now we have the output scores: the confidence scores.

04:12.620 --> 04:16.460
Confidence scores: the shape is [1,

04:16.500 --> 04:27.590
maximum detection number]. It stores the confidence score for each detection as a float between 0.0 and 1.0.

04:28.030 --> 04:32.830
So 0.6 means 60% confidence.

04:32.870 --> 04:37.110
This is the confidence we use when drawing the boxes.

04:37.150 --> 04:40.190
If you remember, here in draw into canvas,

04:40.470 --> 04:47.310
this text is used to draw the detected object's label along with its score.
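
NOTE
This sketch builds the text drawn next to a box using a hypothetical labelText helper. It turns a score such as 0.6 into 60%.
```kotlin
import kotlin.math.roundToInt
// Hypothetical formatter for the label drawn next to a box, e.g. "door 70%".
fun labelText(label: String, score: Float): String {
    val percent = (score * 100).roundToInt()
    return "$label $percent%"
}
```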

04:47.350 --> 04:58.270
Okay, the last one is output number of detections, which is the valid detection count.

04:58.310 --> 05:09.670
Its shape is [1], and it tells you how many of the maximum detection number detections are

05:09.710 --> 05:11.470
actually valid.

05:11.670 --> 05:14.390
In other words, how many detections are valid.

05:14.390 --> 05:19.350
There are a lot of detections, but the model is not confident about all of them.
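
NOTE
The count arrives as a float in a size-one array. This sketch converts it to an Int and clamps it to the buffer size before it is used as a loop bound. validDetectionCount is a hypothetical name.
```kotlin
// Hypothetical helper: number of detection slots to read, clamped to [0, maxDetections].
fun validDetectionCount(numDetections: FloatArray, maxDetections: Int): Int =
    numDetections[0].toInt().coerceIn(0, maxDetections)
```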

05:19.630 --> 05:28.510
And if you remember, here in the data, in the detection parameters, we set the threshold to 0.6.

05:28.670 --> 05:35.150
So if you open your camera and point it at a door,

05:35.350 --> 05:43.750
there's a possibility of detecting a window, but the window's score here is 0.4.

05:44.070 --> 05:48.870
It's detected, but the score threshold is 0.6.

05:49.190 --> 05:55.390
So we have the door at 0.7 and the window at 0.4.

05:55.710 --> 05:57.150
It's eliminated.

05:57.190 --> 05:59.710
We keep the door.

05:59.710 --> 06:03.590
The door's 0.7 is greater than 0.6.

06:03.590 --> 06:08.150
Since the score is higher than 0.6,

06:08.190 --> 06:11.990
go ahead, display it, and draw the box around the object.
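
NOTE
Here is the door and window example as a minimal filter. Detection and filterByThreshold are hypothetical names. The default threshold of 0.6 mirrors the detection parameters mentioned in this lesson. Whether a score exactly at the threshold passes is a design choice.
```kotlin
// Hypothetical detection record; field names are illustrative.
data class Detection(val label: String, val score: Float)
// Keeps only detections whose score reaches the threshold.
fun filterByThreshold(detections: List<Detection>, threshold: Float = 0.6f): List<Detection> =
    detections.filter { it.score >= threshold }
```
With a door at 0.7 and a window at 0.4, only the door is kept.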

06:12.110 --> 06:15.550
This is how the output detections are processed.

06:15.550 --> 06:18.990
First, we read the output number of detections.

06:18.990 --> 06:28.070
For the top N detections, we need the highest rank, that is, the highest score, among the

06:28.070 --> 06:29.110
detections.
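
NOTE
This sketch picks the highest-scoring detections among the valid ones. ScoredBox and topDetections are hypothetical names. The sort is there in case the model's outputs are not already ordered by score.
```kotlin
// Hypothetical record pairing a class index with its confidence score.
data class ScoredBox(val classId: Int, val score: Float)
// Returns the n highest-scoring detections among the first validCount slots.
fun topDetections(scores: FloatArray, classes: FloatArray, validCount: Int, n: Int): List<ScoredBox> =
    (0 until minOf(validCount, scores.size))
        .map { i -> ScoredBox(classes[i].toInt(), scores[i]) }
        .sortedByDescending { it.score }
        .take(n)
```
Slots beyond the valid count are ignored, even if they hold a high score.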

06:29.110 --> 06:37.190
So we learned about the output structure of the SSD MobileNet

06:37.230 --> 06:38.710
model.
