WEBVTT

00:00.080 --> 00:07.640
Before we start creating the parse detections function, let me check the prepare image for model function to see if

00:07.640 --> 00:10.600
we need to make any changes here.

00:10.600 --> 00:16.840
It prepares the exact input tensor of the model: here, uint8 [1, 300, 300, 3].

00:16.880 --> 00:18.040
This is for SSD.

00:18.320 --> 00:20.800
We need to prepare it for YOLO.

00:20.960 --> 00:30.720
For YOLO it's float32 [1, 640, 640, 3]. I'm repeating this because it's very important.

00:30.720 --> 00:39.200
This is the main difference between the models: between YOLO, SSD MobileNet, and other models.

00:39.200 --> 00:40.680
So this is the bitmap.

00:40.880 --> 00:46.840
This is the tensor image; we pass it the bitmap and the data type.

00:46.840 --> 00:48.920
It's not float32.

00:49.080 --> 00:56.400
This one is for MobileNet SSD, and this one is for YOLO.

00:56.440 --> 01:03.740
YOLO uses the float32 data type instead of uint8, the unsigned 8-bit integer data type.

01:03.780 --> 01:06.820
We're going to use float32.

01:07.020 --> 01:15.780
Then the ImageProcessor builder: add a resize operation with image size, image size, and bilinear.

01:15.820 --> 01:18.340
This is the resize method.

01:18.500 --> 01:25.100
We can add image rotation and a normalize operation, and then build.

01:25.100 --> 01:28.860
For YOLO there is normalization.

01:28.860 --> 01:32.260
So here YOLO normalization.

01:32.260 --> 01:41.420
So here, add a normalize operation with 0f and 255f, okay.

01:41.460 --> 01:46.180
This is the standard deviation. Press Alt+Enter to import the class.

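NOTE
A minimal sketch, not from the lesson, of the arithmetic behind a normalize
operation with mean 0f and standard deviation 255f: (value - mean) / stdDev.
It is plain Kotlin with no Android classes; normalizePixels is a hypothetical
helper name, and it only illustrates how 0..255 pixel values land in 0..1.
```kotlin
// Hypothetical helper: applies (value - mean) / stdDev to every pixel value.
fun normalizePixels(pixels: IntArray, mean: Float = 0f, stdDev: Float = 255f): FloatArray =
    FloatArray(pixels.size) { i -> (pixels[i] - mean) / stdDev }
fun main() {
    // Black (0) maps to 0.0 and white (255) maps to 1.0.
    val normalized = normalizePixels(intArrayOf(0, 128, 255))
    println(normalized.joinToString())
}
```
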
01:46.180 --> 01:47.020
And here we go.

01:47.220 --> 01:48.260
Scroll down.

01:48.260 --> 01:50.140
This is the parse detection result.

01:50.140 --> 01:52.980
This is from the SSD MobileNet.

01:53.020 --> 01:56.860
It's different in YOLO.

01:56.860 --> 02:05.760
So let me remove this, and let me start with parse detections here.

02:05.760 --> 02:08.720
We need to return a list of detection objects.

02:08.720 --> 02:17.280
So let's start with val results equals mutableListOf of detection objects.

02:17.440 --> 02:20.760
We need to store our results in a mutable list.

02:20.800 --> 02:26.400
Now a for loop: i in 0 until the output size.

02:26.560 --> 02:27.840
What do we need to do?

02:28.000 --> 02:33.520
We need to loop through this and get the boxes.

02:33.520 --> 02:35.040
How do we get the boxes?

02:35.040 --> 02:40.600
As I told you, we have five values per detection.

02:40.920 --> 02:54.920
We have the X center, we have Y center, we have width, we have height and we have the confidence.

02:54.920 --> 03:04.380
Please write them down, because we need to get those five parameters for every detection. To do that here,

03:04.620 --> 03:09.020
let me start with x center equals output.

03:09.060 --> 03:10.780
Output at index.

03:10.860 --> 03:15.660
To get the first element, we start with [0][0],

03:15.860 --> 03:19.860
at index i of the output buffer.

03:20.060 --> 03:27.140
Then, similar to what we've done here, the y center is the second element.

03:27.180 --> 03:34.940
The width is the third element, the height is the fourth element, and the confidence is the fifth

03:34.980 --> 03:37.500
element at index i.

03:37.540 --> 03:38.100
Okay.

03:38.260 --> 03:48.180
The confidence index: we created this variable before. You can remove it, or you can

03:48.220 --> 03:50.620
use it like here.

03:50.780 --> 03:53.420
So scroll down to here.

03:53.580 --> 03:59.260
And instead of using four, you can specify the confidence index, okay.

03:59.540 --> 04:03.470
But to keep it simple: this is the confidence index.

04:03.470 --> 04:07.110
This is the height index, then width, y center, and x center.

04:07.150 --> 04:14.430
Those are the five parameters, the five values that we get from each detection.

04:14.550 --> 04:17.230
Now let me make some checks.

04:17.230 --> 04:23.110
If the confidence is greater than the confidence threshold from the parameters.

04:23.110 --> 04:28.870
Or rather, the score threshold.

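NOTE
A minimal sketch, not the lesson's exact code, of the loop described above.
It assumes the output buffer is laid out as output[0][row][i], where rows
0 to 4 hold x center, y center, width, height, and confidence. RawDetection
and readDetections are hypothetical names used only for illustration.
```kotlin
// Hypothetical container for the five values of one detection.
data class RawDetection(
    val xCenter: Float, val yCenter: Float,
    val width: Float, val height: Float, val confidence: Float
)
// Assumed layout: output[0][row][i]; row 4 is the confidence index.
fun readDetections(output: Array<Array<FloatArray>>, scoreThreshold: Float): List<RawDetection> {
    val results = mutableListOf<RawDetection>()
    val rows = output[0]
    for (i in 0 until rows[0].size) {
        val confidence = rows[4][i]
        // Keep only detections whose confidence passes the threshold.
        if (confidence > scoreThreshold) {
            results.add(RawDetection(rows[0][i], rows[1][i], rows[2][i], rows[3][i], confidence))
        }
    }
    return results
}
fun main() {
    val output = arrayOf(arrayOf(
        floatArrayOf(0.5f, 0.2f), // x centers
        floatArrayOf(0.5f, 0.3f), // y centers
        floatArrayOf(0.4f, 0.1f), // widths
        floatArrayOf(0.6f, 0.1f), // heights
        floatArrayOf(0.9f, 0.2f)  // confidences
    ))
    println(readDetections(output, 0.5f))
}
```
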
04:28.870 --> 04:30.070
What do we need to do?

04:30.270 --> 04:36.670
We need to convert the normalized center to pixel coordinates on the preview surface.

04:36.670 --> 04:48.310
For that, start with val x minimum equals x center minus width over 2f, times result view size dot width.

04:48.390 --> 04:57.710
Similarly, the y minimum uses the height; then the x maximum and the y maximum.

04:57.990 --> 05:10.050
What are x minimum, y minimum, x maximum, and y maximum? The x minimum is the left side of the box,

05:10.250 --> 05:20.250
the y minimum is the top, the x maximum is the right, and the y maximum is the bottom.

05:20.290 --> 05:23.650
Okay, we are converting them.

05:24.010 --> 05:28.970
Convert the normalized center to pixels.

05:29.010 --> 05:32.250
We converted the normalized center.

05:32.250 --> 05:35.970
This is the center of the detected object to pixels.

05:36.010 --> 05:41.370
Now let's draw the rectangle for the detected object.

05:41.490 --> 05:45.770
Rectangle equals RectF: left.

05:46.050 --> 05:47.210
Start by left.

05:47.250 --> 05:56.250
coerceIn, or rather coerceAtLeast 0f; top, at least 0f; right, at most

05:56.290 --> 06:04.550
the result view dot width dot toFloat; and the bottom, coerceAtMost result view dot height dot toFloat.

06:04.550 --> 06:06.350
This is the second step.

06:06.550 --> 06:09.190
This is the first step and this is the second step.

06:09.190 --> 06:12.590
The first step is to convert the normalized center to pixels.

06:12.590 --> 06:21.750
Because, as I told you, in YOLO we use normalized coordinates.

06:21.750 --> 06:31.710
And in the second step, we draw the bounding box and apply clamping constraints to ensure

06:31.710 --> 06:36.310
the rectangle stays within the visible screen bounds, okay.

06:36.510 --> 06:46.990
So those coerceAtLeast and coerceAtMost calls clamp the bounding box coordinates to ensure

06:46.990 --> 06:52.910
they don't extend beyond the view's boundaries: 0 to the width or height.

06:52.950 --> 06:58.270
It's a safety measure to prevent drawing outside the screen.

06:58.310 --> 07:01.110
Okay, so this is a safety measure.

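NOTE
A minimal sketch of the two steps above, under stated assumptions: the model
returns a normalized center, width, and height in the 0..1 range, and the
view size is given in pixels. Box stands in for android.graphics.RectF so the
example runs as plain Kotlin; Box and toClampedBox are hypothetical names.
```kotlin
// Stand-in for RectF so the sketch has no Android dependency.
data class Box(val left: Float, val top: Float, val right: Float, val bottom: Float)
fun toClampedBox(
    xCenter: Float, yCenter: Float, width: Float, height: Float,
    viewWidth: Int, viewHeight: Int
): Box {
    // Step 1: convert the normalized center and size to pixel edges.
    val xMin = (xCenter - width / 2f) * viewWidth
    val yMin = (yCenter - height / 2f) * viewHeight
    val xMax = (xCenter + width / 2f) * viewWidth
    val yMax = (yCenter + height / 2f) * viewHeight
    // Step 2: clamp each edge so the box stays inside the view.
    return Box(
        xMin.coerceAtLeast(0f),
        yMin.coerceAtLeast(0f),
        xMax.coerceAtMost(viewWidth.toFloat()),
        yMax.coerceAtMost(viewHeight.toFloat())
    )
}
fun main() {
    // A box hanging off the left edge has its left side clamped to 0.
    println(toClampedBox(0.1f, 0.5f, 0.4f, 0.5f, 1000, 2000))
}
```
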