WEBVTT

00:00.080 --> 00:01.120
Welcome back.

00:01.160 --> 00:10.800
Whenever you change the object detection model, you get different input and output

00:10.800 --> 00:11.520
shapes.

00:11.760 --> 00:16.080
You should update the object detector class.

00:16.360 --> 00:19.520
So this is the most important class.

00:19.520 --> 00:22.040
This is the core of your project.

00:22.080 --> 00:31.440
This is the core of your application, and you should update it whenever you have a different model.

00:31.480 --> 00:41.200
Remember, we need these input and output shapes. The input is float32 [1, 640, 640, 3].

00:41.520 --> 00:46.680
The output is float32 [1, 5, 8400].

00:46.680 --> 00:54.680
For those, we're going to update the variables inside the companion object.

00:54.720 --> 01:07.680
SSD MobileNet expects a 300 by 300 input, while YOLO (let me write "YOLO shape") expects 640 by

01:07.800 --> 01:09.320
640.

01:09.360 --> 01:14.520
Okay, so remove this and write 640.

01:14.560 --> 01:17.200
The maximum detection number is ten.

01:17.200 --> 01:20.080
That is the default max results value.

01:20.200 --> 01:21.480
Uh, default.

01:21.480 --> 01:24.880
You can set the default max results to ten, for example.

01:25.080 --> 01:29.720
And you can add the other parameters.

01:29.720 --> 01:36.480
Private const val output size equals 8400.

01:36.520 --> 01:44.480
Private const val number of features equals 84.

01:44.640 --> 01:47.160
Confidence index equals to.

01:47.200 --> 01:53.320
The class probability start index equals five.

01:53.360 --> 01:55.280
This corresponds to,

01:55.320 --> 02:00.480
this number of features corresponds to this number here.

02:00.480 --> 02:01.640
So it's five.

02:01.680 --> 02:03.840
We are detecting only one class.

02:04.000 --> 02:08.920
And the number of output values per detection is five.

02:08.920 --> 02:10.720
So this is five.

02:11.000 --> 02:14.400
We set the number of features equal to five.

02:14.600 --> 02:18.090
The output size is correct: 8400.

02:18.130 --> 02:25.450
And there is no need for this constant, because we have only one class to detect.
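
NOTE
Editor's sketch of the constants discussed above, not the code shown on screen.
The names YoloConfig, IMAGE_SIZE, DEFAULT_MAX_RESULTS, OUTPUT_SIZE and NUM_FEATURES are assumptions;
the video only speaks them aloud and keeps them in the class's companion object.
```kotlin
// Hypothetical holder; in the video these live in a companion object.
object YoloConfig {
    const val IMAGE_SIZE = 640          // model input is float32 [1, 640, 640, 3]
    const val DEFAULT_MAX_RESULTS = 10  // cap on reported detections
    const val OUTPUT_SIZE = 8400        // last dimension of the [1, 5, 8400] output
    const val NUM_FEATURES = 5          // values per detection for this one-class model
}
fun main() {
    // Total float32 values the model writes per frame: 1 * 5 * 8400.
    val outputElements = 1 * YoloConfig.NUM_FEATURES * YoloConfig.OUTPUT_SIZE
    println(outputElements) // prints 42000
}
```
If the model changes, only these values should need to change, provided the new model keeps the same output layout.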

02:25.490 --> 02:28.490
Okay, now let's create the output.

02:28.610 --> 02:33.490
So here we have the output classes, output scores, and output detections.

02:33.490 --> 02:34.850
We're going to remove them.

02:34.970 --> 02:41.890
We're going to create only one output buffer, matching that [1, 5, 8400] shape.

02:41.930 --> 02:48.130
For that, let me create val output equals Array.

02:48.490 --> 02:52.490
And here, an Array of number of features.

02:52.690 --> 02:57.530
And inside it, a FloatArray of the output size.

02:57.530 --> 03:01.690
Here the number of features is five.

03:01.730 --> 03:03.570
It's an array of five.

03:03.930 --> 03:09.890
And it holds a FloatArray of output size, which is 8400.

03:09.930 --> 03:12.770
Here we created an array of size one.

03:12.970 --> 03:17.170
Inside it, we created an array of number of features, which is five.

03:17.490 --> 03:27.290
Then we created a FloatArray, which holds float32 values, of output size, which is 8400.

03:27.330 --> 03:27.810
Okay.

03:27.970 --> 03:31.890
So an array inside an array inside an array.

03:32.250 --> 03:36.530
This is the output buffer matching the model output.

03:36.530 --> 03:38.850
This is very, very important.
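
NOTE
Editor's sketch of the nested output array described above: an array of one, holding an array of five, holding FloatArrays of 8400.
The constant names and the name outputBuffer are assumptions; the narration calls it "output" and later suggests "output buffer".
```kotlin
const val NUM_FEATURES = 5
const val OUTPUT_SIZE = 8400
// Shape [1, 5, 8400]: one batch, five feature rows, 8400 float32 values per row.
val outputBuffer = Array(1) { Array(NUM_FEATURES) { FloatArray(OUTPUT_SIZE) } }
fun main() {
    // prints 1 x 5 x 8400
    println("${outputBuffer.size} x ${outputBuffer[0].size} x ${outputBuffer[0][0].size}")
}
```
Every element starts at 0.0f, and the nesting has to mirror the model's output shape exactly.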

03:38.930 --> 03:47.010
Now let's create the input buffer: private val input buffer.

03:47.130 --> 03:54.690
Remember that the input buffer is [1, 640, 640, 3].

03:54.730 --> 04:06.130
For that, you use TensorBuffer.createFixedSize, and inside it, start with intArrayOf one.

04:06.290 --> 04:08.890
Then image size, matching this.

04:08.930 --> 04:12.690
That is the variable that we created before, 640.

04:12.810 --> 04:18.010
Image size again, 640, and three, which is the last one.

04:18.130 --> 04:24.690
Remember, we need float32 [1, 640, 640, 3].

04:24.730 --> 04:27.370
Three is the color channels, RGB.

04:27.570 --> 04:32.420
What is the data type of our input buffer?

04:32.420 --> 04:35.140
It's float32.

04:35.180 --> 04:38.380
So DataType.FLOAT32.
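
NOTE
Editor's sketch of the input buffer described above, assuming the TensorFlow Lite Support Library is on the classpath.
IMAGE_SIZE and createInputBuffer are hypothetical names; the video declares the buffer as a private property instead.
```kotlin
import org.tensorflow.lite.DataType
import org.tensorflow.lite.support.tensorbuffer.TensorBuffer
const val IMAGE_SIZE = 640
// Allocates a float32 tensor buffer shaped [1, 640, 640, 3] (batch, height, width, RGB channels).
fun createInputBuffer(): TensorBuffer =
    TensorBuffer.createFixedSize(intArrayOf(1, IMAGE_SIZE, IMAGE_SIZE, 3), DataType.FLOAT32)
```
When the model runs, Interpreter.run may need the underlying ByteBuffer (the buffer property of TensorBuffer) rather than the TensorBuffer itself; that detail is an assumption, not something the video states.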

04:38.620 --> 04:42.740
Let's use the input buffer and output buffer.

04:42.860 --> 04:44.780
Scroll down here.

04:44.780 --> 04:47.420
We don't need all of those.

04:47.580 --> 04:50.460
Let me scroll down to analyze.

04:50.580 --> 04:59.580
We have the background scope launch, the detections, and the listener. Scroll down here to detect: prepare

04:59.580 --> 05:06.500
image for model, image proxy, and interpreter.runForMultipleInputsOutputs.

05:06.660 --> 05:18.940
We need to remove it because it's not our case. Here, use interpreter.run: pass the input, which is

05:19.060 --> 05:25.940
the input buffer, and pass the output, which is output, not the output map.

05:25.940 --> 05:28.020
It's called the output.

05:28.020 --> 05:31.660
You see, guys, these are the objects that we created.

05:31.700 --> 05:33.980
The output and input.

05:34.100 --> 05:40.580
Scroll up: the input buffer and the output.

05:40.580 --> 05:44.500
You can name it output buffer in order to distinguish it.

05:44.540 --> 05:48.140
Okay, so let me use this output buffer.

05:48.180 --> 05:54.060
Scroll down here inside detect, and use output buffer.

05:54.140 --> 05:56.700
Now let's parse the detections.

05:56.700 --> 06:02.140
You can return parse detection result, which is a list of detection objects.

06:02.260 --> 06:05.460
So here parse the detections.

06:05.460 --> 06:07.220
Parse detections.

06:07.220 --> 06:15.940
Also, we can create a new function to distinguish it from parse detection result, because it's completely

06:15.940 --> 06:16.740
different.

06:16.740 --> 06:25.140
So I'll comment it out and write parse YOLO detections.

06:25.180 --> 06:28.300
This is the new function we need to create.

06:28.300 --> 06:33.100
So here, private fun parse YOLO detections.

06:33.140 --> 06:37.300
It returns a list of detection object.

06:37.300 --> 06:40.380
And in the next video we'll continue with this function.
