WEBVTT

00:00.080 --> 00:01.040
Welcome back.

00:01.080 --> 00:03.320
I want you to focus with me.

00:03.320 --> 00:05.440
This is a very important step.

00:05.600 --> 00:10.880
We need to understand the input and output of our model.

00:11.040 --> 00:15.520
For that, let's go to a website called Netron.

00:15.520 --> 00:19.080
Go to netron.app and upload your model.

00:19.080 --> 00:24.160
And here we can see the hierarchy of our model.

00:24.160 --> 00:31.040
You are not required to understand all of these.

00:31.320 --> 00:43.120
We need to start with the first one and understand that it takes an image 300 pixels wide and 300 pixels high,

00:43.280 --> 00:44.760
times three color channels.

00:45.000 --> 00:48.160
This is the input shape.

00:48.440 --> 00:56.200
If we click on this normalized_input_image_tensor node, we see the model's inputs and outputs.

00:56.520 --> 01:00.280
The input is normalized_input_image_tensor.

01:00.320 --> 01:04.930
We need to provide it as uint8, an unsigned 8-bit integer,

01:05.050 --> 01:12.010
with the shape [1, 300, 300, 3].

01:12.010 --> 01:15.130
Keep this in mind.

01:15.290 --> 01:21.690
The first output is TFLite_Detection_PostProcess.

01:21.770 --> 01:23.650
Its type is float32.

01:23.970 --> 01:32.850
The other outputs are TFLite_Detection_PostProcess:1, TFLite_Detection_PostProcess:2, and TFLite_Detection_PostProcess:3.

01:33.010 --> 01:44.410
All of these are of type float32, while the input is uint8,

01:44.570 --> 01:50.130
with the shape [1, 300, 300, 3].
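
NOTE
A minimal Python sketch that uses only the values read off Netron above. It records the input
spec (name, dtype, shape) and computes how many bytes one input buffer needs. The names
INPUT_SHAPE, BYTES_PER_ELEMENT and input_num_bytes are hypothetical and serve only as illustration.
```python
import math
# Values as shown in Netron for this model; the variable names are illustrative.
INPUT_NAME = "normalized_input_image_tensor"
INPUT_DTYPE = "uint8"
INPUT_SHAPE = (1, 300, 300, 3)  # [batch, height, width, channels]
BYTES_PER_ELEMENT = {"uint8": 1, "float32": 4}
def input_num_bytes(shape=INPUT_SHAPE, dtype=INPUT_DTYPE):
    """Number of bytes needed to hold one tensor of this shape and dtype."""
    return math.prod(shape) * BYTES_PER_ELEMENT[dtype]
print(input_num_bytes())                 # 270000 bytes as uint8
print(input_num_bytes(dtype="float32"))  # 1080000 bytes if it were float32
```
As float32, the same 1 x 300 x 300 x 3 shape would take four times as much memory. A buffer sized for the wrong dtype therefore does not match what the model expects.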

01:50.170 --> 01:52.730
Now let's understand the input.

01:52.850 --> 02:01.090
This is the shape and data type of the data your model expects to receive.

02:01.210 --> 02:07.740
Copy it, and let me move back to Android Studio and write the notes here.

02:07.780 --> 02:08.100
Okay.

02:08.140 --> 02:12.900
I want you to write them down, because these are very important notes.

02:13.020 --> 02:17.140
The input is uint8, and the u does not stand for unified.

02:17.140 --> 02:25.180
It stands for unsigned: uint8 is an unsigned 8-bit integer.

02:25.340 --> 02:27.860
These values are never negative.

02:27.860 --> 02:34.260
So think of each value as a number between 0 and 255.

02:34.300 --> 02:38.540
So this is a non-negative 8-bit integer.

02:38.580 --> 02:46.980
8-bit means it uses 8 bits, one byte, of memory per number, allowing 2 to the power of 8, which equals

02:46.980 --> 02:49.140
256

02:49.140 --> 02:50.740
possible values.

02:51.060 --> 02:55.260
Here, those values run from 0 to 255.

02:55.580 --> 03:01.820
This data type is extremely common for image data, because the standard image pixel values range from

03:01.820 --> 03:10.350
0 to 255, where 0 typically represents the absence of color (black) and 255 represents

03:10.350 --> 03:14.510
the full intensity of a channel: white when all three channels are at 255, or a pure primary color when only one is.

03:14.590 --> 03:25.710
In short, your model expects pixel values as integers between 0 and 255.
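
NOTE
A small standard-library sketch of the uint8 range: 2^8 = 256 values, from 0 to 255. The helper
fits_uint8 is hypothetical. The array module's "B" typecode is real. It stores unsigned chars
(one byte each) and rejects values outside 0 to 255.
```python
from array import array
BITS = 8
NUM_VALUES = 2 ** BITS              # 256 distinct values
MIN_U8, MAX_U8 = 0, NUM_VALUES - 1  # 0 (black) to 255 (full intensity)
def fits_uint8(value):
    """True if value is an integer that an unsigned 8-bit integer can store."""
    return isinstance(value, int) and MIN_U8 <= value <= MAX_U8
pixels = array("B", [0, 128, 255])  # typecode "B" = unsigned char, one byte per value
print(NUM_VALUES, pixels.itemsize)
try:
    array("B", [256])
except OverflowError:
    print("256 does not fit in uint8")
```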

03:25.750 --> 03:30.590
Now let's take this array, copy it and paste it here.

03:30.710 --> 03:32.230
The first dimension.

03:32.430 --> 03:34.310
This is the batch size.

03:34.550 --> 03:43.470
Deep learning models are often designed to process multiple inputs simultaneously for efficiency.

03:43.670 --> 03:49.550
This dimension tells us how many images are in a single batch.

03:49.670 --> 03:57.990
A one here means the model is set up to process one image at a time.

03:58.150 --> 04:00.510
Then 300 and 300.

04:00.830 --> 04:10.410
The first 300 is the height and the second is the width; this height-then-width order is the default layout in deep learning frameworks

04:10.410 --> 04:11.610
like TensorFlow.

04:11.810 --> 04:17.810
These represent the height and width of the image in pixels.

04:17.930 --> 04:21.370
And the last dimension is three.

04:21.570 --> 04:25.890
This represents the color channels of the image.

04:26.050 --> 04:28.930
It's a standard RGB image.

04:29.090 --> 04:36.930
Red values in channel one, green values in channel two, and blue values in channel three.

04:37.170 --> 04:44.090
So for a single pixel, the model sees three numbers: R, G, B.

04:44.610 --> 04:50.730
So this last number represents the R, G, B color channels of the image.

04:50.850 --> 04:51.850
So this uint8 tensor

04:51.890 --> 05:03.570
describes a batch containing a single 300 by 300 pixel RGB color image.
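
NOTE
A sketch of where one value lives inside this [1, 300, 300, 3] tensor. It assumes the tensor is
stored as one dense, row-major block of 270000 bytes. The function nhwc_offset is a hypothetical
helper, not a TensorFlow Lite API.
```python
def nhwc_offset(b, y, x, c, height=300, width=300, channels=3):
    """Index of element [b][y][x][c] in a flat, row-major [batch, height, width, channels] buffer."""
    return ((b * height + y) * width + x) * channels + c
print(nhwc_offset(0, 0, 0, 0))      # red value of the top-left pixel
print(nhwc_offset(0, 0, 1, 0))      # red value of the next pixel in the same row
print(nhwc_offset(0, 1, 0, 0))      # first pixel of the second row (300 pixels * 3 channels in)
print(nhwc_offset(0, 299, 299, 2))  # blue value of the bottom-right pixel, the last byte
```
The channel index changes fastest, then the column, then the row. This is why consecutive bytes read R, G, B, R, G, B along each row.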

05:03.610 --> 05:13.300
You can think of it as a stack of three matrices, each 300 by 300,

05:13.500 --> 05:22.620
one for each color channel (red, green, and blue), all wrapped up in a single container:

05:22.780 --> 05:23.660
the batch.

05:23.820 --> 05:26.140
So this is for image one.

05:26.460 --> 05:31.460
This is a batch of one image, 300 pixels high.

05:31.620 --> 05:41.260
The first row is RGB, RGB, RGB, and so on until the end of the row; row number two is the same.

05:41.580 --> 05:50.300
Each image is a matrix, which is an array of arrays: every row holds 300 RGB pixels across the width, and the rows repeat

05:50.340 --> 05:53.060
until reaching row number 300.
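
NOTE
To make the "rows of RGB, RGB, RGB" picture concrete, here is a sketch that uses a hypothetical
2 x 2 image in place of 300 x 300. The helpers make_batch, shape_of and flatten are illustrative
names. A real input would have 300 rows of 300 pixels each.
```python
def make_batch(image):
    """Wrap one image (a list of rows, each a list of [R, G, B] pixels) in a batch of size 1."""
    return [image]
def shape_of(batch):
    """Shape as (batch, height, width, channels), read from the nesting depth."""
    return (len(batch), len(batch[0]), len(batch[0][0]), len(batch[0][0][0]))
def flatten(batch):
    """Flatten in storage order: image, then row, then pixel, then channel."""
    return [v for image in batch for row in image for pixel in row for v in pixel]
tiny = [
    [[255, 0, 0], [0, 255, 0]],      # row 1: red pixel, green pixel
    [[0, 0, 255], [255, 255, 255]],  # row 2: blue pixel, white pixel
]
batch = make_batch(tiny)
print(shape_of(batch))
print(flatten(batch))
```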

05:53.100 --> 05:53.620
Okay.

05:53.860 --> 06:02.980
So when you use a model with this input specification, you must preprocess your image to match

06:02.980 --> 06:04.180
it exactly.

06:04.300 --> 06:08.030
Resize your image to 300 by 300 pixels.

06:08.030 --> 06:09.550
This is what we're going to do.
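
NOTE
On Android, this resize step is usually a library call such as Bitmap.createScaledBitmap. The
pure-Python sketch below only illustrates what resizing to 300 x 300 does. It uses nearest-neighbor
sampling, the simplest technique, swapped in here for clarity. resize_nearest is a hypothetical helper.
```python
def resize_nearest(image, out_h=300, out_w=300):
    """Nearest-neighbor resize of image[row][col] to out_h rows by out_w columns.
    The aspect ratio is not preserved: the image is stretched to fit."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]
src = [[[10, 20, 30]] * 4, [[40, 50, 60]] * 4]  # 2 rows x 4 columns of RGB pixels
dst = resize_nearest(src)
print(len(dst), len(dst[0]))
```
Library resizers typically use smoother interpolation, such as bilinear. Either way, the output must end up exactly 300 x 300.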

06:09.590 --> 06:20.470
Ensure it is in RGB format, not BGR (which OpenCV uses by default) or grayscale.
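
NOTE
A sketch of the two conversions mentioned above, written on plain nested lists. In OpenCV,
cv2.cvtColor(img, cv2.COLOR_BGR2RGB) performs the first one. The helper names bgr_to_rgb and
gray_to_rgb are hypothetical.
```python
def bgr_to_rgb(image):
    """Reverse the channel order of every pixel: [B, G, R] becomes [R, G, B]."""
    return [[[px[2], px[1], px[0]] for px in row] for row in image]
def gray_to_rgb(image):
    """Expand single-channel gray values into three equal channels."""
    return [[[v, v, v] for v in row] for row in image]
print(bgr_to_rgb([[[255, 0, 0]]]))  # pure blue stored as BGR, now in RGB order
print(gray_to_rgb([[128]]))
```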

06:20.750 --> 06:28.470
Do not normalize the pixel values to the range 0 to 1 or -1 to 1.

06:28.670 --> 06:34.270
Leave them as integers in the 0 to 255

06:34.470 --> 06:40.510
range, because the data type is uint8.

06:40.670 --> 06:51.190
Some models have a normalization step built in; either way, this specific model expects the raw

06:51.590 --> 06:54.630
uint8 values.
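
NOTE
A sketch contrasting what this model wants (raw bytes) with two normalizations that other models
commonly expect. All three helper names are hypothetical. Only to_uint8_input matches this model's
uint8 input.
```python
def to_uint8_input(pixels):
    """Pack raw channel values (0..255) into bytes as-is, with no scaling."""
    return bytes(pixels)  # raises ValueError outside 0..255, TypeError for floats
def normalize_0_1(pixels):
    """Scaling some OTHER models expect; not used for this uint8 model."""
    return [v / 255.0 for v in pixels]
def normalize_minus1_1(pixels):
    """Another common scaling for other models; also not used here."""
    return [v / 127.5 - 1.0 for v in pixels]
raw = [0, 128, 255]
print(to_uint8_input(raw))  # b'\x00\x80\xff'
print(normalize_0_1(raw))   # floats: wrong dtype and range for this model
```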

06:54.710 --> 06:57.590
This is a very important lesson.

06:57.750 --> 07:04.150
Pay attention to it whenever you work with different models, which come with different combinations of

07:04.150 --> 07:07.040
input and output shapes.

07:07.040 --> 07:08.280
The output type is

07:08.320 --> 07:09.920
float32.

07:10.080 --> 07:22.000
Float32 provides much higher numerical precision than unsigned 8-bit

07:22.040 --> 07:22.520
integers.
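
NOTE
A sketch of the precision difference using only the standard library. It round-trips a value
through a 32-bit float with struct and compares that with the closest uint8. The value 0.8734 is
arbitrary, and as_float32 and as_uint8 are hypothetical helpers.
```python
import struct
def as_float32(x):
    """Round a Python float (64-bit) to the nearest float32 and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]
def as_uint8(x):
    """Nearest uint8 value: rounded and clamped to 0..255."""
    return max(0, min(255, round(x)))
value = 0.8734
print(as_float32(value))  # very close to 0.8734 (float32 keeps about 7 significant digits)
print(as_uint8(value))    # 1: the fractional part is lost
```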

07:22.760 --> 07:33.760
So the results are the four TFLite_Detection_PostProcess outputs, all of type float32.

07:33.800 --> 07:40.760
Training and inference require fine-grained values for accurate calculations.

07:41.000 --> 07:42.600
Deep learning operations,

07:42.600 --> 07:44.160
such as matrix

07:44.640 --> 07:51.560
multiplications and activations, work much better with floating-point numbers.
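
NOTE
A sketch of why uint8 cannot hold intermediate results. A matrix-vector product on one pixel
produces sums far above 255, and an activation such as sigmoid produces fractions. The weights are
made up for illustration, and matvec and sigmoid are hypothetical helpers, not a framework API.
```python
import math
def matvec(matrix, vector):
    """Matrix-vector product, the core operation inside dense and convolution layers."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]
def sigmoid(z):
    """A common activation function; its output is a fraction between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))
pixel = [200, 100, 50]           # one RGB pixel as uint8 values
weights = [[0.01, -0.02, 0.03],  # hypothetical weights for two output units
           [2.0, 2.0, 2.0]]
z = matvec(weights, pixel)       # about [1.5, 700.0]
print([sigmoid(v) for v in z])   # fractions, not representable as uint8
print(int(z[1]) % 256)           # 188: what 700 would wrap to if stored in uint8
```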

07:51.600 --> 08:00.120
Many models internally normalize pixel values to range from 0 to 1 or -1 to 1, but in this example,

08:00.120 --> 08:02.640
we're not normalizing anything.

08:02.760 --> 08:03.240
Okay.