WEBVTT

00:00.540 --> 00:01.740
Hello, my name is Typhoon.

00:01.740 --> 00:06.660
And now let's delve into the fascinating process of disassembling a binary.

00:06.840 --> 00:12.810
Having witnessed how a binary is compiled, it's time to explore the contents of the object file generated

00:12.810 --> 00:15.430
during the assembly phase of compilation.

00:15.450 --> 00:21.810
Subsequently, I will guide you through the disassembly of the main binary executable, highlighting

00:21.810 --> 00:26.610
the distinctions between its contents and those of the object file.

00:26.640 --> 00:33.540
So this will provide you with a clearer understanding of what resides within an object file and what

00:33.540 --> 00:35.700
is added during the linking phase.

00:35.730 --> 00:41.220
To facilitate our disassembly journey, we will use the objdump

00:41.250 --> 00:41.930
utility.

00:41.940 --> 00:48.040
It's a simple and user-friendly disassembler that comes bundled with most Linux distributions.

00:48.060 --> 00:57.210
For now, we will rely on objdump to gain quick insight into the code and data encapsulated

00:57.210 --> 00:59.040
within a binary.

00:59.810 --> 01:08.840
What we're going to do first is run objdump here rather than readelf again.

01:09.170 --> 01:11.090
objdump

01:12.520 --> 01:19.900
-sj .rodata, and after that we will enter our object file.

01:19.900 --> 01:21.790
In this case, myapp.o.

01:24.630 --> 01:25.900
And here we will

01:26.670 --> 01:27.390
press Enter.

01:27.390 --> 01:28.440
And that's it.

01:28.440 --> 01:32.760
So here we have this output.

01:33.460 --> 01:43.720
Now let's try another command: objdump with an uppercase -M. Let's actually write

01:43.720 --> 01:44.230
it again.

01:44.230 --> 01:46.930
So objdump, uppercase

01:46.930 --> 01:48.580
-M intel, then -d here.

01:48.580 --> 01:54.340
And after that we will again pass the object file, and that's it.

01:54.340 --> 01:59.860
So, if you look carefully here,

02:01.120 --> 02:02.110
you will see

02:02.320 --> 02:06.550
that I called objdump twice.

02:06.790 --> 02:08.080
First here.

02:09.110 --> 02:17.210
I tell objdump to show the contents of the .rodata section.

02:17.210 --> 02:23.990
This stands for read-only data, and it's the part of the binary where all constants are stored.

02:23.990 --> 02:26.050
That includes this Hello world string here.

02:26.060 --> 02:26.320
Right?

02:26.330 --> 02:29.150
So we defined this.

02:29.150 --> 02:29.820
Hello world.

02:29.840 --> 02:32.420
Let me actually find a C file here.

02:32.420 --> 02:34.040
So we define this.

02:34.040 --> 02:36.350
Hello world as a constant, right?

02:36.350 --> 02:38.900
We define this macro here.

02:39.020 --> 02:46.490
So this is the read-only data section of our object file.

02:46.490 --> 02:48.320
And here.

02:49.310 --> 02:56.510
I will return to a more detailed discussion of .rodata and the other sections of ELF binaries

02:56.510 --> 03:01.630
in the next lectures, in which we will also learn about the ELF binary format.

03:01.640 --> 03:10.580
For now, you can see that the contents of .rodata consist of ASCII-encoded bytes.

03:17.450 --> 03:20.150
Here we can see the ASCII encoding of the string.

03:21.380 --> 03:24.500
That is the left side of the output.

03:24.830 --> 03:31.400
On the right side, you can see the human-readable representation of those

03:31.400 --> 03:32.490
same bytes.
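
To make the two-column layout concrete, here is a minimal Python sketch that prints bytes in a similar style: hex on the left, printable characters on the right. It is only an illustration, not objdump's actual implementation, and the sample string (including the comma) is an assumed stand-in for the constant in the lecture's source file.

```python
def hexdump(data: bytes, base: int = 0, width: int = 16) -> list:
    """Format bytes as: offset, hex bytes (left), printable ASCII (right)."""
    lines = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes (such as the NUL terminator) are shown as '.'
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{base + start:04x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return lines


# Assumed stand-in for the string constant, with its NUL terminator.
sample = b"Hello,world!\x00"
for line in hexdump(sample):
    print(line)
```

The left column is what is really stored in the section; the right column is just a convenience view of the same bytes.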

03:32.510 --> 03:37.190
Now, looking again, we have this comma between Hello and world here.

03:37.190 --> 03:42.380
I don't know why, but it's probably another typo.

03:42.960 --> 03:43.650
And.

03:45.010 --> 03:52.510
So we are seeing this comma between Hello and world because we defined the string like that.

03:53.030 --> 03:57.430
I typed a comma instead of a space here.

03:57.430 --> 03:58.540
Sorry for that.

03:59.860 --> 04:01.480
And here.

04:01.750 --> 04:02.590
That's it.

04:02.740 --> 04:03.550
So.

04:05.180 --> 04:12.980
You may wonder why the call that should reference puts seems to point into the middle of the main function.

04:13.130 --> 04:21.410
As I mentioned before, data and code references in object files are not yet fully resolved during

04:21.410 --> 04:27.380
compilation, because the compiler does not know the base address at which the file will eventually

04:27.380 --> 04:28.240
be loaded.

04:28.250 --> 04:33.260
Consequently, the call to puts in the object file remains unresolved.

04:33.290 --> 04:39.680
It awaits the linker to fill in the correct value for this reference.
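
A small sketch of why an unresolved call seems to point into the middle of main: on x86-64, a near relative call jumps to the address of the next instruction plus its 4-byte operand, so while the operand is still zero the "target" is simply the instruction right after the call. The call offset used below is hypothetical, chosen only for illustration.

```python
CALL_LEN = 5  # near relative call: 1 opcode byte (0xE8) + 4-byte operand


def call_target(call_offset: int, rel32: int) -> int:
    """Target of a near relative call = offset of the next instruction + rel32."""
    return call_offset + CALL_LEN + rel32


# Hypothetical offset of the call inside main (an assumption, not from the lecture).
call_offset = 0x14

# With the operand still zero, the call "targets" the instruction after it.
print(hex(call_target(call_offset, 0)))
```

Once the linker fills in the real operand, the same formula yields the actual destination.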

04:39.680 --> 04:49.580
You can verify this with readelf, which shows all the relocation symbols present in the

04:49.580 --> 04:50.660
object file.

04:52.120 --> 04:52.840
And.

04:54.070 --> 04:54.910
Here.

04:56.260 --> 04:58.390
In the second call here,

04:58.890 --> 04:59.260
Um.

05:00.480 --> 05:03.720
this objdump command

05:03.720 --> 05:09.240
disassembles all the code in the object file in Intel syntax, as we specified here.

05:09.720 --> 05:14.400
As you can see, it contains only the code of the main function.

05:15.710 --> 05:17.750
All right, so.

05:19.220 --> 05:23.150
This is because that's the only function defined in the source file.

05:23.150 --> 05:23.510
Right?

05:23.510 --> 05:25.600
So let's try that again here.

05:25.610 --> 05:26.330
As you can see.

05:26.360 --> 05:26.960
Oops, not.

05:41.110 --> 05:48.370
As you can see here, int main is the only function in our program, and that's why

05:48.370 --> 05:49.690
we are seeing

05:51.290 --> 05:52.610
only the main function here.

05:54.720 --> 06:01.020
So for the most part, the output conforms pretty closely to the assembly code previously produced by

06:01.020 --> 06:02.490
the compilation phase.

06:02.730 --> 06:05.210
Give or take a few assembly-level macros.

06:06.300 --> 06:15.450
What's interesting to note is the pointer to the Hello world string here.

06:16.700 --> 06:17.930
edi here.

06:21.120 --> 06:22.560
This call here.

06:24.640 --> 06:27.010
And we also have this ad.

06:29.410 --> 06:29.940
R d.

06:29.950 --> 06:30.790
A r.

06:30.820 --> 06:31.970
A x.

06:33.010 --> 06:33.700
Here.

06:35.430 --> 06:38.490
This here is set to zero.

06:38.490 --> 06:46.050
And this call, which should print the string to the screen using puts,

06:56.120 --> 07:02.000
also points to a nonsensical location here, as you can see.

07:02.090 --> 07:03.770
So but.

07:05.360 --> 07:07.880
We have the offset of.

07:08.870 --> 07:11.750
I think there's one here.

07:12.500 --> 07:12.980
So.

07:15.380 --> 07:16.640
This prints too.

07:16.670 --> 07:20.810
This shows some nonsensical location to us.

07:21.540 --> 07:30.300
So why does the call that should reference puts point instead into the middle of main? Well, I

07:30.300 --> 07:36.120
previously mentioned that the data and code references from object files are not yet fully resolved

07:36.120 --> 07:42.690
because the compiler doesn't know at what base address the file will eventually be loaded.

07:42.690 --> 07:48.780
That's why the call to puts waits for the linker to fill in the correct value for this reference.

07:49.720 --> 07:54.460
And here we will use readelf.

07:55.880 --> 07:57.200
readelf

07:57.200 --> 07:58.640
--relocs here.

08:05.740 --> 08:10.540
--relocs, and we will again use the myapp

08:11.450 --> 08:12.070
.o file.

08:14.840 --> 08:15.740
And that's it.

08:15.740 --> 08:16.780
This is our output.

08:16.790 --> 08:19.970
Using readelf here, we passed the --relocs parameter.

08:19.970 --> 08:21.980
So the relocation.

08:21.980 --> 08:24.680
This relocation symbol here

08:26.050 --> 08:32.320
tells the linker that it should resolve the reference to the string to point to whatever address it ends

08:32.320 --> 08:35.560
up at in the .rodata

08:36.820 --> 08:37.330
section.

08:37.330 --> 08:43.630
Similarly, the second line, at this offset here,

08:43.810 --> 08:45.670
1A1.

08:46.830 --> 08:47.640
A one.

08:47.670 --> 08:48.480
A1E.

08:48.510 --> 08:49.170
Here.

08:50.080 --> 08:50.830
So.

08:51.530 --> 08:52.340
Here.

08:52.340 --> 08:53.480
Similarly,

08:54.290 --> 08:59.450
this line tells the linker how to resolve the call to puts.

08:59.540 --> 08:59.990
Right.

08:59.990 --> 09:05.000
You may notice the values here:

09:06.010 --> 09:10.840
the first one is zero, and the second one is four, being subtracted

09:12.070 --> 09:13.810
from the puts symbol.

09:14.630 --> 09:16.640
So you can ignore that for now.

09:16.670 --> 09:23.590
The way the linker computes relocations is a bit involved, and the readelf output can be confusing

09:23.600 --> 09:24.470
in most cases.

09:24.470 --> 09:31.310
So I will just gloss over the details of relocation here and focus on the bigger picture of disassembling

09:32.360 --> 09:33.900
a binary instead.

09:33.920 --> 09:39.920
I will provide more information about relocation symbols in the next sections of our course.

09:41.450 --> 09:47.600
The leftmost column of each line in the output

09:48.770 --> 09:51.900
is the offset in the object file

09:52.940 --> 09:55.440
where the resolved reference must be filled in.

09:55.460 --> 10:02.150
So if you are paying close attention, you may notice that in both cases it's equal to the offset of

10:02.150 --> 10:05.850
the instruction that needs to be fixed, plus one.

10:05.870 --> 10:11.420
For instance, the call to puts is at this code offset.

10:11.420 --> 10:14.120
Here is one.

10:14.940 --> 10:17.790
A1A here, right?

10:20.160 --> 10:25.020
And this is because you only want to overwrite the operand of the instruction,

10:25.020 --> 10:28.350
not the opcode of the instruction itself.

10:28.350 --> 10:34.530
It just so happens that for both instructions that need fixing up, the opcode is one byte long.

10:34.530 --> 10:37.620
So, to point to the instruction operand,

10:37.950 --> 10:42.150
the relocation symbol needs to skip past the opcode byte.
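
As a rough sketch of that "offset plus one" idea, the Python below patches only the 4-byte operand of a call and leaves the opcode byte alone. It assumes a PC-relative relocation computed as symbol + addend - place, with an addend of -4 as seen in the readelf output. The code bytes, the call offset, and the address used for puts are all made up for illustration, and everything is treated as living in one flat address space starting at zero.

```python
import struct

OPCODE_LEN = 1  # both instructions discussed here have a one-byte opcode


def apply_pc32(code: bytearray, reloc_offset: int, symbol: int, addend: int) -> None:
    """Overwrite the 4-byte operand at reloc_offset with symbol + addend - place."""
    value = symbol + addend - reloc_offset  # place = address of the operand itself
    code[reloc_offset:reloc_offset + 4] = struct.pack("<i", value)


# Hypothetical code: 0x14 filler bytes, then a call (0xE8) with a zero operand.
code = bytearray(b"\x90" * 0x14 + b"\xe8\x00\x00\x00\x00")
call_offset = 0x14
reloc_offset = call_offset + OPCODE_LEN  # skip past the opcode byte

# 0x40 is a made-up address standing in for wherever puts ends up.
apply_pc32(code, reloc_offset, symbol=0x40, addend=-4)

rel32 = struct.unpack_from("<i", code, reloc_offset)[0]
print(hex(code[call_offset]))        # opcode byte, left untouched
print(hex(call_offset + 5 + rel32))  # where the patched call now lands
```

The -4 compensates for the fact that the processor measures the jump from the end of the 4-byte operand, while the relocation is applied at its start.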
