WEBVTT

00:00.480 --> 00:06.240
Once the pre-processing phase concludes, the source code is prepared for compilation.

00:06.270 --> 00:12.600
During the compilation phase, the preprocessed code is translated into assembly language.

00:12.600 --> 00:18.540
It's worth noting that compilers often perform significant optimization in this

00:18.540 --> 00:31.770
phase. In GCC, the optimization level can be adjusted using the options -O0 through

00:31.800 --> 00:33.120
-O3.

00:33.450 --> 00:36.750
So, optimization levels 0 to 3 in GCC.

00:36.870 --> 00:44.190
The optimization level you choose can greatly affect the resulting disassembly, as you will explore

00:44.190 --> 00:47.070
in detail in later sections.
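
NOTE A minimal sketch of how the optimization level can be compared in practice. The file name opt_demo.c and the function sum_to are hypothetical and do not come from the lecture. Assuming GCC on an x86 machine, compiling this file once with -O0 and once with -O3 (adding -S to keep the assembly) should typically give noticeably different assembly for the same function.
```c
/* Hypothetical opt_demo.c: an illustrative function, not from the lecture. */
/* Compare the output of: gcc -O0 -S opt_demo.c  and  gcc -O3 -S opt_demo.c */
/* Sums the integers 1..n with a plain loop; returns 0 when n <= 0. */
int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}
```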

00:47.070 --> 00:54.360
So now you might wonder why the compilation phase produces assembly language instead of directly generating

00:54.360 --> 00:55.110
machine code.

00:55.110 --> 01:01.810
This design decision becomes clearer when considering the multitude of programming languages in

01:01.810 --> 01:02.470
existence.

01:02.470 --> 01:09.460
Aside from C, there are numerous popular compiled languages like C++, Objective-C, Common

01:09.460 --> 01:13.210
Lisp, Delphi, Go, and Haskell, to name just a few.

01:13.210 --> 01:21.100
Writing a compiler that directly emits machine code for each of these languages would be an incredibly

01:21.100 --> 01:23.780
daunting and time-consuming task.

01:23.800 --> 01:30.640
Instead, it is more practical to emit assembly code, which is already challenging enough for developers,

01:30.640 --> 01:37.840
and use a dedicated assembler that handles the final translation from assembly to machine code for

01:37.840 --> 01:39.280
all supported languages.

01:39.310 --> 01:46.540
Therefore, the output of the compilation phase is assembly code represented in a reasonably human-readable

01:46.540 --> 01:50.080
format with preserved symbolic information.

01:50.080 --> 01:57.400
In the case of GCC, which automates all compilation phases by default, you need to instruct it

01:57.400 --> 02:02.710
to halt after the compilation stage and save the assembly files to disk

02:02.740 --> 02:08.410
in order to examine the emitted assembly, as we did in the previous lecture for the preprocessing phase.

02:08.410 --> 02:15.880
To achieve this, you can use the -S flag; .s is the conventional extension for assembly files.

02:15.910 --> 02:25.090
Additionally, the -masm=intel option is passed to GCC, instructing it to generate assembly

02:25.090 --> 02:29.140
in Intel syntax instead of the default AT&amp;T syntax.

02:29.140 --> 02:36.040
To make this concrete, we will use a practical example.

02:36.040 --> 02:39.030
We will create some example code.

02:39.040 --> 02:44.860
So let's create it now.

02:46.570 --> 02:55.390
With gcc, as we said, we will use -S, with an uppercase S, and -masm=intel here.

02:55.390 --> 03:05.890
After that we pass our C file, which is myapp.c here, and that's it.

03:05.890 --> 03:06.970
It's compiled.

03:06.970 --> 03:15.010
Next, we will read the compilation output with Mousepad, a text editor,

03:15.010 --> 03:15.520
here.

03:15.520 --> 03:17.050
Mousepad.

03:19.240 --> 03:21.400
Then myapp. here.

03:21.400 --> 03:28.540
As you can see, we have the .c and .s files. We will open the .s file, which was just generated, and this is the assembly

03:28.540 --> 03:33.370
generated by the compilation phase for the Hello World program.

03:33.370 --> 03:39.100
I won't go into the details of the assembly code for now.

03:39.100 --> 03:46.120
But what's interesting here is that the assembly code is relatively easy to read because the symbols

03:46.120 --> 03:49.630
and functions have been preserved.

03:49.630 --> 03:57.970
For instance, constants and variables have symbolic names rather than just addresses,

03:57.970 --> 04:06.040
even if it's just an automatically generated name such as .LC0 for the nameless Hello World string,

04:06.040 --> 04:09.640
and there's an explicit label for the main function,

04:09.970 --> 04:12.790
the only function in this case.

04:13.510 --> 04:22.180
The main label is right here, above .LFB0.

04:22.920 --> 04:28.080
And there's also an explicit label for the Hello World string.

04:28.080 --> 04:28.890
So.

04:30.400 --> 04:36.550
Any references to code and data are also symbolic, such as the reference to this

04:36.580 --> 04:37.840
Hello world.

04:38.110 --> 04:38.860
Here.

04:40.320 --> 04:40.620
This.

04:40.620 --> 04:41.760
Hello, world here.

04:42.750 --> 04:43.650
And.

04:44.700 --> 04:50.230
You will have no such luxury when dealing with stripped binaries

04:50.250 --> 04:52.380
later in this course.

04:52.500 --> 04:55.800
Here, for example, we have all the variable names and so on.

04:55.950 --> 04:59.700
In the next lecture we will move on to the assembly phase.

04:59.700 --> 05:01.710
I'll see you in the next lecture.
