WEBVTT

00:00.490 --> 00:06.340
Binary analysis revolves around the examination and evaluation of binaries.

00:06.370 --> 00:12.100
To understand this field better, let's delve into the fundamental concepts of binary formats and the life

00:12.100 --> 00:13.720
cycle of binaries.

00:13.750 --> 00:20.410
Once you grasp these concepts, you will be well prepared to tackle the subsequent sections, in which we

00:20.410 --> 00:25.450
will delve into ELF and PE binaries.

00:25.450 --> 00:32.230
These two formats are used extensively on Linux and Windows systems, respectively, making them crucial for

00:32.230 --> 00:36.390
comprehensive binary analysis in the realm of modern computing.

00:36.400 --> 00:44.320
Computers carry out computations using the binary numeral system, which represents numbers as sequences

00:44.320 --> 00:46.390
of ones and zeros.

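NOTE As a small illustration of the binary numeral system, here is a minimal C sketch that writes an 8-bit value out as its sequence of ones and zeros by testing one bit at a time. The helper name to_binary8 is hypothetical and does not come from this lecture.

```c
/* Hypothetical helper (not from the lecture): writes the 8-bit binary
 * form of `value` into `out` as a string, most significant bit first.
 * `out` must have room for 8 digits plus the terminating '\0'. */
void to_binary8(unsigned char value, char out[9])
{
    for (int bit = 7; bit >= 0; bit--) {
        /* Shift the bit of interest down to position 0, then test it. */
        out[7 - bit] = ((value >> bit) & 1u) ? '1' : '0';
    }
    out[8] = '\0';  /* terminate the string */
}
```

For example, to_binary8(42, buf) leaves "00101010" in buf, because 42 = 32 + 8 + 2.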
00:46.420 --> 00:53.530
Binary code, also known as machine code, is the language that these computer systems execute.

00:53.530 --> 01:02.690
A program comprises machine instructions, which make up its binary code, and

01:02.690 --> 01:10.790
data such as variables and constants. To manage the multitude of programs on a given system effectively,

01:10.790 --> 01:20.360
it is essential to store the code and data belonging to each program within a single, self-contained

01:20.360 --> 01:20.960
file.

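NOTE To make the split between code and data concrete, here is a minimal C sketch with hypothetical names. Assuming GCC on an ELF platform such as Linux, the function's instructions typically land in the .text section, the initialized global in .data, the constant string in .rodata, and the zero-initialized array in .bss; running nm on the compiled object file would typically tag them T, D, R and B respectively. Exact placement depends on the compiler and its flags.

```c
/* Data: initialized global variable (typically placed in .data). */
int counter = 10;

/* Data: read-only constant (typically placed in .rodata). */
const char greeting[] = "hello";

/* Data: zero-initialized global (typically .bss, which occupies
 * no space in the file itself, only in memory at run time). */
int scratch[64];

/* Code: the machine instructions for this function typically go in .text. */
int next_value(void)
{
    counter++;          /* the code reads and updates data stored in .data */
    return counter;
}
```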
01:20.960 --> 01:28.250
Files that contain executable binary programs are referred to as binary executable files, or simply

01:28.250 --> 01:28.670
binaries.

01:28.670 --> 01:36.080
The primary objective of this section is to explore and analyze these binaries

01:36.080 --> 01:40.630
comprehensively for malware analysis and reverse engineering.

01:40.640 --> 01:48.410
Before we dive into the intricacies of binary formats like ELF and PE, let's begin with a broad

01:48.410 --> 01:53.240
overview of how executable binaries are generated from source code.

01:53.270 --> 02:00.320
Following that, we will dissect a sample binary, allowing you to gain a solid understanding of the

02:00.320 --> 02:04.850
code and data encapsulated within binary files.

02:04.880 --> 02:13.040
Armed with this knowledge, you will go on to explore ELF and PE binaries in the following sections.

02:13.040 --> 02:13.670
In later sections,

02:13.670 --> 02:20.240
you will also have the opportunity to build your own binary loader, enabling you to parse and analyze

02:20.240 --> 02:20.930
binaries.

02:20.930 --> 02:26.330
First, let's examine the compilation process.

02:26.330 --> 02:32.450
Binaries are produced through a process called compilation,

02:32.450 --> 02:41.600
which translates human-readable source code, such as C or C++, into machine code that the processor can execute.

02:41.600 --> 02:43.790
Here is a diagram.

02:43.790 --> 02:51.200
It gives an overview of the steps typically involved in the compilation

02:51.200 --> 02:53.690
process for C or C++ code.

02:53.690 --> 03:04.700
Similar steps apply to other compiled languages as well. Compiling C code encompasses four phases:

03:04.700 --> 03:08.270
preprocessing, compilation, assembly,

03:08.880 --> 03:09.690
and linking.

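NOTE The four phases can be run one at a time. The sketch below assumes GCC, whose -E, -S and -c options stop after preprocessing, compilation and assembly respectively. The file names are hypothetical, and the final link step assumes a main function defined in a separate main.o.

```c
/* hello.c (hypothetical): a tiny unit to push through the phases by hand.
 *
 * Assuming GCC:
 *   gcc -E hello.c -o hello.i        # 1. preprocessing: expand #include/#define
 *   gcc -S hello.i -o hello.s        # 2. compilation: C to assembly
 *   gcc -c hello.s -o hello.o        # 3. assembly: assembly to object code
 *   gcc hello.o main.o -o hello      # 4. linking: objects + C library to executable
 */
#include <stdio.h>

#define GREETING "Hello, world!"

int greet(void)
{
    /* puts is defined in the C library; its address is filled in at link time. */
    return puts(GREETING);
}
```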
03:09.690 --> 03:15.630
Interestingly, one of these phases is itself called compilation, which creates some overlap

03:15.630 --> 03:17.310
in terminology.

03:17.360 --> 03:21.330
Modern compilers often merge some or all of these phases,

03:21.330 --> 03:26.490
but we will examine them individually for the sake of clarity and demonstration.

03:26.490 --> 03:32.760
First, we have preprocessing, which you will learn about in more depth in the upcoming lectures.

03:32.760 --> 03:38.100
In this initial phase, the preprocessor handles the directives in the source code (the lines beginning with #).

03:38.130 --> 03:45.810
This includes operations such as macro expansion, file inclusion, and conditional compilation.

03:45.810 --> 03:52.920
Preprocessing ensures that the code is ready for the subsequent stages of compilation.

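NOTE Here is a minimal sketch of the three preprocessing operations named above, using hypothetical macro and function names. Assuming GCC, running gcc -E on this file prints the expanded source, and adding -DDEBUG selects the other branch of the #ifdef.

```c
#include <stdio.h>                /* file inclusion: the header's text is pasted in */

#define BUFFER_SIZE 16            /* object-like macro: plain text substitution */
#define SQUARE(x) ((x) * (x))     /* function-like macro; parentheses guard precedence */

/* Conditional compilation: only one of these definitions survives preprocessing. */
#ifdef DEBUG
#define LOG(msg) fprintf(stderr, "debug: %s\n", msg)
#else
#define LOG(msg) ((void)0)
#endif

int area_of_square(int side)
{
    LOG("computing area");        /* expands to nothing useful unless DEBUG is defined */
    return SQUARE(side);          /* becomes ((side) * (side)) */
}

int buffer_size(void)
{
    return BUFFER_SIZE;           /* the compiler only ever sees the literal 16 */
}
```

Without the extra parentheses in SQUARE, an argument such as 2 + 1 would expand to 2 + 1 * 2 + 1, which is 5 rather than 9.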
03:52.920 --> 03:55.770
Second, we have compilation. In this phase,

03:55.770 --> 04:02.310
the preprocessed source code is translated into assembly language specific to the target architecture:

04:02.310 --> 04:10.420
the compiler parses the code, checks it for errors, and generates the corresponding assembly instructions.

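NOTE To see the output of the compilation phase, a tiny function can be compiled with gcc -S, which stops after generating assembly. The listing in the comment is what GCC typically emits for this function at -O2 on x86-64 Linux; treat it as illustrative, since the exact output depends on compiler version, flags and target. The file and function names are hypothetical.

```c
/* add.c (hypothetical): compile with `gcc -O2 -S add.c` to get add.s
 * instead of an object file. */
int add(int a, int b)
{
    return a + b;
}

/* Typical x86-64 output in add.s (AT&T syntax; details vary):
 *
 *   add:
 *       leal    (%rdi,%rsi), %eax    # a and b arrive in edi/esi; result in eax
 *       ret
 */
```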
04:11.370 --> 04:13.500
Third, we have assembly.

04:13.680 --> 04:20.100
During assembly, the assembler translates the assembly code into object code, consisting

04:20.100 --> 04:25.230
of binary instructions and data representations specific to the target processor.

04:25.260 --> 04:32.700
This step lays out code and data within each section and generates relocation information for addresses that can only be resolved later.

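NOTE The relocation information mentioned above can be observed directly. In this minimal sketch (hypothetical file and function names), the object file references puts, which lives in the C library, so the assembler cannot fill in its final address and leaves a relocation entry instead. Assuming GNU binutils, readelf -r or objdump -dr on the object file lists that entry.

```c
#include <stdio.h>

/* log_line.c (hypothetical): compile only, with `gcc -c log_line.c`.
 * The call to puts is assembled with a placeholder address plus a
 * relocation entry telling the linker where to patch it. On x86-64
 * this typically appears as an R_X86_64_PLT32 relocation, though the
 * exact type depends on the toolchain and flags. */
int log_line(const char *msg)
{
    return puts(msg);
}
```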
04:33.180 --> 04:39.270
Finally, the linking phase combines the generated object code with the necessary libraries and object

04:39.270 --> 04:43.530
files to create a fully functional binary executable.

04:43.560 --> 04:50.640
The linker resolves symbol references, applies relocations to fix up memory addresses, and generates the final binary

04:50.670 --> 04:53.910
file, ready for execution.

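NOTE Symbol resolution is easiest to see with two source files. This sketch collapses two hypothetical files, util.c and app.c, into one listing so it compiles on its own. In a real split build, app.o would contain an undefined reference to scale (which nm marks with U) that the linker resolves against the definition in util.o.

```c
/* util.c (hypothetical): defines the global symbol "scale". */
int scale(int x)
{
    return 3 * x;
}

/* app.c (hypothetical): in a split build it would only see this
 * declaration, and would also contain main. Assuming GCC:
 *   gcc -c util.c && gcc -c app.c && gcc util.o app.o -o app
 */
int scale(int x);

int scaled_sum(int a, int b)
{
    /* Each call is a reference the linker must bind to util.o's "scale". */
    return scale(a) + scale(b);
}
```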
04:54.210 --> 05:02.190
It's important to note that although the four phases described here are conceptually distinct, modern

05:02.190 --> 05:08.910
compilers often optimize and streamline the process by merging some or all of these steps.

05:08.910 --> 05:16.120
Nonetheless, understanding each phase individually provides a solid foundation for comprehending

05:16.120 --> 05:19.360
the intricacies of the binary analysis process.

05:19.360 --> 05:26.110
By familiarizing yourself with the compilation process and the anatomy of binaries, you will be

05:26.110 --> 05:30.820
equipped to delve deeper into the specific ELF and PE formats.

05:30.820 --> 05:37.900
These binary formats hold a wealth of information waiting to be explored and analyzed, offering

05:37.900 --> 05:43.960
valuable insights into the inner workings of executable programs.

05:44.990 --> 05:47.810
And here we have the diagram.

05:47.810 --> 05:54.030
Now let's delve deeper into the preprocessing phase

05:54.050 --> 05:56.240
in the next lecture.
