WEBVTT

00:00.440 --> 00:01.760
Congratulations.

00:01.760 --> 00:08.450
By now, you have gained a solid understanding of how the binary compilation process works and have

00:08.450 --> 00:11.120
explored the inner workings of binaries.

00:11.330 --> 00:17.570
You have even delved into the realm of static disassembly using objdump, and if you've been following

00:17.570 --> 00:24.050
along, you may have your very own shiny new binaries sitting on your hard drive.

00:24.170 --> 00:31.520
Now it's time to dive into the fascinating journey of loading and executing a binary,

00:31.550 --> 00:39.960
a crucial topic that will pave the way for exploring dynamic analysis concepts in upcoming sections.

00:39.980 --> 00:46.940
While the precise details may vary depending on the platform and binary format, the process of loading

00:46.940 --> 00:51.260
and executing a binary typically involves a series of fundamental steps.

00:51.260 --> 00:53.240
And here I have drawn a diagram.

00:53.240 --> 00:53.630
Here.

00:53.630 --> 00:58.550
This diagram provides a glimpse

00:59.760 --> 01:08.580
into how an ELF binary, such as the one we just compiled in the previous lecture, is represented

01:08.580 --> 01:11.310
in memory on a Linux-based platform.

01:11.310 --> 01:18.030
At a high level, the process of loading a binary on Windows follows a similar pattern.

01:18.030 --> 01:24.930
The process of loading a binary is intricate and requires substantial effort from the operating

01:24.930 --> 01:25.380
system.

01:25.380 --> 01:32.580
It's important to note that the binary's representation in memory does not necessarily mirror its

01:32.580 --> 01:33.960
on-disk representation.

01:33.960 --> 01:37.710
For example, large regions of

01:39.290 --> 01:45.050
zero-initialized data may be collapsed in the on-disk binary to conserve disk space.

01:45.050 --> 01:52.340
However, when loaded into memory, these regions are expanded to contain the actual zero values.
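
NOTE
A minimal sketch of the on-disk versus in-memory size difference described above.
It assumes a 64-bit little-endian ELF image and compares each loadable segment's
file size with its memory size. The helper name load_segment_sizes is hypothetical.
A segment whose memory size exceeds its file size has zero-filled space added at load time.
```python
import struct
PT_LOAD = 1  # program header type for loadable segments
def load_segment_sizes(data: bytes):
    """Return (file_size, mem_size) pairs for every PT_LOAD segment."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF image")
    e_phoff = struct.unpack_from("<Q", data, 0x20)[0]             # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)  # entry size and entry count
    sizes = []
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        p_type = struct.unpack_from("<I", data, off)[0]
        p_filesz, p_memsz = struct.unpack_from("<QQ", data, off + 0x20)
        if p_type == PT_LOAD:
            sizes.append((p_filesz, p_memsz))
    return sizes
```
Usage, assuming a binary named a.out in the current directory:
load_segment_sizes(open("a.out", "rb").read())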

01:53.460 --> 02:01.710
Furthermore, certain portions of the on-disk binary may be reordered in memory or not loaded into

02:01.710 --> 02:02.390
memory at all.

02:02.400 --> 02:09.960
Because the intricacies of on-disk versus in-memory binary representations are closely tied to the specific

02:09.960 --> 02:10.770
binary format,

02:10.770 --> 02:16.950
we will explore this topic in greater detail in the next sections.

02:16.950 --> 02:20.670
For now, let's focus on the high level overview of the loading process.

02:20.670 --> 02:28.200
When you decide to run a binary, the operating system initiates the setup of a new process dedicated

02:28.200 --> 02:32.400
to running the program, complete with its own virtual address space.

02:32.610 --> 02:38.460
Subsequently, the operating system maps an interpreter into the virtual memory of the process.

02:39.770 --> 02:47.600
This interpreter, a user-space program, possesses the knowledge and capability to load the binary and

02:47.600 --> 02:51.380
perform the necessary relocations. On Linux systems,

02:51.380 --> 03:00.340
the interpreter is typically a shared library known as ld-linux.so.

03:00.560 --> 03:10.520
Conversely, on Windows, the interpreter functionality is integrated into ntdll.dll. Once the

03:10.520 --> 03:17.000
interpreter is loaded, the kernel hands over control to it and the interpreter begins its work in

03:17.000 --> 03:18.740
user space.

03:18.740 --> 03:23.750
The role of the interpreter is crucial in preparing the binary for execution.

03:23.750 --> 03:29.900
It carries out a series of tasks such as resolving symbol references, setting up the program's initial

03:29.900 --> 03:32.990
memory layout and performing necessary relocations.

03:33.380 --> 03:39.110
By delegating these responsibilities to the interpreter, the operating system can ensure a consistent

03:39.110 --> 03:44.910
and reliable execution environment for binaries across various platforms.

03:44.970 --> 03:51.990
Understanding the loading process is essential, as it forms the foundation for the dynamic analysis techniques

03:51.990 --> 03:54.660
that we will explore in the next sections.

03:54.660 --> 04:01.140
Stay tuned for further exciting insights into dynamic analysis of binaries here.

04:01.650 --> 04:02.610
So.

04:04.750 --> 04:05.440
Here.

04:08.880 --> 04:13.590
What we are going to do is first open our Linux machine here.

04:13.590 --> 04:16.500
Let me change the screen here.

04:17.840 --> 04:19.160
Kali here.

04:23.230 --> 04:26.620
Let's clear the console, and now:

04:28.070 --> 04:39.680
Linux ELF binaries come with a special section called .interp that specifies the path to the

04:39.680 --> 04:43.100
interpreter that is to be used to load the binary.
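
NOTE
A rough sketch of what reading the interpreter path involves, assuming a 64-bit
little-endian ELF image. It looks up the PT_INTERP program header, which covers the
same bytes as the .interp section. The helper name read_interp is hypothetical.
```python
import struct
PT_INTERP = 3  # program header type that points at the interpreter path
def read_interp(data: bytes):
    """Return the interpreter path stored in an ELF image, or None if there is none."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF image")
    e_phoff = struct.unpack_from("<Q", data, 0x20)[0]             # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)  # entry size and entry count
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        p_type = struct.unpack_from("<I", data, off)[0]
        if p_type == PT_INTERP:
            p_offset = struct.unpack_from("<Q", data, off + 0x08)[0]
            p_filesz = struct.unpack_from("<Q", data, off + 0x20)[0]
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\x00").decode()   # the path is NUL-terminated in the file
    return None  # statically linked binaries usually have no interpreter entry
```
Usage, assuming a dynamically linked binary named a.out in the current directory:
print(read_interp(open("a.out", "rb").read()))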

04:43.740 --> 04:54.090
Now we will see that with readelf, with -p here, then .interp, and then the binary,

04:54.980 --> 04:55.820
a.out here.

04:58.430 --> 04:59.720
Pay that out because.

05:00.170 --> 05:01.370
And that's it.

05:02.620 --> 05:03.850
And here.

05:04.630 --> 05:12.520
As you can see, readelf shows us the path of the interpreter.

05:12.820 --> 05:18.430
As mentioned, the interpreter loads the binary into its virtual address space,

05:18.460 --> 05:25.420
the same space in which the interpreter itself is loaded. It then parses the binary to find out,

05:26.100 --> 05:29.640
among other things, which dynamic libraries the binary uses.

05:29.640 --> 05:38.340
The interpreter maps these into the virtual address space using mmap or an equivalent function, and then

05:38.340 --> 05:45.090
performs any necessary last-minute relocations in the binary's code sections.
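
NOTE
A small illustration of file mapping, using Python's standard mmap module rather than
the C mmap call the interpreter itself would use. The file here is a stand-in with
made-up contents, not a real shared library, and map_file_readonly is a hypothetical helper.
```python
import mmap
import os
import tempfile
def map_file_readonly(path):
    """Map a whole file into this process's address space and return the mapping."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
# Create a stand-in "library" file that only carries the ELF magic bytes plus padding.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x7fELF" + b"\x00" * 60)
    lib_path = tmp.name
mapping = map_file_readonly(lib_path)
magic = bytes(mapping[:4])   # read straight from the mapped memory, no read() call needed
mapping.close()
os.remove(lib_path)
```
Once mapped, the file contents are addressed like ordinary memory, which is the property
a loader relies on when it places library code into the process.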

05:47.310 --> 05:48.000
So.

05:48.390 --> 05:50.490
As I said, the interpreter

05:51.930 --> 05:59.580
performs any last-minute relocations in the binary's code sections to fill in the correct addresses for references

05:59.580 --> 06:00.990
to the dynamic libraries.

06:01.020 --> 06:09.270
In reality, the process of resolving references to functions in dynamic libraries is often

06:09.360 --> 06:11.580
deferred until later.

06:11.580 --> 06:16.950
In other words, instead of resolving these references immediately at load time, the interpreter

06:16.950 --> 06:21.030
resolves references only when they are invoked for the first time.

06:21.030 --> 06:26.940
This is known as lazy binding, which I will explain in more detail in the next sections.
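
NOTE
A toy model of the lazy binding idea, meant only as an analogy. It is not how the
dynamic linker is implemented, and the class and names below are hypothetical.
A symbol is looked up the first time it is called, and the result is cached for later calls.
```python
class LazySymbolTable:
    """Resolve a symbol on its first call, then reuse the cached result."""
    def __init__(self, resolver):
        self._resolver = resolver   # stands in for the interpreter's symbol lookup
        self._resolved = {}         # stands in for table entries that were already patched
        self.lookups = 0            # counts how often the slow lookup path ran
    def call(self, name, *args):
        func = self._resolved.get(name)
        if func is None:
            self.lookups += 1
            func = self._resolver(name)   # slow path, taken only on the first call
            self._resolved[name] = func
        return func(*args)
library = {"add": lambda a, b: a + b}     # hypothetical "shared library"
table = LazySymbolTable(library.__getitem__)
first = table.call("add", 2, 3)       # triggers the lookup
second = table.call("add", 10, 20)    # served from the cache
```
After both calls, table.lookups is 1, since only the first call needed resolution.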

06:26.940 --> 06:33.180
And after relocation is complete, the interpreter looks up the entry point of the binary and transfers

06:33.180 --> 06:37.980
control to it, beginning normal execution of the binary.

06:38.070 --> 06:43.680
Now that you are familiar with the general anatomy and life cycle of a binary, it's time to dive into

06:43.680 --> 06:47.280
the details of a specific binary format.

06:47.580 --> 06:54.610
We will start with the widespread ELF format, which is the subject of the next sections of our

06:54.610 --> 06:55.180
course.

06:55.180 --> 06:57.430
I'll see you in the next lecture.
