WEBVTT

00:00.530 --> 00:06.800
Now that you have gained a high-level understanding of the inner workings of binaries, it's time to

00:06.800 --> 00:09.200
delve into a specific binary format.

00:09.230 --> 00:15.980
In this section, we will explore the Executable and Linkable Format (ELF), which serves as the default

00:15.980 --> 00:19.190
binary format for Linux-based systems.

00:19.460 --> 00:27.050
ELF is used for various types of files, including executables,

00:27.080 --> 00:30.950
object files, shared libraries, and core dumps.

00:30.980 --> 00:38.930
While our primary focus will be on ELF executables in this section, it's important to note that the

00:38.930 --> 00:44.660
concepts we discuss apply to other types of ELF files as well.

00:45.400 --> 00:53.350
We will primarily work with 64-bit binaries throughout this section, so our discussion will

00:53.350 --> 00:57.370
revolve around the intricacies of 64-bit ELF files.

00:57.370 --> 01:04.930
However, it's worth mentioning that the 32-bit format is similar, differing mainly in the

01:04.930 --> 01:12.040
size and order of certain header fields and other data structures. The two formats are otherwise alike,

01:12.070 --> 01:22.150
and therefore you will have no trouble extrapolating the concepts discussed here to 32-bit binaries.

01:22.360 --> 01:30.880
In this diagram, I created an illustration of the format and contents typically found in a 64-bit

01:30.910 --> 01:33.550
ELF executable file.

01:33.550 --> 01:39.280
At first glance, the complexity of analyzing ELF binaries may appear overwhelming.

01:39.370 --> 01:48.410
However, ELF binaries essentially consist of four primary components: an executable header,

01:50.620 --> 01:52.810
a series of program headers,

01:53.650 --> 01:58.540
a number of sections, and a series of section headers.

01:58.570 --> 02:02.560
Now let's explore each of these components in detail.

02:03.650 --> 02:10.820
As we see in this diagram, standard ELF binaries begin with an executable header,

02:12.040 --> 02:18.370
followed by the program headers, and conclude with the sections and section headers.

02:19.040 --> 02:21.950
To facilitate a more coherent discussion,

02:21.950 --> 02:29.360
I will deviate slightly from this order and first delve into the sections and section headers before

02:29.360 --> 02:32.660
addressing the program headers.

02:34.030 --> 02:37.960
Let's begin with the executable header.

02:37.960 --> 02:44.050
The executable header marks the beginning of every ELF file.

02:44.050 --> 02:49.600
It consists of a structured sequence of bytes that provides essential information about the file,

02:49.630 --> 02:57.700
such as its status as an ELF file, the specific type of file it represents, and the locations within

02:57.700 --> 03:04.000
the file where you can find the remaining contents. To gain a comprehensive understanding of the executable

03:04.000 --> 03:10.570
header format, you can refer to its type definition and the definitions of other ELF-related

03:10.600 --> 03:17.050
types and constants, which can be found in our Linux distro.

03:17.050 --> 03:20.320
So we will jump back to Linux here.

03:22.280 --> 03:23.780
Open this here.

03:24.140 --> 03:27.290
And what we're going to do here is.

03:29.220 --> 03:29.700
Sorry.

03:33.860 --> 03:34.820
We will now.

03:36.100 --> 03:36.760
Head to the terminal.

03:36.760 --> 03:48.370
What we're going to do is open /usr/include/elf.h with Mousepad.

03:59.910 --> 04:04.800
That's elf.h, and here, as you can see,

04:24.530 --> 04:26.240
And this is our file.

04:26.780 --> 04:30.950
And here we have e_type

04:31.130 --> 04:32.650
and the machine type, e_machine.

04:32.660 --> 04:35.150
As you can see, each field also has a comment.

04:35.800 --> 04:37.750
Let's increase the font size a little bit.

04:37.750 --> 04:45.190
The executable header is represented here as a C struct

04:46.260 --> 04:52.110
called Elf64_Ehdr.

04:54.000 --> 04:59.910
If you look it up as we did here, you will get the same result.

05:00.270 --> 05:01.500
And here.

05:02.720 --> 05:11.660
You may notice that the struct definition given there contains types such as Elf64_Half and Elf64_Word.

05:11.660 --> 05:12.830
You can see them here.

05:12.860 --> 05:22.750
These are just typedefs for integer types such as uint16_t and uint32_t.

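NOTE
Editor's sketch, not part of the lecture: the Elf64_Ehdr layout can be mirrored
with Python's struct module to see how the elf.h typedefs map to fixed-width
integers. The field order follows elf.h; the little-endian byte order and the
sample values (entry point, header counts) are assumptions made up for illustration.
```python
import struct
# Elf64_Ehdr layout, assuming little-endian encoding:
# 16s = e_ident[16]; H = Elf64_Half (uint16_t); I = Elf64_Word (uint32_t);
# Q = Elf64_Addr / Elf64_Off (uint64_t).
EHDR64 = struct.Struct("<16sHHIQQQIHHHHHH")
FIELDS = ("e_ident", "e_type", "e_machine", "e_version", "e_entry", "e_phoff",
          "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
          "e_shentsize", "e_shnum", "e_shstrndx")
def parse_ehdr64(raw: bytes) -> dict:
    """Unpack the first 64 bytes of raw into named Elf64_Ehdr fields."""
    return dict(zip(FIELDS, EHDR64.unpack(raw[:EHDR64.size])))
# Synthetic header with hypothetical values, not taken from a real binary.
ident = b"\x7fELF\x02\x01\x01" + bytes(9)
raw = EHDR64.pack(ident, 2, 62, 1, 0x401000, 64, 0, 0, 64, 56, 0, 64, 0, 0)
hdr = parse_ehdr64(raw)
print(EHDR64.size, hdr["e_type"], hdr["e_machine"])  # prints: 64 2 62
```
Here 2 is ET_EXEC and 62 is EM_X86_64 in elf.h; the 64-byte total matches the size of the 64-bit executable header.
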
05:23.000 --> 05:27.020
For simplicity, you can see here

05:28.200 --> 05:29.070
the

05:30.160 --> 05:33.880
comments on all those definitions.

05:34.740 --> 05:37.680
And now let's start with the

05:38.600 --> 05:41.180
e_ident array.

05:41.720 --> 05:42.170
Right.

05:42.200 --> 05:43.130
So.

05:44.220 --> 05:45.430
This is an array.

05:45.640 --> 05:47.380
The executable header.

05:47.710 --> 05:48.880
The ELF file.

05:48.950 --> 05:49.300
Oops.

05:49.600 --> 05:50.220
Sorry.

05:50.230 --> 05:53.320
Let's actually get the pen here.

05:53.320 --> 05:54.910
So I will draw this.

05:57.030 --> 05:57.210
It.

06:00.050 --> 06:01.100
And here.

06:01.190 --> 06:01.970
So we will.

06:01.970 --> 06:06.050
First, let's start with e_ident.

06:08.060 --> 06:18.650
The executable header, and thus the ELF file, starts with a 16-byte array called e_ident.

06:19.580 --> 06:25.940
The e_ident array always starts with a

06:27.370 --> 06:29.520
4-byte magic value.

06:29.530 --> 06:31.500
That's the magic value here,

06:31.510 --> 06:35.740
identifying the file as an ELF binary.

06:37.500 --> 06:38.190
And.

06:39.070 --> 06:41.610
Let's actually write it again here.

06:42.640 --> 06:49.480
The magic value consists of the hexadecimal number

06:49.840 --> 06:50.890
0x7f,

06:51.610 --> 06:54.940
followed by the ASCII character

06:55.760 --> 07:02.000
codes for the letters E, L, and F.

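NOTE
Editor's sketch, not part of the lecture: a minimal check for the 4-byte magic
value described above. The helper name is_elf and both sample inputs are
hypothetical; a real tool would read the first bytes of a file instead.
```python
# The magic value: 0x7f followed by the ASCII codes for 'E', 'L', 'F'.
ELF_MAGIC = b"\x7fELF"
def is_elf(data: bytes) -> bool:
    """Return True if data begins with the ELF magic value."""
    return data[:4] == ELF_MAGIC
# Synthetic inputs (hypothetical, not read from real files).
elf_like = bytes.fromhex("7f454c46020101") + bytes(9)
c_source = b"#include <stdio.h>\n"
print(is_elf(elf_like), is_elf(c_source))  # prints: True False
```
Comparing only the first four bytes is enough to recognize the format; everything after them describes which kind of ELF file it is.
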
07:03.310 --> 07:11.620
Having these bytes right at the start is convenient because it allows tools such as file,

07:12.770 --> 07:14.540
which we used previously, to identify the format.

07:14.540 --> 07:17.810
We can get information about files with the

07:19.040 --> 07:19.470
file

07:19.580 --> 07:27.440
command in Linux. For example, let's go to a new terminal and the Desktop, and we will list the files

07:27.470 --> 07:28.010
again

07:30.160 --> 07:32.050
to see the Desktop here.

07:33.770 --> 07:35.630
And here we have several files.

07:35.630 --> 07:38.360
So let's try with myapp.c.

07:38.360 --> 07:42.080
As you can see here, it's C source, ASCII text.

07:42.110 --> 07:51.200
For the myapp.o file, we can see ELF 64-bit LSB relocatable.

07:51.200 --> 07:54.740
We discussed this in a previous lecture, so

07:55.720 --> 07:57.400
we will skip this for now.

07:57.940 --> 08:01.960
And here we have the.

08:05.000 --> 08:10.510
So tools can quickly discover that they are dealing with an ELF file. Following the magic value,

08:10.520 --> 08:18.830
there are a number of bytes that give more detailed information about the specifics of the type of ELF

08:18.830 --> 08:19.550
file.

08:19.700 --> 08:26.930
In the elf.h header file, the indexes for these bytes,

08:27.200 --> 08:39.220
that is, indexes 4 through 15 in the e_ident array, are symbolically referred to by names such as

08:39.220 --> 08:39.740
EI_CLASS.

08:39.740 --> 08:41.750
Here I will write it out.

08:41.840 --> 08:46.460
So EI_CLASS is

08:47.890 --> 08:49.630
EI_CLASS,

08:50.810 --> 08:51.560
in uppercase.

08:53.050 --> 08:53.860
Also

08:55.170 --> 08:58.050
EI_DATA.

09:04.230 --> 09:07.150
Also

09:08.360 --> 09:09.110
EI_VERSION.

09:14.980 --> 09:15.790
Also

09:19.910 --> 09:21.680
EI_OSABI here.

09:22.220 --> 09:23.690
These are the underscores.

09:30.510 --> 09:33.540
And also

09:36.460 --> 09:37.540
EI_ABIVERSION.

09:37.870 --> 09:38.620
EI_ABIVERSION.

09:38.620 --> 09:40.720
And,

09:41.450 --> 09:44.240
lastly, EI_PAD.

09:44.300 --> 09:46.760
Sorry for my handwriting.

09:47.240 --> 09:48.830
This is actually not handwriting.

09:48.830 --> 09:50.330
This is mouse writing here.

09:50.780 --> 09:53.540
I'm struggling with this, so.

09:55.290 --> 09:58.650
The EI_PAD field actually contains multiple bytes,

09:58.680 --> 10:06.420
namely indexes 9 through 15 in e_ident.

10:10.670 --> 10:16.100
All of these bytes are currently designated as padding, so they are reserved for possible future use,

10:16.100 --> 10:18.080
but currently set to zero.

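NOTE
Editor's sketch, not part of the lecture: the index names written out above can
be used to slice a 16-byte e_ident array. The index values match elf.h; the
helper name split_ident and the sample bytes are hypothetical.
```python
# Indexes into e_ident, as named in elf.h.
EI_CLASS, EI_DATA, EI_VERSION, EI_OSABI, EI_ABIVERSION, EI_PAD = 4, 5, 6, 7, 8, 9
EI_NIDENT = 16  # total size of e_ident in bytes
def split_ident(e_ident: bytes) -> dict:
    """Return each e_ident field after the magic, keyed by its elf.h index name."""
    if len(e_ident) != EI_NIDENT:
        raise ValueError("e_ident must be exactly 16 bytes")
    return {
        "EI_CLASS": e_ident[EI_CLASS],
        "EI_DATA": e_ident[EI_DATA],
        "EI_VERSION": e_ident[EI_VERSION],
        "EI_OSABI": e_ident[EI_OSABI],
        "EI_ABIVERSION": e_ident[EI_ABIVERSION],
        "EI_PAD": e_ident[EI_PAD:EI_NIDENT],  # indexes 9 through 15
    }
# Synthetic e_ident (hypothetical sample, not read from a real file).
sample = bytes.fromhex("7f454c46020101") + bytes(9)
fields = split_ident(sample)
print(fields["EI_CLASS"], len(fields["EI_PAD"]), any(fields["EI_PAD"]))  # prints: 2 7 False
```
The padding slice is 7 bytes long and all zero, which is what the reserved EI_PAD bytes should look like.
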
10:18.080 --> 10:24.800
The EI_CLASS byte denotes what the ELF specification refers to as the binary's class.

10:24.800 --> 10:32.720
This is a bit of a misnomer, since the word class is so generic that it could mean almost anything.

10:32.720 --> 10:36.680
So you will learn about this in the next lectures.

10:36.680 --> 10:38.390
But firstly.

10:39.830 --> 10:40.520
We will.

10:42.100 --> 10:43.390
use readelf here.

10:43.390 --> 10:44.860
We can close this now.

10:45.250 --> 10:46.210
We will

10:46.210 --> 10:46.800
run readelf on

10:47.140 --> 10:50.230
our a.out file here.

10:50.970 --> 10:51.470
Let's see here.

10:51.470 --> 10:54.800
As you can see here, we should have a.out.

10:55.280 --> 10:59.120
So if we try to run this app, we will see that

10:59.120 --> 11:06.010
this is just a regular Hello World application written in C. What we're going to do is run readelf

11:06.020 --> 11:12.320
here: readelf -h a.out, and that's it.

11:12.320 --> 11:16.460
We get several pieces of information here, which I will explain right now.

11:16.550 --> 11:26.150
Here, the e_ident array is shown on the line marked Magic.

11:26.750 --> 11:27.770
This was the.

11:28.980 --> 11:32.460
e_ident array which we discussed previously,

11:33.200 --> 11:34.670
when we started this lecture.

11:34.670 --> 11:35.750
Let's try this.

11:35.750 --> 11:37.280
Open this up here again.

11:38.970 --> 11:39.630
And.

11:46.330 --> 11:48.700
We're going to use cat or Mousepad.

11:48.700 --> 11:49.660
Mousepad is okay here.

11:49.660 --> 11:53.950
So it's mousepad, then /usr,

11:54.920 --> 11:55.850
include,

11:58.540 --> 11:59.470
and then

12:03.740 --> 12:05.380
elf.h: /usr/include/elf.h

12:05.420 --> 12:05.870
Here.

12:08.560 --> 12:09.670
And here.

12:11.670 --> 12:12.660
We will now.

12:15.910 --> 12:17.080
To that.

12:18.110 --> 12:19.090
Right here.

12:19.600 --> 12:25.030
That's actually the struct, and in the struct we have this array holding the magic number and other information:

12:25.060 --> 12:26.200
e_ident.

12:27.130 --> 12:28.510
And here.

12:31.200 --> 12:33.570
The thing you can see here in Magic,

12:34.870 --> 12:42.040
as I said, is the e_ident array, and it starts with the familiar four magic bytes,

12:42.040 --> 12:49.840
7f 45 4c 46, followed by a value of 2,

12:51.990 --> 12:55.920
indicating ELFCLASS64, then a 1,

12:56.860 --> 12:59.290
which is ELFDATA2LSB,

12:59.320 --> 13:03.730
and finally another 1, which is EV_CURRENT.

13:03.730 --> 13:16.150
The remaining bytes are all zeroed out, since the EI_OSABI and EI_ABIVERSION bytes are at their default

13:16.150 --> 13:16.630
values.

13:16.630 --> 13:20.860
The padding bytes are all set to zero as well.

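NOTE
Editor's sketch, not part of the lecture: translating the bytes of a
readelf-style Magic line into labels like those readelf prints. The value
names come from elf.h; the helper name describe_magic, the "unknown" fallback,
and the sample line are assumptions for illustration.
```python
# Human-readable labels for the three bytes that follow the magic value.
CLASS_NAMES = {1: "ELF32", 2: "ELF64"}  # ELFCLASS32, ELFCLASS64
DATA_NAMES = {1: "2's complement, little endian",  # ELFDATA2LSB
              2: "2's complement, big endian"}  # ELFDATA2MSB
VERSION_NAMES = {1: "1 (current)"}  # EV_CURRENT
def describe_magic(magic_line: str) -> dict:
    """Map the hex bytes of a Magic line to Class, Data, and Version labels."""
    ident = bytes.fromhex(magic_line)
    return {
        "Class": CLASS_NAMES.get(ident[4], "unknown"),
        "Data": DATA_NAMES.get(ident[5], "unknown"),
        "Version": VERSION_NAMES.get(ident[6], "unknown"),
    }
# Sample Magic line matching the byte values discussed in the lecture.
line = "7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00"
info = describe_magic(line)
print(info["Class"], "|", info["Data"], "|", info["Version"])
```
Bytes 4, 5, and 6 of e_ident are all that is needed to produce the three labels.
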
13:20.860 --> 13:30.220
The information contained in some of these bytes is explicitly repeated on dedicated lines marked

13:30.220 --> 13:32.830
Class,

13:33.880 --> 13:35.140
Data, and Version:

13:35.470 --> 13:40.180
ELF64; 2's complement, little endian; and version 1 (current).
