WEBVTT

00:00.630 --> 00:07.260
Now that we have explored the ELF format in depth, let's turn our attention to another valuable

00:07.260 --> 00:10.860
binary format: the Portable Executable (PE) format.

00:11.490 --> 00:18.090
Understanding the PE format is particularly valuable for analyzing Windows binaries, which are prevalent

00:18.090 --> 00:22.950
in malware analysis and Windows-specific applications.

00:23.890 --> 00:30.700
The Portable Executable format can be seen as a modified version of the Common Object File Format (COFF), which

00:30.700 --> 00:36.460
was previously used in Unix-based systems before being replaced by ELF.

00:36.460 --> 00:45.280
Due to this historical connection, PE is sometimes referred to as PE/

00:45.280 --> 00:56.320
COFF. It's worth noting that the 64-bit variant of PE is called PE32+. Although there are minor

00:56.320 --> 01:00.430
differences between PE32+ and the original PE format,

01:00.430 --> 01:06.760
for simplicity we will refer to both as PE throughout this section.

01:06.760 --> 01:15.580
In this overview of the PE format, we will highlight its key differences from ELF, providing insights

01:15.580 --> 01:19.480
for those interested in working with Windows binaries.

01:19.480 --> 01:28.160
It's important to keep in mind that PE shares many similarities with other formats, and fortunately,

01:28.160 --> 01:35.360
with the knowledge of ELF you gained in the previous lecture, exploring additional binary

01:35.360 --> 01:38.150
formats becomes a much smoother journey.

01:38.150 --> 01:46.070
To facilitate our discussion and aid visualization, I've created a diagram specifically for

01:46.070 --> 01:53.270
this lecture. This diagram focuses on the key data structures defined in the winnt.h header

01:53.270 --> 02:00.020
file, an essential header file included in the Microsoft Windows Software Development Kit (SDK).

02:00.110 --> 02:04.430
Now let's delve into the specifics of the PE format.

02:05.180 --> 02:11.750
As you examine the diagram, you will notice both similarities and crucial differences compared

02:11.750 --> 02:12.890
to the ELF format.

02:12.890 --> 02:18.520
One significant distinction is the presence of the MS-DOS header in the PE format.

02:18.530 --> 02:26.150
So yes, we are referring to MS-DOS, the Microsoft operating system that made its debut in 1981.

02:26.180 --> 02:32.900
You might be wondering why such an archaic element exists in a modern binary format.

02:32.900 --> 02:40.910
The answer lies in backward compatibility. During the introduction of the Portable Executable

02:40.910 --> 02:48.380
format, there was a transitional period when users were running both traditional MS-DOS

02:48.380 --> 02:56.930
binaries and newer PE binaries. To ease this transition and reduce confusion, every PE file starts

02:56.930 --> 03:05.970
with an MS-DOS header, enabling it to be interpreted as an MS-DOS binary to some extent. The primary

03:05.970 --> 03:13.920
purpose of the MS-DOS header is to describe how to load and execute an MS-DOS stub, which follows the

03:13.920 --> 03:15.120
MS-DOS header.

03:15.120 --> 03:22.170
This stub typically consists of a small MS-DOS program that runs instead of the main program when

03:22.170 --> 03:25.830
a user executes a PE binary in MS-DOS.

03:25.830 --> 03:33.180
While the MS-DOS stub program often prints a message like "This program cannot be run in DOS mode"

03:33.180 --> 03:34.740
and then exits,

03:35.370 --> 03:40.020
it's important to note that, in theory, it could be a fully functional

03:40.020 --> 03:46.500
MS-DOS version of the program itself. The MS-DOS header begins with a magic value represented by

03:46.500 --> 03:49.740
the ASCII characters "MZ".

03:49.830 --> 03:53.580
Hence it is sometimes referred to as the MZ header.

03:53.580 --> 04:00.120
For our purposes, the most important field in the MS-DOS header is the last field, known as

04:00.260 --> 04:10.590
e_lfanew. This field denotes the file offset at which the actual PE

04:10.620 --> 04:11.880
binary begins.

04:11.880 --> 04:20.430
Consequently, when the PE loader opens the binary, it reads the MS-DOS

04:20.430 --> 04:21.210
header,

04:21.950 --> 04:29.480
skips past it along with the MS-DOS stub, and proceeds directly to the start of the PE headers.
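
NOTE Sketch (not from the lecture): a minimal C version of the lookup just described, assuming the whole file is already in memory as a byte buffer. It checks e_magic for "MZ" and reads e_lfanew, which sits at offset 0x3C of the 64-byte IMAGE_DOS_HEADER. The name pe_header_offset is our own hypothetical helper, not a Windows API.
```c
#include <stddef.h>
#include <stdint.h>
/* e_lfanew lives at offset 0x3C of the 64-byte IMAGE_DOS_HEADER. */
#define DOS_E_LFANEW_OFFSET 0x3C
/* Read a little-endian 32-bit value regardless of the host's byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
/* Return the file offset of the PE headers (the e_lfanew value), or -1 if
   buf does not start with "MZ" or e_lfanew leaves no room for a signature. */
static int64_t pe_header_offset(const uint8_t *buf, size_t len)
{
    if (len < DOS_E_LFANEW_OFFSET + 4)
        return -1;                      /* too short to contain e_lfanew */
    if (buf[0] != 'M' || buf[1] != 'Z')
        return -1;                      /* e_magic must be "MZ" */
    uint32_t e_lfanew = read_le32(buf + DOS_E_LFANEW_OFFSET);
    if (e_lfanew > len - 4)
        return -1;                      /* the 4-byte PE signature would not fit */
    return (int64_t)e_lfanew;
}
```
Usage: pass the file contents and their length. A non-negative result is the offset where the PE signature should begin, which is exactly where the loader jumps after skipping the DOS header and stub.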

04:29.510 --> 04:37.400
Now let's explore the PE headers, which can be seen as the equivalent of ELF's executable header.

04:37.400 --> 04:41.780
However, in the case of PE, the executable header here

04:42.230 --> 04:44.130
is made up of the PE signature, the PE file header,

04:44.150 --> 04:47.630
and the PE optional header.

04:48.680 --> 04:56.810
So in the case of the Portable Executable, the executable header is divided

04:57.600 --> 05:04.920
into three distinct parts: a 32-bit signature, a PE file header, and a PE optional

05:04.950 --> 05:05.460
header.

05:05.460 --> 05:11.280
If you refer to the definitions in the winnt.h header file, we encounter

05:11.310 --> 05:19.320
a structure named IMAGE_NT_HEADERS64, which encompasses all three components. Collectively, we

05:19.320 --> 05:26.610
can consider the entire IMAGE_NT_HEADERS64 structure as PE's equivalent of the

05:26.610 --> 05:27.600
executable header.

05:27.600 --> 05:35.100
In practice, however, the signature, file header, and optional header are treated as separate entities,

05:35.100 --> 05:42.060
each serving a unique purpose in the overall structure of the PE format.
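
NOTE Sketch (not from the lecture): one way to picture the file header part of IMAGE_NT_HEADERS64 in portable C. The struct name PeFileHeader and the helpers are hypothetical; winnt.h declares the real type as IMAGE_FILE_HEADER with WORD/DWORD fields. Fields are decoded byte by byte from little-endian data, so the sketch does not depend on host byte order or struct packing.
```c
#include <stddef.h>
#include <stdint.h>
/* Hypothetical mirror of IMAGE_FILE_HEADER using <stdint.h> types. */
typedef struct {
    uint16_t Machine;               /* target architecture */
    uint16_t NumberOfSections;      /* entries in the section header table */
    uint32_t TimeDateStamp;
    uint32_t PointerToSymbolTable;  /* deprecated for PE images */
    uint32_t NumberOfSymbols;       /* deprecated for PE images */
    uint16_t SizeOfOptionalHeader;  /* bytes of optional header that follow */
    uint16_t Characteristics;       /* flag bits */
} PeFileHeader;
_Static_assert(sizeof(PeFileHeader) == 20, "IMAGE_FILE_HEADER is 20 bytes");
/* Sizes of the first two parts, in file order; the optional header follows. */
enum {
    PE_SIGNATURE_SIZE   = 4,   /* "PE\0\0" */
    PE_FILE_HEADER_SIZE = 20   /* sizeof(PeFileHeader) */
};
static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
/* Decode the file header; p points just past the 4-byte signature. */
static PeFileHeader read_file_header(const uint8_t *p)
{
    PeFileHeader h;
    h.Machine              = le16(p + 0);
    h.NumberOfSections     = le16(p + 2);
    h.TimeDateStamp        = le32(p + 4);
    h.PointerToSymbolTable = le32(p + 8);
    h.NumberOfSymbols      = le32(p + 12);
    h.SizeOfOptionalHeader = le16(p + 16);
    h.Characteristics      = le16(p + 18);
    return h;
}
```
Design note: reading each field explicitly, instead of casting the buffer to a struct pointer, avoids alignment and endianness surprises when the code runs on a non-Windows analysis machine.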

05:42.940 --> 05:49.030
By gaining a comprehensive understanding of the PE format, we acquire the necessary insights to effectively

05:49.030 --> 05:50.980
analyze Windows binaries.

05:51.010 --> 05:58.300
Navigating the intricacies of these binaries becomes more feasible as we grasp the inner

05:58.300 --> 06:02.530
workings of the PE format.

06:03.630 --> 06:07.380
Now, what we're going to do first is

06:09.210 --> 06:14.130
go back to our Windows machine, because we will create a program there.

06:16.470 --> 06:20.760
After that, let's actually open

06:21.950 --> 06:23.630
our Windows machine here.

06:25.800 --> 06:26.460
Where?

06:27.220 --> 06:28.120
And that's it.

06:30.050 --> 06:30.650
Hello.

06:31.100 --> 06:40.490
Now let's write a simple Hello World application in C and compile it on our Windows machine.

06:40.490 --> 06:42.760
Let me see here.

06:42.770 --> 06:43.360
Yes.

06:43.370 --> 06:44.720
Here we will create a new file.

06:44.720 --> 06:45.350
This is my.

06:45.920 --> 06:47.540
And here we will.

06:48.880 --> 06:51.130
hello_world.c

06:52.760 --> 06:53.600
Let's save it.

06:54.430 --> 06:56.370
on the desktop. And here,

06:56.380 --> 06:58.390
So what we're going to do is.

06:59.640 --> 07:05.000
we include stdio.h with #include <stdio.h> here.

07:05.010 --> 07:07.260
Let's actually increase the font size a little bit.

07:09.160 --> 07:16.960
We don't want to install the recommended C/C++ extension pack, because we will not develop anything other

07:16.960 --> 07:22.990
than a simple Hello World application here. And int main here.

07:27.220 --> 07:28.360
And here.

07:31.010 --> 07:32.000
We will do the same thing

07:33.010 --> 07:34.270
as we did on Linux.

07:34.270 --> 07:42.670
There, we created our own Hello World application, compiled it, and analyzed it in the previous lecture,

07:43.610 --> 07:46.640
working with ELF files,

07:46.850 --> 07:48.230
if you remember.

07:48.860 --> 07:52.130
And here we will call printf,

07:55.940 --> 07:57.530
printf here,

07:58.910 --> 07:59.630
and we'll just print

07:59.660 --> 08:00.410
"Hello,

08:01.740 --> 08:02.340
World".

08:03.360 --> 08:04.050
And that's it.

08:04.050 --> 08:08.310
After that, we add a return statement here, and that's it.

08:08.620 --> 08:12.480
Now we will open our cmd here.

08:12.510 --> 08:15.060
Let's also increase the font size a little bit too.

08:17.930 --> 08:19.130
Let's give it a color.

08:21.930 --> 08:26.100
Here we will go to the Desktop directory.

08:30.020 --> 08:30.590
In this directory,

08:30.590 --> 08:36.080
as you can see, we have hello_world.c, and now we will compile it with GCC.

08:40.020 --> 08:45.600
As you can see, we get an error here because we forgot to,

08:46.090 --> 08:48.060
oops, we forgot to

08:49.180 --> 08:50.920
add the semicolon here after return.

08:50.920 --> 08:51.430
Sorry.

08:51.430 --> 08:51.970
Here.

08:52.060 --> 08:53.200
And that's it.

08:53.290 --> 08:58.660
Now we compile hello_world.c again, and as you can see, our program is compiled.

08:58.810 --> 09:01.190
This is our .exe file here.

09:01.210 --> 09:02.830
Now we will.

09:04.990 --> 09:10.450
And here, as you can see, we created our Hello World program and we will further analyze this program

09:10.450 --> 09:13.090
on Linux using objdump.

09:13.420 --> 09:17.680
However, this file can also be analyzed on Windows.

09:17.770 --> 09:24.940
But since the objdump tool comes preinstalled on our Linux machine, we will use it here.

09:24.940 --> 09:25.840
So we will.

09:27.040 --> 09:31.810
Copy this and we will go back to our Linux machine here.

09:32.560 --> 09:33.610
Holly here.

09:38.680 --> 09:40.210
Enter the password.

09:41.420 --> 09:44.060
We'll copy it right to the desktop.

09:46.390 --> 09:51.710
So if we open the .exe, we see some of its data,

09:51.890 --> 09:54.470
data which you will learn to interpret in this section.

09:54.500 --> 09:56.120
We have also BC.

09:57.890 --> 10:01.610
And now what we're going to do is we will open the terminal.

10:11.770 --> 10:13.930
We'll type objdump

10:15.470 --> 10:19.250
with -x here, and a.exe here.

10:19.490 --> 10:23.120
As you can see, the file is not found because we are not in the right directory.

10:23.150 --> 10:28.220
We will cd into the Desktop directory and run objdump

10:30.290 --> 10:33.620
with the -x parameter, followed by hello.exe.

10:34.040 --> 10:38.240
And here, as you can see, we have a lot of information going on here.

10:38.630 --> 10:49.250
The signature is simply a string containing the ASCII characters "PE", followed by two NULL characters

10:49.250 --> 10:49.790
here.

10:50.750 --> 10:58.640
It is analogous to the magic bytes in the e_ident field of the ELF executable header.
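
NOTE Sketch (not from the lecture): checking the 4-byte PE signature, the counterpart of comparing ELF's e_ident magic bytes (0x7f 'E' 'L' 'F'). It assumes the offset comes from e_lfanew; has_pe_signature is a hypothetical helper name.
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
/* 'P', 'E', then two NUL bytes. */
static const uint8_t PE_SIGNATURE[4] = { 'P', 'E', 0, 0 };
/* True if the 4 bytes at offset off (normally e_lfanew) are "PE\0\0". */
static bool has_pe_signature(const uint8_t *buf, size_t len, size_t off)
{
    if (off > len || len - off < sizeof PE_SIGNATURE)
        return false;                   /* signature would run past the buffer */
    return memcmp(buf + off, PE_SIGNATURE, sizeof PE_SIGNATURE) == 0;
}
```
Usage: call it with the value read from e_lfanew. A false result means the file is not a valid PE image, even if it started with "MZ".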

10:59.130 --> 11:05.450
If you remember, we tested that in the previous section. The file header here describes the

11:05.450 --> 11:11.690
general properties of the file, and the most important fields are here.

11:13.490 --> 11:14.060
Actually,

11:14.960 --> 11:15.950
let me scroll down a little bit.

11:17.280 --> 11:18.900
Let's actually copy this, because

11:20.970 --> 11:23.760
we will need to analyze it further.

11:54.250 --> 11:56.080
We'll copy all of it here.

11:56.320 --> 11:56.950
Now.

12:12.540 --> 12:16.200
Now we will simply open Mousepad here.

12:19.280 --> 12:21.710
Let's save this as

12:26.170 --> 12:26.830
analyze

12:26.830 --> 12:28.480
.c, because we want

12:29.130 --> 12:31.560
the syntax colors here.

12:35.260 --> 12:35.680
That's it.

12:38.040 --> 12:43.680
And after that, let's turn our terminal font size back to normal.

12:55.370 --> 13:00.650
As you can see here, the most important fields are Machine,

13:03.040 --> 13:04.810
NumberOfSections,

13:07.940 --> 13:09.140
here, and

13:10.250 --> 13:11.360
here.

13:12.080 --> 13:13.490
We also have

13:16.950 --> 13:18.090
Characteristics.

13:19.010 --> 13:24.230
As you can see, it shows a hex value followed by the flags: relocations stripped, executable, line numbers

13:24.230 --> 13:27.070
stripped, and 32 bit words here.

13:27.080 --> 13:33.890
The two fields describing the symbol table are deprecated, and PE files should no longer make

13:33.890 --> 13:37.160
use of embedded symbols and debugging information.

13:37.160 --> 13:45.620
Instead, these symbols are optionally emitted as part of a separate debug file. As with ELF's

13:45.890 --> 13:48.080
e_machine field, the Machine field here

13:50.090 --> 13:50.570
Yeah.

13:50.990 --> 13:52.220
Machine field here.

13:57.010 --> 13:59.470
To compare with ELF, we will go to

14:04.640 --> 14:05.200
It is.

14:13.080 --> 14:16.800
/usr/include/elf.h here.

14:17.040 --> 14:18.480
We'll go to e_machine.

14:21.460 --> 14:25.510
As you can see here, this is its architecture.

14:25.720 --> 14:30.070
And this is e_machine here in ELF.

14:30.100 --> 14:37.000
In PE, the Machine field likewise describes the architecture of the machine for which the PE file is intended, and

14:37.000 --> 14:39.040
in this case, the

14:40.010 --> 14:42.140
values you can expect here are

14:43.800 --> 14:46.560
x86 and

14:48.970 --> 15:00.970
x86-64, which are defined as constants:

15:01.000 --> 15:05.110
0x14c for i386 and 0x

15:06.100 --> 15:07.390
8664 for x86-64 (AMD64),

15:09.980 --> 15:13.250
which you will find near the beginning of the file.

15:14.770 --> 15:16.600
As you can see, the architecture here is different.

15:16.600 --> 15:17.320
That's why

15:18.880 --> 15:25.120
objdump reports the file format as pei-i386 in this case.

15:25.120 --> 15:37.990
But if your binary targets x86-64, you will see pei-x86-64 instead, and the Machine value will be 0x8664.
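
NOTE Sketch (not from the lecture): mapping the Machine field to a readable name. The constants are the documented IMAGE_FILE_MACHINE_I386 (0x14c), IMAGE_FILE_MACHINE_AMD64 (0x8664) and IMAGE_FILE_MACHINE_ARM64 (0xaa64) values; the macro and function names are our own, and only a few architectures are covered.
```c
#include <stdint.h>
/* A few IMAGE_FILE_MACHINE_* values from winnt.h and the PE specification. */
#define MACHINE_I386  0x014c
#define MACHINE_AMD64 0x8664
#define MACHINE_ARM64 0xaa64
/* Map the Machine field to a name; anything not listed is reported as unknown. */
static const char *machine_name(uint16_t machine)
{
    switch (machine) {
    case MACHINE_I386:  return "x86 (i386)";
    case MACHINE_AMD64: return "x86-64 (AMD64)";
    case MACHINE_ARM64: return "ARM64";
    default:            return "unknown";
    }
}
```
Usage: for the 32-bit binary in this lecture, machine_name(0x14c) gives "x86 (i386)", which matches objdump's pei-i386 file format.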

15:39.280 --> 15:40.960
Here we also

15:41.940 --> 15:48.420
have the NumberOfSections field,

15:56.750 --> 15:58.820
NumberOfSections.

16:03.600 --> 16:11.670
The NumberOfSections field is simply the number of entries in the section header table, and

16:11.670 --> 16:18.630
SizeOfOptionalHeader is the size in bytes of the optional header that follows the file header.
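
NOTE Sketch (not from the lecture): the arithmetic that connects NumberOfSections and SizeOfOptionalHeader to the section header table. It assumes the documented sizes of 4 bytes for the signature, 20 for the file header and 40 per IMAGE_SECTION_HEADER entry; the function names are hypothetical.
```c
#include <stdint.h>
/* Each IMAGE_SECTION_HEADER entry is 40 bytes. */
#define SECTION_HEADER_SIZE 40u
/* The section header table starts right after the optional header:
   e_lfanew + 4 (signature) + 20 (file header) + SizeOfOptionalHeader. */
static uint64_t section_table_offset(uint32_t e_lfanew, uint16_t size_of_optional_header)
{
    return (uint64_t)e_lfanew + 4u + 20u + size_of_optional_header;
}
/* Total bytes occupied by the section header table. */
static uint64_t section_table_size(uint16_t number_of_sections)
{
    return (uint64_t)number_of_sections * SECTION_HEADER_SIZE;
}
```
Worked example: with e_lfanew = 0x80 and the 224-byte (0xE0) optional header typical of 32-bit PE32 images, the section table starts at 0x80 + 4 + 20 + 0xE0 = 0x178, and five sections occupy 5 * 40 = 200 bytes.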

16:18.630 --> 16:21.540
Then there is the Characteristics field here.

16:23.160 --> 16:24.000
Characteristics.

16:24.180 --> 16:28.950
As you can see, its value here is 0x107.

16:30.960 --> 16:35.160
This field contains flags

16:36.120 --> 16:43.380
describing things such as the endianness of the binary,

16:43.410 --> 16:52.680
whether it's an executable, whether it uses 32-bit words, whether it's a DLL,

16:52.680 --> 16:58.500
and whether it has been stripped. As shown in our objdump output,

16:58.500 --> 17:05.160
the example binary's characteristics flags mark it as having relocations stripped,

17:05.190 --> 17:11.250
as an executable file,

17:13.590 --> 17:17.250
as using 32-bit words,

17:18.080 --> 17:23.510
and as having its line numbers stripped, which you will learn more about

17:23.510 --> 17:25.610
in the next lecture.
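
NOTE Sketch (not from the lecture): decoding Characteristics bit by bit, as objdump does for 0x107. The bit values are the documented IMAGE_FILE_* flags; the table, its labels and print_characteristics are our own illustration, and only a subset of the flags is listed.
```c
#include <stdint.h>
#include <stdio.h>
/* Subset of the IMAGE_FILE_* characteristics bits. */
static const struct { uint16_t bit; const char *name; } FLAGS[] = {
    { 0x0001, "relocations stripped" },          /* IMAGE_FILE_RELOCS_STRIPPED */
    { 0x0002, "executable" },                    /* IMAGE_FILE_EXECUTABLE_IMAGE */
    { 0x0004, "line numbers stripped" },         /* IMAGE_FILE_LINE_NUMS_STRIPPED */
    { 0x0008, "local symbols stripped" },        /* IMAGE_FILE_LOCAL_SYMS_STRIPPED */
    { 0x0020, "large address aware" },           /* IMAGE_FILE_LARGE_ADDRESS_AWARE */
    { 0x0100, "32 bit words" },                  /* IMAGE_FILE_32BIT_MACHINE */
    { 0x0200, "debugging information removed" }, /* IMAGE_FILE_DEBUG_STRIPPED */
    { 0x2000, "DLL" },                           /* IMAGE_FILE_DLL */
};
/* Print every listed flag set in characteristics; return how many matched. */
static int print_characteristics(uint16_t characteristics)
{
    int matched = 0;
    printf("Characteristics 0x%x\n", (unsigned)characteristics);
    for (size_t i = 0; i < sizeof FLAGS / sizeof FLAGS[0]; i++) {
        if (characteristics & FLAGS[i].bit) {
            printf("\t%s\n", FLAGS[i].name);
            matched++;
        }
    }
    return matched;
}
```
Usage: print_characteristics(0x107) prints the four flags seen in our objdump output (relocations stripped, executable, line numbers stripped, 32 bit words), because 0x107 = 0x100 | 0x004 | 0x002 | 0x001. The large address aware bit (0x20) is not set in this binary.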

17:25.610 --> 17:32.480
Despite what the name suggests, the PE optional header is not really optional for executables,

17:32.490 --> 17:38.930
though it may be missing in object files. In fact, you will find the PE optional

17:38.930 --> 17:42.580
header in any PE executable you encounter.
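
NOTE Sketch (not from the lecture): telling whether the optional header is present and which variant it is. A SizeOfOptionalHeader of 0 means there is none (possible in object files); otherwise its first field, Magic, is 0x10b for PE32 and 0x20b for PE32+. The function optional_header_kind is a hypothetical helper.
```c
#include <stddef.h>
#include <stdint.h>
/* Documented values of the optional header's Magic field. */
#define OPT_MAGIC_PE32     0x10b
#define OPT_MAGIC_PE32PLUS 0x20b
/* opt points at the optional header; size is SizeOfOptionalHeader. */
static const char *optional_header_kind(const uint8_t *opt, uint16_t size)
{
    if (size == 0)
        return "absent";                /* allowed in object files, not in executables */
    if (size < 2)
        return "truncated";             /* too small to hold the Magic field */
    uint16_t magic = (uint16_t)(opt[0] | (opt[1] << 8));
    switch (magic) {
    case OPT_MAGIC_PE32:     return "PE32";
    case OPT_MAGIC_PE32PLUS: return "PE32+";
    default:                 return "unknown";
    }
}
```
Usage: the 32-bit binary in this lecture should report PE32, while a 64-bit build would report PE32+, the variant mentioned at the start of this section.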

17:42.590 --> 17:42.940
Right.

17:42.950 --> 17:51.380
It contains lots of fields, and I will go over the most important ones in the next lecture.

17:51.380 --> 17:52.090
I'll see you in

17:52.110 --> 17:52.850
the next lecture.
