WEBVTT

00:00.680 --> 00:01.600
Hello everyone.

00:01.880 --> 00:13.360
I'm here again, and in this lecture we are going to explore two fundamental concepts in x86-64 assembly

00:13.360 --> 00:16.920
programming: memory addressing

00:19.320 --> 00:21.360
and endianness.

00:21.760 --> 00:28.200
So these are essential for understanding how data is stored, accessed and manipulated at a low level

00:28.200 --> 00:29.680
in your programs.

00:30.360 --> 00:34.880
Now let's start with memory addressing, because this is one of the most essential concepts

00:35.160 --> 00:37.440
for understanding assembly programming.

00:37.720 --> 00:48.640
So imagine your computer's memory as a very long row of mailboxes, each with a number on

00:48.640 --> 00:48.960
it.

00:50.320 --> 00:53.960
Now these numbers are the memory addresses.

00:54.800 --> 01:03.860
Every single byte of data stored in random access memory is placed in one of these mailboxes, and its

01:03.900 --> 01:06.260
address tells the CPU

01:08.340 --> 01:09.780
where to find it.

01:10.220 --> 01:16.700
So when a program is run, the operating system loads the program's executable file into memory.

01:16.740 --> 01:21.340
Now this is done by mapping the file into specific regions of memory.

01:22.020 --> 01:31.460
So for example, this part of the memory is for notes.txt.

01:32.100 --> 01:40.980
And, for example, the other part is for virus.exe, right?

01:41.380 --> 01:46.180
This is basically how memory works in a nutshell.

01:47.020 --> 01:59.000
So in memory we basically have the code section, the data section and the BSS section.

01:59.400 --> 02:09.840
So the code section holds the machine instructions, the ones that EIP or RIP points to, and so on and so forth.

02:11.680 --> 02:17.200
The data section contains global and static variables that are initialized.

02:18.960 --> 02:26.720
The BSS section contains uninitialized global and static variables.

02:26.720 --> 02:31.200
Basically the data section, but uninitialized.

02:32.920 --> 02:39.680
Now each of these segments starts at a known memory address. When the CPU executes instructions

02:39.680 --> 02:47.240
in the code section, these instructions may reference data stored in the data section by

02:47.280 --> 02:52.560
directly using its memory address.

02:53.000 --> 02:58.940
Sometimes the data itself contains more addresses, for example the memory address here.

02:58.940 --> 03:04.180
Here, for example, is the part where we store the memory addresses.

03:05.980 --> 03:10.580
So this memory address tells us that this data is here.

03:10.740 --> 03:15.460
Or this memory address redirects to another location in memory.

03:15.500 --> 03:15.900
Right?

03:15.940 --> 03:20.340
So you will understand this more deeply as we learn more about it.

03:20.740 --> 03:21.980
But yeah.

03:22.860 --> 03:28.700
This is basically what pointers do in C. If you know C,

03:28.940 --> 03:32.060
you will be very familiar with this concept.

03:32.580 --> 03:37.940
So basically, the data itself sometimes contains more addresses.

03:37.980 --> 03:40.380
Take pointers, for example.

03:40.420 --> 03:53.620
A pointer is simply a value that holds another memory address, like this, allowing for indirect access

03:53.720 --> 03:55.400
to other data.
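
NOTE
The pointer idea above can be sketched in a few lines of Python. This is a toy
model, not how the lecture's diagram is implemented: memory is assumed to be a
bytearray, an address is an index into it, and the names DATA_ADDR,
POINTER_ADDR and load_indirect are hypothetical.
```python
# Toy model (an assumption for illustration): memory is a bytearray and
# an address is just an index into it.
memory = bytearray(16)
DATA_ADDR = 0x08      # hypothetical address where the data byte lives
POINTER_ADDR = 0x00   # hypothetical address where the pointer lives
memory[DATA_ADDR] = 0x41          # the data itself
memory[POINTER_ADDR] = DATA_ADDR  # a one-byte "pointer" holding that address
def load_indirect(mem, pointer_addr):
    """Read the address stored at pointer_addr, then read the byte it points to."""
    target = mem[pointer_addr]
    return mem[target]
value = load_indirect(memory, POINTER_ADDR)
print(hex(value))
```
Reading through the pointer takes two steps: first fetch the stored address,
then fetch the data at that address.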

03:55.800 --> 04:03.040
Now, this structure is vital when reverse engineering because it helps us understand how a program

04:03.040 --> 04:08.840
is organized in memory and how different parts of it interact through addresses.

04:12.680 --> 04:19.760
Now let's talk about endianness, which is how multi-byte data is laid out in memory.

04:20.280 --> 04:35.480
So when you store a four-byte value, which we call a DWORD, in uppercase, it has to be broken

04:35.720 --> 04:42.840
down into four separate bytes and placed into memory.

04:43.200 --> 04:44.560
But in what order?

04:44.640 --> 04:45.640
So we have to.

04:45.960 --> 04:47.080
One byte.

04:48.040 --> 04:49.200
One byte.

04:49.760 --> 04:51.600
One byte, one byte.

04:51.600 --> 04:52.660
So this.

04:53.420 --> 04:54.020
Yeah.

04:54.060 --> 04:54.620
But in what

04:54.660 --> 04:55.340
order?

04:55.900 --> 04:59.100
So there are two main schemes here.

04:59.300 --> 05:14.380
Endianness separates into two main categories: little endian and big endian.

05:17.500 --> 05:17.980
Yes.

05:17.980 --> 05:21.940
Basically, let's start with understanding what little endian does.

05:22.060 --> 05:33.780
This format is used by x86 processors, and therefore by most Windows programs.

05:33.780 --> 05:43.700
Now in little endian format, the least significant byte, the LSB, is stored first.

05:44.660 --> 05:50.280
So let's say you want to store a DWORD value that begins with

05:50.400 --> 05:50.760
0xDD.

05:51.000 --> 05:52.160
This is hex, by the way.

05:52.640 --> 06:06.200
Since we are writing the 0x here, we are saying that it is hexadecimal: 0xDDCCBBAA. In memory,

06:06.560 --> 06:07.480
in little endian,

06:07.480 --> 06:08.880
it will be stored like this.

06:09.080 --> 06:26.080
The first byte is going to be AA, at address 0x0000. At 0x0001 it will be BB. At 0x0002 it will be

06:26.640 --> 06:39.080
CC, and at the memory address 0x0003 it will be DD.

06:40.320 --> 06:50.220
So if you read from address 0x0000 as a DWORD, you get back this hexadecimal value, 0xDDCCBBAA, even though

06:50.220 --> 06:55.820
the bytes are stored in reverse order. But in big endian,

06:55.860 --> 07:08.620
what you will see is that if you store the same bytes again, they are laid out in the

07:08.620 --> 07:09.700
opposite order: DD first, then CC, BB and AA.
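
NOTE
The two layouts of 0xDDCCBBAA can be checked with Python's built-in
int.to_bytes. This is a minimal sketch; the helper name dump and the
four-digit offsets starting at 0x0000 are assumptions chosen to match the
addresses used in the lecture.
```python
value = 0xDDCCBBAA
little = value.to_bytes(4, byteorder="little")  # least significant byte first
big = value.to_bytes(4, byteorder="big")        # most significant byte first
def dump(label, data):
    """Return one line per byte: label, offset and hex value."""
    return [f"{label} 0x{offset:04X}: {byte:02X}" for offset, byte in enumerate(data)]
for line in dump("little", little) + dump("big", big):
    print(line)
# Reading the four bytes back with the matching byte order recovers the value.
assert int.from_bytes(little, byteorder="little") == value
assert int.from_bytes(big, byteorder="big") == value
```
In the little endian dump, AA sits at offset 0x0000 and DD at 0x0003. In the
big endian dump the order is reversed.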

07:11.460 --> 07:17.180
This format is often used in network protocols

07:17.180 --> 07:25.300
and streaming data, because it's more intuitive when reading a sequence of bytes.

07:27.300 --> 07:37.500
But basically, big endian starts with the most significant byte.

07:37.580 --> 07:39.100
So it goes from

07:41.260 --> 07:47.750
most significant to least significant byte. In little endian, we start with the LSB, which is

07:47.790 --> 07:50.470
the least significant byte.

07:52.430 --> 07:54.550
Now big endian is what you see is what you get.

07:54.590 --> 07:58.470
That format is often used in network protocols.

07:58.910 --> 08:09.190
But yeah, little endian is used by x86 processors, which is basically what most Windows machines use.

08:09.510 --> 08:14.110
Now you may ask here why little endian matters in x86.

08:14.590 --> 08:18.630
So the little endian format has some unique advantages.

08:18.670 --> 08:31.470
Now let's consider this memory layout, where the first byte is AA.

08:33.710 --> 08:39.870
So now if you read from this 0x0000.

08:40.710 --> 08:41.190
Oops.

08:41.350 --> 08:42.150
Not like this.

08:42.790 --> 08:45.090
It should be zero.

08:45.250 --> 08:46.490
And here it will be

08:46.490 --> 08:48.450
one, two, three.

08:49.170 --> 08:50.050
Sorry for this.

08:50.450 --> 08:56.930
Now, say you read from this address, 0x0000, which holds the value AA.

08:58.650 --> 09:08.850
If you read this as a byte, what you will

09:08.850 --> 09:15.090
get is 0xAA from here.

09:15.130 --> 09:15.490
Right.

09:16.290 --> 09:23.530
But if you read this as a word, you will get

09:24.610 --> 09:29.930
0x00AA.

09:31.090 --> 09:39.330
But if you read this as a DWORD, what you will get is

09:42.350 --> 09:48.790
0x000000AA.

09:50.030 --> 09:51.710
Now you can notice something here.

09:51.750 --> 10:01.350
The value is always interpreted starting from the same address, and it stays the same even when the size of the read

10:01.350 --> 10:02.030
changes.

10:02.870 --> 10:08.230
Now this consistent behavior simplifies memory access logic in many programs.

10:08.270 --> 10:16.510
So in contrast, if the same bytes were stored in big endian format, reading them in different

10:16.510 --> 10:19.550
sizes would give different results.

10:20.950 --> 10:22.550
Now this can.

10:22.590 --> 10:23.950
But here.

10:23.990 --> 10:24.430
Yeah.

10:24.470 --> 10:29.190
Here, for the byte, you will get the same 0xAA, because it is just one byte.

10:29.870 --> 10:39.750
But if you read a word from here in big endian format, you will get 0xAA00.

10:39.790 --> 10:44.850
So you remember, one is the least significant byte and the other is the most significant byte.

10:44.970 --> 10:46.770
The same applies here.

10:47.570 --> 10:48.450
Yeah.

10:49.050 --> 10:50.130
In the DWORD.

10:51.210 --> 10:52.530
We get the same here.

10:54.050 --> 10:57.850
Just changing the A's wouldn't change anything here.

10:58.290 --> 11:05.690
But in the DWORD here, again, it will change to 0xAA000000.

11:06.330 --> 11:10.810
That is the main difference:

11:10.810 --> 11:17.770
little endian uses the least significant byte as a starting point, and big endian uses the most significant

11:17.810 --> 11:18.450
byte.
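
NOTE
The byte, word and DWORD reads discussed above can be reproduced with
int.from_bytes. This is a sketch under one assumption that the lecture only
implies: the three bytes after AA are zero. The helper name read is
hypothetical.
```python
memory = bytes([0xAA, 0x00, 0x00, 0x00])  # AA at address 0, zeros after it (assumed)
def read(mem, address, size, byteorder):
    """Interpret size bytes starting at address as an unsigned integer."""
    return int.from_bytes(mem[address:address + size], byteorder)
for size in (1, 2, 4):
    le = read(memory, 0, size, "little")
    be = read(memory, 0, size, "big")
    print(f"{size} byte(s): little=0x{le:X} big=0x{be:X}")
```
The little endian result is 0xAA at every width, while the big endian result
grows from 0xAA to 0xAA00 to 0xAA000000 as the read gets wider.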

11:19.130 --> 11:19.530
That's it.

11:19.530 --> 11:25.610
That's how endianness and memory addressing work

11:25.610 --> 11:32.810
in computing. It doesn't matter if it's x86 assembly or any other programming language

11:33.130 --> 11:36.570
or assembly language.

11:37.370 --> 11:43.270
That's basically how the CPU, the central processing unit, or computer processing unit, as

11:43.310 --> 11:49.310
some call it, functions. And yeah, that's it for our lecture.

11:49.430 --> 11:55.710
Thank you for watching. Understanding memory addresses and how data is laid out, that is, endianness,

11:56.110 --> 12:01.590
is crucial for reverse engineering and low-level programming. In the x86 world,

12:01.590 --> 12:03.670
little endian is the standard.

12:04.070 --> 12:11.830
So, for the most part, it doesn't matter if it's Linux or Windows, you will get

12:11.830 --> 12:21.990
the least significant byte first in most cases, especially in the x86 world.

12:23.310 --> 12:29.990
But it is also used in some microprocessor units, which will be explained later.

12:30.230 --> 12:38.770
But yeah, the main difference is that little endian puts the LSB, the least significant

12:38.770 --> 12:44.210
byte, first, and big endian puts the most significant byte first.

12:46.450 --> 12:53.090
And yeah, this also affects how we interpret data from memory and how instructions

12:53.090 --> 12:55.810
interact with it, which is very important.

12:56.010 --> 13:01.210
So if you read from the left, it will have a different meaning.

13:01.210 --> 13:09.730
If we don't know whether this is big endian or little endian and we read it from the right,

13:09.730 --> 13:16.770
it will have a completely different value or meaning, for the instructions or for the values.

13:16.810 --> 13:18.810
It doesn't matter which.

13:18.850 --> 13:26.490
Next time you load a program into a debugger or a disassembler, remember, behind every value there's

13:26.490 --> 13:31.010
a memory address that we explained in our previous diagram.

13:31.650 --> 13:35.730
And how you read it depends on the endian format.
