WEBVTT

00:00.540 --> 00:02.220
Hello, my name is Dave Boone.

00:02.250 --> 00:09.090
Let's delve into the intriguing realm of symbols and stripped binaries. In high-level source code, such

00:09.090 --> 00:10.200
as C code,

00:10.230 --> 00:17.550
we encounter functions and variables with meaningful, human-readable names. During the compilation process,

00:17.550 --> 00:26.580
compilers generate symbols, which serve as references that keep track of these symbolic names.

00:26.580 --> 00:33.960
So symbols record the correspondence between the binary's code and data and each symbol's meaning.

00:33.990 --> 00:41.580
For instance, function symbols provide vital information by mapping high-level function names to their

00:41.580 --> 00:43.880
respective addresses and sizes.

00:43.890 --> 00:51.960
This information proves invaluable to the linker when combining object files and resolving function and

00:51.960 --> 00:57.210
variable references between modules, and it also aids in the debugging process.

00:57.660 --> 00:59.250
And here.

01:00.440 --> 01:09.230
To give you an idea of what symbolic information looks like, we're going to run readelf

01:09.260 --> 01:10.050
--syms

01:10.100 --> 01:12.020
a.out here.

01:12.260 --> 01:17.310
And this is how symbolic information looks.

01:17.330 --> 01:29.630
So here we used the readelf tool to display the symbols. We will return to this readelf

01:29.750 --> 01:33.800
utility in later lectures and interpret all of its output.

01:33.830 --> 01:42.080
For now, just keep in mind that among many unfamiliar symbols, there is a symbol for the main function

01:42.080 --> 01:42.980
here.

01:43.340 --> 01:45.680
It should be somewhere.

01:46.310 --> 01:48.710
The main symbol, for the main function.

01:56.120 --> 02:02.360
My FC abs and it should be 32 object.

02:02.360 --> 02:07.880
And here, the type has to be FUNC, for function.

02:09.770 --> 02:11.660
It should be global.

02:13.230 --> 02:14.580
Function global.

02:15.370 --> 02:16.540
Function global.

02:16.900 --> 02:17.650
Here.

02:18.510 --> 02:33.570
And here, yes: the size is 37, the type is FUNC, the binding is GLOBAL, the visibility is DEFAULT, and the index is 15.

02:33.570 --> 02:36.780
Here the name is main.

02:37.200 --> 02:42.990
So here you can see that the symbol specifies this address,

02:45.530 --> 02:50.960
the address at which main will reside when the binary is loaded into memory.

02:50.960 --> 02:54.700
The output also shows the size of main.

02:54.710 --> 03:02.450
In this case it's 37 bytes, and the type column indicates that you are dealing with a function symbol.

03:02.450 --> 03:10.070
And here we have the type FUNC. Symbolic information can be emitted as part of the

03:10.070 --> 03:16.100
binary itself, as you witnessed earlier, or it can be generated separately in the form of a symbol

03:16.100 --> 03:16.640
file.

03:16.640 --> 03:23.360
Symbolic information comes in various flavors, ranging from the basic symbols required by the linker

03:23.360 --> 03:26.000
to more extensive debugging symbols.

03:26.000 --> 03:32.390
Debugging symbols provide a comprehensive mapping between source code lines and the corresponding

03:32.390 --> 03:39.170
machine-level instructions. They go beyond simple address mappings and even describe function parameters,

03:39.170 --> 03:41.240
stack frame information, and more.

03:41.990 --> 03:52.490
For ELF binaries, debugging symbols are typically generated in the DWARF format, while

03:52.520 --> 03:59.870
PE binaries, such as Windows binaries, usually use the proprietary Microsoft Program Database (PDB) format.

03:59.870 --> 04:08.150
DWARF information is usually embedded within the binary, while the PDB comes in the form of a separate

04:08.150 --> 04:11.000
symbol file.

04:12.240 --> 04:16.170
And symbolic information plays a crucial role in binary analysis.

04:16.170 --> 04:21.930
So, for example, having a well-defined set of function symbols greatly simplifies the disassembly

04:21.930 --> 04:22.470
process.

04:22.470 --> 04:28.590
Each function symbol serves as a starting point, ensuring accurate disassembly and minimizing the

04:28.590 --> 04:36.060
risk of mistakenly disassembling data as code, which would result in incorrect instructions in

04:36.060 --> 04:37.980
the disassembly output.

04:38.010 --> 04:43.500
Understanding which parts of a binary belong to specific functions, and knowing their names, facilitates

04:43.500 --> 04:46.830
comprehension for human reverse engineers.

04:46.830 --> 04:53.450
It allows them to compartmentalize the code and comprehend its purpose.

04:53.460 --> 04:59.850
Even basic linker symbols, though less extensive than debugging information, provide significant assistance

04:59.850 --> 05:08.040
in numerous binary analysis applications. To parse symbols, you can use tools like readelf,

05:08.040 --> 05:13.530
which we used here in this lecture, as I mentioned.

05:14.250 --> 05:18.840
Or you can also employ libraries such as libbfd.

05:19.510 --> 05:21.970
libbfd, I think.

05:21.970 --> 05:22.450
Yeah.

05:22.960 --> 05:32.410
libbfd is a separate library that we will use two sections later.

05:32.410 --> 05:41.260
Additionally, there are specialized libraries like libdwarf, which is specifically designed

05:41.260 --> 05:44.260
for parsing DWARF debug symbols.

05:44.260 --> 05:52.420
However, in this course we will not extensively cover these libraries, but we will make

05:52.420 --> 05:54.400
a tutorial for this library.

05:54.400 --> 06:01.600
Regrettably, extensive debugging information is typically excluded from production-ready binaries.

06:01.600 --> 06:07.810
In fact, even basic symbolic information is often stripped away to reduce file sizes and hinder reverse

06:07.810 --> 06:11.950
engineering, particularly in the case of malware or proprietary software.

06:11.980 --> 06:18.460
This means that as a binary analyst, you frequently encounter the more challenging scenario of stripped

06:18.460 --> 06:22.940
binaries devoid of any symbolic information.

06:22.940 --> 06:29.960
So throughout this course, I take into consideration the absence of symbolic information and focus

06:29.960 --> 06:34.460
on analyzing stripped binaries, except where explicitly noted otherwise.

06:34.460 --> 06:40.730
So this approach prepares you for real world scenarios where you may have to tackle the complexities

06:40.730 --> 06:44.060
of reverse engineering without the aid of symbols.

06:44.060 --> 06:46.040
I'll see you in the next lecture.
