Reverse Engineering Tutorial 3 | EXE File Structure


Hey friends, today I will teach you the internal structure of an executable, commonly called an exe file. This article is important because it will clear up your concepts about the different sections of an executable file, and about which section holds the useful stuff when you reverse engineer or simply debug an application. From a high-level view an exe looks like a single file, but it actually consists of several parts, and a hacker must understand what these parts are and what each section is used for.

Exe Internal Sections

Whenever you debug an exe file, you might have noticed several strange-looking things appear, and most of the time you are not able to understand what they are, so you close the debugger. This will not happen after reading this article. Once you start exploring things that you understand, you will explore more, and your chances of success will go up. So friends, let's start learning the exe file structure.

The sections most commonly present in an executable (the exact names depend on the compiler and linker used to build it) are:  
  •  Executable Code Section, named .text (Microsoft) or CODE (Borland)  
  •  Data Sections, named .data, .rdata, or .bss (Microsoft) or DATA (Borland)  
  •  Resources Section, named .rsrc  
  •  Export Data Section, named .edata  
  •  Import Data Section, named .idata  
  •  Debug Information Section, named .debug  
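If you want to see which of these sections a particular file really has, here is a minimal Python sketch that reads the section table straight from the file bytes. The function name list_sections is my own, not part of any library, and the code assumes a well-formed PE file with no error recovery beyond the two signature checks.

```python
import struct

# One section header is 40 bytes: Name, VirtualSize, VirtualAddress,
# SizeOfRawData, PointerToRawData, then fields this sketch ignores.
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")


def list_sections(data: bytes):
    """Return (name, virtual_address, virtual_size, raw_offset, raw_size) tuples.

    Hypothetical helper for illustration; assumes `data` holds a whole PE file.
    """
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 20-byte COFF header follows the 4-byte signature.
    (num_sections,) = struct.unpack_from("<H", data, pe_offset + 6)
    (opt_size,) = struct.unpack_from("<H", data, pe_offset + 20)
    # The section table starts right after the optional header.
    table = pe_offset + 24 + opt_size
    sections = []
    for i in range(num_sections):
        fields = SECTION_HEADER.unpack_from(data, table + i * SECTION_HEADER.size)
        name, vsize, vaddr, raw_size, raw_off = fields[:5]
        sections.append(
            (name.rstrip(b"\0").decode("ascii", "replace"), vaddr, vsize, raw_off, raw_size)
        )
    return sections
```

To try it, read any exe with open(path, "rb").read() and pass the bytes to list_sections. Packed files often show unusual names here instead of .text and .data.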
Note: The data structures of a PE (Portable Executable) file on disk are the same as the ones used once it is loaded into memory, so if you can locate a piece of information in the file on disk you will be able to find it in the loaded image too.  
However, the file is not copied byte-for-byte into memory. The Windows loader decides which parts need to be mapped in and which can be omitted. Each section is mapped at its own virtual address, so an item's offset in the file usually differs from its offset in memory. Data that is not mapped in, e.g. debug information, is placed at the end of the file, past the parts that will be mapped in.  
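To make the disk versus memory idea concrete, here is a small Python sketch that converts an RVA (an address relative to where the image is loaded) into an offset in the file on disk. The function name and the tuple layout (name, virtual address, virtual size, raw offset, raw size) are assumptions of mine for illustration, and the lookup is simplified compared to what the real loader does.

```python
def rva_to_file_offset(rva, sections):
    """Translate an RVA into a file offset, or None if it has no bytes on disk.

    `sections` is a list of (name, virtual_address, virtual_size,
    raw_offset, raw_size) tuples, a hypothetical layout for this sketch.
    """
    for _name, vaddr, vsize, raw_off, raw_size in sections:
        # Use the larger of the two sizes as the extent of the section.
        span = max(vsize, raw_size)
        if vaddr <= rva < vaddr + span:
            delta = rva - vaddr
            if delta >= raw_size:
                # Exists only in memory, e.g. zero-filled uninitialized data.
                return None
            return raw_off + delta
    return None


example_sections = [
    (".text", 0x1000, 0x1F0, 0x400, 0x200),
    (".data", 0x2000, 0x800, 0x600, 0x200),
]
text_offset = rva_to_file_offset(0x1010, example_sections)  # 0x400 + 0x10
```

The same bytes sit at file offset 0x410 on disk and at RVA 0x1010 in memory, which is why a hex editor and a debugger show different addresses for the same instruction.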
Let's understand the detailed meaning of all sections:
1. Executable Code Section
In Windows, all code segments typically reside in a single section called .text (Microsoft) or CODE (Borland). Since Windows uses a page-based virtual memory management system, having one large code section is easier to manage for both the operating system and the application developer. This section normally also contains the entry point (EP) and the jump thunk table (where present), which points to the IAT.
Note: 
a. EP: the entry point is the address where execution of the program begins. In a normal exe it lies inside the code section; in a packed or obfuscated exe it often points somewhere else, for example into the unpacking stub.
b. Jump thunk table: a table of small jump stubs, one per imported function, each of which jumps through the matching IAT entry.
c. IAT: this stands for Import Address Table, a table of function pointers filled in by the Windows loader as the DLLs are loaded. I will post a complete tutorial on the Import Address Table because it is a very important concept. For now, just think of it as a table containing function pointers.
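A quick way to check where the entry point of a file really lies is to read the AddressOfEntryPoint field from the optional header and see which section contains it. The sketch below is only an illustration: find_entry_point is a name I made up, and it assumes a well-formed PE file.

```python
import struct


def find_entry_point(data: bytes):
    """Return (entry_point_rva, section_name or None) for a PE file image.

    Hypothetical helper for illustration, not a complete PE parser.
    """
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    (num_sections,) = struct.unpack_from("<H", data, pe_offset + 6)
    (opt_size,) = struct.unpack_from("<H", data, pe_offset + 20)
    opt_start = pe_offset + 24
    if opt_size < 20:
        raise ValueError("optional header too small to hold an entry point")
    # AddressOfEntryPoint sits 16 bytes into the optional header
    # in both the 32-bit and the 64-bit variant.
    (ep_rva,) = struct.unpack_from("<I", data, opt_start + 16)
    table = opt_start + opt_size
    for i in range(num_sections):
        name, vsize, vaddr, raw_size = struct.unpack_from("<8sIII", data, table + 40 * i)
        if vaddr <= ep_rva < vaddr + max(vsize, raw_size):
            return ep_rva, name.rstrip(b"\0").decode("ascii", "replace")
    return ep_rva, None
```

If the returned section name is not the code section, that is a common hint that the file is packed or otherwise modified, though it is not proof on its own.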

2. Data Section
The .bss section represents uninitialized data for the application, including all variables declared as static within a function or source module.
The .rdata section represents read-only data, such as literal strings, constants, and debug directory information.
All other variables (except automatic variables, which appear on the stack) are stored in the .data section. These are application or module global variables.

3. Resource Section
The .rsrc section contains resource information for a module, such as icons, dialogs, menus and strings. Many resource editors are available today that allow editing, adding, deleting, replacing and copying resources.

4. Export Data Section
The  .edata section contains the Export Directory for an application or DLL.
When present, this section contains information about the names and addresses of exported functions.

5. Import Data Section
The .idata section contains various information about imported functions, including the Import Directory and the Import Address Table. The import section contains information about all the functions imported by the executable from DLLs. This information is stored in several data structures. The most important of these are the Import Directory and the Import Address Table, which we will discuss next. The Windows loader is responsible for loading all of the DLLs that the application uses and mapping them into the process address space. It has to find the addresses of all the imported functions in their various DLLs and make them available to the executable being loaded.
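To give a feel for what the Import Directory looks like, here is a rough Python sketch that walks an array of import descriptors. Each descriptor is 20 bytes (five 32-bit fields) and describes one DLL; the array ends with an all-zero descriptor. The function name and the dictionary keys are my own labels, and the sketch assumes you have already located the directory bytes.

```python
import struct

# One import descriptor: OriginalFirstThunk, TimeDateStamp,
# ForwarderChain, Name, FirstThunk (all 32-bit little-endian).
IMPORT_DESCRIPTOR = struct.Struct("<IIIII")


def parse_import_descriptors(blob: bytes, start: int = 0):
    """Return one dict per imported DLL found in the descriptor array.

    Hypothetical helper; `blob` is assumed to contain the Import
    Directory starting at offset `start`.
    """
    descriptors = []
    offset = start
    while offset + IMPORT_DESCRIPTOR.size <= len(blob):
        fields = IMPORT_DESCRIPTOR.unpack_from(blob, offset)
        if not any(fields):
            break  # the all-zero descriptor terminates the array
        lookup_rva, _timestamp, _forwarder, name_rva, iat_rva = fields
        descriptors.append(
            {
                "lookup_table_rva": lookup_rva,  # names or ordinals to import
                "dll_name_rva": name_rva,        # RVA of the DLL name string
                "iat_rva": iat_rva,              # where the loader writes pointers
            }
        )
        offset += IMPORT_DESCRIPTOR.size
    return descriptors
```

All three values are RVAs, so to read the DLL name or the function names from the file on disk they still have to be converted to file offsets first.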

6. Debug Information Section
Debug information is initially placed in the .debug section. The PE file format also supports separate debug files (normally identified by a .DBG extension) as a means of collecting debug information in a central location. The debug section contains the debug information itself, but the debug directories live in the .rdata section mentioned earlier. Each of those directories references debug information in the .debug section.

7. Base Relocation Section 
Last but not least, and one of the most important sections from a hacker's perspective. When the linker creates an EXE file, it makes an assumption about where the file will be mapped into memory. Based on this, the linker puts the real addresses of code and data items into the executable file. If for whatever reason the executable ends up being loaded somewhere else in the virtual address space, the addresses the linker plugged into the image are wrong. The information stored in the .reloc section allows the PE loader to fix these addresses in the loaded image so that they are correct again. On the other hand, if the loader was able to load the file at the base address assumed by the linker, the .reloc section data isn't needed and is ignored.
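The fix-up itself is simple arithmetic: the loader computes the difference between the actual and the assumed base address and adds it to every address listed in .reloc. Here is a simplified Python sketch of that idea. The function name is made up, the image is assumed to be a bytearray laid out as in memory (index 0 is the image base), and only the two common entry types are handled.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0  # padding entry, nothing to patch
IMAGE_REL_BASED_HIGHLOW = 3   # patch a 32-bit address
IMAGE_REL_BASED_DIR64 = 10    # patch a 64-bit address


def apply_relocations(image: bytearray, reloc: bytes, preferred_base: int, actual_base: int) -> int:
    """Patch `image` in place and return the number of addresses fixed.

    Simplified sketch, not the real loader: other relocation types are skipped.
    """
    delta = actual_base - preferred_base
    patched = 0
    pos = 0
    while pos + 8 <= len(reloc):
        # Each block covers one 4 KB page: page RVA, then total block size.
        page_rva, block_size = struct.unpack_from("<II", reloc, pos)
        if block_size < 8:
            break
        for i in range((block_size - 8) // 2):
            (entry,) = struct.unpack_from("<H", reloc, pos + 8 + 2 * i)
            kind, offset = entry >> 12, entry & 0x0FFF
            target = page_rva + offset
            if kind == IMAGE_REL_BASED_HIGHLOW:
                (value,) = struct.unpack_from("<I", image, target)
                struct.pack_into("<I", image, target, (value + delta) & 0xFFFFFFFF)
                patched += 1
            elif kind == IMAGE_REL_BASED_DIR64:
                (value,) = struct.unpack_from("<Q", image, target)
                struct.pack_into("<Q", image, target, (value + delta) & 0xFFFFFFFFFFFFFFFF)
                patched += 1
        pos += block_size
    return patched
```

For example, if the linker assumed base 0x400000 and stored the address 0x401234, but the image ends up at 0x500000, the patched value becomes 0x501234. When the two bases are equal, delta is zero and nothing changes, which matches the case where the .reloc data is ignored.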

We will continue our reverse code engineering tutorials in future classes too. So stay connected and keep reading our articles.
If you have any doubts, ask me in the comments.
