Embedded Firmware Extraction
A tale of practicing firmware extraction mixed with some file format reverse engineering.
Last updated
Disclaimer: I am not an embedded expert. I love dabbling with electronics and taking things apart. This post is about my experience and learnings of disassembling an IoT device and having a look at its firmware.
Resources on getting started with hardware hacking are abundant. Great introductions like this "Intro to Hardware Reversing" video from Tony Gambacorta or this blog post about "Dumping Firmware" from Black Hills Information Security will teach you everything you need to start yourself.
So, why this post? Connecting via the Universal Asynchronous Receiver and Transmitter (UART) protocol will not always drop you into a root shell and dumping firmware will not always work with the tools shown in tutorials. Dealing with these scenarios can be challenging and time-consuming. Thus, I decided to document some challenges (read: fails) and learnings from my attempt at analyzing an IoT camera.
I will not cover the basics of UART or do theory on embedded storage devices. I will, however, show you my steps, methodology, and findings.
When I was looking for devices to tinker with I did not put a huge effort into researching the model or brand first, which in hindsight I definitely recommend doing. Resources such as https://fccid.io or directly https://apps.fcc.gov/oetcf/eas/reports/GenericSearch.cfm (featured by Tony Gambacorta in his previously mentioned video) may help you select a device by giving you an idea of what's inside the product.
Basically, the Federal Communications Commission (FCC) of the United States assigns IDs for devices that meet certain regulatory standards for wireless communication. Applications include pictures of the internals and are publicly accessible. You can search for the device FCC ID using Google and then submit the FCC ID here.
Instead of doing this, I opted for a random Chinese outdoor camera by Virtavo.
First things first, of course the most fun part of any hardware project is taking things apart. Luckily, this wasn't a problem at all and I managed to disassemble every component without breaking things. On the inside we got:
Now, the attentive reader will have noticed the small pads right next to the RISC-V processor:
The pads are kindly named RX and TX revealing a UART interface that is connected directly to the Hi3861 chip. That was easy but it gets better. When we turn the board around, we see:
Another UART interface, presumably for the video processor. Right next to it we can also find a flash chip that we'll take a closer look at later on.
Two labelled UART interfaces with intact traces sound promising, so what do we do with them? If you are lucky enough to own one of these fancy Sensepeek sets (https://sensepeek.com/), then it's as simple as placing the probes on the pads. One of my colleagues actually demonstrated this and sure enough got a boot log within seconds.
However, soldering some small copper wire to the pads also works just fine. Once that was done, I used one of many available USB-to-UART bridges to hook up the RX and TX lines of one interface with my computer. Here, I used an AZDelivery CP2102.
The "AZDelivery CP2102 USB to TTL (Transistor-Transistor Logic) converter" is based on the CP2102 chip (common alternatives include the CH340 or FT232) and serves as an intermediary between the USB port of your computer and the UART interface of the embedded device. If you are on a Windows machine, you may need some additional drivers for the corresponding chipset. For example, in this case you would need the CP2102 drivers.
As always with UART: RX (receiver) must be connected to TX (sender) and vice versa. The final setup looked like this (looks wonky but it works):
Using a plain Linux distribution, connecting to the UART interface can easily be achieved with:
/dev/ttyUSB0 is the equivalent of the COM port on Windows. If you are unsure which device is the correct one, there are multiple ways to find out. One simple way is to run ls -la /dev/tty* before and after plugging in the USB-to-UART adapter and compare the two listings.
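The before/after comparison can also be scripted. The following is only a minimal sketch: the helper names are made up for illustration, and the device listings in the demo are hypothetical examples, not output from my machine.

```python
import glob


def snapshot(pattern="/dev/tty*"):
    """Return the set of device paths currently matching the pattern."""
    return set(glob.glob(pattern))


def new_devices(before, after):
    """Return the paths that appear in `after` but not in `before`, sorted."""
    return sorted(set(after) - set(before))


if __name__ == "__main__":
    # Hypothetical listings; in practice call snapshot() once before and
    # once after plugging in the USB-to-UART adapter.
    before = ["/dev/tty0", "/dev/tty1", "/dev/ttyS0"]
    after = ["/dev/tty0", "/dev/tty1", "/dev/ttyS0", "/dev/ttyUSB0"]
    print(new_devices(before, after))
```

Whatever shows up in the difference is the candidate serial device for the adapter.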
Being lucky one more time, I didn't even have to change the baud rate or other parameters - 115200 8N1 (default) worked fine. Connecting to the UART interface of the T31 chip and booting the device:
There's some version information, the camera sensor, and a bunch of log output. That's it - no bootloader screen, no shell. Interestingly, the output contains this line:
Now, I didn't bother to buy a premium account and connect this camera to the cloud but I could imagine the credentials showing up here if I did.
Regardless, apart from some log information this interface doesn't give me much. So let's switch to the other interface:
No shell here either. The output looks slightly more interesting but it's mostly just versions and configuration output. Note that this camera was set up normally once before being disassembled. During this setup, the camera had to be connected to a WLAN.
It was a fun surprise to see the stored SSID and password of my wireless lab access point being shown in cleartext without any further interaction.
While showing cleartext credentials on a serial console (especially on an outdoor camera) may not be a great idea, it's not what I am after. In order to understand how the device operates, I want access to some sort of shell or the filesystem.
Just because we didn't get a shell though, does not mean that the UART interface was completely useless. For example, there was some output that could be analyzed more in-depth, like this snippet from the Hi3861 chip starting to communicate with sg.ipc365.com.
Or several hints that the SD card slot was being checked for a bootloader and a filesystem, which could potentially be abused to boot from a "malicious" image.
But all that is out of scope for this post. Instead, I focused on getting the contents from that flash memory we saw in the beginning.
The XM25QH64C (note that the C from XMC is not actually part of the model number) is a flash memory chip with 8 MB of storage that can be interfaced via the serial peripheral interface (SPI) bus. Because connecting to that chip manually (i.e. using a Pi and SPI programming) would mean more effort, I decided to use another gadget for reading flash chips easily - the CH341A Flash Programmer.
The CH341A Flash Programmer plugs into your computer as a USB device and allows you to easily interface with different sorts of common memory chips. This way, we may be able to read or even write memory contents. Most offers include some adapters and a test clip for different sizes of chips, which I can definitely recommend. The CH341A works fine on Windows (AsProgrammer) and Linux (flashrom).
Although the testclip that often comes together with the CH341A can hook directly onto the chip while it's on the board, surrounding circuitry might prevent you from getting an actual reading. Thus, I usually desolder the chip and place it into the clamp like this:
Before plugging in the USB adapter, there are two things to watch out for:
Make sure that the chip is placed the right way (pin 1 of any IC is usually marked with a small dot) and that every pin has a solid connection to the adapter (as shown in the picture above).
Use the markers on the side of the CH341A board to confirm that you placed the adapter into the socket the right way:
Getting step 2 wrong will short the board and you will notice a distinct smell and heat coming from the board. After I tried this accidentally for educational purposes, I quickly disconnected the board, placed the adapter the right way, turned it back on, and it worked flawlessly. Note that this could potentially damage (i.e. burn) your board or the flash chip permanently.
Failing to do step 1, however, may have you running around in circles for hours. Using the Windows utility AsProgrammer.exe (there are several locations where you can download it from, choose your poison, here's a GitHub repository) I successfully read some flash contents:
Alternatively, you can use flashrom on a Linux system:
After saving the output to a bin file, I started to analyze it right away. But I soon realized that unpacking the firmware was not as straightforward as I had hoped. Running strings on the binary file resulted in some promising boot loader messages and configuration settings, but all that binwalk was able to carve were some empty directories of a squashfs filesystem.
At this point I tried several tools without success until I turned back to flashrom and AsProgrammer.exe to dump the flash contents again. To my surprise, the checksum was different this time, so I read the chip several more times until I was sure that I had actually read it correctly.
Take an extra minute to realign the chip in the adapter and read it multiple times if you're not sure that the connection is flawless. Verifying the checksum can save you a lot of time.
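Comparing checksums of several reads is easy to automate. This is a minimal sketch, not the exact procedure I used: the function names are my own and the file names in the usage comment are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dumps_agree(paths):
    """Return True if all given dump files have the same SHA-256 digest."""
    return len({sha256_of(path) for path in paths}) == 1


# Usage (hypothetical file names, one per read of the chip):
#   dumps_agree(["read1.bin", "read2.bin", "read3.bin"])
```

Matching digests only show that the reads are consistent with each other. A bad connection that fails the same way every time would still pass, so a quick look at the contents remains worthwhile.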
Finally, equipped with an exact image of the memory chip, I felt confident that extracting the filesystem should be easy now.
What stands out immediately are the unexpected descriptions and the many odd hexadecimal offsets. Of course odd offsets can occur, but when looking at a firmware image I usually expect at least some reasonable numbers that may align with common memory sizes.
And indeed, apart from results such as "Base64 standard index table" and "SHA256 hash constants" we can spot a few lines that look like they could be worth investigating:
The first line appears to be an intact uImage header, indicating a MIPS architecture based Linux-3.10.14-Archon.
uImage is a kernel image with a prepended U-Boot header that provides some information such as where the image will be loaded to. U-Boot is a boot loader for embedded boards that is closely related to Linux. MIPS is a Reduced Instruction Set Computer (RISC) Instruction Set Architecture (ISA). It sounds like a mouthful, but it simply describes the architecture that is compatible with the hardware implemented on the chip (the Hi3861 is a RISC-V processor while the T31 supports MIPS).
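To make the 64-byte header less abstract, here is a small sketch that parses the legacy U-Boot image header layout (big-endian fields, magic 0x27051956). The field names are my own labels and the header in the demo is synthetic, built only for illustration; it is not taken from the dump.

```python
import struct
from collections import namedtuple

UIMAGE_MAGIC = 0x27051956
# 7 x uint32, 4 x uint8, 32-byte name: 64 bytes in total, big-endian.
HEADER = struct.Struct(">7I4B32s")

UImageHeader = namedtuple(
    "UImageHeader",
    "magic header_crc timestamp data_size load_addr entry_point data_crc "
    "os_id arch_id image_type comp_id name",
)


def parse_uimage_header(blob, offset=0):
    """Parse a legacy uImage header found at `offset` inside `blob`."""
    fields = HEADER.unpack_from(blob, offset)
    if fields[0] != UIMAGE_MAGIC:
        raise ValueError("no uImage magic at offset 0x%x" % offset)
    name = fields[-1].split(b"\x00", 1)[0].decode("ascii", "replace")
    return UImageHeader(*fields[:-1], name)


if __name__ == "__main__":
    # Synthetic header; every value except the magic is made up.
    demo = HEADER.pack(UIMAGE_MAGIC, 0, 0, 0x1000, 0x80010000, 0x80010000,
                       0, 5, 5, 2, 3, b"Linux-3.10.14-Archon")
    print(parse_uimage_header(demo))
```

In U-Boot's numbering, an architecture ID of 5 stands for MIPS, which is how tools arrive at the description shown in the first line.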
SquashFS is a compressed read-only filesystem while JFFS2 is a compressed but writeable filesystem. Both are located beyond the 7 MiB mark (out of the 8 MiB flash memory), so this cannot be everything yet.
After using binwalk -e XM25QH64C.bin we can see that the SquashFS is almost empty:
The JFFS2 looks a bit more interesting but still does not include the root filesystem:
These look like log files and some configuration. The most interesting bit was probably:
But where's the root file system with all the interesting files, scripts, and binaries? At this point I was quite stuck for some time as I never had to deal with firmware extraction to this extent before.
Huge shoutout at this point to one of my colleagues and also @DigitalAndrew from the TCM (TheCyberMentor) discord for teaching me new stuff and enduring my endless questions. With their help and lots of trial and error I was finally able to make sense of the image.
Probably the best hint I received was to use entropy analysis to get a better understanding of the file structure. And that's easily done with binwalk -E:
While binwalk could be fine-tuned with -H and -L to identify the most relevant edges, the defaults result in a pretty interesting graph.
The entropy graph of binwalk can be read as follows:
Entropy (close to) 0: no chaos, very structured data and/or reoccurring patterns
Low entropy: structured data (clear text will usually show some form of patterns)
High entropy: unstructured data, a constant high entropy may indicate compression
Entropy (close to) 1: fully random (pure chaos), probably encrypted data
Look out for steep falling and rising edges as they may indicate a change in the file structure. With the rough position of these edges, you can use a hex editor to analyze the surrounding area of the binary file. Search for magic bytes, trailers, or other interesting bits and bytes.
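For intuition, the idea behind such a graph can be reproduced in a few lines. This is a simplified sketch and not binwalk's implementation: it computes the Shannon entropy per block, divides by 8 to land on a 0 to 1 scale, and reports offsets where the value crosses a threshold. The block size and threshold are arbitrary assumptions.

```python
import math
from collections import Counter


def shannon_entropy(block):
    """Shannon entropy of a bytes block, normalised to the range 0..1."""
    if not block:
        return 0.0
    total = len(block)
    bits = -sum((n / total) * math.log2(n / total)
                for n in Counter(block).values())
    return max(0.0, bits / 8.0)  # 8 bits per byte is the maximum


def entropy_profile(data, block_size=1024):
    """Return a list of (offset, entropy) tuples, one per block."""
    return [(offset, shannon_entropy(data[offset:offset + block_size]))
            for offset in range(0, len(data), block_size)]


def find_edges(profile, threshold=0.5):
    """Return offsets where entropy crosses the threshold between blocks."""
    edges = []
    for (_, previous), (offset, current) in zip(profile, profile[1:]):
        if (previous < threshold) != (current < threshold):
            edges.append(offset)
    return edges


if __name__ == "__main__":
    # Synthetic data: 2 KiB of zeroes followed by 2 KiB of varied bytes.
    data = bytes(2048) + bytes(range(256)) * 8
    print([hex(edge) for edge in find_edges(entropy_profile(data))])
```

The offsets it reports are only as precise as the block size, so they are a starting point for the hex editor rather than exact partition boundaries.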
Alternatively, we can use Detect-It-Easy (DIE) which also offers an entropy graph - amongst many other binary analysis things.
Note that the scale is different here, so 1 is not the highest value, but the same principles apply.
In order to fully make sense of the individual regions, it also helped to take a look at the strings output of the entire image. For example, the strings of the boot loader segment can contain valuable information about memory regions, sizes, and meanings.
Grepping for boot turned out to yield the most interesting results such as the bootargs and bootcmd lines. You can read more about these U-Boot variables here.
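If the usual command line tools are not at hand, the same search can be approximated in Python. This is a rough stand-in for strings piped into grep, not a reimplementation of either tool, and the sample bytes in the demo are invented for illustration.

```python
import re


def strings(data, min_len=4):
    """Return runs of printable ASCII of at least `min_len` characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def grep(data, needle, min_len=4):
    """Return the printable strings in `data` that contain `needle`."""
    return [text for text in strings(data, min_len) if needle in text]


if __name__ == "__main__":
    # Invented sample; a real dump would be read with open(path, "rb").read().
    sample = b"\x00\x01bootcmd=hypothetical\x00\xff\xfeabc\x00bootargs=example\x00"
    for line in grep(sample, "boot"):
        print(line)
```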
It took some time and experimenting to align the given information with the firmware image but, ultimately, this snippet from the CMDLconsole line was a perfect fit:
256 KiB: boot loader (256*1024 bytes)
352 KiB: "tag" (this region contains an "Encryption Key" and also the "CMDLconsole" line, but I am not entirely sure where that partition belongs - I assume it's part of the boot loader)
2 MiB: kernel image (including the 64-byte uImage header)
4512 KiB: root filesystem (that's what we're after)
256 KiB: "system"
Remember the SquashFS we saw ranging from 0x700000 to 0x740000? That's exactly 256 KiB, and the boot loader, "tag", kernel image, and rootfs together fill exactly 0x700000 bytes. So I am pretty confident that "system" is the named partition for the SquashFS we extracted earlier.
256 KiB + 512 KiB: config & log -> matches the JFFS2 that we extracted previously (0x740000 to 0x800000) and does indeed contain some configurations and logs (with the cleartext WPA2 passphrase again)
Let's take a look at the entropy graph one last time:
Start     End       Size      Content
0x000000  0x098000  0x098000  boot loader (incl. "tag")
0x098000  0x298000  0x200000  compressed kernel
0x298000  0x700000  0x468000  compressed rootfs
0x700000  0x740000  0x040000  squashfs ("system")
0x740000  0x800000  0x0c0000  JFFS2 ("config" & "log")
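The offsets follow mechanically from the partition sizes listed above, which makes for a quick sanity check. In this sketch the partition names are my own labels, and assigning 256 KiB to "config" and 512 KiB to "log" (in that order) is an assumption.

```python
KIB = 1024

# (name, size in KiB) in flash order, taken from the sizes listed above.
PARTITIONS = [
    ("boot", 256),
    ("tag", 352),
    ("kernel", 2048),
    ("rootfs", 4512),
    ("system", 256),
    ("config", 256),  # assumed split of the 256 KiB + 512 KiB pair
    ("log", 512),
]


def layout(partitions):
    """Return (name, start, end, size) tuples with byte offsets."""
    rows = []
    offset = 0
    for name, size_kib in partitions:
        size = size_kib * KIB
        rows.append((name, offset, offset + size, size))
        offset += size
    return rows


if __name__ == "__main__":
    for name, start, end, size in layout(PARTITIONS):
        print("%-7s 0x%06x  0x%06x  0x%06x" % (name, start, end, size))
```

The last partition ends at 0x800000, which is exactly the 8 MiB capacity of the flash chip, so no region is left unaccounted for.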
We know that binwalk's ruleset could not match the start of the root filesystem with any known type. But the screenshot above indicates a cpio archive (looking at the ASCII column), so there's at least some hope that this is not an encrypted blob but rather something that can be extracted.
First, I used dd to manually extract the rootfs partition from the raw image:
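For readers without dd at hand, the same carve can be sketched in Python. This is an equivalent of the idea rather than the command I ran; the function name is made up, and the offset and size in the usage comment are the rootfs values from the partition layout.

```python
def carve(image_path, out_path, offset, size):
    """Copy `size` bytes starting at `offset` from one file into another."""
    with open(image_path, "rb") as source:
        source.seek(offset)
        data = source.read(size)
    if len(data) != size:
        raise ValueError("image too short: wanted %d bytes, got %d"
                         % (size, len(data)))
    with open(out_path, "wb") as target:
        target.write(data)
    return len(data)


# Usage with the rootfs region of the dump:
#   carve("XM25QH64C.bin", "rootfs.bin", 0x298000, 0x468000)
```

Raising an error on a short read is a deliberate choice, since a truncated partition would only cause confusing failures further down the line.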
As expected, file rootfs.bin fails to identify a useful file type. Next up, I ran strings against it and the first result was "rootfs_camera.cpio" - we saw that string right at the beginning of the rootfs in the hex editor.
When I encounter such strings (including a name, version, manufacturer, library, whatever), it often pays off to do a quick Google search. Maybe someone has already reverse engineered this file format or maybe we find some documentation.
In this case, I found several blog posts describing extraction of rootfs_camera.cpio contents.
Alright, let's try that:
No wonder that didn't work - binwalk has a rule for LZOP compressed files here and would have recognized the file from the beginning if it were a regular one. So something is different from the guide.
At this point I knew I had to be close, so I opened the file in a hex editor once again and started looking for anything noticeable.
The first few bytes don't match any known magic bytes. More interestingly though, they look like a 32 bit value stored in little endian order. Now, this is something I have only ever seen in CTFs (Capture-the-Flag events), but just for fun I converted this number: 0x39C598 or 3786136 in decimal.
When I saw that value, I had a very good hunch of where this was going. Remember the size of the rootfs? That was 0x468000 or 4620288 in decimal. Remember the entropy graph and how it had this large blob of nothing at the end of the rootfs?
The actual size of the rootfs is exactly 0x39C598 bytes, the rest is padded with zeroes. So for some reason, the first four bytes of the rootfs were overwritten with its size.
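Both observations are easy to check programmatically. This is a minimal sketch with helper names of my own choosing, and the blob in the demo is synthetic rather than the real partition.

```python
import struct


def leading_u32_le(blob):
    """Interpret the first four bytes as an unsigned little-endian integer."""
    return struct.unpack_from("<I", blob, 0)[0]


def is_zero_padded(blob, payload_size):
    """Return True if every byte after `payload_size` is 0x00."""
    tail = blob[payload_size:]
    return tail.count(0) == len(tail)


if __name__ == "__main__":
    # Synthetic partition: 16 bytes of payload followed by 16 zero bytes.
    # The first four payload bytes hold the payload size, as in the dump.
    blob = struct.pack("<I", 16) + b"\x01" * 12 + bytes(16)
    size = leading_u32_le(blob)
    print(hex(size), is_zero_padded(blob, size))
```

On the real partition, the first call should yield 0x39C598 and the second should confirm that only zeroes follow that offset.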
So what happens when we simply restore the original bytes of an lzop file (89 4C 5A 4F)?
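The patch itself is tiny. Here is a sketch of one way to do it in Python; the function name is made up, and trimming the zero padding via payload_size is an optional extra that I am assuming is harmless, not something the fix requires.

```python
LZOP_MAGIC_PREFIX = bytes.fromhex("894c5a4f")  # first four bytes of an lzop file


def restore_lzop_magic(blob, payload_size=None):
    """Return a copy of `blob` with its first four bytes set to the lzop magic.

    If `payload_size` is given, everything after that offset is dropped.
    """
    if len(blob) < 4:
        raise ValueError("blob is shorter than the magic prefix")
    patched = LZOP_MAGIC_PREFIX + blob[4:]
    if payload_size is not None:
        patched = patched[:payload_size]
    return patched


# Usage on the carved partition:
#   data = open("rootfs.bin", "rb").read()
#   open("rootfs.lzo", "wb").write(restore_lzop_magic(data))
```

After that, the patched file can be handed to the regular lzop and cpio tools.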
Finally.
At last, the root filesystem is extracted and we could start with a static analysis of etc/init.d/rcS, for example.
As this post has already become longer than anticipated, I will keep this section short.
There is one final important lesson to note. Obviously, as you may have guessed from the layout of the board, the chip description of the T31, or even the log outputs - this flash is used to store the firmware for the T31. It does not contain a single piece of application logic running on the Hi3861.
So, mission failed?
While it's not exactly the thing I originally intended to do, I believe it turned out to be a valuable experience. Extracting the T31 firmware was a great practice and I hope that this post succeeds in saving other people some time or trouble with similar situations.
Lastly, dumping the firmware of the Hi3861 would probably include desoldering the board with the processor on it, reading the manual, identifying the right pins, and then connecting to the chip manually (for example via UART, or whatever this chip offers for programming).
Thanks for reading!