Packers, How They Work, Featuring UPX
A security researcher demonstrates how to get packers, specifically UPX, up and running on our system. Read on to learn more!
This article is featured in the new DZone Guide to Proactive Security.
UPX, the Ultimate Packer for Executables, is a software packer. A really good one, in fact, and it’s currently under active development to boot. The source code is available, it runs on a variety of operating systems, and it compresses executables for a variety of operating systems too.
As the name suggests, UPX packs executables. And by “pack,” I mean it compresses and compartmentalizes programs. It will take an executable, compress it, and pack the compressed code into another section of the executable. At runtime, it will uncompress the previously compressed code and execute it.
Along the way, it happens to obfuscate the intent and actual code of the program.
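To make the compress-then-unpack idea concrete, here is a minimal toy sketch in Python. It is not how UPX is implemented: zlib stands in for UPX's own compression, and the TOYPACK! marker, the layout, and the function names are all made up for illustration.

```python
import zlib

# Hypothetical marker for this toy format; real UPX uses its own layout.
MARKER = b"TOYPACK!"


def pack(program: bytes) -> bytes:
    """Compress the program and prepend a marker plus the original length."""
    compressed = zlib.compress(program, 9)
    return MARKER + len(program).to_bytes(4, "little") + compressed


def unpack(packed: bytes) -> bytes:
    """Play the role of the runtime stub: recover the original bytes in memory."""
    if not packed.startswith(MARKER):
        raise ValueError("not a toy-packed blob")
    header_end = len(MARKER) + 4
    size = int.from_bytes(packed[len(MARKER):header_end], "little")
    program = zlib.decompress(packed[header_end:])
    if len(program) != size:
        raise ValueError("length mismatch after decompression")
    return program


if __name__ == "__main__":
    # Stand-in for an executable: repetitive bytes with readable strings.
    original = b"getenv ioctl printf usage: ls [-l] [file ...]\n" * 40
    packed = pack(original)
    print(len(original), "bytes before,", len(packed), "bytes after")
    print("round trip ok:", unpack(packed) == original)
    print("'getenv' visible in packed form:", b"getenv" in packed)
```

The last line of the demo is the obfuscation side effect in miniature: once the bytes are compressed, you generally can't expect to find the readable strings of the original in the packed form.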
Don’t get me wrong here—UPX is a remarkable feat of engineering, and I don’t begrudge the use of the program. It has real applications in areas where storage space or bandwidth is at a premium. And honestly, it’s not hard to detect or to work around when used, and most anti-malware solutions don’t have a problem detecting it today. But the idea behind packers in general, and UPX specifically, is very useful to malware authors.
To show how these tools are used to hide malware payloads, I’m going to show you how packers work, using UPX, and how you can detect them. Now, I’m using MacOS, but you can follow along on Linux if you’d like. Most of the points I make will carry over. UPX is available on Windows as well.
Installing UPX. UPX is a breeze to install. You can compile from source if you’d like, but I didn’t; I used Homebrew. You could use MacPorts too, or if you’re on Linux, it’s likely available via the package manager of your choice. To install via Homebrew, typing brew install upx on the command line should work just fine.
Creating a test program. Now that you have UPX installed, we’re going to create two different versions of /bin/ls: one packed, one not. Then we’ll start to compare the two. I suggest you create a directory you can work in. I called mine upx-test, but you go with whatever name works for you. Copy /bin/ls into the directory.
Now, let’s create a packed version: upx ls -o ls-upx should do the trick. Now, just so we are able to maintain both versions, copy ls to ls-orig. Just for fun, compare the output from running ./ls-orig to ./ls-upx. Notice they look exactly the same. Next, compare the hashes of ./ls, ./ls-orig, and ./ls-upx. You’ll notice that ./ls and ./ls-orig have the same hash, but that the hash of ./ls-upx is different, and that it does not change even after executing ./ls-upx. This is interesting in that it implies that the UPX-packed executable extracts the packed code, runs it, and terminates its process without serializing anything to disk. The program representation on disk never changes.
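If you’d rather script the hash comparison than run a hashing tool by hand, a small Python sketch like the one below would do. SHA-256 is my choice here (the comparison works with any cryptographic hash), and the function names are just illustrative.

```python
import hashlib
import sys


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


def file_sha256(path: str) -> str:
    """Hash a file in chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Pass the files to compare on the command line.
    for path in sys.argv[1:]:
        print(file_sha256(path), path)
```

Run it with ls, ls-orig, and ls-upx as arguments, once before and once after executing ./ls-upx, and compare the digests it prints.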
Looking Through the Binaries
See upx.github.io for more information, source code, and documentation.
Now, let’s start to look through the binaries. We’ll use Jonathan Levin’s fabulous JTool for this. Sorry, there’s not an equivalent on Linux, but later I’ll use Hopper for some reverse engineering. You can see similar things via the Hopper interface, and Hopper runs on both Linux and MacOS.
Anyway, if you’re on a Mac, install JTool (brew install jtool should work, and I believe MacPorts has it too). If you’re on Linux, you’ll need to use GNU binutils for this.
Anyhow, let’s take a look at the program formats, shall we?
Figure 1: Segments and sections of the original ls command
In figure 1, we see the output from jtool -l over the original ls executable. This has a large amount of information, including the program entry point and OS versions. The text segment is fairly large as well and contains constant values, strings, and other information. Now, let’s take a look at the version we just packed:
Figure 2: Segments and sections from a UPX-packed ls command
Well, it looks like we have a little less information here, and most of the program consists of the text segment too. So what are these sections?
Well, LC_SEGMENT_64 is a standard 64-bit segment to be loaded into an address space (this is MacOS specific). LC_VERSION_MIN_MACOSX lets us know the minimum OS version for this executable, and LC_UNIXTHREAD gives us information about how the program is to be started, including the program entry point. __TEXT usually contains program code, while __LINKEDIT contains information used by the linker: things like strings, function names, and so on.
__PAGEZERO is a chunk of memory allocated and protected to throw exceptions on access. This helps protect the program against null pointer or integer dereference errors in code.
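If you don’t have JTool handy, the load commands described above can be listed with a few lines of Python. This is only a rough sketch, assuming a thin, little-endian, 64-bit Mach-O image. A universal (fat) binary would need its wrapper header parsed first, and the function name is my own invention.

```python
import struct
import sys

MH_MAGIC_64 = 0xFEEDFACF  # magic number of a 64-bit Mach-O header
LC_SEGMENT_64 = 0x19

# Only the load commands discussed in the article; others print as hex.
LC_NAMES = {
    0x19: "LC_SEGMENT_64",
    0x5: "LC_UNIXTHREAD",
    0x24: "LC_VERSION_MIN_MACOSX",
}


def list_load_commands(image: bytes) -> list:
    """Walk the load commands that follow the 32-byte mach_header_64."""
    magic, _cpu, _subtype, _filetype, ncmds, _sizeofcmds, _flags, _reserved = (
        struct.unpack_from("<8I", image, 0)
    )
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin little-endian 64-bit Mach-O image")
    offset = 32
    found = []
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<2I", image, offset)
        if cmdsize < 8:
            raise ValueError("corrupt load command size")
        name = LC_NAMES.get(cmd, hex(cmd))
        if cmd == LC_SEGMENT_64:
            # The 16-byte, NUL-padded segment name follows cmd and cmdsize.
            segname = image[offset + 8:offset + 24].rstrip(b"\0")
            name += " " + segname.decode("ascii", "replace")
        found.append(name)
        offset += cmdsize
    return found


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as handle:
        for entry in list_load_commands(handle.read()):
            print(entry)
```

Pointed at ls-orig and ls-upx in turn, this should show the same kind of difference the two figures do, just with much less detail than JTool gives you.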
It seems that UPX pretty radically changes the program. Let’s look a little deeper.
Reverse Engineering
Now, we’re going to start looking into the program. We can do this from the command line using tools like JTool or using high-end disassemblers like IDA. In this case, we’ll use a reverse engineering tool called Hopper. Hopper is a fully-featured reverse engineering tool, and it offers a free trial to boot. If you’re following along, go ahead and download it. I’ll wait.
Ok, great—let’s get started.
Open ./ls-upx and ./ls-orig in Hopper, and take a look at the left pane. Select the Proc. tab. The first thing you see is that there are very few functions defined in ./ls-upx when compared to ./ls-orig.
You can download Hopper from hopperapp.com. It’s something like an order of magnitude less expensive than IDA Pro, so if it fits your needs, I highly recommend it. It doesn’t support all the processor types IDA does, nor does it run on Windows, but it works very well for MacOS reverse engineering.
Figure 3: UPX makes executables not-very-functional!
Not only are there very few functions defined, they are all local. None of these are imported from external libraries (go to Navigate > Imported Symbols to see what I mean). And if you click on the str tab, you’ll see that the UPX version has no strings defined in the program, while the non-UPX version has plenty.
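The string check doesn’t need Hopper, by the way. Here’s a rough Python equivalent of the classic strings utility, assuming that a "string" means a run of at least four printable ASCII bytes. The threshold and the function name are my own choices, not anything Hopper or UPX defines.

```python
import re
import sys


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII bytes at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return pattern.findall(data)


if __name__ == "__main__":
    # Print a string count per file given on the command line.
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            count = len(printable_strings(handle.read()))
        print(count, "strings in", path)
```

Run against ls-orig and ls-upx, the counts give you a quick, scriptable version of what the str tab shows.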
Now, select the pseudo-code button at the top of the display area:
Figure 4: Pseudo-code mode activated!
If we do this for both ./ls-orig and ./ls-upx, you’ll see something interesting. First, the original ls command has a large initial entry point, while the UPX’d version doesn’t. The original makes library and system calls fairly immediately (e.g. getenv(.), ioctl(.), and so on). The UPX’d version is self-contained (we’d expect this, as it doesn’t import anything).
Figure 5: Decompiled entry point from UPX’d ls
Interesting things here! Look at the do/while loop in the center of the function; here, we’re traversing the program __TEXT segment from the top down, until we get to a non-null value. Then, we have two additional function calls, and we exit (notice that once we get to the call to sub_f0000f7e(.) we have no more jumps—although, honestly, the called functions could cheat and jump, but we’ll take a look at that shortly).
Of these functions, sub_f00008fd(.) is the most interesting. If we look into that function (which does not decompile well, to be honest), we can see that this is the primary function dispatcher for the UPX code. If you look at the assembly block structure of the function (press the button just to the left of the pseudo-code button in Hopper), you can see that it’s much more complex than the EntryPoint(.) function, and it delegates to many more functions as well.
You’ll also notice that it returns a function pointer which the program then calls (e.g. rax = (r15)();). Guess what this is? This is the original program entry point after the code has been decompressed and is ready for execution.
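The "unpack, get a pointer to the real entry point, call it" pattern can be mimicked in a few lines of Python. This is purely an analogy, not UPX's code: compressed Python source stands in for the packed machine code, and every name here is hypothetical.

```python
import zlib

# Stand-in for the packed program: compressed source of the "real" entry point.
PACKED_SOURCE = zlib.compress(
    b"def original_entry():\n"
    b"    return 'hello from the unpacked program'\n"
)


def unpack_and_locate_entry(packed: bytes):
    """Decompress the payload in memory and return its entry function."""
    source = zlib.decompress(packed).decode("utf-8")
    namespace = {}
    exec(compile(source, "<unpacked>", "exec"), namespace)
    return namespace["original_entry"]


def stub_entry_point():
    """Mirror the stub: obtain the original entry point, then call it."""
    entry = unpack_and_locate_entry(PACKED_SOURCE)
    return entry()  # the analogue of rax = (r15)();


if __name__ == "__main__":
    print(stub_entry_point())
```

Notice that nothing is written to disk at any point. The unpacked program exists only in memory, which lines up with the hash of the packed file never changing.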
UPX does lots of interesting things that we’re not covering, like loading all of the dependent libraries, for example. That’s important on MacOS, as static linking isn’t supported these days. But this gives you an idea of how these packers work. UPX doesn’t encrypt either, which is certainly something you could do, though it shifts a bit into malware-specific functionality. But at the end of the day, you can pretty clearly tell if a file’s been packed. It has no (or very few) strings. It has no (or very few) library dependencies. And the number of defined functions in the executable is very small.
With UPX, there are other clues too—if you look through the binary, you’ll find sequences of ASCII codes that spell “UPX!” used as tokens. You can see one of them just prior to where the booting function stopped scrolling through the __TEXT segment, for example.
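Hunting for that token is easy to automate. The sketch below simply reports every offset at which the bytes UPX! appear. Treat a hit as a hint rather than proof, since the marker is just four bytes that could show up by coincidence or be altered by whoever packed the file. The function name is mine.

```python
import sys


def find_marker(data: bytes, marker: bytes = b"UPX!") -> list:
    """Return every offset at which the marker occurs in the data."""
    offsets = []
    start = data.find(marker)
    while start != -1:
        offsets.append(start)
        start = data.find(marker, start + 1)
    return offsets


if __name__ == "__main__":
    # Report marker offsets for each file given on the command line.
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            hits = find_marker(handle.read())
        print(path, [hex(offset) for offset in hits])
```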
Packers aren’t magic, but they are well-designed pieces of software engineering. Now you know how they work, and who knows, maybe you’ll use UPX someday too.