Static Analysis of the Emotet Malware

Adam Munger
Published in The Startup, Nov 11, 2020

Figure 1 Emotet.doc Opened in LibreOffice

Figure 2 below shows the ANY.RUN process graph for the initial stages of the Emotet malware sample that we’re going to analyze. There is a lot of interesting code between the first click and system compromise, and I wanted to explore that process in more detail. This article is provided for informational purposes only, so proceed at your own risk if you decide to follow along.

Figure 2 Process Graph of Emotet from ANY.RUN

For this article, we’ll use the Emotet malware sample provided by ANY.RUN at https://app.any.run/tasks/3363fde4-111b-4aaa-b73d-e4144433c284. You’ll need a free account if you want to download the file yourself, but you can also search the malware repository of your choice for the hash provided below. ANY.RUN is a great free site, and its full report shows how the malware interacts with the sandbox, including screen captures. The goal of this article is to provide a closer look at the initial infection process through static analysis.

This Emotet sample is downloaded as a password-protected ZIP file, and we’ll move it into REMnux before we do anything else. The MD5 hash for this sample is 92021ca10aed3046fc3be5ac1c2a094. You can confirm the file hash in REMnux without extracting the file to disk by using the command “unzip -p emotet.doc.zip | md5sum”, as shown below. The password for most zipped malware is “infected”. The password’s not a secret. It’s just there to keep us from doing anything we’ll regret later.

Figure 3 Hash Verification
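If you would rather verify the hash from a script, here is a minimal Python sketch using only the standard library. The helper name md5_of_zip_member is hypothetical, and the sketch assumes the archive uses legacy ZipCrypto encryption: Python's zipfile module can decrypt that, but not AES-encrypted archives. Like “unzip -p”, it streams the member through the hash without writing it to disk.

```python
import hashlib
import zipfile


def md5_of_zip_member(archive, member: str, password: bytes = b"infected") -> str:
    """Hash one member of a (possibly ZipCrypto-protected) ZIP without extracting it.

    `archive` may be a path or a binary file object; zipfile accepts either.
    The password is ignored for members that are not encrypted.
    """
    digest = hashlib.md5()
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member, pwd=password) as stream:
            # Read in 64 KiB chunks so the sample is never held in memory all at once.
            for chunk in iter(lambda: stream.read(65536), b""):
                digest.update(chunk)
    return digest.hexdigest()
```

For example, md5_of_zip_member("emotet.doc.zip", "emotet.doc") should print the same digest as the unzip pipeline, assuming the member inside the archive is named emotet.doc.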

Microsoft Office documents use either the legacy OLE2 (Object Linking and Embedding) compound file format or, beginning with Office 2007, the Office Open XML (OOXML) format. Using the Linux “file” command, you can see this is a Microsoft Word 2007+ file, which is OOXML.

Figure 4 Confirming the File Type
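The “file” command identifies formats largely by their leading “magic” bytes. A minimal sketch of the same first check follows: OLE2 compound files begin with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1, while OOXML documents, being ZIP archives, begin with “PK\x03\x04”. A ZIP signature alone doesn't prove a file is OOXML (a fuller check would look inside the archive for parts such as “[Content_Types].xml”), so the sketch only labels it a candidate. The function names are hypothetical.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary (legacy .doc/.xls/.ppt)
ZIP_MAGIC = b"PK\x03\x04"                        # ZIP local file header; OOXML uses ZIP


def office_container(header: bytes) -> str:
    """Classify an Office file's container from its first few bytes."""
    if header.startswith(OLE2_MAGIC):
        return "OLE2"
    if header.startswith(ZIP_MAGIC):
        return "ZIP (possibly OOXML)"
    return "unknown"


def sniff(path: str) -> str:
    """Read only the first 8 bytes of the file and classify them."""
    with open(path, "rb") as handle:
        return office_container(handle.read(8))
```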

OOXML files are actually ZIP archives containing the various file components. We can take a look at the file contents by unzipping the archive as shown below.

Figure 5 Unzipping the Word Document

If we look in the “word” folder, we see some interesting files and folders, including “vbaProject.bin” and an “ActiveX” folder. Their presence indicates embedded VBA macros and ActiveX controls.

Figure 6 Exploring the structure of Emotet.doc
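Because OOXML is a ZIP archive, the same triage can be done without unzipping anything, by reading the archive's table of contents. A minimal sketch, assuming the standard OOXML part layout (a “vbaProject.bin” part for macros and an “activeX” folder for controls); the helper names are hypothetical.

```python
import zipfile
from collections.abc import Iterable


def vba_activex_indicators(member_names: Iterable[str]) -> dict[str, list[str]]:
    """Group OOXML part names that suggest embedded macros or ActiveX controls."""
    found: dict[str, list[str]] = {"vba_project": [], "activex": []}
    for name in member_names:
        lowered = name.lower()  # part names are not reliably cased, so compare in lowercase
        if lowered.endswith("vbaproject.bin"):
            found["vba_project"].append(name)
        elif "/activex/" in lowered:
            found["activex"].append(name)
    return found


def inspect_ooxml(path: str) -> dict[str, list[str]]:
    # namelist() reads only the ZIP central directory; nothing is extracted.
    with zipfile.ZipFile(path) as zf:
        return vba_activex_indicators(zf.namelist())
```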

We can use olevba or olevba3 to parse emotet.doc and find interesting information about its contents. Figure 7 below is the beginning of the olevba3 output. Here we see again that this is an OpenXML file, and we also see a number of empty macros in vbaProject.bin.

Figure 7 Running olevba3

Further down in the olevba3 output, we see the autoopen() subroutine in the macro module zacGkX9. Word runs this subroutine automatically when the document is opened.

Figure 8 autoopen() in olevba3 Output

Figure 9 below is the end of the output from the previous run of olevba3. Everything in this output is potentially concerning, but of note, we see that some code will run when the document is opened via autoopen() and that it will attempt to hide itself from the user by using Windows Management Instrumentation (WMI) with ShowWindow. We also see that the code will create an object and potentially uses multiple methods of obfuscation. Finally, we see that olevba3 has detected VBA stomping, which we’ll discuss later.

Figure 9 The olevba3 Summary Findings
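The summary table in figure 9 comes from scanning the macro source for telltale keywords. A toy version of that idea is sketched below. The keyword list and descriptions are illustrative assumptions, not olevba's actual tables, which are far more extensive and also cover encoded strings and IOCs.

```python
import re

# Illustrative keyword table (an assumption for this sketch, not olevba's).
SUSPICIOUS = {
    "AutoOpen": "AutoExec: runs when the document is opened",
    "Document_Open": "AutoExec: runs when the document is opened",
    "CreateObject": "Suspicious: may create an OLE object",
    "GetObject": "Suspicious: may access an existing OLE or WMI object",
    "winmgmts": "Suspicious: references WMI",
    "ShowWindow": "Suspicious: may hide the launched window",
    "Replace": "Suspicious: string manipulation often used for obfuscation",
}


def scan_vba(source: str) -> list[tuple[str, str]]:
    """Return (keyword, description) pairs for keywords found in VBA source."""
    hits = []
    for keyword, description in SUSPICIOUS.items():
        # VBA is case-insensitive, so match whole words regardless of case.
        if re.search(rf"\b{re.escape(keyword)}\b", source, re.IGNORECASE):
            hits.append((keyword, description))
    return hits
```

A real scanner also has to cope with the obfuscation discussed below, since a keyword split across string fragments will not match a simple search like this one.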

We can use oledump.py to extract the macros from vbaProject.bin. We’ll start with zacGkX9 because it contains the autoopen() subroutine. Running oledump.py with no options displays all of the streams in the file. Macro streams are marked with an uppercase “M”. Streams marked with a lowercase “m” only contain attribute or option statements. We can see below that zacGkX9 is stream 18.

Figure 10 oledump.py of vbaProject.bin

Having already looked at both streams 17 and 18, I know they reference each other, so we’ll output both of them to the same file to make our analysis a little easier. We’ll use oledump.py to select stream 18 with the “-s <stream>” option and decompress its VBA source into readable text with the “-v” option. We can then append stream 17 to the same file by repeating the command for stream 17 with the append operator “>>”.

Figure 11 Dumping Macro Streams with oledump.py

Once the streams are extracted, we can use Visual Studio Code to take a look at streams 17 and 18. Below are the first few lines of the VBA code that will run when the document is opened. You can see the zacGkX9 module and its autoopen() subroutine.

Figure 12 Examining Streams 17 and 18 in Visual Studio Code

Emotet uses several code obfuscation techniques. In VBA, a space followed by an underscore (“ _”) at the end of a line is the line-continuation character, which works like a manual word wrap: one statement can span several physical lines. The malware authors use this technique in several places to obscure the structure of the code. This may seem insignificant, but malware developers take advantage of nuances in syntax to make the code as long and as complex as possible. The statement in figure 13 is used 10 times in the code. Figure 14 shows another example of this technique in a key part of the code.

Figure 13 New Line Code Obfuscation
Figure 14 More New Line Code Obfuscation
Figure 15 Correction of Code Obfuscation in Figure 14
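Undoing this by hand is tedious when the trick is repeated, so here is a minimal Python sketch that collapses VBA line continuations back into single logical lines. The helper name is hypothetical, and it assumes the continuation marker is whitespace plus “_” at the very end of a physical line, which is what the VBA grammar requires. Stacked continuations (several “ _” lines in a row) are collapsed in one pass.

```python
import re

# One or more "whitespace + _ + end of line" continuations, plus the next
# line's indentation, are replaced by a single space.
CONTINUATION = re.compile(r"(?:[ \t]+_[ \t]*\r?\n)+[ \t]*")


def join_continuations(vba_source: str) -> str:
    """Rejoin statements that were split across lines with VBA's ' _' marker."""
    return CONTINUATION.sub(" ", vba_source)
```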

The code also uses the VBA Replace() function to obfuscate commands with extra spaces, as shown in figure 16 below. This is easy for an analyst to spot and clean up, but some automated signature-based tools may miss it.

Figure 16 Code Obfuscation with Replace() to Remove Extra Spaces

After removing the extra spaces, we see that the macro will create an object of type “winmgmts:Win32_Process”, as seen below in figure 17.

Figure 17 Deobfuscation of the Code in Figure 14
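When every argument to Replace() is a plain string literal, the call can be folded into its result statically, without running anything. A minimal sketch follows; the helper name and the example string in the usage note are hypothetical, not taken from the sample. It deliberately handles only three-argument calls whose literals contain no doubled quotes (“""”), and it mirrors VBA's default case-sensitive comparison.

```python
import re

# A three-argument Replace(...) call whose arguments are all string literals.
REPLACE_CALL = re.compile(
    r'\bReplace\(\s*"([^"]*)"\s*,\s*"([^"]*)"\s*,\s*"([^"]*)"\s*\)',
    re.IGNORECASE,
)


def _evaluate(match: re.Match) -> str:
    expression, find, replacement = match.groups()
    # VBA returns the expression unchanged when Find is empty, whereas Python's
    # str.replace("") would insert the replacement between every character.
    result = expression if find == "" else expression.replace(find, replacement)
    return f'"{result}"'


def fold_replace_calls(vba_source: str) -> str:
    """Substitute each literal-only Replace(...) call with its resulting string."""
    return REPLACE_CALL.sub(_evaluate, vba_source)
```

For instance, a hypothetical line like GetObject(Replace("w inmg mts:Win32_ Process", " ", "")) folds to GetObject("winmgmts:Win32_Process").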

The code contains many randomized, complex variable assignments whose values are never used, so they are just junk that can be ignored, along with all of the other unused variables. There are also multiple randomized nested loops that do nothing, as shown in figure 18 below; they can be ignored as well.

Figure 18 Obfuscation with Useless Blocks of Code
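A crude way to find write-only junk variables automatically is to flag any name that never appears anywhere except as the target of its own assignments. The sketch below is a heuristic, not a VBA parser. It misses self-referencing junk such as “x = x + 1” and ignores “For” loop counters, and names mentioned inside string literals count as uses. The helper name is hypothetical.

```python
import re
from collections import Counter

# "name = ..." or "Set name = ..." at the start of a line.
ASSIGNMENT = re.compile(r"^[ \t]*(?:Set[ \t]+)?([A-Za-z_]\w*)[ \t]*=", re.MULTILINE | re.IGNORECASE)
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def write_only_variables(vba_source: str) -> set[str]:
    """Return lowercase names that are assigned but never otherwise referenced."""
    # VBA identifiers are case-insensitive, so count everything in lowercase.
    assigned = Counter(m.group(1).lower() for m in ASSIGNMENT.finditer(vba_source))
    mentions = Counter(token.lower() for token in IDENTIFIER.findall(vba_source))
    # If every mention of a name is one of its own assignments, it is never read.
    return {name for name, count in assigned.items() if mentions[name] == count}
```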

Once we get rid of the distractions, we are left with much simpler code that is still slightly obfuscated. We went from 131 lines of code to just 13, as seen below in figure 19. The remaining code makes several references to the “.Caption” property of ActiveX form objects, so we can look at those ActiveX form elements to complete the code shown in figure 19.

Figure 19 Minimized Emotet VBA

We can use Structured Storage Viewer (SSView) to see what’s in those ActiveX objects as seen below in figures 20 and 21.

Figure 20 Starting SSView

If we take a look at activeX7.bin, we see some interesting obfuscated code. We’ve already imported that code in figure 19 above.

Figure 21 PowerShell in ActiveX Form Element
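If SSView isn't available, a rough alternative is a strings-style pass over the activeX*.bin files from the unzipped document, looking for long runs of printable ASCII or UTF-16LE text. This is a minimal sketch for triage only; it assumes the caption text is stored uncompressed as ASCII or UTF-16LE, which may not hold for every control. The helper name is hypothetical.

```python
import re


def printable_strings(blob: bytes, min_len: int = 8) -> list[str]:
    """Return runs of printable ASCII and UTF-16LE text at least min_len characters long."""
    ascii_run = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # UTF-16LE stores each ASCII character as the character byte followed by 0x00.
    utf16_run = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_run.finditer(blob)]
    found += [m.group().decode("utf-16-le") for m in utf16_run.finditer(blob)]
    return found
```

For example, printable_strings(open("word/activeX/activeX7.bin", "rb").read()) lists the candidate strings in that control, with ASCII runs first and UTF-16LE runs after them.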

If you have access to Microsoft Office or LibreOffice in your lab environment you can view the form elements directly.

Figure 22 ActiveX10.bin / hjL90Njk Caption Text
Figure 23 ActiveX4.bin / McQHX3 Caption Text
Figure 24 ActiveX2.bin / PWo3kW Caption Text
Figure 25 ActiveX7.bin / psYO9m Caption Text

If you want a more automated approach, you can use ViperMonkey, as shown below in figure 26. ViperMonkey emulates the VBA code, so it does much of the same deobfuscation work for us in the background. It won’t necessarily show the level of detail that we’ve learned about how the malware is structured, but it’s an excellent tool and gets you to the good stuff very quickly.

Figure 26 Running ViperMonkey on emotet.doc

ViperMonkey finds all those unused variables and junk code that we had to sift through earlier as shown in figure 27 below.

Figure 27 ViperMonkey Detecting Junk Code Used for Obfuscation

ViperMonkey finds the PowerShell script with ease as shown in figure 28 below.

Figure 28 ViperMonkey Finding the PowerShell Code

We can run ViperMonkey with the “-s” option to strip out all of the junk before saving to a file as shown in figure 29 below.

Figure 29 Using ViperMonkey to Strip Useless Code and Output to a File

ViperMonkey searches for interesting code and provides possible IOCs at the conclusion of its analysis as seen in figure 30 below.

Figure 30 ViperMonkey IOCs

Figure 31 below is the JSON output of ViperMonkey. It’s very clear what the Emotet macro is doing with just a quick look. The whole purpose of the macro is to launch PowerShell and execute the code within.

Figure 31 JSON Output of ViperMonkey

Now that we know how the macro works, we can focus on the extracted PowerShell in figure 32 below.

Figure 32 Extracted PowerShell in Visual Studio Code

If we load the extracted PowerShell into CyberChef, we can use the “From Base64” recipe to decode it, as shown in figure 33 below.

Figure 33 Decoded Base64 PowerShell in CyberChef

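In figure 33 above, we see that every other character is a period. It’s not a literal period; it’s how CyberChef displays a null byte. The decoded bytes are UTF-16 little-endian (UTF-16LE) text, the encoding PowerShell uses for Base64-encoded commands, which stores each ASCII character as that character’s byte followed by a null byte. We can correct this by decoding with UTF-16LE instead of the default UTF-8 character encoding, as shown in figure 34 below.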

Figure 34 Applying UTF 16 Decoder in CyberChef
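The same two-step recipe is easy to reproduce outside CyberChef. A minimal sketch, assuming the extracted script is plain Base64 of UTF-16LE text (the helper name is hypothetical): Base64-decode the text, then decode the bytes as UTF-16LE rather than UTF-8.

```python
import base64


def decode_utf16le_base64(b64_text: str) -> str:
    """Base64-decode text and interpret the bytes as UTF-16LE, as CyberChef's
    'From Base64' followed by a UTF-16LE text decode would."""
    # By default b64decode discards characters outside the Base64 alphabet,
    # so line breaks left over from copy-and-paste are harmless.
    raw = base64.b64decode(b64_text)
    return raw.decode("utf-16-le")
```

Decoding the same bytes as UTF-8 would leave a NUL after every ASCII character, which is exactly the every-other-period pattern in figure 33.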

Now we can take the decoded PowerShell from CyberChef and put it into Visual Studio Code to deobfuscate it. CyberChef will output one long string, so we’ll format it for easier reading as shown in figure 35 below.

Figure 35 Decoded and Beautified PowerShell in Visual Studio Code

We can go through the code line by line to make it more readable and get rid of the extraneous code used for obfuscation. In figure 36 below we can see the original and edited lines with comments.

Figure 36 Edited Emotet PowerShell Script

Figure 37 below is the simplified PowerShell code that will be run when the user opens the Word document.

Figure 37 The Final Emotet PowerShell

The resulting PowerShell is pretty simple. It first saves the download location on the victim computer to $filepath (originally $FwcAJs6). Then it creates a WebClient object, stored in $webclient (originally $u8UAr3). Next, it saves five URLs to a list separated by @ symbols and loops through that list, trying each URL to download the 284.exe file. It stops looping once it has successfully downloaded the file, and if the downloaded file is larger than 23,931 bytes, it launches the executable. 284.exe then carries out the next stage of the infection.
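When recording the download URLs as IOCs, the @-delimited string can be split and defanged so the URLs can't be clicked by accident in a report. A minimal sketch, assuming none of the URLs contains an “@” of its own; the helper names and the example.org URLs in the usage note are placeholders, not the sample's real URLs.

```python
def split_url_list(delimited: str, sep: str = "@") -> list[str]:
    """Split a delimiter-separated URL string, dropping empty entries."""
    return [part.strip() for part in delimited.split(sep) if part.strip()]


def defang(url: str) -> str:
    """Make a URL non-clickable: 'http' becomes 'hxxp' and dots in the host become '[.]'."""
    scheme, sep, rest = url.partition("://")
    if not sep:  # no scheme: bracket every dot
        return url.replace(".", "[.]")
    host, slash, path = rest.partition("/")
    return scheme.replace("tt", "xx") + sep + host.replace(".", "[.]") + slash + path
```

For example, defang("http://a.example.org/x/") returns "hxxp://a[.]example[.]org/x/".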

But what about that VBA stomping?

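If we go back to figure 9, we remember that olevba3 identified VBA stomping. VBA stomping means that the VBA source code we were looking at with olevba3 does not match the compiled p-code stored alongside it. When the document is opened in a version of Office that matches the one that compiled the p-code, Office runs the p-code rather than the source, so the source an analyst reads may not be what actually executes. To look at the p-code, we’ll use pcodedmp, as shown in figure 38 below. We’ll then convert the p-code back to VBA code.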

Figure 38 Dumping Emotet P-Code and Disassembling Into VBA
Figure 39 P-Code Output from pcodedmp

We used pcode2code to convert the p-code seen in figure 39 to VBA as shown in figure 40.

Figure 40 P-Code Version of Emotet Macro

We went through the same steps to deobfuscate the p-code version of the macro, as shown in figure 41 below.

Figure 41 Edited P-Code VBA Macro

What we see in figure 42 below is the remaining code from the p-code version of the macro. It is definitely different but has a very similar structure. I was not able to get this version to run. I’m not sure whether that is because it is yet another distraction, because of a conflict between VBA versions, or because I just missed something. It doesn’t really matter, though, because the PowerShell came from the ActiveX form objects and not the VBA, so the result would be the same regardless.

Figure 42 Minimized P-Code VBA

I know this was a long journey, but I hope you stuck with me and I look forward to writing more soon. In my next article, I will take a look at the 284.exe file that the PowerShell script attempts to download. That file is also available from ANY.RUN as a separate download. Until then, happy hacking!

If you enjoyed this article and want to see more, please let me know in the comments. You can also find me on LinkedIn at https://www.linkedin.com/in/adammunger/.
