Reverse Engineering SmokeLoader: An In-Depth Analysis (Stage 1)
At our recent event, Operational CTI: Lessons from the Attacks that Didn't Target You, we shared that we would be releasing reverse engineering blogs on SmokeLoader. Here is part 1!
SmokeLoader is a malware strain that’s been seen in the wild since at least 2014. It’s a backdoor that can be expanded by adding modules. This versatility makes the malware a bit harder to signature, but it's common practice among malware like this.
One notable aspect of SmokeLoader is its attempt to hide its C2 activity by generating requests to legitimate sites as well as to domains that the attackers control. Understanding the modules used as add-ons to each build can provide insight into the malware strain and the attack’s objectives.
This first blog entry covers the more contextual information around SmokeLoader, including the campaign associated with it. We’ll also cover how we set up our analysis environment, and the considerations we had when analyzing the preliminary stages.
The campaign
The 7Zip campaign saw the exploitation of the CVE-2025-0411 vulnerability. Exploiting this vulnerability increased an attacker’s chances of successfully infiltrating a company’s workstations by bypassing the Mark of the Web (MoTW) security controls. (We created a lab on the 7Zip campaign. Check it out here.)
With MoTW bypassed, Windows SmartScreen did not show the user a warning window when the SmokeLoader executable ran.
This blog picks up where our webinar left off. We’ll cover the SmokeLoader executable and analyze it to show you how it executes itself – there are many layers!
The malware we’ll analyze in this blog is the last stage in the Trend Micro report. You can also have a read of VirusTotal’s report here.
Let’s get to it!
This malware gets downloaded as part of the attack chain with the extension .pdf.exe. On some workstations, file extensions aren’t visible, so in this case, the malicious file looks like it ends with “.pdf”. This is a social engineering tactic that tricks the workstation user into double-clicking the file, thinking it will open a PDF instead of actual malware.
According to VirusTotal, SmokeLoader is a 32-bit executable written using Microsoft Visual Studio and is 249.50 KB in size. This is really useful information you can use to start understanding what sort of instructions and patterns the Visual Studio compiler adds to the binary.
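The double-extension trick described above is easy to flag during triage. Here is a minimal sketch, assuming we only care about a document-style extension immediately followed by an executable one. The function name and both extension lists are our own illustrative choices, not an exhaustive detection rule.

```python
from pathlib import PureWindowsPath

# Illustrative lists only (assumptions); real triage tooling would use broader sets.
DECOY_EXTENSIONS = {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".jpg", ".png", ".txt"}
EXECUTABLE_EXTENSIONS = {".exe", ".scr", ".com", ".bat", ".cmd", ".pif"}


def has_deceptive_double_extension(filename: str) -> bool:
    """Return True if the name ends in <decoy ext><executable ext>, e.g. '.pdf.exe'."""
    # PureWindowsPath parses Windows-style names on any host OS.
    suffixes = [s.lower() for s in PureWindowsPath(filename).suffixes]
    if len(suffixes) < 2:
        return False
    return suffixes[-1] in EXECUTABLE_EXTENSIONS and suffixes[-2] in DECOY_EXTENSIONS


if __name__ == "__main__":
    for name in ("invoice.pdf.exe", "setup.exe", "report.v2.pdf"):
        print(name, has_deceptive_double_extension(name))
```

A check like this only looks at the name, so it complements (rather than replaces) inspecting the file's actual contents.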
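If you want to confirm the 32-bit claim yourself rather than rely on a report, the PE header carries that information. The sketch below reads the COFF Machine field from raw file bytes using only the standard library. The function names are ours, and it assumes a well-formed, unpacked header.

```python
import struct

# Machine values defined by the PE/COFF format.
IMAGE_FILE_MACHINE_I386 = 0x014C   # 32-bit x86
IMAGE_FILE_MACHINE_AMD64 = 0x8664  # 64-bit x86-64


def pe_machine(data: bytes) -> int:
    """Return the COFF Machine value from the raw bytes of a PE file."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew at offset 0x3C holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if len(data) < pe_offset + 6 or data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # The 2-byte Machine field immediately follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return machine


def is_32bit_x86(data: bytes) -> bool:
    """True when the header declares a 32-bit x86 image."""
    return pe_machine(data) == IMAGE_FILE_MACHINE_I386
```

Knowing the architecture up front also tells you which debugger to reach for: x32dbg for a 32-bit sample like this one.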
Analysis setup
Don’t forget! As with any malware, it’s important to place it in a sandboxed environment when you analyze it. You don’t know what it’s going to do yet!
When doing deep analysis, you’ll need to use a debugger and a disassembler or decompiler. Here, we’re using x32dbg and Ghidra – both incredibly powerful tools.
We took the following steps when analyzing the malware:
- We looked at the decompiler to see what code was coming up and made a hypothesis of what the code was doing.
- We opened the malware in a debugger and executed the code we analyzed in Step 1, going no further than what we already knew.
- We replaced any variable names or information in the decompiler to provide further context to the code.
- We repeated Steps 1–3 until the malware acted on its objectives.
Reverse engineering
This malware deliberately has many stages to slow down reverse engineers’ analysis. It goes through multiple stages of shellcode extraction and execution, multiple dynamic API resolutions, memory page protection changes, and process injection.
The first thing we did was find main(), where the malware author's code is written. Ghidra is really helpful for this!
We found the entry point, followed __tmainCRTStartup, double-clicked it, and found the code pattern ___tmainCRTStartup ();.
This will help when we start looking for main() in the debugger.
We now knew that the main function was the one with 0x7c80 as the last four hex digits of the address.
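Matching on the last four hex digits works because Windows loads images on 64 KB boundaries, so relocating the image leaves the low 16 bits of every address unchanged. If you prefer an exact address, you can rebase it yourself. This is a small sketch of that arithmetic. All the concrete addresses in the example are hypothetical placeholders, not values from this sample.

```python
def rebase(static_addr: int, static_base: int, runtime_base: int) -> int:
    """Translate an address from the disassembler's image base to the debugger's."""
    return runtime_base + (static_addr - static_base)


def same_low_word(a: int, b: int) -> bool:
    """True when two addresses share their last four hex digits."""
    return (a & 0xFFFF) == (b & 0xFFFF)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    ghidra_base = 0x00400000    # image base shown in the disassembler
    ghidra_main = 0x00407C80    # placeholder address ending in 0x7c80
    debugger_base = 0x00A30000  # where the debugger reports the module loaded

    runtime_main = rebase(ghidra_main, ghidra_base, debugger_base)
    print(hex(runtime_main))                         # 0xa37c80
    print(same_low_word(ghidra_main, runtime_main))  # True
```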
We used this information to set a nice breakpoint on main() in the debugger. This can be confusing when starting out on your reverse engineering journey, so check out our lab on finding main() here.
Once we had a breakpoint set at main() in the debugger, we started investigating what happened from there in Ghidra.
Two function calls happened: one is a longer function, and the other is a few instructions followed by jmp EAX.
Given that EAX is loaded with the memory address stored at DAT_00512C44, which is initialized elsewhere in the program, we guessed that some instructions are deobfuscated elsewhere and that the jump instruction transfers execution to them.
We could have set a breakpoint at jmp EAX in the debugger and executed the code until then. However, some malicious activity might have occurred before the jmp, in the previous function, so it was worth investigating.
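When a sample has many small stubs like this, it can help to locate register-indirect jumps in bulk. The sketch below does a naive byte scan for FF E0, the x86 encoding of jmp eax. It's an assumption-laden shortcut: a raw byte match can also land inside data or in the middle of another instruction, so treat hits as candidates to confirm in the disassembler.

```python
JMP_EAX = b"\xff\xe0"  # x86 encoding of `jmp eax`


def find_jmp_eax(code: bytes, base: int = 0) -> list[int]:
    """Return the addresses (base + offset) of every FF E0 byte pair in `code`."""
    hits = []
    pos = code.find(JMP_EAX)
    while pos != -1:
        hits.append(base + pos)
        pos = code.find(JMP_EAX, pos + 1)
    return hits


if __name__ == "__main__":
    # nop; mov eax, [0x00512C44]; jmp eax; ret
    stub = b"\x90" + b"\xa1\x44\x2c\x51\x00" + b"\xff\xe0" + b"\xc3"
    print([hex(a) for a in find_jmp_eax(stub, base=0x1000)])  # ['0x1006']
```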
The first function, shown below, had many do-while loops, meaning that the code will execute the loop at least once. However, we inspected the code inside the loops and found there wasn’t much going on.
There was a check at a certain data address, DAT_00512DFC, and actions were taken based on that – though exactly what actions were taken is unclear, given that it doesn’t execute. When checking, we reviewed the memory location of DAT_00512DFC in the debugger and found that it didn’t hold anything meaningful to make those checks occur.
This suggests that the code is kept in the malware regardless of whether it’s used, and only executes in certain types of build. Another thing to note is that the Windows APIs called here didn’t really do much anyway, and didn’t indicate much malicious activity.
Some of the strings in this function might have seemed useful, but if the code doesn’t execute instructions associated with them or doesn’t directly manipulate them, they’re useless to our investigation.
So we skipped over the loops in the debugger by finding the instructions after the loop, setting a breakpoint, and running the program until it hits that breakpoint. The third lab in the Foundational Binary Dynamic Analysis collection covers this technique in more detail.
Scrolling down, you can see a function that’s outside the if statements and do-while loops:
This is FUN_00417440(void). Inside this function is a call to LocalAlloc, where the size is read from DAT_00512DFC and the returned memory address is stored in DAT_00512C44.
Keen-eyed investigators will notice that DAT_00512C44 is the same location that supplies the target of the jmp EAX in the second function called from main().
This proves that our hypothesis from the beginning was correct: This function sets up a memory location with instructions that get executed later.
It’s important to keep track of the value stored in DAT_00512DFC from the debugger because it’ll determine the size of the memory dump we’ll be making later for the shellcode!
Finally, when looking at the bottom of the large function, we found a number of function calls:
LoadLibraryA was followed by two function calls. This is interesting because the majority of the time, LoadLibraryA in malware is followed by GetProcAddress (which happens to be what’s inside FUN_00417470).
Weirdly, though, the return value from LoadLibraryA("msimg32.dll") wasn’t used. GetProcAddress in the next function was simply called as a static import instead of being located dynamically via LoadLibraryA.
In this function, VirtualProtect was called to change the heap memory to executable!
This technique, and how to analyze it, is covered in detail in the fifth lab in the Foundational Dynamic Analysis collection.
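When you break on VirtualProtect in the debugger, the new protection value arrives as a number, so it helps to decode it quickly. Here is a small sketch using the standard Windows page protection constants. The post doesn't state which value this sample passes, so the READWRITE to EXECUTE_READWRITE transition in the example is an assumption.

```python
# Page protection constants from the Windows API (winnt.h).
PAGE_READWRITE = 0x04
PAGE_EXECUTE = 0x10
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40
PAGE_EXECUTE_WRITECOPY = 0x80

# Any of these bits means code can run from the page.
EXECUTABLE_MASK = (
    PAGE_EXECUTE | PAGE_EXECUTE_READ | PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY
)


def is_executable(protect: int) -> bool:
    """True if the protection value allows execution (modifier bits are ignored)."""
    return bool(protect & EXECUTABLE_MASK)


def becomes_executable(old_protect: int, new_protect: int) -> bool:
    """True when a protection change turns a non-executable page executable."""
    return not is_executable(old_protect) and is_executable(new_protect)


if __name__ == "__main__":
    # Assumed transition for illustration: RW heap memory made RWX.
    print(becomes_executable(PAGE_READWRITE, PAGE_EXECUTE_READWRITE))  # True
```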
The function after that contains a lot of XOR and array manipulation, so we assumed it performs deobfuscation. The destination of the writes supported this: they started at the address of our heap allocation, which had just been made executable.
This technique is also covered in the third lab of the Foundational Dynamic Analysis collection.
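To show the general shape of XOR-based deobfuscation, here is a generic repeating-key XOR routine. To be clear, this is not SmokeLoader's actual algorithm, which this post doesn't detail. The function name and the key in the example are hypothetical, and the real routine mixes XOR with other array manipulation.

```python
from itertools import cycle


def xor_decode(data: bytes, key: bytes) -> bytes:
    """XOR every byte of `data` with a repeating `key`."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


if __name__ == "__main__":
    key = b"\x5a\xa5"  # hypothetical key, for illustration only
    plain = b"\x55\x8b\xec"  # push ebp; mov ebp, esp
    encoded = xor_decode(plain, key)
    # XOR is its own inverse, so applying it twice restores the input.
    print(xor_decode(encoded, key) == plain)  # True
```

In practice, the debugger does this work for you: letting the sample run its own routine and dumping the result avoids having to reimplement the algorithm at all.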
The final function in the block simply added 0x2310 onto the heap starting address:
`if (iVar4 == 0xd606b) FUN_00417460();`
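The 0x2310 adjustment matters later, because it tells us where execution starts inside the dumped region. The sketch below shows the arithmetic, assuming the adjusted pointer is what jmp EAX eventually uses. The heap address in the example is a hypothetical placeholder, since the real value changes on every run.

```python
ENTRY_OFFSET = 0x2310  # value added to the heap start address, per the analysis above


def shellcode_entry(heap_base: int, offset: int = ENTRY_OFFSET) -> int:
    """Return the address execution would jump to inside the allocation."""
    return heap_base + offset


def offset_in_dump(address: int, dump_base: int) -> int:
    """Convert a runtime address to a file offset within a dump starting at dump_base."""
    if address < dump_base:
        raise ValueError("address lies before the start of the dump")
    return address - dump_base


if __name__ == "__main__":
    heap_base = 0x006F1A08  # hypothetical LocalAlloc return value
    entry = shellcode_entry(heap_base)
    print(hex(entry))                             # 0x6f3d18
    print(hex(offset_in_dump(entry, heap_base)))  # 0x2310
```

If you dump from the start of the allocation, the entry point sits at file offset 0x2310, which is where to point your disassembler first.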
So, we learned that most of the code in this function was pointless! Attackers do this deliberately to slow down reverse engineers. And it works, but it’s still crucial to take these steps so we can confirm the code is pointless, not just assume that it is.
Finally, we set a breakpoint in the debugger at the jmp EAX instruction, ran the program, then dumped the memory.
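Once the dump is on disk, it's worth trimming it to the allocation size and hashing it so you can track the sample across later stages. This is a minimal sketch, assuming the dump begins at the start of the heap allocation and that `size` is the value you noted from DAT_00512DFC. The function names are ours.

```python
import hashlib


def carve(dump: bytes, size: int) -> bytes:
    """Return the first `size` bytes of a dump that starts at the allocation base."""
    if size <= 0 or size > len(dump):
        raise ValueError("size must be between 1 and the length of the dump")
    return dump[:size]


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    # Stand-in bytes; in practice, read the file saved from the debugger.
    raw_dump = bytes(range(256)) * 4
    shellcode = carve(raw_dump, 0x200)
    print(len(shellcode), sha256_hex(shellcode))
```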
Conclusion
This first blog post covered the SmokeLoader malware’s campaign details and some initial analysis considerations, including how to identify and dump the first layer of shellcode.
This is just the beginning! Stay tuned for more SmokeLoader analysis, including a look at what anti-analysis and obfuscation techniques are employed.