Maelstrom #4: Writing a C2 Implant
In this blog, we will discuss how to write a C2 implant for the modern era. We will look at the history of offensive techniques and the progress of defence.
Introduction
In the series so far, we have discussed the purpose and intentions behind a C2, and the design considerations for both the implant and server.
In this post, we will move beyond this theoretical discussion and begin building a basic implant. We'll start by looking at the evolution of offensive and defensive techniques since 2010, to give us context and understanding of the current landscape. We'll then, as with our previous posts, discuss some important concepts that we'll be incorporating into the implant. Finally, we'll walk through the implant design, writing the base of both stage0 and stage1 of the implant for our exemplar C2, Maelstrom.
When discussing C2 implants, people often say that their implant is fully undetectable (ironically, "FUD"). A newly written implant will initially evade signature-based detection simply because it has not been seen before. Therefore, on disk, and potentially even when run, it won't be flagged. However, this doesn't account for runtime detections, telemetry generated by Windows, or the various methods of reputation ranking used by modern endpoint detection products.
In 2022, not all companies have yet implemented all the protections that are available to them, including a full SIEM with comprehensive event logging, or even an EDR agent on every device. This can give the impression that steps like those we will discuss in this post are not required, but that is simply a result of not yet having met an environment with anything more than Defender. The days of running commands via a command interpreter are long gone:
Objectives
This post will cover:
The background and development of offensive and defensive techniques around implants.
The functions and code required for a contemporary Stage 0, including:
Environmental Keying
Detecting Suspicious Processes
Anti-Sandbox Protections
Anti-Debug Protections
The functions and code required for a contemporary Stage 1, including:
Reflective Loading
DLL Debugging
Server Checkins
Sleeping
From this point, we will have an implant which can manage basic checkins, and which can be augmented with more sophisticated functionality, evasive techniques, and other opsec features. We will explore these in later blogs, but for further information on evasive techniques, Check Point Research: Evasion techniques can be used as a reference.
As we've mentioned in a similar paragraph in every blog post so far, and will continue mentioning in every post hereafter, the code will serve to illustrate the functionality, but is far from being immediately usable within a functional C2.
Stage 0:
Environmental Keying
Detecting Suspicious Processes
Anti-Sandbox
Anti-Debug
Stage 1:
Checking-in to the server
Evolution of Offensive and Defensive Techniques
Over the years, code execution has become more and more complicated as defensive techniques and processes improved, requiring more sophisticated approaches. In this section, we want to nail down the evolution and history of both offence and defence within this space. By doing so, we hope to build an understanding of why some behaviours are absolutely necessary in today's red team environment.
While some implants may be anti-virus proof, able to run without detection and execute commands within a system, this is a far cry from being able to operate as a viable C2 within a network with an up-to-date EDR and a correctly configured SIEM. Indeed, without these measures in place, a red team is unlikely to provide value to an organisation, as many of the recommendations will simply be inapplicable to a network with that level of maturity.
2010s
Back in the day when Metasploit was king, it would be possible to get away with running commands from the shell. That is, the implant.exe running on the host would call cmd.exe, with the command wrapped within the /c flag. This would produce the following process tree:
This is all fine when runtime rules are not being executed on specific behaviour. Also around this time, we had one-shot-kill exploits such as MS08-067 which would essentially work as a point-and-click exploit giving NT AUTHORITY\SYSTEM access.
Obviously we cannot speak for every Anti-Virus vendor, but around this time almost all detections were performed on static analysis and required malware families to be known. This is still partially the case today with static detection; however, there is now a lot of crowdsourcing with companies such as VirusTotal, and the adoption of Machine Learning, as seen in Intercept X: Powered by Deep Learning.
2014 - 2016
From the cmd.exe phase, the community moved to a very PowerShell-oriented style. This spawned projects like Empire in 2016, which was the first Command and Control (C2) Framework written entirely in PowerShell. Around the same time, the original PoshC2 was produced. PowerShell was working well for operators, but the Antimalware Scan Interface (AMSI) was also picking up. According to This is how attackers bypass Microsoft's AMSI anti-malware scanning protection, its release appears to have been in 2015. At the time, and still somewhat to this day, AMSI was trivial to bypass. Because of this, websites such as amsi.fail were created to generate obfuscated AMSI bypasses from the following sources:
Also, around 2016, Invoke-Obfuscation was produced to severely obfuscate PowerShell. Later, in 2016, Raphael Mudge wrote Modern Defenses and YOU!. This blog post details why operators should move away from PowerShell due to its popularity. This was reinforced by Microsoft in 2017 when they released Defending Against PowerShell Attacks, and then by a tweet from Matt Graeber which alludes to PowerShell being too popular and the new technique being .NET.
Whilst all this was going on, every aspect of offensive PowerShell required was built into one suite: PowerSploit.
Cobalt Strike 3.11 - The snake that eats its tail introduces execute-assembly which would dictate the next few years...
Around this time, on the defensive side of the industry, Anti-Virus vendors were migrating towards the detection and mitigation of Zero Day Exploits due to an increase in their usage by APTs, a portion of which were attributed to Chinese Military Groups.
Over this period of time, we saw the rise of companies such as CrowdStrike, SentinelOne, Cylance and a few others. We do not know the internals of these companies and how, when, or why they started implementing their in-memory and technique-based detections. But this period is likely where techniques such as Userland Hooking and registering Kernel Callbacks to determine suspicious behaviour emerged, followed by the introduction of languages such as Lua to write rules that parse the logs generated by such protections. Using Lua in such a way is a known use case of Microsoft Defender for Endpoint (MDE) and has been extracted by researchers, as seen in ExtractedDefender.
2017 - 2019
When Cobalt Strike introduced execute-assembly, the usage of .NET exploded and is still somewhat popular today. Projects like SharpCollection were created to build nightly releases of a bunch of tools, but this barely scratches the surface of the attack tools available throughout the internet. Around this time, Covenant was the first C2 to popularize .NET as a C2 Framework.
Likely due to this popularity, Microsoft added backwards compatibility and general support for AMSI. In Whats new in .NET 4.8:
Antimalware scanning for all assemblies. In previous versions of .NET Framework, the runtime scans all assemblies loaded from disk using either Windows Defender or third-party antimalware software. However, assemblies loaded from other sources, such as by the Assembly.Load(Byte[]) method, are not scanned and can potentially contain undetected malware. Starting with .NET Framework 4.8 running on Windows 10, the runtime triggers a scan by antimalware solutions that implement the Antimalware Scan Interface (AMSI).
At the time, it received some praise online. This would be trivial to handle by heavily obfuscating the assembly, or creating .NET Loaders to encrypt and reflect the malicious tool with Assembly.Load. Dom Chell did a great talk on this in 2020: Dominic Chell - Offensive Development: Post Exploitation Tradecraft in an EDR World.
Similarly to PowerShell, SharpSploit was produced solving a huge portion of offensive requirements. An argument can be made that when a full attack suite for a given language is developed, it could be the end of an era for that language.
It was around 2019/2020 where the community began experimenting with things like Nim and Dynamic Language Runtime Overview (DLR) with projects such as SILENTTRINITY and OffensiveDLR.
2019 - 2020
As with execute-assembly, Cobalt Strike somewhat changed the typical tooling approach by introducing inline-execute in Cobalt Strike 4.0 – Bring Your Own Weaponization:
Finally, Cobalt Strike 4.0 introduces an internal inline-execute post-exploitation pattern. Inline-execute passes a capability to Beacon as needed, executes it inline, and cleans up the capability after it ran. This post-exploitation interface paves the way for future features that execute within Beacon’s process context without bloating the agent itself.
Along with inline-execute, Cobalt Strike introduced the idea of Beacon Object Files:
A Beacon Object File (BOF) is a compiled C program, written to a convention that allows it to execute within a Beacon process and use internal Beacon APIs. BOFs are a way to rapidly extend the Beacon agent with new post-exploitation features.
Essentially, they are just specifically crafted Common Object File Format (COFF) Files. The benefit, as TrustedSec point out in A Developer’s Introduction To Beacon Object Files, is that the operator runs code inside the Beacon process itself, avoiding the creation of a child process, which is something that the in-built execute-assembly suffers from.
TrustedSec then went on to produce:
Around the same time, people began reinterpreting the execute-assembly function by rewriting the CLR and executing it as an RDLL:
With this heavy investment in rewriting key parts of Cobalt Strike, the stream of new C2s became a torrent. While custom C2 development had always been a part of the industry, Cobalt Strike's off-the-shelf nature and market dominance seemed to eclipse much of this activity. However, from 2019 onwards, more and more courses and blogs endorsed the concept of custom C2 authorship as a viable alternative to a commercial C2, or even as a straightforward learning exercise.
2022
Cobalt Strike for many years, in our experience at any rate, was the C2. Even with the growth of other C2s, Cobalt Strike remains the C2 that C2s are compared to, the Sennheiser HD600's of the offensive tools. Cobalt Strike's interface and operation (and Armitage before it) remain "what a C2 looks like", at least in our minds. Although we've not seen many imitate the device canvas (or, sadly, the lightning).
While there are arguments to be made for other projects, Cobalt Strike has been steering the industry, for both Offence and Defence, for years. The frequent and information dense blogs and videos helped both offensive and defensive teams improve their techniques in a way that few other vendors have done.
Raphael Mudge's video playlists:
Then the entire blog: Cobalt Strike: Blog
Researchers worked on "How to improve X in Cobalt Strike" for a long time, and the shift to actually building new and unique tooling has only happened over the past few years. For defensive teams, Cobalt Strike is still frequently seen and will be for a while. This comes from its leaks and cracks over the years and its continued effectiveness.
Since Raphael Mudge stepped down from the team, HelpSystems have been primarily working on stability, which has given detection a lot of time to catch up. Due to this, the detection rate for Cobalt Strike, both on disk and in memory, has drastically increased. Obviously, Cobalt Strike remains a completely viable and good option for a C2, but the industry has started to see some titans emerge to rival Cobalt Strike.
In response, in recent posts Cobalt Strike has begun to discuss working on more evasive features, such as: Arsenal Kit Update: Thread Stack Spoofing. The Cobalt Strike Roadmap Update discusses this further, mapping their future progression.
As Raphael Mudge took his foot off the gas and the research efforts slowed down, the industry began building out its own tooling to reduce the number of signatures that operators would have to deal with. As more and more people began building these tools, the C2 Matrix was started in order to track them. However, there are two titans who are at the forefront of advanced functionality:
Both of these offer advanced evasive technology baked into the product, and are aimed at working in sophisticated environments with high levels of protection in place.
Writing an entirely new C2 from scratch gives the operators full control of the implant and communications. For example, as the use of memory sweeps becomes more common, it may be a requirement to fluctuate the page permissions of the memory region which the implant is operating out of. If the operator is using Cobalt Strike, then something like ShellcodeFluctuation could be used. The issue here is that it's an extra piece of shellcode to execute, and it places a hook on the KERNEL32!Sleep function, increasing the indicators of compromise. Whereas if the C2 were completely open to the operators, then this could just be a setting to enable and disable on a per-implant basis.
When it comes to modern day defences, it's a continuation of the things we've recently discussed. However, the internals of these techniques have gone through a great deal of research and development to make them more effective. We've also seen the introduction of feeds into Event Tracing for Windows (ETW) for Threat Intelligence, known as ETWTi; more on this in Introduction to Threat Intelligence ETW. As well as ingesting ETWTi feeds, more generic ETW feeds have seen use. For example, the usage of the DotNet Runtime traces to determine assemblies being loaded.
In the next two blogs, we will look at implementing a few of these techniques. Namely, ETWTi, Userland hooks, ETW, AMSI and memory sweeps.
Important Concepts
In this section, we want to outline a few topics that will come up when building out the implant so that they make sense and we can demonstrate the implant effectively.
OS Shell Commands
When discussing OS Shell Commands, we don't mean just cmd.exe. This is anything that causes a child process to spawn to run the command, and every language has its equivalent. To name a few:
We've mentioned it a few times now, but let's look at why running post exploitation under cmd.exe is a bad idea. In a more traditional environment, running commands directly on a host may be considered normal behaviour for an operator. However, as we've explored, the level of detections and awareness that an operator can expect within a contemporary environment is far higher. Advances in logging, especially within Windows, a greater awareness of which events to pay attention to, and the spread of EDR and intermediary security devices have resulted in a state of play where directly running commands is at worst immediately considered an indicator of compromise, and at best a highly suspicious activity. This can be seen in the fact that it has a formal MITRE ATT&CK reference: Command and Scripting Interpreter (T1059).
While LOLBINs and aliases still have a role to play, using these for downloads and command execution is an exercise in operational security by obscurity. Techniques relying on increasingly obscure Windows built-ins can be quickly neutralised with a simple blocklist. The more robust approach is to avoid spawning a process at all. This may be by reimplementing the logic within the implant, or by finding the base functions that the commands themselves use and calling them directly, bypassing any calls to run commands via cmd.exe.
Fundamentally, Windows cannot block the features that Windows itself has to use. Since these calls are so ubiquitous, with every feature in Windows making use of them, defenders are now reliant on EDR using hooks and callbacks to monitor them.
Overall, there are more ways to reimplement and refactor code with the WinAPI than there will be to execute commands via OS-based command execution or random LOLBINs. This is something that Cobalt Strike documented: OPSEC Considerations for Beacon Commands.
WinAPI
The WinAPI is a set of functions that are exported from various DLLs, most of which can be seen in c:\windows\system32, and they give access to all the different components of Windows. Its utility is far too comprehensive to discuss, but here is an example. Within Kernel32.dll there's a function called VirtualAlloc:
For the most part, these APIs are documented on MSDN. As these functions are written by Microsoft, and marked as proprietary, projects such as ReactOS attempt to recreate them. So, when we discuss Userland Hooks and such in future blogs, we will also discuss how and why reimplementing the function, without using the function, will typically avoid specific detections.
For now, though, the WinAPI is giving us access to calls that will make this entire process easier.
Process Environment Block
Windows is an Object Oriented Operating System. Meaning, everything it operates on is an object and will have some form of data structure. Processes fall into this category. A Process, like calc.exe, has an object called the Process Environment Block (PEB) which contains all sorts of information:
Process Name
Location
Is it being debugged
Loaded modules
Environment Path
Etc
This is all stored in a structure like so:
Throughout this blog, we will interact with the PEB a lot, mainly to enumerate loaded modules and such. As this is a pretty extensive topic, we won't discuss it all and have some recommended reads. But for now, know the PEB as the structure which the process is built upon.
Position Independent Code
When we talk about Position Independent Code, we are talking about C code that is written in a very specific way, with additional restrictions. The goal is to have all the code we plan to execute inside the .text section of the PE.
Writing C normally will cause different parts of the code to be stored in different sections:
Global Variables in .bss
Imported DLLs in .idata
Exports in .data
CHAR* and WCHAR* in .rdata
Even with all those limitations, we can still achieve our goal. We just need to write code in a very specific way to avoid these different section allocations. By doing so, we ensure all the code is in the .text section. We need this because that is the section required for storing all of the binary code. If part of the code is in .bss, then it will crash because we're only going to extract the .text.
For example, let's assume this string:
Because this is read-only initialised data, it goes into .rdata. To get this to be PIC, we write it as such:
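As a minimal sketch of the idea (not Maelstrom's actual code), the string can be built as a character array, one element at a time, so that the source contains no string literal. The function name `build_greeting` and the string contents are hypothetical. Depending on the compiler and optimisation level, the stores may still be merged or hoisted into a read-only section, so the resulting object file should always be inspected rather than assumed to be clean.

```c
#include <stddef.h>

/* Hypothetical example: writes "hello" into out one character at a time,
 * so the source contains no string literal. The compiler may still merge
 * the stores, so inspect the object file to confirm nothing landed in .rdata. */
size_t build_greeting(char *out, size_t out_len)
{
    char s[6];
    size_t i;

    s[0] = 'h'; s[1] = 'e'; s[2] = 'l'; s[3] = 'l'; s[4] = 'o'; s[5] = '\0';

    if (out_len < sizeof(s))
        return 0;               /* caller's buffer is too small */

    for (i = 0; i < sizeof(s); i++)
        out[i] = s[i];

    return sizeof(s) - 1;       /* length without the terminator */
}
```

The array form trades readability for control over where the bytes end up, which is why it tends to be reserved for the handful of strings that must survive extraction of the .text section.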
What if we want to use VirtualAlloc? If it's just called as is, then it will have Kernel32 as an import. To get around this, we will need to dynamically load the DLL, and then resolve the address (more on this later).
One final note: to ensure we don't have the CRT controlling the execution flow of the PE, we need to make sure that the entry-point is not main or some other form of winmain, wmain, etc. We will show this later on in the Makefile.
For more on this, we recommend: PE Reflection: The King is Dead, Long Live the King.
Supporting Post Exploitation
When discussing implants, there are several methods of supporting post exploitation utilities. For the most part, implants will have a majority of their functionality embedded in the implant. So, when the implant receives a command, the command will go through some sort of switch statement:
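The shape of such a dispatcher can be sketched as follows. This is an illustration only: the command identifiers, the `agent_state` struct and the stub handlers are all hypothetical and are not Maelstrom's protocol. The handlers do nothing beyond recording that they were called.

```c
#include <stdint.h>

/* Hypothetical command identifiers; not Maelstrom's real protocol. */
enum command_id {
    CMD_NOP     = 0,
    CMD_SLEEP   = 1,
    CMD_CHECKIN = 2
};

/* Hypothetical result codes returned to the caller. */
enum dispatch_result {
    DISPATCH_OK      = 0,
    DISPATCH_UNKNOWN = -1
};

/* State touched by the stub handlers below. */
struct agent_state {
    uint32_t sleep_seconds;
    uint32_t checkins;
};

/* Route one command to its handler; unknown IDs are rejected, never guessed. */
int dispatch_command(struct agent_state *st, uint32_t id, uint32_t arg)
{
    switch (id) {
    case CMD_NOP:
        return DISPATCH_OK;
    case CMD_SLEEP:
        st->sleep_seconds = arg;    /* stub: only records the new interval */
        return DISPATCH_OK;
    case CMD_CHECKIN:
        st->checkins++;             /* stub: only counts the request */
        return DISPATCH_OK;
    default:
        return DISPATCH_UNKNOWN;
    }
}
```

The trade-off of this embedded style is visible even in the sketch: every new capability means another case and a rebuilt implant, which is exactly what the loader-style alternatives avoid.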
Alternatively, the implant could work as a loader; supporting:
Common Language Runtime (CLR): Execute .NET Assemblies in memory
Common Object File Format (COFF): Execute COFF Objects in memory
This ensures that the actual implant is significantly smaller, and all functionality is modular. However, this comes at the cost of constant memory allocations for each job. The method chosen is entirely dependent on the use case, but it is worth addressing. For us, we will stick to the traditional variant with all functionality embedded.
Types of implants
If the implant is to be .NET, then a simple assembly that's dynamically loaded is fine. However, this is not the type of implant we are discussing. For an implant written in C(++) there are some options on the type of implant to use.
Position Independent
The implant could well be Position Independent, with the entry point resolved at runtime. This is seen in SleepyCrypt, where the functionality is allocated with VirtualAlloc and cast to a function, like so:
Dynamic Link Library
More commonly, the implant could be written as a Dynamic Link Library (DLL). DLLs are typically loaded with LoadLibraryA:
The issue here is that LoadLibraryA requires the DLL to be on disk, which would break the golden rule of OpSec: Don't write to disk. Doing so will leave artifacts behind, allowing the implant to be signatured, and costing more time spent trying to break the signature.
The Golden Rule of OpSec: Don't write to disk!*
* Unless you need to, or unless you know how to avoid the detection, or... except... and... ... other caveats
Reflective DLLs
This led to a technique known as Reflective DLLs (RDLL), first produced by Stephen Fewer around 11 years ago. The ReflectiveDLLInjection repository contains the original code. Since then, the technique has been updated, but let's discuss the original. The description:
Reflective DLL injection is a library injection technique in which the concept of reflective programming is employed to perform the loading of a library from memory into a host process. As such the library is responsible for loading itself by implementing a minimal Portable Executable (PE) file loader. It can then govern, with minimal interaction with the host system and process, how it will load and interact with the host.
Essentially, what's going to happen is the RDLL will be allocated similarly to typical shellcode:
However, before the thread is created, the Relative Virtual Address (RVA) of the loader is calculated by parsing the DLL's PE headers for the Export Directory, and then walking all the exports to identify the RDLL's export (which is simply a function exposed from the DLL).
See also: The .edata Section, more on the PEB structure later.
Once the exported address has been found, the offset is added to the base address of the allocated space for the RDLL. Like so:
First off, lpBuffer can be from anywhere; downloaded from the internet, read from a file, etc. For an implant, it's likely downloaded over some sort of channel (HTTP). The buffer is then walked to find the RVA of the exported function.
Now that the offset is determined, and stored in dwReflectiveLoaderOffset, lpRemoteLibraryBuffer will be the base address returned from VirtualAllocEx. With the space allocated and the export offset found, the two can be added together to get the address of the exported function.
All that needs to happen now is for the thread to be created at this point to execute the loader:
All of this can be seen in LoadRemoteLibraryR from the repository.
The exported function can be seen in the DLLEXPORT of ReflectiveLoader; this is the function the thread triggers on. The code is well documented, so we will not discuss the codebase.
There are some issues with RDLLs, and we will discuss them in the future post in which we perform a static/runtime analysis of the implant. For defenders, make sure there is a signature for this technique and ensure the ReflectiveLoader string is treated as malicious, as seen on alienvault.com:
This is the technique we will follow for Maelstrom.
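For the defender-side advice above, the simplest possible form of that check is a byte search for the export name. The sketch below is an assumption-laden illustration, not a production signature: the function name `contains_reflective_loader` is hypothetical, it does not parse the PE or its export table, and it is defeated by simply renaming the export. In practice this would live in a YARA rule or an EDR content update.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the byte sequence "ReflectiveLoader" occurs anywhere in buf.
 * A plain byte search: it does not parse the PE or its export table. */
int contains_reflective_loader(const unsigned char *buf, size_t len)
{
    static const char needle[] = "ReflectiveLoader";
    const size_t nlen = sizeof(needle) - 1;
    size_t i;

    if (buf == NULL || len < nlen)
        return 0;

    for (i = 0; i + nlen <= len; i++) {
        if (memcmp(buf + i, needle, nlen) == 0)
            return 1;
    }
    return 0;
}
```

Keeping this well-known string in Maelstrom is deliberate, since it gives defenders following along something straightforward to detect.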
Recap of the Execution Flow
During Maelstrom: The C2 Architecture we discussed the execution flow that the implant will take:
Stage 0: A Position Independent Loader
Stage 1: Reflective DLL
By making the stage 0 loader PIC, we can wrap it into any other form of loader required. Once the Stage 0 executes, it will load a Reflective DLL which will be the main implant (Stage 1).
Simple.
Stage 0
Maelstrom WinAPI Resolution
Before getting into the stager, we need to cover how Maelstrom resolves WinAPI functions. In order to keep our actual C2's functionality somewhat guarded, we're opting to use publicly accessible code throughout this series. One solution is paranoidninja/PIC-Get-Privileges/blob/main/addresshunter.h, and an alternative could be: Speedi13/Custom-GetProcAddress-and-GetModuleHandle-and-more/blob/master/CustomWinApi.cpp#L168.
Parsing the PEB is not a difficult task, and it is all over the internet. CAPA even has rules for this. The function from Paranoid Ninja's example:
First, pass in a module base address and cast it to uiModuleAddress:
This is then used to identify the Export Directory. Again, this is a standard technique:
Get the address of the NT Headers by adding the e_lfanew field of the DOS Header to the base address. Then, using that value, extract the Data Directory struct. Finally, get the Export Directory specifically by offsetting the module base with the data directory's virtual address. Access to the Export Directory has now been achieved.
Now it is just a case of looping through all the exported functions from that directory until the strings match:
As strcmp cannot be used without resolving it... it's easier to just get the source code:
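A freestanding comparison of this kind can be sketched as below. The name `my_strcmp` is hypothetical (the walkthrough refers to its own version as STRCMP), and this is a generic implementation rather than the exact source the original uses.

```c
/* Freestanding string comparison: negative, zero, or positive like strcmp(). */
int my_strcmp(const char *a, const char *b)
{
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    /* Compare as unsigned char, as the C standard specifies for strcmp. */
    return (int)(unsigned char)*a - (int)(unsigned char)*b;
}
```

Because it has no dependencies, the function compiles entirely into .text and adds nothing to the import table.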
When STRCMP matches, we return the symbolAddress after the break:
So where is the module base address coming from? Well:
Three DJB2 hashes are defined:
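For reference, the textbook DJB2 algorithm is sketched below. The source does not show which variant Maelstrom uses, so the details here are assumptions: a 32-bit accumulator, the usual seed of 5381, and byte-wise hashing of a narrow string. Variants that fold case or hash wide strings (as module names in the PEB are stored) will produce different values.

```c
#include <stdint.h>

/* Classic DJB2: start at 5381, then hash = hash * 33 + c for each byte. */
uint32_t djb2(const char *s)
{
    uint32_t hash = 5381;

    while (*s != '\0') {
        hash = (hash << 5) + hash + (unsigned char)*s;  /* hash * 33 + c */
        s++;
    }
    return hash;
}
```

Comparing 32-bit hashes instead of names keeps the strings themselves out of the binary, at the cost of a small, usually acceptable, collision risk.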
Then, parsing the PEB we can obtain the DLLBase:
First off, get the PEB Struct:
Where PPEB_PTR is:
Reading from offset 0x60 gives access to the PEB (on x64, this is an offset into the GS segment, which points at the Thread Environment Block). Next, we can get the PEB_LDR_DATA struct by simply accessing it:
Then get access to the module list:
As seen in the struct:
Then loop over it until the hashes match. When they do, that will be the DLL required.
Now it's a case of casting to the function type, but before that, here is how the APIs are stored:
In the case of LoadLibraryA:
Then using it:
Now onto the stager!
Quick recap of Stage 0. Before running malicious code on a host to get an implant, some initial enumeration and checks are going to be put into place. For an Adversary Simulation exercise, this keeps the attackers within scope, whilst also ensuring that the implant is only executed when it is safe to do so.
Additionally, this entry point will all be Position Independent, meaning that all of the code will be within the .text section, allowing the opcodes to be extracted and giving shellcode that can be executed by other methods.
Note, Position Independent Code will not be discussed at length within this post, it is recommended to read Executing Position Independent Shellcode from Object Files in Memory.
Functionality
In this section, we want to discuss some functionality that can be added to a stage 0. Obviously, it doesn't ALL need to go in; these are just some things we found interesting and/or useful.
Environmental Keying
First up, Environmental Keying, or Guardrailing. This has two purposes:
If the Environmental information embedded in the stager does not match what was enumerated, then return.
Encrypt the stage 1 DLL with some information obtained from the environment, and decrypt it at runtime.
The second point can be completely automated. This is not something done in Maelstrom, but it is easy to send some information back to the C2, and then encrypt the DLL with that information before returning it to the stager.
As far as methods of doing this, there are a ton and quite frankly it's down to creativity. A few examples can be shown here:
An even easier method is to use something like GetComputerNameW or GetUserNameW. This is pretty basic and a combination of these types of calls could be used.
In the case of Maelstrom, we simply hash the computername and check it with this function:
Which is called like so:
AES-256 encrypting a payload using this technique is covered in Greta: Windows Crypto, and Recursive Keying. Maelstrom will not make use of this as it is purely a proof-of-concept.
So, back to the keying. If the computername doesn't match, then it returns -1 and will exit. Otherwise, it moves on.
Detecting Suspicious Processes
This is a fun one; it adds an extra layer of hindrance for blue teams. It's quite simple: if a process is found, exit. In the following example only one process is being checked for, but it's no extra effort to loop over a bunch:
Loop over all processes; if the hashed value of a process name is the same as the one defined, then return TRUE. In this case, it is ProcessHacker.exe:
This is executed like so:
Anti-Sandbox
Sandboxes are a great way to automate and identify what the purpose of malware is. Essentially, they run malware inside an isolated virtual machine, watch its behaviour, and report on it.
Commonly, these are small virtual machines with a limited amount of time they can wait. Some common solutions to handling sandboxes:
Waiting for the expiration time (usually 180 seconds)
Only executing if not in a virtual machine
Only executing if a disk size is above a certain threshold
These are just a few to consider. In the case of Maelstrom, we simply check RAM size > 4:
If this function is true, then we continue.
Combined with a sleep:
Anti-Debug
Anti-Debug, again, is about creativity. Repos such as LordNoteworthy/al-khaser contain loads of examples of this, however Maelstrom keeps this simple:
Read the PEB Struct, check if BeingDebugged is set to 1. Simple. Looking at the AntiDebug section of Al-Khaser there are tons of methods; just implement these as and when needed.
These techniques are useful for hindering blue teams if the payload is retrieved; they will slow down efforts to identify the purpose of the malware, as well as to identify the server. This should not be the only method of doing this. For example, if it is debugged and the IPs of the server are found, then there should be server side protections to control which implants are allowed to communicate with the server.
Downloading the Reflective DLL
For this, we will use WinHTTP as the code is ready and accessible. WinINet is an alternative library that could be used instead. For readability of code, the following struct is defined:
And then passed into the function:
We'll get to that shortly. But first, the config of the request is defined:
These strings are hard-coded in the function, which does not support any sort of update. Also, the password that the server requires is hardcoded in the header. Finally, these strings are in the array format so that they are placed within the .text section.
We now create a few variables, including the port:
We aren't going to step through the code, but there are a few things to point out.
If it's SSL, set these flags:
And:
Then, this is how headers are added:
If multiple headers are required, then the WCHAR needs to have them in the same string, separated by \r\n as per the RFC.
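To illustrate the separator, the sketch below joins two header lines with a carriage return and line feed between them. It is a simplified, portable stand-in: it uses narrow `char` strings and `snprintf` rather than the WCHAR arrays described above, and the function name `join_headers` and the header names in the usage note are hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

/* Joins two header lines into one buffer, separated by CRLF.
 * Returns the number of characters written, or -1 if out is too small. */
int join_headers(char *out, size_t out_len, const char *h1, const char *h2)
{
    int n = snprintf(out, out_len, "%s\r\n%s", h1, h2);

    if (n < 0 || (size_t)n >= out_len)
        return -1;
    return n;
}
```

Calling `join_headers(buf, sizeof(buf), "X-A: 1", "X-B: 2")` leaves a single string in `buf` with the two lines split only by the CRLF pair, which is the form a single add-headers call expects.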
After the request is done, we fill the structure:
The entire process is encapsulated in the following request:
In the stage 1 section we will discuss why a Reflective DLL was chosen and what it is, but for now let's discuss how it will be loaded. For reference, here is the code used to execute the DLL:
memcpy is reimplemented using the source code:
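A byte-wise copy of this kind can be sketched as below. The name `my_memcpy` is hypothetical and this is a generic implementation, not necessarily the source the original borrows. One caveat worth stating as an assumption about toolchains: optimising compilers can recognise this loop and replace it with a call to the real memcpy, so the build flags and the resulting object file need checking.

```c
#include <stddef.h>

/* Freestanding byte-wise copy; like memcpy(), the regions must not overlap. */
void *my_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    while (n--)
        *d++ = *s++;

    return dst;
}
```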
We discussed this earlier on, but lets revisit. We first need to identify the offset of the export function so we can get the proper address to start a thread on the function:
Let's go over the GetReflectiveLoaderOffset() function.
The function is declared like so:
The parameter taken in here is the unsigned char* buffer containing the DLL retrieved from the server.
First things first, define the exported function name:
With that, the next thing that happens is the IMAGE_DOS_HEADER struct is identified within the buffer:
The struct:
From here, the IMAGE_NT_HEADERS are extracted:
The struct:
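The header walk described above can be sketched without the Windows headers. This is an illustrative parser, not the post's code: it reads the fields by their documented offsets in the PE format (the "MZ" magic at offset 0, e_lfanew at offset 0x3C, and the "PE\0\0" signature at e_lfanew). The function name find_nt_headers is hypothetical, and a little-endian host is assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DOS_MAGIC       0x5A4Du      /* "MZ" read as a little-endian WORD */
#define NT_SIGNATURE    0x00004550u  /* "PE\0\0" read as a little-endian DWORD */
#define E_LFANEW_OFFSET 0x3C         /* offset of e_lfanew in the DOS header */

/* Hypothetical helper: returns the file offset of the NT headers inside
   `buf`, or 0 if the buffer does not look like a PE image.
   Assumes a little-endian host. */
size_t find_nt_headers(const unsigned char *buf, size_t len)
{
    uint16_t magic;
    uint32_t e_lfanew;
    uint32_t signature;

    if (buf == NULL || len < E_LFANEW_OFFSET + sizeof(uint32_t))
        return 0;

    /* memcpy avoids unaligned reads from the raw buffer. */
    memcpy(&magic, buf, sizeof magic);
    if (magic != DOS_MAGIC)
        return 0;

    memcpy(&e_lfanew, buf + E_LFANEW_OFFSET, sizeof e_lfanew);
    if (e_lfanew > len - sizeof signature)
        return 0;   /* the signature would fall outside the buffer */

    memcpy(&signature, buf + e_lfanew, sizeof signature);
    return (signature == NT_SIGNATURE) ? (size_t)e_lfanew : 0;
}
```

Bounds-checking e_lfanew before dereferencing matters here because the buffer comes from the network and cannot be trusted to be well-formed.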
Extract the export directory, the virtual addresses, and so on:
And then loop over all the exported function names by casting the RVA to an offset:
Where RVA2offset() is:
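For readers unfamiliar with the conversion, a helper of this kind typically walks the section table and maps a relative virtual address (RVA) back to an offset in the raw file. The following is a minimal sketch under that assumption, not the post's listing: the section_info struct is a hypothetical, trimmed-down stand-in for the three section header fields involved, and RVAs that fall inside the headers are not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of a PE section header. */
typedef struct {
    uint32_t virtual_address;      /* RVA at which the section is mapped */
    uint32_t size_of_raw_data;     /* size of the section in the file */
    uint32_t pointer_to_raw_data;  /* file offset of the section */
} section_info;

/* Maps an RVA to a raw file offset by finding the section that contains
   it. Returns 0 when no section covers the RVA. */
uint32_t rva_to_offset(uint32_t rva, const section_info *sections, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t start = sections[i].virtual_address;

        if (rva >= start && rva - start < sections[i].size_of_raw_data)
            return rva - start + sections[i].pointer_to_raw_data;
    }
    return 0;
}
```

With a section mapped at RVA 0x1000 and stored at file offset 0x400, RVA 0x1010 resolves to file offset 0x410. The conversion is needed because the DLL is still a flat file in a buffer at this point, so the RVAs in its export directory do not line up with buffer offsets.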
Then, using a custom strstr, compare the export name with the one we hardcoded at the start:
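A custom strstr is usually just a nested scan that avoids linking against the C runtime. The version below is a generic sketch of that idea, not the post's implementation; the name find_substring is hypothetical.

```c
#include <stddef.h>

/* Hypothetical strstr replacement: returns a pointer to the first
   occurrence of `needle` in `haystack`, or NULL if it is not present. */
const char *find_substring(const char *haystack, const char *needle)
{
    if (*needle == '\0')
        return haystack;   /* an empty needle matches at the start */

    for (; *haystack != '\0'; haystack++) {
        const char *h = haystack;
        const char *n = needle;

        /* Advance while both strings agree. */
        while (*h != '\0' && *n != '\0' && *h == *n) {
            h++;
            n++;
        }
        if (*n == '\0')
            return haystack;   /* the whole needle was consumed */
    }
    return NULL;
}
```

Note that a substring comparison also matches longer names that merely contain the target, which is worth keeping in mind when a DLL exports several similarly named functions.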
At this point, there should be some clear OpSec issues. If they're not obvious, we will point them out in the next few sections!
Once this is done, and the address of the exported function has been obtained, we can simply start a thread on it:
Aside from the glaring IOC here, there is one missing WinAPI call which would operate as a cleanup... More on that in the OpSec review posts.
Maelstrom's Entry-point
This is currently how the stager looks:
We deem this the safe version, as it has all the checks we discussed. As SAFE is a preprocessor definition, we can control whether or not it's used by passing the -DSAFE flag to MinGW.
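As a small illustration of how a compile-time definition gates code, the sketch below compiles a guard in or out depending on whether SAFE is defined. The function names and the environment_ok parameter are hypothetical and stand in for whatever the real checks return.

```c
/* Reports whether this build was compiled with -DSAFE. */
int safety_checks_enabled(void)
{
#ifdef SAFE
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical entry point: with SAFE defined, a failed check stops
   execution early; without it, the check is not compiled in at all. */
int run_entry(int environment_ok)
{
#ifdef SAFE
    if (!environment_ok)
        return -1;
#else
    (void)environment_ok;   /* unused in the non-SAFE build */
#endif
    return 0;
}
```

Building the same source once with -DSAFE and once without produces two binaries from one code base, which is what lets a makefile emit both variants.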
The makefile:
For the eagle-eyed, this is fully position-independent and we can show this at the end of the post.
Stage 1
Stage 1, or Maelstrom.x64.dll, is the actual implant. As tempting as it is to utilize a typical PE and operate out of main, it's probably best not to. If something like sRDI or Donut is used, they work by bootstrapping the PE. To avoid this, and any other complication, we found a Reflective DLL to be the most effective and the easiest to work with.
Custom Reflective Loader
As we discussed earlier on, Stephen Fewer provided the first proof-of-concept of Reflective DLLs. Since then, the community has developed a few iterations:
For our demonstration, we will use the original proof-of-concept as this uses common IOCs which we want to keep in the project to ensure that Maelstrom is easily detectable.
DLLMain
Once the DLL has been loaded from the Stage 0, DllMain will be:
When the DLL load reason is DLL_PROCESS_ATTACH, a new thread is created on Maelstrom() which looks like this:
DLL Debugging
To debug this in Visual Studio, the pre-processor definition of _DEBUG is checked for. If it's not present, then the thread is allowed to be created. Otherwise we resolve this function:
And a separate loader was written to debug it:
We found this to be a cleaner debugging experience than messing with x64dbg.
Checking In
As soon as the implant is launched, the first thing to occur is some basic enumeration which will identify the host:
In the code above, the process, computer, and username are packed into a JSON string, along with the process ID. This is just XOR'd with a hardcoded hex value as a proof-of-concept. In a production C2, this should be encrypted with something like AES256-CBC or an equivalent encryption algorithm. As this is an example project, we skip this step.
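To make it concrete why a single-byte XOR is only a placeholder, here is a minimal sketch of the operation. The function name xor_buffer and the key value used in the note below are hypothetical, not the values from the project.

```c
#include <stddef.h>

/* XORs every byte of `data` with `key`, in place. Applying the same key
   twice restores the original bytes, so one routine both encodes and
   decodes. */
void xor_buffer(unsigned char *data, size_t len, unsigned char key)
{
    for (size_t i = 0; i < len; i++)
        data[i] ^= key;
}
```

Calling xor_buffer twice with a key such as 0x5A returns the original JSON. Because only 256 keys exist, and JSON has a predictable first byte, anyone inspecting the traffic can recover the key almost immediately, which is why real encryption is recommended for production use.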
This was discussed in Maelstrom: Building the Team Server, where the aim was to make the data sent between client and server difficult to read. Whether it's layers of encryption or masking data as a MAC address, we highly recommend that something is done to transform the data. For this demo, we skip all of that, so it's just sent to the Initialise() function:
Which is just a wrapper around the SendRequestA() function:
The SendRequestA() function uses WinHTTP, and relies on a number of WinAPI calls. So, let's get into the configuration of the requests.
Similarly to stage 0, the config is hard-coded:
And some additional config:
Again, to repeat ourselves, do not leave these hard-coded.
Once it has initialised, we hit Start():
This is our simulation of tasking. Essentially it is operating as the component of the implant which will check, run, and return tasks. We are not providing that functionality though.
Safe Sleeping
One of the important considerations is how the implant will look in memory between operations. If the implant is just idling with nothing to do, it should sleep in such a way that memory scanners or engineers cannot easily identify it as malicious. This is something we will look at more in the runtime analysis, but let's take a quick look. If Process Hacker is used and the RWX region identified, this is how the region looks:
In the above, we can see the MZ header, the DOS message, and various section names. This needs to be removed, but we will not be providing a solution as we want to align with the objectives we set out in section one. We will, however, offer some example projects for the enthusiastic reader:
Gargoyle: A technique for hiding all of a program’s executable code in non-executable memory. At some programmer-defined interval, gargoyle will wake up and, with some ROP trickery, mark itself executable and do some work.
Foliage: Another ROP-based project (Also adapted and demonstrated in Brute Ratel Live Demo @Un1k0d3r's Patreon (Charles Hamilton)).
SleepyCrypt: Position-Independent Code to encrypt sections.
ShellcodeFluctuation: A proof-of-concept implementation of another in-memory evasion technique that cyclically encrypts and decrypts the shellcode's contents, making it fluctuate between RW (or NoAccess) and RX memory protection.
Studying “Next Generation Malware” - NightHawk’s Attempt At Obfuscate and Sleep: Replicating one of Nighthawk's sleep protection mechanisms
Ekko: Proof-of-concept of sleeping with Timers
On May 5th 2022, Austin Hudson posted a tweet with a blog: Studying “Next Generation Malware” - NightHawk’s Attempt At Obfuscate and Sleep
This blog went through how Austin was able to identify a sample of Nighthawk, a proprietary C2 from MDSec, a UK-based cyber security consultancy. In this post, Austin discusses how the technique uses thread contexts and callbacks to flip the memory region's permissions (which we will discuss further in later posts).
For clarity, the research behind this technique, on behalf of MDSec, was carried out by Peter Winter-Smith and modexp.
Once the proof-of-concept was made public by Austin, C5pider then built it out into an open-source tool called Ekko. However, this proof-of-concept uses the base address of the entire image as the region to protect, which only works when the malware is the entire EXE on disk, or is loaded as a proper DLL. This can be seen on line 36:
If the malware loads the implant entirely through memory, as with a Reflective DLL, this technique will not work, because the GetModuleHandleA call will return the base address of the image the DLL is being loaded into. For example, if the DLL is being reflectively loaded into calc.exe, then GetModuleHandleA will return the base of calc.exe.
Producing Shellcode for Loaders and Droppers
As stage 0 is already position-independent, and the build generates both an exe and a bin for each stage 0 type, we can easily get the hex from the bin with:
Produces:
This can then be loaded with:
Instead of calling WaitForSingleObject on the thread, we use a Sleep in the above because the shellcode will create a new thread and exit when the RDLL is loaded, causing the thread we are waiting on to exit successfully. So, for demonstration purposes, we just sleep.
Bear in mind that, with SAFE defined, it goes up to 8192.
To see how Metasploit got their payload so small, see block_reverse_https.asm and the build script at build.py.
Now that shellcode is achieved and is loadable, this can now be wrapped in any shellcode loader:
.NET
Go
Rust
Nim
You name it, it should work!
Conclusion
At long last, we finally have some code that runs, and a plan for more functionality and security. There are manifold ways to progress the implant from here, from improving the implant's operational security to fleshing out its communication channels.
This blog post has been pretty heavily in favour of the offense, and light on operational security. As we've discussed, defensive techniques such as hooking AMSI and ETW TI present a potent limitation on the operational security of the implant. Our next two posts will look at these protections, how they work, and how an implant can attempt to bypass them.