Maelstrom #2: The C2 Architecture
A look into the design choices behind the C2, alongside some design concepts to keep it stable and the workflow smooth.
In this post, we discuss some of the architectural decisions involved in writing a C2. In this episode we'll examine the decisions required for an implant, which we'll explore as we write the accompanying proof-of-concept C2. As it's not a C2 without a snazzy name, we've named this exemplar C2 "Maelstrom". We'll also reference a more feature-complete private C2, "Vulpes", to illustrate the difference between the proof-of-concept used for this series and an operational C2.
This post will cover:
reviewing some concepts which may impact our choice of language,
choosing a language for the implant,
choosing a language for the server,
choosing a compiler,
choosing the appropriate cryptography, and
some basic exercises for the reader on operational security considerations.
Don't panic! We are just designing the implant here - we will cover the implementation and techniques in later posts. In this blog, we're discussing the decisions we made about the implant's design.
Malware is just software with malicious intentions, and being naughty doesn't mean that the code itself can't be well-written - and it definitely doesn't mean that the code is bug-free! When reversing malware samples, we find that many are poorly architected and written: they use insecure behaviours to manage their code, leave clear indicators of compromise, and even degrade the performance of the target device with memory leaks and other bugs. All of this makes the implant more visible, and thus less secure.
When you're writing an implant that is going to run for a while, things like memory leaks need to be taken care of properly - even if you've gotten past the EDR, unexplained crashes get noticed. There are many solutions to this, but the simplest, which we've used in both Maelstrom and Vulpes, is keeping the code object-oriented. This provides structure and standardisation to specific functionality, and makes resource management that much easier.
As we've just discussed, objects make code more manageable and can improve the security of the implant by simplifying the creation and deletion of objects in memory.
Using Objects for Resource Management
Here is some pseudo-code from Vulpes:
The example above is a class to handle the execution of .NET with the Common Language Runtime (CLR). When the class is instantiated, the CLR is also initialised. Then, when the function returns and the object goes out of scope, its destructor is called, causing CleanupCLR() to execute.
This leads right into the next topic: Resource Acquisition Is Initialization (RAII).
Resource Acquisition Is Initialization, or RAII, is a C++ programming technique which binds the life cycle of a resource that must be acquired before use (allocated heap memory, thread of execution, open socket, open file, locked mutex, disk space, database connection: anything that exists in limited supply) to the lifetime of an object.
With these simple concepts, we reduce the human error that could otherwise be introduced if we had to call a cleanup function manually every time we carry out an action. With acquisition and release handled implicitly at construction and destruction, the chance of a forgotten object hanging around in memory is hugely reduced. This is crucial for an implant - a forgotten object is another potential indicator of compromise (IoC) - so by ensuring that objects are properly and fully removed after use, our implant becomes more secure, and may even cause fewer blue-screens while it's at it!
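Python does not guarantee when an object's finalizer runs, so its closest analogue to RAII is the context manager: the resource is acquired in `__enter__` and released in `__exit__`, and `__exit__` runs even if the block raises. The sketch below is written in Python to match Maelstrom's server; the `ScopedBuffer` class is hypothetical and only illustrates the pattern. In a C++ implant such as Vulpes, the constructor and destructor play these two roles.

```python
class ScopedBuffer:
    """Hypothetical wrapper: the buffer's lifetime is bound to a `with` block,
    mirroring how a C++ RAII object is bound to its enclosing scope."""

    live = 0  # number of buffers currently held, used to show nothing leaks

    def __init__(self, size: int) -> None:
        self.size = size
        self.data = None

    def __enter__(self) -> "ScopedBuffer":
        # Acquire the resource on entry (the C++ constructor's job).
        self.data = bytearray(self.size)
        ScopedBuffer.live += 1
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Release on exit, whether the block finished normally or raised
        # (the C++ destructor's job). Overwriting the contents first is
        # illustrative only: Python may still hold copies elsewhere.
        if self.data is not None:
            self.data[:] = bytes(len(self.data))
            self.data = None
        ScopedBuffer.live -= 1
        return False  # never swallow the exception


def run_task() -> None:
    with ScopedBuffer(16) as buf:
        buf.data[0] = 0x41
        raise RuntimeError("task failed mid-way")


try:
    run_task()
except RuntimeError:
    pass
print("buffers still held:", ScopedBuffer.live)  # buffers still held: 0
```

Even though `run_task` fails part-way through, the buffer is released, which is exactly the "no forgotten object" property described above.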
Moving away from specific programming concepts: before settling on the implant's language, we should also consider the goals of the implant. A prioritisation framework helps here. By structuring our intentions, the decision becomes easier. As an added bonus, if this is a commercial C2, a managerial sign-off on your intentions can be brilliant for some CYA later down the line, if required.
There are plenty of prioritisation techniques, but we went with MoSCoW as it's fairly straightforward and easy to translate into GitHub tags and milestones:
The term MoSCoW itself is an acronym derived from the first letter (ish) of each of the four prioritisation categories: M - Must have, S - Should have, C - Could have, W - Won't have.
This is particularly useful with C2s, as the functionality (and potential functionality) can become overwhelming. For example, a very secure method of sleeping could be extremely important for the C2's use case (Must), whereas ransomware simulation might be cool but is completely out of scope and a huge time-sink (Won't). There is a huge number of features that the implant, communication channels, and server could include, but in a world of finite time and effort, not all of them can be entertained! When an idea for a feature appears, bear this model in mind. It will save you a lot of time and hassle if it is adopted early on and stuck to throughout.
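As a minimal sketch of how the four categories might be encoded for a backlog that maps onto GitHub labels: the label strings, the `plan` helper, and two of the four backlog entries are hypothetical; only "secure sleep" (Must) and "ransomware simulation" (Won't) come from the examples above.

```python
from enum import Enum


class Priority(Enum):
    """The four MoSCoW categories, each with a (hypothetical) GitHub label."""
    MUST = "moscow:must"
    SHOULD = "moscow:should"
    COULD = "moscow:could"
    WONT = "moscow:wont"


# Hypothetical backlog. "secure sleep" and "ransomware simulation" are the
# examples from the text; the other two entries are made up for illustration.
BACKLOG = {
    "secure sleep": Priority.MUST,
    "multiple concurrent operators": Priority.SHOULD,
    "themed web UI": Priority.COULD,
    "ransomware simulation": Priority.WONT,
}

PLANNING_ORDER = [Priority.MUST, Priority.SHOULD, Priority.COULD]


def plan(backlog: dict) -> list:
    """Return in-scope features, Musts first; Won't-haves are dropped entirely."""
    in_scope = [f for f, p in backlog.items() if p is not Priority.WONT]
    return sorted(in_scope, key=lambda f: PLANNING_ORDER.index(backlog[f]))


for feature in plan(BACKLOG):
    print(f"{BACKLOG[feature].value:15} {feature}")
```

Keeping "Won't" as an explicit category, rather than simply deleting the idea, records that the feature was considered and deliberately rejected.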
A C2 benefits from known standards, development styles, and patterns as much as any other software - arguably more so, since implants are required to run in isolation and there are few opportunities to debug during an engagement!
As we've said a few times throughout this series, we aren't going to be building a huge and fully functional C2. Because of our focus on implant detection on the host, we're going to discuss the specifics of the server a lot less.
The example server we're using here isn't especially important, as it's simply the user interface. In a real-world C2, decisions such as text user interfaces versus thick clients versus web browsers matter far more: while they won't affect the operation of the C2, they do affect the user experience. A C2 is written as a quality-of-life tool, so the A E S T H E T I C and V I B E S are important and should be considered - but time spent here isn't going to improve its chances against an EDR! In this case, Maelstrom's server is simple and uses Python - it's easy to write and natively multi-platform, and we only need a straightforward API to receive our example requests. In a full-fat C2, supporting more detailed operations and potentially multiple concurrent users, more time would of course be spent here.
With that in mind, let's look at our options for the implant!
There are tons of languages available, but to name a few that often crop up and are worth considering:
C/C++: Probably the most common (see BruteRatel, Nighthawk, Vulpes & Havoc)
Go: Another option; much harder to detect statically, but the compiled binaries are huge (see Sliver)
Nim: This could be good, but we haven't seen many offensive tools written in it as of yet (see Nimplant)
Rust: Similar to Nim, the language itself is pretty interesting, but there is currently less of a community (see Offensive Rust)
Naturally, there's nothing except your dignity stopping you from writing a C2 in Java or PHP, or any other language.
When it comes to making a decision on which language to use, we have a few things to consider:
Which language provides the most utility to get to the end-goal required?
Does the language support your requirements? For us, that meant low-level memory manipulation for improved opsec.
What is the build process like? Does it need to be cross-compilable? Is compilation fast? Are the generated binaries practical for engagements?
Can the code be obfuscated? If the product is to be commercialised, remember that the server will be sent out to customers; if it's written in Python, the source will be difficult to protect.
Finally, which language are we happy working with and maintaining?
For context, for the Vulpes C2's implant, mez-0 chose C++ for the following reasons:
Easy Object Oriented Programming (OOP) and Resource Acquisition is Initialisation (RAII).
Naturally easy access to manipulate memory.
Cross-compilation and mingw compilation is simple.
The code base can be easily obfuscated.
It's not a new language to the authors, and it's one with a wide ecosystem for debugging support.
For our exemplar C2, Maelstrom, the implant will be written in C because:
It's just a PoC so we don't need extra functionality.
We want direct access to memory, tiny binaries, and easy position independent code.
It just needs to run on our development machines in a controlled environment.
We're already sharing the source code and we're (deliberately) writing it badly.
C's C isn't it.
With an idea of what we want to achieve and how we want to write it, we should also consider what behaviours we want to implement and how this should be defined and controlled. In order to have the C2 operational in different environments, the implant needs to be adaptable. Ultimately, configuration or not, values like the server's address need to be stored somewhere, and our operational security is improved if this isn't easily accessible by EDR.
Most C2s will also include some way of specifying the target platform and architecture at a minimum, and normally also include options for more advanced configuration, such as defining how memory is allocated, when and how the beacon hides its presence, and other configuration options to improve its effectiveness and operational security. Instead of re-compiling and re-sending a hard-coded implant every time we change a setting, we need a way to securely store this configuration so the implant can access and update it.
Cobalt Strike does this with Malleable PE, Process Injection, and Post Exploitation. Its configuration is stored within the .data section (the part of the executable that holds initialised global and static variables) and lightly obfuscated by XOR'ing it with a 4 byte key. While this makes the configuration readily available to the beacon, it also means that it can be easily extracted[2] from the target machine.
Another option is to embed a configuration file within another section, such as the .rsrc section[3], but this is likely to fall victim to the same process if researchers are able to get their hands on the portable executable (PE)[4].
Our execution path, which we will discuss in the following section, adds more layers to this and makes it harder to get access to the actual implant. Again, we don't want to provide a C2 which works well, we just want to discuss it and make sure the ideas are known, so we won't actually be implementing this in the released version of the C2.
Whether the implant should be staged or stageless is a choice made for every implant in every C2 - even Metasploit. The decision is simple: should all potential content be included initially, or should it be provided in stages? A single stage (stageless) results in a larger initial download, with potential indicators of compromise from content that may never even be used. Conversely, a staged payload may decrease the chance of being caught, but means that more content must be hosted externally and requested. Famously with Cobalt Strike, if the staged payload is used, anyone can request the full beacon, and therefore the configuration. But we feel a staged approach, if done correctly, is still something that should be considered.
Vulpes does make use of this staged method, along with individually encrypted C2 connection details, and downloads the full configuration at runtime. With this approach, a defender would need to find the C2 authentication information, and then match all the correct keying information to decrypt it. Even then, they would also need to dig the implant out of memory (which is also masked and obfuscated, more on this later).
With all this decided upon, we can now plan how our implant will be run. The implant could just immediately call out to the C2 server and request a command, just like a classic nc or PowerShell reverse shell. But there's no guarantee that this won't be detected, or that the implant hasn't been opened in the wrong location (for example, on the target user's home computer).
There are a number of steps that should happen within the implant between being initially run and being able to run commands from the server securely. We need to consider whether we are actually on the victim machine (rather than trapped in an antivirus sandbox, for example), whether the implant itself is able to run, and whether further code and commands should be obtained.
The following steps represent an acceptable minimum:
Is this the intended environment?
Is this a sandbox?
Are there any suspicious processes running?
Is there EDR active?
Is the C2 server accessible?
Securely retrieve the code, then
Do the hack.
The following pseudocode shows what this looks like in our example Maelstrom implant:
So before any code is retrieved and run, the implant first ensures that it is safe and correct to do so. This means that:
There's no packed binary inside the portable executable file (PE) which may act as an IOC.
With the staged ReflectiveDLL, we get a small initial payload.
The keying ensures that our implant is run on the correct machine - important where we have a specific scope of engagement.
By listing the running processes and drivers, we can return a targeted Reflective DLL which can subvert the specific Endpoint Protection in place (this code is left as an exercise for the reader).
The Implant -> C2 authentication removes the opportunity for the C2 to be enumerated.
Naturally, none of this has to happen; it is just what we found to be best after reviewing how other malware and implants behaved.
It should be noted that at this point the implant's executable has to be on disk for this to work. In a real scenario, the implant's stage 0 code should be adapted to:
Optional: Spawn a new Process (PPID Spoof if required.)
Inject the DLL into a remote Process
By doing this, the stage 0 loading functionality can return, and therefore exit cleanly. This is what would make the whole process flow properly. But, for now, we will stick with a loader because it's easier to debug.
Cryptography is maths, and maths is where most things in life get complicated. For that reason we won't go into much detail here, but at a high level, encryption should be included and configurable. EDRs can and do review the entropy of executables: a large encrypted blob looks more random than code typically does, which means the implant comes full circle back to looking like an IOC. That's not to say that data is best left in the clear - we don't want our implant and its configuration to be read by any passing process, and we especially want to ensure that an EDR can't cheat and just add a static signature to detect the implant.
Let's test out the entropy of encryption by generating three sample Metasploit implants - one encoded with XOR, one "encrypted" with RC4, and finally one encrypted with AES256:
We can use the ent utility to check the entropy of each file:
This gives:
Using AES256 for a packed PE gives an entropy of 7.98 bits per byte, whereas the XOR and RC4 samples sit at 1.5 and 1.3 respectively. A general rule of thumb could be to keep static files below 5, but that is down to the design. Wherever data is being transferred, though, especially if it includes client information, ensure that it is encrypted with AES256. It should also go without saying that SSL/TLS should be enabled so that your traffic can't be read in transit.
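`ent` reports Shannon entropy in bits per byte: 0 means every byte is identical, and 8 means all 256 byte values are equally likely. A minimal Python equivalent is sketched below, under the assumption that files are small enough to read into memory in one go; the function name `shannon_entropy` is our own.

```python
import math
import sys
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0),
    the same quantity that `ent` reports."""
    if not data:
        return 0.0
    n = len(data)
    # Sum -p * log2(p) over the observed frequency of each byte value.
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


if __name__ == "__main__":
    # Usage: python entropy.py FILE [FILE ...]
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(f"{path}: {shannon_entropy(f.read()):.2f} bits per byte")
```

This also shows why XOR alone does little for entropy. XOR with a single fixed byte is a one-to-one relabelling of byte values, so the frequency histogram is only shuffled and `shannon_entropy(bytes(b ^ k for b in data))` equals `shannon_entropy(data)` for any key `k`.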
Finally we need to manage how the implant will access the server. Not just for staging, but also to manage sending updates and responses and receiving commands. The implant should include a way to authenticate to the C2 server, and the C2 server should be able to receive communications from the implant.
The C2 server should naturally have some way to authenticate - we don't want to provide stages to any client. We also want to ensure that our server can't be easily located on the Internet and blow our cover, and that responses from the implant go to the right place.
As Maelstrom is a simple POC and we don't want to put an operational C2 out into the world, it uses a hardcoded header which will be used for authentication:
Obviously, in practice this header should be completely dynamic and could go literally anywhere in the request, since a production C2 shouldn't rely on a predictable static credential. For a proof-of-concept, though, we will just leave it as is.
As we're just making a simple POC C2, we're gonna take the easy way out and set up a Python Flask server. The API means that - provided the implant is aware of the endpoints (which we'll define in its configuration) - we can receive data in a structured way and co-ordinate multiple implants and users if required.
The implant will therefore handle:
Dynamic URI Handling
Requesting /a?something or /a/a/ should be fine
Each implant should have its own unique authentication
If headers are to be sent back and forth, they could be a good way to add validation to the request
e.g. if X-1, X-2, and X-3 are present, accept the request
Otherwise, ignore it
Consider using more than one API Endpoint, a setup could be:
Staging
New Beacon
Check-in
Return Job
Obviously the real endpoint names won't be these; this is just a general overview. By splitting the requests across different URIs, it becomes harder to pinpoint whether the traffic is in fact a C2. If there are 1000 requests to http://127.0.0.1/IMPLANT, then it's likely a compromise, but multiple requests to different pages should hopefully blend into the background.
This one was wordy, but the first one or two posts of this series will be. We are trying to set the scene before getting into code, explaining what we are setting up and why.
So, in this blog we have decided on a language for the server and implant, set an execution plan, and discussed some crypto and fancy hardening scenarios. The next blog will be going through and designing the implant!