Maelstrom #3: Building the Team Server
In this post we are discussing building a C2 Teamserver, common pitfalls, and the difficulty of identifying singular malicious requests.
In the previous post, Maelstrom: The C2 Architecture, we discussed the general architecture for a Command & Control Framework, including connections from the implant to the server and the execution flow of the implant itself. This post will continue this architecture discussion, this time looking at the further C2 development considerations around the user interface and experience. We'll also (lightly!) look at designing and protecting the C2's communication channels.
This post will cover:
choosing a user interface from command line, thick / thin clients, and browser-based UIs.
choosing a language for the backend to manage the implants (and the interface)
deciding which actions the C2 will handle for the user, including:
listener initialisation
payload generation
communications security
the choices we made for Maelstrom
As we've mentioned previously, most of the discussions we're going to have in this series will focus on the offensive and defensive actions to take on the C2 implant. However, we still felt it would be worth looking at the points listed above because they are relevant to the implant's behaviour and security. We won't (in this post at any rate) look too much at redirectors, channels, and other parts of the red team infrastructure.
Our preferred way of handling C2 communications is by using a simple API. HTTP exists in almost every client environment, and where it doesn't, traffic can be handled by intermediary devices (such as the aforementioned redirectors and channels which are so interesting that we won't discuss them here).
Our earlier choice to use an object oriented approach was partially informed by this decision, as the way we modelled the implant helped inform the structure our API should use. By using an OOP approach within the server as well, we found that the API architecture largely wrote itself. Ensuring that the API is written in a logical and predictable fashion makes future changes and additions so much simpler than having to un-spaghetti your spaghetti.
There are infinite ways to interact with a computer program, depending on how pedantic you feel like being. When it comes to C2s, there are three broad camps:
CLI (such as with Posh and Sliver)
Browser-based GUI (such as with Covenant and Mythic)
Client-based GUI (such as with Nighthawk, Brute Ratel C4, Havoc, and Cobalt Strike)
These are broad camps, and it's worth stating that a C2 can absolutely have both a command line interface and a graphical interface. Several C2s also support "virtual" CLI interfaces where a browser based GUI mimics or passes through a CLI.
These user interfaces are not necessarily tightly bound to the server itself. C2s, especially corporate C2s or those intended to be part of a wider red team infrastructure, will often come with a separate way to interact with a central "team server". A good example of this is Cobalt Strike, where the main team server is remotely accessed by a client on the operator's local machine.
This may all seem somewhat whimsical to discuss, but the design of the interface is important. It's the only way that the operators will be able to interact with the C2, and what is shown and not shown will influence its effectiveness. If the only way to properly secure traffic and implants is buried within submenus, while a quick and lazy option exists, an operator will use the quick and lazy option. Even in situations where the default choice is a good one, heavy overreliance on a single technique can itself become an indicator of compromise or a fingerprint (such as the PowerShell persistence used by APT28).
With a clear design and a usable UI, opsec becomes more than just muscle memory and runes.
The C2 Matrix has an Ask The C2 Matrix function which allows users to filter on all sorts of C2 components, with UI being one of them. As we've previously mentioned, there are three broad camps: CLI, Web GUI, and Client GUI.
Let's look at some examples of contemporary C2s which implement the different options, and some libraries which could help you achieve a similar experience:
C2s
Libraries worth a look
Web Application
C2s
Libraries worth a look
GUI
C2s
Libraries worth a look
In a real world scenario, the server software should be tested for its ability to handle requests in a timely fashion. For example, ThePrimeagen did a video called Go is faster than Rust??! Go vs Rust vs TypeScript Servers (as a scientist) which tested the loads between Go, Rust, and TypeScript. We would recommend a similar test for the teamserver.
For example, at around 5:34, Primeagen discusses the experiment in which 4 Linode machines make 800 connections each to a Golang and Rust Server. Each connection plays a game and measures how many active games were played:
Golang was able to handle the loads extremely well, whilst being easy to develop. Rust, on the other hand, was up and down whilst being a lot more difficult to actually write. This server was not a malware-oriented team server, but the data is still something to consider.
With that said, we are going to work with Flask/Python, purely out of ease for this demonstration.
One final note here before we move on to code - outside of what we're discussing, this code is deliberately dependent on insecure behaviours. Maelstrom is just for demonstration purposes. Some of these red herrings will become obvious throughout the next few blog posts.
When it comes to communication, there are a multitude of protocols that should be considered.
First off, Hypertext Transfer Protocol (HTTP). HTTP is kind of perfect for C2 communications because data can be embedded in several parts of the request. This is something that most, if not all, C2s will support. Below are some locations in the request where data can be embedded:
URIs
Headers
Body
Let's take a look at an example:
In this example from tutorialspoint, a form has been filled and posted to the server. If this was being used to mask C2 communication, an extra parameter could be added to the body:
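As a rough sketch of that idea (not the original example from this post), the snippet below builds a form-encoded body and tucks one extra parameter in alongside the ordinary fields. The field names `username`, `comment`, and `session_hint` are hypothetical, chosen only for illustration.

```python
import base64
from urllib.parse import parse_qs, urlencode

# Ordinary-looking form fields (hypothetical names, not from any real form).
FORM_FIELDS = {"username": "alice", "comment": "hello"}

# Hypothetical parameter name used to carry the C2 data.
EXTRA_PARAM = "session_hint"


def build_body(fields: dict, data: bytes) -> str:
    """Return a form-encoded body with one extra parameter carrying data."""
    body = dict(fields)
    body[EXTRA_PARAM] = base64.b64encode(data).decode("ascii")
    return urlencode(body)


def extract_data(body: str) -> bytes:
    """Server side: pull the extra parameter back out of the body."""
    parsed = parse_qs(body)
    return base64.b64decode(parsed[EXTRA_PARAM][0])


if __name__ == "__main__":
    print(build_body(FORM_FIELDS, b"whoami"))
```

To anything that only looks at the shape of the request, this is still just a form submission with one more field.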
As well as transferring C2 data effectively, it's also great for obfuscating the location of the server with techniques such as Domain Fronting, although this is getting more difficult, which has spawned projects such as C3. We'll look at masking traffic further in later blogs.
To be honest, explaining HTTP specifically for C2 communication could be an entire blog in and of itself, so we will leave HTTP here. But this is the communication we will be using for our proof-of-concept.
Domain Name System (DNS) is another method that people tend to use for C2 communications.
This is something that Cobalt Strike natively supports which is explained like so:
Today, the DNS Beacon can download tasks over DNS TXT records, DNS AAAA records, or DNS A records. This payload has the flexibility to change between these data channels while its on target. Use Beacon’s mode command to change the current Beacon’s data channel. mode dns is the DNS A record data channel. mode dns6 is the DNS AAAA record channel. And, mode dns-txt is the DNS TXT record data channel. The default is the DNS TXT record data channel.
The only issue with this is that it's substantially slower than HTTP, but it provides better protection as these channels are less inspected than HTTP.
Server Message Block (SMB) is another protocol used for C2 communications. However, it's not used for the traditional data transfer. This is typically used for peer-to-peer, or implant-to-implant. In order for this to work, one of the beacons needs to operate as a "server" which communicates out across HTTP/DNS to the teamserver. This allows subsequent implants to communicate with the implant in server-mode. This is particularly useful for getting around secure networks with limited ingress/egress, as well as for daisy chaining implants together for less outbound traffic.
Similarly to DNS, this is something we aren't implementing, but we recommend implementing this protocol, whereas DNS is somewhat optional and dependent on the use case for the end-user.
Maelstrom is quick and easy, and we don't need to consider the user experience since it'll be us, and briefly at that. So to make our lives easier, we'll show our workings with Flask, python-prompt-toolkit and Python 3.9. The advantage of Python is that, provided you can work out how to get Windows to stop installing Python from the Store, it's platform agnostic and quick to develop in. Given that Python is basically executable pseudo-code this should also help illustrate our points.
Now that we have overloaded you with options, let's look into how Maelstrom is going to work.
First up, the server. This is the component that the implant will respond to. So, when maelstrom.py is run in the terminal, the following is printed:
There are two options:
Run the server
Generate a payload
However, we are only implementing the server:
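A minimal sketch of an entry point with that shape, assuming a single positional argument that selects the mode. The mode names `server` and `payload` and the function names are our assumptions for illustration, not a copy of Maelstrom's code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one positional argument selecting the mode."""
    parser = argparse.ArgumentParser(prog="maelstrom.py")
    parser.add_argument(
        "mode",
        choices=["server", "payload"],  # assumed mode names
        help="run the server or generate a payload",
    )
    return parser


def main(argv=None) -> str:
    """Dispatch on the chosen mode; only the server is implemented."""
    args = build_parser().parse_args(argv)
    if args.mode == "server":
        # run_maelstrom() would be called here.
        return "server"
    raise NotImplementedError("payload generation is not implemented")


if __name__ == "__main__":
    main()
```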
Let's step through run_maelstrom()
...
The very first thing it does is get the available commands:
Where get_commands() is:
This function parses the return of available_commands and builds out a dictionary of the required info.
In Maelstrom these commands are somewhat hardcoded, limiting its extensibility:
Note the Command object, which is a dataclass:
This method does not provide any flexibility. In the case of Vulpes, the Factory Design Pattern is used. This enables the server to programmatically identify the commands, the info required, pre/post actions and so on. The reason we are pointing this out is that to make a C2 useful, it needs to be extensible. Maelstrom does not support that.
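To make the contrast concrete, here is one common way to get that kind of extensibility in Python: a registry that commands add themselves to, with a factory-style lookup. This is a sketch under our own assumptions and is not Vulpes's code; `CommandSpec`, `REGISTRY`, `command`, and `create` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CommandSpec:
    """Everything the server needs to know about one command."""
    name: str
    description: str
    usage: str
    handler: Callable[[List[str]], str]


# Commands register themselves here instead of being listed by hand.
REGISTRY: Dict[str, CommandSpec] = {}


def command(name: str, description: str, usage: str):
    """Decorator that registers a handler function under a command name."""
    def decorator(func: Callable[[List[str]], str]):
        REGISTRY[name] = CommandSpec(name, description, usage, func)
        return func
    return decorator


@command("listener", "Start a listener", "listener <name> ...")
def start_listener(args: List[str]) -> str:
    return f"listener {args[0]} requested"


@command("show", "Show connected implants", "show")
def show(args: List[str]) -> str:
    return "no implants connected"


def create(name: str) -> CommandSpec:
    """Factory-style lookup: return the spec for a command name."""
    try:
        return REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown command: {name}") from None
```

Adding a new command is then a matter of writing one decorated function; nothing else in the server needs to change.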
So, get_commands() again:
The reason it rebuilds the dictionary in this way is because Maelstrom makes use of Prompt Toolkit and the commands then auto-populate the Nested Dictionary information.
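A sketch of what that rebuild can look like, assuming a `Command` dataclass with a name, description, usage string, and optional sub-commands. The field names and the command list are assumptions for illustration. The resulting dictionary has the shape that Prompt Toolkit's `NestedCompleter.from_nested_dict` accepts, where a value of `None` means "no further completions".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Command:
    """Assumed shape of a hardcoded command description."""
    name: str
    description: str
    usage: str
    subcommands: List[str] = field(default_factory=list)


def available_commands() -> List[Command]:
    """The hardcoded command list (illustrative entries only)."""
    return [
        Command("help", "Show this help", "help"),
        Command("listener", "Start a listener", "listener <args>"),
        Command("show", "Show state", "show <what>",
                subcommands=["listeners", "implants"]),
        Command("exit", "Quit the server", "exit"),
    ]


def get_commands() -> Dict[str, Optional[Dict[str, None]]]:
    """Rebuild the command list as a nested dict for auto-completion."""
    nested: Dict[str, Optional[Dict[str, None]]] = {}
    for cmd in available_commands():
        # An empty sub-command dict collapses to None (a leaf entry).
        nested[cmd.name] = {sub: None for sub in cmd.subcommands} or None
    return nested
```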
Running the server and typing help shows this:
These match up to the dataclasses seen earlier:
Because this is a lazy implementation of the server, the next thing to happen is a big while loop which gives a prompt and parses commands:
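The general shape of such a loop is sketched below. To keep it runnable without a terminal, the input lines are passed in as an iterable; in a real CLI they would come from an interactive prompt inside a `while True` loop. The function and handler names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def run_prompt(
    lines: Iterable[str],
    handlers: Dict[str, Callable[[str], str]],
) -> List[str]:
    """Read commands one at a time and dispatch on the first word."""
    output: List[str] = []
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # ignore empty input and prompt again
        if text == "exit":
            break
        name = text.split(" ", 1)[0]
        handler = handlers.get(name)
        if handler is None:
            output.append(f"unknown command: {name}")
            continue
        # Handlers receive the full input line and parse it themselves.
        output.append(handler(text))
    return output
```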
The ONLY functionality Maelstrom provides is the ability to create listeners and show connected implants. However, let's discuss post-exploitation briefly (we discuss this in a bit more depth much later on in the series).
For the sake of the discussion, our example command will be whoami. If you were to go onto the C2 Matrix, and pick a C2 that isn't .NET, you'd have a 90% chance of getting this:
Which will subsequently produce the following process tree:
Meaning every time this is run, it spawns two processes and produces T1059.003: Command and Scripting Interpreter: Windows Command Shell. Doing so would require the code to use either system() or CreateProcessA() with cmd.exe as the target and /c whoami as the argument. Overall, that produces a bunch of IOCs for such a simple command, when in reality whoami only calls GetUserNameA:
So.. why not use it:
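The implant itself is native code, but the principle is easy to show in Python: ask the API for the answer instead of spawning a shell. The sketch below is our own illustration. On Windows it calls `GetUserNameW` (the wide-character sibling of `GetUserNameA`) through `ctypes`; on other platforms it falls back to `getpass.getuser()`. Either way, no child process is created.

```python
import getpass
import sys


def current_user() -> str:
    """Return the current user name without spawning a child process."""
    if sys.platform == "win32":
        import ctypes
        from ctypes import wintypes

        # UNLEN is 256; add one character for the terminator.
        size = wintypes.DWORD(257)
        buffer = ctypes.create_unicode_buffer(size.value)
        ok = ctypes.windll.advapi32.GetUserNameW(buffer, ctypes.byref(size))
        if not ok:
            raise ctypes.WinError()
        return buffer.value
    # Non-Windows fallback so the sketch runs anywhere.
    return getpass.getuser()


if __name__ == "__main__":
    print(current_user())
```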
Moreover, the first thing 99% of people do when they receive an implant is an instinctive ls or whoami. By reimplementing commands, the potential IOCs are drastically reduced. This narrative spawned projects like CS-Situational-Awareness-BOF, CS-Remote-OPs-BOF, and C2-Tool-Collection.
Want to know the hostname? Then GetComputerNameA(). Want to get the current process directory? Get it from the PEB:
These are obviously simple commands, but let's say kerberoast is an internal command. These things are possible to write in C; after all, Windows is written in C: c2-tool-collection/BOF/Kerberoast.
More on this in a later part. For now, listeners!
Let's take a look at how we implemented listener creation. If the user input on the CLI starts with listener then the handle_listener() function is called:
The first thing it does is look for a space in the input, as a minimal check that the command was given arguments:
If that is fine, it then splits on the space and counts the resulting parts. If there are not 5, then the command is wrong:
Note the usage here:
A valid input would be:
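The parsing steps described above can be sketched as follows. The field order (`listener <name> <address> <port> <endpoint>`) and the extra port and endpoint checks are our assumptions for illustration; only the space check and the five-part count come from the description above.

```python
from dataclasses import dataclass

# Assumed usage string and field order.
USAGE = "listener <name> <address> <port> <endpoint>"


@dataclass
class Listener:
    name: str
    address: str
    port: int
    endpoint: str


def parse_listener(text: str) -> Listener:
    """Parse a 'listener ...' command line into a Listener."""
    if " " not in text:
        raise ValueError(f"usage: {USAGE}")
    parts = text.split(" ")
    if len(parts) != 5:
        raise ValueError(f"usage: {USAGE}")
    command, name, address, port, endpoint = parts
    if command != "listener":
        raise ValueError(f"unknown command: {command}")
    # Extra sanity checks (not described in the post).
    if not port.isdigit() or not 0 < int(port) < 65536:
        raise ValueError(f"invalid port: {port}")
    if not endpoint.startswith("/"):
        raise ValueError(f"endpoint must start with '/': {endpoint}")
    return Listener(name, address, int(port), endpoint)
```

For example, `parse_listener("listener http1 0.0.0.0 8080 /a")` returns a `Listener` bound to all interfaces on port 8080 with the endpoint `/a`.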
As the first index, 0, should be the command name, it attempts to validate it:
Like so:
The dataclass returned is never used; the lookup only confirms that the command is valid. Now that it is, the Listener dataclass can be used to pass the information into the class, like so:
Where the dataclass is:
This is then appended to a global list:
This is obviously not persistent. Realistically, this should be a database which is checked and restored if the server is restarted:
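A minimal sketch of that persistence using the standard library's `sqlite3`. The table layout and function names are hypothetical, and the `Listener` fields are the same assumed shape used for illustration elsewhere in this post.

```python
import sqlite3
from dataclasses import astuple, dataclass
from typing import List


@dataclass
class Listener:
    name: str
    address: str
    port: int
    endpoint: str


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) the listener database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listeners ("
        "name TEXT PRIMARY KEY, address TEXT NOT NULL, "
        "port INTEGER NOT NULL, endpoint TEXT NOT NULL)"
    )
    return conn


def save_listener(conn: sqlite3.Connection, listener: Listener) -> None:
    """Insert or update one listener; the 'with' block commits."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO listeners VALUES (?, ?, ?, ?)",
            astuple(listener),
        )


def restore_listeners(conn: sqlite3.Connection) -> List[Listener]:
    """Load every stored listener, e.g. at server start-up."""
    rows = conn.execute(
        "SELECT name, address, port, endpoint FROM listeners ORDER BY name"
    )
    return [Listener(*row) for row in rows]
```

On start-up the server would call `restore_listeners()` and start each one again, instead of beginning with an empty list.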
Next, the dataclass is used as an argument for a thread:
This creates a thread on the start_server() function which registers the endpoint specified with the flask app using add_url_rule:
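A rough sketch of that flow with Flask, under our own assumptions about names (`register_listener`, `start_listener_thread`, `handle_request`) and about the `Listener` fields. Note that recent Flask versions refuse new URL rules once the app has handled its first request, so registration has to happen before any traffic arrives.

```python
import threading
from dataclasses import dataclass

from flask import Flask, request

app = Flask(__name__)


@dataclass
class Listener:
    name: str
    address: str
    port: int
    endpoint: str


def handle_request() -> str:
    """Placeholder view; the real filtering logic would live here."""
    return f"hit {request.path}"


def register_listener(flask_app: Flask, listener: Listener) -> None:
    """Attach the listener's endpoint to the Flask app."""
    flask_app.add_url_rule(
        listener.endpoint,
        endpoint=listener.name,
        view_func=handle_request,
        methods=["GET", "POST"],
    )


def start_server(listener: Listener) -> None:
    """Register the endpoint, then serve on the listener's address."""
    register_listener(app, listener)
    app.run(host=listener.address, port=listener.port, use_reloader=False)


def start_listener_thread(listener: Listener) -> threading.Thread:
    """Run the server in a daemon thread so the CLI stays responsive."""
    thread = threading.Thread(
        target=start_server, args=(listener,), daemon=True
    )
    thread.start()
    return thread
```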
This completes the listener creation:
This can be validated by running:
Which produces:
So, how does the implant communicate with this?
As we've said, we aren't implementing anything to make this C2 useful; therefore we are leaving this component out. But we still want to discuss it. The help menu for Maelstrom has a positional argument, meaning that the actual server.py file needs to be run again with the payload switch. This doesn't really comply with best practices for user experience, and a better workflow would be to do it from within the C2. For example, as a Web App, it's easy. Use something like a modal, like we did in Vulpes. Or popups, like Cobalt Strike. If it's a CLI, then maybe some form of interactive prompt inside the C2 which works as a command, as we showed previously.
When it comes to automating the payload generation, we found the easiest way was to programmatically create a Makefile, copy all the source code files into /tmp, and execute the Makefile with something like:
Which will compile with MinGW-w64, then move the compiled implant and remove the temporary directory.
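That workflow can be sketched as below. The Makefile contents, the output name `implant.exe`, and the function names are assumptions for illustration; `x86_64-w64-mingw32-gcc` is the usual name of the MinGW-w64 cross compiler on Linux, but your toolchain may differ.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Assumed minimal Makefile; a real one would carry flags and libraries.
MAKEFILE_TEMPLATE = """CC = {compiler}
SRC = {sources}
OUT = {output}

all:
\t$(CC) $(SRC) -o $(OUT)
"""


def stage_build(
    source_dir: Path,
    output: str = "implant.exe",
    compiler: str = "x86_64-w64-mingw32-gcc",
) -> Path:
    """Copy the C sources into a temp directory and write a Makefile."""
    build_dir = Path(tempfile.mkdtemp(prefix="maelstrom-"))
    sources = []
    for src in sorted(source_dir.glob("*.c")):
        shutil.copy(src, build_dir / src.name)
        sources.append(src.name)
    makefile = MAKEFILE_TEMPLATE.format(
        compiler=compiler, sources=" ".join(sources), output=output
    )
    (build_dir / "Makefile").write_text(makefile)
    return build_dir


def build(source_dir: Path, dest: Path, output: str = "implant.exe") -> None:
    """Run make, move the compiled implant, then remove the temp dir."""
    build_dir = stage_build(source_dir, output=output)
    try:
        subprocess.run(["make", "-C", str(build_dir)], check=True)
        shutil.move(str(build_dir / output), str(dest))
    finally:
        shutil.rmtree(build_dir, ignore_errors=True)
```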
More on what to consider for payload generation in the next blog post.
At this point in the post, we have an up-and-running team server which can create a listener. Now we need to look at what the endpoints actually do with incoming traffic to:
Determine if it's an implant
If it is, allow the implant to communicate
Handling HTTP Requests
Earlier in the post, we showed the listener being started as a new app, like so:
In typical Flask fashion, app is defined globally:
And one function on / is created to handle all requests. We could start an endpoint for each type, but we found it easier to listen at the root, and filter down with logic. This allows us to log ALL requests, whilst meticulously filtering requests by their expected values.
The route:
Let's run through what happens inside this route. In the previous section, we created an endpoint on /a, so we will use that example.
The first thing that happens here is that a bunch of information is pulled from the request object, things that we may or may not use:
Note the request.headers.get call. This is one of the hard-coded values sent between the client and server.
This information is then passed to the is_valid_listener function:
This just ensures that the dataclass information matches up to the request by ensuring that the URI, Server Address, Server Port, and Header are all correct:
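A sketch of that check as a plain function, so it can be read without the Flask plumbing. The `Listener` fields, the header value, and the function signature are assumptions for illustration; only the `X-Maelstrom` header name comes from this post. Note that Flask's own header object is case-insensitive, whereas the plain dict used here is not.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping, Optional

MAGIC_HEADER = "X-Maelstrom"
MAGIC_VALUE = "expected-value"  # hypothetical hard-coded value


@dataclass
class Listener:
    name: str
    address: str
    port: int
    endpoint: str


def is_valid_listener(
    listeners: Iterable[Listener],
    path: str,
    host: str,
    port: int,
    headers: Mapping[str, str],
) -> Optional[Listener]:
    """Return the matching listener, or None if the request looks wrong."""
    # The hard-coded header must be present with the expected value.
    if headers.get(MAGIC_HEADER) != MAGIC_VALUE:
        return None
    # URI, server address, and server port must all match a listener.
    for listener in listeners:
        if (listener.endpoint, listener.address, listener.port) == (
            path, host, port
        ):
            return listener
    return None
```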
This is an extremely primitive example of ensuring the request is as intended...
If it is valid, then the next chunk of code determines which type of request this is:
In this case, it only has two options:
init
stage
In a real world example, init and stage are terrible endpoints. But this allows us to illustrate our point. This is then followed up with:
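One way to express that two-way switch is sketched below, assuming the request type is carried as a bare query key (for example `/a?stage`). The function names, status codes, and response bodies are hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs

REQUEST_TYPES = ("init", "stage")


def classify(query_string: str) -> Optional[str]:
    """Return 'init' or 'stage' if that key is present in the query."""
    # keep_blank_values keeps bare keys such as '?stage' (no '=value').
    keys = parse_qs(query_string, keep_blank_values=True).keys()
    for kind in REQUEST_TYPES:
        if kind in keys:
            return kind
    return None


def handle(query_string: str, stage_bytes: bytes) -> Tuple[int, bytes]:
    """Dispatch on the request type; anything else gets a 404."""
    kind = classify(query_string)
    if kind == "init":
        # initialise_implant() would parse and record the new implant.
        return 200, b"registered"
    if kind == "stage":
        return 200, stage_bytes
    return 404, b""
```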
If it's of type init, then initialise_implant() simply parses the info and prints a new connection, that's all:
There is poor error handling, no database tracking, nothing. Realistically, here there is a requirement for logging. No matter the request, it should be logged into a uniform format. Typically, what we found best with our projects is to use a file where each line is a JSON Object:
This is then easy to process by either:
Looping over every line and loading each line as a JSON Object
Or a POST:
Depending on the project, we will use one of these methods.
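A minimal sketch of the one-JSON-object-per-line format and the "loop over every line" approach to reading it back. The record fields are hypothetical; the only fixed idea is that each line is a complete JSON object.

```python
import json
import time
from pathlib import Path
from typing import List


def log_request(path: Path, record: dict) -> None:
    """Append one request as a single JSON object on its own line."""
    entry = {"ts": time.time(), **record}
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, sort_keys=True) + "\n")


def read_log(path: Path) -> List[dict]:
    """Load the log by parsing each non-empty line as JSON."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Because every line stands alone, a partially written or corrupted line only costs that one entry, and the file can be tailed, grepped, or shipped elsewhere without any special parser.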
Moving on to the staging endpoint, a hardcoded path is returned:
Again, this is intentional. We don't care for making this fancy, it's a proof-of-concept. In the next section, we go over some methods of masking the DLL, and this is where that logic would go. But, for Maelstrom, we just return the bytes of a hard-coded DLL Path.
From here, multiple switches could be implemented to do specific jobs like getting a new task, or returning information about a task that was run. Each of these endpoints could be configured differently, giving flexibility within the communication.
To recap, in this section we've shown two hard-coded endpoints, init and stage, and then filtered down the requests based on expected information. To illustrate our point, we've just used X-Maelstrom as the header. But realistically this could just be genuine headers like ASPSESSIONID. All we are doing is making sure that only very specific requests can communicate with the teamserver.
Inspecting the requests
Using Wireshark, let's inspect our HTTP Requests. The following filter will find any HTTP requests between the C2 and the Dev Machine:
Remember, the only thing this does is stage. There are no additional requests:
The above shows the /a?stage URI being requested, along with our poorly configured headers:
And as this is a stage, if we follow it in TCP Stream...
Using Snort as an example, we can match that header and immediately identify Maelstrom with the following rule:
Which looks something like this:
There are tons of improvements that could be made here, and it could get complicated very quickly with all sorts of masquerading and cryptography. However, it doesn't need to be that complicated (some people may need it to be, though). As long as the HTTP Requests are fully customizable by the end-user, then the request can be crafted however it needs to be. For those complicated engagements with every log being ingested, then cool, the request can be worked to fit that. Don't need anything? Cool, then just request /stage.
This is one thing Cobalt Strike does do well with their Malleable Command and Control, specifically the HTTP Staging where requests can be configured however:
What is particularly good is their chained data manipulation:
In the above the GIF89a type is used which uses the magic bytes of a GIF and adds it to the end. Alternatively, this could be done by appending PNG magic bytes.
An example implementation with a PDF, assume the following data as the magic bytes for a PDF:
Converting it to bytes on CyberChef:
Then, with the following command, prepend those bytes to an EXE:
Finally, running file on both:
Now all that needs to happen is the first 8 bytes are removed from the buffer...
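The same round trip can be sketched in a few lines of Python. The magic value `%PDF-1.7` is an assumption chosen because it is exactly 8 bytes long, matching the 8 bytes removed above; the function names are hypothetical.

```python
# Assumed 8-byte PDF header used as the mask.
PDF_MAGIC = b"%PDF-1.7"


def mask(payload: bytes, magic: bytes = PDF_MAGIC) -> bytes:
    """Server side: prepend the magic bytes to the payload."""
    return magic + payload


def unmask(buffer: bytes, magic: bytes = PDF_MAGIC) -> bytes:
    """Implant side: check for the magic bytes, then strip them."""
    if not buffer.startswith(magic):
        raise ValueError("buffer does not start with the expected magic")
    return buffer[len(magic):]
```

Tools that identify files by their leading bytes will report the masked buffer as a PDF, while the receiving side recovers the original bytes with a single slice.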
From a user experience standpoint, these behaviours can easily be extended by the server. These steps should be invisible to the operator, but applying them should be straightforward. Older C2s were more reliant on users customising this behaviour directly, by modifying the implants and droppers, if it was changeable at all. Most C2s could have the endpoint changeable, but even then this was less common.
Supporting these behaviours with the extensible approach to implant development we outlined in the previous post, using simple software development practices, massively improves the opsec of the implant. Buzzwords like polymorphic code and military grade encryption aren't helpful when discussing these steps, since so many contemporary C2s still rely on these opsec behaviours being hard-coded. This results in red teams who have a C2 with a single opsec implant, rather than a C2 with a library of modular steps capable of generating infinite opsec implants.
For example, the following behaviours could be added to a C2's payload generation to further obfuscate implant data:
Masquerading data as legitimate file formats or data-types (for a rough example, see our project bluffy)
Masquerading data as legitimate strings, such as IPv6 addresses (such as in ORCA's hellshell)
Hardcoded embedded environment keying where keys are generated on a payload by payload basis (coming next week!)
Obfuscation steps such as XOR and encryption
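As a rough illustration of how two of these steps can be chained, the sketch below XORs a buffer with a repeating key and then renders the result as a list of IPv6 address strings. This is our own toy example and not the code of any project named above. The 2-byte length prefix is an assumption we add so that padding can be stripped on decode, which limits this sketch to 65535 bytes.

```python
import ipaddress
from itertools import cycle
from typing import List


def xor(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key (the same call encodes and decodes)."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def to_ipv6(data: bytes) -> List[str]:
    """Render bytes as IPv6 address strings, 16 bytes per address."""
    # Length prefix lets the decoder drop the zero padding.
    framed = len(data).to_bytes(2, "big") + data
    framed += b"\x00" * (-len(framed) % 16)
    return [
        str(ipaddress.IPv6Address(framed[i:i + 16]))
        for i in range(0, len(framed), 16)
    ]


def from_ipv6(addresses: List[str]) -> bytes:
    """Reverse to_ipv6: join the packed addresses and strip the framing."""
    framed = b"".join(ipaddress.IPv6Address(a).packed for a in addresses)
    length = int.from_bytes(framed[:2], "big")
    return framed[2:2 + length]
```

Because each step is a small function with a matching inverse, steps like these can be reordered or swapped per payload rather than being baked into the implant.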
Obviously, these are completely rudimentary but it proves the point. Blue teams know that big blobs of high entropy data going back and forth between never-before-seen Azure and AWS instances are a lil' sus. As live traffic inspection becomes more widespread, we anticipate that C2s will shift to a more modular approach where implants can switch their data obfuscation and encryption on the fly, reducing the ability of a blue team to identify an implant.
This post has been a little lighter on the blue team side of affairs for a simple reason: this is really difficult to identify already at a network level, in the world of hardcoded implant opsec, without including any other factors like destinations, EDR flags, or other heuristics. Identifying a single malicious request every 20 minutes to 2 hours, travelling over an established channel such as Microsoft Teams, is like trying to find a needle in a haystack made of needles. Only by paying close attention to endpoint telemetry, and by deep-packet inspection of all network traffic for anything specific once a process is flagged as suspicious, can these be identified. Ensuring that your EDR and network devices are up-to-date with domain reputation and threat intelligence feeds, and that these are being ingested appropriately, may go some way to closing this gap.
To recap:
Maelstrom's communication is extremely simple. With that said, it's quite unlikely that a tool will spot these requests unless specifically told to. A good analyst may spot them, but this again is unlikely. Remember, we made it this way on purpose.
Ensure that the requests are fully malleable for end-users. Even though we gave an example using file magic bytes, the data could also be embedded into something else, such as a base64 blob inside JavaScript placed in the body of the request:
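A toy sketch of that idea: the data is base64-encoded and dropped into a string inside an otherwise boring JavaScript snippet, then recovered with a regular expression. The JavaScript template and the `cache` key are hypothetical, chosen only for illustration.

```python
import base64
import re

# Hypothetical JavaScript wrapper; doubled braces are literal braces.
TEMPLATE = 'var config = {{"theme": "light", "cache": "{blob}"}};'

_BLOB_PATTERN = re.compile(r'"cache": "([A-Za-z0-9+/=]*)"')


def embed(data: bytes) -> str:
    """Wrap data as a base64 string inside the JavaScript template."""
    blob = base64.b64encode(data).decode("ascii")
    return TEMPLATE.format(blob=blob)


def extract(body: str) -> bytes:
    """Find the blob in the JavaScript and decode it."""
    match = _BLOB_PATTERN.search(body)
    if match is None:
        raise ValueError("no embedded blob found")
    return base64.b64decode(match.group(1))


if __name__ == "__main__":
    print(embed(b"whoami"))
```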
This post has been more of a reference post; depending on the type of user experience you want, there are tons of different methods. For Maelstrom, it's only going to be a basic Python CLI with the Prompt Toolkit.
Now that this has all been set up, we can finally get into coding the implant in the next post!
We mentioned at the start that we wouldn't be addressing channels, redirectors, or other parts of the red team infrastructure. As some honourable mentions, the following three blogs are a great starting point:
CobaltBus: Cobalt Strike External C2 Integration With Azure Servicebus, C2 traffic via Azure Servicebus
AzureC2Relay: AzureC2Relay is an Azure Function that validates and relays Cobalt Strike beacon traffic by verifying the incoming requests based on a Cobalt Strike Malleable C2 profile.
C3: C3 (Custom Command and Control) is a tool that allows Red Teams to rapidly develop and utilise esoteric command and control channels (C2).