Implementing SysCall Detection into Fennec

Adding SysCall Detection into our PoC EDR!

Introduction

First of all, I have an EDR project that was called PreEmpt. However, michaeljranaldo and I started preemptdev, our research and development duo, and we are currently working through a C2 Development Series over at pre.empt.dev. With that being said, an EDR project sharing our duo's name is a bit odd, so it has now been renamed to Fennec.

Below is a naïve architectural and component overview of Fennec:

Essentially, the Windows host has:

  • An ETWTi Agent (can be seen here)

  • Time and event triggered process/memory sweeps (can be seen here)

  • A userland DLL which is loaded into processes to hook common offensive WinAPI calls, and which is the focus of this blog

  • An orchestration process which is used to display toasts and send information to Elastic

With the disclaimer out of the way, let's look at the topic of the blog. Over the past few years, syscalls have become increasingly common, and in some scenarios, a requirement. This is something michaeljranaldo and I discussed in Maelstrom: EDR Kernel Callbacks, Hooks, and Call Stacks, specifically the Bypassing Userland Hooks section.

As Fennec is able to identify attacks such as process injection via ETWTi and manual hooks, it only makes sense to extend this to syscalls as well.

I tend to write the introduction to a blog at the very end, and whilst looking for references to include, I found that someone had already implemented a very similar project back in February 2021. winternl released Detecting Manual Syscalls from User Mode, which uses the same method I went for, but with a much cleaner instrumentation callback. As well as providing an in-depth blog, winternl provides great resources that expand on their ideas; I highly recommend that blog if this topic interests you. Their code can be seen in syscall-detect.

As I go through my implementation, I will point out the improvements that winternl's implementation allowed me to make.

The Detection Logic

Typically, when a WinAPI call is made, it makes its way from user-land into kernel-land. Again, this is something we discussed in User-land and Kernel-land from the Maelstrom series.

So when a call like VirtualAlloc is made, it ends up at NtAllocateVirtualMemory in NTDLL. NTDLL then does what it needs to do to get it into kernel-land (read the referenced post for more information).

This image sums it up:

When tools like SysWhispers3 or Tartarus Gate are used, the syscall instruction, which is expected to execute from NTDLL, instead executes from the executable image itself. So instead of the call going:

Loader.exe -> VirtualAlloc -> NTDLL -> NtAllocateVirtualMemory

It now goes:

Loader.exe -> NtAllocateVirtualMemory

In order to detect it, the idea should be:

If the syscall instruction comes from the executable image and not NTDLL, then it's probably suspicious?

And that's the narrative this blog will focus on.
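To make that rule concrete, here is a minimal, portable C++ sketch of the decision on its own, assuming the name of the module that issued the syscall has already been resolved (how that happens is covered below). IsSuspiciousSyscallOrigin is a hypothetical helper, not Fennec's code. It compares names case-insensitively and treats a failed lookup (empty name) as suspicious. One caveat worth keeping in mind: ntdll is not the only module with syscall stubs, since on Windows 10 and later win32u.dll also contains them (for the NtUser*/NtGdi* calls), so the sketch takes an allowlist rather than a single hard-coded name.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// ASCII case-insensitive equality, so "NTDLL" and "ntdll" match.
static bool EqualsIgnoreCase(const std::string& a, const std::string& b)
{
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
               return std::tolower(static_cast<unsigned char>(x)) ==
                      std::tolower(static_cast<unsigned char>(y));
           });
}

// Hypothetical helper: a syscall is "suspicious" when the module it came from
// is not in the allowlist. An empty name (lookup failed) is suspicious too.
bool IsSuspiciousSyscallOrigin(const std::string& moduleName,
                               const std::vector<std::string>& allowed = {"ntdll"})
{
    if (moduleName.empty())
        return true;
    return std::none_of(allowed.begin(), allowed.end(),
                        [&](const std::string& m) { return EqualsIgnoreCase(moduleName, m); });
}
```

With the default allowlist, "Loader" is flagged while "ntdll" and "NTDLL" are not; passing {"ntdll", "win32u"} would also let GUI syscalls through.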

Setting up the Process Instrumentation

Again, this is something discussed in Maelstrom: EDR Kernel Callbacks, Hooks, and Call Stacks. In REcon 2015 - Hooking Nirvana (Alex Ionescu), Alex Ionescu demonstrates Process Instrumentation with NtSetInformationProcess, specifically with the ProcessInstrumentationCallback information class. I highly recommend that talk, not just for this particular post but in general.

Detecting Manual Syscalls from User Mode discusses the internals of NtSetInformationProcess very clearly, specifically this passage:

Each time the kernel encounters a scenario in which it returns to user mode code, it will check if the KPROCESS!InstrumentationCallback member is not NULL. If it is not NULL and it points to valid memory, the kernel will swap out the RIP on the trap frame and replace it with the value stored in the InstrumentationCallback field.
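To make the quoted behaviour concrete, here is a toy model in portable C++. It does not touch the kernel at all: ToyProcess is a hypothetical stand-in for KPROCESS, and ReturnToUserMode mimics the rule above by resuming at the callback when one is set, handing over the original return address in R10 (the register the assembly thunk later in this post saves as InstrumentationCallbackPreviousPc). The "points to valid memory" check is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for KPROCESS: only the callback slot matters here.
struct ToyProcess {
    std::uintptr_t instrumentationCallback = 0;  // 0 plays the role of NULL
};

// What user mode sees after the transition.
struct UserResume {
    std::uintptr_t rip;  // where execution continues
    std::uintptr_t r10;  // original return address when redirected (0 otherwise, model only)
};

// Models the kernel returning to user mode at userPc.
UserResume ReturnToUserMode(const ToyProcess& process, std::uintptr_t userPc)
{
    if (process.instrumentationCallback != 0) {
        // Callback installed: swap RIP for the callback and pass the
        // original return address along in R10.
        return {process.instrumentationCallback, userPc};
    }
    // No callback: return exactly where the kernel was going to.
    return {userPc, 0};
}
```

Setting the slot back to 0 is all it takes for the redirection to stop, which becomes relevant when looking at bypasses.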

Let's get into the code and explain things as we go. As this is a DLL, here is DllMain:

BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpReserved)
{
    switch (fdwReason)
    {
    case DLL_PROCESS_ATTACH:
    {
        SymSetOptions(SYMOPT_UNDNAME);

        SymInitialize(
            GetCurrentProcess(),
            NULL,
            TRUE
        );

        SetInstrumentationCallback();
    }
    break;
    case DLL_THREAD_ATTACH:
        break;

    case DLL_THREAD_DETACH:
        break;

    case DLL_PROCESS_DETACH:
        break;
    }

    return TRUE;
}

SymSetOptions sets the option mask for how the symbols are to be loaded. Here, SYMOPT_UNDNAME is:

This symbol option causes public symbol names to be undecorated when they are displayed, and causes searches for symbol names to ignore symbol decorations. Private symbol names are never decorated, regardless of whether this option is active. For information on symbol name decorations, see Public and Private Symbols.

The next thing that happens is that the symbols are initialised with SymInitialize. The definition:

BOOL IMAGEAPI SymInitialize(
  [in]           HANDLE hProcess,
  [in, optional] PCSTR  UserSearchPath,
  [in]           BOOL   fInvadeProcess
);

With fInvadeProcess set to TRUE, SymInitialize enumerates the modules already loaded into the process and loads the symbol table for each of them (effectively calling SymLoadModule64 per module).

Now that the symbols are ready to go, we enter SetInstrumentationCallback(), which is where NtSetInformationProcess is called.

The first thing to occur in this function is the declaration of PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION:

PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION Callback    = { 0 };

This is then filled out the way Alex Ionescu shows in Hooking Nirvana:

Callback.Version = 0;
Callback.Reserved = 0;
Callback.Callback = InstrumentationCallbackThunk;

The Version represents the architecture:

  • 0: x64

  • 1: x86
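For reference, here is a portable mirror of the structure being filled in, based on the three members set above (Version, Reserved, Callback). InstrumentationCallbackInfoMirror is a hypothetical name; as far as I know, the real PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION is not in the public Windows SDK headers, so projects typically declare it themselves. On x64 the two 32-bit fields share the first eight bytes, which puts the pointer at offset 8 and makes sizeof(Callback) 16 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical portable mirror of PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION.
// ULONG is 32 bits on Windows, hence std::uint32_t.
struct InstrumentationCallbackInfoMirror {
    std::uint32_t Version;   // 0 on x64
    std::uint32_t Reserved;  // set to 0
    void*         Callback;  // address the kernel redirects to
};

// The pointer follows the two 32-bit fields; on x64 the struct is 16 bytes.
static_assert(offsetof(InstrumentationCallbackInfoMirror, Callback) == 8, "layout");
static_assert(sizeof(InstrumentationCallbackInfoMirror) == 8 + sizeof(void*), "layout");
```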

The interesting part is the Callback member. Originally, I had some very complicated assembly which just looked unstable. This is the first improvement winternl's work provided, and in homage, I've kept the function names the same for full kudos. The assembly:

;;; https://github.com/jackullrich/syscall-detect/blob/master/Thunk.asm ;;;
include ksamd64.inc

extern InstrumentationCallback:proc
EXTERNDEF __imp_RtlCaptureContext:QWORD

.code

InstrumentationCallbackThunk proc
                mov     gs:[2e0h], rsp            ; Win10 TEB InstrumentationCallbackPreviousSp
                mov     gs:[2d8h], r10            ; Win10 TEB InstrumentationCallbackPreviousPc
                mov     r10, rcx                  ; Save original RCX
                sub     rsp, 4d0h                 ; Alloc stack space for CONTEXT structure
                and     rsp, -10h                 ; RSP must be 16 byte aligned before calls
                mov     rcx, rsp
                call    __imp_RtlCaptureContext   ; Save the current register state. RtlCaptureContext does not require shadow space
                sub     rsp, 20h                  ; Shadow space
                call    InstrumentationCallback   ; Call main instrumentation routine
InstrumentationCallbackThunk endp

end

It makes sense in hindsight, but I never knew that WinAPI functions could be called from assembly by declaring their import pointer:

EXTERNDEF __imp_RtlCaptureContext:QWORD

This is perfect because, instead of passing lots of individual registers to the extern function, a single CONTEXT structure carries them all.

As this callback is entirely winternl's, I won't explain their code line-by-line, as I can't explain it as well as they do: Detecting Manual Syscalls from User Mode

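Just know that the thunk saves InstrumentationCallbackPreviousSp and InstrumentationCallbackPreviousPc into the TEB, stashes the original RCX in R10, and allocates aligned stack space so that RtlCaptureContext can be called and the CONTEXT retrieved. Note there is no ret at the end: InstrumentationCallback is expected to resume the interrupted code itself from the saved values rather than return to the thunk.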
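Before resuming, the C side has to put the captured CONTEXT back the way the interrupted code expects it. Below is a portable sketch of that fix-up, assuming the register roles from the thunk above: R10 arrives holding the original RCX, and the previous PC/SP were saved to the TEB. MiniContext, MiniTeb and PrepareAndAnalyse are hypothetical stand-ins, not Windows types; in a real DLL the fields would come from CONTEXT and the TEB slots at gs:[2d8h]/gs:[2e0h], and the function would finish by resuming the context (winternl's code does this with RtlRestoreContext). The thread_local flag is a portable stand-in for a per-thread re-entrancy guard: anything the analysis does that enters the kernel, such as DbgHelp lookups or printf, returns to user mode and would otherwise trigger the callback again.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of the x64 CONTEXT fields involved.
struct MiniContext {
    std::uint64_t Rip;
    std::uint64_t Rsp;
    std::uint64_t Rcx;
    std::uint64_t R10;
};

// Hypothetical copy of the two TEB slots the thunk writes.
struct MiniTeb {
    std::uint64_t PreviousPc;  // gs:[2d8h]
    std::uint64_t PreviousSp;  // gs:[2e0h]
};

// Per-thread guard so work done inside the callback cannot re-enter it.
thread_local bool t_inCallback = false;

// Rewrites ctx so that resuming it continues the interrupted code, then runs
// analyse on the interrupted PC. Returns false if analysis was skipped due to
// re-entry. The caller is then expected to resume ctx (on Windows, for
// example with RtlRestoreContext).
bool PrepareAndAnalyse(MiniContext& ctx, const MiniTeb& teb,
                       void (*analyse)(std::uint64_t pc))
{
    ctx.Rip = teb.PreviousPc;  // where the kernel was returning to
    ctx.Rsp = teb.PreviousSp;  // stack before the thunk reserved CONTEXT space
    ctx.Rcx = ctx.R10;         // the thunk stashed the original RCX in R10

    if (t_inCallback)
        return false;          // nested transition: just resume

    t_inCallback = true;
    analyse(ctx.Rip);          // e.g. symbol lookup plus the ntdll check
    t_inCallback = false;
    return true;
}
```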

I haven't linked against NTDLL at build time, which I probably should, so I just resolve the function at runtime:

typedef NTSTATUS(NTAPI* _NtSetInformationProcess)
(
    _In_ HANDLE ProcessHandle,
    _In_ PROCESS_INFORMATION_CLASS ProcessInformationClass,
    _In_reads_bytes_(ProcessInformationLength) PVOID ProcessInformation,
    _In_ ULONG ProcessInformationLength
);


_NtSetInformationProcess pNtSetInformationProcess = reinterpret_cast<_NtSetInformationProcess>(GetProcAddress(hNtdll, "NtSetInformationProcess"));

Finally, call NtSetInformationProcess:

Status = pNtSetInformationProcess(
    hProcess,
    (PROCESS_INFORMATION_CLASS)ProcessInstrumentationCallback,
    &Callback,
    sizeof(Callback)
);

At this point, the callback is set: the kernel now holds InstrumentationCallbackThunk as this process's instrumentation callback (the KPROCESS InstrumentationCallback member from the quote above).

Now, whenever the kernel returns to user mode in this process (for example, when a syscall completes), the assembly thunk from earlier runs first. Within it is a call to the function that does all the parsing:

call    InstrumentationCallback

Full function to set up the callback:

BOOL SetInstrumentationCallback() {
    PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION Callback   = { 0 };
    HANDLE hProcess                                         = GetCurrentProcess();
    NTSTATUS Status                                         = { 0 };

    Callback.Version = 0;
    Callback.Reserved = 0;
    Callback.Callback = InstrumentationCallbackThunk;

    HMODULE hNtdll = GetModuleHandleA("ntdll");
    if (hNtdll == nullptr)
    {
        return FALSE;
    }

    _NtSetInformationProcess pNtSetInformationProcess = reinterpret_cast<_NtSetInformationProcess>(GetProcAddress(hNtdll, "NtSetInformationProcess"));

    if (pNtSetInformationProcess == nullptr)
    {
        return FALSE;
    }

    Status = pNtSetInformationProcess(
        hProcess,
        (PROCESS_INFORMATION_CLASS)ProcessInstrumentationCallback,
        &Callback,
        sizeof(Callback)
    );

    if (NT_SUCCESS(Status))
    {
        return TRUE;
    }
    else
    {
        return FALSE;
    }
}

The Instrumentation Hook

This is the function responsible for parsing the CONTEXT. Its definition:

extern "C" VOID InstrumentationCallback(PCONTEXT Context); // C linkage so the MASM thunk can call it by its unmangled name

To work out where the address captured in the CONTEXT came from, two DbgHelp functions are used.

SymFromAddr will retrieve symbol information for the specified address:

BOOL IMAGEAPI SymFromAddr(
  [in]            HANDLE       hProcess,
  [in]            DWORD64      Address,
  [out, optional] PDWORD64     Displacement,
  [in, out]       PSYMBOL_INFO Symbol
);

The HANDLE passed in here must be one that SymInitialize has already been called with, which we did in DllMain. The second parameter is the address to be queried, the third receives the displacement from the beginning of the symbol, and finally, there is a pointer to SYMBOL_INFO:

typedef struct _SYMBOL_INFO {
  ULONG   SizeOfStruct;
  ULONG   TypeIndex;
  ULONG64 Reserved[2];
  ULONG   Index;
  ULONG   Size;
  ULONG64 ModBase;
  ULONG   Flags;
  ULONG64 Value;
  ULONG64 Address;
  ULONG   Register;
  ULONG   Scope;
  ULONG   Tag;
  ULONG   NameLen;
  ULONG   MaxNameLen;
  CHAR    Name[1];
} SYMBOL_INFO, *PSYMBOL_INFO;

Microsoft have actually documented how to get the symbol from an address: Retrieving Symbol Information by Address.

A note from Microsoft on the SYMBOL_INFO structure:

Because the name is variable in length, you must supply a buffer that is large enough to hold the name stored at the end of the SYMBOL_INFO structure. Also, the MaxNameLen member must be set to the number of bytes reserved for the name. In this example, dwAddress is the address to be mapped to a symbol. The SymFromAddr function will store an offset to the beginning of the symbol to the address in dwDisplacement

This is easily done:

CHAR buffer[sizeof(SYMBOL_INFO) + MAX_SYM_NAME] = { 0 };
PSYMBOL_INFO Symbol = (PSYMBOL_INFO)buffer;
Symbol->SizeOfStruct = sizeof(SYMBOL_INFO);
Symbol->MaxNameLen = MAX_SYM_NAME;

The next function to set up is SymGetModuleInfo64:

BOOL IMAGEAPI SymGetModuleInfo64(
  [in]  HANDLE             hProcess,
  [in]  DWORD64            qwAddr,
  [out] PIMAGEHLP_MODULE64 ModuleInfo
);

This function takes an address, finds the module that contains it, and stores information about that module in IMAGEHLP_MODULE64. To set up this structure:

IMAGEHLP_MODULE64 ModuleInfo = { 0 };

memset(
    &ModuleInfo,
    0,
    sizeof(IMAGEHLP_MODULE64)
);

ModuleInfo.SizeOfStruct = sizeof(IMAGEHLP_MODULE64);

The first function to call is SymFromAddr, which looks something like this:

if (SymFromAddr(hProcess, Context->Rip, &Displacement, Symbol))
{
    
}

If that succeeds, then SymGetModuleInfo64 can be used to obtain information on the module:

if (SymFromAddr(hProcess, Context->Rip, &Displacement, Symbol))
{
    if (SymGetModuleInfo64(hProcess, Symbol->Address, &ModuleInfo))
    {
        printf("%s!%s\n", ModuleInfo.ModuleName, Symbol->Name);
    }
}

At this point, if everything succeeds, the module and function name are found.
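Conceptually, SymFromAddr finds the nearest symbol at or below an address and reports how far past its start the address is. The portable sketch below imitates that idea with a hypothetical std::map symbol table (start address to name); DbgHelp also tracks symbol sizes and owning modules, which this ignores. FormatHit produces module!symbol+0xdisp, a small extension of the %s!%s print above that keeps the displacement. The addresses and names in the usage are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <sstream>
#include <string>

struct SymbolHit {
    std::string name;
    std::uint64_t displacement;  // address minus symbol start
};

// Nearest symbol starting at or below addr, if any.
std::optional<SymbolHit> LookupSymbol(const std::map<std::uint64_t, std::string>& table,
                                      std::uint64_t addr)
{
    auto it = table.upper_bound(addr);  // first symbol starting after addr
    if (it == table.begin())
        return std::nullopt;            // addr is below every known symbol
    --it;                               // step back to the candidate symbol
    return SymbolHit{it->second, addr - it->first};
}

// "module!symbol", plus "+0x<disp>" when addr is not the symbol's start.
std::string FormatHit(const std::string& module, const SymbolHit& hit)
{
    std::ostringstream out;
    out << module << '!' << hit.name;
    if (hit.displacement != 0)
        out << "+0x" << std::hex << hit.displacement;
    return out.str();
}
```

For a table holding NtOpenProcess at 0x1000 and NtAllocateVirtualMemory at 0x1020, looking up 0x1032 yields ntdll!NtAllocateVirtualMemory+0x12.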

Using the following loader:

int main()
{
    LoadLibraryA("Fennec.dll");

    OpenProcess(MAXIMUM_ALLOWED, FALSE, EXPLORER);

    return 0;
}

Quite simple: load the DLL and try to get a HANDLE to explorer (just so we get some activity in the callback):

At the moment, I'm relying purely on symbols. If something were to happen to symbol loading at runtime, this wouldn't work. I'm not currently aware of any way to impair symbol loading, but if there is one, this will likely break. Another option would be to work from the memory location itself and determine whether it is in fact inside NTDLL. But, for a proof-of-concept, I'll use symbols and something awful... string compares:

if (ModuleInfo.ModuleName[0] == '\0' || _stricmp(ModuleInfo.ModuleName, "ntdll") != 0)
{
    // No module name was resolved, or the module is not ntdll (in any casing)
}

If the module name doesn't match ntdll, or no module name was obtained, then we do some extra work. Obviously, this is kind of a horrible solution, but it's enough to prove a point.
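The alternative mentioned above, deciding from the address itself rather than from symbol names, boils down to a range check against NTDLL's image. A minimal sketch, assuming the base and size are already known: on Windows they could come from IMAGEHLP_MODULE64's BaseOfImage and ImageSize fields, or from GetModuleInformation. ModuleRange and IsSuspiciousReturn are hypothetical names, and the addresses in the usage are made up.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a loaded image.
struct ModuleRange {
    std::uint64_t base;
    std::uint64_t size;
};

// True if addr lies in [base, base + size). Written as a subtraction so
// base + size can never overflow.
bool IsInside(const ModuleRange& m, std::uint64_t addr)
{
    return addr >= m.base && addr - m.base < m.size;
}

// The PC the kernel was returning to is suspicious if it is outside ntdll.
bool IsSuspiciousReturn(const ModuleRange& ntdll, std::uint64_t previousPc)
{
    return !IsInside(ntdll, previousPc);
}
```

Unlike the string compare, this does not depend on symbols being loadable, although it still only knows about the modules it is given.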

Inside this match, some information is extracted and packed into a structure:

Record.FunctionName = std::string(Symbol->Name);
Record.ModuleName = std::string(ModuleInfo.ModuleName);
Record.ModulePath = std::string(ModuleInfo.ImageName);
Record.ProcessId = GetProcessId(hProcess);
Record.ThreadId = GetThreadId(GetCurrentThread());

Where Record is:

typedef struct MALICIOUS_RECORD_
{
    std::string ModuleName;
    std::string FunctionName;
    std::string ModulePath;
    DWORD ProcessId;
    DWORD ThreadId;
} MALICIOUS_RECORD, *PMALICIOUS_RECORD;

A Boolean is also set:

bMalicious = TRUE;

Then, after all the checks are done, check whether the flag is set:

if (bMalicious)
{
    j["data"]["function_name"] = Record.FunctionName;
    j["data"]["module_name"] = Record.ModuleName;
    j["data"]["module_path"] = Record.ModulePath;
    j["data"]["process_id"] = std::to_string(Record.ProcessId);
    j["data"]["thread_id"] = std::to_string(Record.ThreadId);
    j["task"] = "Process Instrumentation";
    j["report_time"] = GetCurrentTimeA();
    j["id"] = GenerateGUID();
    j["reason"] = "Detected SysCall Usage";
    bMalicious = FALSE;
    printf("Malicious Activity: %s\n", j.dump().c_str());
}

If it is, build a json object with nlohmann/json.
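nlohmann/json handles string escaping for you, which is why module_path in the output further down has doubled backslashes. For readers without that dependency, here is a dependency-free sketch that builds the same "data" object by hand. MaliciousRecordLite, JsonEscape and ToDataJson are hypothetical helpers, not part of nlohmann/json or Fennec. Keys are emitted in alphabetical order, which matches how nlohmann::json dumps its default (std::map-backed) objects.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Minimal JSON string escaper: quotes, backslashes and control characters.
std::string JsonEscape(const std::string& in)
{
    std::string out;
    for (unsigned char c : in) {
        switch (c) {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\n': out += "\\n"; break;
        case '\r': out += "\\r"; break;
        case '\t': out += "\\t"; break;
        default:
            if (c < 0x20) {
                char buf[7];  // "\u" + 4 hex digits + NUL
                std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
                out += buf;
            } else {
                out += static_cast<char>(c);
            }
        }
    }
    return out;
}

// Hypothetical mirror of MALICIOUS_RECORD.
struct MaliciousRecordLite {
    std::string ModuleName, FunctionName, ModulePath;
    std::uint32_t ProcessId, ThreadId;
};

// Builds the "data" object; ids are strings, as in the original code.
std::string ToDataJson(const MaliciousRecordLite& r)
{
    return std::string("{") +
        "\"function_name\":\"" + JsonEscape(r.FunctionName) + "\"," +
        "\"module_name\":\"" + JsonEscape(r.ModuleName) + "\"," +
        "\"module_path\":\"" + JsonEscape(r.ModulePath) + "\"," +
        "\"process_id\":\"" + std::to_string(r.ProcessId) + "\"," +
        "\"thread_id\":\"" + std::to_string(r.ThreadId) + "\"}";
}
```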

Instead of just opening a handle, let's open the handle and allocate some space with Tartarus Gate:

int main()
{
    LoadLibraryA("Fennec.dll");

    VX_TABLE Table = {
        0
    };

    if (GetVXTable(&Table) == FALSE) {
        return -1;
    }

    NTSTATUS status = 0;
    LPVOID pAddress = NULL;
    HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, EXPLORER);
    SIZE_T bufSz = 1024;

    HellsGate(Table.NtAllocateVirtualMemory.wSystemCall);
    status = HellDescent(hProcess, &pAddress, 0, &bufSz, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    printf("[LOADER] => NtAllocateVirtualMemory: 0x%lx\n", status);
    if (status != 0) return -1;

    HellsGate(Table.NtFreeVirtualMemory.wSystemCall);
    status = HellDescent(hProcess, &pAddress, &bufSz, MEM_RELEASE);
    printf("[LOADER] => NtFreeVirtualMemory: 0x%lx\n", status);
    if (status != 0) return -1;

    return 0;
}

Step by step, the loader will:

  1. Load the DLL

  2. Use OpenProcess to get a handle to the process

  3. NtAllocateVirtualMemory syscall

  4. NtFreeVirtualMemory syscall

This should produce two json blobs:

The above shows the loader printing STATUS_SUCCESS (0x0), and then a json blob:

{
  "function_name": "HellDescent",
  "module_name": "Loader",
  "module_path": "C:\\Users\\mez0\\Desktop\\PreEmpt.Interceptor\\x64\\Debug\\Loader.exe",
  "process_id": "21996",
  "thread_id": "7248"
}

With the logs being generated, it's time to get them into Kibana.

In order to ingest json into Logstash and then Kibana, the following configuration file is used (as seen in Maelstrom: Building the Team Server):

input {
    beats {
        port => 5044
    }

    tcp {
        port => 5000
    }

    http {
        port => 5043
    }
}

filter {
    json {
        source => "message"
        tag_on_failure => [ "_parsefailure", "parsefailure-critical", "parsefailure-json_codec" ]
        remove_field => [ "message" ]
        skip_on_invalid_json => true
    }
}

output {
    elasticsearch {
        hosts => "elasticsearch:9200"
        user => "logstash_internal"
        password => "${LOGSTASH_INTERNAL_PASSWORD}"
    }
}
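If the orchestrator ships logs to the tcp input on port 5000 (the post doesn't say which of the three inputs it uses, so this is an assumption), note that the tcp input decodes with the line codec by default: each newline-terminated line becomes one event, and the json filter above then parses its message field. A minimal, hypothetical helper for building such a payload from compact JSON documents:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Joins compact JSON documents into newline-delimited text, one event per
// line. Returns nullopt if any document contains a raw newline, because that
// would split it into two unparseable events.
std::optional<std::string> BuildLinePayload(const std::vector<std::string>& docs)
{
    std::string payload;
    for (const auto& doc : docs) {
        if (doc.find('\n') != std::string::npos)
            return std::nullopt;
        payload += doc;
        payload += '\n';
    }
    return payload;
}
```

Compact output such as j.dump() with no indent never contains a raw newline, since newlines inside strings are escaped, so it passes straight through.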

Now the DLL just needs to send the data to the orchestration process:

void SendLogToPipeA(const std::string& log)
{
    DWORD dwWritten = 0;

    // Connect to the orchestrator's named pipe (it must already be listening)
    HANDLE hPipe = CreateFileA("\\\\.\\pipe\\Fennec.Orchestrator",
        GENERIC_READ | GENERIC_WRITE,
        0,
        NULL,
        OPEN_EXISTING,
        0,
        NULL);

    if (hPipe != INVALID_HANDLE_VALUE)
    {
        // log.size() + 1 also sends the terminating NUL
        WriteFile(hPipe,
            log.c_str(),
            static_cast<DWORD>(log.size() + 1),
            &dwWritten,
            NULL);

        CloseHandle(hPipe);
    }
}
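The orchestrator's side isn't shown in this post, so here is a hedged, portable sketch of one way to read these messages. Because SendLogToPipeA writes log.size() + 1 bytes, each log arrives with its terminating NUL, which can act as a delimiter if the pipe is read in byte mode and a single read returns a partial or combined message. NulFramedReader is hypothetical; on Windows, Feed would be called with whatever ReadFile returned.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical reassembler: splits a byte stream on '\0' and keeps any
// incomplete trailing message until more bytes arrive.
class NulFramedReader {
public:
    // Feed raw bytes as they arrive; returns every message completed by them.
    std::vector<std::string> Feed(const char* data, std::size_t len)
    {
        std::vector<std::string> complete;
        for (std::size_t i = 0; i < len; ++i) {
            if (data[i] == '\0') {
                complete.push_back(pending_);
                pending_.clear();
            } else {
                pending_ += data[i];
            }
        }
        return complete;
    }

    // True if bytes of an unfinished message are buffered.
    bool HasPartial() const { return !pending_.empty(); }

private:
    std::string pending_;
};
```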

In the orchestration process:

Then checking ELK:

Testing vs. SysWhispers3

As this was built against Tartarus Gate, let's see how it holds up against SysWhispers3.

Going back to this code:

if (SymFromAddr(hProcess, Context->Rip, &Displacement, Symbol))
{
    if (SymGetModuleInfo64(hProcess, Symbol->Address, &ModuleInfo))
    {
        printf("%s!%s\n", ModuleInfo.ModuleName, Symbol->Name);
    }
}

Same thing, as expected:

The json for this log:

{
  "function_name": "NtAllocateVirtualMemory",
  "module_name": "SysWhispers",
  "module_path": "C:\\Users\\mez0\\Desktop\\PreEmpt.Interceptor\\x64\\Debug\\SysWhispers.exe",
  "process_id": "24464",
  "thread_id": "13448"
}

Bypassing the Instrumentation

This is all well and good, but it is quite easy to bypass. We can take the exact same code that was used to install the callback and use it to NULL the callback out:

BOOL SetInstrumentationCallback() {
    PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION Callback = { 0 };
    HANDLE hProcess = GetCurrentProcess();
    NTSTATUS Status = { 0 };

    Callback.Version = 0;
    Callback.Reserved = 0;
    Callback.Callback = NULL;

    HMODULE hNtdll = GetModuleHandleA("ntdll");
    if (hNtdll == NULL)
    {
        return FALSE;
    }

    _NtSetInformationProcess pNtSetInformationProcess = (_NtSetInformationProcess)GetProcAddress(hNtdll, "NtSetInformationProcess");

    if (pNtSetInformationProcess == NULL)
    {
        return FALSE;
    }

    Status = pNtSetInformationProcess(
        hProcess,
        (PROCESS_INFORMATION_CLASS)ProcessInstrumentationCallback,
        &Callback,
        sizeof(Callback)
    );

    if (NT_SUCCESS(Status))
    {
        return TRUE;
    }
    else
    {
        return FALSE;
    }
}

Note this part:

Callback.Callback = NULL;

Then it's called after the DLL is loaded:

int main()
{
    LoadLibraryA("PreEmpt.Interceptor.dll");

    SetInstrumentationCallback();

    NTSTATUS status = 0;
    LPVOID pAddress = NULL;
    HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, EXPLORER);
    SIZE_T bufSz = 1024;

    status = NtAllocateVirtualMemory(hProcess, &pAddress, 0, &bufSz, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    printf("[LOADER] => NtAllocateVirtualMemory: 0x%lx\n", status);
    if (status != 0) return -1;

    status = NtFreeVirtualMemory(hProcess, &pAddress, &bufSz, MEM_RELEASE);
    printf("[LOADER] => NtFreeVirtualMemory: 0x%lx\n", status);
    if (status != 0) return -1;

    printf("Look at the modules!\n");
    getchar();

    return 0;
}

This isn't the most expressive gif, but I'll try to explain it:

In this gif, two breakpoints are set in main.c, which is the SysWhispers loader. Then dllmain.cpp has a breakpoint set at the top of the callback function.

The first breakpoint is hit and the DLL is loaded. Then, the breakpoint inside the callback keeps firing until I break out of it and call the function that NULLs the callback; after that, I can step through the code without being taken to the callback function. I hope that makes sense.

Conclusion

If the DLL is loaded from user-land, as this one is, then the callback can simply be cleared. I've tried this on EDR products which load their DLL from the kernel, and it does not seem to impact them. If anyone knows more about that particular part, I'd love to hear it.

Overall, though, this blog was more of a devlog on how my EDR project is trying to handle syscalls. Again, thanks to winternl for producing Detecting Manual Syscalls from User Mode and allowing me to have a much cleaner hook!
