Protecting the Heap: Encryption & Hooks
Looking into some heap encryption shenanigans...
Introduction
As endpoint protection gets better, and more of the community builds tooling to detect malware in memory, implants must become more evasive. In this blog I want to look at encrypting the "Heap". More on that in a moment, but for now, the key point is that the heap holds data for a lot longer than the "Stack": a function's stack frame is released as soon as that function returns. A typical example of heap usage is a Command and Control (C2) framework's configuration; after all, the data needed to communicate must live somewhere. A generic example of this is a connection string, and that is what we will use as sample data for this test.
Cobalt Strike introduced Sleep Mask in Cobalt Strike 4.4:
The sleep_mask is Cobalt Strike’s ability to mask and unmask itself in memory. The goal of this feature is to push memory detections away from content-based signatures. Although sleep_mask can encode Beacon’s data and code (if the agent is in RWX memory), the static stub is still a target for in-memory hunting based on content.
It was then updated in 4.5, and the update is a recommended read. The technique was popularised by MDSec again back on July 30th 2021 and is core functionality of their proprietary C2, Nighthawk.
Off the back of this, waldo-irc put together Hook Heaps and Live Free and LockdExeDemo in which he replicates this functionality to further protect a Cobalt Strike Beacon. Similarly, SolomonSklash then produced SleepyCrypt: Encrypting a running PE image while it sleeps which was aimed towards encrypting the sections of a PE in memory.
I've been writing Vulpes since around 2019. It is becoming more stable and the modules I want are almost all there, so now I'm looking into some more evasive behaviour, specifically for when the implant is not operating, hence this blog post!
WTF is a Heap
I don't want to turn this into a Computer Science class, so I won't discuss this TOO much. So, what is the heap?
Well, it's considered dynamic storage, meaning it can house large pools of memory that aren't allocated in any contiguous order. Furthermore, the heap is not managed for you: memory must be allocated explicitly with functions such as malloc, and then released with free. If this is not done, a memory leak can occur. This is where it differs from the stack: anything allocated on the stack is released when the routine that allocated it returns.
With this in mind, imagine an implant's configuration is one big struct. The config is needed often and for the whole lifetime of the implant, so it is likely going to be stored on the heap. If it instead lived on the stack, the configuration object would be constantly created and destroyed as functions returned, and any settings applied at runtime would constantly need re-applying. Thus, the heap is better for this.
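To make the stack versus heap difference concrete, here is a minimal portable sketch (plain C++, not taken from Vulpes) of a configuration struct that is allocated with malloc, outlives the function that created it, and is released with free. The Config layout, its fields and the helper names are hypothetical.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical implant configuration; the fields are illustrative only.
struct Config {
    char connection[128];  // e.g. a connection string
    unsigned sleepMs;      // e.g. sleep interval between check-ins
};

// Allocate the config on the heap. Unlike a local variable, this block
// survives after make_config returns, until free_config is called.
Config* make_config(const char* connection, unsigned sleepMs) {
    Config* cfg = static_cast<Config*>(std::malloc(sizeof(Config)));
    if (cfg == nullptr) return nullptr;
    std::strncpy(cfg->connection, connection, sizeof(cfg->connection) - 1);
    cfg->connection[sizeof(cfg->connection) - 1] = '\0';  // strncpy may not terminate
    cfg->sleepMs = sleepMs;
    return cfg;
}

// Forgetting this call is exactly the memory leak described above.
void free_config(Config* cfg) {
    std::free(cfg);
}
```

Returning the address of a local Config instead would hand the caller a dangling pointer, because that stack frame is gone once the function returns.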
Two great references for this:
Identifying the Heaps
Microsoft have documented this quite well:
Stringing these two posts together got me 99% of the way there, so let's look at it.
First off, CreateToolhelp32Snapshot is used with the TH32CS_SNAPHEAPLIST (0x00000001) value:
All the snapshot values:
Value | Meaning |
---|---|
TH32CS_INHERIT 0x80000000 | Indicates that the snapshot handle is to be inheritable. |
TH32CS_SNAPALL | Includes all processes and threads in the system, plus the heaps and modules of the process specified in th32ProcessID. Equivalent to specifying the TH32CS_SNAPHEAPLIST, TH32CS_SNAPMODULE, TH32CS_SNAPPROCESS, and TH32CS_SNAPTHREAD values combined using an OR operation ('\|'). |
TH32CS_SNAPHEAPLIST 0x00000001 | Includes all heaps of the process specified in th32ProcessID in the snapshot. To enumerate the heaps, see Heap32ListFirst. |
TH32CS_SNAPMODULE 0x00000008 | Includes all modules of the process specified in th32ProcessID in the snapshot. To enumerate the modules, see Module32First. If the function fails with ERROR_BAD_LENGTH, retry the function until it succeeds. 64-bit Windows: Using this flag in a 32-bit process includes the 32-bit modules of the process specified in th32ProcessID, while using it in a 64-bit process includes the 64-bit modules. To include the 32-bit modules of the process specified in th32ProcessID from a 64-bit process, use the TH32CS_SNAPMODULE32 flag. |
TH32CS_SNAPMODULE32 0x00000010 | Includes all 32-bit modules of the process specified in th32ProcessID in the snapshot when called from a 64-bit process. This flag can be combined with TH32CS_SNAPMODULE or TH32CS_SNAPALL. If the function fails with ERROR_BAD_LENGTH, retry the function until it succeeds. |
TH32CS_SNAPPROCESS 0x00000002 | Includes all processes in the system in the snapshot. To enumerate the processes, see Process32First. |
TH32CS_SNAPTHREAD 0x00000004 | Includes all threads in the system in the snapshot. To enumerate the threads, see Thread32First. To identify the threads that belong to a specific process, compare its process identifier to the th32OwnerProcessID member of the THREADENTRY32 structure when enumerating the threads. |
In typical fashion with the snapshotting functions, the setup:
At this point, the snapshot is ready to parse. But before that, something needs to actually be put on the heap. This can be done with HeapAlloc, GetProcessHeap and memcpy:
Next, grab the first heap in the snapshot's heap list with Heap32ListFirst:
Now loop over each heap's blocks with Heap32First and Heap32Next:
Working with the heap
Before operating on the heap, it must be locked with HeapLock:
If the function succeeds, the calling thread owns the heap lock. Only the calling thread will be able to allocate or release memory from the heap. The execution of any other thread of the calling process will be blocked if that thread attempts to allocate or release memory from the heap. Such threads will remain blocked until the thread that owns the heap lock calls the HeapUnlock function.
By doing this, the heap lock now belongs to the calling thread, meaning no other thread can allocate or free memory on the heap whilst we encrypt it:
Walk and Encrypt
Conveniently, HeapWalk allows just that:
This is shown in Microsoft's Enumerating a Heap example, which prints a lot of information we don't care about here.
When this returns, a PROCESS_HEAP_ENTRY struct gives us access to all the following information:
cbData is the size of the block, and lpData points to the actual data on the heap. There is all kinds of information here, but the only check we are going to make is that wFlags contains PROCESS_HEAP_ENTRY_BUSY (0x0004):
The heap element is an allocated block.
If PROCESS_HEAP_ENTRY_MOVEABLE is also specified, the Block structure becomes valid. The hMem member of the Block structure contains a handle to the allocated, moveable memory block.
This just means that memory is allocated here.
In terms of encrypting, a simple XOR will be used:
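A minimal sketch of such a XOR routine, working in place on a buffer with a repeating key. The function name and key bytes are hypothetical. XOR with a short static key is obfuscation rather than strong encryption, but it has the handy property that running it a second time with the same key restores the original bytes, so one call masks and the next unmasks.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical static key, for illustration only.
static const std::uint8_t kXorKey[] = {0x4b, 0x65, 0x79, 0x21};

// XOR `len` bytes at `data` in place with a repeating `key`.
// Calling it twice with the same key restores the original contents.
void XorBuffer(std::uint8_t* data, std::size_t len,
               const std::uint8_t* key, std::size_t keyLen) {
    if (data == nullptr || key == nullptr || keyLen == 0) return;
    for (std::size_t i = 0; i < len; ++i) {
        data[i] ^= key[i % keyLen];
    }
}
```

During the heap walk, each busy entry's lpData and cbData can be passed as data and len.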
Before actually doing the encryption, let's set a breakpoint on the Xor call:
Note that there are two copies of the string. Because the string literal is also compiled into the PE, one of those copies lives in one of its data sections; that is not the focus of this blog for now.
Calling the encryption function:
Rerunning the code:
Now there is only one; remember, the other lives in the PE's data sections.
Before returning, the heap is unlocked with HeapUnlock, releasing ownership:
Alternatively, everything in the heap can be encrypted, but this could end up giving unexpected results:
In order to hook heap allocations effectively, three functions are required:
As their names imply, they are responsible for allocating, reallocating, and freeing space on the heap. For ease, minhook will be used for the hooking.
Setting up the hooks
The first thing required is to define the three function signatures as types:
Then, three functions that will be used to replace the functionality once hooked:
Once that is done, minhook needs to be initialised:
Then place the hooks with MH_CreateHookApi:
One final thing is to create three variables to store the original address:
So now, once the hooks are enabled, any call to RtlAllocateHeap will be redirected to RtlAllocateHeapHook, while pRtlAllocateHeap holds a pointer (a MinHook trampoline) through which the original RtlAllocateHeap can still be called. Then, enable the hooks:
Capturing the heap data
In order to do this, a new struct is created:
This will hold:
The handle to the heap
The address of the space allocated on the heap
The flags used for the allocation
As this is a POC, a global vector will be used to hold them:
Updating the RtlAllocateHeapHook function:
The original RtlAllocateHeap is used to allocate the space, and the returned pointer is stored in the struct; the same value is returned from the hook so callers behave as expected. Once that is done, the struct is added to the vector. However, std::vector is not thread safe, and push_back can itself allocate memory (re-entering the hook), so that push_back will cause a crash.
That part is left as a task for the reader; look at:
These two should sort it out. Good luck!
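For anyone who wants a starting point, one possible shape (only a sketch, not the author's solution) is to guard the vector with a std::mutex and add a per-thread re-entrancy flag. The flag matters because push_back may itself allocate, which with the MSVC runtime typically goes back through the hooked RtlAllocateHeap; without it, the nested call would try to take the same non-recursive mutex and deadlock. The types here are portable stand-ins (void* for HANDLE/PVOID, unsigned long for ULONG) so the sketch compiles anywhere, and AllocationRegistry with its methods are hypothetical names. Allocations made while the registry is busy are simply not recorded.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <vector>

// Portable stand-ins for HANDLE, PVOID and ULONG (assumption for this sketch).
struct HeapAllocation {
    void* heapHandle;     // heap the block came from
    void* address;        // pointer returned by the original allocator
    unsigned long flags;  // flags passed to the allocation
};

// Hypothetical thread-safe store for allocations seen by the hooks.
class AllocationRegistry {
public:
    // Record a block. Returns false for null addresses, or when called
    // re-entrantly on this thread (e.g. vector growth allocating via the hook).
    bool Record(void* heap, void* address, unsigned long flags) {
        if (inRegistry_ || address == nullptr) return false;
        ReentrancyGuard guard;
        std::lock_guard<std::mutex> lock(mutex_);
        entries_.push_back({heap, address, flags});
        return true;
    }

    // Forget a block when it is freed. Returns false if it was not tracked.
    bool Remove(void* address) {
        if (inRegistry_) return false;
        ReentrancyGuard guard;
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = std::find_if(entries_.begin(), entries_.end(),
                               [address](const HeapAllocation& e) { return e.address == address; });
        if (it == entries_.end()) return false;
        entries_.erase(it);
        return true;
    }

    std::size_t Size() {
        std::lock_guard<std::mutex> lock(mutex_);
        return entries_.size();
    }

private:
    // Marks this thread as "inside the registry" for the guard's lifetime.
    struct ReentrancyGuard {
        ReentrancyGuard() { inRegistry_ = true; }
        ~ReentrancyGuard() { inRegistry_ = false; }
    };

    static thread_local bool inRegistry_;
    std::mutex mutex_;
    std::vector<HeapAllocation> entries_;
};

thread_local bool AllocationRegistry::inRegistry_ = false;
```

In the hooks, RtlAllocateHeapHook would call the original first and then Record(HeapHandle, result, Flags), while RtlFreeHeapHook would call Remove(BaseAddress). The registry would need to be constructed before the hooks are enabled.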
Conclusion
In this short blog, I wanted to take a look at working with the heap from both an allocation and hooking perspective.