Categorising DLL Exports with an LLM

Wanted an excuse to use an LLM to automate some processing - this is the product of that!

Introduction

In this blog, we set out to achieve two objectives that will support some future projects. Firstly, we will identify common Windows Dynamic Link Libraries (DLLs), and then we will categorise each exported function they contain using a Large Language Model (LLM). Though the objective is to categorise the functions and explore some preliminary data, we also just find this interesting.

Malapi.io came in clutch when we were looking at grouping malware families by their imports. Our goal was to try and identify malware purely from these groupings, for example:

The above shows a list of functions grouped by technique, such as enumeration, injection, and evasion. Looking at CreateToolhelp32Snapshot: this function takes a snapshot of the threads and processes on a system so that developers can iterate over each one, which is enumeration. CreateRemoteThread, meanwhile, does exactly what the name suggests: it creates a thread in a remote process – ergo, injection.

To categorise hundreds, and maybe thousands, of functions across some of the most common DLLs by hand would take an awfully long time. So, to do this in a reasonable amount of time, several LLMs will be tested for the best response, and one will then be chosen to carry out the full categorisation.

To quickly define an LLM before continuing down the rabbit hole: an LLM is a system designed to understand, generate, and manipulate human language by learning from vast amounts of text data (Brown, et al., 2020).

Data Gathering

To begin, an initial dataset is required. This was achieved by simply taking all the DLLs within c:\windows\system32 and using pefile to extract the exported functions.

Creating a JSON object of this data is as simple as using the DLL name as the key and the list of exported functions as the value.
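As a rough sketch of that step (not the exact script used; the output filename is illustrative), the extraction with pefile looks something like this:

# Sketch: walk System32, parse each DLL with pefile, and map
# DLL name -> list of exported function names.
import json
from pathlib import Path

import pefile

exports = {}

for dll_path in Path(r"C:\Windows\System32").glob("*.dll"):
    try:
        pe = pefile.PE(str(dll_path))
    except pefile.PEFormatError:
        continue  # skip anything that is not a valid PE

    if not hasattr(pe, "DIRECTORY_ENTRY_EXPORT"):
        continue  # no export table

    names = [exp.name.decode() for exp in pe.DIRECTORY_ENTRY_EXPORT.symbols if exp.name]
    exports[dll_path.name.upper()] = names

# illustrative output file
Path("exports.json").write_text(json.dumps(exports, indent=4))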

Looking at the top 20 DLLs by export count, we can quickly see that dui70.dll exports 4321 functions and Kernel32.dll exports 1671.
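That count can be pulled straight from the same mapping, for example (reusing the exports dictionary from the sketch above):

# Top 20 DLLs by number of exported functions.
top_20 = sorted(exports.items(), key=lambda item: len(item[1]), reverse=True)[:20]
for dll, functions in top_20:
    print(f"{dll}: {len(functions)}")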

We have been working on a dataset of goodware, malware, and “winware” as a continuation of our Maelstrom series, which we presented at SteelCon 2024. The goodware and malware datasets were borrowed for this project, and the available datasets comprise the following:

From all of these samples, a list of 12 DLLs was obtained by average occurrence. That said, KERNEL32.DLL would be in the list regardless, due to its mandatory requirements at the operating system level (note: I tried to find a reference for this, but I can't; just trust me). Additional DLLs such as WINHTTP.DLL and WININET.DLL were added manually, as they provide the functions for HTTP communication, which is often how malware egresses.

  1. ADVAPI32.DLL

  2. COMCTL32.DLL

  3. GDI32.DLL

  4. KERNEL32.DLL

  5. MSVCRT.DLL

  6. OLE32.DLL

  7. OLEAUT32.DLL

  8. SHELL32.DLL

  9. SHLWAPI.DLL

  10. USER32.DLL

  11. WINHTTP.DLL

  12. WININET.DLL
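As an aside, the "average occurrence" selection described above could be computed with something like the following hypothetical sketch, assuming each sample's imported DLL names have already been extracted (the samples structure and field names are illustrative):

# Count how often each DLL appears across the sample set and keep the most common.
from collections import Counter

def most_common_dlls(samples: list[dict], top_n: int = 12) -> list[str]:
    counts = Counter()
    for sample in samples:
        # count each DLL once per sample so import-heavy binaries don't skew the result
        counts.update({dll.upper() for dll in sample["imported_dlls"]})
    return [dll for dll, _ in counts.most_common(top_n)]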

With this list of 12 DLLs, the export parsing was redone against just those files.

Enriching

Those who have had the displeasure of working through the Microsoft Win32 API documentation will be familiar with pages such as this:

As Microsoft publishes this API information, it's also viewable on GitHub as markdown: https://github.com/MicrosoftDocs/sdk-api/blob/docs/sdk-api-src/content/memoryapi/nf-memoryapi-virtualalloc.md

An offline copy of this documentation can now be cloned, which means it can be parsed easily without having to worry about scraping:
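A minimal sketch of indexing the cloned repository so that exported functions can be matched to their markdown pages might look like this. It relies on the file naming convention visible in the URL above (nf-<header>-<function>.md); the local clone path is an assumption:

# git clone https://github.com/MicrosoftDocs/sdk-api.git
from pathlib import Path

DOCS_ROOT = Path("sdk-api/sdk-api-src/content")  # assumed local clone location

def build_doc_index(root: Path = DOCS_ROOT) -> dict[str, Path]:
    index = {}
    for md in root.rglob("nf-*.md"):
        parts = md.stem.split("-", 2)  # e.g. ["nf", "memoryapi", "virtualalloc"]
        if len(parts) == 3:
            index[parts[2]] = md  # keyed by lower-case function name
    return index

doc_index = build_doc_index()
# exported names are mixed case, so look them up with .lower()
print(doc_index.get("virtualalloc"))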

Armed with offline documentation and a JSON file of DLLs and exports, it was time to begin categorisation. This was done manually via the UI with both Claude and ChatGPT, with the following prompt:

The prompt (OpenAI, 2024) essentially tells the LLMs that they are malware analysts tasked with reviewing exported functions and categorising them into a set of predefined categories. It also gives the LLM a structure to respond with, as well as the documentation from the markdown files. The response structure is an important part of this, so each LLM is judged on the quality of its response as well as on whether it sticks to the structure; a sketch of the prompt's shape is shown after the category list below.

The list of categories:

  • File Operations

  • Network Operations

  • Process and Thread Management

  • Memory Management

  • Registry Operations

  • System Information and Control

  • DLL Injection and Manipulation

  • Cryptographic Operations

  • Hooking and Interception
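The exact prompt used is not reproduced here, but based on the description above it was shaped roughly like the following (the wording is illustrative):

# Illustrative prompt template: analyst framing, the category list, the required
# response structure, and the markdown documentation for the function in question.
CATEGORIES = [
    "File Operations", "Network Operations", "Process and Thread Management",
    "Memory Management", "Registry Operations", "System Information and Control",
    "DLL Injection and Manipulation", "Cryptographic Operations",
    "Hooking and Interception",
]

PROMPT_TEMPLATE = """You are a malware analyst reviewing exported functions from Windows DLLs.
Categorise the function below into exactly one of these categories:
{categories}

Respond using exactly this structure and nothing else:
Title: <function name>
Description: <one sentence describing what the function does>
Category: <one category from the list>

Function: {dll}!{function}
Documentation:
{documentation}
"""

def build_prompt(dll: str, function: str, documentation: str) -> str:
    return PROMPT_TEMPLATE.format(
        categories="\n".join(f"- {c}" for c in CATEGORIES),
        dll=dll,
        function=function,
        documentation=documentation,
    )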

Prompting ChatGPT:

Prompting Claude:

Most notably from these responses, Claude insisted on adding additional data after the required structure, whereas ChatGPT consistently respected the requested output format. This, along with cost, made ChatGPT the LLM to go for.

ChatGPT Categorisation

A Python script was written to automate parsing the function names and matching them up to the corresponding markdown. It then populated the predefined prompt and targeted gpt-4o-mini. Each response was recorded, and here is an example for CreateThread:

Title: CreateThread
Description: Creates a thread to execute within the virtual address space of the calling process.
Category: Process and Thread Management

Each response was then parsed into JSON:

{
    "title": "CreateThread",
    "description": "Creates a thread to execute within the virtual address space of the calling process.",
    "category": "Process and Thread Management"
}
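For completeness, a hedged sketch of what that automation might look like, reusing build_prompt() from the prompt sketch above; the OpenAI client usage is standard, but the surrounding plumbing (rate limiting, retries, error handling) is omitted:

# Send the populated prompt to gpt-4o-mini and turn the
# "Title/Description/Category" response into a dictionary.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def categorise(dll: str, function: str, documentation: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": build_prompt(dll, function, documentation)}],
        temperature=0,
    )
    text = response.choices[0].message.content

    result = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            result[key.strip().lower()] = value.strip()
    return result  # e.g. {"title": ..., "description": ..., "category": ...}

print(json.dumps(categorise("KERNEL32.DLL", "CreateThread", "<markdown here>"), indent=4))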

Additionally, any failed lookups were saved – these were:

  • SetParent

  • GetAclInformation

  • ImageList_LoadImageW

  • remove

  • SHStrDupA

Once all this data has been processed, this is how the final CSV looks:

Title,Description,Category
ADVAPI32.DLL!RegEnumKeyW,Enumerates subkeys of an open registry key - indicating direct registry manipulation.,Registry Operations
GDI32FULL.DLL!UpdateColors,Updates the client area of a device context by remapping current colours to the logical palette.,System Information and Control
KERNEL32.DLL!TerminateJobObject,This function terminates all processes associated with a job - managing processes and threads.,Process and Thread Management
RPCRT4.DLL!IUnknown_AddRef_Proxy,Implements the AddRef method for interface proxies - managing reference counting in COM.,Process and Thread Management
RPCRT4.DLL!NdrServerCall2,Facilitates remote procedure calls (RPC) but is not user-invoked.,Network Operations
SECHOST.DLL!CredDeleteA,Deletes a credential from the user's credential set - modifying stored authentication data.,Registry Operations
SHLWAPI.DLL!StrCSpnW,Searches a string for specific characters - providing their index. Involves string manipulation rather than file or network processes.,Memory Management
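Flattening the categorised results into that CSV could be done with something like the following sketch, assuming the parsed responses are held in a mapping of "DLL!Function" to the JSON shown earlier (the filename is illustrative):

# Write the categorised exports out as Title, Description, Category rows.
import csv

def write_csv(results: dict[str, dict], path: str = "categorised_exports.csv") -> None:
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Title", "Description", "Category"])
        for title, entry in sorted(results.items()):
            writer.writerow([title, entry["description"], entry["category"]])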

Data Summary

With all the exports parsed from the System32 DLLs, we are left with a graph that looks like this:

Most notably, the majority of the exports are filed under “System Information and Control” at 1744, whereas “DLL Injection and Manipulation” sits at only 150.

Machine Learning

One method of turning this dataset into something appropriate for machine learning is to count, for each sample, how many of its imported functions fall into each category:
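As a rough sketch, and assuming the categorised exports have been loaded into a mapping of "DLL!Function" to category, a per-sample feature row could be built like this (the names here are illustrative):

# Count a sample's imported functions per category and attach the label.
from collections import Counter

CATEGORIES = [
    "File Operations", "Network Operations", "Process and Thread Management",
    "Memory Management", "Registry Operations", "System Information and Control",
    "DLL Injection and Manipulation", "Cryptographic Operations",
    "Hooking and Interception",
]

def sample_features(imports: list[str], is_malware: bool, category_map: dict[str, str]) -> dict:
    counts = Counter(category_map[i] for i in imports if i in category_map)
    row = {category: counts.get(category, 0) for category in CATEGORIES}
    row["is_malware"] = int(is_malware)  # boolean label to work towards
    return row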

This gives you a pretty clean numerical dataset with a boolean label to work towards, and is something that will be explored in the future. However, for now, I will leave it as is.

Conclusion

In this blog, I wanted to expand on the malapi.io data whilst finding an excuse to use an LLM. The category names do not necessarily align with malapi.io, but they are what I felt were appropriate. The gist of the full data can be found here:

https://gist.github.com/mez-0/833314d8e920a17aa3ca703eabbfa4a5

References

OpenAI. (2024). Prompt engineering. Retrieved from https://platform.openai.com/docs/guides/prompt-engineering

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., . . . Amodei, D. (2020). Language Models are Few-Shot Learners. Retrieved from https://arxiv.org/pdf/2005.14165
