Unrestricting Android Native Dynamic Library Linking

Introduction

Hacking around Android sometimes requires getting your hands dirty at the native level. While I was on one such escapade, I discovered that Android tends to be quite restrictive about what you can do or interact with on the system. Rightfully so, perhaps: the developer website documents that, starting with Android 7.0, dynamically linking against non-NDK libraries is restricted for stability reasons. The exact level of restriction on certain libraries varies between API levels, as they show in the following diagram.

Native dynamic library linking restrictions based on API level

Well, this is troublesome for the things I want to do, so I had to investigate further. There weren't many helpful references online for bypassing these restrictions; only a relatively old (2019) Quarkslab post on Android Runtime Restrictions Bypass gave some idea of how it would be possible. Its approach seems quite involved, but there is a simpler workaround that doesn't tamper with any internal data structures - or so I think. This is a short post on how to bypass these restrictions, and it is definitely not stealing from Frida's codebase.

Android dlopen

Android's dlopen is quite different from the base Linux implementation in that it looks up the namespace of the calling function's module and checks it against the namespace of the library being opened. To do this, dlopen reads its own return address from the top of the stack ([rsp]) and passes it as an additional argument to void* __loader_dlopen(const char* filename, int flags, const void* caller_addr).

dlopen:
	jmp    qword ptr [rip + 0xc762] 
	push   0x15                     
	jmp    0x136320                 


dlopen:
	mov    rdx, qword ptr [rsp]     
	jmp    0x41a0                     ; symbol stub for: __loader_dlopen


__loader_dlopen:
	jmp    qword ptr [rip + 0x3f8a] 
	push   0x1                      
	jmp    0x4180                   


__dl___loader_dlopen:
	push   rbp                      
	push   r15                      
	push   r14                      
	push   rbx                      
	push   rax
dlopen passing the return address to __loader_dlopen

In do_dlopen, it will use the caller address to get the calling module's namespace.

void* do_dlopen(const char* name, int flags,
                const android_dlextinfo* extinfo,
                const void* caller_addr) {
  std::string trace_prefix = std::string("dlopen: ") + (name == nullptr ? "(nullptr)" : name);
  ScopedTrace trace(trace_prefix.c_str());
  ScopedTrace loading_trace((trace_prefix + " - loading and linking").c_str());
  soinfo* const caller = find_containing_library(caller_addr);
  android_namespace_t* ns = get_caller_namespace(caller);
  
  ...
}
Start of do_dlopen getting the namespace of the calling module

There is a list of soinfo data structures that contain information about each loaded module, some of which are the module's base address, its size in memory, and its primary namespace.

struct soinfo {
#if defined(__work_around_b_24465209__)
 private:
  char old_name_[SOINFO_NAME_LEN];
#endif
 public:
  const ElfW(Phdr)* phdr;
  size_t phnum;
#if defined(__work_around_b_24465209__)
  ElfW(Addr) unused0; // DO NOT USE, maintained for compatibility.
#endif
  ElfW(Addr) base;
  size_t size;

#if defined(__work_around_b_24465209__)
  uint32_t unused1;  // DO NOT USE, maintained for compatibility.
#endif

  ElfW(Dyn)* dynamic;

#if defined(__work_around_b_24465209__)
  uint32_t unused2; // DO NOT USE, maintained for compatibility
  uint32_t unused3; // DO NOT USE, maintained for compatibility
#endif

  soinfo* next;

...

  // version >= 3
  std::vector<std::string> dt_runpath_;
  android_namespace_t* primary_namespace_;
  android_namespace_list_t secondary_namespaces_;

...

The find_containing_library function will enumerate the list of soinfo to find the module's soinfo by checking if the caller address lies within the module's address range. The head of this list is a global variable called solist which can be obtained using the solist_get_head function.

soinfo* find_containing_library(const void* p) {
  // Addresses within a library may be tagged if they point to globals. Untag
  // them so that the bounds check succeeds.
  ElfW(Addr) address = reinterpret_cast<ElfW(Addr)>(untag_address(p));
  for (soinfo* si = solist_get_head(); si != nullptr; si = si->next) {
    if (address < si->base || address - si->base >= si->size) {
      continue;
    }
    ElfW(Addr) vaddr = address - si->load_bias;
    for (size_t i = 0; i != si->phnum; ++i) {
      const ElfW(Phdr)* phdr = &si->phdr[i];
      if (phdr->p_type != PT_LOAD) {
        continue;
      }
      if (vaddr >= phdr->p_vaddr && vaddr < phdr->p_vaddr + phdr->p_memsz) {
        return si;
      }
    }
  }
  return nullptr;
}

Using the soinfo, it will get its primary namespace.

static android_namespace_t* get_caller_namespace(soinfo* caller) {
  return caller != nullptr ? caller->get_primary_namespace() : g_anonymous_namespace;
}
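
To make the lookup concrete, here is a small Python model of the two functions above. It is only a sketch: the SoInfo class is a reduced, hypothetical stand-in for the real soinfo, the PT_LOAD segment walk is left out, and the addresses and namespace names are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

ANONYMOUS_NAMESPACE = "(anonymous)"  # stand-in for g_anonymous_namespace


@dataclass
class SoInfo:
    """Reduced, hypothetical stand-in for the linker's soinfo."""
    name: str
    base: int
    size: int
    primary_namespace: str


def find_containing_library(solist: List[SoInfo], address: int) -> Optional[SoInfo]:
    # Same [base, base + size) bounds check as the linker, without the
    # per-segment PT_LOAD check that follows it in the real code.
    for si in solist:
        if si.base <= address < si.base + si.size:
            return si
    return None


def get_caller_namespace(caller: Optional[SoInfo]) -> str:
    # Unknown callers fall back to the anonymous namespace.
    return caller.primary_namespace if caller is not None else ANONYMOUS_NAMESPACE


# Made-up modules: one app library and one runtime library.
solist = [
    SoInfo("libapp.so", 0x7000_0000, 0x1_0000, "clns-4"),
    SoInfo("libart.so", 0x7100_0000, 0x80_0000, "com_android_art"),
]
```

An address inside libapp.so resolves to the app's namespace, while an address inside libart.so resolves to libart's own namespace. The namespace is decided by whatever caller address the loader is handed, nothing else.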

And somewhere down the call hierarchy, if the calling module's namespace is incompatible with the library's, the loader will reject the request.

Bypassing Restrictions

A namespace is obviously compatible with itself - I hope! So to have the loader use the namespace of the library that you want to load, the solution is blindingly trivial: pass an address that lies inside that library as __loader_dlopen's caller_addr argument. An unrestricted dlopen has to call __loader_dlopen directly, which means finding it by parsing the symbols of linker64. Procfs (/proc/self/maps) can be used to get the initial information, such as where linker64 and the target library are mapped.
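
As a rough sketch of the procfs step, the following Python parses the text of a maps file and returns the lowest mapped address of a module. The function name and the sample lines are made up for illustration, and taking the lowest mapping as the base assumes the module's first segment is mapped from file offset 0.

```python
from typing import Optional


def find_module_base(maps_text: str, path_suffix: str) -> Optional[int]:
    """Return the lowest start address among mappings whose path ends with path_suffix."""
    base = None
    for line in maps_text.splitlines():
        # Format: start-end perms offset dev inode pathname
        fields = line.split(None, 5)
        if len(fields) < 6 or not fields[5].endswith(path_suffix):
            continue
        start = int(fields[0].split("-")[0], 16)
        if base is None or start < base:
            base = start
    return base


# Illustrative sample only; real addresses and paths differ per device and boot.
SAMPLE_MAPS = """\
7b3c400000-7b3c43f000 r--p 00000000 fe:00 123 /apex/com.android.runtime/bin/linker64
7b3c43f000-7b3c520000 r-xp 0003f000 fe:00 123 /apex/com.android.runtime/bin/linker64
7b3d000000-7b3d200000 r--p 00000000 fe:00 456 /apex/com.android.art/lib64/libart.so
"""
```

In a live process the input would be the contents of /proc/self/maps rather than a sample string.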

Here is code that parses the symtab symbols to find __loader_dlopen and then uses it as an unrestricted dlopen.

// Assumed to be set up beforehand: `file` is a byte pointer to the linker64
// binary mapped from disk, `ehdr` and `shdr` are its ELF and section headers,
// and linker64_base is its load address in this process (e.g. from procfs).
uint64_t linker64_base = ...;
const ElfW(Sym)* ssym = nullptr;
size_t ssym_size = 0;
const char* sstr_table = nullptr;
void* (*unrestricted_dlopen)(const char*, int, const void*) = nullptr;

// Locate the symbol table and the string table it links to.
for (ElfW(Half) i = 0; i < ehdr->e_shnum; i++) {
    if (shdr[i].sh_type == SHT_SYMTAB) {
        ssym = reinterpret_cast<const ElfW(Sym)*>(file + shdr[i].sh_offset);
        ssym_size = shdr[i].sh_size / shdr[i].sh_entsize;
        sstr_table = reinterpret_cast<const char*>(file + shdr[shdr[i].sh_link].sh_offset);
        break;
    }
}

// Walk the symbols and look up each name in the string table.
for (size_t i = 0; i < ssym_size; i++) {
    if (ssym[i].st_name) {
        const char* sym_name = &sstr_table[ssym[i].st_name];
        // find() returns a position, not a bool, so compare the name directly.
        if (std::string(sym_name) == "__dl___loader_dlopen") {
            // Calculate the memory address of __loader_dlopen.
            unrestricted_dlopen = reinterpret_cast<void* (*)(const char*, int, const void*)>(linker64_base + ssym[i].st_value);
            break;
        }
    }
}

// Use unrestricted dlopen.
void* libart_base = ...;
void* libart_handle = unrestricted_dlopen("libart.so", RTLD_LAZY, libart_base);
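
To double-check the symbol offset offline, for example against a linker64 pulled from the device, the same lookup can be sketched in Python. This is a minimal sketch that assumes a little-endian ELF64 file which still carries a .symtab section; the function name is my own.

```python
import struct
from typing import Optional

SHT_SYMTAB = 2
SHDR = struct.Struct("<IIQQQQIIQQ")  # Elf64_Shdr, 64 bytes
SYM = struct.Struct("<IBBHQQ")       # Elf64_Sym, 24 bytes


def find_symbol_value(elf: bytes, name: str) -> Optional[int]:
    """Return st_value of `name` from .symtab, or None if it is not present."""
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_shoff lives at 0x28; e_shentsize and e_shnum at 0x3A and 0x3C.
    (e_shoff,) = struct.unpack_from("<Q", elf, 0x28)
    e_shentsize, e_shnum = struct.unpack_from("<HH", elf, 0x3A)
    shdrs = [SHDR.unpack_from(elf, e_shoff + i * e_shentsize) for i in range(e_shnum)]
    for sh in shdrs:
        sh_type, sh_offset, sh_size, sh_link, sh_entsize = sh[1], sh[4], sh[5], sh[6], sh[9]
        if sh_type != SHT_SYMTAB or sh_entsize == 0:
            continue
        str_off = shdrs[sh_link][4]  # sh_offset of the linked string table
        for i in range(sh_size // sh_entsize):
            st_name, _, _, _, st_value, _ = SYM.unpack_from(elf, sh_offset + i * sh_entsize)
            if st_name == 0:
                continue
            end = elf.index(b"\0", str_off + st_name)
            if elf[str_off + st_name:end].decode("ascii", "replace") == name:
                return st_value
    return None
```

Adding the returned value to the linker64 base address should give the same pointer as the C++ code above, under the same assumption that st_value is relative to the load base.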

The standard dlsym function takes the caller's address in the same way, but once I had the handle to the library, it didn't seem to raise any issues about namespaces... yet.

Conclusion

Going after an unrestricted dlopen might seem roundabout and unnecessary once you already have code to parse the symbols of an arbitrary library. I suppose the other solution is to modify the namespace of your own module by tampering with the solist. But it works!

I also hacked together an LLDB - why have you done this to me, Google - Python script that dumps the soinfo list and each primary namespace if you give it any (ideally solist) starting address.

#!/usr/bin/env python3

import argparse
import lldb
import re
import shlex

from pathlib import Path

SIZEOF_POINTER = 8

# [name, size, pad_size]
SOINFO_DEF = [
    ["phdr", 8, 0],
    ["phnum", 8, 0],
    ["base", 8, 0],
    ["size", 8, 0],
    ["dyn", 8, 0],
    ["next", 8, 0],
    ["flags", 4, 4],
    ["strtab", 8, 0],
    ["symtab", 8, 0],
    ["nbucket", 8, 0],
    ["nchain", 8, 0],
    ["bucket", 8, 0],
    ["chain", 8, 0],
    ["plt_relx", 8, 0],
    ["plt_relx_count", 8, 0],
    ["relx", 8, 0],
    ["relx_count", 8, 0],
    ["preinit_array", 8, 0],
    ["preinit_array_count", 8, 0],
    ["init_array", 8, 0],
    ["init_array_count", 8, 0],
    ["fini_array", 8, 0],
    ["fini_array_count", 8, 0],
    ["init_func", 8, 0],
    ["fini_func", 8, 0],
    ["ref_count", 8, 0],
    ["link_map.l_addr", 8, 0],
    ["link_map.l_name", 8, 0],
    ["link_map.l_ld", 8, 0],
    ["link_map.l_next", 8, 0],
    ["link_map.l_prev", 8, 0],
    ["constructors_called", 1, 7],
    ["load_bias", 8, 0],
    ["has_dt_symbolic", 1, 3],
    ["version", 4, 0],
    ["st_dev", 8, 0],
    ["st_ino", 8, 0],
    ["children", 8, 0],
    ["parents", 8, 0],
    ["file_offset", 8, 0],
    ["rtld_flags", 4, 0],
    ["dt_flags_1", 4, 0],
    ["strtab_size", 8, 0],
    ["gnu_nbucket", 8, 0],
    ["gnu_bucket", 8, 0],
    ["gnu_chain", 8, 0],
    ["gnu_maskwords", 4, 0],
    ["gnu_shift2", 4, 0],
    ["gnu_bloom_filter", 8, 0],
    ["local_group_root", 8, 0],
    ["android_relocs", 8, 0],
    ["android_relocs_size", 8, 0],
    ["soname", 8*3, 0],
    ["realpath", 8*3, 0],
    ["versym", 8, 0],
    ["verdef_ptr", 8, 0],
    ["verdef_cnt", 8, 0],
    ["verneed_ptr", 8, 0],
    ["verneed_cnt", 8, 0],
    ["target_sdk_version", 4, 4],
    ["dt_runpath", 8*3, 0],
    ["primary_namespace", 8, 0],
    ["secondary_namespace.head", 8, 0],
    ["secondary_namespace.tail", 8, 0],
    ["handle", 8, 0]
]

SIZEOF_SOINFO = 0 #len(SOINFO_DEF) * SIZEOF_POINTER
for i in range(len(SOINFO_DEF)):
    SIZEOF_SOINFO += SOINFO_DEF[i][1] + SOINFO_DEF[i][2]

ANDROID_NAMESPACE_DEF = [
    ["name", 8*3, 0],
    ["is_isolated", 2, 0],
    ["is_exempt_list_enabled", 2, 0],
    ["is_also_used_as_anonymous", 2, 2],
    ["ld_library_paths", 8*3, 0],
    ["default_library_paths", 8*3, 0],
    ["permitted_paths", 8*3, 0],
    ["allowed_libs", 8*3, 0],
    ["linked_namespaces", 8*3, 0],
    ["soinfo_list.head", 8, 0],
    ["soinfo_list.tail", 8, 0]
]

SIZEOF_ANDROID_NAMESPACE = 0
for i in range(len(ANDROID_NAMESPACE_DEF)):
    SIZEOF_ANDROID_NAMESPACE += ANDROID_NAMESPACE_DEF[i][1] + ANDROID_NAMESPACE_DEF[i][2]


def resolve_module_name(debugger, addr: int):
    result = lldb.SBCommandReturnObject()
    debugger.GetCommandInterpreter().HandleCommand(f"im loo -va {addr}", result)

    m = re.search(r'file = "(.*)?",', result.GetOutput())
    return m.group(1) if m is not None else None
    

def parse_std_string(process, bytes: bytes):
    try:
        return bytes.decode('utf-8').split('\0')[0].strip()
    except UnicodeDecodeError as e:
        pass

    error = lldb.SBError()
    len = int.from_bytes(bytes[0:8], 'little')
    addr = int.from_bytes(bytes[0x10:0x18], 'little')
    s_bytes = process.ReadMemory(addr, len, error)
    if not error.Success():
        print(f"Error reading memory: {error}")
        return ""

    return s_bytes.decode('utf-8').split('\0')[0].strip()


def parse_android_namespace(process, addr: int, verbose: bool):
    error = lldb.SBError()
    bytes = process.ReadMemory(addr, SIZEOF_ANDROID_NAMESPACE, error)
    if not error.Success():
        return

    # start at 1st index
    android_namespace_index = ANDROID_NAMESPACE_DEF[0][1] + ANDROID_NAMESPACE_DEF[0][2]

    for i in range(1, len(ANDROID_NAMESPACE_DEF)):
        member_size = ANDROID_NAMESPACE_DEF[i][1]
        pad_size = ANDROID_NAMESPACE_DEF[i][2]
        value = int.from_bytes(bytes[android_namespace_index:android_namespace_index+member_size], 'little')

        if verbose:
            print(f"\t[{int(android_namespace_index / SIZEOF_POINTER)}] {ANDROID_NAMESPACE_DEF[i][0]}: {hex(value)}")

        android_namespace_index += member_size + pad_size


def parse_soinfo(debugger, bytes: bytes, verbose: bool):
        process = debugger.GetSelectedTarget().process

        soinfo_index = 0
        soname = parse_std_string(process, bytes[49*8:49*8+8*3])
        
        mod_base = int.from_bytes(bytes[2*8:2*8+8], 'little')
        mod_size = int.from_bytes(bytes[3*8:3*8+8], 'little')
        print(f"Module: {soname} [{hex(mod_base)}-{hex(mod_base + mod_size)}]")
        for i in range(len(SOINFO_DEF)):
            member_size = SOINFO_DEF[i][1]
            pad_size = SOINFO_DEF[i][2]
            value = int.from_bytes(bytes[soinfo_index:soinfo_index+member_size], 'little')

            if SOINFO_DEF[i][0] == "primary_namespace":  # parse primary namespace
                error = lldb.SBError()
                # Use a separate buffer so the soinfo bytes are not overwritten mid-loop.
                ns_bytes = process.ReadMemory(value, 8*3, error)
                if error.Success():
                    name = parse_std_string(process, ns_bytes)
                    print(f"[{int(soinfo_index / SIZEOF_POINTER)}] {SOINFO_DEF[i][0]}: {name} [{hex(value)}]")
                else:
                    print(f"[{int(soinfo_index / SIZEOF_POINTER)}] {SOINFO_DEF[i][0]}: [{hex(value)}]")

                parse_android_namespace(process, value, verbose)
            elif verbose:
                print(f"[{int(soinfo_index / SIZEOF_POINTER)}] {SOINFO_DEF[i][0]}: {hex(value)}")

            soinfo_index += member_size + pad_size
            
        if verbose:
            print()


def enum_solist(debugger, command, result, dict):
    comm_args = shlex.split(command)

    desc = """Enumerate and print information of the solist."""
    parser = argparse.ArgumentParser(
        description=desc,
        prog='enum_solist'
    )
    parser.add_argument(
        'address',
        help='address of an solist'
    )

    parser.add_argument(
        '-v',
        "--verbose",
        action='store_true',
        help='verbose output of soinfo structure'
    )

    try:
        args = parser.parse_args(comm_args)
    except SystemExit:
        # argparse raises SystemExit on bad arguments or --help,
        # after printing its own message.
        return

    start_addr = int(args.address, 0)

    target = debugger.GetSelectedTarget()
    if not target:
        print("Error: invalid target", file=result)
        return

    process = target.process
    if not process:
        print("Error: invalid process", file=result)
        return

    error = lldb.SBError()

    curr_soinfo_addr = start_addr
    while curr_soinfo_addr != 0:
        bytes = process.ReadMemory(curr_soinfo_addr, SIZEOF_SOINFO, error)
        if not error.Success():
            print(f"Error: {error.GetCString()}", file=result)
            return
        
        parse_soinfo(debugger, bytes, args.verbose)
        curr_soinfo_addr = int.from_bytes(bytes[5*8:5*8+8], 'little')
        

lldb.debugger.HandleCommand(
    "command script add -f enum_solist.enum_solist enum_solist")
print("A new command called 'enum_solist' was added, type 'enum_solist --help' for more information.")
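
The hard-coded offsets in the script (2*8 for base, 5*8 for next, 49*8 for soname) follow from summing the sizes and padding in SOINFO_DEF. A small helper makes that explicit; this is a sketch with a name of my own choosing, shown against only the first few entries of the table.

```python
# First entries of the script's SOINFO_DEF, as (name, size, pad_size).
SOINFO_PREFIX = [
    ("phdr", 8, 0),
    ("phnum", 8, 0),
    ("base", 8, 0),
    ("size", 8, 0),
    ("dyn", 8, 0),
    ("next", 8, 0),
    ("flags", 4, 4),
    ("strtab", 8, 0),
]


def field_offset(defs, name: str) -> int:
    """Byte offset of `name`, summing size and padding of every field before it."""
    offset = 0
    for field, size, pad in defs:
        if field == name:
            return offset
        offset += size + pad
    raise KeyError(name)
```

Run against the full SOINFO_DEF, the same sum puts soname at byte 392, which is the 49*8 used in parse_soinfo. These offsets only hold for linker builds whose soinfo layout matches the table.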