diff --git a/src/generic-methodologies-and-resources/basic-forensic-methodology/malware-analysis.md b/src/generic-methodologies-and-resources/basic-forensic-methodology/malware-analysis.md
index 2f416c681..8572b25be 100644
--- a/src/generic-methodologies-and-resources/basic-forensic-methodology/malware-analysis.md
+++ b/src/generic-methodologies-and-resources/basic-forensic-methodology/malware-analysis.md
@@ -183,6 +183,145 @@ See the Android native reversing page for setup details and log paths:
 
 ---
 
+### Android/JNI native string deobfuscation with angr + Ghidra
+
+Some Android malware and RASP-protected apps hide JNI method names and signatures by decoding them at runtime before calling RegisterNatives. When anti-debug checks kill Frida/ptrace instrumentation, you can still recover the plaintext offline by executing the in-binary decoder with angr and then pushing the results back into Ghidra as comments.
+
+Key idea: treat the decoder inside the .so as a callable function, execute it on the obfuscated byte blobs in .rodata, and concretize the output bytes up to the first \x00 (C-string terminator). Load the binary at the same image base in angr and Ghidra to avoid address mismatches.
+
+Workflow overview
+- Triage in Ghidra: identify the decoder and its calling convention/arguments in JNI_OnLoad and RegisterNatives setup.
+- Run angr (CPython3) to execute the decoder for each target string and dump results.
+- Annotate in Ghidra: auto-comment decoded strings at each call site for fast JNI reconstruction.
+
+Ghidra triage (JNI_OnLoad pattern)
+- Apply JNI datatypes to JNI_OnLoad so Ghidra recognises JNINativeMethod structures.
+- Typical JNINativeMethod per Oracle docs:
+
+  ```c
+  typedef struct {
+      char *name;       // e.g., "nativeFoo"
+      char *signature;  // e.g., "()V", "()[B"
+      void *fnPtr;      // native implementation address
+  } JNINativeMethod;
+  ```
+- Look for calls to RegisterNatives. If the library constructs the name/signature with a local routine (e.g., FUN_00100e10) that references a static byte table (e.g., DAT_00100bf4) and takes parameters like (encoded_ptr, out_buf, length), that is an ideal target for offline execution.
+
+angr setup (execute the decoder offline)
+- Load the .so with the same base used in Ghidra (example: 0x00100000) and disable auto-loading of external libs to keep the state small.
+
+  ```python
+  import angr, json
+
+  project = angr.Project(
+      '/path/to/libtarget.so',
+      load_options={'main_opts': {'base_addr': 0x00100000}},
+      auto_load_libs=False,
+  )
+
+  ENCODING_FUNC_ADDR = 0x00100e10  # decoder function discovered in Ghidra
+
+  def decode_string(enc_addr, length):
+      # fresh blank state per evaluation
+      st = project.factory.blank_state()
+      outbuf = st.heap.allocate(length)
+      call = project.factory.callable(ENCODING_FUNC_ADDR, base_state=st)
+      ret_ptr = call(enc_addr, outbuf, length)  # assumes the decoder returns the outbuf pointer
+      rs = call.result_state
+      raw = rs.solver.eval(rs.memory.load(ret_ptr, length), cast_to=bytes)
+      return raw.split(b'\x00', 1)[0].decode('utf-8', errors='ignore')
+
+  # Example: decode a JNI signature at 0x100933 of length 5 → should be ()[B
+  print(decode_string(0x00100933, 5))
+  ```
+
+- At scale, build a static map of call sites to the decoder’s arguments (encoded_ptr, size). Wrappers may hide the arguments, so build this mapping manually from Ghidra xrefs if automated argument recovery is noisy.
+
+  ```python
+  # call_site -> (encoded_addr, size)
+  call_site_args_map = {
+      0x00100f8c: (0x00100b81, 0x41),
+      0x00100fa8: (0x00100bca, 0x04),
+      0x00100fcc: (0x001007a0, 0x41),
+      0x00100fe8: (0x00100933, 0x05),
+      0x0010100c: (0x00100c62, 0x41),
+      0x00101028: (0x00100c15, 0x16),
+      0x00101050: (0x00100a49, 0x101),
+      0x00100cf4: (0x00100821, 0x11),
+      0x00101170: (0x00100940, 0x101),
+      0x001011cc: (0x0010084e, 0x13),
+      0x00101334: (0x001007e9, 0x0f),
+      0x00101478: (0x0010087d, 0x15),
+      0x001014f8: (0x00100800, 0x19),
+      0x001015e8: (0x001008e6, 0x27),
+      0x0010160c: (0x00100c33, 0x13),
+  }
+
+  decoded_map = { hex(cs): decode_string(enc, sz)
+                  for cs, (enc, sz) in call_site_args_map.items() }
+
+  print(json.dumps(decoded_map, indent=2))
+  with open('decoded_strings.json', 'w') as f:
+      json.dump(decoded_map, f, indent=2)
+  ```
+
+Annotate call sites in Ghidra
+Option A: Jython-only comment writer (use a pre-computed JSON)
+- Since angr requires CPython3, keep deobfuscation and annotation separated. First run the angr script above to produce decoded_strings.json. Then run this Jython GhidraScript to write PRE_COMMENTs at each call site (and include the caller function name for context):
+
+  ```python
+  #@category Android/Deobfuscation
+  # Jython in Ghidra 10/11
+  import json
+  from ghidra.program.model.listing import CodeUnit
+
+  # Ask for the JSON produced by the angr script
+  f = askFile('Select decoded_strings.json', 'Load')
+  mapping = json.load(open(f.absolutePath, 'r'))  # keys as hex strings
+
+  fm = currentProgram.getFunctionManager()
+  rm = currentProgram.getReferenceManager()
+
+  # Replace with your decoder address to locate call-xrefs (optional)
+  ENCODING_FUNC_ADDR = 0x00100e10
+  enc_addr = toAddr(ENCODING_FUNC_ADDR)
+
+  callsite_to_fn = {}
+  for ref in rm.getReferencesTo(enc_addr):
+      if ref.getReferenceType().isCall():
+          from_addr = ref.getFromAddress()
+          fn = fm.getFunctionContaining(from_addr)
+          if fn:
+              callsite_to_fn[from_addr.getOffset()] = fn.getName()
+
+  # Write comments from JSON
+  for k_hex, s in mapping.items():
+      cs = int(k_hex, 16)
+      site = toAddr(cs)
+      caller = callsite_to_fn.get(cs, None)
+      text = s if caller is None else '%s @ %s' % (s, caller)
+      currentProgram.getListing().setComment(site, CodeUnit.PRE_COMMENT, text)
+  print('[+] Annotated %d call sites' % len(mapping))
+  ```
+
+Option B: Single CPython script via pyhidra/ghidra_bridge
+- Alternatively, use pyhidra or ghidra_bridge to drive Ghidra’s API from the same CPython process running angr. This allows calling decode_string() and immediately setting PRE_COMMENTs without an intermediate file. The logic mirrors the Jython script: build callsite→function map via ReferenceManager, decode with angr, and set comments.
+
+Why this works and when to use it
+- Offline execution sidesteps RASP/anti-debug: no ptrace, no Frida hooks required to recover strings.
+- Keeping Ghidra and angr base_addr aligned (e.g., 0x00100000) ensures that function/data addresses match across tools.
+- Repeatable recipe for decoders: treat the transform as a pure function, allocate an output buffer in a fresh state, call it with (encoded_ptr, out_ptr, len), then concretize via state.solver.eval and parse C-strings up to \x00.
+
+Notes and pitfalls
+- Respect the target ABI/calling convention. project.factory.callable picks a default based on the architecture; if arguments look shifted, pass cc explicitly.
+- If the decoder expects zeroed output buffers, initialize outbuf with zeros in the state before the call.
+- For position-independent Android .so, always supply base_addr so addresses in angr match those seen in Ghidra.
+- Use currentProgram.getReferenceManager() to enumerate call-xrefs even if the app wraps the decoder behind thin stubs.
+
+For angr basics, see: [angr basics](../../reversing/reversing-tools-basic-methods/angr/README.md)
+
+---
+
 ## Deobfuscating Dynamic Control-Flow (JMP/CALL RAX Dispatchers)
 
 Modern malware families heavily abuse Control-Flow Graph (CFG) obfuscation: instead of a direct jump/call they compute the destination at run-time and execute a `jmp rax` or `call rax`. A small *dispatcher* (typically nine instructions) sets the final target depending on the CPU `ZF`/`CF` flags, completely breaking static CFG recovery.
@@ -283,6 +422,13 @@ adaptixc2-config-extraction-and-ttps.md
 
 - [Unit42 – Evolving Tactics of SLOW#TEMPEST: A Deep Dive Into Advanced Malware Techniques](https://unit42.paloaltonetworks.com/slow-tempest-malware-obfuscation/)
 - SoTap: Lightweight in-app JNI (.so) behavior logger – [github.com/RezaArbabBot/SoTap](https://github.com/RezaArbabBot/SoTap)
+- Strategies for Analyzing Native Code in Android Applications: Combining Ghidra and Symbolic Execution for Code Decryption and Deobfuscation – [revflash.medium.com](https://revflash.medium.com/strategies-for-analyzing-native-code-in-android-applications-combining-ghidra-and-symbolic-aaef4c9555df)
+- Ghidra – [github.com/NationalSecurityAgency/ghidra](https://github.com/NationalSecurityAgency/ghidra)
+- angr – [angr.io](https://angr.io/)
+- JNI_OnLoad and invocation API – [docs.oracle.com](https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/invocation.html#JNJI_OnLoad)
+- RegisterNatives – [docs.oracle.com](https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#RegisterNatives)
+- Tracing JNI Functions – [valsamaras.medium.com](https://valsamaras.medium.com/tracing-jni-functions-75b04bee7c58)
+- Native Enrich: Scripting Ghidra and Frida to discover hidden JNI functions – [laripping.com](https://laripping.com/blog-posts/2021/12/20/nativeenrich.html)
 - [Unit42 – AdaptixC2: A New Open-Source Framework Leveraged in Real-World Attacks](https://unit42.paloaltonetworks.com/adaptixc2-post-exploitation-framework/)
 
 {{#include ../../banners/hacktricks-training.md}}
\ No newline at end of file