mirror of
				https://github.com/HackTricks-wiki/hacktricks.git
				synced 2025-10-10 18:36:50 +00:00 
			
		
		
		
	Merge pull request #1409 from HackTricks-wiki/update_Strategies_for_Analyzing_Native_Code_in_Android_Ap_20250916_124743
Strategies for Analyzing Native Code in Android Applications...
						commit
						395ecdf2ea
					
				| @ -183,6 +183,145 @@ See the Android native reversing page for setup details and log paths: | ||||
| 
 | ||||
| --- | ||||
| 
 | ||||
| ### Android/JNI native string deobfuscation with angr + Ghidra | ||||
| 
 | ||||
| Some Android malware and RASP-protected apps hide JNI method names and signatures by decoding them at runtime before calling RegisterNatives. When Frida/ptrace instrumentation is killed by anti-debug, you can still recover the plaintext offline by executing the in-binary decoder with angr and then pushing results back into Ghidra as comments. | ||||
| 
 | ||||
| Key idea: treat the decoder inside the .so as a callable function, execute it on the obfuscated byte blobs in .rodata, and concretize the output bytes up to the first \x00 (C-string terminator). Keep angr and Ghidra using the same image base to avoid address mismatches. | ||||
| 
 | ||||
| Workflow overview | ||||
| - Triage in Ghidra: identify the decoder and its calling convention/arguments in JNI_OnLoad and RegisterNatives setup. | ||||
| - Run angr (CPython3) to execute the decoder for each target string and dump results. | ||||
| - Annotate in Ghidra: auto-comment decoded strings at each call site for fast JNI reconstruction. | ||||
| 
 | ||||
| Ghidra triage (JNI_OnLoad pattern) | ||||
| - Apply JNI datatypes to JNI_OnLoad so Ghidra recognises JNINativeMethod structures. | ||||
| - Typical JNINativeMethod per Oracle docs: | ||||
| 
 | ||||
|   ```c | ||||
|   typedef struct { | ||||
|       char *name;      // e.g., "nativeFoo" | ||||
|       char *signature; // e.g., "()V", "()[B" | ||||
|       void *fnPtr;     // native implementation address | ||||
|   } JNINativeMethod; | ||||
|   ``` | ||||
| - Look for calls to RegisterNatives. If the library constructs the name/signature with a local routine (e.g., FUN_00100e10) that references a static byte table (e.g., DAT_00100bf4) and takes parameters like (encoded_ptr, out_buf, length), that is an ideal target for offline execution. | ||||
| 
 | ||||
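As a rough illustration of the layout Ghidra applies once the JNI datatypes are in place, the sketch below unpacks a raw methods array into (name, signature, fnPtr) pointer triples. It assumes a 64-bit little-endian target (for example arm64-v8a), where each JNINativeMethod entry is three 8-byte pointers. The helper `parse_native_methods` and the sample addresses are hypothetical and not taken from any real library.

```python
import struct
from collections import namedtuple

# One JNINativeMethod entry: name pointer, signature pointer, function pointer.
NativeMethod = namedtuple('NativeMethod', 'name_ptr signature_ptr fn_ptr')

# Assumption: 64-bit little-endian target, so three unsigned 8-byte pointers.
ENTRY = struct.Struct('<3Q')

def parse_native_methods(raw, count):
    """Unpack `count` consecutive JNINativeMethod entries from raw bytes."""
    return [NativeMethod(*ENTRY.unpack_from(raw, i * ENTRY.size))
            for i in range(count)]

# Hypothetical one-entry table, built here only to exercise the parser.
sample = struct.pack('<3Q', 0x00103000, 0x00103010, 0x00101234)
methods = parse_native_methods(sample, 1)
print([hex(p) for p in methods[0]])
```

On a 32-bit ABI the same idea applies with `'<3I'` (12 bytes per entry). When the names and signatures are decoded at runtime, the first two pointers refer to output buffers rather than readable strings, which is what makes the decoder the interesting target.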
| angr setup (execute the decoder offline) | ||||
| - Load the .so with the same base used in Ghidra (example: 0x00100000) and disable auto-loading of external libs to keep the state small. | ||||
| 
 | ||||
|   ```python | ||||
|   import angr, json | ||||
| 
 | ||||
|   project = angr.Project( | ||||
|       '/path/to/libtarget.so', | ||||
|       load_options={'main_opts': {'base_addr': 0x00100000}}, | ||||
|       auto_load_libs=False, | ||||
|   ) | ||||
| 
 | ||||
|   ENCODING_FUNC_ADDR = 0x00100e10  # decoder function discovered in Ghidra | ||||
| 
 | ||||
|   def decode_string(enc_addr, length): | ||||
|       # fresh blank state per evaluation | ||||
|       st = project.factory.blank_state() | ||||
|       outbuf = st.heap.allocate(length) | ||||
|       call = project.factory.callable(ENCODING_FUNC_ADDR, base_state=st) | ||||
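|       call(enc_addr, outbuf, length)  # decoder writes the plaintext into outbuf | ||||
|       rs = call.result_state | ||||
|       # read outbuf directly so this does not depend on the decoder returning the pointer | ||||
|       raw = rs.solver.eval(rs.memory.load(outbuf, length), cast_to=bytes) | ||||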
|       return raw.split(b'\x00', 1)[0].decode('utf-8', errors='ignore') | ||||
| 
 | ||||
|   # Example: decode a JNI signature at 0x100933 of length 5 → should be ()[B | ||||
|   print(decode_string(0x00100933, 5)) | ||||
|   ``` | ||||
| 
 | ||||
| - At scale, build a static map from each call site to the decoder’s arguments (encoded_ptr, size). Wrappers may hide the arguments, so you may need to build this mapping by hand from Ghidra xrefs when automatic argument recovery is noisy. | ||||
| 
 | ||||
|   ```python | ||||
|   # call_site -> (encoded_addr, size) | ||||
|   call_site_args_map = { | ||||
|       0x00100f8c: (0x00100b81, 0x41), | ||||
|       0x00100fa8: (0x00100bca, 0x04), | ||||
|       0x00100fcc: (0x001007a0, 0x41), | ||||
|       0x00100fe8: (0x00100933, 0x05), | ||||
|       0x0010100c: (0x00100c62, 0x41), | ||||
|       0x00101028: (0x00100c15, 0x16), | ||||
|       0x00101050: (0x00100a49, 0x101), | ||||
|       0x00100cf4: (0x00100821, 0x11), | ||||
|       0x00101170: (0x00100940, 0x101), | ||||
|       0x001011cc: (0x0010084e, 0x13), | ||||
|       0x00101334: (0x001007e9, 0x0f), | ||||
|       0x00101478: (0x0010087d, 0x15), | ||||
|       0x001014f8: (0x00100800, 0x19), | ||||
|       0x001015e8: (0x001008e6, 0x27), | ||||
|       0x0010160c: (0x00100c33, 0x13), | ||||
|   } | ||||
| 
 | ||||
|   decoded_map = { hex(cs): decode_string(enc, sz) | ||||
|                   for cs, (enc, sz) in call_site_args_map.items() } | ||||
| 
 | ||||
|   print(json.dumps(decoded_map, indent=2)) | ||||
|   with open('decoded_strings.json', 'w') as f: | ||||
|       json.dump(decoded_map, f, indent=2) | ||||
|   ``` | ||||
| 
 | ||||
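Once the strings are decoded, it can help to separate JNI signatures from method names before pairing them up for RegisterNatives. The following is a minimal sketch of such a check, based on the standard JNI type descriptor grammar. The helper name `looks_like_jni_signature` and the sample mapping are hypothetical, and the sample values are illustrative rather than real decoder output.

```python
import re

# A field descriptor: optional array prefixes, then a primitive code or L<class>;
_FIELD = r'\[*(?:[ZBCSIJFD]|L[^;\[\]()]+;)'
# A method descriptor: (params) followed by V or a field descriptor.
_METHOD = re.compile(r'\((?:%s)*\)(?:V|%s)' % (_FIELD, _FIELD))

def looks_like_jni_signature(text):
    """Return True if text is a syntactically valid JNI method descriptor."""
    return _METHOD.fullmatch(text) is not None

# Hypothetical decoded output, keyed by call site as in decoded_strings.json.
sample = {'0x100fa8': '()V', '0x100fe8': '()[B', '0x100f8c': 'nativeFoo'}
signatures = {k: v for k, v in sample.items() if looks_like_jni_signature(v)}
names = {k: v for k, v in sample.items() if k not in signatures}
print(signatures, names)
```

A decoded string that is neither a plausible identifier nor a valid descriptor is a hint that the length argument or the encoded address for that call site is wrong.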
| Annotate call sites in Ghidra | ||||
| Option A: Jython-only comment writer (use a pre-computed JSON) | ||||
| - Since angr requires CPython3, keep deobfuscation and annotation separated. First run the angr script above to produce decoded_strings.json. Then run this Jython GhidraScript to write PRE_COMMENTs at each call site (and include the caller function name for context): | ||||
| 
 | ||||
|   ```python | ||||
|   #@category Android/Deobfuscation | ||||
|   # Jython in Ghidra 10/11 | ||||
|   import json | ||||
|   from ghidra.program.model.listing import CodeUnit | ||||
| 
 | ||||
|   # Ask for the JSON produced by the angr script | ||||
|   f = askFile('Select decoded_strings.json', 'Load') | ||||
|   with open(f.absolutePath, 'r') as fh: | ||||
|       mapping = json.load(fh)  # keys are hex strings, e.g. "0x100f8c" | ||||
| 
 | ||||
|   fm = currentProgram.getFunctionManager() | ||||
|   rm = currentProgram.getReferenceManager() | ||||
| 
 | ||||
|   # Replace with your decoder address to locate call-xrefs (optional) | ||||
|   ENCODING_FUNC_ADDR = 0x00100e10 | ||||
|   enc_addr = toAddr(ENCODING_FUNC_ADDR) | ||||
| 
 | ||||
|   callsite_to_fn = {} | ||||
|   for ref in rm.getReferencesTo(enc_addr): | ||||
|       if ref.getReferenceType().isCall(): | ||||
|           from_addr = ref.getFromAddress() | ||||
|           fn = fm.getFunctionContaining(from_addr) | ||||
|           if fn: | ||||
|               callsite_to_fn[from_addr.getOffset()] = fn.getName() | ||||
| 
 | ||||
|   # Write comments from JSON | ||||
|   for k_hex, s in mapping.items(): | ||||
|       cs = int(k_hex, 16) | ||||
|       site = toAddr(cs) | ||||
|       caller = callsite_to_fn.get(cs, None) | ||||
|       text = s if caller is None else '%s @ %s' % (s, caller) | ||||
|       currentProgram.getListing().setComment(site, CodeUnit.PRE_COMMENT, text) | ||||
|   print('[+] Annotated %d call sites' % len(mapping)) | ||||
|   ``` | ||||
| 
 | ||||
| Option B: Single CPython script via pyhidra/ghidra_bridge | ||||
| - Alternatively, use pyhidra or ghidra_bridge to drive Ghidra’s API from the same CPython process running angr. This allows calling decode_string() and immediately setting PRE_COMMENTs without an intermediate file. The logic mirrors the Jython script: build callsite→function map via ReferenceManager, decode with angr, and set comments. | ||||
| 
 | ||||
| Why this works and when to use it | ||||
| - Offline execution sidesteps RASP/anti-debug: no ptrace, no Frida hooks required to recover strings. | ||||
| - Keeping Ghidra and angr base_addr aligned (e.g., 0x00100000) ensures that function/data addresses match across tools. | ||||
| - Repeatable recipe for decoders: treat the transform as a pure function, allocate an output buffer in a fresh state, call it with (encoded_ptr, out_ptr, len), then concretize via state.solver.eval and parse C-strings up to \x00. | ||||
| 
 | ||||
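If the two tools cannot share a base (for instance, an existing Ghidra project was imported at a different image base), addresses can be translated by subtracting one base and adding the other. The sketch below shows that arithmetic. The helpers `rebase` and `rebase_map` are hypothetical, and the second base value is only illustrative.

```python
def rebase(addr, from_base, to_base):
    """Translate an address from one image base to another."""
    offset = addr - from_base
    if offset < 0:
        raise ValueError('address %#x lies below base %#x' % (addr, from_base))
    return to_base + offset

def rebase_map(call_site_args, from_base, to_base):
    """Rebase a call_site -> (encoded_addr, size) map; sizes are left untouched."""
    return {
        rebase(cs, from_base, to_base): (rebase(enc, from_base, to_base), size)
        for cs, (enc, size) in call_site_args.items()
    }

# Addresses as seen in Ghidra at base 0x00100000, moved to an illustrative 0x00400000.
ghidra_view = {0x00100fe8: (0x00100933, 0x05)}
other_view = rebase_map(ghidra_view, 0x00100000, 0x00400000)
print({hex(k): (hex(v[0]), v[1]) for k, v in other_view.items()})
```

Loading both tools at the same base remains the simpler option. Translation is mainly useful when the JSON has to be consumed by a project that was set up differently.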
| Notes and pitfalls | ||||
| - Respect the target ABI/calling convention. project.factory.callable picks a default based on the architecture; if arguments look shifted, pass the calling convention explicitly through its cc argument. | ||||
| - If the decoder expects zeroed output buffers, initialize outbuf with zeros in the state before the call. | ||||
| - For position-independent Android .so, always supply base_addr so addresses in angr match those seen in Ghidra. | ||||
| - Use currentProgram.getReferenceManager() to enumerate call xrefs to the decoder. If the app wraps the decoder behind thin stubs, enumerate references to those stubs as well so the real call sites are not missed. | ||||
| 
 | ||||
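The first two pitfalls can be handled in one variant of the decode helper. This is a sketch under stated assumptions, not the original author's code: the name `decode_with_zeroed_outbuf` is hypothetical, `project` is an angr Project loaded elsewhere, the decoder is assumed to take (encoded_ptr, out_buf, length), and `cc` is an optional calling convention object that the caller builds for the target architecture.

```python
def decode_with_zeroed_outbuf(project, func_addr, enc_addr, length, cc=None):
    """Run the decoder on a zero-filled output buffer and return the C-string."""
    st = project.factory.blank_state()
    outbuf = st.heap.allocate(length)
    # Store concrete zeros: by default angr treats untouched memory as
    # unconstrained symbolic bytes, which a decoder relying on a
    # pre-zeroed buffer would otherwise read.
    st.memory.store(outbuf, b'\x00' * length)
    # cc=None keeps angr's per-architecture default calling convention.
    call = project.factory.callable(func_addr, base_state=st, cc=cc)
    call(enc_addr, outbuf, length)
    rs = call.result_state
    raw = rs.solver.eval(rs.memory.load(outbuf, length), cast_to=bytes)
    # Keep only the bytes before the first NUL terminator.
    return raw.split(b'\x00', 1)[0].decode('utf-8', errors='replace')
```

It is intended as a drop-in alternative to decode_string, called for example as `decode_with_zeroed_outbuf(project, ENCODING_FUNC_ADDR, 0x00100933, 5)`. Using `errors='replace'` instead of `'ignore'` makes undecodable bytes visible in the output, which helps spot a wrong length or address.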
| For angr basics, see: [angr basics](../../reversing/reversing-tools-basic-methods/angr/README.md) | ||||
| 
 | ||||
| --- | ||||
| 
 | ||||
| ## Deobfuscating Dynamic Control-Flow (JMP/CALL RAX Dispatchers) | ||||
| 
 | ||||
| Modern malware families heavily abuse Control-Flow Graph (CFG) obfuscation: instead of a direct jump/call they compute the destination at run-time and execute a `jmp rax` or `call rax`. A small *dispatcher* (typically nine instructions) sets the final target depending on the CPU `ZF`/`CF` flags, completely breaking static CFG recovery. | ||||
| @ -283,6 +422,13 @@ adaptixc2-config-extraction-and-ttps.md | ||||
| 
 | ||||
| - [Unit42 – Evolving Tactics of SLOW#TEMPEST: A Deep Dive Into Advanced Malware Techniques](https://unit42.paloaltonetworks.com/slow-tempest-malware-obfuscation/) | ||||
| - SoTap: Lightweight in-app JNI (.so) behavior logger – [github.com/RezaArbabBot/SoTap](https://github.com/RezaArbabBot/SoTap) | ||||
| - Strategies for Analyzing Native Code in Android Applications: Combining Ghidra and Symbolic Execution for Code Decryption and Deobfuscation – [revflash.medium.com](https://revflash.medium.com/strategies-for-analyzing-native-code-in-android-applications-combining-ghidra-and-symbolic-aaef4c9555df) | ||||
| - Ghidra – [github.com/NationalSecurityAgency/ghidra](https://github.com/NationalSecurityAgency/ghidra) | ||||
| - angr – [angr.io](https://angr.io/) | ||||
| - JNI_OnLoad and invocation API – [docs.oracle.com](https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/invocation.html#JNJI_OnLoad) | ||||
| - RegisterNatives – [docs.oracle.com](https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#RegisterNatives) | ||||
| - Tracing JNI Functions – [valsamaras.medium.com](https://valsamaras.medium.com/tracing-jni-functions-75b04bee7c58) | ||||
| - Native Enrich: Scripting Ghidra and Frida to discover hidden JNI functions – [laripping.com](https://laripping.com/blog-posts/2021/12/20/nativeenrich.html) | ||||
| - [Unit42 – AdaptixC2: A New Open-Source Framework Leveraged in Real-World Attacks](https://unit42.paloaltonetworks.com/adaptixc2-post-exploitation-framework/) | ||||
| 
 | ||||
| {{#include ../../banners/hacktricks-training.md}} | ||||