mirror of
https://github.com/HackTricks-wiki/hacktricks.git
synced 2025-10-10 18:36:50 +00:00
Merge pull request #1409 from HackTricks-wiki/update_Strategies_for_Analyzing_Native_Code_in_Android_Ap_20250916_124743
Strategies for Analyzing Native Code in Android Applications...
This commit is contained in:
commit
395ecdf2ea
@@ -183,6 +183,145 @@ See the Android native reversing page for setup details and log paths:

---

### Android/JNI native string deobfuscation with angr + Ghidra

Some Android malware and RASP-protected apps hide JNI method names and signatures by decoding them at runtime, just before calling RegisterNatives. When anti-debug checks kill Frida/ptrace instrumentation, you can still recover the plaintext offline by executing the in-binary decoder with angr and then pushing the results back into Ghidra as comments.

Key idea: treat the decoder inside the .so as a callable function, execute it on the obfuscated byte blobs in .rodata, and concretize the output bytes up to the first \x00 (the C-string terminator). Keep angr and Ghidra on the same image base to avoid address mismatches.

Workflow overview
- Triage in Ghidra: identify the decoder and its calling convention/arguments in JNI_OnLoad and RegisterNatives setup.
- Run angr (CPython3) to execute the decoder for each target string and dump results.
- Annotate in Ghidra: auto-comment decoded strings at each call site for fast JNI reconstruction.

Ghidra triage (JNI_OnLoad pattern)
- Apply JNI datatypes to JNI_OnLoad so Ghidra recognises JNINativeMethod structures.
- Typical JNINativeMethod per Oracle docs:

```c
typedef struct {
    char *name;       // e.g., "nativeFoo"
    char *signature;  // e.g., "()V", "()[B"
    void *fnPtr;      // native implementation address
} JNINativeMethod;
```
- Look for calls to RegisterNatives. If the library builds the name/signature with a local routine (e.g., FUN_00100e10) that references a static byte table (e.g., DAT_00100bf4) and takes parameters like (encoded_ptr, out_buf, length), that routine is an ideal target for offline execution.

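When the JNINativeMethod array sits in a data section instead of being built on the stack, its entries can be split straight from the raw bytes. The sketch below is a minimal illustration under stated assumptions: a 64-bit little-endian target (three 8-byte pointers per entry), a hypothetical helper name `parse_native_methods`, and a synthetic table filled with addresses borrowed from this section purely as sample values.

```python
import struct

# One JNINativeMethod entry: name ptr, signature ptr, fnPtr.
# Assumption: 64-bit little-endian target (e.g., arm64-v8a); use '<III' for 32-bit.
ENTRY = struct.Struct('<QQQ')

def parse_native_methods(blob, count):
    """Split a raw JNINativeMethod array into (name_ptr, sig_ptr, fn_ptr) tuples."""
    return [ENTRY.unpack_from(blob, i * ENTRY.size) for i in range(count)]

# Synthetic two-entry table; the pointer values are illustrative only.
table = (ENTRY.pack(0x00100b81, 0x00100bca, 0x00101050) +
         ENTRY.pack(0x001007a0, 0x00100933, 0x00101170))

methods = parse_native_methods(table, 2)
for name_ptr, sig_ptr, fn_ptr in methods:
    print(hex(name_ptr), hex(sig_ptr), hex(fn_ptr))
```

In a real sample the bytes would come from the table address and method count passed to RegisterNatives; when the name and signature fields are filled at runtime by the decoder, only fnPtr is meaningful statically.
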
angr setup (execute the decoder offline)
- Load the .so with the same base used in Ghidra (example: 0x00100000) and disable auto-loading of external libs to keep the state small.

```python
import angr, json

project = angr.Project(
    '/path/to/libtarget.so',
    load_options={'main_opts': {'base_addr': 0x00100000}},
    auto_load_libs=False,
)

ENCODING_FUNC_ADDR = 0x00100e10  # decoder function discovered in Ghidra

def decode_string(enc_addr, length):
    # fresh blank state per evaluation
    st = project.factory.blank_state()
    outbuf = st.heap.allocate(length)
    call = project.factory.callable(ENCODING_FUNC_ADDR, base_state=st)
    ret_ptr = call(enc_addr, outbuf, length)  # returns outbuf pointer
    rs = call.result_state
    raw = rs.solver.eval(rs.memory.load(ret_ptr, length), cast_to=bytes)
    return raw.split(b'\x00', 1)[0].decode('utf-8', errors='ignore')

# Example: decode a JNI signature at 0x100933 of length 5 → should be ()[B
print(decode_string(0x00100933, 5))
```

- At scale, build a static map from each call site to the decoder’s arguments (encoded_ptr, size). Wrappers may hide the arguments, so you may need to build this mapping by hand from Ghidra xrefs if API recovery is noisy.

```python
# call_site -> (encoded_addr, size)
call_site_args_map = {
    0x00100f8c: (0x00100b81, 0x41),
    0x00100fa8: (0x00100bca, 0x04),
    0x00100fcc: (0x001007a0, 0x41),
    0x00100fe8: (0x00100933, 0x05),
    0x0010100c: (0x00100c62, 0x41),
    0x00101028: (0x00100c15, 0x16),
    0x00101050: (0x00100a49, 0x101),
    0x00100cf4: (0x00100821, 0x11),
    0x00101170: (0x00100940, 0x101),
    0x001011cc: (0x0010084e, 0x13),
    0x00101334: (0x001007e9, 0x0f),
    0x00101478: (0x0010087d, 0x15),
    0x001014f8: (0x00100800, 0x19),
    0x001015e8: (0x001008e6, 0x27),
    0x0010160c: (0x00100c33, 0x13),
}

decoded_map = { hex(cs): decode_string(enc, sz)
                for cs, (enc, sz) in call_site_args_map.items() }

print(json.dumps(decoded_map, indent=2))
with open('decoded_strings.json', 'w') as f:
    json.dump(decoded_map, f, indent=2)
```

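A cheap sanity check on the decoder output is to test which decoded strings parse as JNI method signatures; garbage from a wrong length or address rarely does. The sketch below uses only the standard JNI descriptor grammar. The helper name `looks_like_jni_signature` and the sample dictionary are assumptions for illustration: only the `()[B` entry corresponds to the example above, the other values are made up.

```python
import re

# One JNI type descriptor: optional array prefix, then a primitive or an L<class>; reference.
_TYPE = r'(?:\[*(?:[ZBCSIJFD]|L[^;\[.]+;))'
# Method signature: '(' zero or more argument types ')' return type (V allowed).
_METHOD_SIG = re.compile(r'^\((?:%s)*\)(?:V|%s)$' % (_TYPE, _TYPE))

def looks_like_jni_signature(s):
    """Return True if s is a syntactically valid JNI method signature."""
    return _METHOD_SIG.match(s) is not None

# Illustrative decoder output keyed by call site (values other than '()[B' are hypothetical).
sample = {
    '0x100fe8': '()[B',
    '0x100fa8': '()V',
    '0x100f8c': 'nativeFoo',
}

signatures = {k: v for k, v in sample.items() if looks_like_jni_signature(v)}
names = {k: v for k, v in sample.items() if k not in signatures}
print(signatures)
print(names)
```

Strings that fail the check are either method names, class names, or bad decodes, so they still need a manual look.
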
Annotate call sites in Ghidra
Option A: Jython-only comment writer (use a pre-computed JSON)
- Since angr requires CPython3, keep deobfuscation and annotation separated. First run the angr script above to produce decoded_strings.json. Then run this Jython GhidraScript to write PRE_COMMENTs at each call site (and include the caller function name for context):

```python
#@category Android/Deobfuscation
# Jython in Ghidra 10/11
import json
from ghidra.program.model.listing import CodeUnit

# Ask for the JSON produced by the angr script
f = askFile('Select decoded_strings.json', 'Load')
with open(f.absolutePath, 'r') as fh:
    mapping = json.load(fh)  # keys as hex strings

fm = currentProgram.getFunctionManager()
rm = currentProgram.getReferenceManager()

# Replace with your decoder address to locate call-xrefs (optional)
ENCODING_FUNC_ADDR = 0x00100e10
enc_addr = toAddr(ENCODING_FUNC_ADDR)

# Map each call site of the decoder to the name of the function containing it
callsite_to_fn = {}
for ref in rm.getReferencesTo(enc_addr):
    if ref.getReferenceType().isCall():
        from_addr = ref.getFromAddress()
        fn = fm.getFunctionContaining(from_addr)
        if fn:
            callsite_to_fn[from_addr.getOffset()] = fn.getName()

# Write comments from JSON
for k_hex, s in mapping.items():
    cs = int(k_hex, 16)
    site = toAddr(cs)
    caller = callsite_to_fn.get(cs, None)
    text = s if caller is None else '%s @ %s' % (s, caller)
    currentProgram.getListing().setComment(site, CodeUnit.PRE_COMMENT, text)
print('[+] Annotated %d call sites' % len(mapping))
```

Option B: Single CPython script via pyhidra/ghidra_bridge
- Alternatively, use pyhidra or ghidra_bridge to drive Ghidra’s API from the same CPython process running angr. This allows calling decode_string() and immediately setting PRE_COMMENTs without an intermediate file. The logic mirrors the Jython script: build callsite→function map via ReferenceManager, decode with angr, and set comments.

Why this works and when to use it
- Offline execution sidesteps RASP/anti-debug: no ptrace, no Frida hooks required to recover strings.
- Keeping Ghidra and angr base_addr aligned (e.g., 0x00100000) ensures that function/data addresses match across tools.
- Repeatable recipe for decoders: treat the transform as a pure function, allocate an output buffer in a fresh state, call it with (encoded_ptr, out_ptr, len), then concretize via state.solver.eval and parse C-strings up to \x00.

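The recipe in the last bullet can be rehearsed without angr. The following toy model is only a sketch: memory is a plain bytearray, and `toy_decoder` is a hypothetical single-byte XOR stand-in for the real in-binary routine (the actual transform in a sample will differ). It shows the same call shape (encoded_ptr, out_ptr, len) and the same C-string parsing step.

```python
def toy_decoder(mem, enc_ptr, out_ptr, length, key=0x5A):
    """Hypothetical stand-in for the in-binary decoder: XOR each byte with a fixed key."""
    for i in range(length):
        mem[out_ptr + i] = mem[enc_ptr + i] ^ key
    return out_ptr  # mirrors a decoder that returns its output buffer

def read_c_string(mem, ptr, max_len):
    """Read up to max_len bytes and cut at the first NUL, as done after solver.eval."""
    raw = bytes(mem[ptr:ptr + max_len])
    return raw.split(b'\x00', 1)[0].decode('utf-8', errors='ignore')

# Fresh "state": flat memory with an encoded blob and a separate output buffer.
mem = bytearray(0x100)
plain = b'()[B\x00'
enc_ptr, out_ptr = 0x10, 0x80
mem[enc_ptr:enc_ptr + len(plain)] = bytes(b ^ 0x5A for b in plain)

ret = toy_decoder(mem, enc_ptr, out_ptr, len(plain))
decoded = read_c_string(mem, ret, len(plain))
print(decoded)
```

With angr the bytearray is replaced by the symbolic state's memory and the stand-in by the real function address, but the three steps (allocate, call, read until NUL) stay the same.
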
Notes and pitfalls
- Respect the target ABI/calling convention. project.factory.callable picks a default based on the architecture; if arguments look shifted, specify cc explicitly.
- If the decoder expects zeroed output buffers, initialize outbuf with zeros in the state before the call.
- For position-independent Android .so, always supply base_addr so addresses in angr match those seen in Ghidra.
- Use currentProgram.getReferenceManager() to enumerate call-xrefs even if the app wraps the decoder behind thin stubs.

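If the two tools ended up on different image bases anyway (for example, an existing Ghidra project that cannot be rebased), addresses can be translated arithmetically before they are used as call targets or JSON keys. A minimal sketch, assuming both tools loaded the same unmodified binary; `rebase` is a hypothetical helper and `ANGR_BASE` is an example value, not something taken from a real run.

```python
def rebase(addr, src_base, dst_base):
    """Translate an address between two image bases of the same binary."""
    offset = addr - src_base
    if offset < 0:
        raise ValueError('address 0x%x is below source base 0x%x' % (addr, src_base))
    return dst_base + offset

GHIDRA_BASE = 0x00100000  # base used throughout this section
ANGR_BASE = 0x00400000    # hypothetical base when base_addr was not supplied

decoder_in_angr = rebase(0x00100e10, GHIDRA_BASE, ANGR_BASE)
callsite_in_ghidra = rebase(0x00400fe8, ANGR_BASE, GHIDRA_BASE)
print(hex(decoder_in_angr), hex(callsite_in_ghidra))
```

Passing base_addr to angr remains the simpler option; the helper is only a fallback when the bases cannot be aligned.
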
For angr basics, see: [angr basics](../../reversing/reversing-tools-basic-methods/angr/README.md)

---

## Deobfuscating Dynamic Control-Flow (JMP/CALL RAX Dispatchers)

Modern malware families heavily abuse Control-Flow Graph (CFG) obfuscation: instead of a direct jump/call they compute the destination at run-time and execute a `jmp rax` or `call rax`. A small *dispatcher* (typically nine instructions) sets the final target depending on the CPU `ZF`/`CF` flags, completely breaking static CFG recovery.

@@ -283,6 +422,13 @@ adaptixc2-config-extraction-and-ttps.md

- [Unit42 – Evolving Tactics of SLOW#TEMPEST: A Deep Dive Into Advanced Malware Techniques](https://unit42.paloaltonetworks.com/slow-tempest-malware-obfuscation/)
- SoTap: Lightweight in-app JNI (.so) behavior logger – [github.com/RezaArbabBot/SoTap](https://github.com/RezaArbabBot/SoTap)
- Strategies for Analyzing Native Code in Android Applications: Combining Ghidra and Symbolic Execution for Code Decryption and Deobfuscation – [revflash.medium.com](https://revflash.medium.com/strategies-for-analyzing-native-code-in-android-applications-combining-ghidra-and-symbolic-aaef4c9555df)
- Ghidra – [github.com/NationalSecurityAgency/ghidra](https://github.com/NationalSecurityAgency/ghidra)
- angr – [angr.io](https://angr.io/)
- JNI_OnLoad and invocation API – [docs.oracle.com](https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/invocation.html#JNJI_OnLoad)
- RegisterNatives – [docs.oracle.com](https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#RegisterNatives)
- Tracing JNI Functions – [valsamaras.medium.com](https://valsamaras.medium.com/tracing-jni-functions-75b04bee7c58)
- Native Enrich: Scripting Ghidra and Frida to discover hidden JNI functions – [laripping.com](https://laripping.com/blog-posts/2021/12/20/nativeenrich.html)
- [Unit42 – AdaptixC2: A New Open-Source Framework Leveraged in Real-World Attacks](https://unit42.paloaltonetworks.com/adaptixc2-post-exploitation-framework/)

{{#include ../../banners/hacktricks-training.md}}