
# Manual De-obfuscation Techniques

{{#include ../../banners/hacktricks-training.md}}

## Manual **De-obfuscation Techniques**

**De-obfuscation** is the process of making obscured code understandable again, and it is a core skill in **software security** work. This guide covers static de-obfuscation strategies, the indicators that help recognize obfuscation, the role of dynamic analysis, and LLM-assisted tooling. It closes with further resources for readers who want to explore more advanced topics.

### **Strategies for Static De-obfuscation**

When dealing with **obfuscated code**, several strategies can be employed depending on the nature of the obfuscation:

- **DEX bytecode (Java)**: One effective approach involves identifying the application's de-obfuscation methods, then replicating these methods in a Java file. This file is executed to reverse the obfuscation on the targeted elements.
- **Java and Native Code**: Another method is to translate the de-obfuscation algorithm into a scripting language like Python. This strategy highlights that the primary goal is not to fully understand the algorithm but to execute it effectively.
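
As a minimal sketch of the second strategy, assume the decompiled app contains a string decryptor that Base64-decodes its argument and XORs each byte with a repeating key. The routine, the key `k3y`, and the helper names below are invented for illustration and are not taken from any real app. The point is only that the algorithm can be ported and executed outside the app without studying it in depth.

```python
import base64
from itertools import cycle

# Hypothetical key, as if copied from a static field in the decompiled class.
KEY = b"k3y"


def decrypt(encoded: str, key: bytes = KEY) -> str:
    """Port of an assumed Base64 + repeating-key XOR routine."""
    raw = base64.b64decode(encoded)
    return bytes(b ^ k for b, k in zip(raw, cycle(key))).decode("utf-8")


def encrypt(plain: str, key: bytes = KEY) -> str:
    """Inverse helper, used only to build sample data for this sketch."""
    raw = bytes(b ^ k for b, k in zip(plain.encode("utf-8"), cycle(key)))
    return base64.b64encode(raw).decode("ascii")
```

In practice, `decrypt` would be called on every encrypted literal extracted from the decompiled sources, and the results substituted back into the code being read.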

### **Identifying Obfuscation**

Recognizing obfuscated code is the first step in the de-obfuscation process. Key indicators include:

- The **absence or scrambling of strings** in Java and Android, which may suggest string obfuscation.
- The **presence of binary files** in the assets directory or calls to `DexClassLoader`, hinting at code unpacking and dynamic loading.
- The use of **native libraries alongside unidentifiable JNI functions**, indicating potential obfuscation of native methods.
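
The indicators above can be turned into a rough triage helper. The sketch below is heuristic: the regular expressions, the indicator names, and the idea of flagging one- or two-letter class names are assumptions chosen for illustration, and they will produce both false positives and false negatives. The entropy function can be applied to files in the assets directory, where values close to 8 bits per byte suggest encrypted or compressed payloads.

```python
import math
import re
from collections import Counter

# Illustrative patterns only; tune them for the sample being analysed.
DYNAMIC_LOADING = re.compile(r"\b(DexClassLoader|InMemoryDexClassLoader)\b")
NATIVE_CODE = re.compile(r"\bSystem\.loadLibrary\s*\(|\bnative\s+[\w<>\[\]]+\s+\w+\s*\(")
SHORT_CLASS = re.compile(r"\bclass\s+([A-Za-z]{1,2})\b")


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte (0.0 for empty or constant input)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in Counter(data).values())


def scan_source(java_source: str) -> list[str]:
    """Return the names of the indicators found in one decompiled source file."""
    hits = []
    if DYNAMIC_LOADING.search(java_source):
        hits.append("dynamic-code-loading")
    if NATIVE_CODE.search(java_source):
        hits.append("native-code")
    if SHORT_CLASS.search(java_source):
        hits.append("short-class-names")
    return hits
```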

## **Dynamic Analysis in De-obfuscation**

By executing the code in a controlled environment, dynamic analysis **allows for the observation of how the obfuscated code behaves in real time**. This method is particularly effective in uncovering the inner workings of complex obfuscation patterns that are designed to hide the true intent of the code.

### **Applications of Dynamic Analysis**

- **Runtime Decryption**: Many obfuscation techniques involve encrypting strings or code segments that only get decrypted at runtime. Through dynamic analysis, these encrypted elements can be captured at the moment of decryption, revealing their true form.
- **Identifying Obfuscation Techniques**: By monitoring the application's behavior, dynamic analysis can help identify specific obfuscation techniques being used, such as code virtualization, packers, or dynamic code generation.
- **Uncovering Hidden Functionality**: Obfuscated code may contain hidden functionalities that are not apparent through static analysis alone. Dynamic analysis allows for the observation of all code paths, including those conditionally executed, to uncover such hidden functionalities.
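
Runtime decryption is commonly captured with an instrumentation framework. The sketch below uses Frida's Python bindings, which this page does not otherwise cover, so treat it as one possible approach. It assumes a rooted or instrumentable device reachable over USB, a Frida setup in which the `Java` bridge is available to the injected script (recent Frida releases may require adding the bridge explicitly), and a decryptor method that takes a single `String` argument. The class and method names passed in are placeholders for whatever static analysis identified.

```python
import json

# JavaScript run inside the target by Frida. It wraps one Java method and
# sends its argument and return value back to the host script.
HOOK_TEMPLATE = """
Java.perform(function () {
  var cls = Java.use(%(cls)s);
  var name = %(method)s;
  cls[name].overload("java.lang.String").implementation = function (arg) {
    var result = this[name](arg);
    send({input: arg, output: result});
    return result;
  };
});
"""


def build_hook(class_name: str, method_name: str) -> str:
    """Fill the template; json.dumps produces safely quoted JS string literals."""
    return HOOK_TEMPLATE % {"cls": json.dumps(class_name), "method": json.dumps(method_name)}


def on_message(message, data):
    # send() payloads arrive as {"type": "send", "payload": ...}.
    if message.get("type") == "send":
        payload = message["payload"]
        print(f"{payload['input']!r} -> {payload['output']!r}")


def trace(package: str, class_name: str, method_name: str) -> None:
    """Spawn the app, install the hook, and print decrypted values until Enter is pressed."""
    import frida  # imported lazily so the helpers above work without Frida installed

    device = frida.get_usb_device()
    pid = device.spawn([package])
    session = device.attach(pid)
    script = session.create_script(build_hook(class_name, method_name))
    script.on("message", on_message)
    script.load()
    device.resume(pid)
    input("Hook installed; press Enter to detach\n")
    session.detach()
```

A call such as `trace("com.example.app", "com.example.app.a", "b")` (all three names hypothetical) would print each encrypted input next to its decrypted output as the app runs.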

### Automated De-obfuscation with LLMs (Androidmeda)

While the previous sections focus on fully manual strategies, in 2025 a new class of *Large-Language-Model (LLM) powered* tooling emerged that can automate most of the tedious renaming and control-flow recovery work.
One representative project is **[Androidmeda](https://github.com/In3tinct/Androidmeda)** – a Python utility that takes *decompiled* Java sources (e.g. produced by `jadx`) and returns a greatly cleaned-up, commented and security-annotated version of the code.

#### Key capabilities
* Renames meaningless identifiers generated by ProGuard / DexGuard / DashO / Allatori / … to *semantic* names.
* Detects and restructures **control-flow flattening**, replacing opaque switch-case state machines with normal loops / if-else constructs.
* Decrypts common **string encryption** patterns when possible.
* Injects **inline comments** that explain the purpose of complex blocks.
* Performs a *lightweight static security scan* and writes the findings to `vuln_report.json` with severity levels (informational → critical).

#### Installation
```bash
git clone https://github.com/In3tinct/Androidmeda
cd Androidmeda
pip3 install -r requirements.txt
```

#### Preparing the inputs
1. Decompile the target APK with `jadx` (or any other decompiler) and keep only the *source* directory that contains the `.java` files:
   ```bash
   jadx -d input_dir/ target.apk
   ```
2. (Optional) Trim `input_dir/` so that it only contains the application packages you want to analyse. This greatly speeds up processing and reduces LLM costs.
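
The optional trimming step can be scripted. The helper below is a sketch and not part of Androidmeda: the function name and the package names in the usage note are hypothetical. It assumes the usual `jadx` layout, where Java files are written under a `sources/` subdirectory of the output folder, and falls back to the folder itself if that subdirectory is absent.

```python
import shutil
from pathlib import Path


def copy_packages(decompiled: Path, dest: Path, packages: list[str]) -> list[Path]:
    """Copy only the listed Java packages from a decompiled tree into dest."""
    sources = decompiled / "sources"
    root = sources if sources.is_dir() else decompiled
    copied = []
    for pkg in packages:
        rel = Path(*pkg.split("."))  # "com.example.app" -> com/example/app
        src = root / rel
        if src.is_dir():  # silently skip packages that do not exist
            shutil.copytree(src, dest / rel, dirs_exist_ok=True)
            copied.append(dest / rel)
    return copied
```

For example, `copy_packages(Path("input_dir"), Path("trimmed"), ["com.example.app"])` would leave only that package under `trimmed/`, which can then be passed as the source directory.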

#### Usage examples

Remote provider (Gemini-1.5-flash):
```bash
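# API key for the remote provider; confirm the exact variable name for your provider in the project README
export OPENAI_API_KEY=<your_key>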
python3 androidmeda.py \
  --llm_provider google \
  --llm_model gemini-1.5-flash \
  --source_dir input_dir/ \
  --output_dir out/ \
  --save_code true
```

Offline (local `ollama` backend with llama3.2):
```bash
python3 androidmeda.py \
  --llm_provider ollama \
  --llm_model llama3.2 \
  --source_dir input_dir/ \
  --output_dir out/ \
  --save_code true
```

#### Output
* `out/vuln_report.json` – JSON array with `file`, `line`, `issue`, `severity`.
* A mirrored package tree with **de-obfuscated `.java` files** (only if `--save_code true`).
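
Since the report is plain JSON, it can be triaged with a few lines of Python. The sketch below relies on the `file`, `line`, `issue`, and `severity` fields described above. The intermediate severity labels (`low`, `medium`, `high`) are an assumption, because only the endpoints of the scale are stated here, so adjust `SEVERITY_ORDER` to match the labels the tool actually emits.

```python
import json
from pathlib import Path

# Assumed ordering: only "informational" and "critical" are confirmed labels.
SEVERITY_ORDER = ["informational", "low", "medium", "high", "critical"]


def rank(severity: str) -> int:
    """Position of a severity label; unknown labels rank lowest (-1)."""
    label = str(severity).strip().lower()
    return SEVERITY_ORDER.index(label) if label in SEVERITY_ORDER else -1


def filter_findings(findings: list[dict], minimum: str = "high") -> list[dict]:
    """Keep findings at or above `minimum`, most severe first."""
    kept = [f for f in findings if rank(f.get("severity", "")) >= rank(minimum)]
    return sorted(kept, key=lambda f: rank(f["severity"]), reverse=True)


def load_report(path: str = "out/vuln_report.json") -> list[dict]:
    """Read the JSON array written by the tool."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Calling `filter_findings(load_report())` returns the entries worth reviewing first. Each one still needs manual confirmation against the referenced file and line.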

#### Tips & troubleshooting
* **Skipped class** ⇒ usually caused by an unparsable method; isolate the package or update the parser regex.
* **Slow run-time / high token usage** ⇒ point `--source_dir` to *specific* app packages instead of the entire decompile.
* Always *manually review* the vulnerability report – LLM hallucinations can lead to false positives / negatives.

#### Practical value – Crocodilus malware case study
Feeding a heavily obfuscated sample from the 2025 *Crocodilus* banking trojan through Androidmeda reduced analysis time from *hours* to *minutes*: the tool recovered call-graph semantics, revealed calls to accessibility APIs and hard-coded C2 URLs, and produced a concise report that could be imported into analysts’ dashboards.

---

## References and Further Reading

- [https://maddiestone.github.io/AndroidAppRE/obfuscation.html](https://maddiestone.github.io/AndroidAppRE/obfuscation.html)
- BlackHat USA 2018: “Unpacking the Packed Unpacker: Reverse Engineering an Android Anti-Analysis Library” [[video](https://www.youtube.com/watch?v=s0Tqi7fuOSU)]
  - This talk goes over reverse engineering one of the most complex anti-analysis native libraries I’ve seen used by an Android application. It covers mostly obfuscation techniques in native code.
- REcon 2019: “The Path to the Payload: Android Edition” [[video](https://recon.cx/media-archive/2019/Session.005.Maddie_Stone.The_path_to_the_payload_Android_Edition-J3ZnNl2GYjEfa.mp4)]
  - This talk discusses a series of obfuscation techniques, solely in Java code, that an Android botnet was using to hide its behavior.
- Deobfuscating Android Apps with Androidmeda (blog post) – [mobile-hacker.com](https://www.mobile-hacker.com/2025/07/22/deobfuscating-android-apps-with-androidmeda-a-smarter-way-to-read-obfuscated-code/)
- Androidmeda source code – [https://github.com/In3tinct/Androidmeda](https://github.com/In3tinct/Androidmeda)

{{#include ../../banners/hacktricks-training.md}}