mirror of
https://github.com/HackTricks-wiki/hacktricks.git
synced 2025-10-10 18:36:50 +00:00
70 lines
4.7 KiB
Markdown
70 lines
4.7 KiB
Markdown
# Unicode Injection
|
||
|
||
{{#include ../../banners/hacktricks-training.md}}
|
||
|
||
## Introduction
|
||
|
||
Depending on how the back-end/front-end is behaving when it **receives weird unicode characters** an attacker might be able to **bypass protections and inject arbitrary characters** that could be used to **abused injection vulnerabilities** such as XSS or SQLi.
|
||
|
||
## Unicode Normalization
|
||
|
||
Unicode normalization occurs when **unicode characters are normalized to ascii characters**.
|
||
|
||
One common scenario of this type of vulnerability occurs when the system is **modifying** somehow the **input** of the user **after having checked it**. For example, in some languages a simple call to make the **input uppercase or lowercase** could normalize the given input and the **unicode will be transformed into ASCII** generating new characters.\
|
||
For more info check:
|
||
|
||
|
||
{{#ref}}
|
||
unicode-normalization.md
|
||
{{#endref}}
|
||
|
||
## `\u` to `%`
|
||
|
||
Unicode characters are usually represented with the **`\u` prefix**. For example the char `㱋` is `\u3c4b`([check it here](https://unicode-explorer.com/c/3c4B)). If a backend **transforms** the prefix **`\u` in `%`**, the resulting string will be `%3c4b`, which URL decoded is: **`<4b`**. And, as you can see, a **`<` char is injected**.\
|
||
You could use this technique to **inject any kind of char** if the backend is vulnerable.\
|
||
Check [https://unicode-explorer.com/](https://unicode-explorer.com/) to find the chars you need.
|
||
|
||
This vuln actually comes from a vulnerability a researcher found, for a more in depth explanation check [https://www.youtube.com/watch?v=aUsAHb0E7Cg](https://www.youtube.com/watch?v=aUsAHb0E7Cg)
|
||
|
||
## Emoji Injection
|
||
|
||
Back-ends something behaves weirdly when they **receives emojis**. That's what happened in [**this writeup**](https://medium.com/@fpatrik/how-i-found-an-xss-vulnerability-via-using-emojis-7ad72de49209) where the researcher managed to achieve a XSS with a payload such as: `💋img src=x onerror=alert(document.domain)//💛`
|
||
|
||
In this case, the error was that the server after removing the malicious characters **converted the UTF-8 string from Windows-1252 to UTF-8** (basically the input encoding and the convert from encoding mismatched). Then this does not give a proper < just a weird unicode one: `‹`\
|
||
``So they took this output and **converted again now from UTF-8 ot ASCII**. This **normalized** the `‹`to`<` this is how the exploit could work on that system.\
|
||
This is what happened:
|
||
|
||
```php
|
||
<?php
|
||
|
||
$str = isset($_GET["str"]) ? htmlspecialchars($_GET["str"]) : "";
|
||
|
||
$str = iconv("Windows-1252", "UTF-8", $str);
|
||
$str = iconv("UTF-8", "ASCII//TRANSLIT", $str);
|
||
|
||
echo "String: " . $str;
|
||
```
|
||
|
||
Emoji lists:
|
||
|
||
- [https://github.com/iorch/jakaton_feminicidios/blob/master/data/emojis.csv](https://github.com/iorch/jakaton_feminicidios/blob/master/data/emojis.csv)
|
||
- [https://unicode.org/emoji/charts-14.0/full-emoji-list.html](https://unicode.org/emoji/charts-14.0/full-emoji-list.html)
|
||
|
||
## Windows Best-Fit/Worst-fit
|
||
|
||
As explained in **[this great post](https://blog.orange.tw/posts/2025-01-worstfit-unveiling-hidden-transformers-in-windows-ansi/)**, Windows has a feature called **Best-Fit** which will **replace unicode characters** that cannot be displayed in ASCII mode with a similar one. This can lead to **unexpected behavior** when the backend is **expecting a specific character** but it receives a different one.
|
||
|
||
It's possible to find best-fit characters in **[https://worst.fit/mapping/](https://worst.fit/mapping/)**.
|
||
|
||
As Windows will usually convert unicode strings to ascii strings as one of the last parts of the execution (usually going from a "W" suffixed API to an "A" suffixed API like `GetEnvironmentVariableA` and `GetEnvironmentVariableW`) this would allow atackers to bypass protections by sending unicode characters that will be converted lastly in ASCII characters that would perform nuexpected actions.
|
||
|
||
In the blog post there are proposed methods to bypass vulnerabilities fixed using a **blacklist of chars**, exploit **path traversals** using [characters mapped to “/“ (0x2F)](https://worst.fit/mapping/#to%3A0x2f) and [characters mapped to “\“ (0x5C)](https://worst.fit/mapping/#to%3A0x5c) or even bypassing shell escape protections like PHP's `escapeshellarg` or Python's `subprocess.run` using a list, this was done for example using **fullwidth double quotes (U+FF02)** instead of double quotes so at the end what looked like 1 argument was transformed in 2 arguments.
|
||
|
||
**Note that for an app to be vulnerable it needs to use "W" Windows APIs but end calling an "A" Windows api so the "Best-fit" of the unicode string is created.**
|
||
|
||
**Several dicovered vulnerabilities won't be fixed as people don't agree who should be fixing this issue"**.
|
||
|
||
{{#include ../../banners/hacktricks-training.md}}
|
||
|
||
|