# Unicode Injection {{#include ../../banners/hacktricks-training.md}} ## Introduction Depending on how the back-end/front-end is behaving when it **receives weird unicode characters** an attacker might be able to **bypass protections and inject arbitrary characters** that could be used to **abused injection vulnerabilities** such as XSS or SQLi. ## Unicode Normalization Unicode normalization occurs when **unicode characters are normalized to ascii characters**. One common scenario of this type of vulnerability occurs when the system is **modifying** somehow the **input** of the user **after having checked it**. For example, in some languages a simple call to make the **input uppercase or lowercase** could normalize the given input and the **unicode will be transformed into ASCII** generating new characters.\ For more info check: {{#ref}} unicode-normalization.md {{#endref}} ## `\u` to `%` Unicode characters are usually represented with the **`\u` prefix**. For example the char `㱋` is `\u3c4b`([check it here](https://unicode-explorer.com/c/3c4B)). If a backend **transforms** the prefix **`\u` in `%`**, the resulting string will be `%3c4b`, which URL decoded is: **`<4b`**. And, as you can see, a **`<` char is injected**.\ You could use this technique to **inject any kind of char** if the backend is vulnerable.\ Check [https://unicode-explorer.com/](https://unicode-explorer.com/) to find the chars you need. This vuln actually comes from a vulnerability a researcher found, for a more in depth explanation check [https://www.youtube.com/watch?v=aUsAHb0E7Cg](https://www.youtube.com/watch?v=aUsAHb0E7Cg) ## Emoji Injection Back-ends something behaves weirdly when they **receives emojis**. That's what happened in [**this writeup**](https://medium.com/@fpatrik/how-i-found-an-xss-vulnerability-via-using-emojis-7ad72de49209) where the researcher managed to achieve a XSS with a payload such as: `💋img src=x onerror=alert(document.domain)//💛` In this case, the error was that the server after removing the malicious characters **converted the UTF-8 string from Windows-1252 to UTF-8** (basically the input encoding and the convert from encoding mismatched). Then this does not give a proper < just a weird unicode one: `‹`\ ``So they took this output and **converted again now from UTF-8 ot ASCII**. This **normalized** the `‹`to`<` this is how the exploit could work on that system.\ This is what happened: ```php