105 lines
5.3 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Homograph / Homoglyph Attacks in Phishing
{{#include ../../banners/hacktricks-training.md}}
## Overview
Shambulio la homograph (pia inajulikana kama homoglyph) linatumia ukweli kwamba **mifumo mingi ya Unicode kutoka kwa maandiko yasiyo ya Kilatini ni sawa kabisa au yanafanana sana na wahusika wa ASCII**. Kwa kubadilisha wahusika mmoja au zaidi wa Kilatini na wenzao wanaofanana, mshambuliaji anaweza kuunda:
* Majina ya kuonyesha, mada au maudhui ya ujumbe yanayoonekana kuwa halali kwa jicho la binadamu lakini yanapita kwenye ugunduzi wa msingi wa maneno muhimu.
* Majina ya kikoa, sub-kikoa au njia za URL ambazo zinawadanganya waathirika kuamini wanatembelea tovuti ya kuaminika.
Kwa sababu kila glyph inatambulika ndani kwa **nambari ya Unicode**, wahusika mmoja tu aliyebadilishwa inatosha kushinda kulinganisha nyuzi zisizo na busara (mfano, `"Παypal.com"` dhidi ya `"Paypal.com"`).
## Typical Phishing Workflow
1. **Craft message content** Badilisha herufi maalum za Kilatini katika chapa / neno muhimu linalojulikana na wahusika wasioonekana kutoka kwa maandiko mengine (Kigiriki, Kirusi, Kiarumeni, Cherokee, nk.).
2. **Register supporting infrastructure** Kurekebisha kikoa cha homoglyph na kupata cheti cha TLS (zaidi ya CAs hazifanyi ukaguzi wa kufanana kwa kuona).
3. **Send email / SMS** Ujumbe unajumuisha homoglyphs katika moja au zaidi ya maeneo yafuatayo:
* Jina la kuonyesha la mtumaji (mfano, `Ηеlрdеѕk`)
* Mstari wa mada (`Urgеnt Аctіon Rеquіrеd`)
* Maandishi ya kiungo au jina kamili la kikoa
4. **Redirect chain** Mwathirika anarudishwa kupitia tovuti zinazonekana kuwa salama au wafupishaji wa URL kabla ya kutua kwenye mwenyeji mbaya anayekusanya akidi / kupeleka malware.
## Unicode Ranges Commonly Abused
| Script | Range | Example glyph | Looks like |
|--------|-------|---------------|------------|
| Greek | U+0370-03FF | `Η` (U+0397) | Latin `H` |
| Greek | U+0370-03FF | `ρ` (U+03C1) | Latin `p` |
| Cyrillic | U+0400-04FF | `а` (U+0430) | Latin `a` |
| Cyrillic | U+0400-04FF | `е` (U+0435) | Latin `e` |
| Armenian | U+0530-058F | `օ` (U+0585) | Latin `o` |
| Cherokee | U+13A0-13FF | `` (U+13A2) | Latin `T` |
> Tip: Full Unicode charts are available at [unicode.org](https://home.unicode.org/).
## Detection Techniques
### 1. Mixed-Script Inspection
Barua pepe za phishing zinazolenga shirika linalozungumza Kiingereza zinapaswa nadra kuchanganya wahusika kutoka kwa mifumo mingi. Heuristics rahisi lakini yenye ufanisi ni:
1. Pitia kila wahusika wa nyuzi inayokaguliwa.
2. Ramani ya nambari ya nambari kwa kizuizi chake cha Unicode.
3. Pandisha arifa ikiwa mifumo zaidi ya mmoja ipo **au** ikiwa mifumo isiyo ya Kilatini inaonekana mahali ambapo haitarajiwi (jina la kuonyesha, kikoa, mada, URL, nk.).
Python proof-of-concept:
```python
import unicodedata as ud
from collections import defaultdict
SUSPECT_FIELDS = {
"display_name": "Ηоmоgraph Illusion", # example data
"subject": "Finаniаl Տtatеmеnt",
"url": "https://xn--messageconnecton-2kb.blob.core.windows.net" # punycode
}
for field, value in SUSPECT_FIELDS.items():
blocks = defaultdict(int)
for ch in value:
if ch.isascii():
blocks['Latin'] += 1
else:
name = ud.name(ch, 'UNKNOWN')
block = name.split(' ')[0] # e.g., 'CYRILLIC'
blocks[block] += 1
if len(blocks) > 1:
print(f"[!] Mixed scripts in {field}: {dict(blocks)} -> {value}")
```
### 2. Punycode Normalisation (Domains)
Majina ya Kikoa ya Kimataifa (IDNs) yanakodishwa kwa **punycode** (`xn--`). Kubadilisha kila jina la mwenyeji kuwa punycode na kisha kurudi kwenye Unicode kunaruhusu kulinganisha dhidi ya orodha ya kibali au kufanya ukaguzi wa kufanana (kwa mfano, umbali wa Levenshtein) **baada ya** mfuatano kuwa wa kawaida.
```python
import idna
hostname = "Ρаypal.com" # Greek Rho + Cyrillic a
puny = idna.encode(hostname).decode()
print(puny) # xn--yl8hpyal.com
```
### 3. Kamusi za Homoglyph / Algorithms
Tools such as **dnstwist** (`--homoglyph`) or **urlcrazy** can enumerate visually-similar domain permutations and are useful for proactive takedown / monitoring.
## Kuzuia & Kupunguza
* Tekeleza sera kali za DMARC/DKIM/SPF zuia uongo kutoka kwa maeneo yasiyoidhinishwa.
* Tekeleza mantiki ya kugundua hapo juu katika **Secure Email Gateways** na **SIEM/XSOAR** playbooks.
* Flag au karantini ujumbe ambapo jina la kuonyesha domain ≠ domain ya mtumaji.
* Elimisha watumiaji: nakala-paste maandiko ya shaka kwenye mkaguzi wa Unicode, piga juu ya viungo, kamwe usiamini URL shorteners.
## Mifano ya Uhalisia
* Jina la kuonyesha: `Сonfidеntiаl ikеt` (Cyrillic `С`, `е`, `а`; Cherokee ``; Latin small capital ``).
* Mnyororo wa domain: `bestseoservices.com` ➜ municipal `/templates` directory ➜ `kig.skyvaulyt.ru` ➜ fake Microsoft login at `mlcorsftpsswddprotcct.approaches.it.com` protected by custom OTP CAPTCHA.
* Spotify impersonation: `Sρօtifւ` sender with link hidden behind `redirects.ca`.
These samples originate from Unit 42 research (July 2025) and illustrate how homograph abuse is combined with URL redirection and CAPTCHA evasion to bypass automated analysis.
## Marejeo
- [The Homograph Illusion: Not Everything Is As It Seems](https://unit42.paloaltonetworks.com/homograph-attacks/)
- [Unicode Character Database](https://home.unicode.org/)
- [dnstwist domain permutation engine](https://github.com/elceef/dnstwist)
{{#include ../../banners/hacktricks-training.md}}