Translated ['', 'src/pentesting-web/regular-expression-denial-of-service

2025-10-10 18:36:50 +00:00 · 2025-10-01 15:15:27 +00:00 · 2025-10-01 15:15:27 +00:00 · d5698bf933
commit d5698bf933
parent 556285ae00
1 changed files with 66 additions and 18 deletions
--- a/src/pentesting-web/regular-expression-denial-of-service-redos.md
+++ b/src/pentesting-web/regular-expression-denial-of-service-redos.md
@ -1,41 +1,70 @@
-# 正则表达式拒绝服务 - ReDoS
+# Regular expression Denial of Service - ReDoS

 {{#include ../banners/hacktricks-training.md}}

-# 正则表达式拒绝服务 (ReDoS)
+# Regular Expression Denial of Service (ReDoS)

-**正则表达式拒绝服务 (ReDoS)** 是指有人利用正则表达式（用于搜索和匹配文本模式的一种方式）工作中的弱点。有时，当使用正则表达式时，它们可能会变得非常慢，尤其是当它们处理的文本变得更大时。这种缓慢可能会变得非常严重，甚至在文本大小稍微增加时就会迅速增长。攻击者可以利用这个问题使使用正则表达式的程序长时间无法正常工作。
+A **Regular Expression Denial of Service (ReDoS)** happens when someone takes advantage of weaknesses in how regular expressions (a way to search and match patterns in text) work. Sometimes, when regular expressions are used, they can become very slow, especially if the piece of text they're working with gets larger. This slowness can get so bad that it grows really fast with even small increases in the text size. Attackers can use this problem to make a program that uses regular expressions stop working properly for a long time.

-## 有问题的正则表达式天真算法
+## The Problematic Regex Naïve Algorithm

-**查看详细信息 [https://owasp.org/www-community/attacks/Regular*expression_Denial_of_Service*-\_ReDoS](https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS)**
+**Check the details in [https://owasp.org/www-community/attacks/Regular*expression_Denial_of_Service*-_ReDoS](https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS)**

-## 恶意正则表达式 <a href="#evil-regexes" id="evil-regexes"></a>
+### Engine behavior and exploitability

-恶意正则表达式模式是指那些**在特制输入上卡住导致拒绝服务**的模式。恶意正则表达式模式通常包含重复的分组和重复或交替的重叠在重复组内。一些恶意模式的示例包括：
+- Most popular engines (PCRE, Java `java.util.regex`, Python `re`, JavaScript `RegExp`) use a **backtracking** VM. Crafted inputs that create many overlapping ways to match a subpattern force exponential or high-polynomial backtracking.
+- Some engines/libraries are designed to be **ReDoS-resilient** by construction (no backtracking), e.g. **RE2** and ports based on finite automata that provide worst‑case linear time; using them for untrusted input removes the backtracking DoS primitive. See the references at the end for details.
+
+## Evil Regexes <a href="#evil-regexes" id="evil-regexes"></a>
+
+An evil regular expression pattern is that one that can **get stuck on crafted input causing a DoS**. Evil regex patterns typically contain grouping with repetition and repetition or alternation with overlapping inside the repeated group. Some examples of evil patterns include:

 - (a+)+
 - ([a-zA-Z]+)\*
 - (a|aa)+
 - (a|a?)+
- (.\*a){x} for x > 10
+- (.*a){x} for x > 10

-所有这些都对输入`aaaaaaaaaaaaaaaaaaaaaaaa!`脆弱。
+All those are vulnerable to the input `aaaaaaaaaaaaaaaaaaaaaaaa!`.

+### Practical recipe to build PoCs
+
+Most catastrophic cases follow this shape:
+
+- Prefix that gets you into the vulnerable subpattern (optional).
+- Long run of a character that causes ambiguous matches inside nested/overlapping quantifiers (e.g., many `a`, `_`, or spaces).
+- A final character that forces overall failure so the engine must backtrack through all possibilities (often a character that won’t match the last token, like `!`).
+
+Minimal examples:
+
+- `(a+)+$` vs input `"a"*N + "!"`
+- `\w*_*\w*$` vs input `"v" + "_"*N + "!"`
+
+Increase N and observe super‑linear growth.
+
+#### Quick timing harness (Python)
+```python
+import re, time
+pat = re.compile(r'(\w*_)\w*$')
+for n in [2**k for k in range(8, 15)]:
+s = 'v' + '_'*n + '!'
+t0=time.time(); pat.search(s); dt=time.time()-t0
+print(n, f"{dt:.3f}s")
+```
 ## ReDoS 载荷

-### 通过 ReDoS 字符串外泄
+### 通过 ReDoS 的字符串外泄

-在 CTF（或漏洞赏金）中，也许你**控制了与敏感信息（标志）匹配的正则表达式**。然后，如果**正则表达式匹配**，使**页面冻结（超时或更长处理时间）**可能会很有用，而**如果没有匹配则不冻结**。这样你就可以**逐字符外泄**字符串：
+在 CTF（或 bug bounty）中，你可能会 **控制用于匹配敏感信息（flag）的 Regex**。然后，当 **Regex 匹配** 时，让页面 **冻结（超时或更长的处理时间）**，而在未匹配时不冻结，可能会很有用。这样你就能够逐字符 **exfiltrate** 字符串：

- 在 [**这篇文章**](https://portswigger.net/daily-swig/blind-regex-injection-theoretical-exploit-offers-new-way-to-force-web-apps-to-spill-secrets) 中你可以找到这个 ReDoS 规则：`^(?=<flag>)((.*)*)*salt$`
+- 在 [**this post**](https://portswigger.net/daily-swig/blind-regex-injection-theoretical-exploit-offers-new-way-to-force-web-apps-to-spill-secrets) 中你可以找到这个 ReDoS 规则：`^(?=<flag>)((.*)*)*salt$`
 - 示例：`^(?=HTB{sOmE_fl§N§)((.*)*)*salt$`
- 在 [**这篇写作**](https://github.com/jorgectf/Created-CTF-Challenges/blob/main/challenges/TacoMaker%20%40%20DEKRA%20CTF%202022/solver/solver.html) 中你可以找到这个：`<flag>(((((((.*)*)*)*)*)*)*)!`
- 在 [**这篇写作**](https://ctftime.org/writeup/25869) 中他使用了：`^(?=${flag_prefix}).*.*.*.*.*.*.*.*!!!!$`
+- 在 [**this writeup**](https://github.com/jorgectf/Created-CTF-Challenges/blob/main/challenges/TacoMaker%20@%20DEKRA%20CTF%202022/solver/solver.html) 中你可以找到这个：`<flag>(((((((.*)*)*)*)*)*)*)!`
+- 在 [**this writeup**](https://ctftime.org/writeup/25869) 中他使用了：`^(?=${flag_prefix}).*.*.*.*.*.*.*.*!!!!$`

-### ReDoS 控制输入和正则表达式
+### ReDoS 控制输入和 Regex

-以下是**ReDoS**示例，其中你**控制**输入和**正则表达式**：
+下面是一些 **ReDoS** 示例，在这些示例中你**同时控制** **输入** 和 **Regex**：
 ```javascript
 function check_time_regexp(regexp, text) {
 var t0 = new Date().getTime()
@ -65,16 +94,35 @@ Regexp ([a-zA-Z]+)*$ took 773 milliseconds.
 Regexp (a+)*$ took 723 milliseconds.
 */
 ```
+### 语言/引擎 注意事项（针对攻击者）
+
+- JavaScript (browser/Node): 内置的 `RegExp` 是回溯引擎，当 regex+input 都受攻击者影响时通常可被利用。
+- Python: `re` 使用回溯。长时间的模糊匹配序列加上失败的尾部常常导致灾难性回溯。
+- Java: `java.util.regex` 使用回溯。如果你只控制输入，寻找使用复杂校验器的端点；如果你控制 patterns（例如存储的规则），ReDoS 通常很容易实现。
+- 像 **RE2/RE2J/RE2JS** 或 **Rust regex** crate 这样的引擎被设计为避免灾难性回溯。如果遇到这些引擎，应该关注其他瓶颈（例如过大的 patterns）或寻找仍在使用回溯引擎的组件。
+
 ## 工具

 - [https://github.com/doyensec/regexploit](https://github.com/doyensec/regexploit)
+- 查找易受攻击的 regexes 并自动生成恶意输入。示例：
+- `pip install regexploit`
+- 交互式分析单个 pattern：`regexploit`
+- 扫描 Python/JS 代码中的 regexes：`regexploit-py path/` 和 `regexploit-js path/`
 - [https://devina.io/redos-checker](https://devina.io/redos-checker)
+- [https://github.com/davisjam/vuln-regex-detector](https://github.com/davisjam/vuln-regex-detector)
+- 端到端管道，用于从项目中提取 regexes、检测易受攻击的模式，并在目标语言中验证 PoC。适用于在大型代码库中搜寻。
+- [https://github.com/tjenkinson/redos-detector](https://github.com/tjenkinson/redos-detector)
+- 简单的 CLI/JS 库，能分析回溯行为并报告某个 pattern 是否安全。
+
+> 提示：当你仅能控制输入时，生成长度成倍增长的字符串（例如 2^k 个字符）并追踪延迟。延迟的指数增长强烈表明存在可利用的 ReDoS。

 ## 参考资料

- [https://owasp.org/www-community/attacks/Regular*expression_Denial_of_Service*-\_ReDoS](https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS)
+- [https://owasp.org/www-community/attacks/Regular*expression_Denial_of_Service*-_ReDoS](https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS)
 - [https://portswigger.net/daily-swig/blind-regex-injection-theoretical-exploit-offers-new-way-to-force-web-apps-to-spill-secrets](https://portswigger.net/daily-swig/blind-regex-injection-theoretical-exploit-offers-new-way-to-force-web-apps-to-spill-secrets)
- [https://github.com/jorgectf/Created-CTF-Challenges/blob/main/challenges/TacoMaker%20%40%20DEKRA%20CTF%202022/solver/solver.html](https://github.com/jorgectf/Created-CTF-Challenges/blob/main/challenges/TacoMaker%20%40%20DEKRA%20CTF%202022/solver/solver.html)
+- [https://github.com/jorgectf/Created-CTF-Challenges/blob/main/challenges/TacoMaker%20@%20DEKRA%20CTF%202022/solver/solver.html](https://github.com/jorgectf/Created-CTF-Challenges/blob/main/challenges/TacoMaker%20@%20DEKRA%20CTF%202022/solver/solver.html)
 - [https://ctftime.org/writeup/25869](https://ctftime.org/writeup/25869)
+- SoK (2024)：关于 Regular Expression Denial of Service (ReDoS) 的文献与工程综述 — [https://arxiv.org/abs/2406.11618](https://arxiv.org/abs/2406.11618)
+- Why RE2 (线性时间 regex 引擎) — [https://github.com/google/re2/wiki/WhyRE2](https://github.com/google/re2/wiki/WhyRE2)

 {{#include ../banners/hacktricks-training.md}}