format fixes

carlospolop 2025-07-08 14:26:56 +02:00
parent 827e6354da
commit 0a5242b46a
51 changed files with 102 additions and 69 deletions

View File

@ -1,5 +1,7 @@
# 0. Basic LLM Concepts
{{#include /banners/hacktricks-training.md}}
## Pretraining
Pretraining is the foundational phase in developing a large language model (LLM) where the model is exposed to vast and diverse amounts of text data. During this stage, **the LLM learns the fundamental structures, patterns, and nuances of language**, including grammar, vocabulary, syntax, and contextual relationships. By processing this extensive data, the model acquires a broad understanding of language and general world knowledge. This comprehensive base enables the LLM to generate coherent and contextually relevant text. Subsequently, this pretrained model can undergo fine-tuning, where it is further trained on specialized datasets to adapt its capabilities for specific tasks or domains, enhancing its performance and relevance in targeted applications.
@ -297,3 +299,5 @@ During the backward pass:
- **Efficiency:** Avoids redundant calculations by reusing intermediate results.
- **Accuracy:** Provides exact derivatives up to machine precision.
- **Ease of Use:** Eliminates manual computation of derivatives.
{{#include /banners/hacktricks-training.md}}

View File

@ -1,5 +1,7 @@
# 1. Tokenizing
{{#include /banners/hacktricks-training.md}}
## Tokenizing
**Tokenizing** is the process of breaking down data, such as text, into smaller, manageable pieces called _tokens_. Each token is then assigned a unique numerical identifier (ID). This is a fundamental step in preparing text for processing by machine learning models, especially in natural language processing (NLP).
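As a rough illustration of the idea described above (splitting text into tokens and giving each unique token an ID), here is a minimal sketch. The splitting regex, the sample sentence, and the helper names `build_vocab` and `encode` are assumptions made for this example; real LLM tokenizers typically use subword schemes such as BPE instead.

```python
import re

def build_vocab(text):
    # Split on punctuation and whitespace, keeping punctuation as separate tokens
    pieces = re.split(r'([,.:;?_!"()\']|--|\s)', text)
    tokens = [p for p in pieces if p.strip()]
    # Give every unique token an integer ID (sorted order keeps it deterministic)
    vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
    return tokens, vocab

def encode(tokens, vocab):
    # Map each token to its numerical identifier
    return [vocab[tok] for tok in tokens]

text = "Hello, world. Hello again."  # hypothetical sample text
tokens, vocab = build_vocab(text)
token_ids = encode(tokens, vocab)
print(tokens)     # ['Hello', ',', 'world', '.', 'Hello', 'again', '.']
print(token_ids)  # [2, 0, 4, 1, 2, 3, 1]
```

Note that a vocabulary built this way cannot encode words it has never seen, which is one reason practical tokenizers rely on subword units.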
@ -96,3 +98,5 @@ print(token_ids[:50])
- [https://www.manning.com/books/build-a-large-language-model-from-scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch)
{{#include /banners/hacktricks-training.md}}

View File

@ -1,5 +1,7 @@
# 2. Data Sampling
{{#include /banners/hacktricks-training.md}}
## **Data Sampling**
**Data Sampling** is a crucial process in preparing data for training large language models (LLMs) like GPT. It involves organizing text data into input and target sequences that the model uses to learn how to predict the next word (or token) based on the preceding words. Proper data sampling ensures that the model effectively captures language patterns and dependencies.
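The input/target organization described above can be sketched with a sliding window over a list of token IDs. This is only an illustrative sketch: the function name `make_pairs`, the parameters `max_length` and `stride`, and the sample IDs are assumptions for the example, not necessarily what the original page uses.

```python
def make_pairs(token_ids, max_length, stride):
    # Each target window is the input window shifted one token to the right,
    # so position k of the target is the "next token" for position k of the input.
    inputs, targets = [], []
    for start in range(0, len(token_ids) - max_length, stride):
        inputs.append(token_ids[start:start + max_length])
        targets.append(token_ids[start + 1:start + max_length + 1])
    return inputs, targets

ids = [10, 20, 30, 40, 50, 60, 70, 80, 90]  # hypothetical token IDs
inputs, targets = make_pairs(ids, max_length=4, stride=4)
print(inputs)   # [[10, 20, 30, 40], [50, 60, 70, 80]]
print(targets)  # [[20, 30, 40, 50], [60, 70, 80, 90]]
```

A stride smaller than `max_length` produces overlapping windows (more training samples from the same text), while a stride equal to `max_length` avoids overlap.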
@ -238,3 +240,5 @@ tensor([[ 367, 2885, 1464, 1807],
- [https://www.manning.com/books/build-a-large-language-model-from-scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch)
{{#include /banners/hacktricks-training.md}}

View File

@ -1,5 +1,7 @@
# 3. Token Embeddings
{{#include /banners/hacktricks-training.md}}
## Token Embeddings
After tokenizing text data, the next critical step in preparing data for training large language models (LLMs) like GPT is creating **token embeddings**. Token embeddings transform discrete tokens (such as words or subwords) into continuous numerical vectors that the model can process and learn from. This explanation breaks down token embeddings, their initialization, usage, and the role of positional embeddings in enhancing model understanding of token sequences.
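To make the lookup idea concrete, the sketch below builds a token embedding table and a positional embedding table as plain Python lists and adds them together. It is a simplified stand-in under stated assumptions: the table sizes, seeds, and helper names (`make_embedding_table`, `embed`) are invented for the example, and in practice these tables are trainable layers in a deep learning framework rather than fixed random lists.

```python
import random

def make_embedding_table(num_rows, dim, seed):
    # One randomly initialized vector per row; training would later adjust these values
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_rows)]

# Hypothetical, tiny sizes chosen only for readability
vocab_size, context_length, dim = 6, 4, 3
token_table = make_embedding_table(vocab_size, dim, seed=0)
pos_table = make_embedding_table(context_length, dim, seed=1)

def embed(token_ids):
    # Look up each token's vector and add the vector for its position in the sequence
    return [
        [t + p for t, p in zip(token_table[tok], pos_table[pos])]
        for pos, tok in enumerate(token_ids)
    ]

input_embeddings = embed([2, 3, 5, 1])
print(len(input_embeddings), len(input_embeddings[0]))  # 4 3
```

Because the positional vector differs per position, the same token ID yields a different final vector depending on where it appears in the sequence.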
@ -216,3 +218,5 @@ print(input_embeddings.shape) # torch.Size([8, 4, 256])
- [https://www.manning.com/books/build-a-large-language-model-from-scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch)
{{#include /banners/hacktricks-training.md}}

View File

@ -1,5 +1,7 @@
# 4. Attention Mechanisms
{{#include /banners/hacktricks-training.md}}
## Attention Mechanisms and Self-Attention in Neural Networks
Attention mechanisms allow neural networks to **focus on specific parts of the input when generating each part of the output**. They assign different weights to different inputs, helping the model decide which inputs are most relevant to the task at hand. This is crucial in tasks like machine translation, where understanding the context of the entire sentence is necessary for accurate translation.
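The weighting idea can be sketched in a few lines. The example below is a deliberately simplified self-attention without trainable weight matrices: each input vector acts as its own query, key, and value, scores are scaled dot products, and a softmax turns them into weights. The function names and the sample vectors are assumptions made for illustration.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def simple_self_attention(inputs):
    dim = len(inputs[0])
    all_weights, context = [], []
    for query in inputs:
        # Scaled dot product between this query and every input (used as key)
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in inputs]
        weights = softmax(scores)
        all_weights.append(weights)
        # Context vector: weighted sum of all inputs (used as values)
        context.append([sum(w * vec[j] for w, vec in zip(weights, inputs)) for j in range(dim)])
    return all_weights, context

inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical token vectors
attn_weights, context_vectors = simple_self_attention(inputs)
for row in attn_weights:
    print([round(w, 3) for w in row])
```

Each row of weights sums to 1, so every context vector is a weighted average of the inputs, with more similar inputs contributing more.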
@ -427,3 +429,6 @@ For another compact and efficient implementation you could use the [`torch.nn.Mu
- [https://www.manning.com/books/build-a-large-language-model-from-scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch)
{{#include /banners/hacktricks-training.md}}

View File

@ -1,5 +1,7 @@
# 5. LLM Architecture
{{#include /banners/hacktricks-training.md}}
## LLM Architecture
> [!TIP]
@ -697,4 +699,7 @@ print("Output length:", len(out[0]))
## References
- [https://www.manning.com/books/build-a-large-language-model-from-scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch)
- [https://www.manning.com/books/build-a-large-language-model-from-scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch)
{{#include /banners/hacktricks-training.md}}

View File

@ -1,5 +1,7 @@
# 6. Pre-training & Loading models
{{#include /banners/hacktricks-training.md}}
## Text Generation
To train a model, it must first be able to generate new tokens. The generated tokens are then compared with the expected ones so that the model **learns the tokens it needs to generate**.
@ -968,3 +970,5 @@ There 2 quick scripts to load the GPT2 weights locally. For both you can clone t
- [https://www.manning.com/books/build-a-large-language-model-from-scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch)
{{#include /banners/hacktricks-training.md}}

View File

@ -1,5 +1,7 @@
# 7.0. LoRA Improvements in fine-tuning
{{#include /banners/hacktricks-training.md}}
## LoRA Improvements
> [!TIP]
@ -60,4 +62,6 @@ def replace_linear_with_lora(model, rank, alpha):
## References
- [https://www.manning.com/books/build-a-large-language-model-from-scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch)
- [https://www.manning.com/books/build-a-large-language-model-from-scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch)
{{#include /banners/hacktricks-training.md}}

View File

@ -1,5 +1,7 @@
# 7.1. Fine-Tuning for Classification
{{#include /banners/hacktricks-training.md}}
## What is
Fine-tuning is the process of taking a **pre-trained model** that has learned **general language patterns** from vast amounts of data and **adapting** it to perform a **specific task** or to understand domain-specific language. This is achieved by continuing the training of the model on a smaller, task-specific dataset, allowing it to adjust its parameters to better suit the nuances of the new data while leveraging the broad knowledge it has already acquired. Fine-tuning enables the model to deliver more accurate and relevant results in specialized applications without the need to train a new model from scratch.
@ -113,4 +115,6 @@ You can find all the code to fine-tune GPT2 to be a spam classifier in [https://
## References
- [https://www.manning.com/books/build-a-large-language-model-from-scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch)
- [https://www.manning.com/books/build-a-large-language-model-from-scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch)
{{#include /banners/hacktricks-training.md}}

View File

@ -1,5 +1,7 @@
# 7.2. Fine-Tuning to follow instructions
{{#include /banners/hacktricks-training.md}}
> [!TIP]
> The goal of this section is to show how to **fine-tune an already pre-trained model to follow instructions** rather than just generating text, for example, responding to tasks as a chat bot.
@ -103,4 +105,6 @@ You can find an example of the code to perform this fine tuning in [https://gith
## References
- [https://www.manning.com/books/build-a-large-language-model-from-scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch)
- [https://www.manning.com/books/build-a-large-language-model-from-scratch](https://www.manning.com/books/build-a-large-language-model-from-scratch)
{{#include /banners/hacktricks-training.md}}

View File

@ -1,5 +1,7 @@
# LLM Training - Data Preparation
{{#include /banners/hacktricks-training.md}}
**These are my notes from the highly recommended book** [**https://www.manning.com/books/build-a-large-language-model-from-scratch**](https://www.manning.com/books/build-a-large-language-model-from-scratch) **with some extra information.**
## Basic Information
@ -96,3 +98,5 @@ You should start by reading this post for some basic concepts you should know ab
{{#ref}}
7.2.-fine-tuning-to-follow-instructions.md
{{#endref}}
{{#include /banners/hacktricks-training.md}}

View File

@ -403,7 +403,6 @@
- [Flask](network-services-pentesting/pentesting-web/flask.md)
- [Git](network-services-pentesting/pentesting-web/git.md)
- [Golang](network-services-pentesting/pentesting-web/golang.md)
- [GWT - Google Web Toolkit](network-services-pentesting/pentesting-web/gwt-google-web-toolkit.md)
- [Grafana](network-services-pentesting/pentesting-web/grafana.md)
- [GraphQL](network-services-pentesting/pentesting-web/graphql.md)
- [H2 - Java SQL database](network-services-pentesting/pentesting-web/h2-java-sql-database.md)
@ -889,7 +888,6 @@
- [Other Web Tricks](todo/other-web-tricks.md)
- [Interesting HTTP$$external:todo/interesting-http.md$$]()
- [Android Forensics](todo/android-forensics.md)
- [TR-069](todo/tr-069.md)
- [Online Platforms with API](todo/online-platforms-with-api.md)
- [Stealing Sensitive Information Disclosure from a Web](todo/stealing-sensitive-information-disclosure-from-a-web.md)
- [Post Exploitation](todo/post-exploitation.md)

View File

@ -1,6 +1,6 @@
# Arbitrary Write 2 Exec
{{#include /banners/hacktricks-training.md}}

View File

@ -1,5 +1,7 @@
# iOS Exploiting
{{#include /banners/hacktricks-training.md}}
## Physical use-after-free
This is a summary of the post at [https://alfiecg.uk/2024/09/24/Kernel-exploit.html](https://alfiecg.uk/2024/09/24/Kernel-exploit.html). Further information about exploits using this technique can be found in [https://github.com/felix-pb/kfd](https://github.com/felix-pb/kfd).
@ -211,5 +213,4 @@ void iosurface_kwrite64(uint64_t addr, uint64_t value) {
With these primitives, the exploit provides controlled **32-bit reads** and **64-bit writes** to kernel memory. Further jailbreak steps could involve more stable read/write primitives, which may require bypassing additional protections (e.g., PPL on newer arm64e devices).
{{#include /banners/hacktricks-training.md}}

View File

@ -1,5 +1,7 @@
# Libc Heap
{{#include /banners/hacktricks-training.md}}
## Heap Basics
The heap is the region of memory where a program stores data that it requests dynamically by calling functions like **`malloc`**, `calloc`... When this memory is no longer needed, it is made available again by calling the function **`free`**.
@ -529,4 +531,4 @@ heap-memory-functions/heap-functions-security-checks.md
- [https://azeria-labs.com/heap-exploitation-part-2-glibc-heap-free-bins/](https://azeria-labs.com/heap-exploitation-part-2-glibc-heap-free-bins/)
{{#include /banners/hacktricks-training.md}}

View File

@ -1,7 +1,5 @@
# Cryptographic/Compression Algorithms
## Cryptographic/Compression Algorithms
{{#include ../../banners/hacktricks-training.md}}
## Identifying Algorithms

View File

@ -1,7 +1,5 @@
# Windows Artifacts
## Windows Artifacts
{{#include ../../../banners/hacktricks-training.md}}
## Generic Windows Artifacts

View File

@ -1,7 +1,5 @@
# Interesting Windows Registry Keys
### Interesting Windows Registry Keys
{{#include ../../../banners/hacktricks-training.md}}
### **Windows Version and Owner Info**
@ -101,4 +99,3 @@ This guide condenses the crucial paths and methods for accessing detailed system
{{#include ../../../banners/hacktricks-training.md}}

View File

@ -1,5 +1,7 @@
# Threat Modeling
{{#include /banners/hacktricks-training.md}}
## Threat Modeling
Welcome to HackTricks' comprehensive guide on Threat Modeling! Embark on an exploration of this critical aspect of cybersecurity, where we identify, understand, and strategize against potential vulnerabilities in a system. This thread serves as a step-by-step guide packed with real-world examples, helpful software, and easy-to-understand explanations. Ideal for both novices and experienced practitioners looking to fortify their cybersecurity defenses.
@ -111,4 +113,5 @@ Now your finished model should look something like this. And this is how you mak
This is a free tool from Microsoft that helps in finding threats in the design phase of software projects. It uses the STRIDE methodology and is particularly suitable for those developing on Microsoft's stack.
{{#include /banners/hacktricks-training.md}}

View File

@ -1,7 +1,5 @@
# Exploiting Content Providers
## Exploiting Content Providers
{{#include ../../../banners/hacktricks-training.md}}
## Intro

View File

@ -1,7 +1,5 @@
# 623/UDP/TCP - IPMI
## 623/UDP/TCP - IPMI
{{#include ../banners/hacktricks-training.md}}

View File

@ -1,6 +1,5 @@
# 8086 - Pentesting InfluxDB
{{#include ../banners/hacktricks-training.md}}
## Basic Information

View File

@ -1,9 +1,9 @@
# 9001 - Pentesting HSQLDB
## Basic Information
{{#include ../banners/hacktricks-training.md}}
## Basic Information
**HSQLDB \([HyperSQL DataBase](http://hsqldb.org/)\)** is the leading SQL relational database system written in Java. It offers a small, fast multithreaded and transactional database engine with in-memory and disk-based tables and supports embedded and server modes.
**Default port:** 9001

View File

@ -1,6 +1,5 @@
# 5432,5433 - Pentesting Postgresql
{{#include ../banners/hacktricks-training.md}}
## **Basic Information**

View File

@ -1,5 +1,7 @@
# Angular
{{#include /banners/hacktricks-training.md}}
## The Checklist
Checklist [from here](https://lsgeurope.com/post/angular-security-checklist).
@ -614,4 +616,5 @@ According to the W3C documentation, the `window.location` and `document.location
{{#include /banners/hacktricks-training.md}}

View File

@ -1,5 +1,7 @@
# Django
{{#include /banners/hacktricks-training.md}}
## Cache Manipulation to RCE
Django's default cache storage method is [Python pickles](https://docs.python.org/3/library/pickle.html), which can lead to RCE if [untrusted input is unpickled](https://media.blackhat.com/bh-us-11/Slaviero/BH_US_11_Slaviero_Sour_Pickles_Slides.pdf). **If an attacker can gain write access to the cache, they can escalate this vulnerability to RCE on the underlying server**.
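The reason unpickling untrusted data is dangerous can be shown with a harmless sketch. When an object defines `__reduce__`, `pickle.loads` calls the callable it names with the given arguments. Here the class name `Demo` and the benign callable `int` are choices made for this illustration; it is not the payload from the referenced report, and a malicious pickle written into the cache would name a command-executing callable instead.

```python
import pickle

class Demo:
    def __reduce__(self):
        # pickle stores "call int('1337')" instead of the object's state.
        # Whoever controls the pickled bytes controls which callable runs on load.
        return (int, ("1337",))

payload = pickle.dumps(Demo())

# The loader never gets a Demo instance back: it gets the callable's return value
result = pickle.loads(payload)
print(type(result).__name__, result)  # int 1337
```

The call happens inside `pickle.loads` itself, before the application can inspect the result, which is why write access to a pickle-backed cache is enough to reach code execution.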
@ -9,4 +11,4 @@ This HackerOne report provides a great, reproducible example of exploiting Djang
{{#include /banners/hacktricks-training.md}}

View File

@ -1,6 +0,0 @@
# GWT - Google Web Toolkit

View File

@ -1,5 +1,7 @@
# NodeJS Express
{{#include /banners/hacktricks-training.md}}
## Cookie Signature
The tool [https://github.com/DigitalInterruption/cookie-monster](https://github.com/DigitalInterruption/cookie-monster) is a utility for automating the testing and re-signing of Express.js cookie secrets.
@ -37,5 +39,5 @@ cookie-monster -e -f new_cookie.json -k secret
```
{{#include /banners/hacktricks-training.md}}

View File

@ -1,7 +1,5 @@
# LDAP Injection
## LDAP Injection
{{#include ../banners/hacktricks-training.md}}
## LDAP Injection

View File

@ -1,7 +1,5 @@
# Parameter Pollution | JSON Injection
## Parameter Pollution
{{#include ../banners/hacktricks-training.md}}

View File

@ -1,7 +1,5 @@
# PostMessage Vulnerabilities
## PostMessage Vulnerabilities
{{#include ../../banners/hacktricks-training.md}}
## Send **PostMessage**

View File

@ -1,11 +1,7 @@
# RSQL Injection
## RSQL Injection
{{#include ../banners/hacktricks-training.md}}
## RSQL Injection
## What is RSQL?
RSQL is a query language designed for parameterized filtering of inputs in RESTful APIs. Based on FIQL (Feed Item Query Language), originally specified by Mark Nottingham for querying Atom feeds, RSQL stands out for its simplicity and ability to express complex queries in a compact and URI-compliant way over HTTP. This makes it an excellent choice as a general query language for REST endpoint searching.

View File

@ -1,7 +1,5 @@
# SAML Attacks
## SAML Attacks
{{#include ../../banners/hacktricks-training.md}}
## Basic Information

View File

@ -1,4 +1,5 @@
# SQLMap
{{#include ../../banners/hacktricks-training.md}}
## Basic arguments for SQLmap

View File

@ -1,5 +1,7 @@
# XSS (Cross Site Scripting)
{{#include /banners/hacktricks-training.md}}
## Methodology
1. Check if **any value you control** (_parameters_, _path_, _headers_?, _cookies_?) is being **reflected** in the HTML or **used** by **JS** code.

View File

@ -1,7 +1,5 @@
# Debugging Client Side JS
## Debugging Client Side JS
{{#include ../../banners/hacktricks-training.md}}
Debugging client side JS can be a pain because every time you change the URL (including a change in the params used or param values) you need to **reset the breakpoint and reload the page**.

View File

@ -1,7 +1,5 @@
# Cryptographic/Compression Algorithms
## Cryptographic/Compression Algorithms
{{#include ../../banners/hacktricks-training.md}}
## Identifying Algorithms

View File

@ -1,9 +1,11 @@
# Fault Injection Attacks
{{#include /banners/hacktricks-training.md}}
Fault injection attacks introduce an external disturbance into an electronic circuit to influence its behaviour, causing it to disclose information or even bypass certain restrictions in the circuit. These attacks open up many possibilities for attacking electronic circuits. This kind of attack is also referred to as glitching of electronic circuits.
There are many methods and media for injecting faults into an electronic circuit.
{{#include /banners/hacktricks-training.md}}

View File

@ -1,5 +1,7 @@
# Side Channel Analysis Attacks
{{#include /banners/hacktricks-training.md}}
Side Channel Analysis Attacks recover information from a device or entity through some other channel or source that it indirectly influences and from which information can be extracted. This is best explained with an example:
Consider analysing the vibrations in a glass sheet that is near a sound source when the sound source itself is not accessible. The vibrations in the glass are influenced by the sound source, so if they are monitored and analysed, the sound can be decoded and interpreted.
@ -8,4 +10,4 @@ These attacks are very popular in case of leaking data such as private keys or f
{{#include /banners/hacktricks-training.md}}

View File

@ -1,5 +1,7 @@
# Industrial Control Systems Hacking
{{#include /banners/hacktricks-training.md}}
## About this Section
This section covers Industrial Control Systems, including concepts as well as methodologies for hacking them through the various security issues that persist in them.
@ -16,5 +18,5 @@ These techniques can also be used to protect against attacks and blue teaming fo
{{#include /banners/hacktricks-training.md}}

View File

@ -1,5 +1,7 @@
# The Modbus Protocol
{{#include /banners/hacktricks-training.md}}
## Introduction to Modbus Protocol
The Modbus protocol is a widely used protocol in Industrial Automation and Control Systems. Modbus allows communication between various devices such as programmable logic controllers (PLCs), sensors, actuators, and other industrial devices. Understanding the Modbus protocol is essential because it is the most widely used communication protocol in ICS and offers a large attack surface for sniffing and even injecting commands into PLCs.
@ -32,6 +34,6 @@ Due to it's large scale use and lack of upgradations, attacking Modbus provides
{{#include /banners/hacktricks-training.md}}

View File

@ -1,5 +1,7 @@
# Investment Terms
{{#include /banners/hacktricks-training.md}}
## Spot
This is the most basic way to do some trading. You can **indicate the amount of the asset and the price** that you want to buy or sell, and whenever that price is reached the operation is done.
@ -69,4 +71,4 @@ However, the buyer will be paying some fee to the seller for opening the option
{{#include /banners/hacktricks-training.md}}

View File

@ -1,6 +1,6 @@
# Radio Hacking
{{#include /banners/hacktricks-training.md}}

View File

@ -1,5 +1,7 @@
# FISSURE - The RF Framework
{{#include /banners/hacktricks-training.md}}
**Frequency Independent SDR-based Signal Understanding and Reverse Engineering**
FISSURE is an open-source RF and reverse engineering framework designed for all skill levels with hooks for signal detection and classification, protocol discovery, attack execution, IQ manipulation, vulnerability analysis, automation, and AI/ML. The framework was built to promote the rapid integration of software modules, radios, protocols, signal data, scripts, flow graphs, reference material, and third-party tools. FISSURE is a workflow enabler that keeps software in one location and allows teams to effortlessly get up to speed while sharing the same proven baseline configuration for specific Linux distributions.
@ -185,4 +187,5 @@ Special thanks to Dr. Samuel Mantravadi and Joseph Reith for their contributions
{{#include /banners/hacktricks-training.md}}

View File

@ -1,5 +1,7 @@
# Rust Basics
{{#include /banners/hacktricks-training.md}}
### Generic Types
Create a struct in which one of its values can be of any type
@ -318,5 +320,5 @@ fn main() {
```
{{#include /banners/hacktricks-training.md}}

View File

@ -1,5 +1,7 @@
# Test LLMs
{{#include /banners/hacktricks-training.md}}
## Run & train models locally
### [**Hugging Face Transformers**](https://github.com/huggingface/transformers)
@ -50,5 +52,5 @@ It offers several sections like:
* **API Access:** Simple APIs for running models that enable developers to deploy and scale models effortlessly within their own applications.
{{#include /banners/hacktricks-training.md}}

View File

@ -1,6 +0,0 @@
# TR-069

View File

@ -1,5 +1,7 @@
# Cobalt Strike
{{#include /banners/hacktricks-training.md}}
### Listeners
### C2 Listeners
@ -369,5 +371,5 @@ pscp -r root@kali:/opt/cobaltstrike/artifact-kit/dist-pipe .
```
{{#include /banners/hacktricks-training.md}}

View File

@ -1,7 +1,5 @@
# Windows Credentials Protections
## Credentials Protections
{{#include ../../banners/hacktricks-training.md}}
## WDigest

View File

@ -1,7 +1,5 @@
# Named Pipe Client Impersonation
## Named Pipe Client Impersonation
{{#include ../../banners/hacktricks-training.md}}
Check: [**https://ired.team/offensive-security/privilege-escalation/windows-namedpipes-privilege-escalation**](https://ired.team/offensive-security/privilege-escalation/windows-namedpipes-privilege-escalation)

View File

@ -1,6 +1,5 @@
# SeDebug + SeImpersonate - Copy Token
{{#include ../../banners/hacktricks-training.md}}
The following code **exploits the privileges SeDebug and SeImpersonate** to copy the token from a **process running as SYSTEM** and with **all the token privileges**. \