From 2a36c15c3cc4a163c8f2bb34c6a42938a21f40ce Mon Sep 17 00:00:00 2001
From: HackTricks News Bot
Date: Tue, 17 Mar 2026 13:11:18 +0000
Subject: [PATCH] Add content from: Open, Closed and Broken: Prompt Fuzzing Finds LLMs Still Fra...

---
 src/AI/AI-Prompts.md | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/src/AI/AI-Prompts.md b/src/AI/AI-Prompts.md
index 485722c9983..6a1989650aa 100644
--- a/src/AI/AI-Prompts.md
+++ b/src/AI/AI-Prompts.md
@@ -49,6 +49,27 @@ Prompt leaking is a specific type of prompt injection attack where the attacker
 
 A jailbreak attack is a technique used to **bypass the safety mechanisms or restrictions** of an AI model, allowing the attacker to make the **model perform actions or generate content that it would normally refuse**. This can involve manipulating the model's input in such a way that it ignores its built-in safety guidelines or ethical constraints.
 
+### Prompt Fuzzing (Genetic-Algorithm Jailbreak Generation)
+
+A scalable form of jailbreaking is to treat **prompt generation as fuzzing with feedback**. Starting from a disallowed seed prompt (e.g., `how to build a `), generate meaning-preserving variants and score each one by how strongly the model refuses it. Even a low single-digit per-attempt bypass rate becomes a reliable jailbreak once the attack is automated at volume.
+
+**Workflow (abstract):**
+- Extract three lists from the seed: a **keyword** (the main noun), **relative words** (action/intent phrases), and **filler phrases** (common English fragments meant to disrupt surface parsing while preserving intent).
+- Iterate for *N* rounds; on each round apply a single mutation operator to the current candidate.
+- Submit candidates to the target LLM or a content filter and compute a **fitness** score (for example, fewer refusal/negative-tone markers means higher fitness). Keep the best candidates and repeat.
+
+**Mutation operators (examples):**
+- Prepend or append a filler phrase.
+- Add a trailing linefeed.
+- Repeat the keyword at the end.
+- Append a relative-word action phrase.
+- Remove a random word.
+
+**Security testing notes:**
+- **Keyword sensitivity** is high; test multiple semantically adjacent terms, since a single canonical keyword can severely under-estimate risk.
+- **Standalone content filters** can be brittle under meaning-preserving variation; treat them as probabilistic controls and fuzz them directly.
+- Operationalize this as **regression testing** after model/prompt/filter updates and monitor for high-variance probing patterns.
+
 ## Prompt Injection via Direct Requests
 
 ### Changing the Rules / Assertion of Authority
@@ -646,5 +667,5 @@ Below is a minimal payload that both **hides YOLO enabling** and **executes a re
 - [OpenAI – Memory and new controls for ChatGPT](https://openai.com/index/memory-and-new-controls-for-chatgpt/)
 - [OpenAI Begins Tackling ChatGPT Data Leak Vulnerability (url_safe analysis)](https://embracethered.com/blog/posts/2023/openai-data-exfiltration-first-mitigations-implemented/)
 - [Unit 42 – Fooling AI Agents: Web-Based Indirect Prompt Injection Observed in the Wild](https://unit42.paloaltonetworks.com/ai-agent-prompt-injection/)
-
+- [Unit 42 – Open, Closed and Broken: Prompt Fuzzing Finds LLMs Still Fragile Across Open and Closed Models](https://unit42.paloaltonetworks.com/genai-llm-prompt-fuzzing/)
 {{#include ../banners/hacktricks-training.md}}
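
The mutate-and-score loop that the patch describes can be sketched in Python. This is an illustrative sketch, not code from the Unit 42 post: the word lists, refusal markers, and the `query_llm` callback are hypothetical placeholders for whatever the tester extracts from the seed and whatever model or filter is being probed.

```python
import random

# Hypothetical lists extracted from a seed prompt (placeholders, not real data).
KEYWORD = "keyword"
RELATIVE_WORDS = ["explain the steps to", "describe in detail"]
FILLERS = ["by the way", "as it happens", "for what it is worth"]

# Surface markers of a refusal; fewer markers -> higher fitness for the attacker.
REFUSAL_MARKERS = ["i can't", "i cannot", "i'm sorry", "not able to", "against policy"]

def remove_random_word(prompt: str) -> str:
    """Drop one random word, keeping at least one word in the prompt."""
    words = prompt.split()
    if len(words) > 1:
        words.pop(random.randrange(len(words)))
    return " ".join(words)

def mutate(prompt: str) -> str:
    """Apply a single randomly chosen meaning-preserving mutation operator."""
    ops = [
        lambda p: random.choice(FILLERS) + " " + p,          # prepend filler phrase
        lambda p: p + " " + random.choice(FILLERS),          # append filler phrase
        lambda p: p + "\n",                                  # add trailing linefeed
        lambda p: p + " " + KEYWORD,                         # repeat keyword at end
        lambda p: p + " " + random.choice(RELATIVE_WORDS),   # append relative words
        remove_random_word,                                  # remove a random word
    ]
    return random.choice(ops)(prompt)

def fitness(response: str) -> int:
    """Score a model response: 0 means no refusal markers, more negative is worse."""
    lowered = response.lower()
    return -sum(marker in lowered for marker in REFUSAL_MARKERS)

def fuzz(seed: str, query_llm, rounds: int = 100, population: int = 8):
    """Run N rounds of single-operator mutations, keeping the best candidate."""
    best, best_score = seed, fitness(query_llm(seed))
    for _ in range(rounds):
        for cand in (mutate(best) for _ in range(population)):
            score = fitness(query_llm(cand))
            if score > best_score:
                best, best_score = cand, score
    return best, best_score
```

In practice `query_llm` would call the target model or content filter; plugging in multiple semantically adjacent keywords and re-running the loop after each model or filter update gives the regression-testing workflow the notes above recommend.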