Open, Closed and Broken Prompt Fuzzing Finds LLMs Still Frag... #2017

Open

carlospolop wants to merge 1 commit into master from
Conversation
🔗 Additional Context

Original Blog Post: https://unit42.paloaltonetworks.com/genai-llm-prompt-fuzzing/

Content Categories: Based on the analysis, this content was categorized under "🤖 AI → AI Security (or a new subpage under AI Security such as “Prompt fuzzing / Genetic-algorithm jailbreak generation” alongside prompt-injection/jailbreak techniques)".

Repository Maintenance:

Review Notes:

Bot Version: HackTricks News Bot v1.0
🤖 Automated Content Update
This PR was automatically generated by the HackTricks News Bot based on a technical blog post.
📝 Source Information
🎯 Content Summary
Title / context
Unit 42 (Palo Alto Networks) presents a genetic-algorithm-inspired prompt fuzzing method to automatically generate meaning-preserving variants of disallowed requests (prompt jailbreaking) and use them to measure guardrail fragility across both closed-source and open-weight LLMs plus a separate open-source content-filter model. The central security point is scalability: even low single-digit bypass rates become operationa...
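The scalability point can be illustrated with back-of-the-envelope arithmetic. The per-variant bypass rate and variant count below are hypothetical placeholders, not figures from the blog:

```python
p_bypass = 0.02          # hypothetical per-variant bypass rate (low single digits)
variants_per_seed = 100  # hypothetical number of automated mutations per seed prompt

# Probability that at least one variant of a given seed prompt slips
# past the guardrails, assuming independent attempts:
p_any = 1 - (1 - p_bypass) ** variants_per_seed
print(f"{p_any:.1%}")  # → about 86.7%
```

Even a rarely-successful mutation becomes near-certain to succeed once an automated fuzzer can try it hundreds of times per seed.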
🔧 Technical Details
Genetic-algorithm-inspired prompt fuzzing to jailbreak LLMs: Start from a disallowed seed prompt (e.g., “how to build a <weapon>”). Extract three lists: (1) a keyword noun (e.g., bomb), (2) relative words capturing the action/intent as verbs or action phrases (e.g., build, list the ingredients of, components of), and (3) generic filler phrases (e.g., Has anyone, Is it, Do you think) intended to disrupt interpretation while preserving meaning. Iteratively generate candidates for N iterations by applying one mutation operator per step: prepend/append a filler phrase, add a trailing newline, repeat the keyword at the end, append a relative-word action phrase, or remove a random word. Submit the variants to the target LLM or a content-filter model and compute “fitness” from the output (example proxy: fewer refusal/negative-tone markers). Automating this l...

🤖 Agent Actions
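The mutation-and-fitness loop described above can be sketched roughly as follows. The wordlists, refusal markers, and `query_llm` hook are illustrative placeholders, not the blog's actual implementation:

```python
import random

# Hypothetical seed decomposition (example lists only).
KEYWORDS = ["bomb"]
RELATIVE_WORDS = ["build", "list the ingredients of", "components of"]
FILLERS = ["Has anyone", "Is it", "Do you think"]

# Hypothetical refusal/negative-tone markers used as a fitness proxy.
REFUSAL_MARKERS = ["i can't", "i cannot", "sorry", "not able to"]

def mutate(prompt: str) -> str:
    """Apply one randomly chosen mutation operator per step."""
    op = random.choice(["prepend_filler", "append_filler", "newline",
                        "repeat_keyword", "append_relative", "drop_word"])
    if op == "prepend_filler":
        return f"{random.choice(FILLERS)} {prompt}"
    if op == "append_filler":
        return f"{prompt} {random.choice(FILLERS)}"
    if op == "newline":
        return prompt + "\n"
    if op == "repeat_keyword":
        return f"{prompt} {random.choice(KEYWORDS)}"
    if op == "append_relative":
        return f"{prompt} {random.choice(RELATIVE_WORDS)}"
    # drop_word: remove a random word (keep at least one)
    words = prompt.split()
    if len(words) > 1:
        words.pop(random.randrange(len(words)))
    return " ".join(words)

def fitness(response: str) -> int:
    """Proxy fitness: fewer refusal markers in the output scores higher."""
    lowered = response.lower()
    return -sum(marker in lowered for marker in REFUSAL_MARKERS)

def fuzz(seed: str, query_llm, iterations: int = 50):
    """Mutate the best-scoring variant for N iterations, hill-climbing style."""
    best, best_score = seed, fitness(query_llm(seed))
    for _ in range(iterations):
        candidate = mutate(best)
        score = fitness(query_llm(candidate))
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

`query_llm` would wrap whatever API the target model or content filter exposes; a real GA would keep a population and crossover rather than this single-lineage hill climb.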
Summary: Added a new “Prompt Fuzzing (Genetic-Algorithm Jailbreak Generation)” section to capture the GA-style prompt fuzzing workflow, mutation operators, and testing notes, and added the Unit42 blog to References.
Files updated: src/AI/AI-Prompts.md

Tests: Not run.
If you want, I can also add a short cross-link from src/AI/README.md to highlight the new prompt-fuzzing subsection.

This PR was automatically created by the HackTricks Feed Bot. Please review the changes carefully before merging.