Skip to content

Open, Closed and Broken Prompt Fuzzing Finds LLMs Still Frag...#2017

Open
carlospolop wants to merge 1 commit intomasterfrom
update_Open__Closed_and_Broken__Prompt_Fuzzing_Finds_LLMs_20260317_130856
Open

Open, Closed and Broken Prompt Fuzzing Finds LLMs Still Frag...#2017
carlospolop wants to merge 1 commit intomasterfrom
update_Open__Closed_and_Broken__Prompt_Fuzzing_Finds_LLMs_20260317_130856

Conversation

@carlospolop
Copy link
Collaborator

🤖 Automated Content Update

This PR was automatically generated by the HackTricks News Bot based on a technical blog post.

📝 Source Information

  • Blog URL: https://unit42.paloaltonetworks.com/genai-llm-prompt-fuzzing/
  • Blog Title: Open, Closed and Broken: Prompt Fuzzing Finds LLMs Still Fragile Across Open and Closed Models
  • Suggested Section: 🤖 AI → AI Security (or a new subpage under AI Security such as “Prompt fuzzing / Genetic-algorithm jailbreak generation” alongside prompt-injection/jailbreak techniques)

🎯 Content Summary

Title / context

Unit 42 (Palo Alto Networks) presents a genetic-algorithm-inspired prompt fuzzing method to automatically generate meaning-preserving variants of disallowed requests (prompt jailbreaking) and use them to measure guardrail fragility across both closed-source and open-weight LLMs plus a separate open-source content-filter model. The central security point is scalability: even low single-digit bypass rates become operationa...

🔧 Technical Details

Genetic-algorithm-inspired prompt fuzzing to jailbreak LLMs: Start from a disallowed seed prompt (e.g., “how to build a <weapon>”). Extract three lists: (1) a keyword noun (e.g., bomb), (2) relative words capturing the action/intent as verbs or action phrases (e.g., build, list the ingredients of, components of), and (3) generic filler phrases (e.g., Has anyone, Is it, Do you think) intended to disrupt interpretation while preserving meaning. Iteratively generate candidates for N iterations by repeatedly applying one mutation operator per step: prepend/append filler phrase, add a trailing newline, repeat the keyword at the end, append a relative-word action phrase, or remove a random word. Submit variants to the target LLM or a content-filter model; compute “fitness” from the output (example proxy: fewer refusal/negative-tone markers). Automating this l...

🤖 Agent Actions

Summary: Added a new “Prompt Fuzzing (Genetic-Algorithm Jailbreak Generation)” section to capture the GA-style prompt fuzzing workflow, mutation operators, and testing notes, and added the Unit42 blog to References.

Files updated:

  • src/AI/AI-Prompts.md

Tests: Not run.

If you want, I can also add a short cross-link from src/AI/README.md to highlight the new prompt-fuzzing subsection.


This PR was automatically created by the HackTricks Feed Bot. Please review the changes carefully before merging.

@carlospolop
Copy link
Collaborator Author

🔗 Additional Context

Original Blog Post: https://unit42.paloaltonetworks.com/genai-llm-prompt-fuzzing/

Content Categories: Based on the analysis, this content was categorized under "🤖 AI → AI Security (or a new subpage under AI Security such as “Prompt fuzzing / Genetic-algorithm jailbreak generation” alongside prompt-injection/jailbreak techniques)".

Repository Maintenance:

  • MD Files Formatting: 954 files processed

Review Notes:

  • This content was automatically processed and may require human review for accuracy
  • Check that the placement within the repository structure is appropriate
  • Verify that all technical details are correct and up-to-date
  • All .md files have been checked for proper formatting (headers, includes, etc.)

Bot Version: HackTricks News Bot v1.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant