From 9ba3c1a1f34515dfc9997c31cde7f10d4f7abd7d Mon Sep 17 00:00:00 2001
From: Raja Sekhar Rao Dheekonda
Date: Mon, 11 May 2026 22:01:14 -0700
Subject: [PATCH] fix: update GOAT attack description from 'Conversational
 jailbreaks' to 'Graph of Attacks with Pruning'

Updates ai-red-teaming-agent attack types table to use correct GOAT
terminology. Aligns with academic paper reference and resolves incorrect
description in terminal output.

GOAT refers to 'Graph of Attacks with Pruning' algorithm, not generic
conversational jailbreaks.
---
 capabilities/ai-red-teaming/agents/ai-red-teaming-agent.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/capabilities/ai-red-teaming/agents/ai-red-teaming-agent.md b/capabilities/ai-red-teaming/agents/ai-red-teaming-agent.md
index 76cd4f2..8919fd8 100644
--- a/capabilities/ai-red-teaming/agents/ai-red-teaming-agent.md
+++ b/capabilities/ai-red-teaming/agents/ai-red-teaming-agent.md
@@ -192,7 +192,7 @@ When you call `generate_attack`, it:
 | `tap` | General jailbreak testing (tree-search) | ~200-500 |
 | `pair` | Query-efficient parallel testing | ~100-300 |
 | `crescendo` | Multi-turn conversation weaknesses | ~200-500 |
-| `goat` | Conversational jailbreaks | ~200-500 |
+| `goat` | Graph of Attacks with Pruning | ~200-500 |
 | `prompt` | Simple single-prompt baseline | ~10-50 |
 | `rainbow` | Broad risk coverage (MAP-Elites) | ~500-2000 |
 | `gptfuzzer` | Template-based fuzzing | ~200-500 |