fix: update GOAT attack description from 'Conversational jailbreaks' to 'Graph of Attacks with Pruning' #8

Merged: rdheekonda merged 1 commit into main from fix/goat-terminology-agent-description on May 12, 2026

@rdheekonda
Contributor

Summary

Updates the GOAT attack description in the AI red teaming agent documentation to use the correct terminology.

Changes

  • ai-red-teaming-agent.md: Update attack types table description

Details

  • GOAT refers to the "Graph of Attacks with Pruning" algorithm, not generic "Conversational jailbreaks"
  • Aligns with the academic paper reference and the implementation in the main repository
  • Provides a more accurate description of the attack methodology

Context

This is part of a coordinated terminology fix across repositories:

  • Main repo PR: dreadnode/dreadnode-tiger#1480
  • Resolves incorrect terminal output showing "GOAT (Generative Offensive Agent Tester)"

References

  • Academic paper: "Graph of Attacks v2: Enhanced Adversarial Graph Reasoning for LLMs" arXiv:2504.19019
  • Implementation: dreadnode-tiger/packages/sdk/dreadnode/airt/goat_v2.py

Testing

  • No functional changes, documentation only
  • Terminology now consistent with implementation

…to 'Graph of Attacks with Pruning'

Updates ai-red-teaming-agent attack types table to use correct
GOAT terminology. Aligns with academic paper reference and
resolves incorrect description in terminal output.

GOAT refers to 'Graph of Attacks with Pruning' algorithm, not
generic conversational jailbreaks.
rdheekonda merged commit 8551ae3 into main on May 12, 2026
5 checks passed