Skip to content

Latest commit

 

History

History
56 lines (54 loc) · 6.47 KB

File metadata and controls

56 lines (54 loc) · 6.47 KB

Tools

See below for a list of tools that can be used for AI red teaming, organized by the phases defined in the GenAI Red Teaming Manual.

AI Red Team Phase Relevant Tools
Phase 1: Planning & Scoping Python Risk Identification Tool for generative AI (PyRIT)
Generative AI Red-teaming and Assessment Kit (Garak)
Promptfoo: LLM evals & red teaming
MITRE ATLAS: Threat Mapping Matrix
BlackIce
EleutherAI
CleverHans
Adversarial Robustness Toolbox (ART)
Giskard
CyberSecEval
Promptmap
Fuzzyai
Fickling
Rigging
judges
Phase 2: Reconnaissance & Fingerprinting Burp Suite / ZAP (note that AI systems often make use of things like server-side events: if you use a web proxy, make sure it has robust support for this technology or you will likely run into trouble)
whois
cURL
GitHub
WireShark
Model Repositories/Registries
Google Dorks/Web Searches
GitLeaks / TruffleHog / GitGuardian (Secret & Repo Scanning)
Model Image runtime scanning
LLMmap
ModelScan: Scans model artifacts (Pickle, H5, SavedModel, PyTorch) for unsafe code / deserialization payloads
Phase 3: Surface Mapping & Vulnerability Traffic capture: Chrome DevTools (HAR), mitmproxy.
API specs and testing: Swagger, Postman.
Diagramming and Threat Modeling: Mermaid, Draw.io, OWASP Threat Dragon, ThreatCanvas by SecureFlag, ThreatFinderAI.
RAG and data inspection: Unstructured (Docling, Tika), vector DB console (Weaviate, Pinecone, Qdrant).
ASCII Smuggler
mcp-scan: Scans MCP servers for tool poisoning, tool-description prompt injection, and rug-pull tool redefinition
Auth and secrets: GitLeaks, Vault, KMS CLI.
Observability: OpenTelemetry, Grafana Loki, ELK Stack, Morpheus, Datadog.
LLM recon/evals (optional): PyRIT, Promptfoo, Garak, Deepteam, Spikee
Phase 4: Exploitation CleverHans: Helps generating adversarial examples and attacks on AI models
Agent0: Containerized Agentic System based on Kali Linux. See the examples in GenAI-Red-Team-Lab/exploitation/agent0
AgentDojo: Framework for prompt-injection attacks (and defenses) against tool-using LLM agents
llm-attacks (GCG): Generates universal, transferable adversarial suffixes against LLMs via Greedy Coordinate Gradient
Phase 5: Persistence & Escalation LLMGuard, Giskard, or Garak for continuous guardrail validation
LangSmith / LlamaIndex tracing for memory and state monitoring
Pinecone Console, Weaviate Management UI, Chroma Vector DB Analyzer
MLFlow and model registry logs for identifying drift or unauthorized retraining events
API Gateway and IAM audit tools (AWS CloudTrail, Azure Monitor, GCP Cloud Audit Logs)
LangFuse, AIExploit, PromptFoo
FAISS, Annoy, HNSWlib
Privacy inference frameworks (Privacy Meter, Copycat CNN, CleverHans) for long-term model inversion or data extraction detection
Phase 6: Post-Exploitation & Impact Tools may include those that emulate post‑exploitation behavior, or help observe and measure impact. Specific tools can depend on the organization’s preference or already existing security infrastructure. For example: Data discovery and data loss prevention (DLP) tools, Telemetry and SIEM logging
Phase 7: Evaluation & Reporting Not Applicable
Phase 8: Post-Engagement & Remediation Not Applicable