GenAI Red Team Lab UI Plan
This document outlines the design, architecture, and implementation plan for a unified Red Team Lab UI. This UI serves as a central orchestrator connecting the security exploitation frameworks directly to the isolated target sandboxes, providing an interface suitable for cybersecurity specialists who are accustomed to vendor-style platforms.
This plan draws direct inspiration from the Agent Zero (agent0) Web UI architecture, focusing on a lightweight, build-free local orchestration panel.
🛠️ Technology Stack (Inspired by Agent Zero UI)
To keep the application simple, responsive, and easy to run locally without a Node.js toolchain or build pipeline, we use:
Backend: FastAPI (Python 3.12). FastAPI runs on asyncio, so the backend can drive asynchronous subprocesses (asyncio.create_subprocess_exec), while FastAPI itself provides built-in WebSocket support and can serve SSE through streaming responses. This makes it well suited to streaming scanner logs and managing container states.
Frontend: Alpine.js + Vanilla HTML/CSS (styled with Tailwind CSS). This micro-framework approach injects reactivity directly into standard HTML files, with no Vite or React build step.
Communication: WebSockets or Server-Sent Events (SSE) for streaming real-time shell logs, step completions, and container health states.
Asset Delivery (Offline Vendor-Packaged): All frontend assets (Alpine.js, Tailwind CSS stylesheets, and xterm.js libraries) are saved locally under a /static/vendor/ folder in the repository and served directly by FastAPI.
Zero-Install Profile: No npm install, node_modules folders, or bundlers are required.
100% Offline Capability: The UI itself runs in isolated VMs, corporate networks, or air-gapped environments without contacting external servers; only campaigns that call external LLM APIs (such as Agent0) need internet access.
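For the SSE option, the wire format is plain text: each event is one or more `data:` lines followed by a blank line. The helper below is a minimal sketch of that framing for log lines; the function name and the `log` event name are illustrative assumptions. In FastAPI, strings like these could be yielded from an async generator passed to `StreamingResponse` with the `text/event-stream` media type.

```python
from __future__ import annotations


def format_sse(data: str, event: str | None = None, event_id: str | None = None) -> str:
    """Frame one Server-Sent Event.

    Each line of `data` becomes its own "data:" field, because a raw line
    break inside a field would end it early; a blank line ends the event.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    for chunk in data.splitlines() or [""]:  # an empty payload still needs one data line
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    # A two-line scanner message framed as a single "log" event.
    print(repr(format_sse("probe started\nprobe finished", event="log")))
```

The browser's `EventSource` rejoins multiple `data:` lines with newlines, so multi-line log output arrives intact.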
🏛️ System Architecture
The UI acts as an orchestrator sitting between the user and the repository's modular directories. By standardizing execution patterns, the backend can treat sandboxes and exploits as plug-and-play components.
```mermaid
graph TD
    subgraph "Lab UI (Alpine.js / Tailwind)"
        Dashboard[Overview Dashboard]
        CampaignTabs[Campaign-specific Tabs]
        LogStreamer[Real-Time Terminal Console]
        ReportViewer[Interactive Report Hub]
        CredDrawer[Credentials & Settings Sidebar]
    end

    subgraph "Lab UI Backend (FastAPI / Python)"
        API[API Gateway]
        SandboxMgr[Sandbox Manager]
        ExploitEng[Exploitation Engine]
        ConfigMgr[Config & Credentials Manager]
        Parser[Report Parser]
    end

    subgraph "Local OS / Containers"
        Podman[Podman / Docker Daemon]
        Containers[(Sandboxes & Attack Containers)]
        HostFS[(Workspace File System)]
    end

    %% UI to Backend
    Dashboard --> API
    CampaignTabs --> API
    CredDrawer --> API

    %% Backend to Host / Containers
    API --> SandboxMgr
    API --> ExploitEng
    API --> ConfigMgr
    API --> Parser
    SandboxMgr -->|Manage Containers| Podman
    ExploitEng -->|Trigger Attacks| HostFS
    ConfigMgr -->|Write Ephemeral Settings| HostFS
    Parser -->|Parse Reports/Logs| HostFS
    Podman --> Containers
```
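One way to realize the "plug-and-play" idea is to describe every sandbox and exploit with the same small record, so the API layer dispatches on data rather than on per-tool code. The `Component` and `Registry` classes below are a hypothetical sketch; their field names are illustrative and not part of the repository.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    """Uniform description of a sandbox or an exploit (hypothetical schema)."""
    kind: str                     # "sandbox" or "exploit"
    name: str                     # directory name, e.g. "llm_local" or "garak"
    start: tuple[str, ...]        # argv used to launch it
    stop: tuple[str, ...] = ()    # argv used to stop it; empty for one-shot exploits


@dataclass
class Registry:
    """Lookup table keyed by (kind, name), so callers never special-case a tool."""
    components: dict[tuple[str, str], Component] = field(default_factory=dict)

    def register(self, component: Component) -> None:
        self.components[(component.kind, component.name)] = component

    def get(self, kind: str, name: str) -> Component:
        try:
            return self.components[(kind, name)]
        except KeyError:
            raise LookupError(f"unknown {kind}: {name}") from None

    def names(self, kind: str) -> list[str]:
        return sorted(n for k, n in self.components if k == kind)
```

With this shape, adding a new sandbox means registering one more `Component` (for example with `start=("make", "-C", "sandboxes/llm_local", "run-gradio-headless")`) rather than writing new endpoint code.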
🔌 Core Modules
1. Sandbox Manager (Backend)
Discovery: Automatically scans the sandboxes/ directory to discover available test environments (e.g., llm_local, RAG_local, mcp_local, and vulnerable versions of Langflow, InvokeAI, LocalAI, and n8n).
Execution: Starts and stops sandbox containers dynamically by invoking each sandbox's Makefile targets through Python's subprocess module (the Podman/Docker API is an alternative for direct container control):
Start: make -C sandboxes/<name> run-gradio-headless
Stop: make -C sandboxes/<name> down or make -C sandboxes/<name> stop-gradio
State Monitoring: Periodically tests container ports or probes target health URLs (e.g., http://localhost:8000/health or http://localhost:7860/) to check status (🔴 Stopped, 🟢 Active, ⏳ Starting).
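The discovery, Makefile invocation, and status logic above can be sketched with the standard library alone. This assumes each sandbox is a subdirectory of sandboxes/ containing a Makefile, and that "⏳ Starting" means a start was requested but the port is not yet accepting connections; all helper names are hypothetical. A TCP probe only shows that something is listening, so an HTTP request to the health URL is the stricter check.

```python
import socket
import subprocess
from pathlib import Path


def discover_sandboxes(root: Path = Path("sandboxes")) -> list[str]:
    """Return names of subdirectories of `root` that contain a Makefile."""
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if (p / "Makefile").is_file())


def make_argv(name: str, target: str, root: Path = Path("sandboxes")) -> list[str]:
    """Build `make -C sandboxes/<name> <target>` as an argv list (no shell involved)."""
    return ["make", "-C", str(root / name), target]


def run_target(name: str, target: str, root: Path = Path("sandboxes")) -> int:
    """Run a target such as "run-gradio-headless", "down", or "stop-gradio".

    subprocess.run blocks until make exits; a target that stays in the
    foreground would need subprocess.Popen instead.
    """
    return subprocess.run(make_argv(name, target, root), check=False).returncode


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def sandbox_status(host: str, port: int, start_requested: bool) -> str:
    """Map a port probe onto the dashboard's three states."""
    if port_open(host, port):
        return "🟢 Active"
    return "⏳ Starting" if start_requested else "🔴 Stopped"
```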
2. Exploitation Engine (Backend)
Discovery: Discovers available tools in exploitation/ (e.g. garak, promptfoo, agent0, AdversarialGenerator).
Execution: Spawns asynchronous processes to execute the desired scan or attack framework:
Promptfoo: Executes npx promptfoo redteam run
Garak: Runs python attack.py
Agent0: Runs python run_agent.py
Real-Time Logs: Establishes a WebSocket or Server-Sent Events (SSE) stream to pipe stdout and stderr directly from the subprocess to the user interface.
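A minimal sketch of the streaming side, assuming asyncio in the FastAPI process: the tool runs as an asynchronous subprocess with stderr merged into stdout, and each decoded line is handed to a callback. `TOOL_COMMANDS` and `stream_process` are hypothetical names; the mapping simply restates the commands listed above, substituting the current interpreter for `python`.

```python
from __future__ import annotations

import asyncio
import os
import sys
from collections.abc import Awaitable, Callable
from pathlib import Path

# Hypothetical tool-name -> argv mapping built from the commands listed above;
# each would run from its own directory under exploitation/.
TOOL_COMMANDS: dict[str, list[str]] = {
    "promptfoo": ["npx", "promptfoo", "redteam", "run"],
    "garak": [sys.executable, "attack.py"],
    "agent0": [sys.executable, "run_agent.py"],
}


async def stream_process(
    argv: list[str],
    on_line: Callable[[str], Awaitable[None]],
    cwd: Path | None = None,
) -> int:
    """Run argv, await on_line() for each output line, and return the exit code."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        cwd=cwd,
        # Python children block-buffer stdout when it is a pipe; disable that
        # so log lines reach the UI as they are printed.
        env={**os.environ, "PYTHONUNBUFFERED": "1"},
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # one merged stream for the terminal view
    )
    assert proc.stdout is not None
    async for raw in proc.stdout:  # StreamReader yields one line (bytes) at a time
        await on_line(raw.decode(errors="replace").rstrip("\r\n"))
    return await proc.wait()


async def _demo() -> None:
    async def printer(line: str) -> None:
        print("[log]", line)

    code = await stream_process([sys.executable, "-c", "print('probe 1'); print('probe 2')"], printer)
    print("exit code:", code)


if __name__ == "__main__":
    asyncio.run(_demo())
```

In a WebSocket handler, `on_line` could wrap `websocket.send_text`; for SSE it could push into an `asyncio.Queue` drained by the streaming response. ANSI color codes pass through unchanged for the browser terminal to render. Lines longer than asyncio's default 64 KiB stream limit raise an error, which the `limit` argument of `create_subprocess_exec` can raise if a tool emits very long lines.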
3. Config & Credentials Manager (Backend & UI)
Credential Handling (Agent0-Inspired): Secures and manages credentials required by external LLM orchestrators. For example, Agent0 requires valid external LLM keys (like a Gemini GOOGLE_API_KEY or OPENAI_API_KEY) to run its internal autonomous agent loops.
Ephemeral Configuration Injection: Following standard secret-handling practice, the local config files in the source directories must never be altered. Editing them in place creates race conditions (concurrent runs overwriting each other's settings), risks committing live keys to Git, and can corrupt the configuration. Instead:
Template-Based Copying: Source configs (e.g. settings.json, .env, config.toml) are treated as read-only templates.
Dynamic Temp Directory: Prior to execution, the backend clones these templates to an isolated, temporary session folder (e.g. /tmp/redteam_runs/<session_uuid>/).
In-Memory & Volume Mounting Injection:
For Garak/Promptfoo: The backend updates the cloned copy in the temporary folder and passes its path explicitly via parameters (e.g. npx promptfoo redteam run -c /tmp/redteam_runs/<session_uuid>/promptfooconfig.js).
For Agent0: The backend populates the cloned settings.json and .env in the temp directory, then points Podman to mount those specific files (e.g., -v /tmp/redteam_runs/<session_uuid>/settings.json:/a0/tmp/settings.json) instead of mounting workspace-local paths.
Automated Lifecycle Teardown: When an attack finishes or the backend restarts, the /tmp/redteam_runs/<session_uuid>/ folder is recursively deleted, leaving the host system clean.
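The copy, inject, and teardown lifecycle maps naturally onto a context manager. This is a sketch under assumptions: the helper names, the replace-or-append handling of `.env` keys, and the `0o700` folder mode are illustrative choices; only the /tmp/redteam_runs/<session_uuid>/ layout, the promptfoo `-c` flag, and the /a0/tmp/settings.json mount target come from the plan above.

```python
import shutil
import uuid
from collections.abc import Iterator
from contextlib import contextmanager
from pathlib import Path

RUNS_ROOT = Path("/tmp/redteam_runs")


@contextmanager
def ephemeral_session(templates: list[Path], runs_root: Path = RUNS_ROOT) -> Iterator[Path]:
    """Copy read-only templates into runs_root/<uuid>/, then delete that folder."""
    session_dir = runs_root / str(uuid.uuid4())
    # 0o700 (subject to umask) keeps injected keys readable only by this user.
    session_dir.mkdir(parents=True, mode=0o700)
    try:
        for template in templates:
            shutil.copy2(template, session_dir / template.name)  # source is never written
        yield session_dir
    finally:
        shutil.rmtree(session_dir, ignore_errors=True)  # runs on success or error


def set_env_values(env_file: Path, values: dict[str, str]) -> None:
    """Set KEY=value pairs in a *copied* .env, replacing keys that already exist."""
    lines = env_file.read_text(encoding="utf-8").splitlines() if env_file.exists() else []
    remaining = dict(values)
    for i, line in enumerate(lines):
        key = line.split("=", 1)[0].strip()
        if key in remaining:
            lines[i] = f"{key}={remaining.pop(key)}"
    lines.extend(f"{k}={v}" for k, v in remaining.items())
    env_file.write_text("\n".join(lines) + "\n", encoding="utf-8")


def promptfoo_argv(session_dir: Path) -> list[str]:
    """Point promptfoo at the session copy of its config via -c."""
    return ["npx", "promptfoo", "redteam", "run", "-c", str(session_dir / "promptfooconfig.js")]


def agent0_mount_args(session_dir: Path) -> list[str]:
    """Podman volume flag mounting the session copy of settings.json."""
    return ["-v", f"{session_dir / 'settings.json'}:/a0/tmp/settings.json"]


def sweep_stale_sessions(runs_root: Path = RUNS_ROOT) -> None:
    """At backend start-up, remove session folders left behind by a crash."""
    if runs_root.is_dir():
        for child in runs_root.iterdir():
            if child.is_dir():
                shutil.rmtree(child, ignore_errors=True)
```

Calling `sweep_stale_sessions()` once at start-up covers the "backend restart" case, since a killed process never reaches the `finally` block.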
4. Report & Log Parser (Backend)
Parses files in reports/ and logs/ folders upon execution completion.
Unifies tool-specific formats (Garak's .jsonl reports, Promptfoo's .yaml/.json exports, and Agent0's markdown execution logs) into structured JSON responses for the frontend.
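A sketch of the normalization step for JSONL input. The unified `Finding` shape and the field mapping are hypothetical; Garak's actual report fields should be checked against a real report before relying on any key names. Malformed lines are skipped rather than aborting the parse, since a run killed mid-write can leave a truncated final line.

```python
import json
from collections.abc import Iterable, Iterator
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Finding:
    """Unified record sent to the frontend (hypothetical schema)."""
    tool: str
    adversarial_input: str
    response: str
    exploited: bool


def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one dict per valid JSON line; skip blank or truncated lines."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                continue
            if isinstance(obj, dict):
                yield obj


def normalize(records: Iterable[dict], tool: str, fields: dict[str, str]) -> list[dict]:
    """Map tool-specific keys onto Finding using `fields` (input/output/exploited)."""
    findings = []
    for rec in records:
        if fields["input"] not in rec:
            continue  # not a prompt/response record (e.g. run metadata)
        findings.append(asdict(Finding(
            tool=tool,
            adversarial_input=str(rec[fields["input"]]),
            response=str(rec.get(fields["output"], "")),
            exploited=bool(rec.get(fields["exploited"], False)),
        )))
    return findings
```

Each tool would get its own `fields` mapping (and, for Agent0's markdown logs, its own reader in place of `iter_jsonl`), while the frontend only ever sees `Finding` dictionaries.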
🎨 User Interface (Frontend Dashboard)
The frontend uses Alpine.js and Tailwind CSS to render a single-page dashboard organized around Exploitation Campaigns, with one tab per exploit. This mirrors how the exploits are built: each one is either hardcoded to a specific sandbox (CVE exploits) or configured to target one (general tools such as Agent0).
1. Main Navigation & Dashboard Console (Overview)
Top Metrics: Showcases key performance indicators (Total runs, vulnerabilities confirmed, test success rates).
2. Campaign Interface Layout (Inside Each Tab)
Each tab acts as a self-contained "Playbook" representing the execution path for that exploit:
A. Sandbox Target & Status Panel:
CVE Exploits (Implicit): Shows the hardcoded target sandbox required (e.g. Target Sandbox: llm_local_langchain_core_v1.2.4) and its status (🔴 Stopped, 🟢 Active).
General Exploits (Explicit Selection - e.g., Agent0): Since tools like Agent0 are versatile and can run against any LLM-backed sandbox, the UI displays a dropdown to select the target (e.g. llm_local, RAG_local, mcp_local). The status indicator changes to reflect the active health of the selected target.
B. Launch Controller:
A single "Start Attack Sequence" button. When pressed, the backend starts the chosen or implicit sandbox container, waits for it to pass its health check, and then runs the scan.
C. Context-Specific Inputs:
Displays configuration inputs tailored only to that campaign.
For Agent0: API key forms (Gemini/OpenAI) similar to Agent Zero's slide-over panel, plus a prompt-file selector (e.g., OWASP_Top10_LLM_App.md).
For Garak or Promptfoo: Checkboxes to select target OWASP Top 10 categories.
For manual CVE exploits: Target payloads or exfiltration variables.
D. Streaming Log Terminal:
An inline terminal screen showing real-time ANSI-color execution logs for that campaign's run.
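The split between implicit (CVE) and explicit (general) targets, and the start-then-wait-for-health behaviour of the launch button, could be modelled as below. The `Campaign` fields, the polling interval, and the timeout are assumptions; the probe is injected so it can be whichever port or URL check the Sandbox Manager uses.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass, field


@dataclass
class Campaign:
    name: str
    fixed_target: str | None = None                           # CVE exploits: implicit target
    allowed_targets: list[str] = field(default_factory=list)  # general tools like Agent0


def resolve_target(campaign: Campaign, selected: str | None) -> str:
    """Return the sandbox a campaign should run against."""
    if campaign.fixed_target is not None:
        return campaign.fixed_target  # no dropdown is shown, so any selection is ignored
    if selected not in campaign.allowed_targets:
        raise ValueError(f"{campaign.name}: choose one of {campaign.allowed_targets}")
    return selected


async def wait_until_healthy(
    probe: Callable[[], Awaitable[bool]],
    timeout: float = 120.0,
    interval: float = 2.0,
) -> bool:
    """Poll `probe` until it returns True or `timeout` seconds have elapsed."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        if await probe():
            return True
        if loop.time() >= deadline:
            return False
        await asyncio.sleep(interval)
```

A launch handler would call `resolve_target`, start that sandbox, and only start the scan once `wait_until_healthy` returns `True`; a `False` result leaves the tab in a failed state instead of attacking a half-started target.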
3. Interactive Report Hub
A dedicated general tab (or a panel inside each campaign tab) displaying aggregated results:
Payload Log: A filterable database table displaying:
Adversarial Input: The prompt used.
Response Received: The model/sandbox output.
Outcome Status: 🔴 Confirmed Leak/Exploit or 🟢 Safely Blocked.
Payload Traces: Generated stack traces or files.
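Filtering the payload log and computing the top-of-dashboard metrics are small operations over unified records. This sketch assumes each record is a dict with `run_id`, `adversarial_input`, `response`, and a boolean `exploited` field (a hypothetical shape). Because the plan does not define "test success rate", the sketch reports an exploit rate (confirmed exploits over payloads sent) rather than guessing at that definition.

```python
from __future__ import annotations

from collections.abc import Iterable


def filter_payloads(records: Iterable[dict], outcome: str | None = None, text: str = "") -> list[dict]:
    """Filter by outcome ("exploited" or "blocked") and a case-insensitive substring."""
    needle = text.lower()
    result = []
    for rec in records:
        if outcome == "exploited" and not rec["exploited"]:
            continue
        if outcome == "blocked" and rec["exploited"]:
            continue
        haystack = f"{rec['adversarial_input']} {rec['response']}".lower()
        if needle and needle not in haystack:
            continue
        result.append(rec)
    return result


def summarize(records: Iterable[dict]) -> dict[str, int | float]:
    """Headline metrics: distinct runs, payloads, confirmed exploits, exploit rate."""
    recs = list(records)
    confirmed = sum(1 for r in recs if r["exploited"])
    return {
        "total_runs": len({r["run_id"] for r in recs}),
        "payloads": len(recs),
        "confirmed_exploits": confirmed,
        "exploit_rate": confirmed / len(recs) if recs else 0.0,
    }
```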
🚀 Step-by-Step Implementation Strategy
Phase 1: Backend CLI Wrappers & API
Implement the Python wrapper functions to start/stop the containers and execute scans programmatically.
Build API endpoints in FastAPI to read logs/reports.
Configure dynamic ephemeral directory generation under /tmp/redteam_runs/.
Phase 2: WebSocket Streaming & SSE
Wire up WebSocket handlers to stream stdout from the subprocesses so that users see real-time updates.
Phase 3: Frontend Dashboard UI (Alpine.js)
Create a clean web UI using pure HTML, Alpine.js, and Tailwind CSS.
Design the settings drawer/sidebar (inspired by Agent Zero) for model configurations and API keys.
Render status badges and the live console logs.
Phase 4: Unified Parsing & Reporting
Write custom parsers for Garak, Promptfoo, and Agent0 outputs to populate the interactive metrics dashboard.
⚡ Prerequisites & Zero-Installation Startup
Because all frontend assets are vendored under /static/vendor/ and served by FastAPI, starting the dashboard requires no Node.js setup or package installation.
1. Prerequisites (Already present for the lab)
Python 3.12 with uv installed.
Podman or Docker installed and running on the host machine.
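A quick pre-flight check can turn a confusing start-up failure into a clear message. The helpers below are a hypothetical sketch: they prefer podman over docker when both are on PATH (an assumption that follows the order the plan lists them), take the lookup function as a parameter so they can be tested, and only check that the tools are installed, not that the container daemon is running.

```python
from __future__ import annotations

import shutil
import sys
from collections.abc import Callable


def detect_container_engine(which: Callable[[str], str | None] = shutil.which) -> str | None:
    """Return "podman" or "docker" (first found on PATH), else None."""
    for engine in ("podman", "docker"):
        if which(engine):
            return engine
    return None


def check_prerequisites(which: Callable[[str], str | None] = shutil.which) -> list[str]:
    """List human-readable problems; an empty list means the lab can start."""
    problems = []
    if sys.version_info < (3, 12):
        problems.append(f"Python 3.12+ expected, found {sys.version.split()[0]}")
    if which("uv") is None:
        problems.append("uv not found on PATH")
    if detect_container_engine(which) is None:
        problems.append("neither podman nor docker found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```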
2. Startup Command
The backend dependencies (e.g., fastapi, uvicorn, websockets) are configured in a standalone Python virtual environment managed by uv. The user can boot up the entire system with a single command:
```bash
# Run the UI locally on port 8000
uv run uvicorn main:app --port 8000 --reload
```
When opened in the browser at http://localhost:8000, the page renders immediately from the locally served assets in /static/vendor/, with no internet connection required (the Agent0 campaign is the exception, since it calls external LLM APIs). If a sandbox already listens on port 8000 (for example, one exposing http://localhost:8000/health), start the UI on another port, e.g. --port 8080.