User Interface #46

@felipepenha

GenAI Red Team Lab UI Plan

This document outlines the design, architecture, and implementation plan for a unified Red Team Lab UI. This UI serves as a central orchestrator connecting the security exploitation frameworks directly to the isolated target sandboxes, providing an interface suitable for cybersecurity specialists who are accustomed to vendor-style platforms.

This plan draws direct inspiration from the Agent Zero (agent0) Web UI architecture, focusing on a lightweight, dependency-free local orchestration panel.


🛠️ Technology Stack (Inspired by Agent Zero UI)

To keep the application simple, responsive, and easy to run locally without a Node.js build pipeline, we use:

  1. Backend: FastAPI (Python 3.12). FastAPI runs on asyncio, so the backend can use Python's asynchronous subprocess API alongside FastAPI's built-in WebSocket support and streaming responses (usable for SSE). This makes it a good fit for streaming scanner logs and managing container states.
  2. Frontend: Alpine.js + Vanilla HTML/CSS (styled with Tailwind CSS). This micro-framework approach injects reactivity directly into standard HTML files without requiring build steps (like Vite/React).
  3. Communication: WebSockets or Server-Sent Events (SSE) for streaming real-time shell logs, step completions, and container health states.
  4. Asset Delivery (Offline Vendor-Packaged): All frontend assets (Alpine.js, Tailwind CSS stylesheets, and xterm.js libraries) are saved locally under a /static/vendor/ folder in the repository and served directly by FastAPI.
    • Zero-Install Profile: No npm install, node_modules folders, or bundlers are required.
    • 100% Offline Capability: The UI itself runs in isolated VMs, corporate networks, or air-gapped environments with no dependency on external servers or internet access. Campaigns that call hosted LLM APIs (such as Agent0) still need a network connection.
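
For the SSE option in item 3, the wire format is simple enough to produce by hand. The following is a minimal sketch of the message framing only, not the plan's actual streaming code; the helper names `sse_event` and `status_event` and the `status` event name are assumptions made for illustration.

```python
import json
from typing import Optional


def sse_event(data: str, event: Optional[str] = None) -> str:
    """Frame a payload as one Server-Sent Events message.

    Each line of `data` gets its own `data:` field, and a blank line
    terminates the message.
    """
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # An empty payload still needs one (empty) data field so that the
    # browser dispatches the event.
    for line in data.splitlines() or [""]:
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"


def status_event(sandbox: str, state: str) -> str:
    """Hypothetical helper: encode a container state change as JSON."""
    payload = json.dumps({"sandbox": sandbox, "state": state})
    return sse_event(payload, event="status")
```

A streaming endpoint could yield these strings with the `text/event-stream` content type. WebSockets remain the alternative when the browser also needs to send messages back over the same connection.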

🏛️ System Architecture

The UI acts as an orchestrator sitting between the user and the repository's modular directories. By standardizing execution patterns, the backend can treat sandboxes and exploits as plug-and-play components.

graph TD
    subgraph "Lab UI (Alpine.js / Tailwind)"
        Dashboard[Overview Dashboard]
        CampaignTabs[Campaign-specific Tabs]
        LogStreamer[Real-Time Terminal Console]
        ReportViewer[Interactive Report Hub]
        CredDrawer[Credentials & Settings Sidebar]
    end

    subgraph "Lab UI Backend (FastAPI / Python)"
        API[API Gateway]
        SandboxMgr[Sandbox Manager]
        ExploitEng[Exploitation Engine]
        ConfigMgr[Config & Credentials Manager]
        Parser[Report Parser]
    end

    subgraph "Local OS / Containers"
        Podman[Podman / Docker Daemon]
        Containers[(Sandboxes & Attack Containers)]
        HostFS[(Workspace File System)]
    end

    %% UI to Backend
    Dashboard --> API
    CampaignTabs --> API
    CredDrawer --> API
    
    %% Backend to Host / Containers
    API --> SandboxMgr
    API --> ExploitEng
    API --> ConfigMgr

    API --> Parser
    
    SandboxMgr -->|Manage Containers| Podman
    ExploitEng -->|Trigger Attacks| HostFS
    ConfigMgr -->|Write Ephemeral Settings| HostFS
    Parser -->|Parse Reports/Logs| HostFS
    Podman --> Containers
🔌 Core Modules

1. Sandbox Manager (Backend)

  • Discovery: Automatically scans the sandboxes/ directory to discover available test environments (e.g., llm_local, RAG_local, mcp_local, and vulnerable versions of Langflow, InvokeAI, LocalAI, and n8n).
  • Execution: Uses Python's subprocess or the Podman/Docker API to start/stop the sandbox containers dynamically by invoking the respective Makefile commands:
    • Start: make -C sandboxes/<name> run-gradio-headless
    • Stop: make -C sandboxes/<name> down or make -C sandboxes/<name> stop-gradio
  • State Monitoring: Periodically tests container ports or probes target health URLs (e.g., http://localhost:8000/health or http://localhost:7860/) to check status (🔴 Stopped, 🟢 Active, ⏳ Starting).
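
The three responsibilities above could be sketched as follows. This is an illustrative sketch under assumptions: it treats any subdirectory containing a `Makefile` as a sandbox, and the function names (`discover_sandboxes`, `make_command`, `port_is_open`) are hypothetical, not taken from the repository.

```python
import socket
from pathlib import Path


def discover_sandboxes(root: Path) -> list[str]:
    """Return the names of subdirectories of `root` that contain a Makefile."""
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and (entry / "Makefile").is_file()
    )


def make_command(root: Path, name: str, target: str) -> list[str]:
    """Build the argv for `make -C <root>/<name> <target>`.

    The name is checked against the discovered sandboxes so that a value
    coming from the UI cannot point make at an arbitrary directory.
    """
    if name not in discover_sandboxes(root):
        raise ValueError(f"unknown sandbox: {name!r}")
    return ["make", "-C", str(root / name), target]


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Returning an argv list rather than a shell string means the command can be passed to `subprocess` without `shell=True`, which avoids shell interpretation of the sandbox name.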

2. Exploitation Engine (Backend)

  • Discovery: Discovers available tools in exploitation/ (e.g. garak, promptfoo, agent0, AdversarialGenerator).
  • Execution: Spawns asynchronous processes to execute the desired scan or attack framework:
    • Promptfoo: Executes npx promptfoo redteam run
    • Garak: Runs python attack.py
    • Agent0: Runs python run_agent.py
  • Real-Time Logs: Establishes a WebSocket or Server-Sent Events (SSE) stream to pipe stdout and stderr directly from the subprocess to the user interface.
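
One way to pipe a subprocess's output to a stream is an async generator built on `asyncio.create_subprocess_exec`. The sketch below is a rough illustration, not the plan's final design: the names `stream_process` and `collect` and the trailing `[exit N]` marker line are assumptions.

```python
import asyncio
import sys
from typing import AsyncIterator


async def stream_process(*argv: str) -> AsyncIterator[str]:
    """Run a command and yield its output line by line as it is produced."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # merge stderr into the same pipe
    )
    assert proc.stdout is not None
    async for raw in proc.stdout:
        yield raw.decode(errors="replace").rstrip("\r\n")
    code = await proc.wait()
    yield f"[exit {code}]"  # hypothetical marker so the UI knows the run ended


async def collect(*argv: str) -> list[str]:
    """Demo consumer; a WebSocket or SSE handler would forward each line instead."""
    return [line async for line in stream_process(*argv)]


if __name__ == "__main__":
    for line in asyncio.run(collect(sys.executable, "-c", "print('hello')")):
        print(line)
```

Merging stderr into stdout keeps the two streams in a single ordered feed for the terminal view, at the cost of no longer being able to style them differently.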

3. Config & Credentials Manager (Backend & UI)

  • Credential Handling (Agent0-Inspired): Secures and manages credentials required by external LLM orchestrators. For example, Agent0 requires valid external LLM keys (like a Gemini GOOGLE_API_KEY or OPENAI_API_KEY) to run its internal autonomous agent loops.
  • Ephemeral Configuration Injection: In alignment with production security standards, the actual local config files in the source directories must never be altered. Modifying source files directly risks race conditions, accidental commits of active keys to Git, and configuration corruption. Instead:
    • Template-Based Copying: Source configs (e.g. settings.json, .env, config.toml) are treated as read-only templates.
    • Dynamic Temp Directory: Prior to execution, the backend clones these templates to an isolated, temporary session folder (e.g. /tmp/redteam_runs/<session_uuid>/).
    • In-Memory & Volume Mounting Injection:
      • For Garak/Promptfoo: The backend updates the cloned copy in the temporary folder and passes its path explicitly via parameters (e.g. npx promptfoo redteam run -c /tmp/redteam_runs/<session_uuid>/promptfooconfig.js).
      • For Agent0: The backend populates the cloned settings.json and .env in the temp directory, then points Podman to mount those specific files (e.g., -v /tmp/redteam_runs/<session_uuid>/settings.json:/a0/tmp/settings.json) instead of mounting workspace-local paths.
    • Automated Lifecycle Teardown: Upon attack termination or backend reboot, the /tmp/redteam_runs/<session_uuid>/ folder is recursively deleted, leaving the host system completely clean.
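
The copy, inject, mount, and teardown steps above could look roughly like this. It is a sketch under assumptions: the helper names (`ephemeral_session`, `write_env`, `mount_args`) are hypothetical, and a real implementation would also need to clean up folders left behind by a crashed backend.

```python
import shutil
import uuid
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def ephemeral_session(template_dir: Path, base_dir: Path) -> Iterator[Path]:
    """Copy read-only templates into a per-session folder, then delete it."""
    session_dir = base_dir / uuid.uuid4().hex
    shutil.copytree(template_dir, session_dir)
    try:
        yield session_dir
    finally:
        # Teardown runs even if the attack raises an exception.
        shutil.rmtree(session_dir, ignore_errors=True)


def write_env(session_dir: Path, values: dict[str, str]) -> Path:
    """Write KEY=value lines into the session's .env copy, never the source."""
    env_path = session_dir / ".env"
    env_path.write_text("".join(f"{k}={v}\n" for k, v in values.items()))
    env_path.chmod(0o600)  # keep secrets readable by the owner only
    return env_path


def mount_args(session_dir: Path, filename: str, container_path: str) -> list[str]:
    """Build the `-v host:container` argument pair for a single file mount."""
    return ["-v", f"{session_dir / filename}:{container_path}"]
```

For example, `mount_args(session, "settings.json", "/a0/tmp/settings.json")` yields the volume flag described for Agent0, pointing at the session copy rather than a workspace path.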

4. Report & Log Parser (Backend)

  • Parses files in reports/ and logs/ folders upon execution completion.
  • Unifies tool-specific formats (Garak's .jsonl lines, Promptfoo's .yaml/.json exports, and Agent0's markdown execution logs) into structured JSON responses for the frontend.
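
As a rough illustration of the unification step, the sketch below maps JSONL lines onto one shared record type. The input keys `prompt`, `output`, and `passed` are placeholders chosen for this example; they are not the real Garak or Promptfoo report schema, which a real parser would have to follow. The `Finding` type and function names are likewise assumptions.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable


@dataclass
class Finding:
    tool: str
    adversarial_input: str
    response: str
    exploited: bool


def parse_jsonl(lines: Iterable[str], tool: str) -> list[Finding]:
    """Turn JSONL text into Finding records, skipping blank or malformed lines."""
    findings = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a partially written last line
        if not isinstance(row, dict):
            continue
        findings.append(
            Finding(
                tool=tool,
                # Placeholder keys, NOT a real tool schema (see lead-in).
                adversarial_input=str(row.get("prompt", "")),
                response=str(row.get("output", "")),
                exploited=not row.get("passed", True),
            )
        )
    return findings


def to_response(findings: Iterable[Finding]) -> str:
    """Serialize findings as the JSON array the frontend would consume."""
    return json.dumps([asdict(f) for f in findings])
```

Each tool would get its own small adapter that produces `Finding` records, so the frontend only ever deals with one shape.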

🎨 User Interface (Frontend Dashboard)

The frontend uses Alpine.js and Tailwind CSS to render a single-page dashboard structured around Exploitation Campaigns (Tab-per-Exploitation). This matches the reality that each exploitation is hardcoded or configured to target a specific sandbox environment.

1. Main Navigation & Dashboard Console (Overview)

  • Top Metrics: Showcases key performance indicators (Total runs, vulnerabilities confirmed, test success rates).
  • Tab Selection: Switch between campaigns, with one tab per exploitation.

2. Campaign Interface Layout (Inside Each Tab)

Each tab acts as a self-contained "Playbook" representing the execution path for that exploit:

  • A. Sandbox Target & Status Panel:
    • CVE Exploits (Implicit): Shows the hardcoded target sandbox required (e.g. Target Sandbox: llm_local_langchain_core_v1.2.4) and its status (🔴 Stopped, 🟢 Active).
    • General Exploits (Explicit Selection - e.g., Agent0): Since tools like Agent0 are versatile and can run against any LLM-backed sandbox, the UI displays a dropdown to select the target (e.g. llm_local, RAG_local, mcp_local). The status indicator changes to reflect the active health of the selected target.
  • B. Launch Controller:
    • A single "Start Attack Sequence" button. The backend starts the chosen/implicit sandbox container, waits for health validation, and runs the scan.
  • C. Context-Specific Inputs:
    • Displays configuration inputs tailored only to that campaign.
    • For Agent0: API key forms (Gemini/OpenAI) similar to Agent Zero's slide-over panel, plus prompt selection files (e.g. selecting OWASP_Top10_LLM_App.md).
    • For Garak or Promptfoo: Checkboxes to select target OWASP Top 10 categories.
    • For manual CVE exploits: Target payloads or exfiltration variables.
  • D. Streaming Log Terminal:
    • An inline terminal screen showing real-time ANSI-color execution logs for that campaign's run.
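
The "Start Attack Sequence" flow in B (start the sandbox, wait for health validation, run the scan) could be sketched as below. The callables are injected so the sequence is independent of how containers are actually started; the names `wait_until_healthy` and `run_campaign`, the timeout values, and the returned status strings are all assumptions for illustration.

```python
import asyncio
from typing import Awaitable, Callable

AsyncStep = Callable[[], Awaitable[None]]
AsyncProbe = Callable[[], Awaitable[bool]]


async def wait_until_healthy(
    probe: AsyncProbe, timeout: float = 60.0, interval: float = 1.0
) -> bool:
    """Poll `probe` until it returns True or `timeout` seconds elapse."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        if await probe():
            return True
        if loop.time() >= deadline:
            return False
        await asyncio.sleep(interval)


async def run_campaign(
    start_sandbox: AsyncStep,
    probe: AsyncProbe,
    run_scan: AsyncStep,
    timeout: float = 60.0,
    interval: float = 1.0,
) -> str:
    """Start the sandbox, wait for it to become healthy, then run the scan."""
    await start_sandbox()
    if not await wait_until_healthy(probe, timeout, interval):
        return "sandbox_unhealthy"  # the scan is never launched
    await run_scan()
    return "completed"
```

Gating the scan on the health check keeps a slow-starting sandbox from producing a run full of connection errors that would look like blocked attacks.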

3. Interactive Report Hub

  • A dedicated general tab, or a panel inside each campaign tab, displays aggregated results:
    • Summary: Charts indicating vulnerability breakdowns (e.g., radar maps of OWASP coverage).
    • Payload Log: A filterable database table displaying:
      • Adversarial Input: The prompt used.
      • Response Received: The model/sandbox output.
      • Outcome Status: 🔴 Confirmed Leak/Exploit or 🟢 Safely Blocked.
      • Payload Traces: Generated stack traces or files.

🚀 Step-by-Step Implementation Strategy

  1. Phase 1: Backend CLI Wrappers & API

    • Implement the Python wrapper functions to start/stop the containers and execute scans programmatically.
    • Build API endpoints in FastAPI to read logs/reports.
    • Configure dynamic ephemeral directory generation under /tmp/redteam_runs/.
  2. Phase 2: WebSocket Streaming & SSE

    • Wire up WebSocket (or SSE) handlers to stream stdout and stderr from the subprocesses so that users see real-time updates.
  3. Phase 3: Frontend Dashboard UI (Alpine.js)

    • Create a clean web UI using pure HTML, Alpine.js, and Tailwind CSS.
    • Design the settings drawer/sidebar (inspired by Agent Zero) for model configurations and API keys.
    • Render status badges and the live console logs.
  4. Phase 4: Unified Parsing & Reporting

    • Write custom parsers for Garak, Promptfoo, and Agent0 outputs to populate the interactive metrics dashboard.

⚡ Prerequisites & Zero-Installation Startup

Since the frontend loads its scripts from the locally vendored /static/vendor/ folder with no build step, starting the dashboard requires no Node.js setup or package installation.

1. Prerequisites (Already present for the lab)

  • Python 3.12 with uv installed.
  • Podman or Docker installed and running on the host machine.

2. Startup Command

The backend dependencies (e.g., fastapi, uvicorn, websockets) are configured in a standalone Python virtual environment managed by uv. The user can boot up the entire system with a single command:

# Run the UI locally on port 8000
uv run uvicorn main:app --port 8000 --reload

When opened in the browser at http://localhost:8000, the page renders immediately using the locally served frontend assets from the /static/vendor/ folder, requiring no internet connection (although an internet connection is needed for the Agent0 exploitation to work).