User Interface

# GenAI Red Team Lab UI Plan

This document outlines the design, architecture, and implementation plan for a unified **Red Team Lab UI**. This UI serves as a central orchestrator connecting the security exploitation frameworks directly to the isolated target sandboxes, providing an interface suitable for cybersecurity specialists who are accustomed to vendor-style platforms.

This plan draws direct inspiration from the **Agent Zero (agent0) Web UI** architecture, focusing on a lightweight local orchestration panel whose frontend needs no build-time dependencies.

---

## 🛠️ Technology Stack (Inspired by Agent Zero UI)

To keep the application simple, responsive, and easy to run locally without a Node.js build pipeline, we use:

1. **Backend**: **FastAPI** (Python 3.12). FastAPI runs on `asyncio`, so it works directly with Python's asynchronous subprocess API, and it supports WebSockets and streaming responses (which can carry SSE). This suits streaming scanner logs and managing container states.
2. **Frontend**: **Alpine.js** + **Vanilla HTML/CSS** (styled with **Tailwind CSS**). This micro-framework approach adds reactivity directly to standard HTML files without a build step (such as a Vite or React toolchain).
3. **Communication**: **WebSockets** or **Server-Sent Events (SSE)** for streaming real-time shell logs, step completions, and container health states.
4. **Asset Delivery (Offline Vendor-Packaged)**: All frontend assets (Alpine.js, Tailwind CSS stylesheets, and xterm.js libraries) are saved locally under a `/static/vendor/` folder in the repository and served directly by FastAPI.
   - **Zero-Install Profile**: No `npm install`, `node_modules` folders, or bundlers are required.
   - **100% Offline Capability**: Runs seamlessly in isolated VMs, corporate networks, or air-gapped environments with zero dependency on external servers or internet access.
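
As an illustration of the SSE option listed above, the following sketch frames one log line as a Server-Sent Events record. The function name `format_sse` and the event name `log` are hypothetical; only the wire format (optional `event:` field, one `data:` field per payload line, blank line as terminator) comes from the SSE specification.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Frame a message as a single Server-Sent Events record.

    `format_sse` is a hypothetical helper name, not part of FastAPI.
    """
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Every line of the payload needs its own "data:" field;
    # an empty payload still produces one empty data field.
    for part in data.splitlines() or [""]:
        lines.append(f"data: {part}")
    # A blank line terminates the record.
    return "\n".join(lines) + "\n\n"
```

A backend could yield such strings from a streaming response served with the `text/event-stream` media type; a WebSocket handler would instead send the raw line.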

---

## 🏛️ System Architecture

The UI acts as an orchestrator sitting between the user and the repository's modular directories. By standardizing execution patterns, the backend can treat sandboxes and exploits as plug-and-play components.

```mermaid
graph TD
    subgraph "Lab UI (Alpine.js / Tailwind)"
        Dashboard[Overview Dashboard]
        CampaignTabs[Campaign-specific Tabs]
        LogStreamer[Real-Time Terminal Console]
        ReportViewer[Interactive Report Hub]
        CredDrawer[Credentials & Settings Sidebar]
    end

    subgraph "Lab UI Backend (FastAPI / Python)"
        API[API Gateway]
        SandboxMgr[Sandbox Manager]
        ExploitEng[Exploitation Engine]
        ConfigMgr[Config & Credentials Manager]
        Parser[Report Parser]
    end

    subgraph "Local OS / Containers"
        Podman[Podman / Docker Daemon]
        Containers[(Sandboxes & Attack Containers)]
        HostFS[(Workspace File System)]
    end

    %% UI to Backend
    Dashboard --> API
    CampaignTabs --> API
    CredDrawer --> API
    
    %% Backend to Host / Containers
    API --> SandboxMgr
    API --> ExploitEng
    API --> ConfigMgr

    API --> Parser
    
    SandboxMgr -->|Manage Containers| Podman
    ExploitEng -->|Trigger Attacks| HostFS
    ConfigMgr -->|Write Ephemeral Settings| HostFS
    Parser -->|Parse Reports/Logs| HostFS
    Podman --> Containers
```

---

## 🔌 Core Modules

### 1. Sandbox Manager (Backend)
- **Discovery**: Automatically scans the `sandboxes/` directory to discover available test environments (e.g., `llm_local`, `RAG_local`, `mcp_local`, and vulnerable versions of Langflow, InvokeAI, LocalAI, and n8n).
- **Execution**: Uses Python's `subprocess` or the Podman/Docker API to start/stop the sandbox containers dynamically by invoking the respective `Makefile` commands:
  - **Start**: `make -C sandboxes/<name> run-gradio-headless`
  - **Stop**: `make -C sandboxes/<name> down` or `make -C sandboxes/<name> stop-gradio`
- **State Monitoring**: Periodically tests container ports or probes target health URLs (e.g., `http://localhost:8000/health` or `http://localhost:7860/`) to check status (🔴 Stopped, 🟢 Active, ⏳ Starting).
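
The discovery, execution, and monitoring steps above could be sketched roughly as follows. This is a minimal sketch that assumes a sandbox is any subdirectory of `sandboxes/` containing a `Makefile`; the function names are hypothetical, and the probe only checks that a TCP port accepts connections rather than calling a health URL.

```python
import socket
from pathlib import Path


def discover_sandboxes(root: Path) -> list[str]:
    """Return names of subdirectories of `root` that contain a Makefile.

    Assumption: the presence of a Makefile marks a runnable sandbox.
    """
    if not root.is_dir():
        return []
    return sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and (p / "Makefile").is_file()
    )


def make_command(root: Path, name: str, target: str) -> list[str]:
    """Build the argv for `make -C <root>/<name> <target>`."""
    return ["make", "-C", str(root / name), target]


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `make_command(Path("sandboxes"), "llm_local", "run-gradio-headless")` yields an argument list that can be passed to `subprocess.run` without going through a shell.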

### 2. Exploitation Engine (Backend)
- **Discovery**: Discovers available tools in `exploitation/` (e.g. `garak`, `promptfoo`, `agent0`, `AdversarialGenerator`).
- **Execution**: Spawns asynchronous processes to execute the desired scan or attack framework:
  - **Promptfoo**: Executes `npx promptfoo redteam run`
  - **Garak**: Runs `python attack.py`
  - **Agent0**: Runs `python run_agent.py`
- **Real-Time Logs**: Establishes a WebSocket or Server-Sent Events (SSE) stream to pipe stdout and stderr directly from the subprocess to the user interface.
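
One way to pipe subprocess output to a stream is an async generator over `asyncio.create_subprocess_exec`, sketched below. The names `stream_process` and `collect` are hypothetical, stderr is merged into stdout for simplicity, and the final `[exit N]` marker line is an illustrative convention rather than anything the tools emit.

```python
import asyncio
import sys
from typing import AsyncIterator


async def stream_process(cmd: list[str]) -> AsyncIterator[str]:
    """Yield output lines of `cmd` as they arrive, then an exit marker."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # merge stderr into stdout
    )
    assert proc.stdout is not None
    # StreamReader supports async iteration, one line at a time.
    async for raw in proc.stdout:
        yield raw.decode(errors="replace").rstrip("\r\n")
    code = await proc.wait()
    yield f"[exit {code}]"  # hypothetical end-of-run marker


async def collect(cmd: list[str]) -> list[str]:
    """Gather all streamed lines into a list (useful for tests)."""
    return [line async for line in stream_process(cmd)]


if __name__ == "__main__":
    demo = [sys.executable, "-c", "print('scan started'); print('scan done')"]
    for line in asyncio.run(collect(demo)):
        print(line)
```

In a WebSocket or SSE handler, the `async for` loop would forward each line to the client instead of collecting it.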

### 3. Config & Credentials Manager (Backend & UI)
- **Credential Handling (Agent0-Inspired)**: Secures and manages credentials required by external LLM orchestrators. For example, **Agent0** requires valid external LLM keys (like a Gemini `GOOGLE_API_KEY` or `OPENAI_API_KEY`) to run its internal autonomous agent loops.
- **Ephemeral Configuration Injection**: In alignment with production security standards, **the actual local config files in the source directories must never be altered**. Modifying source files directly introduces race conditions, risks committing active keys to Git, and can corrupt the configuration. Instead:
  - **Template-Based Copying**: Source configs (e.g. `settings.json`, `.env`, [config.toml](exploitation/garak/config/config.toml)) are treated as read-only templates.
  - **Dynamic Temp Directory**: Prior to execution, the backend clones these templates to an isolated, temporary session folder (e.g. `/tmp/redteam_runs/<session_uuid>/`).
  - **In-Memory & Volume Mounting Injection**:
    - For Garak/Promptfoo: The backend updates the cloned copy in the temporary folder and passes its path explicitly via parameters (e.g. `npx promptfoo redteam run -c /tmp/redteam_runs/<session_uuid>/promptfooconfig.js`).
    - For Agent0: The backend populates the cloned `settings.json` and `.env` in the temp directory, then points Podman to mount *those specific files* (e.g., `-v /tmp/redteam_runs/<session_uuid>/settings.json:/a0/tmp/settings.json`) instead of mounting workspace-local paths.
  - **Automated Lifecycle Teardown**: Upon attack termination or backend reboot, the `/tmp/redteam_runs/<session_uuid>/` folder is recursively deleted, leaving the host system completely clean.
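
The template-copy, injection, and teardown lifecycle described above might look like the following sketch. The helper names `ephemeral_session` and `inject_env` are hypothetical, the `.env` handling assumes simple `KEY=value` lines, and the base directory is passed in rather than hardcoded to `/tmp/redteam_runs/`.

```python
import shutil
import uuid
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def ephemeral_session(templates: list[Path], base: Path) -> Iterator[Path]:
    """Copy read-only templates into a fresh session folder, then delete it."""
    session_dir = base / uuid.uuid4().hex
    session_dir.mkdir(parents=True, mode=0o700)
    try:
        for template in templates:
            shutil.copy2(template, session_dir / template.name)
        yield session_dir
    finally:
        # Teardown: remove the session folder and everything in it.
        shutil.rmtree(session_dir, ignore_errors=True)


def inject_env(env_file: Path, values: dict[str, str]) -> None:
    """Set KEY=value pairs in the cloned .env file, replacing existing keys.

    Assumption: the file uses plain KEY=value lines without quoting rules.
    """
    lines = env_file.read_text().splitlines() if env_file.exists() else []
    kept = [ln for ln in lines if ln.split("=", 1)[0].strip() not in values]
    kept += [f"{key}={value}" for key, value in values.items()]
    env_file.write_text("\n".join(kept) + "\n")
    env_file.chmod(0o600)  # keep injected secrets private to the owner
```

Because the copy lives only inside the `with` block, the source template is never written to, and the session folder disappears even if the attack raises an exception.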

### 4. Report & Log Parser (Backend)
- Parses files in `reports/` and `logs/` folders upon execution completion.
- Unifies tool-specific formats (Garak's `.jsonl` lines, Promptfoo's `.yaml`/`.json` exports, and Agent0's markdown execution logs) into structured JSON responses for the frontend.
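
As a rough sketch of the unification step, the code below reads a `.jsonl` report line by line and wraps the records in a common envelope. The envelope keys (`tool`, `source`, `count`, `records`) are hypothetical, and no assumption is made about the fields inside each tool's records; malformed lines are skipped rather than treated as fatal.

```python
import json
from pathlib import Path
from typing import Any


def parse_jsonl(path: Path) -> list[dict[str, Any]]:
    """Parse a JSON Lines file, skipping blank and malformed lines."""
    records: list[dict[str, Any]] = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate truncated or partial lines
            if isinstance(obj, dict):
                records.append(obj)
    return records


def to_envelope(tool: str, source: Path, records: list[dict[str, Any]]) -> dict[str, Any]:
    """Wrap tool-specific records in a hypothetical common structure."""
    return {
        "tool": tool,
        "source": source.name,
        "count": len(records),
        "records": records,
    }
```

Parsers for the other formats could return the same envelope so that the frontend handles a single response shape.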

---

## 🎨 User Interface (Frontend Dashboard)

The frontend uses **Alpine.js** and **Tailwind CSS** to render a single-page dashboard structured around **Exploitation Campaigns (Tab-per-Exploitation)**. This reflects the fact that each exploit is either hardcoded or configured to target a specific sandbox environment.

### 1. Main Navigation & Dashboard Console (Overview)
- **Top Metrics**: Showcases key performance indicators (Total runs, vulnerabilities confirmed, test success rates).
- **Tab Selection**: Switch between campaigns:
  - 🛠️ **Agent0 Agentic Attack**
  - 🔍 **Garak Scan**
  - 🧪 **Promptfoo Evaluation**
  - 👹 **LangGrinch (CVE-2025-68664)**
  - 💀 **Ni8mare (CVE-2026-21858)**
  - 📦 **LocalAI Tarslip (CVE-2024-6868)**
  - etc.

### 2. Campaign Interface Layout (Inside Each Tab)
Each tab acts as a self-contained "Playbook" representing the execution path for that exploit:
- **A. Sandbox Target & Status Panel**: 
  - **CVE Exploits (Implicit)**: Shows the hardcoded target sandbox required (e.g. `Target Sandbox: llm_local_langchain_core_v1.2.4`) and its status (🔴 Stopped, 🟢 Active).
  - **General Exploits (Explicit Selection - e.g., Agent0)**: Since tools like **Agent0** are versatile and can run against any LLM-backed sandbox, the UI displays a dropdown to select the target (e.g. `llm_local`, `RAG_local`, `mcp_local`). The status indicator changes to reflect the active health of the selected target.
- **B. Launch Controller**:
  - A single **"Start Attack Sequence"** button. The backend starts the chosen/implicit sandbox container, waits for health validation, and runs the scan.
- **C. Context-Specific Inputs**:
  - Displays configuration inputs tailored *only* to that campaign.
  - For `Agent0`: API key forms (Gemini/OpenAI) similar to Agent Zero's slide-over panel, plus prompt selection files (e.g. selecting `OWASP_Top10_LLM_App.md`).
  - For `Garak` or `Promptfoo`: Checkboxes to select target OWASP Top 10 categories.
  - For manual CVE exploits: Target payloads or exfiltration variables.
- **D. Streaming Log Terminal**:
  - An inline terminal screen showing real-time ANSI-color execution logs for that campaign's run.
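
The launch sequence behind the "Start Attack Sequence" button (start the sandbox, wait for health validation, run the scan) can be sketched as below. All names and the returned status strings are hypothetical; the start, probe, and scan steps are passed in as callables so the sequence stays independent of any particular tool.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


def run_attack_sequence(
    start_sandbox: Callable[[], None],
    probe: Callable[[], bool],
    run_scan: Callable[[], None],
    **wait_options: float,
) -> str:
    """Start the sandbox, wait for it to become healthy, then run the scan."""
    start_sandbox()
    if not wait_until_healthy(probe, **wait_options):
        return "sandbox_unhealthy"  # hypothetical status label
    run_scan()
    return "completed"  # hypothetical status label
```

Returning a status instead of raising lets the UI map the outcome directly to the panel's status indicator.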


### 3. Interactive Report Hub
- A dedicated general tab, or a panel within each campaign tab, displays aggregated results:
  - **Summary**: Charts indicating vulnerability breakdowns (e.g., radar maps of OWASP coverage).
  - **Payload Log**: A filterable database table displaying:
    - *Adversarial Input*: The prompt used.
    - *Response Received*: The model/sandbox output.
    - *Outcome Status*: 🔴 Confirmed Leak/Exploit or 🟢 Safely Blocked.
    - *Payload Traces*: Generated stack traces or files.
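
The payload log columns above map naturally onto a small record type. The sketch below is one possible shape: the field names follow the column labels, while the outcome values `"confirmed"` and `"blocked"` and the `filter_records` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PayloadRecord:
    adversarial_input: str          # the prompt used
    response: str                   # the model/sandbox output
    outcome: str                    # hypothetical values: "confirmed" or "blocked"
    traces: list[str] = field(default_factory=list)  # stack traces or file paths


def filter_records(
    records: list[PayloadRecord],
    outcome: Optional[str] = None,
    text: Optional[str] = None,
) -> list[PayloadRecord]:
    """Filter by outcome and/or a case-insensitive substring of input or response."""
    needle = text.lower() if text else None
    result = []
    for rec in records:
        if outcome is not None and rec.outcome != outcome:
            continue
        if needle and needle not in rec.adversarial_input.lower() \
                and needle not in rec.response.lower():
            continue
        result.append(rec)
    return result
```

The backend could serialize the filtered list with `dataclasses.asdict` before returning it to the table view.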

---

## 🚀 Step-by-Step Implementation Strategy
 
1. **Phase 1: Backend CLI Wrappers & API**
   - Implement the Python wrapper functions to start/stop the containers and execute scans programmatically.
   - Build API endpoints in FastAPI to read logs/reports.
   - Configure dynamic ephemeral directory generation under `/tmp/redteam_runs/`.

2. **Phase 2: WebSocket Streaming & SSE**
   - Wire up WebSocket (or SSE) handlers to stream stdout and stderr from the subprocesses so that users see real-time updates.

3. **Phase 3: Frontend Dashboard UI (Alpine.js)**
   - Create a clean web UI using pure HTML, Alpine.js, and Tailwind CSS.
   - Design the settings drawer/sidebar (inspired by Agent Zero) for model configurations and API keys.
   - Render status badges and the live console logs.

4. **Phase 4: Unified Parsing & Reporting**
   - Write custom parsers for `Garak`, `Promptfoo`, and `Agent0` outputs to populate the interactive metrics dashboard.

---

## ⚡ Prerequisites & Zero-Installation Startup

Because the frontend loads its scripts from the local `/static/vendor/` folder, starting the dashboard requires no Node.js setup or package installation.

### 1. Prerequisites (Already present for the lab)
- **Python 3.12** with `uv` installed.
- **Podman** or **Docker** installed and running on the host machine.

### 2. Startup Command
The backend dependencies (e.g., `fastapi`, `uvicorn`, `websockets`) are configured in a standalone Python virtual environment managed by `uv`. The user can boot up the entire system with a single command:

```bash
# Run the UI locally on port 8000
uv run uvicorn main:app --port 8000 --reload
```

When opened in the browser at `http://localhost:8000`, the page renders immediately using the locally served frontend assets from the `/static/vendor/` folder, so the UI itself requires no internet connection. The `agent0` exploitation still needs internet access to reach its external LLM provider.

