EEA Chatbot Tests

Playwright-based end-to-end test framework for volto-eea-chatbot with step-based logging, JSONL output, LLM-powered quality analysis, and PDF report generation.

Features

Browser automation with Playwright (Chromium, Firefox, WebKit)
Step-based test logging with timed actions and JSONL output
Data-driven testing via parametrized fixtures with validation rules
LLM quality verification of chatbot responses (relevance, specificity, citations, information)
Halloumi fact-check integration and scoring
Result analysis with performance metrics, failure categorization, and health assessment
Multi-run comparison with regression/fix/flaky detection and stability scoring
PDF report generation with EEA branding and executive summaries

Quick Start

Prerequisites

Python 3.10+
A running volto-eea-chatbot instance

Installation

pip install -e .
playwright install chromium

Or using Make:

make install

Configuration

Copy the example config and edit it:

cp config.example.json config.json

Set the chatbot URL and desired options:

{
  "chatbot_base_url": "https://www.eea.europa.eu",
  "chatbot_path": "/en/chatbot",
  "headless": true,
  "browser": "chromium",
  "timeout": 240000,
  "expect_timeout": 30000,
  "reports_dir": "./reports",
  "fixtures_dir": "./fixtures"
}

Run Tests

# Run all tests
chatbot_tests run

# Run specific markers
chatbot_tests run -m basic

# Run with visible browser
chatbot_tests run --headed

# Limit fixture questions
chatbot_tests run --limit 5

Analyze Results

# Summary with all sections
chatbot_tests analyze ./reports/test_run_*.jsonl --all

# Compare multiple runs
chatbot_tests compare ./reports/run1.jsonl ./reports/run2.jsonl

CLI Reference

`chatbot_tests run`

Run Playwright tests against a chatbot instance.

Option	Description
`-c, --config`	Path to JSON config file (default: `config.json`)
`-m, --marker`	Comma-separated test markers to filter
`--headed`	Show browser window (overrides config `headless`)
`--limit N`	Limit to first N fixture questions
`-o, --output`	Custom output file or directory
`--color`	Force ANSI color output

`chatbot_tests analyze`

Analyze a test run JSONL file.

Option	Description
`input`	JSONL file to analyze
`-c, --config`	Path to JSON config file
`--failures`	Show failure details grouped by category
`--steps`	Show step-by-step breakdown
`--by-marker`	Group results by pytest marker
`--performance`	Show duration metrics
`--insights`	Show auto-generated health insights
`--all`	Show all analysis sections
`--llm`	Run LLM analysis and generate PDF report

`chatbot_tests compare`

Compare multiple test runs for trends and regressions.

Option	Description
`files`	Two or more JSONL files to compare
`-c, --config`	Path to JSON config file
`--llm`	Run LLM comparison and generate PDF report

Configuration

Settings are loaded from (in priority order):

Environment variables (e.g., CHATBOT_BASE_URL)
.env file
JSON config file (config.json by default)

Settings

Setting	Default	Description
`chatbot_base_url`	`http://localhost:3000`	Volto frontend URL
`chatbot_path`	`/chatbot`	Path to chatbot page
`headless`	`true`	Run browser headless
`browser`	`chromium`	Browser engine
`timeout`	`120000`	Default timeout (ms)
`expect_timeout`	`30000`	Assertion timeout (ms)
`reports_dir`	`./chatbot_tests/reports`	Output directory
`fixtures_dir`	`./fixtures`	Test fixtures directory
`pdf_font`	`null`	TTF font path for PDF Unicode support
`enable_llm_analysis`	`false`	Enable LLM quality analysis
`llm_model`	`Inhouse-LLM/gpt-oss-120b`	LLM model identifier
`llm_url`	`https://llmgw.eea.europa.eu`	LLM API endpoint
`llm_api_key`	``	LLM API key

Test Structure

Test Files

File	Description
`tests/test_basic.py`	UI and integration tests (`@pytest.mark.always`) — always run
`tests/test_questions.py`	Data-driven question validation from `fixtures/*.json`

Fixtures (v2.0)

Test questions are defined in fixtures/*.json with a minimal format:

{
  "version": "2.0.0",
  "default_validation": {
    "response": { "min_length": 100 },
    "sources": { "min_count": 1 },
    "llm": {
      "verify_answers_question": true,
      "verify_not_vague": true,
      "verify_citations": true,
      "verify_lack_information": true
    }
  },
  "default_feedback": true,
  "test_cases": [
    {
      "id": "Q-001",
      "priority": "high",
      "question": "What is the current state of air quality in Europe?",
      "markers": ["air_quality"]
    }
  ]
}

Each test case inherits default_validation and can override specific fields.

Markers

Marker	Description
`always`	Tests that run regardless of `-m` filter
`basic`	Basic chatbot functionality
`halloumi`	Halloumi fact-check tests
`feedback`	Feedback functionality
`follow_up`	Follow-up query tests
`high` / `medium` / `low`	Priority (auto-added from fixture `priority`)

Topic markers (e.g., satellite, copernicus, ai) are dynamically applied from fixture markers arrays.

LLM Quality Analysis

When enabled, the framework uses an external LLM to evaluate chatbot responses across four dimensions:

Dimension	Pass	Fail
Information	Has sufficient information	Lacks information
Relevance	On-topic	Off-topic
Specificity	Not vague	Too vague
Citations	Has citations	Missing citations

Setup

Add LLM settings to config.json:

{
  "enable_llm_analysis": true,
  "llm_model": "Inhouse-LLM/gpt-oss-120b",
  "llm_url": "https://llmgw.eea.europa.eu",
  "llm_api_key": "your_api_key"
}

Report Generation

# Single run analysis with PDF
chatbot_tests analyze ./results.jsonl --llm

# Multi-run comparison with PDF
chatbot_tests compare ./run1.jsonl ./run2.jsonl --llm

Generated reports include executive summaries, risk assessments, trend analysis, and actionable recommendations.

Output Files

Test runs produce files in the reports/ directory:

File	Description
`test_run_<timestamp>.jsonl`	Raw test execution events
`analysis_<timestamp>.json`	Analyzed test results
`analysis_<timestamp>.pdf`	LLM-generated analysis report
`comparison_<timestamp>.json`	Multi-run comparison data
`comparison_<timestamp>.pdf`	LLM-generated comparison report

Make Targets

make install    Install dependencies and Playwright browser
make run        Run all chatbot tests
make headed     Run tests with visible browser
make analyze    Analyze a test report (FILE=path)
make compare    Compare test runs (FILES="path1 path2")
make help       Show available targets

Docker Usage

The project is fully dockerized for easy execution without local dependencies.

Configuration (`.env`)

Before running, copy the example environment file:

cp .env.example .env

Edit .env to configure your test settings (URLs, timeouts, LLM keys). Docker Compose will automatically inject these variables into the containers.

Running Tests

To start the test suite using your .env configuration (by default running the basic and question tests with color output):

docker-compose up tests

If you need to override the default command (e.g., to run different markers):

docker-compose run --rm tests chatbot_tests run -m always --color

Analysis and Comparison Tools

The analyze and compare services are placed under the tools profile to prevent them from starting automatically. Run them on-demand:

Analyze a specific test report:

docker-compose run --rm analyze /app/reports/test_run_1234567890.jsonl --all

Compare two test reports:

docker-compose run --rm compare /app/reports/run1.jsonl /app/reports/run2.jsonl --llm

Documentation

GUIDE.md — Detailed guide for writing tests, page object reference, fixture format, LLM verification patterns
CLAUDE.md — Developer reference with architecture, implementation details, and module documentation

Copyright and license

See LICENSE.md for details.

Funding

European Environment Agency (EU)

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
fixtures		fixtures
fonts		fonts
page_objects		page_objects
tests		tests
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
01.questions.json		01.questions.json
01.questions.jsonl		01.questions.jsonl
CLAUDE.md		CLAUDE.md
Dockerfile		Dockerfile
GUIDE.md		GUIDE.md
HALLOUMI_SCORING.md		HALLOUMI_SCORING.md
LICENSE.md		LICENSE.md
Makefile		Makefile
README.md		README.md
__init__.py		__init__.py
analyze.py		analyze.py
config.example.json		config.example.json
config.py		config.py
console.py		console.py
docker-compose.yml		docker-compose.yml
eea-logo.svg		eea-logo.svg
html_report.py		html_report.py
llm_analysis.py		llm_analysis.py
main.py		main.py
output.py		output.py
plugin.py		plugin.py
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
requirements.txt		requirements.txt
runner.py		runner.py
step.py		step.py
styles.css		styles.css
utils.py		utils.py

Folders and files

Latest commit

History

Repository files navigation

EEA Chatbot Tests

Features

Quick Start

Prerequisites

Installation

Configuration

Run Tests

Analyze Results

CLI Reference

chatbot_tests run

chatbot_tests analyze

chatbot_tests compare

Configuration

Settings

Test Structure

Test Files

Fixtures (v2.0)

Markers

LLM Quality Analysis

Setup

Report Generation

Output Files

Make Targets

Docker Usage

Configuration (.env)

Running Tests

Analysis and Comparison Tools

Documentation

Copyright and license

Funding

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

`chatbot_tests run`

`chatbot_tests analyze`

`chatbot_tests compare`

Configuration (`.env`)

Packages