A local-first evaluation harness for prompts, tools, and agents with regression tracking and experiment history.
LLM teams lack a lightweight way to compare prompt and tool changes before shipping.
Agent builders, prompt engineers, and applied AI teams.
- Load datasets from JSON or CSV
- Run prompt or agent variants
- Score outputs with rubric functions
- Compare runs and export regressions
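The loop those four capabilities describe can be sketched in plain Python. Every name below (`load_dataset`, `exact_match`, `run_variant`, `find_regressions`, and the case schema) is a hypothetical illustration, not the project's actual API:

```python
import json

def load_dataset(path):
    """Load evaluation cases from a JSON file: a list of {"id", "input", "expected"}."""
    with open(path) as f:
        return json.load(f)

def exact_match(output, case):
    """A trivial rubric function: score 1.0 when the output matches the expectation."""
    return 1.0 if output == case["expected"] else 0.0

def run_variant(variant, cases, rubric):
    """Run one prompt/agent variant over all cases and score each output."""
    return {case["id"]: rubric(variant(case["input"]), case) for case in cases}

def find_regressions(baseline, candidate):
    """Case IDs where the candidate run scores below the baseline run."""
    return [cid for cid, score in candidate.items() if score < baseline[cid]]
```

A comparison then reduces to running two variants over the same dataset and diffing their per-case scores; anything in `find_regressions` is what would be exported.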
Evaluation is moving from optional best practice to baseline engineering hygiene.
- `core/`: domain logic for EvalOps Workbench.
- `cli/`: operator-facing entrypoint for local workflows and smoke checks.
- `docs/`: product notes, roadmap, and architecture decisions.
- `tests/`: baseline regression coverage for the project contract.
```
uv run evalops-workbench summary
uv run evalops-workbench capabilities
uv run evalops-workbench roadmap
```

Python, Typer, DuckDB, OpenTelemetry
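Since Typer is in the stack, the subcommands above could be wired roughly like this. This is a minimal sketch under that assumption; the command bodies and their output strings are invented for illustration:

```python
import typer

app = typer.Typer(help="EvalOps Workbench CLI (sketch)")

@app.command()
def summary():
    """Print a one-line product summary."""
    typer.echo("Local-first evaluation harness for prompts, tools, and agents.")

@app.command()
def capabilities():
    """List the core capabilities."""
    for cap in ("load datasets", "run variants", "score outputs", "compare runs"):
        typer.echo(f"- {cap}")

if __name__ == "__main__":
    app()
```

Typer turns each decorated function into a subcommand, so `evalops-workbench summary` maps to `summary()` once the app is registered as a console-script entry point.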
- Clear product thesis
- Setup that works locally
- Tests for the primary contract
- Documentation for roadmap and architecture
- Space for production integrations in the next iteration
This repository ships with a static Vercel-ready landing page for demos and previews.
```
vercel deploy -y
```

The deployed site presents EvalOps Workbench as a standalone product page.