harness-python-react — Harness Primer

A plain-English companion to HARNESS.md. If HARNESS.md is the map, this is the guided tour: written for someone who understands modern AI coding conceptually but is not the engineer who would set up a harness from scratch. You should be able to read this end-to-end without opening any other file — though every claim links back to a source-of-truth file you can verify.

1. What a harness is, and why this template has one
2. The three principles, with code
3. The numbered invariants — at a glance
4. Pre-commit — the local checkpoint
5. CI gates — the cloud checkpoint
6. Agent-level hooks — opt-in per developer
7. Skills — guidance, not enforcement
8. The evaluation harness — a one-page summary
9. Process-as-code
10. Defence in depth — two worked examples
11. Operating within the harness — a survival guide
12. Setup checklist for your harness maintainer
13. Glossary
14. Where to go next

1. What a harness is, and why this template has one

A harness, in this codebase, is the set of automated rails that catch bad changes before they reach the main branch. Think of it like the safety systems in a modern car: the seatbelt warning that beeps at you, the interlock that stops the engine starting in gear, and the crash-rated chassis that tolerates the mistakes the first two missed. None of those systems are the driver — but together they make a single bad input survivable.

The template's harness has the same shape, in three layers:

Layer	What it does	Car analogy
Prompts describe	`CLAUDE.md` and the `docs/` files tell a human or AI agent what the project values. Advisory only.	The seatbelt warning beep — informative, ignorable.
Skills guide	Topic-specific briefings the agent loads when relevant (`.claude/skills/`). Shape style and approach but reject nothing.	Lane-keep assist — nudges you back, doesn't lock the wheel.
Hooks and CI enforce	Mechanical checks: pre-commit, agent-level hooks, GitHub Actions. A violation here blocks the change.	The crash-rated chassis and the interlock — non-negotiable.

The principle: anything that matters is enforced mechanically. Prose can drift across model versions, sessions, and reviewers; a regex in a CI job cannot. Where this primer says "the system requires X," it means a script somewhere will refuse to let X be violated — and this doc tells you which script.

A second harness, the evaluation harness (§8), tests whether the agent answers questions correctly, not whether the code compiles.

Engineers reach for the map, HARNESS.md. Everyone else can start here.

2. The three principles, with code

2.1 Invariants — what cannot be true

An invariant is a numbered rule the system maintains, with a pointer to the code that enforces it. "An invariant in a validator is a fact; an invariant in prose is a suggestion." The repo treats unenforced invariants as bugs: every aspirational entry must cite a tracking issue, and the Aspirational ticket cite CI job checks that it does.

→ Source of truth: docs/INVARIANTS.md.

2.2 Contracts — the shape at every seam

Every piece of data that crosses a module boundary or process boundary is a Pydantic model that inherits from a strict base class. The base class refuses unknown fields at construction. A typo fails at the seam, not three calls deeper.

# src/models/_base.py
from pydantic import BaseModel, ConfigDict


class StrictModel(BaseModel):
    """Base for every contract that crosses a module or process seam."""

    model_config = ConfigDict(extra="forbid")

What you'll see if you violate it. A typo like HealthResponse(status="ok", versoin="0.1.0") raises:

pydantic_core._pydantic_core.ValidationError: 1 validation error for HealthResponse
versoin
  Extra inputs are not permitted [type=extra_forbidden, input_value='0.1.0', input_type=str]

The rejection happens at the API boundary, in the FastAPI handler, before any business logic runs.

2.3 Boundaries — who depends on whom

src/ is a one-way dependency graph:

   src.api ─┐         ┌─ src.eval     (HTTP / golden dataset, siblings)
            ▼         ▼
            src.agent                 (the tool-calling loop)
                ▼
            src.tools                 (typed tool registry)
                ▼
            src.data                  (DB / external systems)
                ▼
            src.observability         (OTel spans, logging)
                ▼
            src.models                (Pydantic contracts; depends on nothing)

Enforced by import-linter (pyproject.toml [tool.importlinter]).

What you'll see if you violate it. Adding from src.api.routes import router to src/models/foo.py fails the Architecture (import-linter) CI job:

src.models is not allowed to import src.api:

-   src.models.foo -> src.api.routes (l.4)

Contracts: 0 kept, 2 broken.

There is no override flag.
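
To make the one-way rule concrete, here is a minimal sketch of a layer check built on the standard-library `ast` module. It is not how import-linter is implemented, and the names `LAYER_RANK`, `layer_of`, and `find_violations` are hypothetical; the only thing taken from this document is the layer order in the diagram above.

```python
import ast
from typing import Optional

# Layer order copied from the diagram above, highest layer first.
# src.api and src.eval are siblings, so they share a rank.
LAYER_RANK = {
    "src.api": 0,
    "src.eval": 0,
    "src.agent": 1,
    "src.tools": 2,
    "src.data": 3,
    "src.observability": 4,
    "src.models": 5,
}


def layer_of(module: str) -> Optional[str]:
    """Return the layer package that owns a dotted module name, if any."""
    for layer in LAYER_RANK:
        if module == layer or module.startswith(layer + "."):
            return layer
    return None


def find_violations(module: str, source: str) -> list[str]:
    """List absolute imports in `source` that point sideways or upward."""
    own_layer = layer_of(module)
    if own_layer is None:
        return []
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            targets = [node.module]
        elif isinstance(node, ast.Import):
            targets = [alias.name for alias in node.names]
        else:
            continue  # not an import, or a relative import (ignored here)
        for target in targets:
            target_layer = layer_of(target)
            if target_layer is None or target_layer == own_layer:
                continue  # third-party import, or import within the same layer
            # A lower rank number means a higher layer; equal rank means a sibling.
            if LAYER_RANK[target_layer] <= LAYER_RANK[own_layer]:
                violations.append(f"{module} -> {target} (l.{node.lineno})")
    return violations
```

Feeding it the text of a module such as src/models/foo.py that contains the import from the example above yields one entry naming the importing module, the imported module, and the line number. The real tool also follows indirect import chains, which this sketch does not attempt.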

3. The numbered invariants — at a glance

#	Rule	Where enforced
1	Every contract crossing a module seam forbids unknown keys	`StrictModel` in src/models/_base.py + tests/test_models.py
2	API endpoints live under `/api/v1/` and return typed responses	Router prefix in src/api/routes.py + route-walk test in tests/test_route_versioning.py
3	Layer flow is one-way	`import-linter` contracts in pyproject.toml + the `Architecture` CI job
4	Coverage ≥ 75 % on `src/`	`[tool.coverage.report].fail_under = 75` + the `Coverage` CI job
5	No secret leaves the repo unscanned	Three-layer scan: `.claude/hooks/pretooluse_bash.py` → pre-commit gitleaks → `Secret scan (gitleaks)` CI job

→ Full detail: docs/INVARIANTS.md. Slots 6+ are reserved for project-specific invariants the team adds as the domain stabilises; the Aspirational ticket cite gate enforces that any aspirational marker line cites a #NNN ticket.
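
The aspirational-cite rule is simple enough to sketch in a few lines. The following is an illustration of the rule as described here (a line starting with `*Aspirational` must contain at least one `#NNN` reference), not the repo's actual script; the function name is hypothetical and the closed-ticket check is left out.

```python
import re

# A ticket cite is a hash sign followed by digits, e.g. #123.
TICKET = re.compile(r"#\d+")


def uncited_aspirational_lines(text: str) -> list[int]:
    """Return 1-based numbers of `*Aspirational` lines that cite no ticket."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line.startswith("*Aspirational") and not TICKET.search(line)
    ]
```

A gate built on this would read docs/INVARIANTS.md, call the function, and exit non-zero if the returned list is non-empty.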

4. Pre-commit — the local checkpoint

Pre-commit runs on your machine before each commit is created. Failures here cost zero CI minutes.

→ Source of truth: .pre-commit-config.yaml.

Hook	What it does
ruff (`--fix`)	Lints Python and auto-fixes what it can.
ruff-format	Applies the project's Python code style.
check-yaml / check-toml / check-json	Parses every YAML/TOML/JSON file you're committing.
check-merge-conflict	Refuses to commit a file with `<<<<<<<` markers.
check-added-large-files	Blocks any new file over 500 KB.
end-of-file-fixer / trailing-whitespace / mixed-line-ending	Hygiene.
gitleaks	Scans the diff for secrets — first of three independent scan layers.
commitizen	At `commit-msg` stage. Refuses commit messages that don't match `<type>(<scope>): <subject>` with one of the seven allowed types and a non-Title-Case subject.
mypy (`--strict`)	Type-checks all of `src/` and `tests/`.

Wire once with uv run pre-commit install --hook-type pre-commit --hook-type commit-msg.

5. CI gates — the cloud checkpoint

Every push and every PR triggers GitHub Actions. 21 required contexts must pass before merge.

→ Source of truth: .github/workflows/.

5.1 Code quality

Job	What it does
`Lint & Format`	`ruff check .` + `ruff format --check .`. Zero tolerance.
`Type Check`	`mypy --strict src/ tests/`.
`Pre-commit`	Re-runs `pre-commit run --all-files` in CI.
`Lint PR title`	Conventional-commit prefix + lowercase-or-initialism subject.
`Commit-type sync`	Asserts commitizen schema and `pr-title.yml` agree on both the type allowlist and the subject-case constraint.
`Branch-protection contexts sync`	Asserts every workflow job is listed in `.github/branch-protection/{develop,main}.json` (or in the script's `EXEMPT_WORKFLOWS` map).
`Version bump check`	`pyproject.toml` `[project].version` differs from base; `uv.lock` self-version matches. `release:` PRs exempt.
`Tests required`	`feat:`/`fix:` PRs that touch `src/` must touch `tests/`. Other prefixes get a warn-only.
`File length`	`*.py` files in `src/`, `tests/`, `eval/`, `.github/scripts/` capped at 300 lines. Function caps via ruff `PLR0915`/`PLR0912` (50 stmts / 12 branches).
`src/ README audit`	Every `src/` package with code has a README ≥ 200 bytes containing `## Key interfaces` (or `## Public surface`).
`Aspirational ticket cite`	Lines starting with `*Aspirational` in `docs/INVARIANTS.md` must cite at least one `#NNN` ticket; closed cites warn (or fail under `ASPIRATIONAL_STRICT=1`).

5.2 Correctness

Job	What it does
`Unit tests`	`pytest tests/ -m "not integration"` — fast feedback. Includes doc-vs-code drift backstops (`test_route_versioning`, `test_otel_semconv`, etc.).
`Coverage`	`pytest tests/ --cov=src` with `fail_under = 75`.

5.3 Architecture

Job	What it does
`Architecture (import-linter)`	Runs `import-linter` against the two contracts in `pyproject.toml`.
`Frontend Build`	`npm run build` from `frontend/`.
`Frontend Quality`	eslint (`max-warnings 0`) + prettier `--check` + `tsc --noEmit` + vitest.

5.4 Security

Job	What it does
`Secret scan (gitleaks)`	Repo-wide secret scan on every push and PR. Third independent layer.
`Python deps (pip-audit)`	CVE scan against `uv.lock`. Per-CVE ignore list at `.github/security/pip-audit-ignore.txt`.
`Frontend deps (npm audit)`	`--audit-level=high`.
`Container image scan (trivy)`	OS- and library-level CVE scan on the built image.
`Action pinning audit`	Walks every `.github/workflows/*.yml`. First-party = major tag; `astral-sh/setup-uv` = patch tag; third-party = SHA + `# vN.M.P` comment.
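
The three pinning buckets can be sketched as a single classification function. This is an illustration only: the document does not say which owners count as first-party, so treating `actions` and `github` as first-party is an assumption, and the regexes and names below are hypothetical rather than taken from the audit script.

```python
import re

# Matches "uses: owner/repo@ref" with an optional trailing "# comment".
USES = re.compile(
    r"uses:\s*(?P<action>[^@\s]+)@(?P<ref>\S+)(?:\s+#\s*(?P<comment>\S+))?"
)
MAJOR_TAG = re.compile(r"v\d+")
PATCH_TAG = re.compile(r"v\d+\.\d+\.\d+")
FULL_SHA = re.compile(r"[0-9a-f]{40}")

# Assumption: these owners are what the policy calls "first-party".
FIRST_PARTY_OWNERS = {"actions", "github"}


def pin_is_valid(line: str) -> bool:
    """Check one workflow line against the three pinning buckets."""
    match = USES.search(line)
    if match is None:
        return True  # not a remote `uses:` reference (e.g. a local ./ action)
    action, ref, comment = match["action"], match["ref"], match["comment"]
    if action.split("/")[0] in FIRST_PARTY_OWNERS:
        return MAJOR_TAG.fullmatch(ref) is not None
    if action == "astral-sh/setup-uv":
        return PATCH_TAG.fullmatch(ref) is not None
    # Third-party: full commit SHA plus a "# vN.M.P" comment naming the release.
    return (
        FULL_SHA.fullmatch(ref) is not None
        and comment is not None
        and PATCH_TAG.fullmatch(comment) is not None
    )
```

The SHA-plus-comment shape for third-party actions keeps the pin immutable while leaving a human-readable version next to it for reviewers.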

5.5 Operations (workflows on a schedule / trigger)

Workflow	When	What it does
`branch-protection.yml`	Weekly + dispatch	Re-applies `.github/branch-protection/*.json` via `gh api`. Drift re-assertion.
`eval-nightly.yml`	`workflow_dispatch` only by default	Runs `pytest eval/` against the configured LLM. Documented opt-in for `schedule:`.
`artifact-cleanup.yml`	Weekly + dispatch	Prunes Actions artifacts older than 7 days.
`release-drafter.yml`	Push to main + PR label events	Updates a draft GitHub Release under `v$RESOLVED_VERSION`.
`release.yml`	On tag `v*.*.*`	Builds the image, pushes to GHCR, generates a CycloneDX SBOM, publishes the release.
`changelog-rollup.yml`	After release.yml succeeds + dispatch	Opens a `chore: roll up CHANGELOG …` PR against `develop` — moves `[Unreleased]` entries under the released version's heading + bumps `pyproject.toml` + `uv.lock` PATCH.
`codeql.yml`	`workflow_dispatch` only (placeholder)	Static analysis. Activate when the repo is public or has GHAS.

6. Agent-level hooks — opt-in per developer

.claude/hooks/ scripts run around the LLM agent's own actions — pre-commit and CI run when code moves, hooks run when the agent moves.

→ Source of truth: .claude/hooks/ (3 Python scripts) + .claude/settings.local.json.example.

Script	Purpose
`pretooluse_bash.py`	(1) Refuses Bash commands containing bypass flags (`git --no-verify`, `--no-hooks`, `--no-gpg-sign`); (2) on `git commit`, scans the staged diff for AWS/sk-/ghp_/PEM/Slack patterns and blocks if any match; (3) appends every Bash invocation to `.claude/bash-log.txt` (gitignored) for forensics.
`posttooluse_writeedit.py`	After every Write/Edit, dispatches the right formatter — ruff for `.py`, prettier for `.ts/.tsx/.js/.jsx/.css/.json/.html/.md`.
`sessionstart.py`	At session start, injects current branch + `git status --short` as `additionalContext` so the agent is grounded in the actual repo state.

Wire once: cp .claude/settings.local.json.example .claude/settings.local.json.

Until you wire them, these scripts do not police your local agent's actions. Pre-commit and CI still run regardless, so the enforced checks still apply before anything reaches main.
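
The first two duties of `pretooluse_bash.py` listed in the table above (refusing bypass flags and scanning a staged diff) might look roughly like this. It is a sketch under stated assumptions, not the script itself: the flag list comes from the table, but every regex in `SECRET_PATTERNS` is an illustrative guess at what "AWS/sk-/ghp_/PEM/Slack patterns" means, and the function names are hypothetical.

```python
import re

# Flags named in the hooks table above.
BYPASS_FLAGS = ("--no-verify", "--no-hooks", "--no-gpg-sign")

# Assumption: illustrative patterns only; the real script's regexes may differ.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "sk_prefixed_key": re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
    "github_pat": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "pem_private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "slack_token": re.compile(r"xox[abprs]-[A-Za-z0-9-]{10,}"),
}


def bypass_flags_in(command: str) -> list[str]:
    """Return any bypass flags that appear as whole tokens in a shell command."""
    tokens = command.split()
    return [flag for flag in BYPASS_FLAGS if flag in tokens]


def secrets_in(staged_diff: str) -> list[str]:
    """Return the names of secret patterns found on added lines of a diff."""
    found = set()
    for line in staged_diff.splitlines():
        # Only added lines matter; "+++" is the file header, not content.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                found.add(name)
    return sorted(found)
```

A hook built on these helpers would block the tool call when either function returns a non-empty list. Scanning only added lines avoids flagging a secret that the same commit is removing.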

7. Skills — guidance, not enforcement

A skill is a markdown briefing the agent loads when its topic matches the current task. Skills shape style; they reject nothing.

→ Source of truth: .claude/skills/.

Skill	When it activates
`architect`	Architecture decisions, module boundaries, tech-stack choices, API contracts.
`code-reviewer`	After code is written or edited. 10-point review checklist.
`devops`	Docker, docker-compose, CI/CD, `pyproject.toml`, observability config.
`frontend`	React/TS work in `frontend/`.
`qa-engineer`	Tests, eval harness, golden-dataset changes.
`technical-writer`	Docs, READMEs, inline documentation.

A skill is a librarian who hands you a relevant book before you start; a hook is a gate guard who refuses to let you leave with the wrong book.

8. The evaluation harness — a one-page summary

Distinct from the build harness (everything above), the evaluation harness tests whether the agent answers questions correctly.

→ Source of truth: docs/EVAL_HARNESS.md, with the dataset in eval/golden_qa.json and the runner in eval/test_golden_qa.py.

Shape:

A golden dataset of question / expected-answer pairs (one trivial echo case ships; replace with your domain dataset).
Three tolerance modes: exact_match, numeric_close (within 1 %), semantic_similar (LLM judge ≥ 0.8).
Provider-agnostic — wire your concrete LLM client via the LLMClient Protocol in src/eval/judge.py.
Disabled-by-default nightly (eval-nightly.yml) — workflow_dispatch only, opt-in to schedule: after configuring secrets.
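
The three tolerance modes listed above can be sketched as one comparison function. This is a minimal illustration, not the runner in eval/test_golden_qa.py: the `Judge` protocol and its `similarity` method are hypothetical stand-ins (the real `LLMClient` signature is not shown in this document), and "within 1 %" is assumed to mean relative to the expected value.

```python
from typing import Optional, Protocol


class Judge(Protocol):
    """Hypothetical judge interface; not the repo's LLMClient Protocol."""

    def similarity(self, expected: str, actual: str) -> float: ...


def is_correct(
    mode: str, expected: str, actual: str, judge: Optional[Judge] = None
) -> bool:
    """Compare an answer to its golden value under one tolerance mode."""
    if mode == "exact_match":
        return actual == expected
    if mode == "numeric_close":
        want, got = float(expected), float(actual)
        # Assumption: the 1 % band is relative to the expected value.
        return abs(got - want) <= 0.01 * abs(want)
    if mode == "semantic_similar":
        if judge is None:
            raise ValueError("semantic_similar needs a judge")
        return judge.similarity(expected, actual) >= 0.8
    raise ValueError(f"unknown tolerance mode: {mode}")
```

Keeping the judge behind a small protocol is what makes the comparison provider-agnostic: a test can pass in a stub that returns a fixed score, while a nightly run passes in a client backed by a real model.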

9. Process-as-code

Concern	File / mechanism
PR template	.github/pull_request_template.md.
Issue templates	.github/ISSUE_TEMPLATE/: `bug.md`, `feature.md`, `eval-regression.md`. Blank issues disabled.
Optional Beads queue	docs/BEADS.md: GitHub Issues remain canonical while Beads can track local ready/blocked execution.
Code ownership	.github/CODEOWNERS.
Branch protection	.github/branch-protection/{main,develop}.json declarative configs, re-applied weekly by branch-protection.yml.
Commit message shape	Commitizen, configured in `pyproject.toml`.
Branching model	`feat/*` → `develop` → `main`. No direct commits.

The 7 allowed conventional-commit prefixes:

feat | fix | docs | test | refactor | chore | release

release: is project-specific — develop → main release PRs only.
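
A title or commit-message check along these lines can be sketched with one regular expression. This is not the commitizen schema or the `pr-title.yml` rule itself: the seven types come from the list above, while treating the scope as optional and reading "non-Title-Case" as "the first word does not start with a capital unless the whole word is upper-case" are assumptions. The names are hypothetical.

```python
import re

# The seven prefixes listed above.
ALLOWED_TYPES = ("feat", "fix", "docs", "test", "refactor", "chore", "release")

# Assumption: the "(scope)" part is optional.
TITLE = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(?:\((?P<scope>[^()]+)\))?"
    r": (?P<subject>\S.*)$"
)


def subject_case_ok(subject: str) -> bool:
    """Accept a lowercase start, or an all-caps first word such as 'API'."""
    first_word = subject.split()[0]
    return not first_word[0].isupper() or first_word.isupper()


def title_ok(title: str) -> bool:
    """Check a PR title or commit subject line against the sketched shape."""
    match = TITLE.match(title)
    return match is not None and subject_case_ok(match["subject"])
```

With these rules, `feat(api): add health endpoint` and `fix: API returns 500 on empty body` are accepted, while `feat(api): Add health endpoint` and `perf: speed up loop` are rejected.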

10. Defence in depth — two worked examples

10.1 A secret is accidentally committed

Layer	What happens
1. Pre-commit `gitleaks`	Local hook refuses the commit; never makes a commit object.
2. Agent-level secret-scan hook (opt-in)	If the agent runs `git commit`, `pretooluse_bash.py` scans the staged diff itself and blocks.
3. CI `Secret scan (gitleaks)`	If layers 1–2 were bypassed, the GitHub Action runs on push and fails the PR.
4. Manual review via CODEOWNERS	Reviewers see the diff.

For the secret to land on main, all four have to fail or be bypassed on the same change.

10.2 A boundary violation is introduced

A developer adds from src.api.routes import router inside src/models/foo.py.

Layer	What happens
1. Local `lint-imports`	If the developer runs `just architecture` (or has it in their editor), they see the error at their desk.
2. CI `Architecture (import-linter)`	Runs on every push. `import-linter` reports the forbidden chain and exits non-zero. PR cannot merge.

The error names the offending module, line, and contract — no guessing.

11. Operating within the harness — a survival guide

Error you see	What it means	What to do
`pydantic_core ... extra_forbidden`	A contract got a field name it doesn't recognise. Often a typo.	Open the model under `src/models/`. Fix the field name.
CI `Lint & Format` failed	Ruff caught a style or correctness issue.	`uv run ruff check . --fix` locally.
CI `Type Check` failed	mypy strict caught an untyped function or unsafe coercion.	The error names file + line. Add the missing annotation.
CI `Architecture (import-linter)` failed	An import crossed a layer boundary.	Read the chain. Move the code or delete the import.
CI `Lint PR title` failed	Title doesn't start with one of the 7 prefixes, or starts with Title Case.	Edit the PR title. Lowercase verb after the colon, or all-caps initialism.
CI `Version bump check` failed	`pyproject.toml` `[project].version` is unchanged from the base branch.	Bump pyproject.toml + the matching `[[package]]` block in `uv.lock`.
CI `Tests required` failed	`feat:`/`fix:` PR touched `src/` without touching `tests/`.	Add a test. If genuinely test-exempt, use `chore:` / `refactor:` instead.
CI `File length` failed	A `.py` file > 300 lines or a function > 50 stmts / 12 branches.	Split. There is no `# noqa` exemption.
CI `src/ README audit` failed	A package missing README, < 200 bytes, or missing `## Key interfaces`.	Add or extend the README.
CI `Aspirational ticket cite` failed	A `*Aspirational` line in `docs/INVARIANTS.md` cites no ticket.	File the ticket and add the cite, or remove the marker line if the rule is now enforced.
CI `Action pinning audit` failed	A `uses:` line doesn't match the bucket policy.	First-party = major tag; setup-uv = patch tag; third-party = SHA + `# vN.M.P` comment.
Local commit rejected by commitizen	Commit message doesn't match `<type>(<scope>): <subject>`.	Re-run with the correct shape.
`Coverage` failed	Test coverage on `src/` dropped below 75 %.	Add tests for the lines flagged in the report.

Rule of thumb. If a check rejects a change, the rejection message names the file, the line, and the rule. Read it carefully before reaching for an override flag — there is almost always a real bug under it.

12. Setup checklist for your harness maintainer

Install dependencies. uv sync --extra dev.
Wire pre-commit. uv run pre-commit install --hook-type pre-commit --hook-type commit-msg.
(Opt-in) Wire agent-level hooks. cp .claude/settings.local.json.example .claude/settings.local.json.
Verify CI is green on a no-op PR before merging real work.
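Apply branch protection by setting a BRANCH_PROTECTION_TOKEN secret whose token has admin access to the repository (for a fine-grained personal access token, the Administration repository permission with write access), then triggering branch-protection.yml. The default GITHUB_TOKEN cannot edit branch protection on the repo it runs in.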
(Opt-in) Set LLM_* secrets for the eval harness when you flip eval-nightly.yml to schedule:.
(Opt-in) Provision RELEASE_BOT_TOKEN so changelog-rollup.yml's auto-PR triggers its own CI matrix on creation. Without it the workflow falls back to GITHUB_TOKEN: the auto-PR still opens, but its CI does not start automatically.

13. Glossary

Term	What it is
Pydantic	Python library for runtime data validation. The basis of every contract in this repo.
mypy	Static type checker. Strict mode refuses missing/implicit annotations.
ruff	Fast Rust-based Python linter and formatter. Catches style + dead code + security smells.
import-linter	Checks the import graph against declared contracts (layered, forbidden).
pre-commit	Framework that registers itself as a git hook so configured checks run before every commit.
GitHub Actions	GitHub's built-in CI. Workflows under `.github/workflows/` run on triggers.
Conventional commits	`<type>(<scope>): <subject>` shape with a fixed type set (this repo allows 7).
Semantic versioning	`MAJOR.MINOR.PATCH` — each component conveys the kind of change.
OpenTelemetry (OTel)	Vendor-neutral standard for traces, metrics, logs. The repo follows `gen_ai.*` and `db.*` semantic conventions for attribute names.
CycloneDX	An SBOM format. Generated per release and attached to the GitHub Release.
gitleaks	Pattern-based secret scanner.
Beads	Optional local issue queue used for dependency-aware execution and handoffs; GitHub Issues remain canonical.

14. Where to go next

Doc	What it covers
HARNESS.md	The engineer-facing map — same surface, terser, more linkable.
INVARIANTS.md	The numbered rules in full.
BOUNDARIES.md	The dependency graph + the layer rules.
ARCHITECTURE.md	The system design — components, request flow.
SECURITY.md	Threat model + defence-in-depth mapping.
EVAL_HARNESS.md	The eval flywheel.
BEADS.md	Optional local Beads queue layered under GitHub Issues.
DEVELOPMENT.md	Local setup, branching, releases.
