harness-python-react — Harness Primer

A plain-English companion to HARNESS.md. If HARNESS.md is the map, this is the guided tour: written for someone who understands modern AI coding conceptually but is not the engineer who would set up a harness from scratch. You should be able to read this end-to-end without opening any other file — though every claim links back to a source-of-truth file you can verify.


Table of contents

  1. What a harness is, and why this template has one
  2. The three principles, with code
  3. The numbered invariants — at a glance
  4. Pre-commit — the local checkpoint
  5. CI gates — the cloud checkpoint
  6. Agent-level hooks — opt-in per developer
  7. Skills — guidance, not enforcement
  8. The evaluation harness — a one-page summary
  9. Process-as-code
  10. Defence in depth — two worked examples
  11. Operating within the harness — a survival guide
  12. Setup checklist for your harness maintainer
  13. Glossary
  14. Where to go next

1. What a harness is, and why this template has one

A harness, in this codebase, is the set of automated rails that catch bad changes before they reach the main branch. Think of it like the safety systems in a modern car: the seatbelt warning that beeps at you, the interlock that stops the engine starting in gear, and the crash-rated chassis that tolerates the mistakes the first two missed. None of those systems are the driver — but together they make a single bad input survivable.

The template's harness has the same shape, in three layers:

| Layer | What it does | Car analogy |
|---|---|---|
| Prompts describe | CLAUDE.md and the docs/ files tell a human or AI agent what the project values. Advisory only. | The seatbelt warning beep: informative, ignorable. |
| Skills guide | Topic-specific briefings the agent loads when relevant (.claude/skills/). Shape style and approach but reject nothing. | Lane-keep assist: nudges you back, doesn't lock the wheel. |
| Hooks and CI enforce | Mechanical checks: pre-commit, agent-level hooks, GitHub Actions. A violation here blocks the change. | The crash-rated chassis and the interlock: non-negotiable. |

The principle: anything that matters is enforced mechanically. Prose can drift across model versions, sessions, and reviewers; a regex in a CI job cannot. Where this primer says "the system requires X," it means a script somewhere will refuse to let X be violated — and this doc tells you which script.

A second harness, the evaluation harness (§8), tests whether the agent answers questions correctly, not whether the code compiles.

Engineers reach for the map, HARNESS.md. Everyone else can start here.


2. The three principles, with code

2.1 Invariants — what cannot be true

An invariant is a numbered rule the system maintains, with a pointer to the code that enforces it. "An invariant in a validator is a fact; an invariant in prose is a suggestion." The repo treats unenforced invariants as bugs: every aspirational entry must cite a tracking issue, and the Aspirational ticket cite CI job checks that it does.

→ Source of truth: docs/INVARIANTS.md.

2.2 Contracts — the shape at every seam

Every piece of data that crosses a module boundary or process boundary is a Pydantic model that inherits from a strict base class. The base class refuses unknown fields at construction. A typo fails at the seam, not three calls deeper.

# src/models/_base.py
from pydantic import BaseModel, ConfigDict


class StrictModel(BaseModel):
    """Base for every contract that crosses a module or process seam."""

    model_config = ConfigDict(extra="forbid")

What you'll see if you violate it. A typo like HealthResponse(status="ok", versoin="0.1.0") raises:

pydantic_core._pydantic_core.ValidationError: 1 validation error for HealthResponse
versoin
  Extra inputs are not permitted [type=extra_forbidden, input_value='0.1.0', input_type=str]

The rejection happens at the API boundary, in the FastAPI handler, before any business logic runs.

2.3 Boundaries — who depends on whom

src/ is a one-way dependency graph:

   src.api ─┐         ┌─ src.eval     (HTTP / golden dataset, siblings)
            ▼         ▼
            src.agent                 (the tool-calling loop)
                ▼
            src.tools                 (typed tool registry)
                ▼
            src.data                  (DB / external systems)
                ▼
            src.observability         (OTel spans, logging)
                ▼
            src.models                (Pydantic contracts; depends on nothing)

Enforced by import-linter (pyproject.toml [tool.importlinter]).

What you'll see if you violate it. Adding from src.api.routes import router to src/models/foo.py fails the Architecture (import-linter) CI job:

src.models is not allowed to import src.api:

-   src.models.foo -> src.api.routes (l.4)

Contracts: 0 kept, 2 broken.

There is no override flag.
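To make the layering rule concrete, here is a toy checker. It is not how import-linter works internally; it is a minimal sketch that assumes the layer order drawn in the diagram above. The names LAYER_RANK, layer_of, and upward_imports are hypothetical, and the sketch only flags imports that reach a strictly higher layer (it takes no position on whether the siblings src.api and src.eval may import each other).

```python
import ast
from typing import Optional

# Rank 0 is the top of the diagram; larger numbers sit lower in the graph.
# src.api and src.eval are siblings, so they share a rank.
LAYER_RANK = {
    "src.api": 0,
    "src.eval": 0,
    "src.agent": 1,
    "src.tools": 2,
    "src.data": 3,
    "src.observability": 4,
    "src.models": 5,
}


def layer_of(module: str) -> Optional[str]:
    """Return the layer package that contains `module`, if any."""
    for layer in LAYER_RANK:
        if module == layer or module.startswith(layer + "."):
            return layer
    return None


def upward_imports(importer: str, source: str) -> list[tuple[str, int]]:
    """List (imported module, line) pairs where `importer` reaches a higher layer."""
    own = layer_of(importer)
    if own is None:
        return []
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            targets = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            targets = [node.module]
        else:
            continue  # not an import, or a relative import this sketch ignores
        for target in targets:
            other = layer_of(target)
            if other is not None and LAYER_RANK[other] < LAYER_RANK[own]:
                found.append((target, node.lineno))
    return found


# The violation from the text: src.models importing from src.api.
print(upward_imports("src.models.foo", "from src.api.routes import router\n"))
```

An import in the allowed direction, such as src.api importing from src.models, returns an empty list.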


3. The numbered invariants — at a glance

| # | Rule | Where enforced |
|---|---|---|
| 1 | Every contract crossing a module seam forbids unknown keys | StrictModel in src/models/_base.py + tests/test_models.py |
| 2 | API endpoints live under /api/v1/ and return typed responses | Router prefix in src/api/routes.py + route-walk test in tests/test_route_versioning.py |
| 3 | Layer flow is one-way | import-linter contracts in pyproject.toml + the Architecture CI job |
| 4 | Coverage ≥ 75 % on src/ | [tool.coverage.report].fail_under = 75 + the Coverage CI job |
| 5 | No secret leaves the repo unscanned | Three-layer scan: .claude/hooks/pretooluse_bash.py → pre-commit gitleaks → Secret scan (gitleaks) CI job |

→ Full detail: docs/INVARIANTS.md. Slots 6+ are reserved for project-specific invariants the team adds as the domain stabilises; the Aspirational ticket cite gate enforces that any aspirational marker line cites a #NNN ticket.
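The ticket-cite rule is simple enough to sketch in a few lines. What follows is an illustration of the rule as stated here, not the repo's actual script: the function name uncited_aspirational_lines and the sample text are made up, and the closed-ticket check (which would need the GitHub API) is left out.

```python
import re

# A ticket cite is "#" followed by digits, for example "#42".
TICKET = re.compile(r"#\d+")


def uncited_aspirational_lines(text: str) -> list[int]:
    """Return 1-based numbers of lines that start with *Aspirational but cite no ticket."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line.startswith("*Aspirational") and not TICKET.search(line)
    ]


# Made-up sample content; the real docs/INVARIANTS.md will differ.
sample = (
    "*Aspirational: rate limiting, tracked in #42\n"
    "*Aspirational: audit log\n"
)
print(uncited_aspirational_lines(sample))
```

A gate built on this would exit non-zero whenever the returned list is non-empty.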


4. Pre-commit — the local checkpoint

Pre-commit runs on your machine before each commit is created. Failures here cost zero CI minutes.

→ Source of truth: .pre-commit-config.yaml.

| Hook | What it does |
|---|---|
| ruff (--fix) | Lints Python and auto-fixes what it can. |
| ruff-format | Applies the project's Python code style. |
| check-yaml / check-toml / check-json | Parses every YAML/TOML/JSON file you're committing. |
| check-merge-conflict | Refuses to commit a file with <<<<<<< markers. |
| check-added-large-files | Blocks any new file over 500 KB. |
| end-of-file-fixer / trailing-whitespace / mixed-line-ending | Hygiene. |
| gitleaks | Scans the diff for secrets. First of three independent scan layers. |
| commitizen | At commit-msg stage. Refuses commit messages that don't match <type>(<scope>): <subject> with one of the seven allowed types and a non-Title-Case subject. |
| mypy (--strict) | Type-checks all of src/ and tests/. |

Wire once with uv run pre-commit install --hook-type pre-commit --hook-type commit-msg.


5. CI gates — the cloud checkpoint

Every push and every PR triggers GitHub Actions. 21 required contexts must pass before merge.

→ Source of truth: .github/workflows/.

5.1 Code quality

| Job | What it does |
|---|---|
| Lint & Format | ruff check . + ruff format --check .. Zero tolerance. |
| Type Check | mypy --strict src/ tests/. |
| Pre-commit | Re-runs pre-commit run --all-files in CI. |
| Lint PR title | Conventional-commit prefix + lowercase-or-initialism subject. |
| Commit-type sync | Asserts commitizen schema and pr-title.yml agree on both the type allowlist and the subject-case constraint. |
| Branch-protection contexts sync | Asserts every workflow job is listed in .github/branch-protection/{develop,main}.json (or in the script's EXEMPT_WORKFLOWS map). |
| Version bump check | pyproject.toml [project].version differs from base; uv.lock self-version matches. release: PRs exempt. |
| Tests required | feat:/fix: PRs that touch src/ must touch tests/. Other prefixes get a warn-only. |
| File length | *.py files in src/, tests/, eval/, .github/scripts/ capped at 300 lines. Function caps via ruff PLR0915/PLR0912 (50 stmts / 12 branches). |
| src/ README audit | Every src/ package with code has a README ≥ 200 bytes containing ## Key interfaces (or ## Public surface). |
| Aspirational ticket cite | Lines starting with *Aspirational in docs/INVARIANTS.md must cite at least one #NNN ticket; closed cites warn (or fail under ASPIRATIONAL_STRICT=1). |

5.2 Correctness

| Job | What it does |
|---|---|
| Unit tests | pytest tests/ -m "not integration" for fast feedback. Includes doc-vs-code drift backstops (test_route_versioning, test_otel_semconv, etc.). |
| Coverage | pytest tests/ --cov=src with fail_under = 75. |

5.3 Architecture

| Job | What it does |
|---|---|
| Architecture (import-linter) | Runs import-linter against the two contracts in pyproject.toml. |
| Frontend Build | npm run build from frontend/. |
| Frontend Quality | eslint (max-warnings 0) + prettier --check + tsc --noEmit + vitest. |

5.4 Security

| Job | What it does |
|---|---|
| Secret scan (gitleaks) | Repo-wide secret scan on every push and PR. Third independent layer. |
| Python deps (pip-audit) | CVE scan against uv.lock. Per-CVE ignore list at .github/security/pip-audit-ignore.txt. |
| Frontend deps (npm audit) | --audit-level=high. |
| Container image scan (trivy) | OS- and library-level CVE scan on the built image. |
| Action pinning audit | Walks every .github/workflows/*.yml. First-party = major tag; astral-sh/setup-uv = patch tag; third-party = SHA + # vN.M.P comment. |
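The three-bucket pinning policy can be expressed as a per-line check. This is a sketch under stated assumptions, not the audit script itself: it guesses that "first-party" means the actions/ and github/ owners, treats a SHA as 40 lowercase hex characters, and uses the hypothetical names pin_ok and FIRST_PARTY_OWNERS. The example lines are invented inputs.

```python
import re

USES = re.compile(
    r"uses:\s*(?P<action>[^@\s]+)@(?P<ref>\S+)(?:\s+#\s*(?P<comment>.*))?"
)
MAJOR_TAG = re.compile(r"v\d+")
PATCH_TAG = re.compile(r"v\d+\.\d+\.\d+")
FULL_SHA = re.compile(r"[0-9a-f]{40}")

# Assumption: which owners count as first-party is not stated in this primer.
FIRST_PARTY_OWNERS = ("actions/", "github/")


def pin_ok(line: str) -> bool:
    """Check one workflow line against the bucket policy described above."""
    match = USES.search(line)
    if match is None:
        return True  # no owner/name@ref form on this line; the sketch does not judge it
    action, ref = match["action"], match["ref"]
    comment = (match["comment"] or "").strip()
    if action.startswith(FIRST_PARTY_OWNERS):
        return MAJOR_TAG.fullmatch(ref) is not None
    if action == "astral-sh/setup-uv":
        return PATCH_TAG.fullmatch(ref) is not None
    # Third-party: full commit SHA plus a "# vN.M.P" comment naming the version.
    return FULL_SHA.fullmatch(ref) is not None and PATCH_TAG.fullmatch(comment) is not None


print(pin_ok("      - uses: actions/checkout@v4"))
print(pin_ok("      - uses: example-org/example-action@v1"))
```

The second line fails under this reading because a third-party action is pinned by tag rather than by SHA.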

5.5 Operations (workflows on a schedule / trigger)

| Workflow | When | What it does |
|---|---|---|
| branch-protection.yml | Weekly + dispatch | Re-applies .github/branch-protection/*.json via gh api. Drift re-assertion. |
| eval-nightly.yml | workflow_dispatch only by default | Runs pytest eval/ against the configured LLM. Documented opt-in for schedule:. |
| artifact-cleanup.yml | Weekly + dispatch | Prunes Actions artifacts older than 7 days. |
| release-drafter.yml | Push to main + PR label events | Updates a draft GitHub Release under v$RESOLVED_VERSION. |
| release.yml | On tag v*.*.* | Builds the image, pushes to GHCR, generates a CycloneDX SBOM, publishes the release. |
| changelog-rollup.yml | After release.yml succeeds + dispatch | Opens a chore: roll up CHANGELOG … PR against develop that moves [Unreleased] entries under the released version's heading and bumps pyproject.toml + uv.lock PATCH. |
| codeql.yml | workflow_dispatch only (placeholder) | Static analysis. Activate when the repo is public or has GHAS. |

6. Agent-level hooks — opt-in per developer

.claude/hooks/ scripts run around the LLM agent's own actions: pre-commit and CI run when code moves; hooks run when the agent moves.

→ Source of truth: .claude/hooks/ (3 Python scripts) + .claude/settings.local.json.example.

| Script | Purpose |
|---|---|
| pretooluse_bash.py | (1) Refuses Bash commands containing bypass flags (git --no-verify, --no-hooks, --no-gpg-sign); (2) on git commit, scans the staged diff for AWS/sk-/ghp_/PEM/Slack patterns and blocks if any match; (3) appends every Bash invocation to .claude/bash-log.txt (gitignored) for forensics. |
| posttooluse_writeedit.py | After every Write/Edit, dispatches the right formatter: ruff for .py, prettier for .ts/.tsx/.js/.jsx/.css/.json/.html/.md. |
| sessionstart.py | At session start, injects current branch + git status --short as additionalContext so the agent is grounded in the actual repo state. |

Wire once: cp .claude/settings.local.json.example .claude/settings.local.json.

Until you wire it, these scripts do not police your local agent's actions. Pre-commit and CI still run, so nothing slips through to main.
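For a feel of what the first two duties of pretooluse_bash.py involve, here is a minimal sketch. It is not the repo's script: the regexes are illustrative guesses at the pattern families named above, the names bypass_flags_in and secrets_in are hypothetical, and the wiring that receives the command from the agent and blocks it is left out.

```python
import re
import shlex

BYPASS_FLAGS = {"--no-verify", "--no-hooks", "--no-gpg-sign"}

# Illustrative patterns only; the real script's patterns may differ.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "sk_token": re.compile(r"sk-[A-Za-z0-9]{20,}"),
    "github_pat": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "pem_private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "slack_token": re.compile(r"xox[baprs]-[A-Za-z0-9-]{10,}"),
}


def bypass_flags_in(command: str) -> list[str]:
    """Return any bypass flags that appear as separate tokens in a shell command."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        tokens = command.split()  # unbalanced quotes: fall back to a plain split
    return [token for token in tokens if token in BYPASS_FLAGS]


def secrets_in(diff: str) -> list[str]:
    """Return the names of the pattern families that match somewhere in a diff."""
    return sorted(name for name, pattern in SECRET_PATTERNS.items() if pattern.search(diff))


print(bypass_flags_in("git commit --no-verify -m 'wip'"))
print(secrets_in("+token = 'ghp_" + "a" * 36 + "'"))
```

A hook built on these helpers would refuse the command whenever either list is non-empty.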


7. Skills — guidance, not enforcement

A skill is a markdown briefing the agent loads when its topic matches the current task. Skills shape style; they reject nothing.

→ Source of truth: .claude/skills/.

| Skill | When it activates |
|---|---|
| architect | Architecture decisions, module boundaries, tech-stack choices, API contracts. |
| code-reviewer | After code is written or edited. 10-point review checklist. |
| devops | Docker, docker-compose, CI/CD, pyproject.toml, observability config. |
| frontend | React/TS work in frontend/. |
| qa-engineer | Tests, eval harness, golden-dataset changes. |
| technical-writer | Docs, READMEs, inline documentation. |

A skill is a librarian who hands you a relevant book before you start; a hook is a gate guard who refuses to let you leave with the wrong book.


8. The evaluation harness — a one-page summary

Distinct from the build harness (everything above), the evaluation harness tests whether the agent answers questions correctly.

→ Source of truth: docs/EVAL_HARNESS.md, with the dataset in eval/golden_qa.json and the runner in eval/test_golden_qa.py.

Shape:

  • A golden dataset of question / expected-answer pairs (one trivial echo case ships; replace with your domain dataset).
  • Three tolerance modes: exact_match, numeric_close (within 1 %), semantic_similar (LLM judge ≥ 0.8).
  • Provider-agnostic — wire your concrete LLM client via the LLMClient Protocol in src/eval/judge.py.
  • Disabled-by-default nightly (eval-nightly.yml) — workflow_dispatch only, opt-in to schedule: after configuring secrets.
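The three tolerance modes in the list above can be sketched as one comparison function. This is an illustration under assumptions, not the runner in eval/test_golden_qa.py: the Judge signature is hypothetical (the real LLMClient Protocol is not shown in this primer), exact_match is taken to mean plain string equality, and "within 1 %" is read as relative to the expected value.

```python
from typing import Callable, Optional

# Hypothetical judge: (expected, actual) -> similarity score between 0 and 1.
Judge = Callable[[str, str], float]


def answers_match(
    mode: str, expected: str, actual: str, judge: Optional[Judge] = None
) -> bool:
    """Compare an agent answer to the golden answer under one tolerance mode."""
    if mode == "exact_match":
        return actual == expected
    if mode == "numeric_close":
        want, got = float(expected), float(actual)
        return abs(got - want) <= 0.01 * abs(want)  # within 1 % of the expected value
    if mode == "semantic_similar":
        if judge is None:
            raise ValueError("semantic_similar needs a judge")
        return judge(expected, actual) >= 0.8
    raise ValueError(f"unknown tolerance mode: {mode}")


print(answers_match("numeric_close", "100", "100.9"))
print(answers_match("numeric_close", "100", "102"))
```

Passing the judge in as a plain callable keeps the comparison testable without a live LLM; a stub that returns a fixed score is enough for unit tests.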

9. Process-as-code

| Concern | File / mechanism |
|---|---|
| PR template | .github/pull_request_template.md. |
| Issue templates | .github/ISSUE_TEMPLATE/: bug.md, feature.md, eval-regression.md. Blank issues disabled. |
| Optional Beads queue | docs/BEADS.md: GitHub Issues remain canonical while Beads can track local ready/blocked execution. |
| Code ownership | .github/CODEOWNERS. |
| Branch protection | .github/branch-protection/{main,develop}.json declarative configs, re-applied weekly by branch-protection.yml. |
| Commit message shape | Commitizen, configured in pyproject.toml. |
| Branching model | feat/* → develop → main. No direct commits. |

The 7 allowed conventional-commit prefixes:

feat | fix | docs | test | refactor | chore | release

release: is project-specific — develop → main release PRs only.
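The commit and PR-title shape can be approximated with one regular expression plus a case check. This is a sketch of the rule as this primer states it, not the commitizen schema itself: it assumes the scope is optional and that "lowercase-or-initialism" applies to the first word of the subject. The names HEADER and header_ok are hypothetical.

```python
import re

ALLOWED_TYPES = ("feat", "fix", "docs", "test", "refactor", "chore", "release")

# <type>(<scope>): <subject>, with the scope treated as optional (an assumption).
HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()]+)\))?: (?P<subject>\S.*)$")


def header_ok(header: str) -> bool:
    """Check a commit or PR title against the shape described above."""
    match = HEADER.match(header)
    if match is None or match["type"] not in ALLOWED_TYPES:
        return False
    first_word = match["subject"].split()[0]
    # Accept a lowercase start, or an all-caps initialism such as "CI".
    return first_word[0].islower() or (len(first_word) > 1 and first_word.isupper())


for title in ("feat(api): add health endpoint", "fix: CI cache key", "feat: Add health endpoint"):
    print(title, "->", header_ok(title))
```

The third title is rejected because its subject starts in Title Case.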


10. Defence in depth — two worked examples

10.1 A secret is accidentally committed

| Layer | What happens |
|---|---|
| 1. Pre-commit gitleaks | Local hook refuses the commit; never makes a commit object. |
| 2. Agent-level secret-scan hook (opt-in) | If the agent runs git commit, pretooluse_bash.py scans the staged diff itself and blocks. |
| 3. CI Secret scan (gitleaks) | If layers 1–2 were bypassed, the GitHub Action runs on push and fails the PR. |
| 4. Manual review via CODEOWNERS | Reviewers see the diff. |

For the secret to land on main, all four have to fail or be bypassed on the same change.

10.2 A boundary violation is introduced

A developer adds from src.api.routes import router inside src/models/foo.py.

| Layer | What happens |
|---|---|
| 1. Local lint-imports | If the developer runs just architecture (or has it in their editor), they see the error at their desk. |
| 2. CI Architecture (import-linter) | Runs on every push. import-linter reports the forbidden chain and exits non-zero. PR cannot merge. |

The error names the offending module, line, and contract — no guessing.


11. Operating within the harness — a survival guide

| Error you see | What it means | What to do |
|---|---|---|
| pydantic_core ... extra_forbidden | A contract got a field name it doesn't recognise. Often a typo. | Open the model under src/models/. Fix the field name. |
| CI Lint & Format failed | Ruff caught a style or correctness issue. | uv run ruff check . --fix locally. |
| CI Type Check failed | mypy strict caught an untyped function or unsafe coercion. | The error names file + line. Add the missing annotation. |
| CI Architecture (import-linter) failed | An import crossed a layer boundary. | Read the chain. Move the code or delete the import. |
| CI Lint PR title failed | Title doesn't start with one of the 7 prefixes, or starts with Title Case. | Edit the PR title. Lowercase verb after the colon, or all-caps initialism. |
| CI Version bump check failed | pyproject.toml [project].version is unchanged from the base branch. | Bump pyproject.toml + the matching [[package]] block in uv.lock. |
| CI Tests required failed | feat:/fix: PR touched src/ without touching tests/. | Add a test. If genuinely test-exempt, use chore: / refactor: instead. |
| CI File length failed | A .py file > 300 lines or a function > 50 stmts / 12 branches. | Split. There is no # noqa exemption. |
| CI src/ README audit failed | A package missing README, < 200 bytes, or missing ## Key interfaces. | Add or extend the README. |
| CI Aspirational ticket cite failed | A *Aspirational line in docs/INVARIANTS.md cites no ticket. | File the ticket and add the cite, or remove the marker line if the rule is now enforced. |
| CI Action pinning audit failed | A uses: line doesn't match the bucket policy. | First-party = major tag; setup-uv = patch tag; third-party = SHA + # vN.M.P comment. |
| Local commit rejected by commitizen | Commit message doesn't match <type>(<scope>): <subject>. | Re-run with the correct shape. |
| Coverage failed | Test coverage on src/ dropped below 75 %. | Add tests for the lines flagged in the report. |

Rule of thumb. If a check rejects a change, the rejection message names the file, the line, and the rule. Read it carefully before reaching for an override flag — there is almost always a real bug under it.


12. Setup checklist for your harness maintainer

  • Install dependencies. uv sync --extra dev.
  • Wire pre-commit. uv run pre-commit install --hook-type pre-commit --hook-type commit-msg.
  • (Opt-in) Wire agent-level hooks. cp .claude/settings.local.json.example .claude/settings.local.json.
  • Verify CI is green on a no-op PR before merging real work.
  • Apply branch protection by setting a BRANCH_PROTECTION_TOKEN secret with admin:repo scope and triggering branch-protection.yml. The default GITHUB_TOKEN cannot edit branch protection on the repo it runs in.
  • (Opt-in) Set LLM_* secrets for the eval harness when you flip eval-nightly.yml to schedule:.
  • (Opt-in) Provision RELEASE_BOT_TOKEN so changelog-rollup.yml's auto-PR triggers its own CI matrix on creation. Falls back to GITHUB_TOKEN (auto-PR opens, only the auto-CI is lost).

13. Glossary

| Term | What it is |
|---|---|
| Pydantic | Python library for runtime data validation. The basis of every contract in this repo. |
| mypy | Static type checker. Strict mode refuses missing/implicit annotations. |
| ruff | Fast Rust-based Python linter and formatter. Catches style + dead code + security smells. |
| import-linter | Checks the import graph against declared contracts (layered, forbidden). |
| pre-commit | Framework that registers itself as a git hook so configured checks run before every commit. |
| GitHub Actions | GitHub's built-in CI. Workflows under .github/workflows/ run on triggers. |
| Conventional commits | <type>(<scope>): <subject> shape with a fixed type set (this repo allows 7). |
| Semantic versioning | MAJOR.MINOR.PATCH; each component conveys the kind of change. |
| OpenTelemetry (OTel) | Vendor-neutral standard for traces, metrics, logs. The repo follows gen_ai.* and db.* semantic conventions for attribute names. |
| CycloneDX | An SBOM format. Generated per release and attached to the GitHub Release. |
| gitleaks | Pattern-based secret scanner. |
| Beads | Optional local issue queue used for dependency-aware execution and handoffs; GitHub Issues remain canonical. |

14. Where to go next

| Doc | What it covers |
|---|---|
| HARNESS.md | The engineer-facing map: same surface, terser, more linkable. |
| INVARIANTS.md | The numbered rules in full. |
| BOUNDARIES.md | The dependency graph + the layer rules. |
| ARCHITECTURE.md | The system design: components, request flow. |
| SECURITY.md | Threat model + defence-in-depth mapping. |
| EVAL_HARNESS.md | The eval flywheel. |
| BEADS.md | Optional local Beads queue layered under GitHub Issues. |
| DEVELOPMENT.md | Local setup, branching, releases. |