From 7819adf20a6c904490c92702a8693e117c0c639e Mon Sep 17 00:00:00 2001
From: nullhack <nullhack@users.noreply.github.com>
Date: Sun, 19 Apr 2026 16:35:18 -0400
Subject: [PATCH 1/2] chore(workflow): integrate pytest-beehave and clean up
 @id conventions

- Add pytest-beehave[html]>=3.0 to dev deps; configure [tool.beehave]
- Fix test-build task: --cov=pytest_beehave -> --cov=app
- Remove manual deprecated skip hook from conftest.py (beehave owns it)
- Naming: feature-stem for .feature paths, feature_slug for test dirs
- @id tags now auto-assigned on first pytest run; remove all manual generation instructions
- Test stubs auto-generated at Step 2 end via test-fast; remove manual stub section
- Add pytest-beehave to README tooling table; add Why section and auto-gen stub example
- Update product-owner.md, AGENTS.md, and all affected skills accordingly
---
 .opencode/agents/product-owner.md          | 13 ++--
 .opencode/skills/implementation/SKILL.md   | 36 +++++------
 .opencode/skills/living-docs/SKILL.md      |  2 +-
 .opencode/skills/pr-management/SKILL.md    |  4 +-
 .opencode/skills/refactor/SKILL.md         |  6 +-
 .opencode/skills/scope/SKILL.md            | 23 +++----
 .opencode/skills/session-workflow/SKILL.md | 26 ++++----
 .opencode/skills/verify/SKILL.md           |  2 +-
 AGENTS.md                                  | 26 ++++----
 README.md                                  | 71 +++++++++++++++-------
 docs/architecture.md                       |  6 +-
 docs/discovery.md                          |  2 +-
 docs/discovery_journal.md                  |  2 +-
 pyproject.toml                             |  6 +-
 tests/conftest.py                          |  7 ---
 uv.lock                                    | 20 ++++++
 16 files changed, 142 insertions(+), 110 deletions(-)
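
Note on the naming split used throughout this patch: `<feature-stem>` is the `.feature` file stem as written (hyphens kept), while `<feature_slug>` is that stem lowercased with hyphens replaced by underscores, and `<rule_slug>` is the `Rule:` title in lowercase with underscores. The sketch below illustrates that mapping only; the helper names are hypothetical, and the exact slugification performed by pytest-beehave (for example, how punctuation in a Rule title is handled) is an assumption rather than something this patch confirms.

```python
import re
from pathlib import Path


def feature_slug(feature_path: str) -> str:
    """Feature file stem, lowercased, hyphens replaced by underscores."""
    return Path(feature_path).stem.replace("-", "_").lower()


def rule_slug(rule_title: str) -> str:
    """Rule title lowercased, with runs of other characters collapsed to '_'.

    Assumption: the plugin may treat punctuation differently.
    """
    return re.sub(r"[^a-z0-9]+", "_", rule_title.lower()).strip("_")


def stub_path(feature_path: str, rule_title: str) -> Path:
    """Expected location of the generated stub file for one Rule: block."""
    return (
        Path("tests/features")
        / feature_slug(feature_path)
        / f"{rule_slug(rule_title)}_test.py"
    )
```

For a feature file named `display-version.feature` containing a rule titled "Version display", this would point at `tests/features/display_version/version_display_test.py`.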
diff --git a/.opencode/agents/product-owner.md b/.opencode/agents/product-owner.md
index 456c339..403da3c 100644
--- a/.opencode/agents/product-owner.md
+++ b/.opencode/agents/product-owner.md
@@ -51,19 +51,18 @@ When a gap is reported (by software-engineer or reviewer):
 
 | Situation | Action |
 |---|---|
-| Edge case within current user stories | Add a new Example with a new `@id` to the relevant `.feature` file. |
+| Edge case within current user stories | Add a new Example to the relevant `.feature` file. |
 | New behavior beyond current stories | Add to backlog as a new feature. Do not extend the current feature. |
-| Behavior contradicts an existing Example | Write a new Example with new `@id`. |
-| Post-merge defect | Move the `.feature` file back to `in-progress/`, add new Example with `@id`, resume at Step 3. |
+| Behavior contradicts an existing Example | Add `@deprecated` to the old Example; write a new Example. |
+| Post-merge defect | Move the `.feature` file back to `in-progress/`, add new Example, resume at Step 3. |
 
 ## Bug Handling
 
 When a defect is reported against any feature:
 
-1. Add a `@bug @id:<new-8-char-hex>` Example to the relevant `Rule:` block in the `.feature` file.
-2. Write the Example using the standard `Given/When/Then` format describing the correct behavior.
-3. Update TODO.md to note the new `@id` for the SE to implement.
-4. SE implements the `@id` test in `tests/features/` **and** a `@given` Hypothesis property test in `tests/unit/`. Both are required.
+1. Add a `@bug` Example to the relevant `Rule:` block in the `.feature` file using the standard `Given/When/Then` format describing the correct behavior.
+2. Update TODO.md to note the new bug Example for the SE to implement.
+3. SE implements the test in `tests/features/` **and** a `@given` Hypothesis property test in `tests/unit/`. Both are required.
 
 ## Available Skills
 
diff --git a/.opencode/skills/implementation/SKILL.md b/.opencode/skills/implementation/SKILL.md
index 8fbf5fd..826c6b7 100644
--- a/.opencode/skills/implementation/SKILL.md
+++ b/.opencode/skills/implementation/SKILL.md
@@ -116,12 +116,12 @@ Place stubs where responsibility dictates — do not pre-create `ports/` or `ada
 Append a new dated block to `docs/architecture.md` for each significant decision:
 
 ```markdown
-## YYYY-MM-DD — <feature-name>: <short title>
+## YYYY-MM-DD — <feature-stem>: <short title>
 
 Decision: <what was decided>
 Reason: <why, one sentence>
 Alternatives considered: <what was rejected and why>
-Feature: <feature-name>
+Feature: <feature-stem>
 ```
 
 Only write a block for non-obvious decisions with meaningful trade-offs. Routine YAGNI choices do not need a record.
@@ -141,7 +141,11 @@ Apply to the stub files just written:
 
 If any check fails: fix the stub files before committing.
 
-Commit: `feat(<feature-name>): add architecture stubs`
+### Generate Test Stubs
+
+Run `uv run task test-fast` once. It reads the in-progress `.feature` file, assigns `@id` tags to any untagged `Example:` blocks (writing them back to the `.feature` file), and generates `tests/features/<feature_slug>/<rule_slug>_test.py` — one file per `Rule:` block, one skipped function per `@id`. Verify the files were created, then stage all changes (including any `@id` write-backs to the `.feature` file).
+
+Commit: `feat(<feature-stem>): add architecture and test stubs`
 
 ---
 
@@ -152,26 +156,14 @@ Commit: `feat(<feature-name>): add architecture stubs`
 - [ ] Exactly one .feature `in_progress`. If not present, Load `skill feature-selection` 
 - [ ] Architecture stubs present in `<package>/` (committed by Step 2)
 - [ ] Read `docs/architecture.md` — understand all architectural decisions before writing any test
-- [ ] Test stub files exist in `tests/features/<feature-name>/<rule_slug>_test.py` — one file per `Rule:` block, all `@id` stub functions present with `@pytest.mark.skip`; if missing, write them now before entering RED
-
-### Write Test Stubs (if not present)
-
-For each `Rule:` block in the in-progress `.feature` file, create `tests/features/<feature-name>/<rule_slug>_test.py` if it does not already exist. Write one function per `@id` Example, all skipped:
-
-```python
-@pytest.mark.skip(reason="not yet implemented")
-def test_<feature_slug>_<@id>() -> None:
-    """
-    <@id steps raw text including new lines>
-    """
-```
+- [ ] Test stub files exist in `tests/features/<feature_slug>/<rule_slug>_test.py` — generated by pytest-beehave at Step 2 end; if missing, re-run `uv run task test-fast` and commit the generated files before entering RED
 
 ### Build TODO.md Test List
 
 1. List all `@id` tags from in-progress `.feature` file
 2. Order: fewest dependencies first; most impactful within that set
 3. Each `@id` = one TODO item, status: `pending`
-4. Confirm each `@id` has a corresponding skipped stub in `tests/features/<feature-name>/` — if any are missing, add them before proceeding
+4. Confirm each `@id` has a corresponding skipped stub in `tests/features/<feature_slug>/` — if any are missing, re-run `uv run task test-fast` to regenerate them before proceeding
 
 ### Outer Loop — One @id at a time
 
@@ -182,7 +174,7 @@ For each pending `@id`:
 ```
 INNER LOOP
 ├── RED
-│   ├── Confirm stub for this @id exists in tests/features/<feature-name>/<rule_slug>.feature with @pytest.mark.skip
+│   ├── Confirm stub for this @id exists in tests/features/<feature_slug>/<rule_slug>_test.py with @pytest.mark.skip
 │   ├── Read existing stubs in `<package>/` — base the test on the current data model and signatures
 │   ├── Write test body (Given/When/Then → Arrange/Act/Assert); remove @pytest.mark.skip
 │   ├── Update <package> stub signatures as needed — edit the `.py` file directly
@@ -265,11 +257,11 @@ Signal completion to the reviewer. Provide:
 ### Test File Layout
 
 ```
-tests/features/<feature-name>/<rule_slug>_test.py
+tests/features/<feature_slug>/<rule_slug>_test.py
 ```
 
-- `<feature-name>` = the `.feature` file stem
-- `<rule_slug>` = the `Rule:` title slugified
+- `<feature_slug>` = the `.feature` file stem with hyphens replaced by underscores, lowercase
+- `<rule_slug>` = the `Rule:` title slugified (lowercase, underscores)
 
 ### Function Naming
 
@@ -299,7 +291,7 @@ def test_<feature_slug>_<@id>() -> None:
 ### Markers
 
 - `@pytest.mark.slow` — takes > 50ms (Hypothesis, DB, network, terminal I/O)
-- `@pytest.mark.deprecated` — auto-skipped by conftest; used for superseded Examples
+- `@pytest.mark.deprecated` — auto-skipped by pytest-beehave; used for superseded Examples
 
 ```python
 @pytest.mark.deprecated
diff --git a/.opencode/skills/living-docs/SKILL.md b/.opencode/skills/living-docs/SKILL.md
index ba2264b..8472547 100644
--- a/.opencode/skills/living-docs/SKILL.md
+++ b/.opencode/skills/living-docs/SKILL.md
@@ -188,7 +188,7 @@ If `docs/glossary.md` already exists:
 **When run standalone** (stakeholder on demand): commit after all diagrams and glossary are updated:
 
 ```
-docs(living-docs): update C4 and glossary after <feature-name>
+docs(living-docs): update C4 and glossary after <feature-stem>
 ```
 
 If triggered without a specific feature (general refresh):
diff --git a/.opencode/skills/pr-management/SKILL.md b/.opencode/skills/pr-management/SKILL.md
index f10605c..94af430 100644
--- a/.opencode/skills/pr-management/SKILL.md
+++ b/.opencode/skills/pr-management/SKILL.md
@@ -14,7 +14,7 @@ Create and manage pull requests after the reviewer approves the feature (Step 5)
 ## Branch Naming
 
 ```
-feature/<feature-name>    # new feature
+feature/<feature-stem>    # new feature
 fix/<issue-description>   # bug fix
 refactor/<scope>          # refactoring
 docs/<scope>              # documentation
@@ -42,7 +42,7 @@ git commit -m "chore(deps): add python-dotenv dependency"
 
 ```bash
 # Push branch
-git push -u origin feature/<feature-name>
+git push -u origin feature/<feature-stem>
 
 # Create PR
 gh pr create \
diff --git a/.opencode/skills/refactor/SKILL.md b/.opencode/skills/refactor/SKILL.md
index 6d84a2e..208d12d 100644
--- a/.opencode/skills/refactor/SKILL.md
+++ b/.opencode/skills/refactor/SKILL.md
@@ -265,9 +265,9 @@ Refactoring commits are always **separate** from feature commits.
 
 | Commit type | Message format | When |
 |---|---|---|
-| Preparatory refactoring | `refactor(<feature-name>): <what>` | Before RED, to make the feature easier |
-| REFACTOR phase | `refactor(<feature-name>): <what>` | After GREEN, cleaning up the green code |
-| Feature addition | `feat(<feature-name>): <what>` | After GREEN (never mixed with refactor) |
+| Preparatory refactoring | `refactor(<feature-stem>): <what>` | Before RED, to make the feature easier |
+| REFACTOR phase | `refactor(<feature-stem>): <what>` | After GREEN, cleaning up the green code |
+| Feature addition | `feat(<feature-stem>): <what>` | After GREEN (never mixed with refactor) |
 
 Never mix a structural cleanup with a behavior addition in one commit. This keeps history bisectable and CI green at every commit.
 
diff --git a/.opencode/skills/scope/SKILL.md b/.opencode/skills/scope/SKILL.md
index af0de0f..cd02bd5 100644
--- a/.opencode/skills/scope/SKILL.md
+++ b/.opencode/skills/scope/SKILL.md
@@ -130,7 +130,7 @@ Append all answered Q&A to `docs/discovery_journal.md`, in groups (general, cros
 Group headers use this format:
 - General group: `### General`
 - Cross-cutting group: `### <Group Name>`
-- Feature group: `### Feature: <feature-name>`
+- Feature group: `### Feature: <feature-stem>`
 
 **Step B — Update .feature descriptions**
 
@@ -216,7 +216,7 @@ Avoid: "As the system, I want..." (no business value). Break down stories that c
 - [ ] Rules collectively cover all entities in scope from the feature description
 - [ ] Every Rule passes the INVEST gate
 
-Commit: `feat(stories): write user stories for <name>`
+Commit: `feat(stories): write user stories for <feature-stem>`
 
 ### Step B — Criteria
 
@@ -244,7 +244,6 @@ All Rules must have their pre-mortems completed before any Examples are written.
 ```
 
 **Rules**:
-- `@id` tag on the line before `Example:`
 - `Example:` keyword (not `Scenario:`)
 - `Given/When/Then` in plain English
 - `Then` must be a single, observable, measurable outcome — no "and"
@@ -271,7 +270,6 @@ All Rules must have their pre-mortems completed before any Examples are written.
 
 **Review checklist:**
 - [ ] Every `Rule:` block has at least one Example
-- [ ] Every `@id` is unique within this feature
 - [ ] Every Example has `Given/When/Then`
 - [ ] Every `Then` is a single, observable, measurable outcome
 - [ ] No Example tests implementation details
@@ -291,15 +289,14 @@ Communicate verbally to the next agent. Every `DISAGREE` is a **hard blocker** 
 - No impl details: no Example tests internal state or implementation — AGREE/DISAGREE | file:line
 - Coverage: every entity in the feature description appears in at least one Rule — AGREE/DISAGREE | missing:
 - Distinct: no two Examples test the same observable behavior — AGREE/DISAGREE | file:line
-- Unique IDs: all @id values are unique within this feature — AGREE/DISAGREE
 - Pre-mortem: I ran a pre-mortem on each Rule and found no hidden failure modes — AGREE/DISAGREE | Rule:
 - Scope: no Example introduces behavior outside the feature boundary — AGREE/DISAGREE | file:line
 
-Commit: `feat(criteria): write acceptance criteria for <name>`
+Commit: `feat(criteria): write acceptance criteria for <feature-stem>`
 
 **After this commit, `Example:` blocks are frozen.** Any change requires:
 1. Add `@deprecated` tag to the old Example
-2. Write a new Example with a new `@id`
+2. Write a new Example (the `@id` tag will be assigned automatically)
 
 ---
 
@@ -310,14 +307,14 @@ When a defect is reported against a completed or in-progress feature:
 1. **PO** adds a new Example to the relevant `Rule:` block in the `.feature` file:
 
    ```gherkin
-   @bug @id:<new-8-char-hex>
+   @bug
    Example: <what the bug is>
      Given <conditions that trigger the bug>
      When <action>
      Then <correct behavior>
    ```
 
-2. **SE** implements the specific test in `tests/features/<feature-name>/` (the `@id` test).
+2. **SE** implements the specific test in `tests/features/<feature_slug>/` (the `@id` test).
 3. **SE** also writes a `@given` Hypothesis property test in `tests/unit/` covering the whole class of inputs that triggered the bug — not just the single case.
 4. Both tests are required — neither is optional.
 5. SE follows the normal TDD loop (Step 3) for the new `@id`.
@@ -404,7 +401,7 @@ Status: IN-PROGRESS
 |----|----------|--------|
 | Q8 | ... | ... |
 
-### Feature: <feature-name>
+### Feature: <feature-stem>
 
 | ID | Question | Answer |
 |----|----------|--------|
@@ -435,7 +432,7 @@ success/failure conditions, and out-of-scope boundaries.>
 (First session only. Omit this subsection in subsequent sessions.)
 
 ### Feature List
-- `<feature-name>` — <one-sentence description of what changed or was added>
+- `<feature-stem>` — <one-sentence description of what changed or was added>
 (Write "No changes" if no features were added or modified this session.)
 
 ### Domain Model
@@ -459,12 +456,12 @@ Rules:
 
 ---
 
-## YYYY-MM-DD — <feature-name>: <short title>
+## YYYY-MM-DD — <feature-stem>: <short title>
 
 Decision: <what was decided — one sentence>
 Reason: <why — one sentence>
 Alternatives considered: <what was rejected and why>
-Feature: <feature-name>
+Feature: <feature-stem>
 ```
 
 Rules: Append-only. When a decision changes, append a new block that supersedes the old one. Cross-feature decisions use `Cross-feature:` in the header. Only write a block for non-obvious decisions with meaningful trade-offs.
diff --git a/.opencode/skills/session-workflow/SKILL.md b/.opencode/skills/session-workflow/SKILL.md
index ab86736..0281f2c 100644
--- a/.opencode/skills/session-workflow/SKILL.md
+++ b/.opencode/skills/session-workflow/SKILL.md
@@ -24,7 +24,7 @@ Every session starts by reading state. Every session ends by writing state. This
 2. **If you are the PO** and Step 1 (SCOPE) is active: check `docs/discovery_journal.md` for the most recent session block.
    - If the most recent block has `Status: IN-PROGRESS` → the previous session was interrupted. Resume it before starting a new session: finish updating `.feature` files and `docs/discovery.md`, then mark the block `Status: COMPLETE`.
 3. If a feature is active at Step 2–5, read:
-   - `docs/features/in-progress/<name>.feature` — feature file (Rules + Examples + @id)
+   - `docs/features/in-progress/<feature-stem>.feature` — feature file (Rules + Examples + @id)
    - `docs/discovery.md` — project-level synthesis changelog (for context)
 4. Run `git status` — understand what is committed vs. what is not
 5. Confirm scope: you are working on exactly one step of one feature
@@ -43,7 +43,7 @@ Every session starts by reading state. Every session ends by writing state. This
 2. Commit any uncommitted work (even WIP):
    ```bash
    git add -A
-   git commit -m "WIP(<feature-name>): <what was done>"
+   git commit -m "WIP(<feature-stem>): <what was done>"
    ```
 3. If a step is fully complete, use the proper commit message instead of WIP.
 
@@ -55,7 +55,7 @@ When a step completes within a session:
 2. Commit the TODO.md update:
    ```bash
    git add TODO.md
-   git commit -m "chore: complete step <N> for <feature-name>"
+   git commit -m "chore: complete step <N> for <feature-stem>"
    ```
 3. Only then begin the next step (in a new session where possible — see Rule 4).
 
@@ -64,9 +64,9 @@ When a step completes within a session:
 ```markdown
 # Current Work
 
-Feature: <name>
+Feature: <feature-stem>
 Step: <1-5> (<step name>)
-Source: docs/features/in-progress/<name>.feature
+Source: docs/features/in-progress/<feature-stem>.feature
 
 ## Progress
 - [x] `@id:<hex>`: <description>
@@ -79,15 +79,15 @@ Run @<agent-name> — <one concrete action>
 
 **"Next" line format**: Always prefix with `Run @<agent-name>` so the human knows exactly which agent to invoke. Agent names are defined in `AGENTS.md` — use the name exactly as listed there. Examples:
 - `Run @<software-engineer-agent> — implement @id:a1b2c3d4 (Step 3 RED)`
-- `Run @<software-engineer-agent> — load skill implementation and begin Step 2 (Architecture) for <feature-name>`
-- `Run @<reviewer-agent> — verify feature <feature-name> at Step 4`
+- `Run @<software-engineer-agent> — load skill implementation and begin Step 2 (Architecture) for <feature-stem>`
+- `Run @<reviewer-agent> — verify feature <feature-stem> at Step 4`
 - `Run @<product-owner-agent> — pick next BASELINED feature from backlog`
-- `Run @<product-owner-agent> — accept feature <feature-name> at Step 5`
+- `Run @<product-owner-agent> — accept feature <feature-stem> at Step 5`
 
 **Source path by step:**
-- Step 1: `Source: docs/features/backlog/<name>.feature`
-- Steps 2–4: `Source: docs/features/in-progress/<name>.feature`
-- Step 5: `Source: docs/features/completed/<name>.feature`
+- Step 1: `Source: docs/features/backlog/<feature-stem>.feature`
+- Steps 2–4: `Source: docs/features/in-progress/<feature-stem>.feature`
+- Step 5: `Source: docs/features/completed/<feature-stem>.feature`
 
 Status markers:
 - `[ ]` — not started
@@ -110,9 +110,9 @@ During Step 3 (TDD Loop), TODO.md **must** include a `## Cycle State` block to t
 ```markdown
 # Current Work
 
-Feature: <name>
+Feature: <feature-stem>
 Step: 3 (TDD Loop)
-Source: docs/features/in-progress/<name>.feature
+Source: docs/features/in-progress/<feature-stem>.feature
 
 ## Cycle State
 Test: `@id:<hex>` — <description>
diff --git a/.opencode/skills/verify/SKILL.md b/.opencode/skills/verify/SKILL.md
index 5e2ee07..f95f9ba 100644
--- a/.opencode/skills/verify/SKILL.md
+++ b/.opencode/skills/verify/SKILL.md
@@ -162,7 +162,7 @@ Record what input was given and what output was observed.
 ### 9. Write the Report
 
 ```markdown
-## Step 4 Verification Report — <feature-name>
+## Step 4 Verification Report — <feature-stem>
 
 ### pyproject.toml Gate
 | Check | Result | Notes |
diff --git a/AGENTS.md b/AGENTS.md
index 30c737b..e483107 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -5,9 +5,9 @@ A Python template to quickstart any project with a production-ready workflow, qu
 ## Workflow Overview
 
 Features flow through 5 steps with a WIP limit of 1 feature at a time. The filesystem enforces WIP:
-- `docs/features/backlog/<feature-name>.feature` — features waiting to be worked on
-- `docs/features/in-progress/<feature-name>.feature` — exactly one feature being built right now
-- `docs/features/completed/<feature-name>.feature` — accepted and shipped features
+- `docs/features/backlog/<feature-stem>.feature` — features waiting to be worked on
+- `docs/features/in-progress/<feature-stem>.feature` — exactly one feature being built right now
+- `docs/features/completed/<feature-stem>.feature` — accepted and shipped features
 
 ```
 STEP 1: SCOPE          (product-owner)  → discovery + Gherkin stories + criteria
@@ -107,8 +107,8 @@ Commit: `feat(criteria): write acceptance criteria for <name>`
 ### Bug Handling
 
 When a defect is reported:
-1. **PO** adds a `@bug @id:<hex>` Example to the relevant `Rule:` in the `.feature` file and moves (or keeps) the feature in `backlog/` for normal scheduling.
-2. **SE** handles the bug when the feature is selected for development (standard Step 2–3 flow): implements the specific `@bug`-tagged test in `tests/features/<feature-name>/` and also writes a `@given` Hypothesis property test in `tests/unit/` covering the whole class of inputs.
+1. **PO** adds a `@bug` Example to the relevant `Rule:` in the `.feature` file and moves (or keeps) the feature in `backlog/` for normal scheduling.
+2. **SE** handles the bug when the feature is selected for development (standard Step 2–3 flow): implements the specific `@bug`-tagged test in `tests/features/<feature_slug>/` and also writes a `@given` Hypothesis property test in `tests/unit/` covering the whole class of inputs.
 3. Both tests are required. SE follows the normal TDD loop (Step 3).
 
 ## Filesystem Structure
@@ -123,12 +123,12 @@ docs/
     context.md                        ← C4 Level 1 diagram, PO updates via living-docs skill
     container.md                      ← C4 Level 2 diagram, PO updates via living-docs skill
   features/
-    backlog/<feature-name>.feature    ← narrative + Rules + Examples
-    in-progress/<feature-name>.feature
-    completed/<feature-name>.feature
+    backlog/<feature-stem>.feature    ← narrative + Rules + Examples
+    in-progress/<feature-stem>.feature
+    completed/<feature-stem>.feature
 
 tests/
-  features/<feature-name>/
+  features/<feature_slug>/
     <rule_slug>_test.py               ← one per Rule: block, software-engineer-written
   unit/
     <anything>_test.py                ← software-engineer-authored extras (no @id traceability)
@@ -143,10 +143,12 @@ Tests in `tests/unit/` are software-engineer-authored extras not covered by any
 ## Test File Layout
 
 ```
-tests/features/<feature-name>/<rule_slug>_test.py
+tests/features/<feature_slug>/<rule_slug>_test.py
 ```
 
-### Stub Format (mandatory)
+### Stub Format
+
+Stubs are auto-generated by pytest-beehave. The SE triggers generation at Step 2 end by running `uv run task test-fast`. pytest-beehave reads the in-progress `.feature` file and creates one skipped function per `@id`:
 
 ```python
 @pytest.mark.skip(reason="not yet implemented")
@@ -158,7 +160,7 @@ def test_<feature_slug>_<@id>() -> None:
 
 ### Markers
 - `@pytest.mark.slow` — takes > 50ms; applied to Hypothesis tests and any test with I/O, network, or DB
-- `@pytest.mark.deprecated` — auto-skipped by conftest; used for superseded Examples
+- `@pytest.mark.deprecated` — auto-skipped by pytest-beehave; used for superseded Examples
 
 ## Development Commands
 
diff --git a/README.md b/README.md
index cc4aed1..a89bb28 100644
--- a/README.md
+++ b/README.md
@@ -32,23 +32,35 @@ uv run task test && uv run task lint && uv run task static-check
 
 ---
 
-## What You Get
+## Why this template?
 
-### A structured 5-step development cycle
+Most Python templates give you a folder structure and a `Makefile`. This one gives you a **complete delivery system**:
+
+- **No feature starts without written acceptance criteria** — Gherkin `Example:` blocks traced to tests
+- **No feature ships without adversarial review** — the reviewer's default hypothesis is "broken"
+- **No guesswork on test stubs** — they are generated automatically from your `.feature` files
+- **No manual `@id` tags** — assigned automatically when you run tests
+- **AI agents for every role** — PO, SE, and reviewer each have scoped instructions; none can exceed their authority
+
+---
+
+## How it works
+
+### 5-step delivery cycle
 
 ```
 SCOPE → ARCH → TDD LOOP → VERIFY → ACCEPT
 ```
 
-| Step | Who | What |
-|------|-----|------|
-| **SCOPE** | Product Owner | Discovery interviews → Gherkin stories → `@id` criteria |
-| **ARCH** | Software Engineer | Module design, ADRs, test stubs |
-| **TDD LOOP** | Software Engineer | RED → GREEN → REFACTOR, one `@id` at a time |
-| **VERIFY** | Reviewer | Adversarial verification — default hypothesis: broken |
-| **ACCEPT** | Product Owner | Demo, validate, ship |
+| Step | Role | Output |
+|------|------|--------|
+| **1 · SCOPE** | Product Owner | Discovery interviews + Gherkin stories + acceptance criteria |
+| **2 · ARCH** | Software Engineer | Module stubs, ADRs, auto-generated test stubs |
+| **3 · TDD LOOP** | Software Engineer | RED → GREEN → REFACTOR, one criterion at a time |
+| **4 · VERIFY** | Reviewer | Adversarial check — lint, types, coverage, semantic review |
+| **5 · ACCEPT** | Product Owner | Demo, validate, ship |
 
-WIP limit of 1. Features are `.feature` files that move between filesystem folders:
+**WIP limit: 1 feature at a time.** Features are `.feature` files that move through folders:
 
 ```
 docs/features/backlog/      ← waiting
@@ -58,12 +70,12 @@ docs/features/completed/    ← shipped
 
 ### AI agents included
 
-```
-@product-owner      — scope, stories, acceptance
-@software-engineer  — architecture, TDD, git, releases
-@reviewer           — adversarial verification
-@setup-project      — one-time project initialisation
-```
+| Agent | Responsibility |
+|-------|---------------|
+| `@product-owner` | Scope, stories, acceptance criteria, delivery acceptance |
+| `@software-engineer` | Architecture, TDD loop, git, releases |
+| `@reviewer` | Adversarial verification — default position: broken |
+| `@setup-project` | One-time project initialisation |
 
 ### Quality tooling, pre-configured
 
@@ -73,6 +85,7 @@ docs/features/completed/    ← shipped
 | `ruff` | Lint + format (Google docstrings) |
 | `pyright` | Static type checking — 0 errors |
 | `pytest` + `hypothesis` | Tests + property-based testing |
+| `pytest-beehave` | Auto-generates test stubs from `.feature` files |
 | `pytest-cov` | Coverage — 100% required |
 | `pdoc` | API docs → GitHub Pages |
 | `taskipy` | Task runner |
@@ -91,7 +104,7 @@ uv run task run           # Run the app
 
 ---
 
-## Code Standards
+## Code standards
 
 | | |
 |---|---|
@@ -104,19 +117,31 @@ uv run task run           # Run the app
 
 ---
 
-## Test Convention
+## Test convention
+
+Write acceptance criteria in Gherkin:
+
+```gherkin
+@id:a3f2b1c4
+Example: User sees version on startup
+  Given the application starts
+  When no arguments are passed
+  Then the version string is printed to stdout
+```
+
+Run tests once — a traced, skipped stub appears automatically:
 
 ```python
 @pytest.mark.skip(reason="not yet implemented")
-def test_feature_a3f2b1c4() -> None:
+def test_display_version_a3f2b1c4() -> None:
     """
-    Given: ...
-    When:  ...
-    Then:  ...
+    Given the application starts
+    When no arguments are passed
+    Then the version string is printed to stdout
     """
 ```
 
-Each test is traced to exactly one `@id` acceptance criterion.
+Each test is traced to exactly one acceptance criterion. No orphan tests. No untested criteria.
 
 ---
 
diff --git a/docs/architecture.md b/docs/architecture.md
index 5ad7cbd..2edabcd 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -2,12 +2,12 @@
 
 ---
 
-## YYYY-MM-DD — <feature-name>: <short title>
+## YYYY-MM-DD — <feature-stem>: <short title>
 
 Decision: <what was decided — one sentence>
 Reason: <why — one sentence>
 Alternatives considered: <what was rejected and why>
-Feature: <feature-name>
+Feature: <feature-stem>
 
 ---
 
@@ -16,4 +16,4 @@ Feature: <feature-name>
 Decision: <what was decided>
 Reason: <why>
 Alternatives considered: <what was rejected and why>
-Affected features: <feature-name>, <feature-name>
+Affected features: <feature-stem>, <feature-stem>
diff --git a/docs/discovery.md b/docs/discovery.md
index 1c91da9..9b8a33f 100644
--- a/docs/discovery.md
+++ b/docs/discovery.md
@@ -10,7 +10,7 @@ success/failure conditions, and explicit out-of-scope boundaries.>
 (First session only. Omit this subsection in subsequent sessions.)
 
 ### Feature List
-- `<feature-name>` — <one-sentence description of what changed or was added>
+- `<feature-stem>` — <one-sentence description of what changed or was added>
 (Write "No changes" if no features were added or modified this session.)
 
 ### Domain Model
diff --git a/docs/discovery_journal.md b/docs/discovery_journal.md
index 4c4fd41..ef538fe 100644
--- a/docs/discovery_journal.md
+++ b/docs/discovery_journal.md
@@ -23,7 +23,7 @@ Status: IN-PROGRESS
 |----|----------|--------|
 | Q8 | ... | ... |
 
-### Feature: <feature-name>
+### Feature: <feature-stem>
 
 | ID | Question | Answer |
 |----|----------|--------|
diff --git a/pyproject.toml b/pyproject.toml
index 2c1bab2..f3c6b83 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -23,6 +23,7 @@ Documentation = "https://github.com/nullhack/python-project-template/tree/main/d
 dev = [
     "pdoc>=14.0",
     "pytest>=9.0.3",
+    "pytest-beehave[html]>=3.0",
     "pytest-cov>=6.1.1",
     "pytest-html>=4.1.1",
     "pytest-mock>=3.14.0",
@@ -121,7 +122,7 @@ pytest \
   --cov-config=pyproject.toml \
   --cov-report html:docs/coverage \
   --cov-report term:skip-covered \
-  --cov=pytest_beehave \
+  --cov=app \
   --cov-fail-under=100 \
   --hypothesis-show-statistics \
   --html=docs/tests/report.html \
@@ -153,3 +154,6 @@ dev = [
     "gherkin-official>=39.0.0",
     "safety>=3.7.0",
 ]
+
+[tool.beehave]
+features_path = "docs/features"
diff --git a/tests/conftest.py b/tests/conftest.py
index 9a606f7..a5c8f50 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -21,10 +21,3 @@ def pytest_html_results_table_header(cells):
 def pytest_html_results_table_row(report, cells):
     docstring = getattr(report, "docstrings", "") or ""
     cells.insert(2, f"<td style='white-space: pre-wrap;'>{docstring}</td>")
-
-
-def pytest_collection_modifyitems(items):
-    """Automatically skip tests marked as deprecated."""
-    for item in items:
-        if item.get_closest_marker("deprecated"):
-            item.add_marker(pytest.mark.skip(reason="deprecated"))
diff --git a/uv.lock b/uv.lock
index 0914f45..89d4c79 100644
--- a/uv.lock
+++ b/uv.lock
@@ -670,6 +670,24 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/d4/24/a372aaf5c9b7208e7112038812994107bc65a84cd00e0354a88c2c77a617/pytest-9.0.3-py3-none-any.whl", hash = "sha256:2c5efc453d45394fdd706ade797c0a81091eccd1d6e4bccfcd476e2b8e0ab5d9", size = 375249, upload-time = "2026-04-07T17:16:16.13Z" },
 ]
 
+[[package]]
+name = "pytest-beehave"
+version = "3.0.20260419"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "fire" },
+    { name = "gherkin-official" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/6b/45/a64788db805fc079792d28670846f8320045bd82e67ea2528f842857606b/pytest_beehave-3.0.20260419.tar.gz", hash = "sha256:bc114a0f809e3b437f09f5d42da0a36a105dc8b7b7e311410a7fdcdc915398f0", size = 28685, upload-time = "2026-04-19T19:11:15.811Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/50/24/0bfacd345c1b75497f84d83ee3a4459ec30cc4e54fc4530c376f18346ccc/pytest_beehave-3.0.20260419-py3-none-any.whl", hash = "sha256:be3843af1e8691f6023007de147b4f92a8a4ca505f94439f2df210137e746acd", size = 30323, upload-time = "2026-04-19T19:11:14.168Z" },
+]
+
+[package.optional-dependencies]
+html = [
+    { name = "pytest-html" },
+]
+
 [[package]]
 name = "pytest-cov"
 version = "6.1.1"
@@ -748,6 +766,7 @@ dev = [
     { name = "pdoc" },
     { name = "pyright" },
     { name = "pytest" },
+    { name = "pytest-beehave", extra = ["html"] },
     { name = "pytest-cov" },
     { name = "pytest-html" },
     { name = "pytest-mock" },
@@ -769,6 +788,7 @@ requires-dist = [
     { name = "pdoc", marker = "extra == 'dev'", specifier = ">=14.0" },
     { name = "pyright", marker = "extra == 'dev'", specifier = ">=1.1.407" },
     { name = "pytest", marker = "extra == 'dev'", specifier = ">=9.0.3" },
+    { name = "pytest-beehave", extras = ["html"], marker = "extra == 'dev'", specifier = ">=3.0" },
     { name = "pytest-cov", marker = "extra == 'dev'", specifier = ">=6.1.1" },
     { name = "pytest-html", marker = "extra == 'dev'", specifier = ">=4.1.1" },
     { name = "pytest-mock", marker = "extra == 'dev'", specifier = ">=3.14.0" },

From 35aec6e8803628c019d0c48ff1a1a1f6fd8fef41 Mon Sep 17 00:00:00 2001
From: nullhack <nullhack@users.noreply.github.com>
Date: Sun, 19 Apr 2026 16:42:59 -0400
Subject: [PATCH 2/2] chore(workflow): number self-declaration items and add
 completeness enforcement

- implementation/SKILL.md: number all 25 items 1-25; add count reminder comment
- verify/SKILL.md: add completeness hard gate (count must be 25, sequence must be gapless); expand report table from 21 to 25 numbered rows matching implementation template exactly
---
 .opencode/skills/implementation/SKILL.md | 52 ++++++++++++------------
 .opencode/skills/verify/SKILL.md         | 52 +++++++++++++-----------
 2 files changed, 56 insertions(+), 48 deletions(-)
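
Reviewer note: the completeness gate this patch adds to verify/SKILL.md (count must be 25, sequence 1-25 must be gapless) can be sketched as a small script. This is a minimal illustration, not part of the patch. The names `check_completeness`, `ITEM_PATTERN`, and `EXPECTED_ITEMS` are hypothetical, and it assumes the Self-Declaration arrives as plain text with one `N. ...` item per line.

```python
import re

# Assumption: the template in implementation/SKILL.md has exactly 25 items.
EXPECTED_ITEMS = 25

# Matches a numbered list item such as "12. OC-1: ..." at the start of a line.
ITEM_PATTERN = re.compile(r"^\s*(\d+)\.\s")


def check_completeness(declaration: str, expected: int = EXPECTED_ITEMS) -> list[str]:
    """Return a list of problems; an empty list means the gate passes."""
    numbers = [
        int(match.group(1))
        for line in declaration.splitlines()
        if (match := ITEM_PATTERN.match(line))
    ]
    problems = []
    # Gate 1: the item count must equal the template's count.
    if len(numbers) != expected:
        problems.append(f"expected {expected} items, found {len(numbers)}")
    # Gate 2: every number in 1..expected must appear at least once.
    missing = sorted(set(range(1, expected + 1)) - set(numbers))
    if missing:
        problems.append(f"missing item numbers: {missing}")
    return problems
```

A non-empty result corresponds to an immediate REJECT before any item-level audit. Checking the count and the sequence separately also catches a duplicated number that would otherwise hide a skipped one.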

diff --git a/.opencode/skills/implementation/SKILL.md b/.opencode/skills/implementation/SKILL.md
index 826c6b7..61ada18 100644
--- a/.opencode/skills/implementation/SKILL.md
+++ b/.opencode/skills/implementation/SKILL.md
@@ -213,33 +213,35 @@ All must pass before Self-Declaration.
 
 ### Self-Declaration (once, after all quality gates pass)
 
+<!-- This list has exactly 25 items — count before submitting. If your count ≠ 25, you missed one. -->
+
 Communicate verbally to the reviewer. Answer honestly for each principle:
 
-- YAGNI: no code without a failing test — AGREE/DISAGREE | file:line
-- YAGNI: no speculative abstractions — AGREE/DISAGREE | file:line
-- KISS: simplest solution that passes — AGREE/DISAGREE | file:line
-- KISS: no premature optimization — AGREE/DISAGREE | file:line
-- DRY: no duplication — AGREE/DISAGREE | file:line
-- DRY: no redundant comments — AGREE/DISAGREE | file:line
-- SOLID-S: one reason to change per class — AGREE/DISAGREE | file:line
-- SOLID-O: open for extension, closed for modification — AGREE/DISAGREE | file:line
-- SOLID-L: subtypes substitutable — AGREE/DISAGREE | file:line
-- SOLID-I: no forced unused deps — AGREE/DISAGREE | file:line
-- SOLID-D: depend on abstractions, not concretions — AGREE/DISAGREE | file:line
-- OC-1: one level of indentation per method — AGREE/DISAGREE | deepest: file:line
-- OC-2: no else after return — AGREE/DISAGREE | file:line
-- OC-3: primitive types wrapped — AGREE/DISAGREE | file:line
-- OC-4: first-class collections — AGREE/DISAGREE | file:line
-- OC-5: one dot per line — AGREE/DISAGREE | file:line
-- OC-6: no abbreviations — AGREE/DISAGREE | file:line
-- OC-7: ≤20 lines per function, ≤50 per class — AGREE/DISAGREE | longest: file:line
-- OC-8: ≤2 instance variables per class (behavioural classes only; dataclasses, Pydantic models, value objects, and TypedDicts are exempt) — AGREE/DISAGREE | file:line
-- OC-9: no getters/setters — AGREE/DISAGREE | file:line
-- Patterns: I have no good reason to refactor parts of the code using OOP or Design Patterns — AGREE/DISAGREE | file:line
-- Patterns: no creational smell — AGREE/DISAGREE | file:line
-- Patterns: no structural smell — AGREE/DISAGREE | file:line
-- Patterns: no behavioral smell — AGREE/DISAGREE | file:line
-- Semantic: tests operate at same abstraction as AC — AGREE/DISAGREE | file:line
+1. YAGNI: no code without a failing test — AGREE/DISAGREE | file:line
+2. YAGNI: no speculative abstractions — AGREE/DISAGREE | file:line
+3. KISS: simplest solution that passes — AGREE/DISAGREE | file:line
+4. KISS: no premature optimization — AGREE/DISAGREE | file:line
+5. DRY: no duplication — AGREE/DISAGREE | file:line
+6. DRY: no redundant comments — AGREE/DISAGREE | file:line
+7. SOLID-S: one reason to change per class — AGREE/DISAGREE | file:line
+8. SOLID-O: open for extension, closed for modification — AGREE/DISAGREE | file:line
+9. SOLID-L: subtypes substitutable — AGREE/DISAGREE | file:line
+10. SOLID-I: no forced unused deps — AGREE/DISAGREE | file:line
+11. SOLID-D: depend on abstractions, not concretions — AGREE/DISAGREE | file:line
+12. OC-1: one level of indentation per method — AGREE/DISAGREE | deepest: file:line
+13. OC-2: no else after return — AGREE/DISAGREE | file:line
+14. OC-3: primitive types wrapped — AGREE/DISAGREE | file:line
+15. OC-4: first-class collections — AGREE/DISAGREE | file:line
+16. OC-5: one dot per line — AGREE/DISAGREE | file:line
+17. OC-6: no abbreviations — AGREE/DISAGREE | file:line
+18. OC-7: ≤20 lines per function, ≤50 per class — AGREE/DISAGREE | longest: file:line
+19. OC-8: ≤2 instance variables per class (behavioural classes only; dataclasses, Pydantic models, value objects, and TypedDicts are exempt) — AGREE/DISAGREE | file:line
+20. OC-9: no getters/setters — AGREE/DISAGREE | file:line
+21. Patterns: no good reason remains to refactor using OOP or Design Patterns — AGREE/DISAGREE | file:line
+22. Patterns: no creational smell — AGREE/DISAGREE | file:line
+23. Patterns: no structural smell — AGREE/DISAGREE | file:line
+24. Patterns: no behavioral smell — AGREE/DISAGREE | file:line
+25. Semantic: tests operate at same abstraction as AC — AGREE/DISAGREE | file:line
 
 A `DISAGREE` answer is not automatic rejection — state the reason and fix before handing off.
 
diff --git a/.opencode/skills/verify/SKILL.md b/.opencode/skills/verify/SKILL.md
index f95f9ba..3d5c449 100644
--- a/.opencode/skills/verify/SKILL.md
+++ b/.opencode/skills/verify/SKILL.md
@@ -60,6 +60,8 @@ Run before code review. If any row is FAIL, stop immediately with REJECTED.
 
 ### 5. Self-Declaration Audit
 
+**Completeness check (hard gate: REJECT if failed)**: Count the numbered items in the software-engineer's Self-Declaration. The template in `implementation/SKILL.md` has exactly 25 items, numbered 1–25. If the count is not 25, or any number from 1 to 25 is missing, REJECT immediately and do not proceed to the item-level audit.
+
 Read the software-engineer's Self-Declaration from the handoff message.
 
 For every **AGREE** claim:
@@ -183,29 +185,33 @@ Record what input was given and what output was observed.
 | uv run task test | PASS / FAIL | |
 
 ### Self-Declaration Audit
-| Claim | Software-Engineer Claims | Reviewer Verdict | Evidence |
-|------|-------------------------|------------------|----------|
-| YAGNI | AGREE/DISAGREE | PASS/FAIL | |
-| KISS | AGREE/DISAGREE | PASS/FAIL | |
-| DRY | AGREE/DISAGREE | PASS/FAIL | |
-| SOLID-S | AGREE/DISAGREE | PASS/FAIL | |
-| SOLID-O | AGREE/DISAGREE | PASS/FAIL | |
-| SOLID-L | AGREE/DISAGREE | PASS/FAIL | |
-| SOLID-I | AGREE/DISAGREE | PASS/FAIL | |
-| SOLID-D | AGREE/DISAGREE | PASS/FAIL | |
-| OC-1 | AGREE/DISAGREE | PASS/FAIL | |
-| OC-2 | AGREE/DISAGREE | PASS/FAIL | |
-| OC-3 | AGREE/DISAGREE | PASS/FAIL | |
-| OC-4 | AGREE/DISAGREE | PASS/FAIL | |
-| OC-5 | AGREE/DISAGREE | PASS/FAIL | |
-| OC-6 | AGREE/DISAGREE | PASS/FAIL | |
-| OC-7 | AGREE/DISAGREE | PASS/FAIL | |
-| OC-8 | AGREE/DISAGREE | PASS/FAIL | |
-| OC-9 | AGREE/DISAGREE | PASS/FAIL | |
-| Patterns Creational | AGREE/DISAGREE | PASS/FAIL | |
-| Patterns Structural | AGREE/DISAGREE | PASS/FAIL | |
-| Patterns Behavioral | AGREE/DISAGREE | PASS/FAIL | |
-| Semantic | AGREE/DISAGREE | PASS/FAIL | |
+| # | Claim | SE Claims | Reviewer Verdict | Evidence |
+|---|-------|-----------|------------------|----------|
+| 1 | YAGNI: no code without a failing test | AGREE/DISAGREE | PASS/FAIL | |
+| 2 | YAGNI: no speculative abstractions | AGREE/DISAGREE | PASS/FAIL | |
+| 3 | KISS: simplest solution that passes | AGREE/DISAGREE | PASS/FAIL | |
+| 4 | KISS: no premature optimization | AGREE/DISAGREE | PASS/FAIL | |
+| 5 | DRY: no duplication | AGREE/DISAGREE | PASS/FAIL | |
+| 6 | DRY: no redundant comments | AGREE/DISAGREE | PASS/FAIL | |
+| 7 | SOLID-S: one reason to change per class | AGREE/DISAGREE | PASS/FAIL | |
+| 8 | SOLID-O: open for extension, closed for modification | AGREE/DISAGREE | PASS/FAIL | |
+| 9 | SOLID-L: subtypes substitutable | AGREE/DISAGREE | PASS/FAIL | |
+| 10 | SOLID-I: no forced unused deps | AGREE/DISAGREE | PASS/FAIL | |
+| 11 | SOLID-D: depend on abstractions, not concretions | AGREE/DISAGREE | PASS/FAIL | |
+| 12 | OC-1: one level of indentation per method | AGREE/DISAGREE | PASS/FAIL | |
+| 13 | OC-2: no else after return | AGREE/DISAGREE | PASS/FAIL | |
+| 14 | OC-3: primitive types wrapped | AGREE/DISAGREE | PASS/FAIL | |
+| 15 | OC-4: first-class collections | AGREE/DISAGREE | PASS/FAIL | |
+| 16 | OC-5: one dot per line | AGREE/DISAGREE | PASS/FAIL | |
+| 17 | OC-6: no abbreviations | AGREE/DISAGREE | PASS/FAIL | |
+| 18 | OC-7: ≤20 lines per function, ≤50 per class | AGREE/DISAGREE | PASS/FAIL | |
+| 19 | OC-8: ≤2 instance variables per class (behavioural classes only; dataclasses, Pydantic models, value objects, and TypedDicts are exempt) | AGREE/DISAGREE | PASS/FAIL | |
+| 20 | OC-9: no getters/setters | AGREE/DISAGREE | PASS/FAIL | |
+| 21 | Patterns: no good reason remains to refactor using OOP or Design Patterns | AGREE/DISAGREE | PASS/FAIL | |
+| 22 | Patterns: no creational smell | AGREE/DISAGREE | PASS/FAIL | |
+| 23 | Patterns: no structural smell | AGREE/DISAGREE | PASS/FAIL | |
+| 24 | Patterns: no behavioral smell | AGREE/DISAGREE | PASS/FAIL | |
+| 25 | Semantic: tests operate at same abstraction as AC | AGREE/DISAGREE | PASS/FAIL | |
 
 ### Reviewer Stance Declaration