From 5a5429e16bf9ca7090befe20dd34a8f59cf3a01d Mon Sep 17 00:00:00 2001
From: nullhack <nullhack@users.noreply.github.com>
Date: Sat, 18 Apr 2026 20:19:08 -0400
Subject: [PATCH 1/2] chore(workflow): refine agents, skills, commands, and ADR
 structure for v5

- Consolidate ADR files into single adr.md (drop adr-NNN-<title>.md pattern)
- Add test-coverage and test-build tasks; fix test-fast/test flags
- Clarify design priority chain with full complexity ladder
- Remove self-selection language from agent instructions
- Add refactor and design-patterns skills to reviewer agent
- Add BASELINED guard to feature-selection skill
- Fix scope skill: drop silent pre-mortem requirement from Session 1
- Align function/class line-count wording to code-lines-only
---
 .opencode/agents/product-owner.md           |  3 +-
 .opencode/agents/reviewer.md                |  2 +
 .opencode/agents/software-engineer.md       |  3 +-
 .opencode/skills/feature-selection/SKILL.md |  4 ++
 .opencode/skills/implementation/SKILL.md    | 67 +++++++++------------
 .opencode/skills/scope/SKILL.md             |  3 +-
 AGENTS.md                                   | 52 ++++++----------
 docs/workflow.md                            |  2 +-
 pyproject.toml                              | 22 ++++---
 9 files changed, 71 insertions(+), 87 deletions(-)
diff --git a/.opencode/agents/product-owner.md b/.opencode/agents/product-owner.md
index e211ce7..d2ea85b 100644
--- a/.opencode/agents/product-owner.md
+++ b/.opencode/agents/product-owner.md
@@ -33,7 +33,6 @@ Load `skill session-workflow` first — it reads TODO.md, orients you to the cur
 - You are the **sole owner** of `.feature` files and `docs/features/discovery.md`
 - No other agent may edit these files
 - Software-engineer escalates spec gaps to you; you decide whether to extend criteria
-- **You pick** the next feature from backlog — the software-engineer never self-selects
 - **NEVER move a feature to `in-progress/` unless its discovery section has `Status: BASELINED`** — if not baselined, complete Step 1 (Phase 2 + 3 + 4) first
 
 ## Step 5 — Accept
@@ -60,4 +59,4 @@ When a gap is reported (by software-engineer or reviewer):
 
 - `session-workflow` — session start/end protocol
 - `feature-selection` — when TODO.md is idle: score and select next backlog feature using WSJF
-- `scope` — Step 1: 3-session discovery (Phase 1 + 2), stories (Phase 3), and criteria (Phase 4)
\ No newline at end of file
+- `scope` — Step 1: 3-session discovery (Phase 1 + 2), stories (Phase 3), and criteria (Phase 4)
diff --git a/.opencode/agents/reviewer.md b/.opencode/agents/reviewer.md
index 415d07f..ef80152 100644
--- a/.opencode/agents/reviewer.md
+++ b/.opencode/agents/reviewer.md
@@ -58,4 +58,6 @@ You never edit `.feature` files or add Examples yourself.
 ## Available Skills
 
 - `session-workflow` — session start/end protocol
+- `refactor` — Code refactoring heuristics
+- `design-patterns` — Reference for code smells and design patterns
 - `verify` — Step 4: full verification protocol with all tables, gates, and report template
diff --git a/.opencode/agents/software-engineer.md b/.opencode/agents/software-engineer.md
index a802229..76cb96a 100644
--- a/.opencode/agents/software-engineer.md
+++ b/.opencode/agents/software-engineer.md
@@ -45,7 +45,6 @@ Load `skill session-workflow` first — it reads TODO.md, orients you to the cur
 
 - You own all technical decisions: module structure, patterns, internal APIs, test tooling, linting config
 - **PO approves**: new runtime dependencies, changed entry points, scope changes
-- You are **never** the one to pick the next feature — only the PO picks from backlog
 
 ## Spec Gaps
 
@@ -61,4 +60,4 @@ If during implementation you discover behavior not covered by existing acceptanc
 - `design-patterns` — on-demand when smell detected during architecture or refactor
 - `pr-management` — Step 5: PRs with conventional commits
 - `git-release` — Step 5: calver versioning and themed release naming
-- `create-skill` — meta: create new skills when needed
\ No newline at end of file
+- `create-skill` — meta: create new skills when needed
diff --git a/.opencode/skills/feature-selection/SKILL.md b/.opencode/skills/feature-selection/SKILL.md
index 567e3ef..63be1ed 100644
--- a/.opencode/skills/feature-selection/SKILL.md
+++ b/.opencode/skills/feature-selection/SKILL.md
@@ -38,6 +38,10 @@ Read each `.feature` file in `docs/features/backlog/`. Check its discovery secti
 - Non-BASELINED features are not eligible — they need Step 1 (scope) first
 - If no BASELINED features exist: inform the stakeholder; run `@product-owner` with `skill scope` to baseline the most promising backlog item first
 
+**IMPORTANT**
+
+**NEVER move a feature to `in-progress/` unless its discovery section has `Status: BASELINED`**
+
 ### 3. Score Each Candidate
 
 For each BASELINED feature, fill this table:
diff --git a/.opencode/skills/implementation/SKILL.md b/.opencode/skills/implementation/SKILL.md
index 0cc4b7f..2472204 100644
--- a/.opencode/skills/implementation/SKILL.md
+++ b/.opencode/skills/implementation/SKILL.md
@@ -15,12 +15,12 @@ Steps 2 (Architecture) and 3 (TDD Loop) combined into a single skill. The softwa
 
 During implementation, correctness priorities are (in order):
 
-1. **Design correctness** — YAGNI > KISS > DRY > SOLID > Object Calisthenics > appropriate design patterns
+1. **Design correctness** — YAGNI > KISS > DRY > SOLID > Object Calisthenics > appropriate design patterns > complex code > complicated code > failing code > no code
 2. **One @id green** — the specific test under work passes, plus `test-fast` still passes
 3. **Commit** — when a meaningful increment is green
 4. **Quality tooling** — `lint`, `static-check`, full `test` with coverage run at end-of-feature handoff
 
-Design correctness is far more important than lint/pyright/coverage compliance. Never run lint, static-check, or coverage during the TDD loop — those are handoff-only checks.
+Design correctness is far more important than lint/pyright/coverage compliance. Never run lint (ruff check, ruff format), static-check (pyright), or coverage during the TDD loop — those are handoff-only checks.
 
 ---
 
@@ -37,7 +37,7 @@ Design correctness is far more important than lint/pyright/coverage compliance.
 
 1. Read `pyproject.toml` → locate `[tool.setuptools]` → record `packages = ["<name>"]`
 2. Confirm directory exists: `ls <name>/`
-3. All new source files go under `<name>/` — never under a template placeholder.
+3. All new source files go under `<name>/`
 
 ### Move Feature File
 
@@ -118,7 +118,7 @@ Place stubs where responsibility dictates — do not pre-create `ports/` or `ada
 
 ### Write ADR Files (significant decisions only)
 
-For each significant architectural decision, create `docs/architecture/adr-NNN-<title>.md`:
+For each significant architectural decision, create or append to `docs/architecture/adr.md`:
 
 ```markdown
 # ADR-NNN: <title>
@@ -153,25 +153,21 @@ Commit: `feat(<feature-name>): add architecture stubs`
 
 ### Prerequisites
 
+- [ ] Exactly one `.feature` file is in `in-progress/`. If none is present, load `skill feature-selection`
 - [ ] Architecture stubs present in `<package>/` (committed by Step 2)
-- [ ] Read all `docs/architecture/adr-NNN-*.md` files — understand the architectural decisions before writing any test
-- [ ] Test stub files exist in `tests/features/<feature-name>/` — one file per `Rule:` block, all `@id` functions present with `@pytest.mark.skip`; if missing, write them now before entering RED
+- [ ] Read `docs/architecture/adr.md` — understand the architectural decisions before writing any test
+- [ ] Test stub files exist in `tests/features/<feature-name>/<rule_slug>_test.py` — one file per `Rule:` block, all `@id` stub functions present with `@pytest.mark.skip`; if missing, write them now before entering RED
 
 ### Write Test Stubs (if not present)
 
-For each `Rule:` block in the in-progress `.feature` file, create `tests/features/<feature-name>/<rule-slug>_test.py` if it does not already exist. Write one function per `@id` Example, all skipped:
+For each `Rule:` block in the in-progress `.feature` file, create `tests/features/<feature-name>/<rule_slug>_test.py` if it does not already exist. Write one function per `@id` Example, all skipped:
 
 ```python
 @pytest.mark.skip(reason="not yet implemented")
-def test_<rule_slug>_<8char_hex>() -> None:
+def test_<feature_slug>_<@id>() -> None:
     """
-    Given: ...
-    When: ...
-    Then: ...
+    <@id steps raw text including new lines>
     """
-    # Given
-    # When
-    # Then
 ```
 
 Run `uv run task gen-todo` after writing stubs to sync `@id` rows into `TODO.md`.
@@ -192,17 +188,17 @@ For each pending `@id`:
 ```
 INNER LOOP
 ├── RED
-│   ├── Confirm stub for this @id exists in tests/features/<feature-name>/ with @pytest.mark.skip
+│   ├── Confirm stub for this @id exists in tests/features/<feature-name>/<rule_slug>_test.py with @pytest.mark.skip
 │   ├── Read existing stubs in `<package>/` — base the test on the current data model and signatures
 │   ├── Write test body (Given/When/Then → Arrange/Act/Assert); remove @pytest.mark.skip
-│   ├── Update stub signatures as needed — edit the `.py` file directly
+│   ├── Update <package> stub signatures as needed — edit the `.py` file directly
 │   ├── uv run task test-fast
 │   └── EXIT: this @id FAILS
 │       (if it passes: test is wrong — fix it first)
 │
 ├── GREEN
 │   ├── Write minimum code — YAGNI + KISS only
-│   │   (no DRY, SOLID, OC here — those belong in REFACTOR)
+│   │   (no DRY, SOLID, OC, docstrings, or type hints here — those belong in REFACTOR)
 │   ├── uv run task test-fast
 │   └── EXIT: this @id passes AND all prior tests pass
 │       (fix implementation only; do not advance to next @id)
@@ -221,7 +217,7 @@ Commit when a meaningful increment is green
 ```bash
 uv run task lint
 uv run task static-check
-uv run task test          # coverage must be 100%
+uv run task test-coverage          # coverage must be 100%
 timeout 10s uv run task run
 ```
 
@@ -231,7 +227,7 @@ All must pass before Self-Declaration.
 
 ### Self-Declaration (once, after all quality gates pass)
 
-Write into `TODO.md` under a `## Self-Declaration` block:
+Answer the `## Self-Declaration` report honestly:
 
 ```markdown
 ## Self-Declaration
@@ -256,6 +252,7 @@ As a software-engineer I declare:
 * OC-7: ≤20 lines per function, ≤50 per class — AGREE/DISAGREE | longest: file:line
 * OC-8: ≤2 instance variables per class (behavioural classes only; dataclasses, Pydantic models, value objects, and TypedDicts are exempt) — AGREE/DISAGREE | file:line
 * OC-9: no getters/setters — AGREE/DISAGREE | file:line
+* Patterns: I have no good reason to refactor parts of the code using OOP or Design Patterns — AGREE/DISAGREE | file:line
 * Patterns: no creational smell — AGREE/DISAGREE | file:line
 * Patterns: no structural smell — AGREE/DISAGREE | file:line
 * Patterns: no behavioral smell — AGREE/DISAGREE | file:line
@@ -268,7 +265,7 @@ A `DISAGREE` answer is not automatic rejection — state the reason inline and f
 
 Signal completion to the reviewer. Provide:
 - Feature file path
-- Self-Declaration from TODO.md
+- Self-Declaration report
 - Summary of what was implemented
 
 ---
@@ -278,20 +275,20 @@ Signal completion to the reviewer. Provide:
 ### Test File Layout
 
 ```
-tests/features/<feature-name>/<rule-slug>_test.py
+tests/features/<feature-name>/<rule_slug>_test.py
 ```
 
 - `<feature-name>` = the `.feature` file stem
-- `<rule-slug>` = the `Rule:` title slugified
+- `<rule_slug>` = the `Rule:` title slugified
 
 ### Function Naming
 
 ```python
-def test_<rule_slug>_<8char_hex>() -> None:
+def test_<rule_slug>_<@id>() -> None:
 ```
 
 - `rule_slug` = the `Rule:` title with spaces/hyphens replaced by underscores, lowercase
-- `8char_hex` = the `@id` from the `Example:` block
+- `@id` = the `@id` from the `Example:` block
 
 ### Docstring Format (mandatory)
 
@@ -299,19 +296,14 @@ New tests start as skipped stubs. Remove `@pytest.mark.skip` when implementing i
 
 ```python
 @pytest.mark.skip(reason="not yet implemented")
-def test_wall_bounce_a3f2b1c4() -> None:
+def test_<feature_slug>_<@id>() -> None:
     """
-    Given: A ball moving upward reaches y=0
-    When: The physics engine processes the next frame
-    Then: The ball velocity y-component becomes positive
+    <@id steps raw text including new lines>
     """
-    # Given
-    # When
-    # Then
 ```
 
 **Rules**:
-- Docstring contains `Given:/When:/Then:` on separate indented lines
+- Docstring contains `Gherkin steps` as raw text on separate indented lines
 - No extra metadata in docstring — traceability comes from function name `@id` suffix
 
 ### Markers
@@ -320,6 +312,7 @@ def test_wall_bounce_a3f2b1c4() -> None:
 - `@pytest.mark.deprecated` — auto-skipped by conftest; used for superseded Examples
 
 ```python
+@pytest.mark.deprecated
 def test_wall_bounce_a3f2b1c4() -> None:
     ...
 
@@ -350,11 +343,11 @@ def test_wall_bounce_c4d5e6f7(x: float) -> None:
 **Rules**:
 - `@pytest.mark.slow` is mandatory on every `@given`-decorated test
 - `@example(...)` is optional but encouraged
-- Never use Hypothesis for: I/O, side effects, network calls, database writes
+- Do not use Hypothesis for: I/O, side effects, network calls, database writes
 
 ### Semantic Alignment Rule
 
-The test's Given/When/Then must operate at the **same abstraction level** as the AC's Given/When/Then.
+The test's Given/When/Then must operate at the **same abstraction level** as the AC's Steps.
 
 | AC says | Test must do |
 |---|---|
@@ -369,7 +362,7 @@ If testing through the real entry point is infeasible, escalate to PO to adjust
 - No `isinstance()`, `type()`, or internal attribute (`_x`) checks in assertions
 - One assertion concept per test (multiple `assert` ok if they verify the same thing)
 - No `pytest.mark.xfail` without written justification
-- `pytest.mark.skip` is only valid on stubs (`reason="not yet implemented"`) — remove it when implementing
+- `pytest.mark.skip(reason="not yet implemented")` is only valid on stubs — remove it when implementing
 - Test data embedded directly in the test, not loaded from external files
 
 ### Test Tool Decision
@@ -396,7 +389,7 @@ Extra tests in `tests/unit/` are allowed freely (coverage, edge cases, etc.) —
 
 ## Signature Design
 
-Signatures are written during Step 2 (Architecture) and refined during Step 3 (RED). They live directly in the package `.py` files — never in the `.feature` file.
+<package> signatures are written during Step 2 (Architecture) and refined during Step 3 (RED). They live directly in the package `.py` files — never in the `.feature` file.
 
 Key rules:
 - Bodies are always `...` in the architecture stub
@@ -420,4 +413,4 @@ class EmailAddress:
 class UserRepository(Protocol):
     def save(self, user: "User") -> None: ...
     def find_by_email(self, email: EmailAddress) -> "User | None": ...
-```
\ No newline at end of file
+```
diff --git a/.opencode/skills/scope/SKILL.md b/.opencode/skills/scope/SKILL.md
index 14af5f6..44f6f56 100644
--- a/.opencode/skills/scope/SKILL.md
+++ b/.opencode/skills/scope/SKILL.md
@@ -149,8 +149,7 @@ Commit: `feat(discovery): baseline project discovery`
 
 1. Write the **Session 1 Synthesis** in the `.feature` file: summarize the key entities, their relationships, and the constraints that emerged.
 2. Present the synthesis to the stakeholder. Stakeholder confirms or corrects. PO refines until approved.
-3. Run a **silent pre-mortem** on the confirmed synthesis.
-4. Mark `Template §1: CONFIRMED`. This unlocks Session 2.
+3. Mark `Template §1: CONFIRMED`. This unlocks Session 2.
 
 ### Session 2 — Behavior Groups / Big Picture for This Feature
 
diff --git a/AGENTS.md b/AGENTS.md
index 1223815..0f2acc3 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -90,8 +90,7 @@ docs/features/
   completed/<feature-name>.feature    ← file moves here at Step 5
 
 docs/architecture/
-  STEP2-ARCH.md                       ← Step 2 reference diagram (canonical)
-  adr-NNN-<title>.md                  ← one per significant architectural decision
+  adr.md                              ← one entry per significant architectural decision
 
 tests/
   features/<feature-name>/
@@ -112,25 +111,14 @@ Tests in `tests/unit/` are software-engineer-authored extras not covered by any
 tests/features/<feature-name>/<rule_slug>_test.py
 ```
 
-### Function Naming
-
-```python
-def test_<rule_slug>_<8char_hex>() -> None:
-```
-
-### Docstring Format (mandatory)
+### Stub Format (mandatory)
 
 ```python
 @pytest.mark.skip(reason="not yet implemented")
-def test_wall_bounce_a3f2b1c4() -> None:
+def test_<feature_slug>_<@id>() -> None:
     """
-    Given: A ball moving upward reaches y=0
-    When: The physics engine processes the next frame
-    Then: The ball velocity y-component becomes positive
+    <@id steps raw text including new lines>
     """
-    # Given
-    # When
-    # Then
 ```
 
 ### Markers
@@ -155,8 +143,8 @@ uv run task test-fast
 # Run full test suite with coverage
 uv run task test
 
-# Run slow tests only
-uv run task test-slow
+# Run tests with coverage report generation
+uv run task test-build
 
 # Lint and format
 uv run task lint
@@ -164,32 +152,30 @@ uv run task lint
 # Type checking
 uv run task static-check
 
-# Serve documentation
-uv run task doc-serve
+# Build documentation
+uv run task doc-build
 ```
 
 ## Code Quality Standards
 
-- **Principles (in priority order)**: YAGNI > KISS > DRY > SOLID > Object Calisthenics
-- **Linting**: ruff, Google docstring convention, `noqa` forbidden
+- **Principles (in priority order)**: YAGNI > KISS > DRY > SOLID > Object Calisthenics > appropriate design patterns > complex code > complicated code > failing code > no code
+- **Linting**: ruff format, ruff check, Google docstring convention, `noqa` forbidden
 - **Type checking**: pyright, 0 errors required
 - **Coverage**: 100% (measured against your actual package)
-- **Function length**: ≤ 20 lines
-- **Class length**: ≤ 50 lines
+- **Function length**: ≤ 20 lines (code lines only, excluding docstrings)
+- **Class length**: ≤ 50 lines (code lines only, excluding docstrings)
 - **Max nesting**: 2 levels
 - **Instance variables**: ≤ 2 per class *(exception: dataclasses, Pydantic models, value objects, and TypedDicts are exempt — they may carry as many fields as the domain requires)*
 - **Semantic alignment**: tests must operate at the same abstraction level as the acceptance criteria they cover
-- **Integration tests**: multi-component features require at least one test in `tests/features/` that exercises the public entry point end-to-end
 
 ### Software-Engineer Quality Gate Priority Order
 
 During Step 3 (TDD Loop), correctness priorities are:
 
-1. **Design correctness** — YAGNI > KISS > DRY > SOLID > Object Calisthenics > appropriate design patterns
+1. **Design correctness** — YAGNI > KISS > DRY > SOLID > Object Calisthenics > appropriate design patterns > complex code > complicated code > failing code > no code
 2. **One test green** — the specific test under work passes, plus `test-fast` still passes
-3. **Reviewer code-design check** — reviewer verifies design + semantic alignment (no lint/pyright/coverage)
-4. **Commit** — only after reviewer APPROVED
-5. **Quality tooling** — `lint`, `static-check`, full `test` with coverage run only at software-engineer handoff (before Step 5)
+3. **Reviewer code-design check** — reviewer verifies design + semantic alignment (no lint/pyright/coverage yet)
+4. **Quality tooling** — `lint`, `static-check`, full `test` with coverage run only at software-engineer handoff (before Step 4)
 
 Design correctness is far more important than lint/pyright/coverage compliance. A well-designed codebase with minor lint issues is better than a lint-clean codebase with poor design.
 
@@ -200,10 +186,6 @@ Design correctness is far more important than lint/pyright/coverage compliance.
 - Both are required. All-green automated checks are necessary but not sufficient for APPROVED.
 - Reviewer defaults to REJECTED unless correctness is proven.
 
-## Deprecation Process
-
-This template does not support deprecation. Criteria changes are handled by adding new Examples with new `@id` tags.
-
 ## Release Management
 
 Version format: `v{major}.{minor}.{YYYYMMDD}`
@@ -212,13 +194,13 @@ Version format: `v{major}.{minor}.{YYYYMMDD}`
 - Same-day second release: increment minor, keep same date
 - Each release gets a unique adjective-animal name
 
-Use `@software-engineer /skill git-release` for the full release process.
+When requested by the stakeholder, use `@software-engineer /skill git-release` for the full release process.
 
 ## Session Management
 
 Every session: load `skill session-workflow`. Read `TODO.md` first, update it at the end.
 
-`TODO.md` is a session bookmark — not a project journal. See `docs/workflow.md` for the full structure including the Cycle State and Self-Declaration blocks used during Step 4.
+`TODO.md` is a session bookmark — not a project journal. See `docs/workflow.md` for the full structure including the Cycle State and Self-Declaration blocks used during Step 3.
 
 ## Setup
 
diff --git a/docs/workflow.md b/docs/workflow.md
index 9e24894..8bea2cf 100644
--- a/docs/workflow.md
+++ b/docs/workflow.md
@@ -157,7 +157,7 @@ Each step has a designated agent and a specific deliverable. No step is skipped.
 │      No inline comments, no TODO, no speculative code              │
 │                                                                     │
 │  WRITE ADR FILES (significant decisions only)                       │
-│    docs/architecture/adr-NNN-<title>.md                            │
+│    docs/architecture/adr.md                                        │
 │      Decision: <what>  Reason: <why>                               │
 │      Alternatives considered: <what was rejected and why>           │
 │                                                                     │
diff --git a/pyproject.toml b/pyproject.toml
index a8dad4d..dfd968d 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -87,7 +87,6 @@ addopts = """
 --color=yes \
 --tb=short \
 -q \
---html=docs/tests/report.html \
 """
 testpaths = ["tests"]
 python_files = ["*_test.py"]
@@ -107,21 +106,28 @@ exclude_lines = [
 
 [tool.taskipy.tasks]
 run = "python -m app"
-test-report = """\
+test-coverage = """\
 pytest \
+  --cov-config=pyproject.toml \
+  --cov=app \
+  --cov-fail-under=100 \
+  --tb=no
+"""
+test-build = """\
+pytest \
+  -p no:beehave \
   --doctest-modules \
   --cov-config=pyproject.toml \
   --cov-report html:docs/coverage \
   --cov-report term:skip-covered \
-  --cov=app \
+  --cov=pytest_beehave \
   --cov-fail-under=100 \
   --hypothesis-show-statistics \
+  --html=docs/tests/report.html \
+  --self-contained-html \
 """
-test = """\
-pytest -m "not slow" -q && \
-task test-report\
-"""
-test-fast = "pytest -m \"not slow\" -q"
+test = "pytest --tb=short"
+test-fast = "pytest -m \"not slow\" -q --no-header --tb=no"
 test-slow = "pytest -m slow"
 ruff-check = "ruff check . --fix"
 ruff-format = "ruff format ."

From 4dd73091ca7ea78118b66f3834230442af8bc392 Mon Sep 17 00:00:00 2001
From: nullhack <nullhack@users.noreply.github.com>
Date: Sun, 19 Apr 2026 07:59:00 -0400
Subject: [PATCH 2/2] =?UTF-8?q?chore(workflow):=20document=20model=20v6=20?=
 =?UTF-8?q?=E2=80=94=202-stage=20discovery,=20PO-owned=20feature=20moves,?=
 =?UTF-8?q?=20unified=20session=20protocol?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Split docs/features/discovery.md into three append-only files:
  docs/discovery_journal.md (raw Q&A), docs/discovery.md (synthesis),
  docs/architecture.md (architectural decisions)
- Remove docs/features/discovery.md (superseded)
- Replace Phase 1/Phase 2 model with 2-stage model: Stage 1 Discovery
  (unified iterative sessions) + Stage 2 Specification (stories + criteria)
- PO is sole owner of all .feature file moves; SE and reviewer never move
  or edit .feature files; explicit escalation when no in-progress feature
- Add bug handling protocol: PO adds @bug @id, SE writes both @id test
  and @given Hypothesis property test
- Add real-time split rule: >2 concerns or >8 Examples splits immediately
  within the same session
- Add session status markers to discovery_journal.md: IN-PROGRESS/COMPLETE
- Update all agent files, skills, workflow docs, and research docs to match
---
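Note (commentary only, not part of the diff): a minimal sketch of the Step 2 prerequisite check this patch describes for the software-engineer. It assumes "exactly one in-progress feature" can be tested by globbing `*.feature` in `docs/features/in-progress/`, and that the baseline marker is the literal text `Status: BASELINED` somewhere in the file. The function names `in_progress_feature` and `is_baselined` are hypothetical; nothing in the repository defines them.

```python
from __future__ import annotations

from pathlib import Path


def in_progress_feature(in_progress_dir: Path) -> Path | None:
    """Return the single in-progress .feature file, or None if the folder is idle.

    A folder holding only .gitkeep counts as idle, because .gitkeep does not
    match the *.feature pattern.
    """
    features = sorted(in_progress_dir.glob("*.feature"))
    if len(features) > 1:
        names = ", ".join(path.name for path in features)
        raise RuntimeError(f"more than one feature in progress: {names}")
    return features[0] if features else None


def is_baselined(feature_file: Path) -> bool:
    """Check for the literal baseline marker inside the .feature file."""
    return "Status: BASELINED" in feature_file.read_text(encoding="utf-8")
```

When `in_progress_feature` returns `None`, the rule in this patch is to stop, note the gap in TODO.md, and hand over to the PO rather than picking a feature from backlog.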
 .opencode/agents/product-owner.md             |  18 +-
 .opencode/agents/reviewer.md                  |   8 +
 .opencode/agents/software-engineer.md         |   8 +
 .opencode/skills/feature-selection/SKILL.md   |   2 +-
 .opencode/skills/implementation/SKILL.md      |  35 +-
 .opencode/skills/scope/SKILL.md               | 378 ++++++++++--------
 .opencode/skills/scope/discovery-template.md  |  26 +-
 .opencode/skills/session-workflow/SKILL.md    |  21 +-
 .opencode/skills/verify/SKILL.md              |   6 +-
 AGENTS.md                                     |  93 +++--
 docs/architecture.md                          |  19 +
 docs/discovery.md                             |  21 +
 docs/discovery_journal.md                     |  32 ++
 .../completed/display-version.feature         |  24 +-
 docs/features/discovery.md                    |  41 --
 docs/scientific-research/domain-modeling.md   |   2 +-
 .../requirements-elicitation.md               |  26 +-
 docs/workflow.md                              | 214 ++++------
 18 files changed, 508 insertions(+), 466 deletions(-)
 create mode 100644 docs/architecture.md
 create mode 100644 docs/discovery.md
 create mode 100644 docs/discovery_journal.md
 delete mode 100644 docs/features/discovery.md

diff --git a/.opencode/agents/product-owner.md b/.opencode/agents/product-owner.md
index d2ea85b..fe268d3 100644
--- a/.opencode/agents/product-owner.md
+++ b/.opencode/agents/product-owner.md
@@ -25,15 +25,16 @@ Load `skill session-workflow` first — it reads TODO.md, orients you to the cur
 
 | Step | Action |
 |---|---|
-| **Step 1 — SCOPE** | Load `skill scope` — contains the full 4-phase discovery and criteria protocol |
+| **Step 1 — SCOPE** | Load `skill scope` — contains Stage 1 (Discovery sessions) and Stage 2 (Stories + Criteria) |
 | **Step 5 — ACCEPT** | See acceptance protocol below |
 
 ## Ownership Rules
 
-- You are the **sole owner** of `.feature` files and `docs/features/discovery.md`
+- You are the **sole owner** of `.feature` files, `docs/discovery_journal.md`, and `docs/discovery.md`
 - No other agent may edit these files
+- **You are the sole owner of all `.feature` file moves**: backlog → in-progress (before Step 2) and in-progress → completed (after Step 5 acceptance). No other agent moves `.feature` files.
 - Software-engineer escalates spec gaps to you; you decide whether to extend criteria
-- **NEVER move a feature to `in-progress/` unless its discovery section has `Status: BASELINED`** — if not baselined, complete Step 1 (Phase 2 + 3 + 4) first
+- **NEVER move a feature to `in-progress/` unless its `.feature` file has `Status: BASELINED`** — if not baselined, complete Step 1 (Stage 1 Discovery + Stage 2 Specification) first
 
 ## Step 5 — Accept
 
@@ -55,8 +56,17 @@ When a gap is reported (by software-engineer or reviewer):
 | Behavior contradicts an existing Example | Write a new Example with new `@id`. |
 | Post-merge defect | Move the `.feature` file back to `in-progress/`, add new Example with `@id`, resume at Step 3. |
 
+## Bug Handling
+
+When a defect is reported against any feature:
+
+1. Add a `@bug @id:<new-8-char-hex>` Example to the relevant `Rule:` block in the `.feature` file.
+2. Write the Example using the standard `Given/When/Then` format describing the correct behavior.
+3. Update TODO.md to note the new `@id` for the SE to implement.
+4. SE implements the `@id` test in `tests/features/` **and** a `@given` Hypothesis property test in `tests/unit/`. Both are required.
+
 ## Available Skills
 
 - `session-workflow` — session start/end protocol
 - `feature-selection` — when TODO.md is idle: score and select next backlog feature using WSJF
-- `scope` — Step 1: 3-session discovery (Phase 1 + 2), stories (Phase 3), and criteria (Phase 4)
+- `scope` — Step 1: Stage 1 (Discovery sessions with stakeholder) and Stage 2 (Stories + Criteria, PO alone)
diff --git a/.opencode/agents/reviewer.md b/.opencode/agents/reviewer.md
index ef80152..4bdab37 100644
--- a/.opencode/agents/reviewer.md
+++ b/.opencode/agents/reviewer.md
@@ -42,6 +42,14 @@ Load `skill session-workflow` first. Then load `skill verify` for Step 4 verific
 - **Never suggest `noqa`, `type: ignore`, or `pytest.skip` as a fix.** These are bypasses, not solutions.
 - **Report specific locations.** "`physics/engine.py:47`: unreachable return" not "there is dead code."
 - **Every PASS/FAIL cell must have evidence.** Empty evidence = UNCHECKED = REJECTED.
+- **Never move `.feature` files.** The PO is the sole owner of all feature file moves. After producing an APPROVED report, update TODO.md and stop — the PO accepts and moves the file.
+
+## After APPROVED
+
+When your report verdict is APPROVED:
+1. Write the report as described in `skill verify`.
+2. Update TODO.md `## Next` line: `Run @product-owner — accept feature <name> at Step 5.`
+3. Stop. Do not touch `.feature` files. The PO reviews the feature themselves and moves it to `completed/`.
 
 ## Gap Reporting
 
diff --git a/.opencode/agents/software-engineer.md b/.opencode/agents/software-engineer.md
index 76cb96a..10bdc5e 100644
--- a/.opencode/agents/software-engineer.md
+++ b/.opencode/agents/software-engineer.md
@@ -45,6 +45,14 @@ Load `skill session-workflow` first — it reads TODO.md, orients you to the cur
 
 - You own all technical decisions: module structure, patterns, internal APIs, test tooling, linting config
 - **PO approves**: new runtime dependencies, changed entry points, scope changes
+- **You never move `.feature` files.** The PO is the sole owner of all feature file moves (backlog → in-progress → completed). If you find no `.feature` file in `docs/features/in-progress/`, **STOP** — do not self-select a feature. Write the gap in TODO.md and escalate to PO.
+
+## No In-Progress Feature
+
+If `docs/features/in-progress/` contains only `.gitkeep` (no `.feature` file):
+1. Do not pick a feature from backlog yourself.
+2. Update TODO.md: `Next: Run @product-owner — load skill feature-selection and pick the next BASELINED feature from backlog.`
+3. Stop. The PO must move the chosen feature into `in-progress/` before you can begin Step 2.
 
 ## Spec Gaps
 
diff --git a/.opencode/skills/feature-selection/SKILL.md b/.opencode/skills/feature-selection/SKILL.md
index 63be1ed..a195b20 100644
--- a/.opencode/skills/feature-selection/SKILL.md
+++ b/.opencode/skills/feature-selection/SKILL.md
@@ -100,7 +100,7 @@ Run @<agent-name> — <first concrete action for this feature>
 ```
 
 - If the feature has no `Rule:` blocks yet → Step 1 (SCOPE): `Run @product-owner — load skill scope and write stories`
-- If the feature has `Rule:` blocks but no `@id` Examples → Step 1 Phase 4 (Criteria): `Run @product-owner — load skill scope and write acceptance criteria`
+- If the feature has `Rule:` blocks but no `@id` Examples → Step 1 Stage 2 Step B (Criteria): `Run @product-owner — load skill scope and write acceptance criteria`
 - If the feature has `@id` Examples → Step 2 (ARCH): `Run @software-engineer — load skill implementation and write architecture stubs`
 
 ### 6. Commit
diff --git a/.opencode/skills/implementation/SKILL.md b/.opencode/skills/implementation/SKILL.md
index 2472204..01d099c 100644
--- a/.opencode/skills/implementation/SKILL.md
+++ b/.opencode/skills/implementation/SKILL.md
@@ -28,7 +28,7 @@ Design correctness is far more important than lint/pyright/coverage compliance.
 
 ### Prerequisites (stop if any fail — escalate to PO)
 
-1. `docs/features/in-progress/` contains only `.gitkeep` (no `.feature` files). If another `.feature` file exists, **STOP** — another feature is already in progress.
+1. `docs/features/in-progress/` contains exactly one `.feature` file (not just `.gitkeep`). If none exists, **STOP** — update TODO.md `Next:` to `Run @product-owner — move the chosen feature to in-progress/` and stop. Never self-select or move a feature yourself.
 2. The feature file's discovery section has `Status: BASELINED`. If not, escalate to PO — Step 1 is incomplete.
 3. The feature file contains `Rule:` blocks with `Example:` blocks and `@id` tags. If not, escalate to PO — criteria have not been written.
 4. Package name confirmed: read `pyproject.toml` → locate `[tool.setuptools]` → confirm directory exists on disk.
@@ -39,24 +39,18 @@ Design correctness is far more important than lint/pyright/coverage compliance.
 2. Confirm directory exists: `ls <name>/`
 3. All new source files go under `<name>/`
 
-### Move Feature File
-
-```bash
-mv docs/features/backlog/<name>.feature docs/features/in-progress/<name>.feature
-```
-
-Update `TODO.md` Source path from `backlog/` to `in-progress/`.
+**Note on feature file moves**: The PO moves `.feature` files between folders. The software-engineer never moves or edits `.feature` files. Update TODO.md `Source:` path to reflect `in-progress/` once the PO has moved the file.
 
 ### Read Phase (all before writing anything)
 
-1. Read `docs/features/discovery.md` (project-level)
+1. Read `docs/discovery.md` (project-level synthesis changelog) and optionally `docs/discovery_journal.md` (Q&A history for context)
 2. Read **ALL** `.feature` files in `docs/features/backlog/` (discovery + entities sections)
 3. Read in-progress `.feature` file (full: Rules + Examples + @id)
 4. Read **ALL** existing `.py` files in `<package>/` — understand what already exists before adding anything
 
 ### Domain Analysis
 
-From Entities table + Rules (Business) in `.feature` file:
+From the Domain Model table in `docs/discovery.md` + Rules (Business) in the `.feature` file:
 - **Nouns** → named classes, value objects, aggregates
 - **Verbs** → method names with typed signatures
 - **Datasets** → named types (not bare dict/list)
@@ -116,19 +110,20 @@ class UserRepository(Protocol):
 
 Place stubs where responsibility dictates — do not pre-create `ports/` or `adapters/` folders unless a concrete external dependency was identified in scope. Structure follows domain analysis, not a template.
 
-### Write ADR Files (significant decisions only)
+### Record Architectural Decisions
 
-For each significant architectural decision, create or append to `docs/architecture/adr.md`:
+Append a new dated block to `docs/architecture.md` for each significant decision:
 
 ```markdown
-# ADR-NNN: <title>
+## YYYY-MM-DD — <feature-name>: <short title>
 
-**Decision:** <what was decided>
-**Reason:** <why, one sentence>
-**Alternatives considered:** <what was rejected and why>
+Decision: <what was decided>
+Reason: <why, one sentence>
+Alternatives considered: <what was rejected and why>
+Feature: <feature-name>
 ```
 
-Only write an ADR if the decision is non-obvious or has meaningful trade-offs. Routine YAGNI choices do not need an ADR.
+Only write a block for non-obvious decisions with meaningful trade-offs. Routine YAGNI choices do not need a record.
 
 ### Architecture Smell Check (hard gate)
 
@@ -155,7 +150,7 @@ Commit: `feat(<feature-name>): add architecture stubs`
 
 - [ ] Exactly one .feature `in_progress`. If not present, Load `skill feature-selection` 
 - [ ] Architecture stubs present in `<package>/` (committed by Step 2)
-- [ ] Read all `docs/architecture/adr.md` files — understand the architectural decisions before writing any test
+- [ ] Read `docs/architecture.md` — understand all architectural decisions before writing any test
 - [ ] Test stub files exist in `tests/features/<feature-name>/<rule_slug>_test.py` — one file per `Rule:` block, all `@id` stub functions present with `@pytest.mark.skip`; if missing, write them now before entering RED
 
 ### Write Test Stubs (if not present)
@@ -284,10 +279,10 @@ tests/features/<feature-name>/<rule_slug>_test.py
 ### Function Naming
 
 ```python
-def test_<rule_slug>_<@id>() -> None:
+def test_<feature_slug>_<@id>() -> None:
 ```
 
-- `rule_slug` = the `Rule:` title with spaces/hyphens replaced by underscores, lowercase
+- `feature_slug` = the `.feature` file stem with spaces/hyphens replaced by underscores, lowercase
 - `@id` = the `@id` from the `Example:` block
 
 ### Docstring Format (mandatory)
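The revised function-naming rule in the hunk above (`test_<feature_slug>_<@id>`) can be illustrated with a small helper. This is a sketch only: `feature_slug` and `build_test_name` are hypothetical names, and validating the `@id` as exactly eight hex characters is an assumption based on the `@id:<8-char-hex>` convention in the scope skill.

```python
import re
from pathlib import Path

HEX8 = re.compile(r"[0-9a-fA-F]{8}")


def feature_slug(feature_path: str) -> str:
    """File stem with spaces and hyphens replaced by underscores, lowercased."""
    stem = Path(feature_path).stem
    return re.sub(r"[ -]", "_", stem).lower()


def build_test_name(feature_path: str, example_id: str) -> str:
    """Compose the test function name for one Example."""
    if not HEX8.fullmatch(example_id):
        raise ValueError(f"expected an 8-character hex id, got {example_id!r}")
    return f"test_{feature_slug(feature_path)}_{example_id}"
```

For a file named `wall-bounce.feature` and an Example tagged `@id:a1b2c3d4`, the helper yields `test_wall_bounce_a1b2c3d4`.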
diff --git a/.opencode/skills/scope/SKILL.md b/.opencode/skills/scope/SKILL.md
index 44f6f56..e7ecc46 100644
--- a/.opencode/skills/scope/SKILL.md
+++ b/.opencode/skills/scope/SKILL.md
@@ -1,7 +1,7 @@
 ---
 name: scope
 description: Step 1 — discover requirements through stakeholder interviews and write Gherkin acceptance criteria
-version: "4.0"
+version: "5.0"
 author: product-owner
 audience: product-owner
 workflow: feature-lifecycle
@@ -13,20 +13,18 @@ This skill guides the PO through Step 1 of the feature lifecycle: interviewing t
 
 ## When to Use
 
-When the PO is starting a new project or a new feature. The output is a set of `.feature` files in `docs/features/backlog/`.
+When the PO is starting a new project, adding features, or refining an existing feature. The output is a set of `.feature` files in `docs/features/backlog/` ready for development.
 
 ## Overview
 
-Step 1 has 4 phases:
+Step 1 has two stages:
 
-| Phase | Who | Output |
+| Stage | Who | Output |
 |---|---|---|
-| 1. Project Discovery | PO + stakeholder | `docs/features/discovery.md` + feature list |
-| 2. Feature Discovery | PO + stakeholder | Discovery section embedded in `docs/features/backlog/<name>.feature` |
-| 3. Stories | PO alone | `Rule:` blocks in the `.feature` file (no Examples) |
-| 4. Criteria | PO alone | `Example:` blocks with `@id` tags under each `Rule:` |
+| **Stage 1 — Discovery** | PO + stakeholder | `docs/discovery_journal.md` (Q&A) + `docs/discovery.md` (synthesis) + `.feature` descriptions |
+| **Stage 2 — Specification** | PO alone | `Rule:` blocks + `Example:` blocks with `@id` tags in `.feature` files |
 
-Each phase produces a template-gated deliverable. A section must be complete and confirmed before the next section unlocks. Template enforcement is the process discipline — not a "baseline" command.
+Stage 1 is iterative and ongoing — sessions happen whenever the PO or stakeholder needs to discover or refine scope. Stage 2 runs per feature, only after that feature has `Status: BASELINED`.
 
 ---
 
@@ -64,21 +62,37 @@ Three levels of active listening apply throughout every interview session:
 
 - **Level 1 — Per answer**: immediately paraphrase each answer before moving to the next question. "So if I understand correctly, you're saying that X happens when Y?" Catches misunderstanding in the moment.
 - **Level 2 — Per group**: brief synthesis when transitioning between behavior groups. "We've covered [area A] and [area B]. Before I ask about [area C], here is what I understood so far: [summary]. Does that capture it?" Confirms completeness, gives stakeholder a recovery point.
-- **Level 3 — End of session**: full synthesis of everything discussed. Present to stakeholder for approval. This is the accuracy gate, the baseline signal, and the input to domain modeling.
+- **Level 3 — End of session**: full synthesis of everything discussed. Present to stakeholder for approval. This is the accuracy gate and the input to domain modeling.
 
 Do not introduce topic labels or categories during active listening. The summary must reflect what the stakeholder said, not new framing that prompts reactions to things they haven't considered.
 
 ---
 
-## Phase 1 — Project Discovery
+## Stage 1 — Discovery
 
-**When**: Once per project, before any features are scoped. **Skip entirely if `discovery.md` Status is `BASELINED`.** Adding features to an existing project: append new questions to Session 1 and re-fill from there.
+Discovery is a continuous, iterative process. Sessions happen whenever scope needs to be established or refined — for a new project, for a new feature, or when new information emerges. There is no "Phase 1" vs "Phase 2" distinction; every session follows the same structure.
 
-### Session 1 — Individual Scope Elicitation
+### Session Start (every session)
 
-**Before the session**: Create `docs/features/discovery.md` using the project-level discovery template. Open to the Session 1 section.
+**Before asking any questions:**
 
-**Ask the 7 standard questions** (present all at once):
+1. Check `docs/discovery_journal.md` for the most recent session block.
+   - If the most recent block has `Status: IN-PROGRESS` → the previous session was interrupted. Resume it: check which `.feature` files need updating (compare journal Q&A against current `.feature` descriptions), write the `discovery.md` synthesis block if missing, then mark the block `Status: COMPLETE`. Only then begin a new session.
+   - If `docs/discovery_journal.md` does not exist → this is the first session. Create both `docs/discovery_journal.md` and `docs/discovery.md` using the templates at the end of this skill.
+2. Open `docs/discovery_journal.md` and append a new session header:
+   ```markdown
+   ## YYYY-MM-DD — Session N
+   Status: IN-PROGRESS
+   ```
+   Write this header **before** asking any questions. This is the durability marker — if the session is interrupted, the next agent sees `IN-PROGRESS` and knows writes are pending.
+
+### Question Order (within every session)
+
+Questions follow this order. Skip a group only if it was already fully covered in a prior session.
+
+**1. General questions** (skip entirely if any prior session has covered these)
+
+Ask all 7 at once:
 
 1. **Who** are the users of this product?
 2. **What** does the product do at a high level?
@@ -88,112 +102,81 @@ Do not introduce topic labels or categories during active listening. The summary
 6. **Failure** — what does failure look like? What must never happen?
 7. **Out-of-scope** — what are we explicitly not building?
 
-**During the session**: Apply Level 1 active listening (paraphrase each answer). Apply CIT, Laddering, and CI Perspective Change per answer to surface gaps. Add new questions to the Questions table as they arise — do not defer to a later session.
-
-**After the session**:
-
-1. Write the **Session 1 Synthesis** in `discovery.md`: a 3–5 sentence summary of who the users are, what the product does, why it exists, its success/failure conditions, and explicit out-of-scope boundaries.
-2. Present the synthesis to the stakeholder: "Here is my understanding of what you told me — please correct anything that is missing or wrong."
-3. Stakeholder confirms or corrects. PO refines until approved.
-4. Run a **silent pre-mortem** on the confirmed synthesis: "Imagine we build exactly what was described, ship it, and it fails. What was missing?" Add any discoveries as new questions to the Questions table.
-5. Mark `Template §1: CONFIRMED` in `discovery.md`. This unlocks Session 2.
-
-### Session 2 — Behavior Groups / Big Picture
-
-**Before the session**: Review the confirmed Session 1 synthesis. Identify behavior groups (cross-cutting concerns, system-wide constraints, integration points, lifecycle questions). Prepare group-level questions.
-
-**During the session**: Apply Level 1 active listening per answer. Apply Level 2 active listening when transitioning between groups. Apply CIT, Laddering, and CI Perspective Change per group. Add new questions in the moment.
-
-**After the session**:
-
-1. For each group, write a **Group Summary** in `discovery.md`.
-2. Mark `Template §2: CONFIRMED` in `discovery.md`. This unlocks Session 3.
-
-### Session 3 — Synthesis Approval + Feature Derivation
-
-**Before the session**: Produce a **Full Synthesis** across all behavior groups from Sessions 1 and 2. Write it to `discovery.md`.
-
-**During the session**: Present the full synthesis to the stakeholder. "This is my understanding of the full scope. Please correct anything that is missing or wrong." Stakeholder approves or corrects. PO refines until the stakeholder explicitly approves.
-
-**After the session** (PO alone):
-
-1. Domain analysis: extract all nouns (candidate entities) and verbs (candidate operations) from the approved synthesis.
-2. Group nouns into subject areas (Bounded Contexts: where the same word means different things, a new context begins).
-3. Name each subject area as a feature using FDD "Action object" triples: "Calculate the total of a sale", "Validate the password of a user", "Enroll a student in a seminar".
-4. For each feature: create `docs/features/backlog/<name>.feature` using the feature file template (discovery section only — no Rules yet).
-5. Write `Status: BASELINED (YYYY-MM-DD)` to `discovery.md`.
-
-Commit: `feat(discovery): baseline project discovery`
+Apply Level 1 active listening per answer. Apply CIT, Laddering, and CI Perspective Change per answer to surface gaps. Add new questions in the moment.
 
----
-
-## Phase 2 — Feature Discovery
-
-**When**: Per feature, after project discovery is baselined. Each `.feature` file has its own 3-session discovery template in its description.
+**2. Cross-cutting questions**
 
-### Session 1 — Individual Entity Elicitation
+Target behavior groups, bounded contexts, integration points, lifecycle events, and system-wide constraints. Apply Level 2 active listening when transitioning between groups.
 
-**Before the session**: Open `docs/features/backlog/<name>.feature`.
+**3. Feature questions** (one feature at a time)
 
-1. **Populate the Entities table**: extract nouns (candidate classes) and verbs (candidate methods) from the project discovery synthesis that are relevant to this feature. Mark each as in-scope or not.
-2. **Generate questions from entity gaps**: for each in-scope entity, ask internally:
-   - What are its boundaries and edge cases?
-   - What happens when it is missing, invalid, or at its limits?
-   - How does it interact with other in-scope entities?
-3. Add questions to the Session 1 Questions table.
-4. Run a **silent pre-mortem**: "Imagine the developer builds this feature exactly as described, all tests pass, but the feature doesn't work for the user. What would be missing?" Add any discoveries as new questions.
+For each feature the session touches:
+- Extract relevant nouns and verbs from `docs/discovery.md` Domain Model (if it exists)
+- Generate questions from entity gaps: boundaries, edge cases, interactions, failure modes
+- Run a silent pre-mortem: "Imagine the developer builds this feature exactly as described, all tests pass, but the feature doesn't work for the user. What would be missing?"
+- Apply CIT, Laddering, and CI Perspective Change per question
 
-**During the session**: Apply Level 1 active listening per answer. Apply CIT, Laddering, and CI Perspective Change per answer. Add new questions in the moment.
+**Real-time split rule**: if, during feature questions, the PO detects >2 distinct concerns OR >8 candidate Examples for a single feature, **split immediately**:
+1. Record the split in the journal: note the original feature name and the two new names
+2. Create stub `.feature` files for both parts (if they don't already exist)
+3. Continue feature questions for both new features in sequence within the same session
 
-**After the session**:
+### After Questions (PO alone, same session)
 
-1. Write the **Session 1 Synthesis** in the `.feature` file: summarize the key entities, their relationships, and the constraints that emerged.
-2. Present the synthesis to the stakeholder. Stakeholder confirms or corrects. PO refines until approved.
-3. Mark `Template §1: CONFIRMED`. This unlocks Session 2.
+**Step A — Write answered Q&A to journal**
 
-### Session 2 — Behavior Groups / Big Picture for This Feature
+Append all answered Q&A to `docs/discovery_journal.md`, in groups (general, cross-cutting, then per-feature). Write only answered questions. Unanswered questions are discarded.
 
-**Before the session**: Review the confirmed Session 1 synthesis. Identify behavior groups within this feature (happy paths, error paths, edge cases, lifecycle events, integration points).
+Group headers use this format:
+- General group: `### General`
+- Cross-cutting group: `### <Group Name>`
+- Feature group: `### Feature: <feature-name>`
 
-**During the session**: Apply Level 1 active listening per answer. Apply Level 2 active listening when transitioning between groups. Apply CIT, Laddering, and CI Perspective Change per group.
+**Step B — Update .feature descriptions**
 
-**After the session**:
+For each feature touched in this session: rewrite the `.feature` file description to reflect the current state of understanding. Only touched features are updated; all others remain exactly as-is.
 
-1. Write **Group Summaries** in the `.feature` file. Name each group — these names become candidate `Rule:` titles.
-2. Mark `Template §2: CONFIRMED`. This unlocks Session 3.
+If a feature is new (just created as a stub): write its initial description now.
 
-### Session 3 — Feature Synthesis Approval + Story Derivation
+**Step C — Append session synthesis to discovery.md (LAST)**
 
-**Before the session**: Produce a **Full Synthesis** of the feature scope, covering all behavior groups from Sessions 1 and 2.
+After all `.feature` files are updated, append one `## Session: YYYY-MM-DD` block to `docs/discovery.md`. The block contains:
+- `### Feature List` — which features were added or changed (0–N entries); if nothing changed, write "No changes"
+- `### Domain Model` — new or updated domain entities and verbs; if nothing changed, write "No changes"
+- `### Scope` (first session only) — 3–5 sentence synthesis of who the users are, what the product does, why it exists, success/failure conditions, and explicit out-of-scope
 
-**During the session**: Present the full synthesis to the stakeholder. Stakeholder approves or corrects. PO refines until explicitly approved.
+**Step D — Mark session complete**
 
-**After the session** (PO alone):
-
-1. Map each named group from Session 2 to a candidate user story (Rule).
-2. Write `Status: BASELINED (YYYY-MM-DD)` to the `.feature` file's discovery section.
-3. Mark `Template §3: CONFIRMED`.
+Update the session header in `docs/discovery_journal.md`:
+```markdown
+## YYYY-MM-DD — Session N
+Status: COMPLETE
+```
 
-Commit: `feat(discovery): baseline <name> feature discovery`
+**Commit**: `feat(discovery): <one-sentence summary of session>`
 
-### Decomposition Check
+### Baselining a Feature
 
-After Session 3, before moving to Phase 3:
+A feature is baselined when the stakeholder has explicitly approved its discovery. The PO writes `Status: BASELINED (YYYY-MM-DD)` in the `.feature` file.
 
-Does this feature span **>2 distinct concerns** OR have **>8 candidate Examples**?
+**Gate**: a feature may only be baselined when:
+- Its description accurately reflects the stakeholder's approved understanding
+- Its candidate user stories (Rule candidates) are identified
+- The decomposition check passes: the feature spans at most 2 distinct concerns AND has at most 8 candidate Examples
 
-- **YES** → split into separate `.feature` files in `backlog/`, each addressing a single cohesive concern. Re-run Phase 2 for any split feature that needs its own discovery.
-- **NO** → proceed to Phase 3.
+A baselined feature is ready for Stage 2. Features can be baselined one at a time; they do not all need to be baselined together.
 
 ---
 
-## Phase 3 — Stories
+## Stage 2 — Specification
+
+Stage 2 runs per feature, after `Status: BASELINED`. PO works alone. No stakeholder involvement.
 
-**When**: After feature discovery is baselined and decomposition check passes. PO works alone.
+If the PO discovers a gap during Stage 2 that requires stakeholder input: stop Stage 2, open a new Stage 1 session, resolve the gap, then return to Stage 2.
 
-### 3.1 Write Rule Blocks
+### Step A — Stories
 
-Clusters from Phase 2 Session 2 → one `Rule:` block per user story. Add after the discovery section in the `.feature` file.
+Derive `Rule:` blocks from the baselined feature description. One `Rule:` per user story.
 
 Each `Rule:` block contains:
 - The rule title (2-4 words, kebab-friendly)
@@ -216,9 +199,7 @@ Good stories are:
 
 Avoid: "As the system, I want..." (no business value). Break down stories that contain "and" into two Rules.
 
-### 3.2 INVEST Gate
-
-Before committing, verify every Rule passes:
+**INVEST Gate** — verify every Rule before committing:
 
 | Letter | Question | FAIL action |
 |---|---|---|
@@ -229,34 +210,25 @@ Before committing, verify every Rule passes:
 | **S**mall | Completable in one feature cycle? | Split into smaller Rules |
 | **T**estable | Can it be verified with a concrete test? | Rewrite with observable outcomes |
 
-### 3.3 Review Checklist
-
+**Review checklist:**
 - [ ] Every Rule has a distinct user role and benefit
 - [ ] No Rule duplicates another
-- [ ] Rules collectively cover all entities marked in-scope in the discovery section
+- [ ] Rules collectively cover all entities in scope from the feature description
 - [ ] Every Rule passes the INVEST gate
 
 Commit: `feat(stories): write user stories for <name>`
 
----
-
-## Phase 4 — Criteria
+### Step B — Criteria
 
-**When**: After stories are written. PO works alone.
+Add `Example:` blocks under each `Rule:`. PO writes all Examples alone, based on the approved feature description and domain knowledge. No stakeholder review of individual Examples.
 
-### 4.1 Silent Pre-mortem Per Rule
-
-For each `Rule:` block, ask internally before writing any Examples:
+**Silent pre-mortem per Rule** (before writing any Examples):
 
 > "What observable behaviors must we prove for this Rule to be complete?"
 
 All Rules must have their pre-mortems completed before any Examples are written.
 
-### 4.2 Write Example Blocks
-
-Add `Example:` blocks under each `Rule:`. Each Example gets an `@id:<8-char-hex>` tag.
-
-**Format** (mandatory):
+**Example format** (mandatory):
 
 ```gherkin
   Rule: Wall bounce
@@ -279,8 +251,6 @@ Add `Example:` blocks under each `Rule:`. Each Example gets an `@id:<8-char-hex>
 - **Observable means observable by the end user**, not by a test harness
 - **Declarative, not imperative** — describe behavior, not UI steps
 - Each Example must be observably distinct from every other
-- If a single feature spans multiple concerns, split into separate `.feature` files
-- If user interaction is involved, the Feature description must declare the interaction model
 
 **Declarative vs. imperative Gherkin**:
 
@@ -299,9 +269,7 @@ Add `Example:` blocks under each `Rule:`. Each Example gets an `@id:<8-char-hex>
 - Examples that test implementation details ("Then: the Strategy pattern is used")
 - Imperative UI steps instead of declarative behavior descriptions
 
-### 4.3 Review Checklist
-
-Before committing:
+**Review checklist:**
 - [ ] Every `Rule:` block has at least one Example
 - [ ] Every `@id` is unique within this feature
 - [ ] Every Example has `Given/When/Then`
@@ -311,61 +279,53 @@ Before committing:
 - [ ] Each Example is observably distinct from every other
 - [ ] No single feature file spans multiple unrelated concerns
 
-### 4.4 Commit and Freeze
-
-```bash
-git add docs/features/backlog/<name>.feature
-git commit -m "feat(criteria): write acceptance criteria for <name>"
-```
+Commit: `feat(criteria): write acceptance criteria for <name>`
 
-**After this commit, the `Example:` blocks are frozen.** Any change requires:
+**After this commit, `Example:` blocks are frozen.** Any change requires:
 1. Add `@deprecated` tag to the old Example
 2. Write a new Example with a new `@id`
 
 ---
 
+## Bug Handling
+
+When a defect is reported against a completed or in-progress feature:
+
+1. **PO** adds a new Example to the relevant `Rule:` block in the `.feature` file:
+
+   ```gherkin
+   @bug @id:<new-8-char-hex>
+   Example: <what the bug is>
+     Given <conditions that trigger the bug>
+     When <action>
+     Then <correct behavior>
+   ```
+
+2. **SE** implements the specific test in `tests/features/<feature-name>/` (the `@id` test).
+3. **SE** also writes a `@given` Hypothesis property test in `tests/unit/` covering the whole class of inputs that triggered the bug — not just the single case.
+4. Both tests are required — neither is optional.
+5. SE follows the normal TDD loop (Step 3) for the new `@id`.
+
+---
+
 ## Feature File Format
 
-Each feature is a single `.feature` file. The free-form description before the first `Rule:` contains all discovery content. Architecture is added later by the developer (Step 2).
+Each feature is a single `.feature` file. The description block contains the feature description and Status. All Q&A belongs in `docs/discovery_journal.md`; all architectural decisions belong in `docs/architecture.md`.
 
 ```gherkin
 Feature: <Feature title>
 
-  Discovery:
+  <2–4 sentence description of what this feature does and why it exists.
+  Written in plain language, always kept current by the PO.>
 
   Status: ELICITING | BASELINED (YYYY-MM-DD)
 
-  Entities:
-  | Type | Name | Candidate Class/Method | In Scope |
-  |------|------|----------------------|----------|
-  | Noun | Ball | Ball                 | Yes      |
-  | Verb | Bounce | Ball.bounce()      | Yes      |
-
   Rules (Business):
   - <Business rule that applies across multiple Examples>
 
   Constraints:
   - <Non-functional requirement specific to this feature>
 
-  Session 1 — Individual Entity Elicitation:
-  | ID | Question | Answer | Status |
-  |----|----------|--------|--------|
-  | Q1 | ... | ... | OPEN / ANSWERED |
-  Template §1: CONFIRMED
-  Synthesis: <PO synthesis — confirmed by stakeholder>
-
-  Session 2 — Behavior Groups / Big Picture:
-  | ID | Question | Answer | Status |
-  |----|----------|--------|--------|
-  | Q2 | ... | ... | OPEN / ANSWERED |
-  Template §2: CONFIRMED
-  Behavior Groups:
-  - <Behavior group name>: <one-sentence summary>
-
-  Session 3 — Feature Synthesis:
-  Synthesis: <full synthesis across all behavior groups>
-  Template §3: CONFIRMED — stakeholder approved YYYY-MM-DD
-
   Rule: <User story title>
     As a <role>
     I want <goal>
@@ -388,41 +348,115 @@ The **Rules (Business)** section captures business rules that hold across multip
 
 The **Constraints** section captures non-functional requirements. Testable constraints should become `Example:` blocks with `@id` tags.
 
+What is **not** in `.feature` files:
+- Entities table — domain model lives in `docs/discovery.md`
+- Session Q&A blocks — live in `docs/discovery_journal.md`
+- Template §N markers — live in `docs/discovery_journal.md` session blocks
+- Architecture section — lives in `docs/architecture.md`
+
+---
+
+## Project-Level Discovery Templates
+
+Three files hold project-level discovery content. Use these templates when creating them for the first time.
+
+### `docs/discovery_journal.md` — Raw Q&A (append-only)
+
+```markdown
+# Discovery Journal: <project-name>
+
 ---
 
-## Project-Level Discovery (`docs/features/discovery.md`)
+## YYYY-MM-DD — Session 1
+Status: IN-PROGRESS
+
+### General
+
+| ID | Question | Answer |
+|----|----------|--------|
+| Q1 | Who are the users? | ... |
+| Q2 | What does the product do at a high level? | ... |
+| Q3 | Why does it exist — what problem does it solve? | ... |
+| Q4 | When and where is it used? | ... |
+| Q5 | Success — what does "done" look like? | ... |
+| Q6 | Failure — what must never happen? | ... |
+| Q7 | Out-of-scope — what are we explicitly not building? | ... |
+
+### <Group Name>
+
+| ID | Question | Answer |
+|----|----------|--------|
+| Q8 | ... | ... |
+
+### Feature: <feature-name>
+
+| ID | Question | Answer |
+|----|----------|--------|
+| Q9 | ... | ... |
+
+Status: COMPLETE
+```
+
+Rules:
+- Session header written first with `Status: IN-PROGRESS` before any Q&A
+- Only answered questions are written; unanswered questions are discarded
+- Questions grouped by topic (general, cross-cutting groups, per-feature)
+- `Status: COMPLETE` written at the end of the session block, after all writes are done
+- Never edit past entries — only append new session blocks
+
+### `docs/discovery.md` — Synthesis Changelog (append-only)
 
 ```markdown
 # Discovery: <project-name>
 
-## State
-Status: ELICITING | BASELINED (YYYY-MM-DD)
+---
+
+## Session: YYYY-MM-DD
 
-## Session 1 — Individual Scope Elicitation
+### Scope
+<3–5 sentence synthesis of who the users are, what the product does, why it exists,
+success/failure conditions, and out-of-scope boundaries.>
+(First session only. Omit this subsection in subsequent sessions.)
 
-| ID | Question | Answer | Status |
-|----|----------|--------|--------|
-| Q1 | Who are the users? | ... | OPEN / ANSWERED |
+### Feature List
+- `<feature-name>` — <one-sentence description of what changed or was added>
+(Write "No changes" if no features were added or modified this session.)
 
-Template §1: CONFIRMED
-Synthesis: <PO synthesis — confirmed by stakeholder>
-Pre-mortem: <gaps identified; new questions added above>
+### Domain Model
+| Type | Name | Description | In Scope |
+|------|------|-------------|----------|
+| Noun | <name> | <description> | Yes |
+| Verb | <name> | <description> | Yes |
+(Write "No changes" if domain model was not updated this session.)
+```
 
-## Session 2 — Behavior Groups / Big Picture
+Rules:
+- Each session appends one `## Session: YYYY-MM-DD` block
+- Synthesis block is written LAST — only after all `.feature` file descriptions are updated
+- No project-level `Status: BASELINED` — feature-level BASELINED in `.feature` files is the gate
+- Never edit past blocks — append only; later blocks extend or supersede earlier ones
 
-| ID | Question | Answer | Status |
-|----|----------|--------|--------|
-| Q2 | ... | ... | OPEN / ANSWERED |
+### `docs/architecture.md` — Architectural Decisions (append-only, software-engineer)
 
-Template §2: CONFIRMED
-Behavior Groups:
-- <Behavior group name>: <one-sentence summary>
+```markdown
+# Architecture: <project-name>
 
-## Session 3 — Full Synthesis
+---
 
-<3–6 paragraph synthesis of all scope, behavior groups, and boundaries>
+## YYYY-MM-DD — <feature-name>: <short title>
 
-Template §3: CONFIRMED — stakeholder approved YYYY-MM-DD
+Decision: <what was decided — one sentence>
+Reason: <why — one sentence>
+Alternatives considered: <what was rejected and why>
+Feature: <feature-name>
 ```
 
-No Entities table at project level.
+Rules: Append-only. When a decision changes, append a new block that supersedes the old one. Cross-feature decisions use `Cross-feature:` in the header. Only write a block for non-obvious decisions with meaningful trade-offs.
+
+Base directory for this skill: file:///home/user/Documents/projects/python-project-template/.opencode/skills/scope
+Relative paths in this skill (e.g., scripts/, reference/) are relative to this base directory.
+Note: file list is sampled.
+
+<skill_files>
+<file>/home/user/Documents/projects/python-project-template/.opencode/skills/scope/discovery-template.md</file>
+</skill_files>
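The session-start check in the scope skill above (resume if the most recent journal block is `IN-PROGRESS`) can be sketched as a small parser over `docs/discovery_journal.md`. This is a minimal sketch under assumptions: the name `latest_session_status` is hypothetical, session headers are assumed to look like `## YYYY-MM-DD ... Session N`, and the last `Status:` line inside the most recent block is taken as authoritative, which covers both a header status updated in place and a `Status: COMPLETE` line appended at the end of the block.

```python
import re
from typing import Optional

# A session header: "## 2026-04-18 <separator> Session 3"
SESSION_HEADER = re.compile(
    r"^## \d{4}-\d{2}-\d{2}\b.*\bSession \d+\s*$", re.MULTILINE
)
STATUS_LINE = re.compile(r"^Status:\s*(IN-PROGRESS|COMPLETE)\s*$", re.MULTILINE)


def latest_session_status(journal_text: str) -> Optional[str]:
    """Return the status of the most recent session block, or None.

    None means no session header was found (first session) or the most
    recent block carries no recognizable Status line.
    """
    headers = list(SESSION_HEADER.finditer(journal_text))
    if not headers:
        return None
    block = journal_text[headers[-1].end():]
    statuses = STATUS_LINE.findall(block)
    # The last Status line in the block wins.
    return statuses[-1] if statuses else None
```

A result of `"IN-PROGRESS"` signals that the previous session's writes should be finished before a new session header is appended.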
diff --git a/.opencode/skills/scope/discovery-template.md b/.opencode/skills/scope/discovery-template.md
index 117d025..aa4cc5c 100644
--- a/.opencode/skills/scope/discovery-template.md
+++ b/.opencode/skills/scope/discovery-template.md
@@ -1,33 +1,9 @@
 Feature: <feature-name>
 
-  Discovery:
+  <2–4 sentence description of what this feature does and why it exists.>
 
   Status: ELICITING
 
-  Entities:
-  | Type | Name | Candidate Class/Method | In Scope |
-  |------|------|----------------------|----------|
-
   Rules (Business):
 
   Constraints:
-
-  Session 1 — Individual Entity Elicitation:
-  | ID | Question | Answer | Status |
-  |----|----------|--------|--------|
-
-  Template §1: PENDING
-  Synthesis: (fill after stakeholder confirms)
-  Pre-mortem: (fill after synthesis is confirmed)
-
-  Session 2 — Behavior Groups / Big Picture:
-  | ID | Question | Answer | Status |
-  |----|----------|--------|--------|
-
-  Template §2: PENDING
-  Behavior Groups:
-  - (fill after all group questions are answered)
-
-  Session 3 — Feature Synthesis:
-  (fill after Sessions 1 and 2 are complete)
-  Template §3: PENDING
diff --git a/.opencode/skills/session-workflow/SKILL.md b/.opencode/skills/session-workflow/SKILL.md
index 658023f..588a445 100644
--- a/.opencode/skills/session-workflow/SKILL.md
+++ b/.opencode/skills/session-workflow/SKILL.md
@@ -19,15 +19,20 @@ Every session starts by reading state. Every session ends by writing state. This
      # Current Work
 
      No feature in progress.
-     Next: PO picks feature from docs/features/backlog/ and moves it to docs/features/in-progress/.
+     Next: Run @product-owner — load skill feature-selection and pick the next BASELINED feature from backlog.
      ```
-2. If a feature is active, read:
-   - `docs/features/in-progress/<name>.feature` — feature file (discovery + architecture + Rules + Examples)
-   - `docs/features/discovery.md` — project-level discovery (for context)
-3. Run `git status` — understand what is committed vs. what is not
-4. Confirm scope: you are working on exactly one step of one feature
-
-If TODO.md says "No feature in progress", load `skill feature-selection` — it guides the PO through scoring and selecting the next BASELINED backlog feature. **The software-engineer never self-selects a feature from the backlog — only the PO picks.** The PO must verify the feature has `Status: BASELINED` in its discovery section before moving it to `in-progress/` — if not baselined, the PO must complete Step 1 first.
+2. **If you are the PO** and Step 1 (SCOPE) is active: check `docs/discovery_journal.md` for the most recent session block.
+   - If the most recent block has `Status: IN-PROGRESS` → the previous session was interrupted. Resume it before starting a new session: finish updating `.feature` files and `docs/discovery.md`, then mark the block `Status: COMPLETE`.
+3. If a feature is active at Step 2–5, read:
+   - `docs/features/in-progress/<name>.feature` — feature file (Rules + Examples + @id)
+   - `docs/discovery.md` — project-level synthesis changelog (for context)
+4. Run `git status` — understand what is committed vs. what is not
+5. Confirm scope: you are working on exactly one step of one feature
+
+**If TODO.md says "No feature in progress":**
+
+- **PO**: Load `skill feature-selection` — it guides you through scoring and selecting the next BASELINED backlog feature. You must verify the feature has `Status: BASELINED` before moving it to `in-progress/`. Only you may move it.
+- **Software-engineer or reviewer**: Update TODO.md `Next:` line to `Run @product-owner — load skill feature-selection and pick the next BASELINED feature from backlog.` Then **stop**. Never self-select a feature. Never move a `.feature` file.
 
 ## Session End
 
diff --git a/.opencode/skills/verify/SKILL.md b/.opencode/skills/verify/SKILL.md
index 7d145f2..67b6ae6 100644
--- a/.opencode/skills/verify/SKILL.md
+++ b/.opencode/skills/verify/SKILL.md
@@ -15,6 +15,8 @@ This skill guides the reviewer through Step 4: independent verification that the
 
 **Every PASS/FAIL cell must have evidence.** Empty evidence = UNCHECKED = REJECTED.
 
+**You never move `.feature` files.** After producing an APPROVED report: update TODO.md `Next:` to `Run @product-owner — accept feature <name> at Step 5.` then stop. The PO accepts the feature and moves the file.
+
 ## When to Use (Step 4)
 
 After the software-engineer signals Step 3 is complete and all self-verification checks pass. Do not start verification until the software-engineer has committed all work and written the Self-Declaration.
@@ -26,7 +28,7 @@ After the software-engineer signals Step 3 is complete and all self-verification
 Read `docs/features/in-progress/<name>.feature`. Extract:
 - All `@id` tags and their Example titles from `Rule:` blocks
 - The interaction model (if the feature involves user interaction)
-- The Architecture section (module structure, ADRs)
+- The architectural decisions in `docs/architecture.md` relevant to this feature
 - The software-engineer's Self-Declaration from `TODO.md`
 
 ### 2. pyproject.toml Gate
@@ -124,7 +126,7 @@ Read the source files changed in this feature. **Do this before running lint/sta
 | No internal attribute access | Search for `_x` in assertions | None found | `_x`, `isinstance`, `type()` |
 | Every `@id` has a mapped test | Match `@id` to test functions | All mapped | Missing test |
 | No orphaned skipped stubs | Search for `@pytest.mark.skip` in `tests/features/` | None found | Any found — stub was written but never implemented |
-| Function naming | Matches `test_<rule_slug>_<8char_hex>` | All match | Mismatch |
+| Function naming | Matches `test_<feature_slug>_<8char_hex>` | All match | Mismatch |
 | Hypothesis tests have `@slow` | Read every `@given` for `@slow` marker | All present | Any missing |
 
 #### 5g. Code Quality — any FAIL → REJECTED
diff --git a/AGENTS.md b/AGENTS.md
index 0f2acc3..00b20b1 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -11,10 +11,10 @@ Features flow through 5 steps with a WIP limit of 1 feature at a time. The files
 
 ```
 STEP 1: SCOPE          (product-owner)  → discovery + Gherkin stories + criteria
-STEP 2: ARCH           (software-engineer)      → read all features + existing package files, write domain stubs (signatures only, no bodies); ADRs to docs/architecture/
+STEP 2: ARCH           (software-engineer)      → read all features + existing package files, write domain stubs (signatures only, no bodies); decisions appended to docs/architecture.md
 STEP 3: TDD LOOP       (software-engineer)      → RED → GREEN → REFACTOR, one @id at a time
 STEP 4: VERIFY         (reviewer)       → run all commands, review code
-STEP 5: ACCEPT         (product-owner)  → demo, validate, move folder to completed/
+STEP 5: ACCEPT         (product-owner)  → demo, validate, move .feature to completed/ (PO only)
 ```
 
 **PO picks the next feature from backlog. Software-engineer never self-selects.**
@@ -23,14 +23,25 @@ STEP 5: ACCEPT         (product-owner)  → demo, validate, move folder to compl
 
 ## Roles
 
-- **Product Owner (PO)** — AI agent. Interviews the stakeholder, writes discovery docs, Gherkin features, and acceptance criteria. Accepts or rejects deliveries.
+- **Product Owner (PO)** — AI agent. Interviews the stakeholder, writes discovery docs, Gherkin features, and acceptance criteria. Accepts or rejects deliveries. **Sole owner of all `.feature` file moves** (backlog → in-progress before Step 2; in-progress → completed after Step 5 acceptance).
 - **Stakeholder** — Human. Answers PO's questions, provides domain knowledge, approves PO syntheses to confirm discovery is complete.
-- **Software Engineer** — AI agent. Architecture, test bodies, implementation, git. Never edits `.feature` files. Escalates spec gaps to PO.
-- **Reviewer** — AI agent. Adversarial verification. Reports spec gaps to PO.
+- **Software Engineer** — AI agent. Architecture, test bodies, implementation, git. Never edits or moves `.feature` files. Escalates spec gaps to PO. If no `.feature` file is in `in-progress/`, stops and escalates to PO.
+- **Reviewer** — AI agent. Adversarial verification. Reports spec gaps to PO. Never moves `.feature` files. After APPROVED report, stops and escalates to PO for Step 5.
+
+## Feature File Chain of Responsibility
+
+`.feature` files are owned exclusively by the PO. **No other agent ever moves or edits them.**
+
+| Transition | Who | When |
+|---|---|---|
+| `backlog/` → `in-progress/` | PO only | Before Step 2 begins; only if `Status: BASELINED` |
+| `in-progress/` → `completed/` | PO only | After Step 5 acceptance |
+
+**If an agent (SE or reviewer) finds no `.feature` in `in-progress/`**: update TODO.md with the correct `Next:` escalation line and stop. Never self-select a backlog feature.
 
 ## Agents
 
-- **product-owner** — defines scope (4 phases), picks features, accepts deliveries
+- **product-owner** — defines scope (Stage 1 Discovery + Stage 2 Specification), picks features, accepts deliveries
 - **software-engineer** — architecture, tests, code, git, releases (Steps 2-3 + release)
 - **reviewer** — runs commands and reviews code at Step 4, produces APPROVED/REJECTED report
 - **setup-project** — one-time setup to initialize a new project from this template
@@ -54,43 +65,63 @@ STEP 5: ACCEPT         (product-owner)  → demo, validate, move folder to compl
 
 **Session protocol**: Every agent loads `skill session-workflow` at session start. Load additional skills as needed for the current step.
 
-## Step 1 — SCOPE (4 Phases)
+## Step 1 — SCOPE
+
+Step 1 has two stages:
+
+### Stage 1 — Discovery (PO + stakeholder, iterative)
+
+Discovery is a continuous process. Sessions happen whenever scope needs to be established or refined — for a new project, new features, or new information. Every session follows the same structure:
+
+**Session question order:**
+1. **General** (5Ws + Success + Failure + Out-of-scope) — first session only, if the journal doesn't exist yet
+2. **Cross-cutting** — behavior groups, bounded contexts, integration points, lifecycle events
+3. **Per-feature** — one feature at a time; extract entities from `docs/discovery.md` Domain Model; gap-finding with CIT, Laddering, CI Perspective Change
+
+**Real-time split rule**: if the PO detects >2 concerns or >8 candidate Examples for a feature during per-feature questions, split immediately — record the split in the journal, create stub `.feature` files, continue questions for both in the same session.
+
+**After questions (PO alone, in order):**
+1. Append answered Q&A (in groups) to `docs/discovery_journal.md` — only answered questions
+2. Rewrite `.feature` description for each feature touched — others stay unchanged
+3. Append session synthesis block to `docs/discovery.md` — LAST, after all `.feature` updates
+
+**Session status**: the journal session header begins with `Status: IN-PROGRESS` (written before questions). It is updated to `Status: COMPLETE` after all writes. If a session is interrupted, the next agent detects `IN-PROGRESS` and resumes the pending writes before starting a new session.
+
+**Baselining**: PO writes `Status: BASELINED (YYYY-MM-DD)` in the `.feature` file when the stakeholder approves that feature's discovery and the decomposition check passes.
 
-### Phase 1 — Project Discovery (once per project)
-PO creates `docs/features/discovery.md` using the 3-session template. **Skip Phase 1 entirely if `discovery.md` Status is BASELINED.** To add features to an existing project: append new questions to Session 1 and re-fill from there.
+Commit per session: `feat(discovery): <session summary>`
 
-- **Session 1** — Individual scope elicitation: 5Ws + Success + Failure + Out-of-scope. Gap-finding per answer using CIT, Laddering, and CI Perspective Change. PO writes synthesis; stakeholder confirms or corrects. PO runs silent pre-mortem on confirmed synthesis. Template §1 must be confirmed before Session 2.
-- **Session 2** — Behavior groups / big picture: questions target behavior groups and cross-cutting concerns. Gap-finding per group. Level 2 synthesis when transitioning between groups. Template §2 must be complete before Session 3.
-- **Session 3** — Synthesis approval + feature derivation: PO produces full synthesis of all behavior groups; stakeholder approves or corrects (PO refines until approved). Domain analysis: nouns/verbs → subject areas → FDD "Action object" feature names. Create `backlog/<name>.feature` stubs. Write `Status: BASELINED` to `discovery.md`.
+### Stage 2 — Specification (PO alone, per feature)
 
-### Phase 2 — Feature Discovery (per feature)
-Each `.feature` file has its own 3-session discovery template in its description. **Sessions are enforced by the template: each section must be filled before proceeding to the next.**
+Only runs on features with `Status: BASELINED`. No stakeholder involvement. If a gap requires stakeholder input, open a new Stage 1 session first.
 
-- **Session 1** — Individual entity elicitation: populate Entities table from project discovery; generate questions from entity gaps using CIT, Laddering, CI Perspective Change. PO writes synthesis; stakeholder confirms. Silent pre-mortem on confirmed synthesis.
-- **Session 2** — Behavior groups / big picture: questions target behavior groups within this feature. Gap-finding per group. Level 2 group transition summaries.
-- **Session 3** — Feature synthesis approval + story derivation: PO produces synthesis of feature scope and behavior groups; stakeholder approves or corrects (PO refines until approved). Story candidates become candidate user stories (Rules). Write `Status: BASELINED` to `.feature` discovery section.
+**Step A — Stories**: derive one `Rule:` block per user story from the baselined feature description. INVEST gate: all 6 letters must pass.
+Commit: `feat(stories): write user stories for <name>`
 
-**Decomposition check**: after Session 3, does this feature span >2 distinct concerns OR have >8 candidate Examples? YES → split into separate `.feature` files, re-run Phase 2. NO → proceed.
+**Step B — Criteria**: PO writes `Example:` blocks with `@id` tags under each `Rule:`. Pre-mortem per Rule before writing any Examples. MoSCoW triage per Example. Examples are frozen after commit.
+Commit: `feat(criteria): write acceptance criteria for <name>`
 
-### Phase 3 — Stories (PO alone)
-Story candidates from Phase 2 Session 2 → one `Rule:` block per user story. Each `Rule:` has the user story header (`As a / I want / So that`) as its description — no `Example:` blocks yet. INVEST gate: all 6 letters must pass. Commit: `feat(stories): write user stories for <name>`
+**Criteria are frozen**: no `Example:` changes after commit. Adding a new Example with a new `@id` replaces the old one.
 
-### Phase 4 — Criteria (PO alone)
-Pre-mortem per Rule (all Rules must be checked before writing Examples). Write `Example:` blocks — declarative Given/When/Then, MoSCoW triage (Must/Should/Could) per Example. Review checklist (4.3). Commit: `feat(criteria): write acceptance criteria for <name>`
+### Bug Handling
 
-**Criteria are frozen**: no `Example:` changes after commit. Adding new Example with new `@id` replaces old.
+When a defect is reported:
+1. **PO** adds `@bug @id:<hex>` Example to the relevant `Rule:` in the `.feature` file
+2. **SE** implements the specific test in `tests/features/<feature-name>/`
+3. **SE** also writes a `@given` Hypothesis property test in `tests/unit/` for the whole class of inputs
+4. Both tests are required. SE follows the normal TDD loop (Step 3).
 
 ## Filesystem Structure
 
 ```
-docs/features/
-  discovery.md                        ← project-level (Status + Questions only)
-  backlog/<feature-name>.feature      ← one per feature; discovery + Rules + Examples
-  in-progress/<feature-name>.feature  ← file moves here at Step 2
-  completed/<feature-name>.feature    ← file moves here at Step 5
-
-docs/architecture/
-  adr.md                  ← one per significant architectural decision
+docs/
+  discovery_journal.md                ← raw Q&A, PO appends after every session
+  discovery.md                        ← synthesis changelog, PO appends after every session
+  architecture.md                     ← all architectural decisions, SE appends after Step 2
+  features/
+    backlog/<feature-name>.feature    ← narrative + Rules + Examples
+    in-progress/<feature-name>.feature
+    completed/<feature-name>.feature
 
 tests/
   features/<feature-name>/
diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 0000000..5ad7cbd
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,19 @@
+# Architecture: <project-name>
+
+---
+
+## YYYY-MM-DD — <feature-name>: <short title>
+
+Decision: <what was decided — one sentence>
+Reason: <why — one sentence>
+Alternatives considered: <what was rejected and why>
+Feature: <feature-name>
+
+---
+
+## YYYY-MM-DD — Cross-feature: <short title>
+
+Decision: <what was decided>
+Reason: <why>
+Alternatives considered: <what was rejected and why>
+Affected features: <feature-name>, <feature-name>
diff --git a/docs/discovery.md b/docs/discovery.md
new file mode 100644
index 0000000..1f56443
--- /dev/null
+++ b/docs/discovery.md
@@ -0,0 +1,21 @@
+# Discovery: <project-name>
+
+---
+
+## Session: YYYY-MM-DD
+
+### Scope
+<3–5 sentence synthesis: who the users are, what the product does, why it exists,
+success/failure conditions, and explicit out-of-scope boundaries.>
+(First session only. Omit this subsection in subsequent sessions.)
+
+### Feature List
+- `<feature-name>` — <one-sentence description of what changed or was added>
+(Write "No changes" if no features were added or modified this session.)
+
+### Domain Model
+| Type | Name | Description | In Scope |
+|------|------|-------------|----------|
+| Noun | <name> | <description> | Yes |
+| Verb | <name> | <description> | Yes |
+(Write "No changes" if domain model was not updated this session.)
diff --git a/docs/discovery_journal.md b/docs/discovery_journal.md
new file mode 100644
index 0000000..4c4fd41
--- /dev/null
+++ b/docs/discovery_journal.md
@@ -0,0 +1,32 @@
+# Discovery Journal: <project-name>
+
+---
+
+## YYYY-MM-DD — Session 1
+Status: IN-PROGRESS
+
+### General
+
+| ID | Question | Answer |
+|----|----------|--------|
+| Q1 | Who are the users? | ... |
+| Q2 | What does the product do at a high level? | ... |
+| Q3 | Why does it exist — what problem does it solve? | ... |
+| Q4 | When and where is it used? | ... |
+| Q5 | Success — what does "done" look like? | ... |
+| Q6 | Failure — what must never happen? | ... |
+| Q7 | Out-of-scope — what are we explicitly not building? | ... |
+
+### <Group Name>
+
+| ID | Question | Answer |
+|----|----------|--------|
+| Q8 | ... | ... |
+
+### Feature: <feature-name>
+
+| ID | Question | Answer |
+|----|----------|--------|
+| Q9 | ... | ... |
+
+Status: COMPLETE
diff --git a/docs/features/completed/display-version.feature b/docs/features/completed/display-version.feature
index be7059a..0dfc3dd 100644
--- a/docs/features/completed/display-version.feature
+++ b/docs/features/completed/display-version.feature
@@ -1,22 +1,12 @@
 Feature: Display version
 
-  Discovery:
+  Reads the application version from pyproject.toml at runtime and logs it at INFO
+  level. Log output is controlled by a verbosity parameter; the version is visible
+  at DEBUG and INFO but suppressed at WARNING and above. An invalid verbosity value
+  raises a descriptive error.
 
   Status: COMPLETED
 
-  Entities:
-  | Type | Name             | Candidate Class/Method      | In Scope |
-  |------|------------------|-----------------------------|----------|
-  | Noun | Version string   | version()                   | Yes      |
-  | Noun | pyproject.toml   | (source of truth)           | Yes      |
-  | Noun | Log output       | logging                     | Yes      |
-  | Noun | Verbosity level  | ValidVerbosity              | Yes      |
-  | Noun | Entry point      | main()                      | Yes      |
-  | Verb | Retrieve         | version()                   | Yes      |
-  | Verb | Display / Log    | main()                      | Yes      |
-  | Verb | Configure        | ValidVerbosity              | Yes      |
-  | Verb | Validate         | main() raises ValueError    | Yes      |
-
   Rules (Business):
   - Version is read from pyproject.toml at runtime using tomllib
   - Log verbosity is controlled by a ValidVerbosity parameter passed to main()
@@ -29,12 +19,6 @@ Feature: Display version
   - Entry point: app/__main__.py (main(verbosity) function)
   - Version logic: app/version.py (version() function)
 
-  Questions:
-  | ID | Question | Answer | Status |
-  |----|----------|--------|--------|
-
-  All questions answered. Discovery frozen.
-
   Rule: Version retrieval
     As a software-engineer
     I want to retrieve the application version programmatically
diff --git a/docs/features/discovery.md b/docs/features/discovery.md
deleted file mode 100644
index f764e10..0000000
--- a/docs/features/discovery.md
+++ /dev/null
@@ -1,41 +0,0 @@
-# Discovery: <project-name>
-
-## State
-Status: ELICITING
-
----
-
-## Session 1 — Individual Scope Elicitation
-
-| ID | Question | Answer | Status |
-|----|----------|--------|--------|
-| Q1 | Who are the users of this product? | | OPEN |
-| Q2 | What does the product do at a high level? | | OPEN |
-| Q3 | Why does it exist — what problem does it solve? | | OPEN |
-| Q4 | When and where is it used (environment, platform, context)? | | OPEN |
-| Q5 | How do we know it works? What does "done" look like? | | OPEN |
-| Q6 | What does failure look like? What must never happen? | | OPEN |
-| Q7 | What are we explicitly not building? | | OPEN |
-
-Template §1: PENDING
-Synthesis: (fill after stakeholder confirms answers)
-Pre-mortem: (fill after synthesis is confirmed)
-
----
-
-## Session 2 — Behavior Groups / Big Picture
-
-| ID | Question | Answer | Status |
-|----|----------|--------|--------|
-
-Template §2: PENDING
-Behavior Groups:
-- (fill after all group questions are answered)
-
----
-
-## Session 3 — Full Synthesis
-
-(fill after Sessions 1 and 2 are complete)
-
-Template §3: PENDING
diff --git a/docs/scientific-research/domain-modeling.md b/docs/scientific-research/domain-modeling.md
index d49be2e..e80626d 100644
--- a/docs/scientific-research/domain-modeling.md
+++ b/docs/scientific-research/domain-modeling.md
@@ -14,7 +14,7 @@ Foundations for bounded context identification, ubiquitous language, and feature
 | **Status** | Confirmed — foundational DDD literature |
 | **Core finding** | A Bounded Context is a boundary within which a particular ubiquitous language is consistent. Features are identified by grouping related user stories that share the same language. The decomposition criterion is "single responsibility per context" + "consistency of language." |
 | **Mechanism** | In DDD: (1) Extract ubiquitous language from requirements → (2) Group by language consistency → (3) Each group is a candidate bounded context → (4) Each bounded context maps to a feature. Context Mapper automates this: User Stories → Subdomains (via noun/verb extraction) → Bounded Contexts of type FEATURE. |
-| **Where used** | Phase 1: after feature list identification, verify each feature has consistent language. Phase 2: noun/verb extraction from project discovery answers populates the Entities table — domain analysis cannot begin before this. The "Rules (Business)" section captures the ubiquitous language rules that govern each feature. |
+| **Where used** | Stage 1 Discovery: after session synthesis, verify each feature has consistent language. Noun/verb extraction from discovery answers builds the Domain Model in `docs/discovery.md`. The `Rules (Business):` section in `.feature` files captures the ubiquitous language rules that govern each feature. |
 
 ---
 
diff --git a/docs/scientific-research/requirements-elicitation.md b/docs/scientific-research/requirements-elicitation.md
index ec5e68f..b272727 100644
--- a/docs/scientific-research/requirements-elicitation.md
+++ b/docs/scientific-research/requirements-elicitation.md
@@ -27,7 +27,7 @@ Foundations for the PO interview structure, Gherkin criteria, and feature discov
 | **Status** | Confirmed |
 | **Core finding** | Inserting a "rules" layer between stories and examples prevents redundant or contradictory acceptance criteria. A story with many rules needs splitting; a story with many open questions is not ready for development. |
 | **Mechanism** | Four card types: Story (yellow), Rules (blue), Examples (green), Questions (red). The rules layer groups related examples under the business rule they illustrate. Red cards (unanswered questions) are a first-class signal to stop and investigate. |
-| **Where used** | `## Rules` section in per-feature `discovery.md` (Phase 2). PO identifies business rules before writing Examples in Phase 4. |
+| **Where used** | `Rules (Business):` section in each `.feature` file. PO identifies business rules before writing Examples in Stage 2 Step B. |
 
 ---
 
@@ -40,7 +40,7 @@ Foundations for the PO interview structure, Gherkin criteria, and feature discov
 | **Status** | Confirmed |
 | **Core finding** | Declarative Gherkin ("When Bob logs in") produces specifications that survive UI changes. Imperative Gherkin ("When I click the Login button") couples specs to implementation details and breaks on every UI redesign. |
 | **Mechanism** | Declarative steps describe *what happens* at the business level. Imperative steps describe *how the user interacts with a specific UI*. AI agents are especially prone to writing imperative Gherkin because they mirror literal steps. |
-| **Where used** | Declarative vs. imperative table in Phase 4 of `scope/SKILL.md`. |
+| **Where used** | Declarative vs. imperative table in Stage 2 Step B (Criteria) of `scope/SKILL.md`. |
 
 ---
 
@@ -53,7 +53,7 @@ Foundations for the PO interview structure, Gherkin criteria, and feature discov
 | **Status** | Confirmed |
 | **Core finding** | Classifying requirements as Must/Should/Could/Won't forces explicit negotiation about what is essential vs. desired. When applied *within* a single story, it reveals bloated stories that should be split. |
 | **Mechanism** | DSDM mandates that Musts cannot exceed 60% of total effort. At the story level: if a story has 12 Examples and only 3 are Musts, the remaining 9 can be deferred. This prevents gold-plating and keeps stories small. |
-| **Where used** | MoSCoW triage in Phase 4 of `scope/SKILL.md`. |
+| **Where used** | MoSCoW triage in Stage 2 Step B (Criteria) of `scope/SKILL.md`. |
 
 ---
 
@@ -80,7 +80,7 @@ Foundations for the PO interview structure, Gherkin criteria, and feature discov
 | **Status** | Synthesized rule of thumb — each component individually confirmed |
 | **Core finding** | Active listening in requirements interviews operates at three granularities: **Level 1** (per answer) — immediate paraphrase; **Level 2** (per topic cluster) — transition summary; **Level 3** (end of interview) — full synthesis serving four downstream purposes. |
 | **Level 3 — four uses** | 1. Accuracy gate (NN/G). 2. Scope crystallization (Ambler/FDD). 3. Input to domain modeling (Ambler/FDD). 4. Baseline trigger (Wynne/Cucumber). |
-| **Where used** | Phase 1 and Phase 2 of `scope/SKILL.md`. |
+| **Where used** | Stage 1 Discovery sessions in `scope/SKILL.md`. |
 
 ---
 
@@ -93,7 +93,7 @@ Foundations for the PO interview structure, Gherkin criteria, and feature discov
 | **Alternative** | Hermagoras of Temnos (2nd century BCE) — seven circumstances of rhetoric. |
 | **Status** | Practitioner synthesis — journalism, business analysis, investigative methodology |
 | **Core finding** | The six interrogative questions (Who, What, When, Where, Why, How) form a complete framework for gathering all essential facts about any situation. Together they ensure completeness and prevent gaps. |
-| **Where used** | Phase 1 project discovery: the initial seven questions are an adaptation of the 5W1H framework. |
+| **Where used** | Stage 1 Discovery, General questions (first session): the initial seven questions are an adaptation of the 5W1H framework. |
 
 ---
 
@@ -105,7 +105,7 @@ Foundations for the PO interview structure, Gherkin criteria, and feature discov
 | **Date** | 2025 |
 | **Status** | Practitioner synthesis — consolidated BA methodology, not peer-reviewed |
 | **Core finding** | Ten questions consistently make the most difference in requirements elicitation: (1) What problem are we solving? (2) What happens if we do nothing? (3) Who uses this? (4) What does success look like? (5) Walk me through how this works today. (6) Where does this usually break? (7) What decisions will this help? (8) What should definitely not happen? (9) What happens if input is wrong? (10) What assumptions are we making? |
-| **Where used** | Phase 1 project discovery: the "Success", "Failure", and "Out-of-scope" questions map to this framework. |
+| **Where used** | Stage 1 Discovery, General questions: the "Success", "Failure", and "Out-of-scope" questions map to this framework. |
 
 ---
 
@@ -119,7 +119,7 @@ Foundations for the PO interview structure, Gherkin criteria, and feature discov
 | **Status** | Confirmed |
 | **Core finding** | FDD requires domain modeling *before* feature naming. Features are expressed as "Action result object" triples. Features group into Feature Sets (shared domain object), which group into Subject Areas. |
 | **Mechanism** | Domain modeling extracts the vocabulary (nouns = candidate classes, verbs = candidate methods). Feature identification then asks: "what verbs act on each noun?" |
-| **Where used** | Phase 1 of `scope/SKILL.md`: after interview summary is confirmed, PO performs domain analysis (nouns/verbs → subject areas → FDD "Action object" feature names). |
+| **Where used** | Stage 1 Discovery in `scope/SKILL.md`: after session synthesis, PO performs domain analysis (nouns/verbs → subject areas → FDD "Action object" feature names) in the first session. |
 
 ---
 
@@ -132,7 +132,7 @@ Foundations for the PO interview structure, Gherkin criteria, and feature discov
 | **Alternative** | Kawakita, J. (1967). *Abduction*. Chuokoronsha. |
 | **Status** | Confirmed |
 | **Core finding** | Affinity diagramming groups raw observations/requirements into clusters by bottom-up similarity — no categories are named until grouping is complete. This prevents confirmation bias from top-down pre-labelling. |
-| **Where used** | Phase 1 of `scope/SKILL.md` (alternative to FDD domain modeling): PO uses affinity mapping on interview answers to derive feature clusters. Best suited when working from interview transcripts solo. |
+| **Where used** | Stage 1 Discovery in `scope/SKILL.md` (alternative to FDD domain modeling): PO uses affinity mapping on interview answers to derive feature clusters. Best suited when working from interview transcripts solo. |
 
 ---
 
@@ -145,7 +145,7 @@ Foundations for the PO interview structure, Gherkin criteria, and feature discov
 | **Status** | Confirmed |
 | **Core finding** | Event Storming is a collaborative workshop where domain experts place past-tense domain events on a timeline. Sorting the events creates natural Functional Area clusters — these are candidate feature groups. The workshop also produces Ubiquitous Language, a Problem Inventory, and Actor roles. |
 | **Mechanism** | Temporal sequencing of domain events forces resolution of conflicting mental models across organisational silos. Clusters emerge from shared vocabulary and causal proximity. |
-| **Where used** | Optional alternative in Phase 1 of `scope/SKILL.md` for cross-silo discovery. |
+| **Where used** | Optional alternative in Stage 1 Discovery in `scope/SKILL.md` for cross-silo discovery. |
 
 ---
 
@@ -159,7 +159,7 @@ Foundations for the PO interview structure, Gherkin criteria, and feature discov
 | **Status** | Confirmed — foundational; ~200 follow-on empirical studies |
 | **Core finding** | Anchoring an interview on a specific past incident ("Tell me about a time when X broke down") breaks schema-based recall. Stakeholders describing actual past events report real workarounds, edge cases, and failure modes that never surface when asked "how does this usually work?" |
 | **Mechanism** | Direct questions elicit the stakeholder's mental schema — a sanitized, gap-free description of how things *should* work. Incidents bypass the schema because episodic memory is anchored to specific sensory and emotional detail. |
-| **Where used** | Session 2 (gap-finding) of Phase 1 and Phase 2 in `scope/SKILL.md`. |
+| **Where used** | Cross-cutting and per-feature questions (gap-finding) in Stage 1 Discovery in `scope/SKILL.md`. |
 
 ---
 
@@ -173,7 +173,7 @@ Foundations for the PO interview structure, Gherkin criteria, and feature discov
 | **Status** | Confirmed — meta-analysis: Köhnken et al. (1999), *Psychology, Crime & Law*, 5(1-2), 3–27. |
 | **Core finding** | The enhanced CI elicits ~35% more correct information than standard interviews with equal accuracy rates. |
 | **Mechanism** | Four retrieval mnemonics: (1) mental reinstatement of context; (2) report everything; (3) temporal reversal; (4) perspective change. Each mnemonic opens a different memory access route, collectively surfacing what direct questions cannot. |
-| **Where used** | Session 2 (gap-finding) of Phase 1 and Phase 2 in `scope/SKILL.md`. |
+| **Where used** | Cross-cutting and per-feature questions (gap-finding) in Stage 1 Discovery in `scope/SKILL.md`. |
 
 ---
 
@@ -186,7 +186,7 @@ Foundations for the PO interview structure, Gherkin criteria, and feature discov
 | **Status** | Confirmed — operationalised in IS research (Hunter & Beck 2000) |
 | **Core finding** | Repeatedly asking "Why is that important to you?" climbs a means-end chain from concrete attribute → functional consequence → psychosocial consequence → terminal value. The stakeholder's first answer is rarely the real constraint. |
 | **Mechanism** | The Gherkin "So that [benefit]" clause is structurally a single-rung means-end ladder. Full laddering reveals value conflicts between stakeholders whose surface requirements look identical but whose ladders diverge at the consequence level. |
-| **Where used** | Session 2 (gap-finding) of Phase 1 and Phase 2 in `scope/SKILL.md`. |
+| **Where used** | Cross-cutting and per-feature questions (gap-finding) in Stage 1 Discovery in `scope/SKILL.md`. |
 
 ---
 
@@ -200,7 +200,7 @@ Foundations for the PO interview structure, Gherkin criteria, and feature discov
 | **Status** | Confirmed — standard NNG qualitative research protocol |
 | **Core finding** | Starting with broad open-ended questions before narrowing to specifics prevents the interviewer from priming the interviewee's responses. |
 | **Mechanism** | Priming bias is structural: any category name the interviewer introduces activates a schema that filters what the interviewee considers worth reporting. The funnel sequences questions so the interviewee's own categories emerge first. |
-| **Where used** | Within each session of Phase 1 and Phase 2 in `scope/SKILL.md`. |
+| **Where used** | Within each Stage 1 Discovery session in `scope/SKILL.md`. |
 
 ---
 
diff --git a/docs/workflow.md b/docs/workflow.md
index 8bea2cf..a09c36b 100644
--- a/docs/workflow.md
+++ b/docs/workflow.md
@@ -33,79 +33,70 @@ Each step has a designated agent and a specific deliverable. No step is skipped.
 │  STEP 1 — SCOPE                              agent: product-owner   │
 ├─────────────────────────────────────────────────────────────────────┤
 │                                                                     │
-│  Phase 1 — Project Discovery                                        │
-│  [runs ONCE; skip if discovery.md BASELINED]                        │
-│  [adding features later: append new Qs to Session 1, re-fill]      │
-│                                                                     │
-│    Session 1 — Individual Scope Elicitation                         │
-│      5Ws + Success + Failure + Out-of-scope                         │
-│      Gap-finding per answer: CIT · Laddering · CI Perspective       │
-│      [new questions from elucidation added in the moment]           │
-│      Level 1: paraphrase each answer on the spot                    │
-│      → PO writes synthesis → stakeholder confirms or corrects       │
-│      → PO runs silent pre-mortem on confirmed synthesis             │
-│      [template §1: synthesis confirmed → unlocks Session 2]         │
-│                                                                     │
-│    Session 2 — Behavior Groups / Big Picture                       │
-│      Questions target behavior groups and cross-cutting concerns   │
-│      Gap-finding per group: CIT · Laddering · CI Perspective         │
-│      [new questions from elucidation added in the moment]           │
-│      Level 1: paraphrase each answer                                │
-│      Level 2: synthesis when transitioning between groups        │
-│      [template §2: all groups answered → unlocks Session 3]          │
-│                                                                     │
-│    Session 3 — Synthesis Approval + Feature Derivation              │
-│      PO produces full synthesis across all behavior groups         │
-│      → stakeholder approves or corrects; PO refines until approved  │
-│      [template §3: approval → unlocks domain analysis]              │
-│      Domain analysis: nouns/verbs → subject areas                   │
-│      Name features (FDD "Action object" / Affinity groups)           │
-│      Create backlog/<name>.feature stubs                            │
-│      Status: BASELINED written to discovery.md                      │
-│                                                                     │
-│  Phase 2 — Feature Discovery (repeats per feature)                  │
-│  [each .feature has its own 3-session discovery template]           │
-│                                                                     │
-│    Session 1 — Individual Entity Elicitation                        │
-│      Populate Entities table from project discovery                 │
-│      Gap-finding per answer: CIT · Laddering · CI Perspective       │
-│      [new questions from elucidation added in the moment]           │
-│      Level 1: paraphrase each answer                                │
-│      → PO writes synthesis → stakeholder confirms or corrects       │
-│      → PO runs silent pre-mortem on confirmed synthesis             │
-│      [template §1: synthesis confirmed → unlocks Session 2]         │
-│                                                                     │
-│    Session 2 — Behavior Groups / Big Picture for this Feature        │
-│      Questions target behavior groups within this feature            │
-│      Gap-finding per group: CIT · Laddering · CI Perspective         │
-│      [new questions from elucidation added in the moment]           │
-│      Level 1: paraphrase · Level 2: group transition summaries      │
-│      [template §2: all groups answered → unlocks Session 3]          │
-│                                                                     │
-│    Session 3 — Feature Synthesis Approval + Story Derivation       │
-│      PO produces synthesis of feature scope and behavior groups     │
-│      → stakeholder approves or corrects; PO refines until approved │
-│      Story candidates → candidate user stories (Rules)               │
-│      Status: BASELINED written to .feature discovery section         │
-│      [template §3: approval + stories → unlocks decomp check]       │
-│                                                                     │
-│    DECOMPOSITION CHECK                                              │
-│      >2 distinct concerns OR >8 candidate Examples?                 │
-│      YES → split into separate .feature files, re-run Phase 2       │
-│      NO  → proceed                                                  │
-│                                                                     │
-│  Phase 3 — Stories (PO alone)                                       │
-│    Story candidates from Phase 2 Session 2 → one Rule: block per story │
-│    INVEST gate: all 6 letters must pass before committing           │
-│    commit: feat(stories): write user stories for <name>             │
-│                                                                     │
-│  Phase 4 — Criteria (PO alone)                                      │
-│    4.1 Pre-mortem per Rule (all Rules checked before Examples)      │
-│    4.2 Write @id-tagged Examples (Given/When/Then, declarative)     │
-│        MoSCoW triage: Must / Should / Could per Example             │
-│    4.3 Review checklist                                             │
-│    commit: feat(criteria): write acceptance criteria for <name>     │
-│    ★ FROZEN — changes require @deprecated + new Example             │
+│  Stage 1 — Discovery (PO + stakeholder, iterative)                  │
+│  Sessions happen any time scope needs establishing or refining.     │
+│  Every session follows the same structure.                          │
+│                                                                     │
+│    SESSION START                                                     │
+│      Check docs/discovery_journal.md last block for Status:        │
+│        IN-PROGRESS → resume interrupted session first               │
+│        (missing file) → create journal + discovery.md from templates│
+│      Append new session header to journal:                          │
+│        ## YYYY-MM-DD — Session N                                    │
+│        Status: IN-PROGRESS          ← written BEFORE any questions  │
+│                                                                     │
+│    QUESTION ORDER                                                    │
+│      1. General (5Ws + Success + Failure + Out-of-scope)            │
+│         [first session only, if journal did not exist yet]          │
+│         Gap-finding per answer: CIT · Laddering · CI Perspective    │
+│         Level 1: paraphrase each answer on the spot                 │
+│      2. Cross-cutting (behavior groups, bounded contexts,           │
+│         integration points, lifecycle events)                       │
+│         Level 2: synthesis when transitioning between groups        │
+│      3. Per-feature (one feature at a time)                         │
+│         Extract entities from docs/discovery.md Domain Model        │
+│         Gap-finding: CIT · Laddering · CI Perspective               │
+│         Silent pre-mortem per feature                               │
+│         REAL-TIME SPLIT: if >2 concerns OR >8 candidate Examples    │
+│           → split immediately, record in journal, create stubs,     │
+│             continue questions for both features in this session    │
+│                                                                     │
+│    AFTER QUESTIONS (PO alone, in this order)                        │
+│      1. Append answered Q&A to journal (in groups; answered only)   │
+│      2. Rewrite .feature description for each touched feature       │
+│         [untouched features stay exactly as-is]                     │
+│      3. Append session synthesis block to discovery.md (LAST)       │
+│         [only after all .feature files are updated]                 │
+│      4. Mark journal session: Status: COMPLETE                      │
+│      commit: feat(discovery): <session summary>                     │
+│                                                                     │
+│    BASELINING A FEATURE                                             │
+│      When stakeholder approves feature discovery + decomp passes:   │
+│      PO writes Status: BASELINED (YYYY-MM-DD) in the .feature file  │
+│      Gate: feature may only enter Stage 2 when BASELINED            │
+│                                                                     │
+│  Stage 2 — Specification (PO alone, per feature)                    │
+│  Only runs on features with Status: BASELINED.                      │
+│  If a gap needs stakeholder input → open a new Stage 1 session.    │
+│                                                                     │
+│    Step A — Stories                                                  │
+│      Derive one Rule: block per user story from feature description │
+│      INVEST gate: all 6 letters must pass before committing         │
+│      commit: feat(stories): write user stories for <name>           │
+│                                                                     │
+│    Step B — Criteria                                                 │
+│      Pre-mortem per Rule (all Rules checked before Examples)        │
+│      Write @id-tagged Examples (Given/When/Then, declarative)       │
+│      MoSCoW triage: Must / Should / Could per Example               │
+│      Review checklist                                               │
+│      commit: feat(criteria): write acceptance criteria for <name>   │
+│      ★ FROZEN — changes require @deprecated + new Example           │
+│                                                                     │
+│  Bug Handling                                                        │
+│    PO adds @bug @id:<hex> Example to relevant Rule: in .feature     │
+│    SE implements @id test in tests/features/<name>/                 │
+│    SE adds @given Hypothesis test in tests/unit/ (whole class)      │
+│    Both tests required · SE follows normal TDD loop (Step 3)        │
 │                                                                     │
 └─────────────────────────────────────────────────────────────────────┘
                               ↓  PO picks feature from backlog — only if Status: BASELINED
@@ -119,16 +110,14 @@ Each step has a designated agent and a specific deliverable. No step is skipped.
 │    [ ] feature has Rule: + Example: + @id tags                      │
 │    [ ] package name confirmed (pyproject.toml → directory exists)   │
 │                                                                     │
-│  mv backlog/<name>.feature → in-progress/<name>.feature             │
-│                                                                     │
 │  READ (all before writing anything)                                 │
-│    docs/features/discovery.md (project-level)                      │
-│    ALL backlog .feature files (discovery + entities sections)       │
+│    docs/discovery.md (project-level synthesis changelog)           │
+│    ALL backlog .feature files (narrative + Rules + Examples)        │
 │    in-progress .feature file (full: Rules + Examples + @id)        │
 │    ALL existing .py files in <package>/  ← know what exists first  │
 │                                                                     │
 │  DOMAIN ANALYSIS                                                    │
-│    From Entities table + Rules (Business) in .feature file:        │
+│    From Domain Model in docs/discovery.md + Rules (Business):      │
 │    Nouns → named classes, value objects, aggregates                 │
 │    Verbs → method names with typed signatures                       │
 │    Datasets → named types (not bare dict/list)                      │
@@ -156,8 +145,9 @@ Each step has a designated agent and a specific deliverable. No step is skipped.
 │      No docstrings — add after GREEN when signatures are stable     │
 │      No inline comments, no TODO, no speculative code              │
 │                                                                     │
-│  WRITE ADR FILES (significant decisions only)                       │
-│    docs/architecture/adr.md                                        │
+│  RECORD ARCHITECTURAL DECISIONS (significant only)                  │
+│    Append to docs/architecture.md                                  │
+│      ## YYYY-MM-DD — <feature>: <title>                            │
 │      Decision: <what>  Reason: <why>                               │
 │      Alternatives considered: <what was rejected and why>           │
 │                                                                     │
@@ -189,7 +179,7 @@ Each step has a designated agent and a specific deliverable. No step is skipped.
 │                                                                     │
 │  PREREQUISITES (stop if any fail — escalate to PO)                 │
 │    [ ] Architecture stubs present in <package>/ (Step 2 committed) │
-│    [ ] Read all docs/architecture/adr-NNN-*.md files               │
+│    [ ] Read docs/architecture.md — all architectural decisions      │
 │    [ ] All tests written in tests/features/<feature>/              │
 │                                                                     │
 │  Build TODO.md test list                                            │
@@ -355,40 +345,21 @@ Each step has a designated agent and a specific deliverable. No step is skipped.
 
 ## Feature File Structure
 
-Each feature is a single `.feature` file. The free-form description before the first `Rule:` contains all discovery content added progressively through the workflow:
+Each feature is a single `.feature` file. The free-form text before the first `Rule:` holds a short narrative, the Status line, business rules, and constraints. All Q&A lives in `docs/discovery_journal.md`; architectural decisions live in `docs/architecture.md`.
 
 ```
 Feature: <title>
 
-  Discovery:
+  <2–4 sentence description of what this feature does and why it exists.>
 
   Status: ELICITING | BASELINED (YYYY-MM-DD)
 
-  Entities:
-  | Type | Name | Candidate Class/Method | In Scope |
-
   Rules (Business):
   - <business rule>
 
   Constraints:
   - <non-functional requirement>
 
-  Session 1 — Individual Entity Elicitation:
-  | ID | Question | Answer | Status |     ← OPEN / ANSWERED
-  Template §1: PENDING | CONFIRMED
-  Synthesis: <PO synthesis — confirmed by stakeholder>
-  Pre-mortem: <gaps identified; new questions added above>
-
-  Session 2 — Behavior Groups / Big Picture:
-  | ID | Question | Answer | Status |
-  Template §2: PENDING | CONFIRMED
-  Behavior Groups:
-  - <behavior group name>: <one-sentence summary>
-
-  Session 3 — Feature Synthesis:
-  Template §3: PENDING | CONFIRMED — stakeholder approved YYYY-MM-DD
-  Synthesis: <full synthesis across all behavior groups>
-
   Rule: <story title>
     As a <role>
     I want <goal>
@@ -401,40 +372,27 @@ Feature: <title>
       Then <observable outcome>
 ```
 
-Two discovery sources:
-- `docs/features/discovery.md` — project-level 3-session discovery (once per project; additive for new features)
-- Feature file description — per-feature 3-session discovery, entities, business rules, and acceptance criteria
+Three supporting documents sit alongside the feature files:
+- `docs/discovery_journal.md` — raw Q&A from all scope sessions (PO, append-only)
+- `docs/discovery.md` — synthesis changelog, domain model, feature list (PO, append-only)
+- `docs/architecture.md` — all architectural decisions (software-engineer, append-only)
 
 ---
 
 ## Architecture Artifacts
 
-Architectural decisions made during Step 2 are recorded as ADR files:
-
-```
-docs/architecture/
-  adr-template.md          ← blank template — copy to create a new ADR
-  adr-001-<title>.md       ← one file per significant decision
-  adr-002-<title>.md
-  ...
-```
-
-**ADR format** (copy `adr-template.md` and fill in):
+Architectural decisions made during Step 2 are appended to `docs/architecture.md`:
 
 ```markdown
-# ADR-NNN: <title>
-
-**Status:** PROPOSED | ACCEPTED | SUPERSEDED by ADR-NNN
-
-**Decision:** <what was decided — one sentence>
-
-**Reason:** <why — one sentence>
+## YYYY-MM-DD — <feature-name>: <short title>
 
-**Alternatives considered:**
-- <option>: <why rejected>
+Decision: <what was decided — one sentence>
+Reason: <why — one sentence>
+Alternatives considered: <what was rejected and why>
+Feature: <feature-name>
 ```
 
-Write an ADR only for non-obvious decisions with real trade-offs — module boundaries, external dependency strategy, Protocol vs. concrete class, data model choices. Routine YAGNI choices do not need an ADR.
+Write a block only for non-obvious decisions with real trade-offs — module boundaries, external dependency strategy, Protocol vs. concrete class, data model choices. Routine YAGNI choices do not need a record. The file is append-only; when a decision changes, append a new block that supersedes the old one.
 
 Domain entity and service stubs (signatures, no bodies) live directly in the package under `<package>/domain/`, `<package>/ports/`, and `<package>/adapters/` — written at Step 2, filled in during Step 3.
 
@@ -458,8 +416,8 @@ Domain entity and service stubs (signatures, no bodies) live directly in the pac
 ```
 tests/
   features/<feature-name>/
-    <rule-slug>_test.py     ← software-engineer-written, one per Rule: block
-                              function: test_<rule_slug>_<8char_hex>()
+    <rule_slug>_test.py     ← software-engineer-written, one per Rule: block
+                              function: test_<feature_slug>_<8char_hex>()
   unit/
     <anything>_test.py      ← software-engineer-authored extras, no @id traceability
                               plain pytest or Hypothesis @given (software-engineer choice)