A Python template to quickstart any project with a production-ready workflow, quality tooling, and AI-assisted development.
Features flow through 5 steps with a WIP limit of 1 feature at a time. The filesystem enforces WIP:
- `docs/features/backlog/<feature-stem>.feature`: features waiting to be worked on
- `docs/features/in-progress/<feature-stem>.feature`: exactly one feature being built right now
- `docs/features/completed/<feature-stem>.feature`: accepted and shipped features
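Because the folder itself is the WIP counter, an agent can check the limit mechanically before starting work. The sketch below assumes a helper named `in_progress_feature` (hypothetical, not shipped with the template). It returns the one in-progress feature, returns `None` when there is nothing to work on, and raises when the limit of 1 is broken:

```python
from __future__ import annotations

from pathlib import Path


def in_progress_feature(root: Path) -> Path | None:
    """Return the single in-progress feature, or None if none exists.

    Hypothetical helper: raises RuntimeError if the WIP limit of 1 is broken.
    """
    folder = root / "docs" / "features" / "in-progress"
    features = sorted(folder.glob("*.feature")) if folder.is_dir() else []
    if len(features) > 1:
        names = ", ".join(f.name for f in features)
        raise RuntimeError(f"WIP limit exceeded: {names}")
    return features[0] if features else None
```

A `None` result is the case where the software engineer or reviewer updates `TODO.md` and escalates to the PO instead of picking from `backlog/`.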
STEP 1: SCOPE (product-owner) → discovery + Gherkin stories + criteria
STEP 2: ARCH (software-engineer) → read all features + existing package files, write domain stubs (signatures only, no bodies); decisions appended to docs/architecture.md
STEP 3: TDD LOOP (software-engineer) → RED → GREEN → REFACTOR, one @id at a time
STEP 4: VERIFY (reviewer) → run all commands, review code
STEP 5: ACCEPT (product-owner) → demo, validate, move .feature to completed/ (PO only)
The PO picks the next feature from `backlog/`. The software engineer never self-selects.
Verification is adversarial. The reviewer's job is to try to break the feature, not to confirm it works. The default hypothesis is "it might be broken despite green checks; prove otherwise."
- Product Owner (PO): AI agent. Interviews the stakeholder, writes discovery docs, Gherkin features, and acceptance criteria. Accepts or rejects deliveries. Sole owner of all `.feature` file moves (`backlog/` → `in-progress/` before Step 2; `in-progress/` → `completed/` after Step 5 acceptance).
- Stakeholder: Human. Answers the PO's questions, provides domain knowledge, and approves PO syntheses to confirm discovery is complete.
- Software Engineer: AI agent. Architecture, test bodies, implementation, git. Never edits or moves `.feature` files. Escalates spec gaps to the PO. If no `.feature` file is in `in-progress/`, stops and escalates to the PO.
- Reviewer: AI agent. Adversarial verification. Reports spec gaps to the PO. Never moves `.feature` files. After an APPROVED report, stops and escalates to the PO for Step 5.
.feature files are owned exclusively by the PO. No other agent ever moves or edits them.
| Transition | Who | When |
|---|---|---|
| `backlog/` → `in-progress/` | PO only | Before Step 2 begins; only if `Status: BASELINED` |
| `in-progress/` → `completed/` | PO only | After Step 5 acceptance |
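The two permitted moves can be encoded as data so that every other (actor, move) pair is rejected by default. This is a hypothetical sketch: the actor strings, the `can_move` name, and the `accepted` flag are assumptions for illustration.

```python
from __future__ import annotations

# Hypothetical encoding of the ownership table: (source, destination) -> sole mover.
ALLOWED_MOVES: dict[tuple[str, str], str] = {
    ("backlog", "in-progress"): "product-owner",
    ("in-progress", "completed"): "product-owner",
}


def can_move(actor: str, src: str, dst: str, status: str, accepted: bool = False) -> bool:
    """Return True only for a move the table explicitly allows."""
    if ALLOWED_MOVES.get((src, dst)) != actor:
        return False  # unknown move, or the wrong agent
    if (src, dst) == ("backlog", "in-progress"):
        return status.startswith("BASELINED")  # e.g. "BASELINED (2024-05-01)"
    return accepted  # in-progress -> completed only after Step 5 acceptance
```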
If an agent (SE or reviewer) finds no .feature in in-progress/: update TODO.md with the correct Next: escalation line and stop. Never self-select a backlog feature.
- product-owner: defines scope (Stage 1 Discovery + Stage 2 Specification), picks features, accepts deliveries
- software-engineer: architecture, tests, code, git, releases (Steps 2-3 + release)
- reviewer: runs commands and reviews code at Step 4, produces APPROVED/REJECTED report
- designer: creates and updates visual assets (SVG banners, logos) and maintains `docs/branding.md`
- setup-project: one-time setup to initialize a new project from this template
| Skill | Used By | Step |
|---|---|---|
| run-session | all agents | every session |
| select-feature | product-owner | between features (idle state) |
| define-scope | product-owner | 1 |
| implement | software-engineer | 2, 3 |
| apply-patterns | software-engineer | 2, 3 (on demand, when a GoF pattern is needed) |
| refactor | software-engineer | 3 (REFACTOR phase + preparatory refactoring) |
| verify | reviewer | 4 |
| check-quality | software-engineer | pre-handoff (redirects to verify) |
| create-pr | software-engineer | 5 |
| git-release | software-engineer | 5 (after acceptance) |
| update-docs | product-owner | 5 (after acceptance) + on stakeholder demand |
| design-colors | designer | branding, color, WCAG compliance |
| design-assets | designer | SVG asset creation and updates |
| create-skill | software-engineer | meta |
| create-agent | human-user | meta |
Branding: Agents that generate docs, diagrams, release names, or visual assets read docs/branding.md if present. Absent or blank fields fall back to defaults (adjective-animal release names, Mermaid default colors, no wording constraints). docs/branding.md and docs/assets/ are owned by the designer agent.
Session protocol: Every agent loads skill run-session at session start. Load additional skills as needed for the current step.
Step 1 has two stages: Stage 1 (Discovery) and Stage 2 (Specification).
Discovery is a continuous process. Sessions happen whenever scope needs to be established or refined — for a new project, new features, or new information. Every session follows the same structure:
Session question order:
- General (5Ws + Success + Failure + Out-of-scope): first session only, if the journal doesn't exist yet
- Cross-cutting: behavior groups, bounded contexts, integration points, lifecycle events
- Per-feature: one feature at a time; extract entities from the `docs/discovery.md` Domain Model; gap-finding with CIT, Laddering, CI Perspective Change
Real-time split rule: if the PO detects >2 concerns or >8 candidate Examples for a feature during per-feature questions, split immediately — record the split in the journal, create stub .feature files, continue questions for both in the same session.
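The split thresholds are strict inequalities, so a feature with exactly 2 concerns and 8 candidate Examples stays whole. A minimal sketch of the check, using a hypothetical `FeatureCandidate` record:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureCandidate:
    """Hypothetical snapshot of what the PO has gathered for one feature."""

    name: str
    concerns: int
    candidate_examples: int


# Thresholds from the split rule: more than 2 concerns or more than 8 Examples.
MAX_CONCERNS = 2
MAX_EXAMPLES = 8


def must_split(feature: FeatureCandidate) -> bool:
    """Return True as soon as either threshold is exceeded."""
    return feature.concerns > MAX_CONCERNS or feature.candidate_examples > MAX_EXAMPLES
```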
After questions (PO alone, in order):
- Append answered Q&A (in groups) to `docs/discovery_journal.md`: only answered questions
- Rewrite the `.feature` description for each feature touched; others stay unchanged
- Append the session synthesis block to `docs/discovery.md`: LAST, after all `.feature` updates
Session status: the journal session header begins with `Status: IN-PROGRESS` (written before questions) and is updated to `Status: COMPLETE` after all writes. If a session is interrupted, the next agent detects `IN-PROGRESS` and finishes the pending writes before starting a new session.
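Detecting an interrupted session only requires the most recent status line in the journal. The sketch assumes each session header carries a line reading exactly `Status: IN-PROGRESS` or `Status: COMPLETE`; the real journal layout is defined by the template and may differ, and `must_resume` is a hypothetical name.

```python
import re

# Assumed format: one line per session header reading exactly "Status: <STATE>".
STATUS_LINE = re.compile(r"^Status: (IN-PROGRESS|COMPLETE)[ \t]*$", re.MULTILINE)


def must_resume(journal_text: str) -> bool:
    """Return True if the most recent session is still marked IN-PROGRESS."""
    statuses = STATUS_LINE.findall(journal_text)
    return bool(statuses) and statuses[-1] == "IN-PROGRESS"
```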
Baselining: PO writes Status: BASELINED (YYYY-MM-DD) in the .feature file when the stakeholder approves that feature's discovery and the decomposition check passes.
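Both the `backlog/` → `in-progress/` move and Stage 2 are gated on this marker, so it helps to read it back reliably. A sketch, assuming the marker appears verbatim somewhere in the `.feature` description; `baseline_date` is a hypothetical helper name:

```python
from __future__ import annotations

import re
from datetime import date

# Assumed verbatim marker, e.g. "Status: BASELINED (2024-05-01)".
BASELINE = re.compile(r"Status: BASELINED \((\d{4}-\d{2}-\d{2})\)")


def baseline_date(feature_text: str) -> date | None:
    """Return the baselining date, or None if the feature is not baselined."""
    match = BASELINE.search(feature_text)
    return date.fromisoformat(match.group(1)) if match else None
```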
Commit per session: feat(discovery): <session summary>
Stage 2 (Specification) only runs on features with `Status: BASELINED`. No stakeholder involvement. If a gap requires stakeholder input, open a new Stage 1 session first.
Step A — Stories: derive one Rule: block per user story from the baselined feature description. INVEST gate: all 6 letters must pass.
Commit: feat(stories): write user stories for <name>
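The INVEST gate is all-or-nothing: one failing letter blocks the story. A hypothetical checklist record (not part of the template) that also reports which letters fail:

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class InvestCheck:
    """Hypothetical checklist: one flag per INVEST letter for a single story."""

    independent: bool
    negotiable: bool
    valuable: bool
    estimable: bool
    small: bool
    testable: bool

    def failing(self) -> list[str]:
        """Names of the letters that did not pass, in INVEST order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def passes(self) -> bool:
        """The gate passes only when all six letters pass."""
        return not self.failing()
```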
Step B — Criteria: PO writes Example: blocks with @id tags under each Rule:. Pre-mortem per Rule before writing any Examples. MoSCoW triage per Example. Examples are frozen after commit.
Commit: feat(criteria): write acceptance criteria for <name>
Criteria are frozen: no `Example:` changes after commit. To change a criterion, add a new Example with a new `@id` that replaces the old one.
When a defect is reported:
- PO adds a `@bug` Example to the relevant `Rule:` in the `.feature` file and moves (or keeps) the feature in `backlog/` for normal scheduling.
- SE handles the bug when the feature is selected for development (standard Step 2–3 flow): implements the specific `@bug`-tagged test in `tests/features/<feature_slug>/` and also writes a `@given` Hypothesis property test in `tests/unit/` covering the whole class of inputs.
- Both tests are required. SE follows the normal TDD loop (Step 3).
docs/
discovery_journal.md ← raw Q&A, PO appends after every session
discovery.md ← synthesis changelog, PO appends after every session
architecture.md ← all architectural decisions, SE appends after Step 2
glossary.md ← living glossary, PO updates via update-docs skill
branding.md ← project identity, colors, release naming, wording (designer owns)
assets/ ← logo.svg, banner.svg, and other visual assets (designer owns)
c4/
context.md ← C4 Level 1 diagram, PO updates via update-docs skill
container.md ← C4 Level 2 diagram, PO updates via update-docs skill
features/
backlog/<feature-stem>.feature ← narrative + Rules + Examples
in-progress/<feature-stem>.feature
completed/<feature-stem>.feature
tests/
features/<feature_slug>/
<rule_slug>_test.py ← one per Rule: block, software-engineer-written
unit/
<anything>_test.py ← software-engineer-authored extras (no @id traceability)
Tests in tests/unit/ are software-engineer-authored extras not covered by any @id criterion. Any test style is valid — plain assert or Hypothesis @given. Use Hypothesis when the test covers a property that holds across many inputs (mathematical invariants, parsing contracts, value object constraints). Use plain pytest for specific behaviors or single edge cases discovered during refactoring.
- `@pytest.mark.slow` is mandatory on every `@given`-decorated test (Hypothesis is genuinely slow)
- `@example(...)` is optional but encouraged when using `@given` to document known corner cases
- No `@id` tags: tests with `@id` belong in `tests/features/`, written by software-engineer
tests/features/<feature_slug>/<rule_slug>_test.py
Stubs are auto-generated by pytest-beehave. The SE triggers generation at Step 2 end by running uv run task test-fast. pytest-beehave reads the in-progress .feature file and creates one skipped function per @id:
@pytest.mark.skip(reason="not yet implemented")
def test_<feature_slug>_<@id>() -> None:
"""
<@id steps raw text including new lines>
    """

- `@pytest.mark.slow`: takes > 50ms; applied to Hypothesis tests and any test with I/O, network, or DB
- `@pytest.mark.deprecated`: auto-skipped by pytest-beehave; used for superseded Examples
# Install dependencies
uv sync --all-extras
# Run the application (for humans)
uv run task run
# Run the application with timeout (for agents — prevents hanging)
timeout 10s uv run task run
# Run tests (fast, no coverage)
uv run task test-fast
# Run full test suite with coverage
uv run task test
# Run tests with coverage report generation
uv run task test-build
# Lint and format
uv run task lint
# Type checking
uv run task static-check
# Build documentation
uv run task doc-build

- Principles (in priority order): YAGNI > KISS > DRY > SOLID > Object Calisthenics > appropriate design patterns > complex code > complicated code > failing code > no code
- Linting: ruff format, ruff check, Google docstring convention, `noqa` forbidden
- Type checking: pyright, 0 errors required
- Coverage: 100% (measured against your actual package)
- Function length: ≤ 20 lines (code lines only, excluding docstrings)
- Class length: ≤ 50 lines (code lines only, excluding docstrings)
- Max nesting: 2 levels
- Instance variables: ≤ 2 per class (exception: dataclasses, Pydantic models, value objects, and TypedDicts are exempt — they may carry as many fields as the domain requires)
- Semantic alignment: tests must operate at the same abstraction level as the acceptance criteria they cover
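The function-length rule counts code lines only. One way to measure that is with the standard-library `ast` module, skipping the docstring, blank lines, and comment-only lines. This is a hypothetical checker, not the tooling the template uses to enforce the limit; in particular, nested functions also count toward their enclosing function here.

```python
from __future__ import annotations

import ast

MAX_FUNCTION_LINES = 20  # limit from the quality rules above


def code_lines(func: ast.FunctionDef | ast.AsyncFunctionDef, source_lines: list[str]) -> int:
    """Count non-blank, non-comment lines in `func`, excluding its docstring."""
    skip: set[int] = set()
    first = func.body[0]
    if (
        isinstance(first, ast.Expr)
        and isinstance(first.value, ast.Constant)
        and isinstance(first.value.value, str)
    ):
        skip.update(range(first.lineno, first.end_lineno + 1))
    count = 0
    for lineno in range(func.lineno, func.end_lineno + 1):
        text = source_lines[lineno - 1].strip()
        if lineno in skip or not text or text.startswith("#"):
            continue
        count += 1
    return count


def too_long(source: str, limit: int = MAX_FUNCTION_LINES) -> list[str]:
    """Return names of functions whose code-line count exceeds `limit`."""
    tree = ast.parse(source)
    lines = source.splitlines()
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and code_lines(node, lines) > limit
    ]
```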
During Step 3 (TDD Loop), correctness priorities are:
- Design correctness: YAGNI > KISS > DRY > SOLID > Object Calisthenics > appropriate design patterns > complex code > complicated code > failing code > no code
- One test green: the specific test under work passes, plus `test-fast` still passes
- Reviewer code-design check: reviewer verifies design + semantic alignment (no lint/pyright/coverage yet)
- Quality tooling: `lint`, `static-check`, and full `test` with coverage run only at software-engineer handoff (before Step 4)
Design correctness is far more important than lint/pyright/coverage compliance. A well-designed codebase with minor lint issues is better than a lint-clean codebase with poor design.
- Automated checks (lint, typecheck, coverage) verify syntax-level correctness — the code is well-formed.
- Human review (semantic alignment, code review, manual testing) verifies semantic-level correctness — the code does what the user needs.
- Both are required. All-green automated checks are necessary but not sufficient for APPROVED.
- Reviewer defaults to REJECTED unless correctness is proven.
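The reviewer's default can be read as a function whose only path to APPROVED runs through every check. A sketch with a hypothetical `Evidence` record; the field names are assumptions that mirror the checks listed above, and every field starts as unproven:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Hypothetical summary of what the reviewer has proven at Step 4."""

    lint_ok: bool = False
    static_check_ok: bool = False
    coverage_ok: bool = False
    semantic_alignment_ok: bool = False
    code_review_ok: bool = False
    manual_test_ok: bool = False


def verdict(evidence: Evidence) -> str:
    """Default to REJECTED; APPROVED only when automated and human checks all pass."""
    automated = evidence.lint_ok and evidence.static_check_ok and evidence.coverage_ok
    human = (
        evidence.semantic_alignment_ok
        and evidence.code_review_ok
        and evidence.manual_test_ok
    )
    return "APPROVED" if automated and human else "REJECTED"
```

All-green automated checks alone (`Evidence(True, True, True)`) still yield REJECTED, matching "necessary but not sufficient".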
Version format: v{major}.{minor}.{YYYYMMDD}
- Minor bump for new features; major bump for breaking changes
- Same-day second release: increment minor, keep same date
- Release name: defined by `docs/branding.md` > Release Naming > Convention; if absent or blank, defaults to the version string only (no name)
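Because the date segment is always the release date, one rule covers both a normal minor bump and a same-day second release. A sketch; `next_version` is a hypothetical helper, and resetting minor to 0 on a major bump is an assumption the rules above do not state:

```python
from __future__ import annotations

import re
from datetime import date

VERSION = re.compile(r"v(\d+)\.(\d+)\.(\d{8})")


def next_version(current: str, today: date, breaking: bool = False) -> str:
    """Compute the next v{major}.{minor}.{YYYYMMDD} version string."""
    match = VERSION.fullmatch(current)
    if match is None:
        raise ValueError(f"not a v{{major}}.{{minor}}.{{YYYYMMDD}} version: {current!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    if breaking:
        major, minor = major + 1, 0  # assumption: minor resets on a major bump
    else:
        minor += 1  # new feature, or a same-day second release
    return f"v{major}.{minor}.{today:%Y%m%d}"
```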
When the stakeholder requests a release, use `@software-engineer /skill git-release` for the full release process.
Every session: load skill run-session. Read TODO.md first, update it at the end.
TODO.md is a session bookmark — not a project journal. See .opencode/skills/run-session/SKILL.md for the full structure including the Cycle State block used during Step 3.
To initialize a new project from this template:
`@setup-project`

The setup agent will ask for your project name, GitHub username, and author info, and will configure all template placeholders.