You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
- Add adversarial mandate to reviewer and verify skill: default hypothesis
is the code is broken despite green checks
- Reorder verification: code review before lint/test, run app first as
production-grade gate (output must change when input changes)
- Convert all review sections to tables with PASS/FAIL/Fix columns:
Correctness, KISS, SOLID, ObjCal, Design Patterns, Tests, Versions/Build
- Add UUID Drift bash check: duplicate UUIDs across test functions = REJECTED
- Update UUID Uniqueness rule: one function per UUID; if only Given varies
it is a property — use Hypothesis @given + @example, not multiple functions
- Add production-grade self-check to implementation/SKILL.md: developer must
verify output changes with input before handoff
- Add design pattern decision table to developer.md (principle #6)
- Add PO pre-mortem at scope, ADR review gate, live verification at Step 6
- Add semantic alignment rule and integration test requirement to tdd/SKILL.md
- Add architecture contradiction check to implementation/SKILL.md
- Add Verification Philosophy section to AGENTS.md
- Add docs/academic_research.md: 15 cognitive/social science mechanisms with
full citations grounding each workflow design decision
- Delete template-report.md (pre-implementation planning doc, now superseded)
Copy file name to clipboardExpand all lines: .opencode/agents/developer.md
+14-1Lines changed: 14 additions & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -53,7 +53,8 @@ When a new feature is ready in `docs/features/backlog/`:
53
53
Alternatives considered: <what you rejected and why>
54
54
```
55
55
- Build changes that need PO approval: new runtime deps, new packages, changed entry points
56
-
4. If build changes need PO approval, ask before proceeding. Tooling changes (coverage, lint rules, test config) are your autonomy.
56
+
4. **Architecture contradiction check**: After writing the Architecture section, compare each ADR against each AC. If any architectural decision contradicts or circumvents an acceptance criterion, flag it and resolve with the PO before writing any production code.
57
+
5. If build changes need PO approval, ask before proceeding. Tooling changes (coverage, lint rules, test config) are your autonomy.
57
58
5. Update `pyproject.toml` and project structure as needed.
58
59
6. Run `uv run task test` — must still pass.
59
60
7. Commit: `feat(bootstrap): configure build for <feature-name>`
@@ -67,6 +68,7 @@ Load `skill implementation`. Make tests green one at a time.
67
68
Commit after each test goes green: `feat(<feature-name>): implement <component>`
68
69
Self-verify after each commit: run all four commands in the Self-Verification block below.
69
70
If you discover a missing behavior during implementation, load `skill extend-criteria`.
71
+
Before handoff, write a **pre-mortem**: 2–3 sentences answering "If this feature shipped but was broken for the user, what would be the most likely reason?" Include it in the handoff message or as a `## Pre-mortem` subsection in the feature doc's Architecture section.
70
72
71
73
### After reviewer approves (Step 5)
72
74
Load `skill pr-management` and `skill git-release` as needed.
@@ -87,6 +89,15 @@ Load `skill pr-management` and `skill git-release` as needed.
87
89
7. Keep all entities small (functions ≤20 lines, classes ≤50 lines)
88
90
8. No more than 2 instance variables per class
89
91
9. No getters/setters (tell, don't ask)
92
+
6. **Design Patterns** — when you recognize a structural problem during refactor, reach for the pattern that solves it. Not preemptively (YAGNI applies). The trigger is the structural problem, not the pattern.
93
+
94
+
| Structural problem | Pattern to consider |
95
+
|---|---|
96
+
| Multiple if/elif on type or state | State or Strategy |
97
+
| Complex construction logic in `__init__` | Factory or Builder |
98
+
| Multiple components, callers must know each one | Facade |
| Decoupled event-driven producers/consumers | Observer or pub/sub |
90
101
91
102
## Architecture Ownership
92
103
@@ -113,6 +124,8 @@ uv run task test # must exit 0, all tests pass
113
124
timeout 10s uv run task run # must exit non-124; exit 124 = timeout (infinite loop) = fix it
114
125
```
115
126
127
+
After all four commands pass, run the app and **manually verify** it does what the AC says, not just what the tests check. If the feature involves user interaction, interact with it yourself.
Copy file name to clipboardExpand all lines: .opencode/agents/product-owner.md
+7Lines changed: 7 additions & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -32,10 +32,17 @@ Every session: load `skill session-workflow` first.
32
32
33
33
### Step 1 — SCOPE
34
34
Load `skill scope`. Define user stories and acceptance criteria for a feature.
35
+
After writing AC, perform a **pre-mortem**: "Imagine the developer builds something that passes all automated checks but the feature doesn't work for the user. What would be missing?" Add any discoveries as additional AC before committing.
When the developer proposes the Architecture section (ADRs), review it:
40
+
- Does any ADR contradict an acceptance criterion? If so, reject and ask the developer to resolve before proceeding.
41
+
- Does any ADR change entry points, add runtime dependencies, or change scope? Approve or reject explicitly.
42
+
37
43
### Step 6 — ACCEPT
38
44
After reviewer approves (Step 5):
45
+
-**Run or observe the feature yourself.** Don't rely solely on automated check results. If the feature involves user interaction, interact with it. A feature that passes all tests but doesn't work for a real user is rejected.
39
46
- Review the working feature against the original user stories
40
47
- If accepted: move feature doc `docs/features/in-progress/<name>.md` → `docs/features/completed/<name>.md`
Copy file name to clipboardExpand all lines: .opencode/agents/reviewer.md
+97-50Lines changed: 97 additions & 50 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -29,6 +29,8 @@ permissions:
29
29
30
30
You verify that the work is done correctly by running commands and reading code. You do not write or edit files.
31
31
32
+
**Your default hypothesis is that the code is broken despite passing automated checks. Your job is to find the failure mode. If you cannot find one after thorough investigation, APPROVE. If you find one, REJECTED.**
33
+
32
34
## Responsibilities
33
35
34
36
- Run every verification command and report actual output
@@ -50,60 +52,105 @@ Load `skill verify`. Run all commands, check all criteria, produce a written rep
50
52
-**Never suggest noqa, type: ignore, or pytest.skip as a fix.** These are bypasses, not solutions.
51
53
-**Report specific locations.** "Line 47 of physics/engine.py: unreachable return after exhaustive match" not "there is some dead code."
52
54
53
-
## Verification Checklist
54
-
55
-
Run these in order. If any fails, stop and report — do not continue to the next:
2.**Check commit history** — one commit per green test, no uncommitted changes
59
+
3.**Run the app** — production-grade gate (see below)
60
+
4.**Code review** — read source files, fill all tables
61
+
5.**Run commands** — lint, static-check, test (stop on first failure)
62
+
6.**Interactive verification** — if feature involves user interaction
63
+
7.**Write report**
64
+
65
+
**Do code review before running lint/static-check/test.** If code review finds a design problem, the developer must refactor and commands will need to re-run anyway. Do the hard cognitive work first.
66
+
67
+
## Production-Grade Gate (Step 3)
68
+
69
+
Run before code review. If any row is FAIL → REJECTED immediately.
70
+
71
+
| Check | How to check | PASS | FAIL | Fix |
72
+
|---|---|---|---|---|
73
+
| Developer declared production-grade | Read feature doc pre-mortem or handoff message | Explicit statement present | Absent or says "demo" or "incomplete" | Developer must complete the implementation |
74
+
| App exits cleanly |`timeout 10s uv run task run`| Exit 0 or non-124 | Exit 124 (timeout/hang) | Developer must fix the hang |
75
+
| Output changes when input changes | Run app, change an input or condition, observe output | Output changes accordingly | Output is static regardless of input | Developer must implement real logic — output that does not change with input is not complete |
76
+
77
+
## Code Review (Step 4)
78
+
79
+
**Correctness** — any FAIL → REJECTED:
80
+
81
+
| Check | How to check | PASS | FAIL | Fix |
82
+
|---|---|---|---|---|
83
+
| No dead code | Read for unreachable statements, unused variables, impossible branches | None found | Any found | Remove or fix the unreachable path |
84
+
| No duplicate logic (DRY) | Search for repeated blocks doing the same thing | None found | Duplication found | Extract to shared function |
85
+
| No over-engineering (YAGNI) | Check for abstractions with no current use | None found | Unused abstraction or premature generalization | Remove unused code |
86
+
87
+
**Simplicity (KISS)** — any FAIL → REJECTED:
88
+
89
+
| Check | How to check | PASS | FAIL | Fix |
90
+
|---|---|---|---|---|
91
+
| Functions do one thing | Read each function; can you describe it without `and`? | Yes | No | Split into focused functions |
92
+
| Nesting ≤ 2 levels | Count indent levels in each function | ≤ 2 | > 2 | Extract inner block to helper |
| Principle | Why it matters | What to check | How to check | PASS/FAIL | Evidence (`file:line`) |
99
+
|---|---|---|---|---|---|
100
+
| SRP | Multiple change-reasons accumulate bugs at every change site | Each class/function has one reason to change | Count distinct concerns; each `and` in its description = warning sign |||
101
+
| OCP | Modifying existing code for new behavior invalidates existing tests | New behavior via extension, not modification | Check if adding the new case required editing existing class bodies |||
102
+
| LSP | Substitution failures cause silent runtime errors tests miss | Subtypes behave identically to base type at all call sites | Check if any subtype narrows a contract or raises where base does not |||
103
+
| ISP | Fat interfaces force implementors to have methods they cannot meaningfully implement | No Protocol/ABC forces unused method implementations | Check if any implementor raises `NotImplementedError` or passes on inherited methods |||
104
+
| DIP | Depending on concrete I/O makes unit testing impossible | High-level modules depend on abstractions (Protocols) | Check if any domain class imports from I/O, DB, or framework layers directly |||
105
+
106
+
**Object Calisthenics** — any FAIL → REJECTED:
107
+
108
+
| # | Rule | Why it matters | How to check | PASS/FAIL | Evidence (`file:line`) |
109
+
|---|---|---|---|---|---|
110
+
| 1 | One indent level per method | Reduces cognitive load per function | Count max nesting in source |||
111
+
| 2 | No `else` after `return`| Eliminates hidden control flow paths | Search for `else` inside functions with early returns |||
112
+
| 3 | Primitives wrapped | Prevents primitive obsession; enables validation at construction | Bare `int`/`str` in domain signatures = FAIL |||
113
+
| 4 | Collections wrapped in classes | Encapsulates iteration and filtering logic |`list[X]` as domain value = FAIL |||
114
+
| 5 | One dot per line | Reduces coupling to transitive dependencies |`a.b.c()` chains = FAIL |||
115
+
| 6 | No abbreviations | Names are documentation; abbreviations lose meaning |`mgr`, `tmp`, `calc` = FAIL |||
116
+
| 7 | Small entities | Smaller units are easier to test, read, and replace | Functions > 20 lines or classes > 50 lines = FAIL |||
117
+
| 8 | ≤ 2 instance variables | Forces single responsibility through structural constraint | Count `self.x` assignments in `__init__`|||
118
+
| 9 | No getters/setters | Enforces tell-don't-ask; behavior lives with data |`get_x()`/`set_x()` pairs = FAIL |||
| Multiple if/elif on type/state | State or Strategy | Eliminates conditional complexity, makes adding new states safe |||
125
+
| Complex `__init__` with side effects | Factory or Builder | Separates construction from use, enables testing |||
126
+
| Callers must know multiple internal components | Facade | Single entry point reduces coupling |||
127
+
| External dep without Protocol | Repository/Adapter | Enables testing without real I/O; enforces DIP |||
128
+
| 0 domain classes, many functions | Missing domain model | Procedural code has no encapsulation boundary |||
129
+
130
+
**Tests** — any FAIL → REJECTED:
131
+
132
+
| Check | How to check | PASS | FAIL | Fix |
133
+
|---|---|---|---|---|
134
+
| UUID docstring format | Read first line of each docstring | UUID only, blank line, Given/When/Then | Description on UUID line | Remove description; UUID line must be bare |
135
+
| Contract test | Would this test survive a full internal rewrite? | Yes | No | Rewrite assertion to test observable output, not internals |
136
+
| No internal attribute access | Search for `_x` in assertions | None found |`_x`, `isinstance`, `type()` found | Replace with public API assertion |
137
+
| Every AC has a mapped test |`grep -r "<uuid>" tests/` per UUID | Found | Not found | Write the missing test |
138
+
| No UUID used twice | See command below — empty = PASS | Empty output | UUID printed | If only `Given` differs: consolidate into Hypothesis `@given` + `@example`. If `When`/`Then` differs: use `extend-criteria`|
56
139
57
140
```bash
58
-
uv run task lint # must exit 0
59
-
uv run task static-check # must exit 0, 0 errors
60
-
uv run task test# must exit 0, 0 failures, coverage >= 100%
61
-
timeout 10s uv run task run # must exit non-124; exit 124 = timeout (infinite loop) = FAIL
-[ ] No dead code (unreachable statements, unused variables, impossible branches)
70
-
-[ ] No duplicate logic (DRY)
71
-
-[ ] No over-engineering (YAGNI — no unused abstractions, no premature generalization)
72
-
73
-
**Simplicity (KISS)**
74
-
-[ ] Functions do one thing
75
-
-[ ] No nesting deeper than 2 levels
76
-
-[ ] No function longer than 20 lines
77
-
-[ ] No class longer than 50 lines
78
-
79
-
**SOLID**
80
-
-[ ] Single responsibility per class/function
81
-
-[ ] Open/closed: extend without modifying existing code
82
-
-[ ] Liskov: subtypes behave as their base types
83
-
-[ ] Interface segregation: no fat interfaces
84
-
-[ ] Dependency inversion: depend on abstractions
85
-
86
-
**Object Calisthenics** (enforce all 9)
87
-
1. One level of indentation per method
88
-
2. No `else` after `return`
89
-
3. Wrap all primitives and strings (use value objects for domain concepts)
90
-
4. First-class collections (wrap collections in classes)
91
-
5. One dot per line (no chaining)
92
-
6. No abbreviations in names
93
-
7. Keep all entities small
94
-
8. No classes with more than 2 instance variables
95
-
9. No getters/setters (tell, don't ask)
96
-
97
-
**Tests**
98
-
-[ ] Every test has UUID-only first line docstring, blank line, then Given/When/Then
99
-
-[ ] Tests assert behavior, not structure
100
-
-[ ] Every acceptance criterion has a mapped test
101
-
-[ ] No test verifies isinstance, type(), or internal attributes
102
-
103
-
**Versions and Build**
104
-
-[ ]`pyproject.toml` version matches `__version__` in package
105
-
-[ ] Coverage target (`--cov=<package>`) matches actual package name
106
-
-[ ] All declared packages exist in the codebase
147
+
**Versions and Build** — any FAIL → REJECTED:
148
+
149
+
| Check | How to check | PASS | FAIL | Fix |
150
+
|---|---|---|---|---|
151
+
|`pyproject.toml` version matches `__version__`| Read both files | Match | Mismatch | Align the version strings |
152
+
| Coverage target matches package | Check `--cov=<package>` in test config | Matches actual package | Wrong package name | Fix the `--cov` argument |
153
+
| All declared packages exist | Check `[tool.setuptools] packages` against filesystem | All present | Missing package | Add the missing directory or remove the declaration |
Copy file name to clipboardExpand all lines: .opencode/skills/code-quality/SKILL.md
+25Lines changed: 25 additions & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -134,6 +134,31 @@ Required on all public functions and classes. Not required on private helpers (`
134
134
135
135
If a function exceeds the limit, extract sub-functions. If a class exceeds the limit, split responsibilities.
136
136
137
+
## Structural Quality Checks
138
+
139
+
`lint`, `static-check`, and `test` verify **syntax-level** quality. They do NOT verify **design-level** quality (nesting depth, function length, value objects, design patterns). Both must pass.
140
+
141
+
Run through this table during refactor and before handoff:
142
+
143
+
| If you see... | Then you must... |
144
+
|---|---|
145
+
| Function > 20 lines | Extract helper |
146
+
| Nesting > 2 levels | Extract to function |
147
+
| Bare `int`/`str` as domain concept | Wrap in value object |
148
+
| > 4 positional parameters | Group into dataclass |
149
+
|`list[X]` as domain collection | Wrap in collection class |
150
+
| No classes in domain code | Introduce domain classes |
0 commit comments