8thlight · bgerstle · Jun 17, 2026 · Jun 17, 2026 · Jun 18, 2026 · Jun 18, 2026
diff --git a/plugins/praxis/.claude-plugin/plugin.json b/plugins/praxis/.claude-plugin/plugin.json
@@ -1,6 +1,6 @@
 {
   "name": "praxis",
-  "version": "2.1.1",
+  "version": "2.1.2",
   "description": "Agentic engineering skills for software development patterns and best practices including TDD, harness engineering, refactoring, architecture patterns, and AI development workflows.",
   "author": {
     "name": "Eric Olson"

diff --git a/plugins/praxis/agents/agent-test.md b/plugins/praxis/agents/agent-test.md
@@ -16,8 +16,37 @@ Write failing tests ONLY. Do NOT write any implementation code.
 1. Read the issue description provided in your prompt — it contains file paths, test spec, and test command
 2. Read the project's CLAUDE.md to understand testing patterns and conventions
 3. Write test files at the paths specified in the issue's Agent Context
-4. Run the test command from the issue — tests MUST fail (RED gate)
-5. If tests pass immediately, STOP and report this as a **RED gate FAIL** — the test is tautological or the feature already exists
+4. Run the test command, capturing full output to a log file named after the branch
+   and current task or phase. Ensure `.light/` exists, replace `/` in the branch name
+   with `-` so the path stays a single file, and preserve the test command's exit
+   status (not `tee`'s) so a failing build is not masked:
+   ```bash
+   set -o pipefail
+   mkdir -p .light
+   slug="$(git rev-parse --abbrev-ref HEAD | tr '/' '-')"
+   log=".light/red-gate-${slug}-<phase>.log"
+   <test-command> 2>&1 | tee "$log"
+   status=${PIPESTATUS[0]}   # exit code of the test command, not tee
+   ```
+   Example log path: `.light/red-gate-feature-DEV-3658-P2-capacity-check.log`
+5. Read the log file (do NOT rely on in-conversation output) and judge the failure
+   against the **RED gate** signature in your Agent Context — the plan states what a
+   correct initial failure looks like for this phase (which assertion fails and why),
+   authored in the project's framework terms.
+   - **Matches the expected RED gate** (intended RED): the test ran and failed for the
+     stated reason — the asserted behavior is absent. Extract the specific
+     assertion/failure message as evidence.
+   - **Failed for a different reason** (wrong-reason RED): the failure does not match
+     the expected signature — typically the test never reached its assertion (compile/
+     syntax error, unresolved import or symbol, missing dependency, fixture/setup/config
+     failure). STOP and report this as a **RED gate FAIL** with the relevant error
+     excerpt — the test is not failing for the intended reason.
+
+   If the Agent Context has no specific RED gate signature (older plans, ad-hoc runs),
+   fall back to the universal distinction: **failed at the assertion** (intended RED)
+   vs. **failed before the assertion ran** (wrong-reason RED). Consult CLAUDE.md or any
+   project/user testing guidance for what each looks like in this framework.
+6. If tests pass immediately (`status` is 0), STOP and report this as a **RED gate FAIL** — the test is tautological or the feature already exists
 
 ## Rules
 
@@ -32,5 +61,10 @@ Write failing tests ONLY. Do NOT write any implementation code.
 After writing tests:
 - Files created: [list paths]
 - Test command: [the command you ran]
-- Test output: [paste output showing failures]
-- RED gate: PASS (tests fail as expected) or FAIL (tests pass without implementation)
+- Log file: [path to the tee'd log]
+- Expected RED gate: [the signature from your Agent Context, or "none specified — used fallback"]
+- Failure mode: matched expected RED | wrong reason (failed before assertion: compile/import/setup) | tests passed
+- Failure excerpt: [the specific assertion/failure message when it matched; otherwise
+  the relevant compile/import/setup error line]
+- RED gate: PASS (failed for the expected reason) or FAIL (tests passed without
+  implementation, or failed for a different reason — wrong reason)
diff --git a/plugins/praxis/skills/plan-tasks/SKILL.md b/plugins/praxis/skills/plan-tasks/SKILL.md
@@ -186,6 +186,17 @@ Every implementation phase with tests MUST include an `#### Agent Context` subse
 - **RED gate / GREEN gate** — observable success criteria
 - **Architectural constraints** — boundaries the agent must respect
 
+**Authoring the RED gate signature:** The RED gate field tells `agent-test` what a
+*correct* initial failure looks like for this phase — so it can distinguish the
+intended failure (the asserted behavior is absent) from a wrong-reason failure
+(compile/import/setup error, or tests passing outright). Make it specific to the
+phase's framework and test tier rather than generic. Where the project or user has a
+testing skill or guideline (e.g. a framework-specific testing skill referenced from
+CLAUDE.md), consult it to describe the expected failure in that framework's terms. If
+no such guidance exists, describe the expected failure behaviorally (which assertion
+fails and why) and `agent-test` falls back to a generic at-assertion vs.
+before-assertion check.
+
 See [template.md](references/template.md) for the Agent Context block reference and task context templates.
 
 ## Phase Boundaries

diff --git a/plugins/praxis/skills/plan-tasks/references/template.md b/plugins/praxis/skills/plan-tasks/references/template.md
@@ -90,7 +90,7 @@ The plan lives in the Claude Code session plan file during planning. It is NOT r
 - **Files to create:** `{test file path}`, `{implementation file path}`
 - **Test spec:** {Behavioral description of what to test — properties, invariants, edge cases. Reference project CLAUDE.md for testing tools and patterns.}
 - **Test command:** `{project test command}`
-- **RED gate:** Tests fail because implementation does not exist yet
+- **RED gate:** {Expected initial failure for this phase, in the framework's terms — consult the project/user testing skill if one exists. E.g. "assertion fails: split total != original total" — not just "tests fail". A compile/import/setup error or a passing test is a wrong-reason RED.}
 - **GREEN gate:** All tests pass with minimal implementation
 - **Architectural constraints:** Pure functions only, no I/O, no side effects
 
@@ -138,7 +138,7 @@ The plan lives in the Claude Code session plan file during planning. It is NOT r
 - **Files to create:** `{test file path}`, `{feature file path}`
 - **Test spec:** {Behavioral description — what the use case should do on success, what it should return on each error case. Reference project CLAUDE.md for integration test patterns.}
 - **Test command:** `{project integration test command}`
-- **RED gate:** Tests fail because feature implementation does not exist yet
+- **RED gate:** {Expected initial failure for this phase, in the framework's terms — consult the project/user testing skill if one exists. E.g. "use case returns null / 404 instead of the created entity" — not just "tests fail". A compile/import/setup error or a passing test is a wrong-reason RED.}
 - **GREEN gate:** All integration tests pass
 - **Architectural constraints:** Orchestration only — delegates to core logic and repository, no direct database access
 
@@ -170,7 +170,7 @@ The plan lives in the Claude Code session plan file during planning. It is NOT r
 - **Files to modify:** `{app registration file}`
 - **Test spec:** {HTTP contract description — method, path, request body shape, success response shape and status, each error response shape and status. Reference project CLAUDE.md for HTTP test patterns.}
 - **Test command:** `{project e2e/HTTP test command}`
-- **RED gate:** Tests fail because route handler does not exist yet
+- **RED gate:** {Expected initial failure for this phase, in the framework's terms — consult the project/user testing skill if one exists. E.g. "request returns 404, expected 201 with body {id}" — not just "tests fail". A compile/import/setup error or a passing test is a wrong-reason RED.}
 - **GREEN gate:** All HTTP contract tests pass
 - **Architectural constraints:** Thin adapter — validates input, delegates to feature, maps result to HTTP response
 
@@ -250,20 +250,10 @@ Each yak's context MUST be self-contained — everything an agent needs to execu
 - **Files to create:** `{test file path(s)}`
 - **Test spec:** {Behavioral description — properties, invariants, contracts to test. Be specific.}
 - **Test command:** `{shell command to run tests}`
-- **RED gate:** Tests fail because `{implementation file}` does not exist yet
+- **RED gate:** {Expected initial failure for this phase, in the framework's terms — what assertion fails and why. Consult the project/user testing skill if one exists. A compile/import/setup error or a passing test is a wrong-reason RED.}
 - **Architectural constraints:** {L3/L4 boundary, property-based with fast-check, no mocks, etc.}
 
-### Instructions
-1. Read the project's CLAUDE.md for testing tools and conventions
-2. Read existing test files to match style
-3. Write test file(s) at the specified paths
-4. Run the test command
-5. Verify tests FAIL (RED gate)
-
-### Report
-- Test files created: [list paths]
-- Test command output: [paste]
-- RED gate status: PASS (tests fail as expected) or FAIL
+Follow your `praxis:agent-test` definition for the procedure (output capture to a log file, judging the failure against the RED gate above) and report format. The Agent Context above is your task input.
 ```
 
 ### Implement Yak (agent-impl)
@@ -280,17 +270,7 @@ Each yak's context MUST be self-contained — everything an agent needs to execu
 - **GREEN gate:** All tests pass
 - **Architectural constraints:** {Pure functions, no side effects, adapter layer, etc.}
 
-### Instructions
-1. Read the test files to understand expected behavior
-2. Read the project's CLAUDE.md for coding patterns
-3. Write minimal implementation to pass all tests
-4. Run the test command
-5. Verify tests PASS (GREEN gate)
-
-### Report
-- Implementation files created/modified: [list paths]
-- Test command output: [paste]
-- GREEN gate status: PASS or FAIL
+Follow your `praxis:agent-impl` definition for the procedure and report format. The Agent Context above is your task input.
 ```
 
 ### Validate Yak (agent-validate)
@@ -305,14 +285,7 @@ Each yak's context MUST be self-contained — everything an agent needs to execu
 - **Phase test files:** `{test file path(s)}`
 - **Phase impl files:** `{implementation file path(s)}`
 
-### Instructions
-1. Run the full test suite
-2. Report results — both new and pre-existing tests
-
-### Report
-- Result: ALL PASS or FAILURES FOUND
-- Total tests: [count]
-- Failures: [count and details]
+Follow your `praxis:agent-validate` definition for the procedure and report format. The Agent Context above is your task input.
 ```
 
 ### No-Test Yak (no-test)
@@ -328,16 +301,7 @@ Each yak's context MUST be self-contained — everything an agent needs to execu
 - **Acceptance gate:** {Observable success criterion — e.g., "Migration applies without error"}
 - **Architectural constraints:** {Any relevant constraints}
 
-### Instructions
-1. Read the project's CLAUDE.md for conventions
-2. Execute the tasks listed above
-3. Verify the acceptance gate
-
-### Report
-- Files created/modified: [list paths]
-- Task completed: [brief description]
-- Acceptance gate: PASS or FAIL
-- Details: [any relevant output or notes]
+Follow your `praxis:agent-no-test` definition for the procedure and report format. The Agent Context above is your task input.
 ```
 
 ### Remediation Yak (agent-remediate)
@@ -356,16 +320,5 @@ Created dynamically by `/implement` when validation fails. Not created during `/
 - **Failure output from validation:**
 {paste Agent 3's failure output}
 
-### Instructions
-1. Read the failing tests to understand expected behavior
-2. Read the implementation files to find the issue
-3. Fix the implementation — minimal changes only
-4. Run the full test suite
-5. Verify ALL tests pass
-
-### Report
-- Files modified: [list paths]
-- What was fixed: [brief description]
-- Test command output: [paste]
-- Result: ALL PASS or STILL FAILING
+Follow your `praxis:agent-remediate` definition for the procedure and report format. The Agent Context above is your task input.
 ```