agent-test: require tee output capture and failure excerpt in RED gate #13
bgerstle wants to merge 6 commits into
Pull request overview
This PR tightens the agent-test (RED gate) workflow so failing test output is reliably captured to disk (avoiding chat truncation) and the agent must report a concrete, specific failure excerpt derived from the captured log.
Changes:
- Updates the RED gate “run tests” step to capture full test output via `tee` into a `.light/` log file.
- Adds an explicit step requiring the agent to read the log file and extract a specific assertion/exception excerpt.
- Updates the report format to include the log file path and a failure excerpt (instead of pasting raw output).
4. Run the test command, capturing full output to a log file named after the branch
   and current task or phase:
   `<test-command> 2>&1 | tee .light/red-gate-<branch>-<phase>.log`
   Example: `.light/red-gate-feature-DEV-3658-P2-capacity-check.log`
Fixed in 1355819. The step now uses `set -o pipefail` and reads `${PIPESTATUS[0]}` to keep the test command's exit status instead of `tee`'s, runs `mkdir -p .light`, and sanitizes the branch name with `tr '/' '-'` so the log stays a single file.
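For illustration, the pieces named in this reply could fit together roughly as follows. This is a minimal bash sketch, not the wording committed in 1355819: the function name `run_red_gate`, its argument order, and the `RED_GATE_LOG` variable are hypothetical, and it assumes bash (since `PIPESTATUS` is a bash array).

```shell
#!/usr/bin/env bash
# Make a pipeline fail if any stage fails, not only the last one (tee).
set -o pipefail

# Hypothetical helper: run_red_gate <branch> <phase> <test-command...>
# Runs the test command, tees combined output to a log under .light/,
# and returns the test command's own exit status.
run_red_gate() {
  local branch="$1" phase="$2"
  shift 2

  # Replace '/' so a branch like feature/x does not create a subdirectory.
  local safe_branch
  safe_branch="$(printf '%s' "$branch" | tr '/' '-')"

  RED_GATE_LOG=".light/red-gate-${safe_branch}-${phase}.log"
  mkdir -p .light

  "$@" 2>&1 | tee "$RED_GATE_LOG"
  # PIPESTATUS must be read immediately after the pipeline.
  local status="${PIPESTATUS[0]}"
  return "$status"
}
```

A call might look like `run_red_gate "$(git rev-parse --abbrev-ref HEAD)" P2 <test-command>`; the function's return value is then the test command's exit status rather than `tee`'s.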
5. Read the log file. Find and extract the specific assertion failure:
   the line starting with "expected:" or the @Test method name plus exception message.
   Do NOT rely on in-conversation output; read the file.
Fixed in 1355819. Step 5 now classifies the failure: an assertion failure (the `expected:` line) is the expected RED, while a compilation/configuration error has no `expected:` line and is reported as a wrong-reason RED gate FAIL with the relevant error excerpt rather than being treated as a valid failing test. The report format now includes a "Failure type" field.
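The classification described in this reply can be sketched as a small log check. This is an assumption-laden illustration of the step as worded at this commit: the function name `classify_red_log`, the value strings after "Failure type:", and the use of a case-insensitive `error` match for the excerpt are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify_red_log <log-file>
# Prints a "Failure type" line followed by a short excerpt from the log.
classify_red_log() {
  local log="$1"

  if grep -q 'expected:' "$log"; then
    # An assertion ran and failed: the intended RED.
    echo "Failure type: assertion"
    grep -m 1 'expected:' "$log"
  else
    # No assertion line: the run likely failed before any assertion
    # (compilation or configuration error), so this is a wrong-reason RED.
    echo "Failure type: wrong-reason (no assertion line found)"
    grep -m 3 -i 'error' "$log" || true
  fi
}
```

Keying on the `expected:` line is specific to JUnit-style output; other frameworks would need a different marker.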
… RED gate Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
74e75b0 to 3e2081e
Addresses PR #13 review feedback:
- Preserve the test command's exit status via pipefail + PIPESTATUS so a failing build is not masked by tee's exit code; mkdir -p .light and sanitize the branch name (/ -> -) so the log path stays a single file.
- Classify the failure when reading the log: an assertion failure is the expected RED, but a compilation/configuration error (no expected: line) is a wrong-reason RED gate FAIL and must be reported with the error excerpt rather than treated as a valid failing test.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The previous wording keyed off JUnit/Maven specifics (the "expected:" line, @Test method names). Tests vary by language and framework, so the step now frames the distinction in framework-neutral terms: failed AT the assertion (intended RED) vs. failed BEFORE the assertion ran (compile/import/setup error, wrong-reason RED). The exact wording is deferred to CLAUDE.md or language/framework testing guidance.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…gainst them
Moves "what a correct RED failure looks like" from agent-test's runtime heuristic into the plan phase, where framework and test-tier context is available and project/user testing skills can be consulted.
- plan-tasks: the Agent Context RED gate field becomes an expected failure signature in the framework's terms (which assertion fails and why), not boilerplate "tests fail". Added authoring guidance pointing at project/user testing skills; updated the template examples.
- agent-test: judges the observed failure against the plan's RED gate signature (authoritative), and falls back to the generic failed-at-assertion vs. failed-before-assertion check when no signature is provided. Report now records the expected signature and match.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
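As a rough sketch of the signature check with fallback that this commit message describes: the function name `match_red_signature`, the "Signature match:" output lines, the return codes, and the choice to treat the signature as a fixed string searched for in the log are all assumptions made for illustration, not the agent's actual procedure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: match_red_signature <log-file> [expected-signature]
# Returns 0 if the signature appears in the log, 1 if it does not,
# and 2 if no signature was provided (caller should use the generic check).
match_red_signature() {
  local log="$1" signature="${2:-}"

  if [ -z "$signature" ]; then
    echo "Signature match: n/a (no signature provided)"
    return 2
  fi

  # -F treats the signature as a literal string, not a regex.
  if grep -qF -- "$signature" "$log"; then
    echo "Signature match: yes"
    return 0
  fi

  echo "Signature match: no"
  return 1
}
```

A return code of 2 is where the generic failed-at-assertion vs. failed-before-assertion check would take over.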
The yak context templates each carried a condensed copy of the agent's Instructions and Report Format. These had drifted from the authoritative agent definitions: the agent-test template still showed the pre-tee "Test command output: [paste]" report and a generic RED gate status, contradicting the agent-test changes in this PR. Each yak template now carries only the Agent Context (the task input) and points to its `praxis:agent-*` definition for the procedure and report format. The agent definitions are richer than the removed copies (YAGNI pass, type-check/lint steps), so no guidance is lost.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
- Update `agent-test` step 4 to capture test output via `tee` to a named log file under `.light/`, rather than relying on in-conversation rendering
Motivation
Maven test output is long and gets truncated in Claude's conversation rendering. The orchestrator was seeing `BUILD FAILURE` but not the specific assertion, making it impossible to verify that the RED gate is failing for the right reason (missing implementation vs. compile error or fixture bug).
Test plan
- Run `agent-test` on a TDD phase and confirm the log file is created at the expected path
🤖 Generated with Claude Code