agent-test: require tee output capture and failure excerpt in RED gate by bgerstle · Pull Request #13 · 8thlight/lightfactory

bgerstle · 2026-06-17T20:00:43Z

Summary

- Updates agent-test step 4 to capture test output via tee to a named log file under .light/, rather than relying on in-conversation rendering
- Adds step 5 requiring the agent to read the log file and extract the specific assertion failure before reporting
- Replaces the vague "Test output: paste output showing failures" report field with a log file path and a concrete failure excerpt

Motivation

Maven test output is long and gets truncated in Claude's conversation rendering. The orchestrator was seeing BUILD FAILURE but not the specific assertion, making it impossible to verify the RED gate is failing for the right reason (missing implementation vs. compile error or fixture bug).

Test plan

- Manually invoke agent-test on a TDD phase and confirm the log file is created at the expected path
- Confirm the report includes a log path and a specific failure excerpt

🤖 Generated with Claude Code

Copilot

Pull request overview

This PR tightens the agent-test (RED gate) workflow so failing test output is reliably captured to disk (avoiding chat truncation) and the agent must report a concrete, specific failure excerpt derived from the captured log.

Changes:

- Updates the RED gate "run tests" step to capture full test output via tee into a .light/ log file.
- Adds an explicit step requiring the agent to read the log file and extract a specific assertion/exception excerpt.
- Updates the report format to include the log file path and a failure excerpt (instead of pasting raw output).

bgerstle · 2026-06-18T14:29:23Z

+4. Run the test command, capturing full output to a log file named after the branch
+   and current task or phase:
+   `<test-command> 2>&1 | tee .light/red-gate-<branch>-<phase>.log`
+   Example: `.light/red-gate-feature-DEV-3658-P2-capacity-check.log`


Fixed in 1355819. The step now uses `set -o pipefail` and reads `${PIPESTATUS[0]}` to keep the test command's exit status instead of `tee`'s, runs `mkdir -p .light`, and sanitizes the branch name with `tr '/' '-'` so the log stays a single file.
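
For readers following along, the capture step described in this reply could look roughly like the bash sketch below. It is not the text committed in 1355819; the function name `run_red_gate` and its argument order (branch, phase, then the test command) are hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; run_red_gate and its arguments are hypothetical.
set -o pipefail   # a failing test command makes the whole pipeline fail

run_red_gate() {
  local branch="$1" phase="$2"
  shift 2         # the remaining arguments are the test command

  # Replace '/' with '-' so a branch like "feature/x" does not turn the
  # log path into a nested directory.
  local safe_branch
  safe_branch="$(printf '%s' "$branch" | tr '/' '-')"
  local log=".light/red-gate-${safe_branch}-${phase}.log"

  mkdir -p .light # tee cannot create the log if the directory is missing

  "$@" 2>&1 | tee "$log"
  local status="${PIPESTATUS[0]}"   # exit status of the test command, not tee

  echo "log file: $log" >&2
  return "$status"
}
```

Assuming the branch is `feature/DEV-3658` and the phase is `P2-capacity-check`, a call such as `run_red_gate "feature/DEV-3658" "P2-capacity-check" mvn test` would write to `.light/red-gate-feature-DEV-3658-P2-capacity-check.log` and return Maven's exit status.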

bgerstle · 2026-06-18T14:29:25Z

+5. Read the log file. Find and extract the specific assertion failure —
+   the line starting with "expected:" or the @Test method name plus exception message.
+   Do NOT rely on in-conversation output — read the file.


Fixed in 1355819. Step 5 now classifies the failure: an assertion failure (the `expected:` line) is the expected RED, while a compilation/configuration error has no `expected:` line and is reported as a wrong-reason RED gate FAIL with the relevant error excerpt rather than being treated as a valid failing test. The report format now includes a "Failure type" field.
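
As a rough illustration of the classification described in this reply, the bash sketch below checks a captured log for an `expected:` line. The helper name `classify_red_failure`, the exact field values it prints, and the fallback search for the first line containing "error" are all assumptions, not the wording of the agent definition. Later commits in this PR replace the JUnit-specific `expected:` heuristic with framework-neutral guidance.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; classify_red_failure is a hypothetical helper.

classify_red_failure() {
  local log="$1"
  local excerpt

  # An "expected:" line means the test reached its assertion and failed there.
  if excerpt="$(grep -m1 'expected:' "$log")"; then
    echo "Failure type: assertion (expected RED)"
    echo "Failure excerpt: $excerpt"
  else
    # No assertion line: the run failed before the assertion (compile or
    # configuration error). Report the first error line as the excerpt.
    echo "Failure type: wrong-reason RED gate FAIL"
    echo "Failure excerpt: $(grep -m1 -i 'error' "$log")"
  fi
}
```

The function would be called with the path of the log written by the capture step, for example `classify_red_failure .light/red-gate-<branch>-<phase>.log`.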

Addresses PR #13 review feedback:

- Preserve the test command's exit status via pipefail + PIPESTATUS so a failing build is not masked by tee's exit code; mkdir -p .light and sanitize the branch name (/ -> -) so the log path stays a single file.
- Classify the failure when reading the log: an assertion failure is the expected RED, but a compilation/configuration error (no expected: line) is a wrong-reason RED gate FAIL and must be reported with the error excerpt rather than treated as a valid failing test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The previous wording keyed off JUnit/Maven specifics (the "expected:" line, @Test method names). Tests vary by language and framework, so the step now frames the distinction in framework-neutral terms: failed AT the assertion (intended RED) vs. failed BEFORE the assertion ran (compile/import/setup error, wrong-reason RED). It defers the exact wording to CLAUDE.md or language/framework testing guidance.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…gainst them

Moves "what a correct RED failure looks like" from agent-test's runtime heuristic into the plan phase, where framework and test-tier context is available and project/user testing skills can be consulted.

- plan-tasks: the Agent Context RED gate field becomes an expected failure signature in the framework's terms (which assertion fails and why), not boilerplate "tests fail". Added authoring guidance pointing at project/user testing skills; updated the template examples.
- agent-test: judges the observed failure against the plan's RED gate signature (authoritative), and falls back to the generic failed-at-assertion vs. failed-before-assertion check when no signature is provided. Report now records the expected signature and match.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The yak context templates each carried a condensed copy of the agent's Instructions and Report Format. These had drifted from the authoritative agent definitions: the agent-test template still showed the pre-tee "Test command output: [paste]" report and a generic RED gate status, contradicting the agent-test changes in this PR. Each yak template now carries only the Agent Context (the task input) and points to its `praxis:agent-*` definition for the procedure and report format. The agent definitions are richer than the removed copies (YAGNI pass, type-check/lint steps), so no guidance is lost.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

bgerstle requested review from Copilot and ericjohnolson and removed request for ericjohnolson June 17, 2026 20:00

Copilot started reviewing on behalf of bgerstle June 17, 2026 20:01

Copilot AI reviewed Jun 17, 2026

bgerstle changed the title ~~reflect: require tee output capture and failure excerpt in agent-test RED gate~~ agent-test: require tee output capture and failure excerpt in agent-test RED gate Jun 17, 2026

bgerstle changed the title ~~agent-test: require tee output capture and failure excerpt in agent-test RED gate~~ agent-test: require tee output capture and failure excerpt in RED gate Jun 17, 2026

bgerstle and others added 2 commits June 17, 2026 16:40

reflect: require tee output capture and failure excerpt in agent-test…

f2aad03

… RED gate Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Bump praxis plugin version to 2.1.2

3e2081e

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

bgerstle force-pushed the reflect/agent-test-red-gate-output branch from 74e75b0 to 3e2081e June 17, 2026 20:52

bgerstle and others added 4 commits June 18, 2026 10:29

bgerstle requested a review from Copilot June 18, 2026 14:52

Copilot started reviewing on behalf of bgerstle June 18, 2026 14:52

Copilot AI reviewed Jun 18, 2026

bgerstle wants to merge 6 commits into main from reflect/agent-test-red-gate-output

2 participants
