
agent-test: require tee output capture and failure excerpt in RED gate#13

Open
bgerstle wants to merge 6 commits into main from reflect/agent-test-red-gate-output

Conversation

bgerstle commented Jun 17, 2026

Collaborator

Summary

  • Updates agent-test step 4 to capture test output via tee to a named log file under .light/, rather than relying on in-conversation rendering
  • Adds step 5 requiring the agent to read the log file and extract the specific assertion failure before reporting
  • Replaces the vague "Test output: paste output showing failures" report field with a log file path and a concrete failure excerpt

Motivation

Maven test output is long and gets truncated in Claude's conversation rendering. The orchestrator was seeing BUILD FAILURE but not the specific assertion, making it impossible to verify the RED gate is failing for the right reason (missing implementation vs. compile error or fixture bug).

Test plan

  • Manually invoke agent-test on a TDD phase and confirm the log file is created at the expected path
  • Confirm the report includes a log path and a specific failure excerpt

🤖 Generated with Claude Code

bgerstle requested review from Copilot and ericjohnolson, then removed the request for ericjohnolson, June 17, 2026 20:00

Copilot AI left a comment


Pull request overview

This PR tightens the agent-test (RED gate) workflow so failing test output is reliably captured to disk (avoiding chat truncation) and the agent must report a concrete, specific failure excerpt derived from the captured log.

Changes:

  • Updates the RED gate “run tests” step to capture full test output via tee into a .light/ log file.
  • Adds an explicit step requiring the agent to read the log file and extract a specific assertion/exception excerpt.
  • Updates the report format to include the log file path and a failure excerpt (instead of pasting raw output).


Comment thread plugins/praxis/agents/agent-test.md Outdated
Comment on lines +19 to +22
4. Run the test command, capturing full output to a log file named after the branch
and current task or phase:
`<test-command> 2>&1 | tee .light/red-gate-<branch>-<phase>.log`
Example: `.light/red-gate-feature-DEV-3658-P2-capacity-check.log`

Collaborator Author


Fixed in 1355819. The step now uses set -o pipefail and reads ${PIPESTATUS[0]} to keep the test command's exit status instead of tee's, runs mkdir -p .light, and sanitizes the branch name with tr '/' '-' so the log stays a single file.
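
The capture step described in this reply could look roughly like the following bash sketch. It is an illustration, not the text committed in 1355819: the function name `run_red_gate` and its argument order are hypothetical, and only `set -o pipefail`, `${PIPESTATUS[0]}`, `mkdir -p .light`, `tr '/' '-'`, and the `.light/red-gate-<branch>-<phase>.log` naming come from the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the capture step; requires bash for PIPESTATUS.
# pipefail makes the pipeline as a whole report failure when the test command fails.
set -o pipefail

# Usage: run_red_gate <branch> <phase> <test-command> [args...]
run_red_gate() {
  local branch="$1" phase="$2"
  shift 2
  local safe_branch log status

  # Replace '/' so a branch such as feature/DEV-1 does not create subdirectories.
  safe_branch=$(printf '%s' "$branch" | tr '/' '-')
  mkdir -p .light
  log=".light/red-gate-${safe_branch}-${phase}.log"

  # tee mirrors stdout and stderr into the log file.
  "$@" 2>&1 | tee "$log"
  # PIPESTATUS[0] is the test command's own exit status, not tee's.
  status=${PIPESTATUS[0]}

  echo "log: $log (test exit status: $status)"
  return "$status"
}
```

For example, `run_red_gate "$(git branch --show-current)" P2 mvn test` would write the log under `.light/` and return Maven's exit status, so a failing build still registers as a failure even though `tee` itself succeeded.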

Comment thread plugins/praxis/agents/agent-test.md Outdated
Comment on lines +23 to +25
5. Read the log file. Find and extract the specific assertion failure —
the line starting with "expected:" or the @Test method name plus exception message.
Do NOT rely on in-conversation output — read the file.

Collaborator Author


Fixed in 1355819. Step 5 now classifies the failure: an assertion failure (the expected: line) is the expected RED, while a compilation/configuration error has no expected: line and is reported as a wrong-reason RED gate FAIL with the relevant error excerpt rather than being treated as a valid failing test. The report format now includes a "Failure type" field.
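
As a rough illustration of the classification described in this reply, the sketch below reads a captured log and reports a failure type. The function name `classify_red_failure`, the output labels other than "Failure type", and the fallback `error|exception` pattern are assumptions made for the example. In the PR the agent follows prose instructions rather than running a script like this.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a RED gate failure from a captured log file.
# Usage: classify_red_failure <log-file>
classify_red_failure() {
  local log="$1" excerpt

  if excerpt=$(grep -m 1 'expected:' "$log"); then
    # An "expected:" line means the test ran and failed at its assertion.
    echo "Failure type: assertion"
    echo "Failure excerpt: $excerpt"
  else
    # No assertion line: surface the first error-looking line instead (assumed pattern).
    excerpt=$(grep -m 1 -i -E 'error|exception' "$log" || true)
    echo "Failure type: wrong-reason (compile/config)"
    echo "Failure excerpt: ${excerpt:-<no error line found>}"
    return 1
  fi
}
```

The non-zero return on the wrong-reason branch is a design choice for the sketch: it lets a caller treat a compile or configuration error as a RED gate FAIL without parsing the printed text.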

bgerstle changed the title from "reflect: require tee output capture and failure excerpt in agent-test RED gate" to "agent-test: require tee output capture and failure excerpt in agent-test RED gate" Jun 17, 2026
bgerstle changed the title from "agent-test: require tee output capture and failure excerpt in agent-test RED gate" to "agent-test: require tee output capture and failure excerpt in RED gate" Jun 17, 2026
bgerstle and others added 2 commits June 17, 2026 16:40
… RED gate

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
bgerstle force-pushed the reflect/agent-test-red-gate-output branch from 74e75b0 to 3e2081e June 17, 2026 20:52
bgerstle and others added 4 commits June 18, 2026 10:29
Addresses PR #13 review feedback:
- Preserve the test command's exit status via pipefail + PIPESTATUS so a
  failing build is not masked by tee's exit code; mkdir -p .light and
  sanitize the branch name (/ -> -) so the log path stays a single file.
- Classify the failure when reading the log: an assertion failure is the
  expected RED, but a compilation/configuration error (no expected: line)
  is a wrong-reason RED gate FAIL and must be reported with the error
  excerpt rather than treated as a valid failing test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The previous wording keyed off JUnit/Maven specifics (the "expected:"
line, @Test method names). Tests vary by language and framework, so the
step now frames the distinction in framework-neutral terms — failed AT
the assertion (intended RED) vs. failed BEFORE the assertion ran
(compile/import/setup error, wrong-reason RED) — and defers the exact
wording to CLAUDE.md or language/framework testing guidance.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…gainst them

Moves "what a correct RED failure looks like" from agent-test's runtime
heuristic into the plan phase, where framework and test-tier context is
available and project/user testing skills can be consulted.

- plan-tasks: the Agent Context RED gate field becomes an expected
  failure signature in the framework's terms (which assertion fails and
  why), not boilerplate "tests fail". Added authoring guidance pointing
  at project/user testing skills; updated the template examples.
- agent-test: judges the observed failure against the plan's RED gate
  signature (authoritative), and falls back to the generic
  failed-at-assertion vs. failed-before-assertion check when no signature
  is provided. Report now records the expected signature and match.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
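
The judging order described in the commit above (plan-provided signature first, generic check as fallback) might be sketched as follows. Everything here is hypothetical: the function name `judge_red_gate`, the verdict strings, and especially the generic pattern `AssertionError|expected:`, which merely stands in for whatever the project's testing guidance defines as "failed at the assertion".

```shell
#!/usr/bin/env bash
# Hypothetical helper: judge a captured log against an optional expected signature.
# Usage: judge_red_gate <log-file> [expected-signature]
judge_red_gate() {
  local log="$1" signature="${2:-}"

  if [ -n "$signature" ]; then
    # A plan-provided signature is authoritative: fixed-string match against the log.
    if grep -q -F -- "$signature" "$log"; then
      echo "RED gate: PASS (matched expected signature)"
    else
      echo "RED gate: FAIL (expected signature not found)"
      return 1
    fi
  elif grep -q -E 'AssertionError|expected:' "$log"; then
    # Fallback when the plan gives no signature (placeholder pattern, an assumption).
    echo "RED gate: PASS (generic check: failed at an assertion)"
  else
    echo "RED gate: FAIL (generic check: failed before any assertion)"
    return 1
  fi
}
```

Matching with `grep -F` treats the signature as a literal string, so punctuation in an assertion message is not interpreted as a regular expression.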
The yak context templates each carried a condensed copy of the agent's
Instructions and Report Format. These had drifted from the authoritative
agent definitions — the agent-test template still showed the pre-tee
"Test command output: [paste]" report and a generic RED gate status,
contradicting the agent-test changes in this PR.

Each yak template now carries only the Agent Context (the task input)
and points to its `praxis:agent-*` definition for the procedure and
report format. The agent definitions are richer than the removed copies
(YAGNI pass, type-check/lint steps), so no guidance is lost.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
