Skip to content

fix(query): surface result error text in trailing ProcessError instead of generic exit code#918

Open
qing-ant wants to merge 4 commits intomainfrom
claude/affectionate-gauss-V78D9
Open

fix(query): surface result error text in trailing ProcessError instead of generic exit code#918
qing-ant wants to merge 4 commits intomainfrom
claude/affectionate-gauss-V78D9

Conversation

@qing-ant
Copy link
Copy Markdown
Contributor

@qing-ant qing-ant commented May 5, 2026

Relates to #913.

Problem

When the CLI emits a result message with is_error: True (e.g. subtype=error_max_turns, error_during_execution, error_max_budget_usd) it then exits with a non-zero code on purpose, for shell-script consumers. The transport's _read_messages_impl raises ProcessError on that exit code, which Query._read_messages flattens to {"type": "error"}, which receive_messages() re-raises as a bare Exceptionafter the consumer has already received the structured ResultMessage:

[msg] ResultMessage subtype=error_max_turns is_error=True num_turns=2
Exception: Command failed with exit code 1 (exit code: 1)
Error output: Check stderr output for details

The exception's message carries no information beyond "exit code 1" — see #913. The consumer can't tell whether they hit their own max_turns or a real transport crash without inspecting turn counts, and the "Check stderr output for details" is a hardcoded string, not actual stderr.

Fix

Track the error text from the most recent result(is_error=True) (result.errors joined, or subtype as fallback). If a ProcessError arrives while that text is set, replace the generic exception message with it:

Exception: Claude Code returned an error result: Reached maximum number of turns (60)

This mirrors the TypeScript SDK (Query.ts lastErrorResultText / readMessages catch), which has shipped this behavior for a while. It keeps the exception contract intact — consumers wrapping query() in try/except to detect failure still catch it — while making the message actionable instead of redundant noise.

A ProcessError raised before any result, after a success result, or after a new user turn or any other intervening message, still surfaces with its original message — those indicate genuinely unexpected failures, not the expected exit from a known error result.

Why replace instead of suppress

An earlier revision suppressed the exception entirely. That silently changed iteration semantics for query() consumers who rely on try/except Exception for failure detection — after suppression, an error_max_turns would no longer enter the except block and code after the async for would unexpectedly run. Replacing the message preserves the contract that an exception is raised, fixes the unactionable-message complaint, and stays aligned with the TypeScript SDK.

Tests

TestProcessExitAfterErrorResult (9 tests):

  • test_process_error_after_error_result_uses_result_error_text — fails on main (Command failed), passes here (Claude Code returned an error result: Reached maximum number of turns (60)).
  • test_process_error_after_error_result_falls_back_to_subtype — no errors[] → message uses subtype.
  • test_process_error_after_error_result_joins_multiple_errors — multiple errors[] entries joined with ; .
  • test_session_state_changed_after_error_result_preserves_replacement — the post-turn system: session_state_changed(idle) marker doesn't reset the flag (matches TS exclusion).
  • test_process_error_without_result_keeps_original_message, ..._after_success_result_keeps_original_message, ..._after_error_then_success_result_keeps_original, test_new_turn_after_error_result_keeps_original_message — replacement scope is bounded.
  • test_pending_control_requests_fail_fast_on_replaced_error — in-flight control requests still fail fast.
  • Full suite: 744 passed, 4 skipped.
  • ruff check / ruff format / mypy src/: clean.
  • Live e2e proof in PR comment below.

Changelog

Bug Fixes


Generated by Claude Code

When the CLI emits a result message with is_error=True (e.g.
subtype=error_max_turns, error_during_execution) it then exits with a
non-zero code. The transport raised ProcessError on that exit code,
which _read_messages converted to a {type: error} stream message and
receive_messages re-raised as a bare Exception — after the consumer
had already received the structured ResultMessage.

Track whether an error result was delivered; if so, treat the
subsequent ProcessError as expected termination and let the stream
end cleanly. ProcessError before any result, or after a success
result, still propagates.

Fixes #913.
Copy link
Copy Markdown
Contributor Author

qing-ant commented May 5, 2026

Live e2e proof

CLI 2.1.130, model claude-haiku-4-5, ClaudeAgentOptions(max_turns=1, allowed_tools=["Bash"]) with a prompt that requires 3 separate Bash calls.

e2e script
import anyio
from claude_agent_sdk import query, ClaudeAgentOptions, ResultMessage

async def main():
    opts = ClaudeAgentOptions(
        max_turns=1, model="claude-haiku-4-5",
        allowed_tools=["Bash"], permission_mode="acceptEdits",
    )
    prompt = ("Run `echo step1` with the Bash tool, then run `echo step2`, "
              "then run `echo step3`. Three separate tool calls across three turns.")
    saw_result, result_subtype, exc_caught = False, None, None
    try:
        async for msg in query(prompt=prompt, options=opts):
            print(f"[msg] type={type(msg).__name__}", end="")
            if isinstance(msg, ResultMessage):
                saw_result, result_subtype = True, msg.subtype
                print(f" subtype={msg.subtype} is_error={msg.is_error} num_turns={msg.num_turns} errors={msg.errors}")
            else:
                print()
    except Exception as e:
        exc_caught = e
        print(f"\n[EXC] {type(e).__name__}: {e}")
    print(f"\nsaw_result={saw_result} result_subtype={result_subtype} exc_caught={exc_caught!r}")

anyio.run(main)

Before (on main @ 99479bc)

[msg] type=ResultMessage subtype=error_max_turns is_error=True num_turns=2 errors=['Reached maximum number of turns (1)']

[EXC] Exception: Command failed with exit code 1 (exit code: 1)
Error output: Check stderr output for details

ResultMessage delivered, then bare Exception with a generic, unactionable message.

After (this branch)

[msg] type=ResultMessage subtype=error_max_turns is_error=True num_turns=2 errors=['Reached maximum number of turns (1)']

[EXC] Exception: Claude Code returned an error result: Reached maximum number of turns (1)

Still raises (preserving the try/except contract), but the message carries the result's actual error text — matching the TypeScript SDK's behavior.

Regression test on main

FAILED tests/test_query.py::TestProcessExitAfterErrorResult::test_process_error_after_error_result_uses_result_error_text
E               Failed: DID NOT MATCH r"Claude Code returned an error result: Reached maximum number of turns \(60\)"
E               Exception: Command failed with exit code 1 (exit code: 1)

Comment thread src/claude_agent_sdk/_internal/query.py Outdated
Comment thread src/claude_agent_sdk/_internal/query.py Outdated
…essed exit

- _got_error_result now tracks the most recent result (bool(is_error))
  rather than latching True forever, so [error result -> success result
  -> ProcessError] still propagates in long-lived ClaudeSDKClient sessions.
- Hoist the pending_control_responses fail-fast loop above the if/else so
  in-flight control requests (interrupt, set_model, ...) fail immediately
  even when the ProcessError is suppressed for the message stream.
- Add tests for both.
Comment thread src/claude_agent_sdk/_internal/query.py Outdated
A ProcessError after a prior turn's error result followed by a new user
message (turn N+1 begins, then crashes before its own result) should
propagate, not be suppressed by the stale flag from turn N.
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All prior feedback is addressed (fail-fast loop hoisted, flag tracks most-recent result, reset on new user turn — each with a regression test). Deferring final sign-off to a maintainer since this changes the public error contract for every query()/ClaudeSDKClient consumer and keys turn-boundary detection on user messages, which is a protocol assumption worth a human glance.

Extended reasoning...

Overview

Touches src/claude_agent_sdk/_internal/query.py (~20 LOC in _read_messages) and adds a 6-test class to tests/test_query.py. Introduces _got_error_result to track whether the most-recently-observed state is "CLI just emitted an is_error=True result and is expected to exit non-zero". When that holds, a subsequent ProcessError is logged at debug and the stream ends cleanly instead of re-raising a redundant bare Exception after the consumer already got the structured ResultMessage.

Since my last review the author landed 5808f22, which resets the flag on every incoming user message and adds test_process_error_after_error_result_then_new_turn_still_raises. Combined with the earlier fixes (a3364fc: fail-fast loop hoisted above the suppression branch + most-recent-result semantics), all three issues I raised are closed with tests.

Security risks

None. No auth, crypto, permissions, subprocess args, or untrusted input parsing is touched. The change only narrows the conditions under which an already-caught exception is forwarded to the message stream; pending control requests still fail fast with the underlying ProcessError.

Level of scrutiny

Moderate. The diff is small and very well tested (six cases covering the suppression matrix), but it lives in the SDK's central read loop and intentionally changes what every consumer observes after error_max_turns / error_during_execution / error_max_budget_usd: previously a trailing Exception, now clean iterator exit. That is the point of #913, but it is a behavior change to the public error contract rather than a purely internal fix.

Other factors

  • The state machine took three review iterations to converge (sticky latch → most-recent-result → reset-on-new-turn), which is a signal that the semantics are subtle enough to merit a maintainer's eye on the final shape.
  • The new-turn reset keys on msg_type == "user". user messages also carry tool results mid-turn, not only turn starts; I walked the orderings and it appears benign (tool-result user messages precede the turn's result, so they cannot land between an error result and the expected exit), but that relies on a CLI message-ordering guarantee a maintainer is better placed to confirm.
  • The PR's own e2e "before" trace shows a SystemMessage arriving after the error ResultMessage and before exit — system messages do not reset the flag, so suppression still fires there. Looks correct, just noting the protocol is not strictly "result is always last".
  • No CODEOWNERS file in the repo; full suite reported green modulo the pre-existing #905 failures.

@qing-ant qing-ant enabled auto-merge (squash) May 5, 2026 18:56
…ssing

Mirrors the TypeScript SDK (Query.ts readMessages): when the CLI exits
non-zero after emitting result(is_error=True), replace the generic
"Command failed with exit code 1" with the structured error text the CLI
already reported, instead of suppressing the exception entirely.

This preserves the try/except contract that existing consumers may rely
on (an exception is still raised), while fixing #913's "unactionable
message" complaint. Suppression silently changed iteration semantics for
consumers who detect failure via try/except around `query()`; replacing
the message keeps that contract intact and aligns the two SDKs.

Tracking flag `_got_error_result: bool` becomes
`_last_error_result_text: str | None`, holding the text from
`result.errors` (joined) or the subtype as fallback. Reset condition
broadened from `user`-only to "any non-result, non-session_state_changed
message" to match the TS SDK's `lastErrorResultText` reset.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@qing-ant qing-ant changed the title fix(query): suppress redundant ProcessError after error result fix(query): surface result error text in trailing ProcessError instead of generic exit code May 5, 2026
@qing-ant
Copy link
Copy Markdown
Contributor Author

qing-ant commented May 5, 2026

E2E Test Results

Verified the fix end-to-end with a real API call by triggering a real error_max_turns (max_turns=1 + a prompt that requires Bash tool calls), then inspecting the exception raised after the ResultMessage is yielded.

Test script (e2e_test_918.py):

"""E2E proof for PR #918: trailing ProcessError after an error result should
carry the result's error text instead of the generic "exit code 1" message.

Triggers a real `error_max_turns` by setting max_turns=1 on a prompt that
requires a tool call (so the CLI never reaches a final assistant text turn),
then inspects the exception raised after the ResultMessage is yielded.
"""

import anyio
from claude_agent_sdk import ClaudeAgentOptions, ResultMessage, query


async def main() -> None:
    options = ClaudeAgentOptions(
        max_turns=1,
        allowed_tools=["Bash"],
        permission_mode="dontAsk",
    )
    prompt = (
        "Run `echo hello` with the Bash tool, then run `echo world`, "
        "then summarize both outputs. Do not answer without running the tools."
    )

    result_msg = None
    exc = None
    try:
        with anyio.fail_after(120):
            async for msg in query(prompt=prompt, options=options):
                if isinstance(msg, ResultMessage):
                    result_msg = msg
                    print(
                        f"[result] subtype={msg.subtype} is_error={msg.is_error} "
                        f"num_turns={msg.num_turns}"
                    )
    except Exception as e:  # noqa: BLE001
        exc = e

    print()
    if result_msg is None:
        print("FAIL: no ResultMessage received")
        return
    if not result_msg.is_error:
        print("SKIP: model finished within 1 turn — rerun for an error result")
        return
    if exc is None:
        print("FAIL: no exception raised after error result")
        return

    print(f"[exception] {type(exc).__name__}: {exc}")
    print()
    if "Claude Code returned an error result" in str(exc):
        print("PASS: exception carries the result error text (PR #918 behavior)")
    elif "Command failed" in str(exc):
        print("REGRESSION: exception is the generic exit-code message (pre-#918)")
    else:
        print(f"UNEXPECTED exception text: {exc}")


anyio.run(main)

On main (04a39ac) — generic exit-code message, plus the noisy logger.error on stderr:

Fatal error in message reader: Command failed with exit code 1 (exit code: 1)
Error output: Check stderr output for details
[result] subtype=error_max_turns is_error=True num_turns=2

[exception] Exception: Command failed with exit code 1 (exit code: 1)
Error output: Check stderr output for details

REGRESSION: exception is the generic exit-code message (pre-#918)

On PR branch (1930c5e) — exception carries the result's error text, no noisy logger.error:

[result] subtype=error_max_turns is_error=True num_turns=2

[exception] Exception: Claude Code returned an error result: Reached maximum number of turns (1)

PASS: exception carries the result error text (PR #918 behavior)

Summary: Confirmed against claude CLI 2.1.129-dev — the trailing exception after an is_error=True result now carries the structured error text instead of the redundant Command failed with exit code 1, the ResultMessage is still yielded first, and the exception contract (a try/except Exception still catches it) is preserved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant