Skip to content

fix(codex-rescue): require Bash call unconditionally to stop fabricated forwarder output#319

Open
mrlitong wants to merge 1 commit into
openai:mainfrom
mrlitong:fix/codex-rescue-thin-forwarder-bash-call
Open

fix(codex-rescue): require Bash call unconditionally to stop fabricated forwarder output#319
mrlitong wants to merge 1 commit into
openai:mainfrom
mrlitong:fix/codex-rescue-thin-forwarder-bash-call

Conversation

@mrlitong
Copy link
Copy Markdown

@mrlitong mrlitong commented May 12, 2026

Summary

The codex-rescue subagent is meant to be a thin Bash forwarder around codex-companion task, but in practice the Sonnet subagent sometimes skips the Bash call entirely and produces a fabricated Codex-style reply on its own — most reproducibly when the request looks conversational, when --resume is used with a short question, or when the user prompt offers a fallback hint such as if you have no context, say "no context".

This PR tightens the agent prompt so the subagent must always invoke Bash, regardless of how trivial the request appears.

Repro (before this patch)

  1. Round 1 — start a fresh thread and ask Codex to memorize two tokens:
    /codex:rescue --fresh --wait Please remember tokenA=purple-unicorn-42 and tokenB=kStarlightCheckpoint, and reply "memorized".
    
    Subagent forwards correctly, Bash call happens, Codex replies "memorized".
  2. Round 2 — ask back through --resume, but include a fallback hint:
    /codex:rescue --resume --wait What are tokenA and tokenB you remembered? If you have no context, just say "no context".
    
    Subagent returns no context. with tool_uses: 0no Bash call was made. The reply is fabricated by the subagent, not by Codex.
  3. Round 3 — same Round 2 prompt phrased differently (any conversational phrasing): the subagent again skips Bash and self-answers.

Calling codex-companion task --resume-last directly from Bash in the same Claude session works perfectly and returns both tokens — proving the resume plumbing itself is fine, and the issue is entirely in the subagent's forwarding behavior.

Root cause

Two things in the existing agent definition combine to let the subagent rationalize skipping the Bash call:

  1. The Selection guidance block contains "Do not grab simple asks that the main Claude thread can finish quickly on its own." This rule is intended for the caller deciding whether to invoke this subagent, but the subagent reads it as advice to itself — and if the current request looks simple, it decides not to "really" forward.
  2. The Forwarding rules block enumerates many specific dos and don'ts but never states the underlying invariant: every invocation must result in exactly one Bash call. Under prompt pressure (especially fallback-hint phrasing like "if X, say Y"), the model treats the fallback as an allowed shortcut.

Fix

Three small prompt changes in plugins/codex/agents/codex-rescue.md:

  1. Add a clarifying sentence at the end of Selection guidance stating that those rules apply to the caller, not to this subagent after it has already been invoked.
  2. Add an explicit invariant at the top of Forwarding rules: the Bash task call MUST happen exactly once for every invocation, and the subagent must never answer from its own knowledge, fabricate Codex-style replies, claim absence of context on Codex's behalf, or decide a request is "too simple to forward".
  3. Tighten the existing "return nothing on Bash failure" line to also forbid returning any text without first having called Bash.

No code, schema, or test changes. The patch is prompt-only and touches a single Markdown file.

Verification

With patch applied, the same Round 2 / Round 3 style queries trigger the Bash call and return the correct resumed context (3/3 natural-phrasing queries succeeded with tool_uses: 1). Without the patch, the same queries return fabricated no context. with tool_uses: 0.

Known limitation

Prompts that explicitly hand the subagent a fallback answer (e.g. "if you have no context, just say 'no context'") can still occasionally bypass the rule — this is a prompt-injection-shaped edge case that's hard to eliminate purely through prompt tightening. In normal usage (engineering tasks, plain follow-up questions), the patch is sufficient. Callers should avoid embedding fallback-answer hints when they want guaranteed forwarding.

Test plan

  • CI green
  • Manual: /codex:rescue --fresh + /codex:rescue --resume round trip with a short follow-up question returns real Codex output, not a fabricated reply

🤖 Generated with Claude Code

…bricated forwarder output

The codex-rescue subagent is meant to be a thin Bash forwarder around
codex-companion task, but in practice the Sonnet subagent sometimes
skips the Bash call entirely and produces a fabricated Codex-style reply
(e.g. "no context available") on its own. This is most reproducible when
the request looks conversational/trivial, when --resume is used with a
short question, or when the prompt includes a fallback hint such as
'if you have no context, say "no context"'.

Root cause appears to be the Selection guidance block being read by the
subagent itself as license to skip forwarding when the request seems
simple, combined with the absence of an explicit "you must always call
Bash" invariant in the Forwarding rules.

This patch makes three prompt changes to the agent definition:

1. Clarify that the Selection guidance applies to the caller deciding
   whether to invoke this subagent, not to this subagent self-judging
   after invocation.
2. Add an explicit top-of-list rule in Forwarding rules: the Bash task
   call MUST happen exactly once for every invocation, regardless of how
   trivial the request appears, and the subagent must never answer from
   its own knowledge or claim absence of context on Codex's behalf.
3. Tighten the "return nothing on Bash failure" line to also forbid
   returning any text without first having called Bash.

Verified locally: with the patch applied, three natural-language
--resume queries in a row all triggered the Bash call and returned the
correct resumed context (3/3); without the patch, the same queries
produced fabricated "no context" replies with zero Bash invocations.
@mrlitong mrlitong requested a review from a team May 12, 2026 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant