I've now hit the same class of bug — #aw_X references rendering as literal text in user-facing artifacts — four times across three different gh-aw versions, in different shapes each time. Each time we've shipped a "fix" (a prompt change, a schema rev, a workaround), and each time the bug has come back in a slightly different form weeks later because the underlying contract isn't enforced anywhere in the pipeline. I'd like to talk about the structural problem rather than file yet another point fix.
The bug pattern, abstracted
A workflow's prompt instructs the agent to:
- Call
create_* with temporary_id: aw_thingN.
- Reference
#aw_thingN in a later safe-output body (PR body, comment, results-issue body, tracker update, …).
The framework resolves #aw_thingN cross-step if and only if both halves of the contract were satisfied. When either half fails, the framework silently publishes a body containing the literal text #aw_thingN, the team-facing artifact is broken, and nobody notices until a human reads it.
Four ways I've watched this manifest
- v0.74.4-era —
create_pull_request registered the temp ID under a different field name than the registration step expected. Every #aw_prN ref rendered literal regardless of what the agent did. Workaround: stop using #aw_prN entirely, fall back to PR-search-by-branch URLs.
- v0.76.1 — registration shape fixed. We removed the search-URL workaround and went back to
#aw_prN. Worked in testing.
- v0.76.1+, ongoing — turns out the agent silently omits
temporary_id from create_pull_request calls (the field is optional in the schema, the MCP server accepts the call, the framework auto-assigns a random aw_<8hex> id and registers that in the temp-id map). Every #aw_prN ref in downstream bodies renders literal — but now the failure looks identical to v0.74.4, even though the underlying cause is completely different. Took us ~6 days of degraded NA-path comments to notice.
- Asymmetric resolution across types —
update_issue bodies never run the substitution pass even when the agent does set temp_ids correctly. We've documented this in our prompt as a footgun, but it bites prompt authors every few weeks.
These look like four different bugs. They're really the same bug: the contract is enforced nowhere.
Why prompt-tightening hasn't worked
We've tried three escalating rounds of prompt language ("set the temp_id", "🛑 mandatory", "this is required so the next step works"). The reliable signal models latch onto isn't severity language — it's a concrete downstream consequence. For create_issue the model reliably sets temporary_id: aw_issue1 because the prompt cites a real consequence (Closes #aw_issue1 won't resolve). For create_pull_request there's no equivalent consequence the model fixates on, and it drops the field. Different model + prompt combinations will land at different failure rates here, so the bug class is just lying in wait everywhere prompts use #aw_X refs.
What a structural fix would look like (in order of preference)
1. Frontmatter-declared required temp_ids
safe-outputs:
create-pull-request:
max: 2
require-temporary-id: true # NEW
create-issue:
require-temporary-id: true # NEW
When this flag is set, the safe-outputs MCP server rejects calls that omit temporary_id, with an actionable error returned to the agent (so it can retry). This converts the silent contract into a hard one and pushes the failure mode left — into the agent's own retry loop, where the agent can self-correct.
2. Symmetric #aw_X resolution across all body-containing safe-outputs
Today: create_issue, create_pull_request, create_discussion, add_comment resolve #aw_X refs in their bodies. update_issue and update_discussion do not. This asymmetry is a footgun every prompt author hits — they reasonably assume "if it resolves in add_comment, it resolves in update_issue". Please make resolution uniform across every safe-output that takes a body.
3. Fail-loud on unresolved #aw_X references after substitution
When the framework finishes the substitution pass and the body still contains a string matching #aw_[a-zA-Z0-9_]+, that should be a workflow error (configurable strict/warn). Today the framework happily publishes whatever's left. A simple post-substitution regex would catch every instance of this bug class — including any future variant we haven't thought of yet.
4. Validation report in run artifacts
A temp-id-resolution.json (or section in an existing artifact) summarising:
- Every
temporary_id registered (with the call type + resolved number)
- Every
#aw_X reference seen in any body
- Resolved vs. unresolved counts, with file/line of each unresolved ref
This wouldn't fix the bug but it would make debugging trivial — today we have to download artifacts, grep safeoutputs.jsonl, cross-reference temporary-id-map.json, and reconstruct intent.
5. Reverse-aliasing for auto-assigned temp_ids
When a create_* call lacks temporary_id and the framework auto-assigns one (e.g. aw_wppbppt6), expose a way for the prompt author to alias expected names to the auto-assigned id. For example, a frontmatter directive like:
safe-outputs:
create-pull-request:
max: 2
auto-assign-names: ["aw_pr1", "aw_pr2"] # NEW — first call → aw_pr1, second → aw_pr2
This is the lowest-leverage of the five but the cheapest belt-and-suspenders.
What I'm asking for
Realistically, (1) and (3) together would close every variant of this bug class I've seen, retroactively and prospectively. (2) would close a known sharp edge. The others are nice-to-haves.
Versions affected
The current shape (point 3 above) is reproducible on gh-aw v0.77.5. Prior variants have surfaced under v0.74.4 and v0.76.1. The asymmetric-resolution variant (point 2) is documented in v0.77.5 reference material as a known limitation.
I've now hit the same class of bug —
#aw_Xreferences rendering as literal text in user-facing artifacts — four times across three different gh-aw versions, in different shapes each time. Each time we've shipped a "fix" (a prompt change, a schema rev, a workaround), and each time the bug has come back in a slightly different form weeks later because the underlying contract isn't enforced anywhere in the pipeline. I'd like to talk about the structural problem rather than file yet another point fix.The bug pattern, abstracted
A workflow's prompt instructs the agent to:
create_*withtemporary_id: aw_thingN.#aw_thingNin a later safe-output body (PR body, comment, results-issue body, tracker update, …).The framework resolves
#aw_thingNcross-step if and only if both halves of the contract were satisfied. When either half fails, the framework silently publishes a body containing the literal text#aw_thingN, the team-facing artifact is broken, and nobody notices until a human reads it.Four ways I've watched this manifest
create_pull_requestregistered the temp ID under a different field name than the registration step expected. Every#aw_prNref rendered literal regardless of what the agent did. Workaround: stop using#aw_prNentirely, fall back to PR-search-by-branch URLs.#aw_prN. Worked in testing.temporary_idfromcreate_pull_requestcalls (the field is optional in the schema, the MCP server accepts the call, the framework auto-assigns a randomaw_<8hex>id and registers that in the temp-id map). Every#aw_prNref in downstream bodies renders literal — but now the failure looks identical to v0.74.4, even though the underlying cause is completely different. Took us ~6 days of degraded NA-path comments to notice.update_issuebodies never run the substitution pass even when the agent does set temp_ids correctly. We've documented this in our prompt as a footgun, but it bites prompt authors every few weeks.These look like four different bugs. They're really the same bug: the contract is enforced nowhere.
Why prompt-tightening hasn't worked
We've tried three escalating rounds of prompt language ("set the temp_id", "🛑 mandatory", "this is required so the next step works"). The reliable signal models latch onto isn't severity language — it's a concrete downstream consequence. For
create_issuethe model reliably setstemporary_id: aw_issue1because the prompt cites a real consequence (Closes #aw_issue1won't resolve). Forcreate_pull_requestthere's no equivalent consequence the model fixates on, and it drops the field. Different model + prompt combinations will land at different failure rates here, so the bug class is just lying in wait everywhere prompts use#aw_Xrefs.What a structural fix would look like (in order of preference)
1. Frontmatter-declared required temp_ids
When this flag is set, the safe-outputs MCP server rejects calls that omit
temporary_id, with an actionable error returned to the agent (so it can retry). This converts the silent contract into a hard one and pushes the failure mode left — into the agent's own retry loop, where the agent can self-correct.2. Symmetric
#aw_Xresolution across all body-containing safe-outputsToday:
create_issue,create_pull_request,create_discussion,add_commentresolve#aw_Xrefs in their bodies.update_issueandupdate_discussiondo not. This asymmetry is a footgun every prompt author hits — they reasonably assume "if it resolves inadd_comment, it resolves inupdate_issue". Please make resolution uniform across every safe-output that takes a body.3. Fail-loud on unresolved
#aw_Xreferences after substitutionWhen the framework finishes the substitution pass and the body still contains a string matching
#aw_[a-zA-Z0-9_]+, that should be a workflow error (configurable strict/warn). Today the framework happily publishes whatever's left. A simple post-substitution regex would catch every instance of this bug class — including any future variant we haven't thought of yet.4. Validation report in run artifacts
A
temp-id-resolution.json(or section in an existing artifact) summarising:temporary_idregistered (with the call type + resolved number)#aw_Xreference seen in any bodyThis wouldn't fix the bug but it would make debugging trivial — today we have to download artifacts, grep
safeoutputs.jsonl, cross-referencetemporary-id-map.json, and reconstruct intent.5. Reverse-aliasing for auto-assigned temp_ids
When a
create_*call lackstemporary_idand the framework auto-assigns one (e.g.aw_wppbppt6), expose a way for the prompt author to alias expected names to the auto-assigned id. For example, a frontmatter directive like:This is the lowest-leverage of the five but the cheapest belt-and-suspenders.
What I'm asking for
Realistically, (1) and (3) together would close every variant of this bug class I've seen, retroactively and prospectively. (2) would close a known sharp edge. The others are nice-to-haves.
Versions affected
The current shape (point 3 above) is reproducible on gh-aw
v0.77.5. Prior variants have surfaced underv0.74.4andv0.76.1. The asymmetric-resolution variant (point 2) is documented in v0.77.5 reference material as a known limitation.