Temp-ID resolution is structurally fragile #36969

@corygehr

I've now hit the same class of bug — #aw_X references rendering as literal text in user-facing artifacts — four times across three different gh-aw versions, in different shapes each time. Each time we've shipped a "fix" (a prompt change, a schema rev, a workaround), and each time the bug has come back in a slightly different form weeks later because the underlying contract isn't enforced anywhere in the pipeline. I'd like to talk about the structural problem rather than file yet another point fix.

The bug pattern, abstracted

A workflow's prompt instructs the agent to:

  1. Call create_* with temporary_id: aw_thingN.
  2. Reference #aw_thingN in a later safe-output body (PR body, comment, results-issue body, tracker update, …).

The framework resolves #aw_thingN cross-step if and only if both halves of the contract were satisfied. When either half fails, the framework silently publishes a body containing the literal text #aw_thingN, the team-facing artifact is broken, and nobody notices until a human reads it.
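To make the two halves of the contract concrete, here is a minimal Python sketch of a substitution pass. It is not gh-aw's actual implementation: `resolve_refs`, the shape of the temp-id map, and the issue/PR numbers are all assumptions for illustration.

```python
import re

# Hypothetical pattern for temp-id references, matching the #aw_X shape used in this issue.
TEMP_REF = re.compile(r"#(aw_[A-Za-z0-9_]+)")


def resolve_refs(body: str, temp_id_map: dict[str, int]) -> str:
    """Replace #aw_X with #<number> when aw_X is registered; leave it untouched otherwise."""

    def swap(match: re.Match) -> str:
        number = temp_id_map.get(match.group(1))
        # Unregistered ids fall through unchanged: this is the silent failure.
        return f"#{number}" if number is not None else match.group(0)

    return TEMP_REF.sub(swap, body)


# Both halves satisfied: the reference resolves.
ok = resolve_refs("Closes #aw_issue1", {"aw_issue1": 412})

# Registration half failed (the id was registered under an auto-assigned name):
# the literal text survives into the published body.
broken = resolve_refs("See #aw_pr1", {"aw_wppbppt6": 97})
```

Nothing in this shape distinguishes "resolved" from "left alone", which is why the failure stays invisible until someone reads the artifact.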

Four ways I've watched this manifest

  1. v0.74.4-era: create_pull_request registered the temp ID under a different field name than the registration step expected. Every #aw_prN ref rendered literal regardless of what the agent did. Workaround: stop using #aw_prN entirely and fall back to PR-search-by-branch URLs.
  2. v0.76.1: registration shape fixed. We removed the search-URL workaround and went back to #aw_prN. Worked in testing.
  3. v0.76.1+, ongoing: it turns out the agent silently omits temporary_id from create_pull_request calls. The field is optional in the schema, the MCP server accepts the call, and the framework auto-assigns a random aw_<8hex> id and registers that in the temp-id map. Every #aw_prN ref in downstream bodies renders literal, but the failure now looks identical to v0.74.4 even though the underlying cause is completely different. It took us ~6 days of degraded NA-path comments to notice.
  4. Asymmetric resolution across types: update_issue bodies never run the substitution pass, even when the agent does set temp_ids correctly. We've documented this in our prompt as a footgun, but it bites prompt authors every few weeks.

These look like four different bugs. They're really the same bug: the contract is enforced nowhere.

Why prompt-tightening hasn't worked

We've tried three escalating rounds of prompt language ("set the temp_id", "🛑 mandatory", "this is required so the next step works"). The reliable signal models latch onto isn't severity language — it's a concrete downstream consequence. For create_issue the model reliably sets temporary_id: aw_issue1 because the prompt cites a real consequence (Closes #aw_issue1 won't resolve). For create_pull_request there's no equivalent consequence the model fixates on, and it drops the field. Different model + prompt combinations will land at different failure rates here, so the bug class is just lying in wait everywhere prompts use #aw_X refs.

What a structural fix would look like (in order of preference)

1. Frontmatter-declared required temp_ids

safe-outputs:
  create-pull-request:
    max: 2
    require-temporary-id: true   # NEW
  create-issue:
    require-temporary-id: true   # NEW

When this flag is set, the safe-outputs MCP server rejects calls that omit temporary_id, with an actionable error returned to the agent (so it can retry). This converts the silent contract into a hard one and pushes the failure mode left — into the agent's own retry loop, where the agent can self-correct.
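A rough Python sketch of what that server-side check could look like. Everything here is hypothetical: `validate_call`, `MissingTemporaryId`, and the `required_for` set (which would be derived from the proposed `require-temporary-id` flag) are names invented for this example, not existing gh-aw APIs.

```python
import re

# Assumed shape of a valid temporary id, based on the aw_X examples in this issue.
TEMP_ID = re.compile(r"^aw_[A-Za-z0-9_]+$")


class MissingTemporaryId(ValueError):
    """Raised so the error text can be returned to the agent for a retry."""


def validate_call(tool: str, args: dict, required_for: set[str]) -> None:
    """Reject a safe-output call that omits temporary_id when the workflow requires it."""
    if tool not in required_for:
        return
    temp_id = args.get("temporary_id")
    if not isinstance(temp_id, str) or not TEMP_ID.match(temp_id):
        raise MissingTemporaryId(
            f"{tool}: 'temporary_id' is required by this workflow "
            "(for example aw_pr1); retry the call with it set."
        )
```

The error message names the field and gives an example value, since the point is for the agent to correct itself on the next attempt rather than for a human to read a log.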

2. Symmetric #aw_X resolution across all body-containing safe-outputs

Today: create_issue, create_pull_request, create_discussion, add_comment resolve #aw_X refs in their bodies. update_issue and update_discussion do not. This asymmetry is a footgun every prompt author hits — they reasonably assume "if it resolves in add_comment, it resolves in update_issue". Please make resolution uniform across every safe-output that takes a body.
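One way to get uniformity, sketched in Python under the assumption that safe-outputs are dicts with an optional `body` field: key the substitution on the presence of a body rather than on a per-type allowlist. `resolve_bodies`, the output shapes, and the `some_output_without_body` type are illustrative, not gh-aw's real data model.

```python
import re

TEMP_REF = re.compile(r"#(aw_[A-Za-z0-9_]+)")


def resolve_bodies(outputs: list[dict], temp_id_map: dict[str, int]) -> list[dict]:
    """Run the same substitution on every output that carries a string body."""

    def swap(match: re.Match) -> str:
        number = temp_id_map.get(match.group(1))
        return f"#{number}" if number is not None else match.group(0)

    resolved = []
    for output in outputs:
        if isinstance(output.get("body"), str):
            # Copy rather than mutate, so the caller's original records stay intact.
            output = {**output, "body": TEMP_REF.sub(swap, output["body"])}
        resolved.append(output)
    return resolved


outputs = [
    {"type": "add_comment", "body": "See #aw_issue1"},
    {"type": "update_issue", "body": "Tracks #aw_issue1"},
    {"type": "some_output_without_body", "labels": ["bug"]},  # hypothetical type
]
result = resolve_bodies(outputs, {"aw_issue1": 412})
```

With this shape, a new body-bearing output type gets resolution by default instead of having to be added to a list.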

3. Fail-loud on unresolved #aw_X references after substitution

When the framework finishes the substitution pass and the body still contains a string matching #aw_[a-zA-Z0-9_]+, that should be a workflow error (configurable strict/warn). Today the framework happily publishes whatever's left. A simple post-substitution regex would catch every instance of this bug class — including any future variant we haven't thought of yet.
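The check itself is small. A sketch in Python, using the regex quoted above; the function name and the `strict`/`warn` mode values are assumptions about how the option might be spelled.

```python
import re

# Same pattern as proposed above.
UNRESOLVED = re.compile(r"#aw_[a-zA-Z0-9_]+")


def check_unresolved(body: str, mode: str = "strict") -> list[str]:
    """Return leftover #aw_X refs; in strict mode, raise instead of returning them."""
    leftovers = UNRESOLVED.findall(body)
    if leftovers and mode == "strict":
        raise RuntimeError(f"unresolved temp-id references: {', '.join(leftovers)}")
    return leftovers
```

A real implementation would probably need an escape hatch for bodies that mention #aw_X on purpose (documentation, or an issue like this one), which is one reason to keep a warn mode.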

4. Validation report in run artifacts

A temp-id-resolution.json (or section in an existing artifact) summarising:

  • Every temporary_id registered (with the call type + resolved number)
  • Every #aw_X reference seen in any body
  • Resolved vs. unresolved counts, with file/line of each unresolved ref

This wouldn't fix the bug but it would make debugging trivial — today we have to download artifacts, grep safeoutputs.jsonl, cross-reference temporary-id-map.json, and reconstruct intent.
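As a sketch of what such a report could contain, here is a small Python function that builds it from the registered ids and the pre-substitution bodies. The field names, the `add_comment[0]` source label, and the numbers are invented for illustration; the real artifact schema would be up to the maintainers.

```python
import json
import re

TEMP_REF = re.compile(r"#(aw_[A-Za-z0-9_]+)")


def build_report(registered: dict[str, dict], bodies: dict[str, str]) -> dict:
    """registered: temp id -> {"type", "number"}; bodies: output label -> body text."""
    refs = []
    for source, body in bodies.items():
        for line_no, line in enumerate(body.splitlines(), start=1):
            for match in TEMP_REF.finditer(line):
                refs.append({
                    "ref": match.group(0),
                    "source": source,
                    "line": line_no,
                    "resolved": match.group(1) in registered,
                })
    unresolved = [r for r in refs if not r["resolved"]]
    return {
        "registered": registered,
        "references": refs,
        "resolved_count": len(refs) - len(unresolved),
        "unresolved_count": len(unresolved),
        "unresolved": unresolved,
    }


report = build_report(
    {"aw_issue1": {"type": "create_issue", "number": 412}},
    {"add_comment[0]": "Tracking in #aw_issue1\nPR: #aw_pr1"},
)
report_json = json.dumps(report, indent=2)  # what would be written to the artifact
```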

5. Reverse-aliasing for auto-assigned temp_ids

When a create_* call lacks temporary_id and the framework auto-assigns one (e.g. aw_wppbppt6), expose a way for the prompt author to alias expected names to the auto-assigned id. For example, a frontmatter directive like:

safe-outputs:
  create-pull-request:
    max: 2
    auto-assign-names: ["aw_pr1", "aw_pr2"]  # NEW — first call → aw_pr1, second → aw_pr2

This is the lowest-leverage of the five but the cheapest belt-and-suspenders.
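A possible reading of that directive, sketched in Python: names are assigned by call position, and only calls whose id was auto-assigned get an alias. The `auto_assigned` flag on each call record and the function name are hypothetical.

```python
def alias_auto_ids(calls: list[dict], names: list[str]) -> dict[str, str]:
    """Map declared names to auto-assigned ids by call order (first call -> names[0], ...)."""
    aliases: dict[str, str] = {}
    for position, call in enumerate(calls):
        if position >= len(names):
            break  # more calls than declared names; the rest stay un-aliased
        if call["auto_assigned"]:
            aliases[names[position]] = call["temporary_id"]
    return aliases


aliases = alias_auto_ids(
    [
        {"temporary_id": "aw_wppbppt6", "auto_assigned": True},  # agent omitted the id
        {"temporary_id": "aw_pr2", "auto_assigned": False},      # agent set it explicitly
    ],
    ["aw_pr1", "aw_pr2"],
)
```

The substitution pass would then look up #aw_pr1 through the alias table before consulting the temp-id map. Positional aliasing silently mislabels things if the agent creates PRs in an unexpected order, which is part of why this ranks last.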

What I'm asking for

Realistically, (1) and (3) together would close every variant of this bug class I've seen, retroactively and prospectively. (2) would close a known sharp edge. The others are nice-to-haves.

Versions affected

The current shape (manifestation 3 above, the omitted temporary_id) is reproducible on gh-aw v0.77.5. Prior variants surfaced under v0.74.4 and v0.76.1. The asymmetric-resolution variant (manifestation 4, addressed by proposal 2) is documented in v0.77.5 reference material as a known limitation.
