OTel Instrumentation Improvement: Fix agent span parent hierarchy
Analysis Date: 2026-04-17
Priority: Medium
Effort: Small (< 1h)
Problem
The gh-aw.job.agent span, which measures pure AI execution latency, is emitted as a sibling of the gh-aw.job.conclusion span, not a child of it. Both spans share the same parentSpanId (the setup span's ID from GITHUB_AW_OTEL_PARENT_SPAN_ID).
This makes it impossible to answer the question "what fraction of total job time was spent on AI execution?" from a single trace waterfall view. Engineers must manually correlate two sibling spans by trace_id and compare timestamps, which is error-prone and not supported by standard dashboard widgets.
Why This Matters (DevOps Perspective)
With the current flat hierarchy, any trace waterfall (Grafana Tempo, Honeycomb, Datadog APM) renders:
    setup span [setup duration]
    ├── agent span [AI latency]        ← sibling
    └── conclusion span [total job]    ← sibling

This is semantically wrong: the conclusion span's time window (GITHUB_AW_OTEL_JOB_START_MS → conclusion step) contains the agent span's window (GITHUB_AW_OTEL_JOB_START_MS → agent_output.json mtime). They are not parallel operations.
After the fix, every trace would render as:
    setup span [setup duration]
    └── conclusion span [total job execution]
        └── agent span [AI latency]

This directly unblocks:
- "AI latency as % of job time": a standard parent/child duration ratio that works in every backend
- "Overhead after AI finishes": the gap between agent span.endTime and conclusion span.endTime, visible at a glance
- Accurate p95/p99 latency attribution: dashboards can split AI latency from setup/teardown without custom aggregations
- Faster failure triage: the trace view immediately shows whether a slow job is due to AI or to post-processing
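The first two items above reduce to simple arithmetic once the agent window is known to sit inside the conclusion window. The following is a minimal sketch of that arithmetic, not code from the repository: the function name summarizeAgentShare and the simplified { startMs, endMs } span shape are assumptions made for illustration.

```javascript
// Sketch only: spans are reduced to { startMs, endMs } for illustration.
// This is not the OTLP wire format and not code from send_otlp_span.cjs.
function summarizeAgentShare(conclusionSpan, agentSpan) {
  const jobMs = conclusionSpan.endMs - conclusionSpan.startMs;
  const agentMs = agentSpan.endMs - agentSpan.startMs;
  if (jobMs <= 0 || agentMs < 0) {
    throw new RangeError("span durations must be positive");
  }
  // The ratio is only meaningful when the agent window lies inside the job window.
  const contained =
    agentSpan.startMs >= conclusionSpan.startMs &&
    agentSpan.endMs <= conclusionSpan.endMs;
  return {
    contained,
    // Multiply before dividing to avoid floating-point noise on round values.
    agentSharePct: (agentMs * 100) / jobMs,
    postAgentOverheadMs: conclusionSpan.endMs - agentSpan.endMs,
  };
}

// A 10 s job in which the agent ran for the first 8 s.
const example = summarizeAgentShare(
  { startMs: 1000, endMs: 11000 },
  { startMs: 1000, endMs: 9000 },
);
console.log(example);
```

With a parent/child hierarchy, tracing backends can derive the same two numbers directly from the waterfall instead of requiring a join across sibling spans.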
Current Behavior
Both spans are constructed with parentSpanId pointing to the setup span; the conclusion span ID is generated on the fly and never reused:

    // Current: actions/setup/js/send_otlp_span.cjs (lines 836-873)
    const agentPayload = buildOTLPPayload({
      traceId,
      spanId: generateSpanId(),
      ...(parentSpanId ? { parentSpanId } : {}), // parentSpanId = setup span ← problem
      spanName: jobName ? `gh-aw.${jobName}.agent` : "gh-aw.job.agent",
      startMs: agentStartMs,
      endMs: agentEndMs,
      // ...
    });

    const payload = buildOTLPPayload({ // conclusion span
      traceId,
      spanId: generateSpanId(), // ID is thrown away immediately
      ...(parentSpanId ? { parentSpanId } : {}), // parentSpanId = setup span ← same parent
      spanName,
      startMs,
      endMs: nowMs(),
      // ...
    });

Proposed Change
Pre-generate the conclusion span ID and thread it as the parentSpanId for the agent span:

    // Proposed: actions/setup/js/send_otlp_span.cjs (replace lines ~836-873)

    // Pre-generate conclusion span ID so the agent span can reference it as parent.
    // This creates the correct hierarchy: setup → conclusion → agent
    // instead of the current flat: setup → {agent, conclusion}.
    const conclusionSpanId = generateSpanId();

    if (typeof agentStartMs === "number" && agentStartMs > 0 && typeof agentEndMs === "number" && agentEndMs > agentStartMs) {
      const agentSpanEvents = buildSpanEvents(agentEndMs);
      const agentPayload = buildOTLPPayload({
        traceId,
        spanId: generateSpanId(),
        parentSpanId: conclusionSpanId, // ← changed: agent is nested under conclusion
        spanName: jobName ? `gh-aw.${jobName}.agent` : "gh-aw.job.agent",
        startMs: agentStartMs,
        endMs: agentEndMs,
        // ...
      });
      appendToOTLPJSONL(agentPayload);
      if (endpoint) {
        await sendOTLPSpan(endpoint, agentPayload, { skipJSONL: true });
      }
    }

    const payload = buildOTLPPayload({
      traceId,
      spanId: conclusionSpanId, // ← use pre-generated ID
      ...(parentSpanId ? { parentSpanId } : {}), // conclusion still child of setup
      spanName,
      startMs,
      endMs: nowMs(),
      // ...
    });

Expected Outcome
After this change:
- In Grafana Tempo / Honeycomb / Datadog: Trace waterfalls show agent nested inside conclusion. The "AI execution %" metric becomes a first-class ratio computable from the trace view without custom queries.
- In the JSONL mirror (/tmp/gh-aw/otel.jsonl): The agent span entry will have a parentSpanId matching the conclusion span's spanId. Engineers inspecting artifacts can immediately identify the call chain.
- For on-call engineers: A slow workflow now shows clearly whether the slowdown is in the AI call itself or in post-processing (safe-outputs, upload, etc.).
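The JSONL expectation above can be checked mechanically. The sketch below assumes, without confirmation from the source, that each line of the mirror is one OTLP/JSON payload in the standard resourceSpans → scopeSpans → spans layout, and that the span names end in ".agent" and ".conclusion". The helper names (spansFromJSONL, agentIsChildOfConclusion, checkMirror, wrap) are hypothetical.

```javascript
const fs = require("node:fs");

// Flatten all spans out of JSONL text.
// Assumption: one OTLP/JSON payload per line (resourceSpans -> scopeSpans -> spans).
function spansFromJSONL(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .flatMap((line) => JSON.parse(line).resourceSpans ?? [])
    .flatMap((rs) => rs.scopeSpans ?? [])
    .flatMap((ss) => ss.spans ?? []);
}

// True when every agent span has a conclusion span in the same trace as its parent.
// Assumption: span names end in ".agent" / ".conclusion".
function agentIsChildOfConclusion(spans) {
  const conclusions = spans.filter((s) => (s.name ?? "").endsWith(".conclusion"));
  const agents = spans.filter((s) => (s.name ?? "").endsWith(".agent"));
  return (
    agents.length > 0 &&
    agents.every((a) =>
      conclusions.some((c) => c.traceId === a.traceId && c.spanId === a.parentSpanId),
    )
  );
}

// Convenience wrapper for a file on disk, e.g. checkMirror("/tmp/gh-aw/otel.jsonl").
function checkMirror(path) {
  return agentIsChildOfConclusion(spansFromJSONL(fs.readFileSync(path, "utf8")));
}

// Hand-written samples standing in for a real mirror file.
const wrap = (span) =>
  JSON.stringify({ resourceSpans: [{ scopeSpans: [{ spans: [span] }] }] });

const nested = [
  wrap({ traceId: "t1", spanId: "a1", parentSpanId: "c1", name: "gh-aw.job.agent" }),
  wrap({ traceId: "t1", spanId: "c1", parentSpanId: "s1", name: "gh-aw.job.conclusion" }),
].join("\n");

const flat = [
  wrap({ traceId: "t1", spanId: "a1", parentSpanId: "s1", name: "gh-aw.job.agent" }),
  wrap({ traceId: "t1", spanId: "c1", parentSpanId: "s1", name: "gh-aw.job.conclusion" }),
].join("\n");

console.log(agentIsChildOfConclusion(spansFromJSONL(nested))); // true
console.log(agentIsChildOfConclusion(spansFromJSONL(flat))); // false
```

The agent span is written before the conclusion span in these samples, mirroring the emission order in the proposed change; the check does not depend on line order.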
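Implementation Steps
- In actions/setup/js/send_otlp_span.cjs around line 836, pre-generate conclusionSpanId before the if (typeof agentStartMs === "number" ...) block
- Pass parentSpanId: conclusionSpanId to the agentPayload buildOTLPPayload call (replacing the current ...(parentSpanId ? { parentSpanId } : {}))
- Pass spanId: conclusionSpanId to the payload (conclusion) buildOTLPPayload call
- Add a test in actions/setup/js/action_otlp.test.cjs that writes a fake agent_output.json file with a past mtime, calls sendJobConclusionSpan, captures the two spans emitted, and asserts that agentSpan.parentSpanId === conclusionSpan.spanId
- Run cd actions/setup/js && npx vitest run to confirm tests pass
- Run make fmt to ensure formatting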
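
A test for the new hierarchy needs an agent_output.json whose mtime lies in the past, since the agent span's end time is taken from that file's mtime. One way to prepare such a fixture with Node's standard library is sketched below. The helper name writeBackdatedAgentOutput, the temp-directory prefix, and the file contents are assumptions for illustration; how the code under test locates the file is not shown in this issue and would need to be wired up separately.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Create agent_output.json in a fresh temp directory and back-date its mtime.
// The file contents are a placeholder; the real schema is not specified here.
function writeBackdatedAgentOutput(ageMs) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "gh-aw-test-"));
  const file = path.join(dir, "agent_output.json");
  fs.writeFileSync(file, JSON.stringify({ items: [] }));
  const past = new Date(Date.now() - ageMs);
  fs.utimesSync(file, past, past); // arguments: path, atime, mtime
  return file;
}

// Fixture whose mtime is roughly one minute old.
const fixture = writeBackdatedAgentOutput(60000);
const observedAgeMs = Date.now() - fs.statSync(fixture).mtimeMs;
console.log(`fixture mtime is about ${Math.round(observedAgeMs / 1000)}s in the past`);
```

With the fixture in place, the test can call sendJobConclusionSpan, capture the two emitted spans, and assert that agentSpan.parentSpanId === conclusionSpan.spanId.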
Evidence from Live Sentry Data
The Sentry MCP server was unavailable during this analysis run (sentry --help reported 0 tools). The gap was identified via static code analysis of send_otlp_span.cjs lines 836-873.
The static evidence is unambiguous: both agentPayload and payload (conclusion) pass the same parentSpanId expression ...(parentSpanId ? { parentSpanId } : {}), meaning both resolve to the setup span as their parent. There is no code path that sets the conclusion span's spanId before the agent span is built.
Related Files
actions/setup/js/send_otlp_span.cjs: primary change site (lines 836-873)
actions/setup/js/action_otlp.test.cjs: add assertion for agent span parent hierarchy
actions/setup/js/action_conclusion_otlp.cjs: no changes required
Generated by the Daily OTel Instrumentation Advisor workflow