[otel-advisor] OTel improvement: make gh-aw.job.agent a child of the conclusion span, not a sibling #26941

@github-actions

OTel Instrumentation Improvement: Fix agent span parent hierarchy

Analysis Date: 2026-04-17
Priority: Medium
Effort: Small (< 1h)

Problem

The gh-aw.job.agent span, which measures pure AI execution latency, is emitted as a sibling of the gh-aw.job.conclusion span, not a child of it. Both spans share the same parentSpanId (the setup span's ID from GITHUB_AW_OTEL_PARENT_SPAN_ID).

This makes it impossible to answer the question "what fraction of total job time was spent on AI execution?" from a single trace waterfall view. Engineers must manually correlate two sibling spans by trace_id and compare timestamps, which is error-prone and not supported by standard dashboard widgets.

Why This Matters (DevOps Perspective)

With the current flat hierarchy, any trace waterfall (Grafana Tempo, Honeycomb, Datadog APM) renders:

setup span  [setup duration]
  ├── agent span  [AI latency]       ← sibling
  └── conclusion span  [total job]  ← sibling

This is semantically wrong: the conclusion span's time window (GITHUB_AW_OTEL_JOB_START_MS → conclusion step) contains the agent span's window (GITHUB_AW_OTEL_JOB_START_MS → agent_output.json mtime). They are not parallel operations.

After the fix, every trace would render as:

setup span  [setup duration]
  └── conclusion span  [total job execution]
        └── agent span  [AI latency]

This directly unblocks:

  • "AI latency as % of job time" → standard parent/child duration ratio, works in every backend
  • "Overhead after AI finishes" → gap between agent span.endTime and conclusion span.endTime, visible at a glance
  • Accurate p95/p99 latency attribution → dashboards can split AI latency from setup/teardown without custom aggregations
  • Faster failure triage → trace view immediately shows whether a slow job is due to AI or to post-processing
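The first two ratios above can be sketched in a few lines once the two spans are available. This is an illustrative sketch only: aiLatencyBreakdown and spanDurationMs are hypothetical helper names, and the span objects are assumed to carry OTLP/JSON-style startTimeUnixNano / endTimeUnixNano fields (nanoseconds encoded as decimal strings), which may not match the exact shape produced by buildOTLPPayload.

```javascript
// Hypothetical helpers, not part of send_otlp_span.cjs.
// Assumes OTLP/JSON-style timestamps: nanoseconds since epoch as decimal strings.
function spanDurationMs(span) {
  const startNs = BigInt(span.startTimeUnixNano);
  const endNs = BigInt(span.endTimeUnixNano);
  return Number(endNs - startNs) / 1e6;
}

function aiLatencyBreakdown(conclusionSpan, agentSpan) {
  const jobMs = spanDurationMs(conclusionSpan);
  const agentMs = spanDurationMs(agentSpan);
  // Gap between the end of the agent span and the end of the conclusion span.
  const overheadAfterAgentMs =
    Number(BigInt(conclusionSpan.endTimeUnixNano) - BigInt(agentSpan.endTimeUnixNano)) / 1e6;
  return {
    jobMs,
    agentMs,
    agentShare: jobMs > 0 ? agentMs / jobMs : 0, // "AI latency as % of job time"
    overheadAfterAgentMs, // "Overhead after AI finishes"
  };
}

// Example with made-up timestamps: a 120 s job in which the agent ran for 90 s.
const exampleConclusion = {
  startTimeUnixNano: "1700000000000000000",
  endTimeUnixNano: "1700000120000000000",
};
const exampleAgent = {
  startTimeUnixNano: "1700000000000000000",
  endTimeUnixNano: "1700000090000000000",
};
console.log(aiLatencyBreakdown(exampleConclusion, exampleAgent));
```

With a parent/child hierarchy, a trace backend can derive the same numbers directly from the waterfall; with sibling spans, something like this helper has to be run by hand after matching the two spans by trace_id.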

Current Behavior

Both spans are constructed with parentSpanId pointing to the setup span; the conclusion span ID is generated on the fly and never reused:

// Current: actions/setup/js/send_otlp_span.cjs (lines 836–873)
const agentPayload = buildOTLPPayload({
  traceId,
  spanId: generateSpanId(),
  ...(parentSpanId ? { parentSpanId } : {}),  // parentSpanId = setup span ← problem
  spanName: jobName ? `gh-aw.${jobName}.agent` : "gh-aw.job.agent",
  startMs: agentStartMs,
  endMs: agentEndMs,
  ...
});

const payload = buildOTLPPayload({          // conclusion span
  traceId,
  spanId: generateSpanId(),                 // ID is thrown away immediately
  ...(parentSpanId ? { parentSpanId } : {}), // parentSpanId = setup span ← same parent
  spanName,
  startMs,
  endMs: nowMs(),
  ...
});

Proposed Change

Pre-generate the conclusion span ID and thread it as the parentSpanId for the agent span:

// Proposed: actions/setup/js/send_otlp_span.cjs (replace lines ~836–873)

// Pre-generate conclusion span ID so the agent span can reference it as parent.
// This creates the correct hierarchy: setup → conclusion → agent
// instead of the current flat: setup → {agent, conclusion}.
const conclusionSpanId = generateSpanId();

if (typeof agentStartMs === "number" && agentStartMs > 0 && typeof agentEndMs === "number" && agentEndMs > agentStartMs) {
  const agentSpanEvents = buildSpanEvents(agentEndMs);
  const agentPayload = buildOTLPPayload({
    traceId,
    spanId: generateSpanId(),
    parentSpanId: conclusionSpanId,   // ← changed: agent is nested under conclusion
    spanName: jobName ? `gh-aw.${jobName}.agent` : "gh-aw.job.agent",
    startMs: agentStartMs,
    endMs: agentEndMs,
    ...
  });
  appendToOTLPJSONL(agentPayload);
  if (endpoint) {
    await sendOTLPSpan(endpoint, agentPayload, { skipJSONL: true });
  }
}

const payload = buildOTLPPayload({
  traceId,
  spanId: conclusionSpanId,           // ← use pre-generated ID
  ...(parentSpanId ? { parentSpanId } : {}),  // conclusion still child of setup
  spanName,
  startMs,
  endMs: nowMs(),
  ...
});
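Because the snippet above elides several fields, here is a minimal, self-contained model of the same idea that can be run in isolation. It is a sketch under stated assumptions, not the code in send_otlp_span.cjs: buildSpanPair is a hypothetical function, the spans are plain objects rather than OTLP payloads, and generateSpanId is a stand-in that produces 8 random bytes as 16 hex characters (the span ID size defined by OpenTelemetry).

```javascript
const { randomBytes } = require("node:crypto");

// Stand-in for the real generateSpanId (assumption): 8 random bytes, hex-encoded.
function generateSpanId() {
  return randomBytes(8).toString("hex");
}

// Hypothetical model of the proposed ordering. The agent span is built first,
// so the conclusion span ID has to exist before either span is constructed.
function buildSpanPair({ setupSpanId, jobStartMs, agentEndMs, conclusionEndMs }) {
  const conclusionSpanId = generateSpanId();
  const spans = [];

  // Same validity guard as the proposed change: only emit an agent span
  // when both timestamps are usable.
  if (typeof jobStartMs === "number" && jobStartMs > 0 &&
      typeof agentEndMs === "number" && agentEndMs > jobStartMs) {
    spans.push({
      name: "gh-aw.job.agent",
      spanId: generateSpanId(),
      parentSpanId: conclusionSpanId, // agent nested under conclusion
      startMs: jobStartMs,
      endMs: agentEndMs,
    });
  }

  spans.push({
    name: "gh-aw.job.conclusion",
    spanId: conclusionSpanId, // reuse the pre-generated ID
    ...(setupSpanId ? { parentSpanId: setupSpanId } : {}), // still a child of setup
    startMs: jobStartMs,
    endMs: conclusionEndMs,
  });

  return spans;
}
```

The point of the model is the ordering constraint: a child can be emitted before its parent as long as the parent's ID is fixed up front, since trace backends link spans by ID rather than by arrival order.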

Expected Outcome

After this change:

  • In Grafana Tempo / Honeycomb / Datadog: Trace waterfalls show agent nested inside conclusion. The "AI execution %" metric becomes a first-class ratio computable from the trace view without custom queries.
  • In the JSONL mirror (/tmp/gh-aw/otel.jsonl): The agent span entry will have a parentSpanId matching the conclusion span's spanId. Engineers inspecting artifacts can immediately identify the call chain.
  • For on-call engineers: A slow workflow now shows clearly whether the slowdown is in the AI call itself or in post-processing (safe-outputs, upload, etc.).
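The JSONL check described above could be scripted roughly as follows. This is a hedged sketch: collectSpans and agentIsChildOfConclusion are hypothetical helpers, each line of the mirror is assumed to be a standard OTLP/JSON export (resourceSpans → scopeSpans → spans), and span names are assumed to end in ".agent" and ".conclusion", as in gh-aw.job.agent and gh-aw.job.conclusion.

```javascript
// Hypothetical helpers for inspecting the JSONL mirror; not part of the repository.
// Assumes one OTLP/JSON payload per line.
function collectSpans(jsonlText) {
  const spans = [];
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue; // skip blank lines
    const payload = JSON.parse(line);
    for (const resourceSpan of payload.resourceSpans ?? []) {
      for (const scopeSpan of resourceSpan.scopeSpans ?? []) {
        spans.push(...(scopeSpan.spans ?? []));
      }
    }
  }
  return spans;
}

// True only when both spans exist and the agent span points at the conclusion span.
function agentIsChildOfConclusion(spans) {
  const hasSuffix = (span, suffix) =>
    typeof span.name === "string" && span.name.endsWith(suffix);
  const agent = spans.find((s) => hasSuffix(s, ".agent"));
  const conclusion = spans.find((s) => hasSuffix(s, ".conclusion"));
  return Boolean(agent && conclusion && agent.parentSpanId === conclusion.spanId);
}
```

To run it against a real artifact, the file contents would be passed in as text, for example via require("node:fs").readFileSync("/tmp/gh-aw/otel.jsonl", "utf8").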

Implementation Steps

  • In actions/setup/js/send_otlp_span.cjs around line 836, pre-generate conclusionSpanId before the if (typeof agentStartMs === "number" ...) block
  • Pass parentSpanId: conclusionSpanId to the agentPayload buildOTLPPayload call (replacing the current ...(parentSpanId ? { parentSpanId } : {}))
  • Pass spanId: conclusionSpanId to the payload (conclusion) buildOTLPPayload call
  • Add a test in actions/setup/js/action_otlp.test.cjs that writes a fake agent_output.json file with a past mtime, calls sendJobConclusionSpan, captures the two spans emitted, and asserts that agentSpan.parentSpanId === conclusionSpan.spanId
  • Run cd actions/setup/js && npx vitest run to confirm tests pass
  • Run make fmt to ensure formatting
  • Open a PR referencing this issue

Evidence from Live Sentry Data

The Sentry MCP server was unavailable during this analysis run (sentry --help reported 0 tools). The gap was identified via static code analysis of send_otlp_span.cjs lines 836–873.

The static evidence is unambiguous: both agentPayload and payload (conclusion) pass the same parentSpanId expression ...(parentSpanId ? { parentSpanId } : {}), meaning both resolve to the setup span as their parent. There is no code path that sets the conclusion span's spanId before the agent span is built.

Related Files

  • actions/setup/js/send_otlp_span.cjs: primary change site (lines 836–873)
  • actions/setup/js/action_otlp.test.cjs: add assertion for agent span parent hierarchy
  • actions/setup/js/action_conclusion_otlp.cjs: no changes required

Generated by the Daily OTel Instrumentation Advisor workflow

  • expires on Apr 24, 2026, 9:29 PM UTC
