[otel-advisor] OTel improvement: make gh-aw.job.agent a child of the conclusion span, not a sibling

### 📡 OTel Instrumentation Improvement: Fix agent span parent hierarchy

**Analysis Date**: 2026-04-17
**Priority**: Medium
**Effort**: Small (< 1h)

### Problem

The `gh-aw.job.agent` span — which measures pure AI execution latency — is emitted as a **sibling** of the `gh-aw.job.conclusion` span, not a child of it. Both spans share the same `parentSpanId` (the setup span's ID from `GITHUB_AW_OTEL_PARENT_SPAN_ID`).

This makes it impossible to answer the question **"what fraction of total job time was spent on AI execution?"** from a single trace waterfall view. Engineers must manually correlate two sibling spans by `trace_id` and compare timestamps — which is error-prone and not supported by standard dashboard widgets.

### Why This Matters (DevOps Perspective)

With the current flat hierarchy, any trace waterfall (Grafana Tempo, Honeycomb, Datadog APM) renders:

```
setup span  [setup duration]
  ├── agent span  [AI latency]       ← sibling
  └── conclusion span  [total job]  ← sibling
```

This is semantically wrong: the conclusion span's time window (`GITHUB_AW_OTEL_JOB_START_MS` → conclusion step) **contains** the agent span's window (`GITHUB_AW_OTEL_JOB_START_MS` → `agent_output.json` mtime). They are not parallel operations.

After the fix, every trace would render as:

```
setup span  [setup duration]
  └── conclusion span  [total job execution]
        └── agent span  [AI latency]
```

This directly unblocks:
- **"AI latency as % of job time"** → standard parent/child duration ratio, works in every backend
- **"Overhead after AI finishes"** → gap between `agent span.endTime` and `conclusion span.endTime`, visible at a glance
- **Accurate p95/p99 latency attribution** → dashboards can split AI latency from setup/teardown without custom aggregations
- **Faster failure triage** → trace view immediately shows whether a slow job is due to AI or to post-processing
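
As a rough illustration of the first two items, the sketch below derives both numbers from a parent/child span pair. The span shape (`spanId`, `parentSpanId`, `startMs`, `endMs`) and the `summarizeJob` helper are simplified assumptions for this example, not the OTLP wire format or anything that exists in the repository.

```javascript
// Sketch only: span objects use a simplified, assumed shape
// ({ spanId, parentSpanId, startMs, endMs }), not the OTLP wire format.
function summarizeJob(conclusionSpan, agentSpan) {
  // The ratio is only meaningful once the agent span is nested under conclusion.
  if (agentSpan.parentSpanId !== conclusionSpan.spanId) {
    throw new Error("agent span is not a child of the conclusion span");
  }
  const jobMs = conclusionSpan.endMs - conclusionSpan.startMs;
  const aiMs = agentSpan.endMs - agentSpan.startMs;
  return {
    // "AI latency as % of job time"
    aiFraction: jobMs > 0 ? aiMs / jobMs : 0,
    // "Overhead after AI finishes"
    overheadAfterAiMs: conclusionSpan.endMs - agentSpan.endMs,
  };
}

// Made-up timestamps: a 10 s job in which the agent ran for the first 8 s.
const conclusion = { spanId: "c1", parentSpanId: "s1", startMs: 1000, endMs: 11000 };
const agent = { spanId: "a1", parentSpanId: "c1", startMs: 1000, endMs: 9000 };

const summary = summarizeJob(conclusion, agent);
console.log(summary); // { aiFraction: 0.8, overheadAfterAiMs: 2000 }
```

With the current flat hierarchy the same arithmetic is possible, but only after the two sibling spans have been matched by `trace_id` by hand.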

### Current Behavior

Both spans are constructed with `parentSpanId` pointing to the setup span — the conclusion span ID is generated on-the-fly and never reused:

```javascript
// Current: actions/setup/js/send_otlp_span.cjs (lines 836–873)
const agentPayload = buildOTLPPayload({
  traceId,
  spanId: generateSpanId(),
  ...(parentSpanId ? { parentSpanId } : {}),  // parentSpanId = setup span ← problem
  spanName: jobName ? `gh-aw.${jobName}.agent` : "gh-aw.job.agent",
  startMs: agentStartMs,
  endMs: agentEndMs,
  ...
});

const payload = buildOTLPPayload({          // conclusion span
  traceId,
  spanId: generateSpanId(),                 // ID is thrown away immediately
  ...(parentSpanId ? { parentSpanId } : {}), // parentSpanId = setup span ← same parent
  spanName,
  startMs,
  endMs: nowMs(),
  ...
});
```

### Proposed Change

Pre-generate the conclusion span ID and thread it as the `parentSpanId` for the agent span:

```javascript
// Proposed: actions/setup/js/send_otlp_span.cjs (replace lines ~836–873)

// Pre-generate conclusion span ID so the agent span can reference it as parent.
// This creates the correct hierarchy: setup → conclusion → agent
// instead of the current flat: setup → {agent, conclusion}.
const conclusionSpanId = generateSpanId();

if (typeof agentStartMs === "number" && agentStartMs > 0 && typeof agentEndMs === "number" && agentEndMs > agentStartMs) {
  const agentSpanEvents = buildSpanEvents(agentEndMs);
  const agentPayload = buildOTLPPayload({
    traceId,
    spanId: generateSpanId(),
    parentSpanId: conclusionSpanId,   // ← changed: agent is nested under conclusion
    spanName: jobName ? `gh-aw.${jobName}.agent` : "gh-aw.job.agent",
    startMs: agentStartMs,
    endMs: agentEndMs,
    ...
  });
  appendToOTLPJSONL(agentPayload);
  if (endpoint) {
    await sendOTLPSpan(endpoint, agentPayload, { skipJSONL: true });
  }
}

const payload = buildOTLPPayload({
  traceId,
  spanId: conclusionSpanId,           // ← use pre-generated ID
  ...(parentSpanId ? { parentSpanId } : {}),  // conclusion still child of setup
  spanName,
  startMs,
  endMs: nowMs(),
  ...
});
```
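
The snippet above elides several fields, so here is a self-contained sketch of the same ordering that can be run on its own. `makeSpanId` and `buildJobSpans` are hypothetical stand-ins introduced for illustration (the real `generateSpanId` and `buildOTLPPayload` may behave differently), and the span objects are simplified rather than real OTLP payloads. The point it demonstrates is that the agent span can be built and emitted before its parent, because the link is carried by the ID alone.

```javascript
const { randomBytes } = require("node:crypto");

// Hypothetical stand-in for generateSpanId: OpenTelemetry span IDs are
// 8 bytes, written here as 16 lowercase hex characters.
function makeSpanId() {
  return randomBytes(8).toString("hex");
}

// Hypothetical helper: returns simplified spans in emission order
// (agent first when present, conclusion last).
function buildJobSpans({ traceId, setupSpanId, startMs, agentStartMs, agentEndMs, nowMs }) {
  // Reserve the conclusion span ID up front so the agent span can point at it.
  const conclusionSpanId = makeSpanId();
  const spans = [];

  const hasAgentWindow =
    typeof agentStartMs === "number" && agentStartMs > 0 &&
    typeof agentEndMs === "number" && agentEndMs > agentStartMs;

  if (hasAgentWindow) {
    spans.push({
      traceId,
      spanId: makeSpanId(),
      parentSpanId: conclusionSpanId, // agent nested under conclusion
      name: "gh-aw.job.agent",
      startMs: agentStartMs,
      endMs: agentEndMs,
    });
  }

  spans.push({
    traceId,
    spanId: conclusionSpanId, // reuse the reserved ID
    ...(setupSpanId ? { parentSpanId: setupSpanId } : {}), // still a child of setup
    name: "gh-aw.job.conclusion",
    startMs,
    endMs: nowMs,
  });

  return spans;
}

const [agentSpan, conclusionSpan] = buildJobSpans({
  traceId: "0123456789abcdef0123456789abcdef",
  setupSpanId: "00f067aa0ba902b7",
  startMs: 1000,
  agentStartMs: 1000,
  agentEndMs: 9000,
  nowMs: 11000,
});
console.log(agentSpan.parentSpanId === conclusionSpan.spanId); // true
```

When no valid agent window exists, the helper returns only the conclusion span, which matches the guard in the proposed change.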

### Expected Outcome

After this change:

- **In Grafana Tempo / Honeycomb / Datadog**: Trace waterfalls show `agent` nested inside `conclusion`. The "AI execution %" metric becomes a first-class ratio computable from the trace view without custom queries.
- **In the JSONL mirror** (`/tmp/gh-aw/otel.jsonl`): The agent span entry will have a `parentSpanId` matching the conclusion span's `spanId`. Engineers inspecting artifacts can immediately identify the call chain.
- **For on-call engineers**: A slow workflow now shows clearly whether the slowdown is in the AI call itself or in post-processing (safe-outputs, upload, etc.).
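
The JSONL check can also be scripted. The sketch below assumes each line of the mirror is one OTLP/JSON payload (`resourceSpans` → `scopeSpans` → `spans`) and that conclusion span names end in `.conclusion`; both are assumptions, since the exact output of `buildOTLPPayload` is not shown in this issue. `collectSpans`, `findMisparentedAgentSpans`, and `wrap` are hypothetical helpers.

```javascript
// Sketch only: assumes one OTLP/JSON payload per line and span names ending
// in ".agent" / ".conclusion". The real JSONL mirror may differ.
function collectSpans(jsonlText) {
  return jsonlText
    .split("\n")
    .filter((line) => line.trim() !== "")
    .flatMap((line) => JSON.parse(line).resourceSpans ?? [])
    .flatMap((resourceSpan) => resourceSpan.scopeSpans ?? [])
    .flatMap((scopeSpan) => scopeSpan.spans ?? []);
}

// Returns every agent span whose parent is not a conclusion span in the same trace.
function findMisparentedAgentSpans(jsonlText) {
  const spans = collectSpans(jsonlText);
  const conclusionKeys = new Set(
    spans
      .filter((s) => (s.name ?? "").endsWith(".conclusion"))
      .map((s) => `${s.traceId}:${s.spanId}`)
  );
  return spans.filter(
    (s) =>
      (s.name ?? "").endsWith(".agent") &&
      !conclusionKeys.has(`${s.traceId}:${s.parentSpanId}`)
  );
}

// Wrap one span in a minimal OTLP/JSON envelope (assumed shape, made-up IDs).
const wrap = (span) =>
  JSON.stringify({ resourceSpans: [{ scopeSpans: [{ spans: [span] }] }] });

// After the fix: the agent's parent is the conclusion span.
const nested = [
  wrap({ traceId: "t1", spanId: "a1", parentSpanId: "c1", name: "gh-aw.job.agent" }),
  wrap({ traceId: "t1", spanId: "c1", parentSpanId: "s1", name: "gh-aw.job.conclusion" }),
].join("\n");

// Current behavior: both spans point at the setup span.
const flat = [
  wrap({ traceId: "t2", spanId: "a2", parentSpanId: "s2", name: "gh-aw.job.agent" }),
  wrap({ traceId: "t2", spanId: "c2", parentSpanId: "s2", name: "gh-aw.job.conclusion" }),
].join("\n");

console.log(findMisparentedAgentSpans(nested).length); // 0
console.log(findMisparentedAgentSpans(flat).length); // 1
```

To run it against a real artifact, pass the contents of `/tmp/gh-aw/otel.jsonl` (for example via `fs.readFileSync(path, "utf8")`) in place of the sample strings.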

<details>
<summary><b>Implementation Steps</b></summary>

- [ ] In `actions/setup/js/send_otlp_span.cjs` around line 836, pre-generate `conclusionSpanId` before the `if (typeof agentStartMs === "number" ...)` block
- [ ] Pass `parentSpanId: conclusionSpanId` to the `agentPayload` `buildOTLPPayload` call (replacing the current `...(parentSpanId ? { parentSpanId } : {})`)
- [ ] Pass `spanId: conclusionSpanId` to the `payload` (conclusion) `buildOTLPPayload` call
- [ ] Add a test in `actions/setup/js/action_otlp.test.cjs` that writes a fake `agent_output.json` file with a past mtime, calls `sendJobConclusionSpan`, captures the two spans emitted, and asserts that `agentSpan.parentSpanId === conclusionSpan.spanId`
- [ ] Run `cd actions/setup/js && npx vitest run` to confirm tests pass
- [ ] Run `make fmt` to ensure formatting
- [ ] Open a PR referencing this issue

</details>

### Evidence from Live Sentry Data

No live Sentry data was available for this run: the Sentry MCP server was unreachable (`sentry --help` reported 0 tools). The gap was instead identified through static analysis of `send_otlp_span.cjs`, lines 836–873.

The static evidence is unambiguous: both `agentPayload` and `payload` (conclusion) pass the same `parentSpanId` expression `...(parentSpanId ? { parentSpanId } : {})`, meaning both resolve to the setup span as their parent. There is no code path that sets the conclusion span's `spanId` before the agent span is built.

### Related Files

- `actions/setup/js/send_otlp_span.cjs` — primary change site (lines 836–873)
- `actions/setup/js/action_otlp.test.cjs` — add assertion for agent span parent hierarchy
- `actions/setup/js/action_conclusion_otlp.cjs` — no changes required

---

*Generated by the [Daily OTel Instrumentation Advisor](https://github.com/github/gh-aw/actions/runs/24587284838) workflow*

> Generated by [Daily OTel Instrumentation Advisor](https://github.com/github/gh-aw/actions/runs/24587284838/agentic_workflow) · ● 185.7K · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fdaily-otel-instrumentation-advisor%22&type=issues)
> - [x] expires on Apr 24, 2026, 9:29 PM UTC
