feat(otel): honor capture_content + redact on span attributes (closes #130)#154

Open
initializ-mk wants to merge 2 commits into main from feat/issue-130-otel-content-capture

@initializ-mk

Contributor

Closes #130.

Summary

Phase 3 of the OTel Tracing v1 initiative (#108) shipped span instrumentation, but it was metadata-only. The `capture_content` + `redact` knobs in `forge.yaml` were plumbed by Phase 2 but never consumed: an operator who set `capture_content: true` got metadata-only spans and no error.

This PR closes the gap. When `observability.tracing.capture_content: true` is set, the `llm.completion` and `tool.<name>` spans stamp the prompt / completion / tool I/O as attributes, passed through a redact-then-truncate pipeline that mirrors what the audit payload-capture path will use.

Attribute keys added

| Knob | Span | Keys added |
| --- | --- | --- |
| `capture_content: true` | `llm.completion` | `gen_ai.prompt` (JSON-serialized inbound messages), `gen_ai.completion` (response text); both from the OTel GenAI semconv |
| `capture_content: true` | `tool.<name>` | `forge.tool.args`, `forge.tool.result` |

Default posture (no opt-in) is preserved: the keys are absent from spans, not set to an empty string. Backends that look at "is the key present?" can distinguish "metadata-only by default" from "operator opted in but the field happened to be empty." Empty completion / args / result values skip stamping for the same reason.
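
The absent-key contract above could be sketched as a small guard. This is an illustration only: `spanAttrs` and `stampContent` are hypothetical names, and a plain map stands in for the real span type, which this description does not show.

```go
package main

import "fmt"

// spanAttrs is a hypothetical stand-in for a span's attribute set.
type spanAttrs map[string]string

// stampContent sets key only when capture is on and content is non-empty,
// so the key stays absent (rather than empty) in every other case.
func stampContent(attrs spanAttrs, capture bool, key, content string) {
	if !capture || content == "" {
		return
	}
	attrs[key] = content
}

func main() {
	attrs := spanAttrs{}
	stampContent(attrs, false, "gen_ai.prompt", "hello") // capture off: skipped
	stampContent(attrs, true, "gen_ai.completion", "")   // empty content: skipped
	stampContent(attrs, true, "forge.tool.args", `{"q":"x"}`)

	_, hasPrompt := attrs["gen_ai.prompt"]
	fmt.Println(hasPrompt, len(attrs)) // prints: false 1
}
```

A backend can then test for key presence instead of comparing against an empty string.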

Redact + truncate pipeline

`PrepareSpanContent(s, redact, maxBytes)`:

  • When `redact=true` (the default with capture): scrubs the same vendor secret-token shapes the runtime guardrails CustomRule defaults cover — Anthropic `sk-ant-…`, OpenAI `sk-…`, GitHub `ghp_/gho_/ghs_/github_pat_…`, AWS `AKIA…`, Slack `xoxb-/xoxp-…`, RSA/EC/OPENSSH/PRIVATE key blocks, Telegram bot tokens. Matched values become `[REDACTED]`.
  • `redact=false` is the enterprise raw-capture path — content stamped verbatim, byte cap still applied.
  • Byte-capped at 4 KiB (below the 5 KiB soft attribute-length limit most observability backends apply). The truncation marker (`…[truncated:N]` where N = original byte length) is byte-identical to what `AuditPayloadCapture.TruncateForAudit` emits for the same input. Operators grepping `[truncated:` or `[REDACTED]` across span attributes and audit rows see aligned output.

Ordering matters: redact runs before truncate so a secret straddling the cap boundary can never survive in the truncated tail. Pinned by `TestPrepareSpanContent_RedactThenTruncate`.

Files

  • `forge-core/runtime/content_redact.go` — `RedactSecrets`, `PrepareSpanContent`, `serializeChatMessages`, plus the pattern set
  • `forge-core/runtime/content_redact_test.go` — 11 unit tests
  • `forge-core/runtime/loop_spans_content_test.go` — 8 integration tests covering all four content sites + cross-pipeline marker parity
  • `forge-core/runtime/loop.go` — `LLMExecutorConfig.TracingConfig` + `LLMExecutor.tracingCfg` + conditional attribute stamping on the two span sites
  • `forge-core/observability/attrs.go` — 4 new attribute constants
  • `forge-cli/runtime/runner.go` — passes the already-resolved `tracingCfg` into the executor config
  • `docs/core-concepts/observability-tracing.md` — § Phase 3 is metadata-only → § Span content capture with attribute table
  • `docs/security/audit-logging.md` — § Trace cross-link gains a § Content-capture parity subsection
  • `.claude/skills/forge.md` — § 12.9 caveat replaced; yaml example comment updated

Test plan

  • gofmt clean; golangci-lint 0 issues
  • full `go test ./...` from `forge-core` and `forge-cli` — all green
  • 11 unit tests + 8 integration tests in this PR pass
  • redact-then-truncate ordering pinned by a regression test
  • cross-pipeline marker parity pinned by a regression test that calls both `PrepareSpanContent` and `TruncateForAudit` with the same input and asserts byte equality
  • absent-attribute contract checked: `CaptureContent=false` + a prompt that would otherwise stamp content → no `gen_ai.prompt` / `gen_ai.completion` / `forge.tool.args` / `forge.tool.result` keys

Out of scope

  • New attribute keys beyond the four above (`gen_ai.system_instructions`, per-field tool-args decomposition) — pick up in a follow-up if operators ask.
  • Sampling-aware capture ("only capture content on dropped traces") — the metadata-only default already handles the storage-cost concern.
  • Audit-side payload capture is unchanged. The redactor lives in forge-core and is currently only called from the OTel path; the audit path can call `PrepareSpanContent` / `RedactSecrets` in a future change if/when capture moves to that pipeline too.

…130)

Phase 3 of the OTel Tracing v1 initiative (#108, PR #125) shipped
span instrumentation across the executor loop and tool calls but
kept it metadata-only — span attributes carried provider, model,
usage tokens, finish reasons, but no prompt / completion / tool I/O
text. Phase 2 (#103, PR #124) plumbed two operator-facing knobs
(`capture_content`, `redact`) through the config schema. The runtime
never read them. An operator who set `capture_content: true` got
metadata-only spans and no error — the worst kind of config: load-
bearing-looking, silently inert.

This commit closes that gap.

What lands

1. forge-core/runtime/content_redact.go — new package-internal
   helpers:
   - RedactSecrets scrubs known vendor secret-token shapes (Anthropic
     sk-ant, OpenAI sk-, GitHub ghp_/gho_/ghs_/github_pat_, AWS AKIA,
     Slack xoxb/xoxp, RSA/EC/OPENSSH/PRIVATE key blocks, Telegram bot
     tokens). Patterns mirror the runtime guardrails CustomRule
     defaults in forge-cli/runtime/guardrails_loader.go's
     DefaultStructuredGuardrails — the two should evolve together.
   - PrepareSpanContent runs the redact-then-truncate pipeline for
     content destined for OTel span attributes. Cap defaults to 4 KiB
     (below the 5 KiB soft attribute-length limit most backends
     apply). Reuses the audit pipeline's TruncateForAudit so the
     `…[truncated:N]` marker is byte-identical to what
     AuditPayloadCapture emits for the same input.

2. forge-core/observability/attrs.go — four new attribute constants:
   - AttrGenAIPrompt = "gen_ai.prompt"           // OTel GenAI semconv
   - AttrGenAICompletion = "gen_ai.completion"   // OTel GenAI semconv
   - AttrForgeToolArgs = "forge.tool.args"
   - AttrForgeToolResult = "forge.tool.result"
   Stripped the "Phase 3 metadata-only" callout from the
   forge.tool.* group.

3. forge-core/runtime/loop.go — adds:
   - LLMExecutorConfig.TracingConfig (consumed by Phase 3 sites)
   - LLMExecutor.tracingCfg field
   - Conditional attribute stamping on the llm.completion span
     (`gen_ai.prompt` before Chat(), `gen_ai.completion` after success)
     and the tool.<name> span (`forge.tool.args` before Execute(),
     `forge.tool.result` after).

4. forge-cli/runtime/runner.go — populates LLMExecutorConfig.
   TracingConfig from the already-resolved tracingCfg the cli also
   passes to NewTracerProvider. Zero plumbing additions; just wires
   the existing field through.

Cross-pipeline parity

The four content attributes pass through the same redact-then-
truncate helper as the (existing) audit payload-capture path. An
operator who sees a `[REDACTED]` marker in an audit row sees the
same marker on the linked span; the same goes for `…[truncated:N]`.
Vendor pattern parity with the guardrails defaults is enforced by
convention (and called out in the doc updates).

Default posture preserved

CaptureContent=false (the zero-value default) means the four content
attributes are absent from spans — not set to empty string. Backends
that gate dashboards on "is this key present?" can distinguish
"metadata-only by default" from "operator opted in but the field
happened to be empty." Empty content (e.g. tool-call-only assistant
turn → no completion text) likewise skips stamping.

Tests

- 11 unit tests in content_redact_test.go cover RedactSecrets per
  vendor pattern, PrepareSpanContent ordering invariant (redact
  before truncate so a secret straddling the cap boundary can't
  survive), and the cross-pipeline truncation-marker parity.
- 8 integration tests in loop_spans_content_test.go cover:
  - capture-true + redact-true: span carries redacted prompt
  - capture-true + redact-false: span carries raw prompt
  - capture-false: no prompt/completion/args/result attributes
  - large prompt: truncated with the same marker as audit
  - completion stamping on success
  - empty completion: attribute skipped
  - tool args + result on tool.<name> span (redacted)
  - tool args + result not present when capture-false

Docs

- docs/core-concepts/observability-tracing.md § Phase 3 is metadata-
  only → § Span content capture. New table mapping config knob to
  attribute keys per span. Notes the byte cap, the
  marker-parity-with-audit invariant, and the redact pattern set.
  Updated config example + field table.
- docs/security/audit-logging.md § Trace cross-link gains a
  § Content-capture parity subsection explaining the redact + cap
  parity invariant and the divergent caps (16 KiB audit, 4 KiB span).
- .claude/skills/forge.md § 12.9 — replaces the "Phase 3 ships
  metadata-only" caveat with a paragraph documenting the new
  capture surface. Updates the example forge.yaml comment.

Verification

- gofmt clean; golangci-lint 0 issues
- full forge-core + forge-cli test suites green
- the 19 new tests in this PR all pass
…essages)

The OTel GenAI semantic conventions moved the prompt + completion
attributes from flat-string (gen_ai.prompt, gen_ai.completion) to
structured (gen_ai.input.messages, gen_ai.output.messages) — arrays
of role+content message objects. For a feature landing in v0.15.0
we should ship the current keys, not the deprecated ones.

Changes

1. attrs.go — AttrGenAIPrompt → AttrGenAIInputMessages
   (value: gen_ai.input.messages); AttrGenAICompletion →
   AttrGenAIOutputMessages (value: gen_ai.output.messages). Doc
   comments call out the supersedence.

2. loop.go — completion attribute now stamps a single-element
   [{role,content}] array (via the existing serializeChatMessages
   helper) instead of the raw response string, matching the
   structured-shape contract the new key implies. The prompt path
   already emitted a message array — only the key name changed.

3. Tests — TestExecute_CaptureContentTrue_StampsCompletionOnLLMSpan
   now asserts the value is JSON-parseable as
   []llm.ChatMessage{{Role: assistant, Content: …}} instead of the
   bare response string. Other tests still pass unchanged because
   their assertions look for substring presence (the secret in
   redact tests, the truncation marker, etc.) and the JSON wrapper
   doesn't affect those.

4. Docs — observability-tracing.md attribute table updated with the
   new keys and a note about backends that only recognize the
   deprecated flat-string attributes (operators should upgrade the
   backend's semconv mapping or use a span processor to translate).
   .claude/skills/forge.md § 12.9 updated with the same note.

Verification

- gofmt + golangci-lint clean
- forge-core/runtime + forge-core/observability test suites pass
- the 8 integration tests still cover the same four logical sites
  (LLM prompt, LLM completion, tool args, tool result) under the
  new key names

Development

Successfully merging this pull request may close these issues.

OTel: honor capture_content + redact on span attributes (reuse FWS-8 audit redactor)