feat(otel): honor capture_content + redact on span attributes (closes #130) by initializ-mk · Pull Request #154 · initializ/forge

initializ-mk · 2026-06-11T18:41:22Z

Closes #130.

Summary

Phase 3 of the OTel Tracing v1 initiative (#108) shipped span instrumentation but metadata-only. The `capture_content` + `redact` knobs in `forge.yaml` were plumbed by Phase 2 but never consumed — an operator who set `capture_content: true` got metadata-only spans and no error.

This PR closes the gap. When `observability.tracing.capture_content: true` is set, the `llm.completion` and `tool.<name>` spans stamp the prompt / completion / tool I/O as attributes, passed through a redact-then-truncate pipeline that mirrors what the audit payload-capture path will use.

Attribute keys added

| Knob | Span | Keys added |
| --- | --- | --- |
| `capture_content: true` | `llm.completion` | `gen_ai.prompt` (JSON-serialized inbound messages), `gen_ai.completion` (response text); both follow the OTel GenAI semconv |
| `capture_content: true` | `tool.<name>` | `forge.tool.args`, `forge.tool.result` |

Default posture (no opt-in) preserved: the keys are absent from spans — not empty string. Backends that look at "is the key present?" can distinguish "metadata-only by default" from "operator opted in but the field happened to be empty." Empty completion / args / result skips stamping for the same reason.
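The absent-versus-empty contract described above can be illustrated with a minimal sketch. It uses a plain map in place of real span attributes, and `tracingConfig` and `stampContent` are hypothetical names for illustration, not the helpers in `loop.go`.

```go
package main

import "fmt"

// tracingConfig is a hypothetical stand-in for the resolved tracing settings.
type tracingConfig struct {
	CaptureContent bool
}

// stampContent sets key on attrs only when capture is enabled and the content
// is non-empty. In every other case the key stays absent, never "".
func stampContent(attrs map[string]string, cfg tracingConfig, key, content string) {
	if !cfg.CaptureContent || content == "" {
		return
	}
	attrs[key] = content
}

func main() {
	cfg := tracingConfig{CaptureContent: true}
	attrs := map[string]string{}

	stampContent(attrs, cfg, "forge.tool.args", `{"q":"weather"}`)
	stampContent(attrs, cfg, "forge.tool.result", "") // empty result: skipped

	_, hasArgs := attrs["forge.tool.args"]
	_, hasResult := attrs["forge.tool.result"]
	fmt.Println(hasArgs, hasResult) // prints "true false"
}
```

A backend that checks for key presence sees `forge.tool.args` but no `forge.tool.result` entry at all, which is the behavior the paragraph above describes.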

Redact + truncate pipeline

`PrepareSpanContent(s, redact, maxBytes)`:

When `redact=true` (the default with capture): scrubs the same vendor secret-token shapes the runtime guardrails CustomRule defaults cover — Anthropic `sk-ant-…`, OpenAI `sk-…`, GitHub `ghp_/gho_/ghs_/github_pat_…`, AWS `AKIA…`, Slack `xoxb-/xoxp-…`, RSA/EC/OPENSSH/PRIVATE key blocks, Telegram bot tokens. Matched values become `[REDACTED]`.
`redact=false` is the enterprise raw-capture path — content stamped verbatim, byte cap still applied.
Byte-capped at 4 KiB (below the 5 KiB soft attribute-length limit most observability backends apply). The truncation marker (`…[truncated:N]` where N = original byte length) is byte-identical to what `AuditPayloadCapture.TruncateForAudit` emits for the same input. Operators grepping `[truncated:` or `[REDACTED]` across span attributes and audit rows see aligned output.

Ordering matters: redact runs before truncate so a secret straddling the cap boundary can never survive in the truncated tail. Pinned by `TestPrepareSpanContent_RedactThenTruncate`.

Files

`forge-core/runtime/content_redact.go` — `RedactSecrets`, `PrepareSpanContent`, `serializeChatMessages`, plus the pattern set
`forge-core/runtime/content_redact_test.go` — 11 unit tests
`forge-core/runtime/loop_spans_content_test.go` — 8 integration tests covering all four content sites + cross-pipeline marker parity
`forge-core/runtime/loop.go` — `LLMExecutorConfig.TracingConfig` + `LLMExecutor.tracingCfg` + conditional attribute stamping on the two span sites
`forge-core/observability/attrs.go` — 4 new attribute constants
`forge-cli/runtime/runner.go` — passes the already-resolved `tracingCfg` into the executor config
`docs/core-concepts/observability-tracing.md` — § Phase 3 is metadata-only → § Span content capture with attribute table
`docs/security/audit-logging.md` — § Trace cross-link gains a § Content-capture parity subsection
`.claude/skills/forge.md` — § 12.9 caveat replaced; yaml example comment updated

Test plan

gofmt clean; golangci-lint 0 issues
full `go test ./...` from `forge-core` and `forge-cli` — all green
11 unit tests + 8 integration tests in this PR pass
redact-then-truncate ordering pinned by a regression test
cross-pipeline marker parity pinned by a regression test that calls both `PrepareSpanContent` and `TruncateForAudit` with the same input and asserts byte equality
absent-attribute contract checked: `CaptureContent=false` + a prompt that would otherwise stamp content → no `gen_ai.prompt` / `gen_ai.completion` / `forge.tool.args` / `forge.tool.result` keys

Out of scope

New attribute keys beyond the four above (`gen_ai.system_instructions`, per-field tool-args decomposition) — pick up in a follow-up if operators ask.
Sampling-aware capture ("only capture content on dropped traces") — the metadata-only default already handles the storage-cost concern.
Audit-side payload capture is unchanged. The redactor lives in forge-core and is currently only called from the OTel path; the audit path can call `PrepareSpanContent` / `RedactSecrets` in a future change if/when capture moves to that pipeline too.

…130)

Phase 3 of the OTel Tracing v1 initiative (#108, PR #125) shipped span instrumentation across the executor loop and tool calls but kept it metadata-only: span attributes carried provider, model, usage tokens, finish reasons, but no prompt / completion / tool I/O text.

Phase 2 (#103, PR #124) plumbed two operator-facing knobs (`capture_content`, `redact`) through the config schema. The runtime never read them. An operator who set `capture_content: true` got metadata-only spans and no error, the worst kind of config: load-bearing-looking, silently inert.

This commit closes that gap.

What lands

1. forge-core/runtime/content_redact.go: new package-internal helpers:
   - RedactSecrets scrubs known vendor secret-token shapes (Anthropic sk-ant, OpenAI sk-, GitHub ghp_/gho_/ghs_/github_pat_, AWS AKIA, Slack xoxb/xoxp, RSA/EC/OPENSSH/PRIVATE key blocks, Telegram bot tokens). Patterns mirror the runtime guardrails CustomRule defaults in forge-cli/runtime/guardrails_loader.go's DefaultStructuredGuardrails; the two should evolve together.
   - PrepareSpanContent runs the redact-then-truncate pipeline for content destined for OTel span attributes. Cap defaults to 4 KiB (below the 5 KiB soft attribute-length limit most backends apply). Reuses the audit pipeline's TruncateForAudit so the `…[truncated:N]` marker is byte-identical to what AuditPayloadCapture emits for the same input.

2. forge-core/observability/attrs.go: four new attribute constants:
   - AttrGenAIPrompt = "gen_ai.prompt" // OTel GenAI semconv
   - AttrGenAICompletion = "gen_ai.completion" // OTel GenAI semconv
   - AttrForgeToolArgs = "forge.tool.args"
   - AttrForgeToolResult = "forge.tool.result"

   Stripped the "Phase 3 metadata-only" callout from the forge.tool.* group.

3. forge-core/runtime/loop.go adds:
   - LLMExecutorConfig.TracingConfig (consumed by Phase 3 sites)
   - LLMExecutor.tracingCfg field
   - Conditional attribute stamping on the llm.completion span (`gen_ai.prompt` before Chat(), `gen_ai.completion` after success) and the tool.<name> span (`forge.tool.args` before Execute(), `forge.tool.result` after).

4. forge-cli/runtime/runner.go populates LLMExecutorConfig.TracingConfig from the already-resolved tracingCfg the cli also passes to NewTracerProvider. Zero plumbing additions; just wires the existing field through.

Cross-pipeline parity

The four content attributes pass through the same redact-then-truncate helper as the (existing) audit payload-capture path. An operator who sees a `[REDACTED]` marker in an audit row sees the same marker on the linked span; the same goes for `…[truncated:N]`. Vendor pattern parity with the guardrails defaults is enforced by convention (and called out in the doc updates).

Default posture preserved

CaptureContent=false (the zero-value default) means the four content attributes are absent from spans, not set to empty string. Backends that gate dashboards on "is this key present?" can distinguish "metadata-only by default" from "operator opted in but the field happened to be empty." Empty content (e.g. tool-call-only assistant turn → no completion text) likewise skips stamping.

Tests

- 11 unit tests in content_redact_test.go cover RedactSecrets per vendor pattern, PrepareSpanContent ordering invariant (redact before truncate so a secret straddling the cap boundary can't survive), and the cross-pipeline truncation-marker parity.
- 8 integration tests in loop_spans_content_test.go cover:
  - capture-true + redact-true: span carries redacted prompt
  - capture-true + redact-false: span carries raw prompt
  - capture-false: no prompt/completion/args/result attributes
  - large prompt: truncated with the same marker as audit
  - completion stamping on success
  - empty completion: attribute skipped
  - tool args + result on tool.<name> span (redacted)
  - tool args + result not present when capture-false

Docs

- docs/core-concepts/observability-tracing.md § Phase 3 is metadata-only → § Span content capture. New table mapping config knob to attribute keys per span. Notes the byte cap, the marker-parity-with-audit invariant, and the redact pattern set. Updated config example + field table.
- docs/security/audit-logging.md § Trace cross-link gains a § Content-capture parity subsection explaining the redact + cap parity invariant and the divergent caps (16 KiB audit, 4 KiB span).
- .claude/skills/forge.md § 12.9 replaces the "Phase 3 ships metadata-only" caveat with a paragraph documenting the new capture surface. Updates the example forge.yaml comment.

Verification

- gofmt clean; golangci-lint 0 issues
- full forge-core + forge-cli test suites green
- the 19 new tests in this PR all pass

…essages)

The OTel GenAI semantic conventions moved the prompt + completion attributes from flat-string (gen_ai.prompt, gen_ai.completion) to structured (gen_ai.input.messages, gen_ai.output.messages): arrays of role+content message objects. For a feature landing in v0.15.0 we should ship the current keys, not the deprecated ones.

Changes

1. attrs.go: AttrGenAIPrompt → AttrGenAIInputMessages (value: gen_ai.input.messages); AttrGenAICompletion → AttrGenAIOutputMessages (value: gen_ai.output.messages). Doc comments call out the supersedence.

2. loop.go: completion attribute now stamps a single-element [{role,content}] array (via the existing serializeChatMessages helper) instead of the raw response string, matching the structured-shape contract the new key implies. The prompt path already emitted a message array; only the key name changed.

3. Tests: TestExecute_CaptureContentTrue_StampsCompletionOnLLMSpan now asserts the value is JSON-parseable as []llm.ChatMessage{{Role: assistant, Content: …}} instead of the bare response string. Other tests still pass unchanged because their assertions look for substring presence (the secret in redact tests, the truncation marker, etc.) and the JSON wrapper doesn't affect those.

4. Docs: observability-tracing.md attribute table updated with the new keys and a note about backends that only recognize the deprecated flat-string attributes (operators should upgrade the backend's semconv mapping or use a span processor to translate). .claude/skills/forge.md § 12.9 updated with the same note.

Verification

- gofmt + golangci-lint clean
- forge-core/runtime + forge-core/observability test suites pass
- the 8 integration tests still cover the same four logical sites (LLM prompt, LLM completion, tool args, tool result) under the new key names

initializ-mk added 2 commits June 11, 2026 14:41

initializ-mk wants to merge 2 commits into main from feat/issue-130-otel-content-capture
