Guardrail span parity: emit OTel spans + attributes alongside the audit events #161

@initializ-mk

Context

PR #160 (closes #159) wires all five library gates and unifies the
`guardrail_check` audit event on the `gate` vocabulary. The audit
stream now carries everything an SIEM consumer needs to filter by
gate / decision / violation.

The OTel trace tree, however, sees nothing. Verified:

So a trace from a request that fires a guardrail mask shows:

```
a2a.tasks/send
└─ agent.execute
   ├─ llm.completion
   │  └─ (no guardrail span: InputGate fired BEFORE, invisible)
   └─ tool.
      └─ (no guardrail span: ToolCallGate fired BEFORE, invisible)
```

Operators looking at a trace can't see "PII was masked here" without
pivoting to the audit stream and joining on `correlation_id`. The
guardrail decision is invisible to anyone who only has access to the
trace backend (Honeycomb / Datadog / Tempo / Grafana).

Proposal

Symmetric to the audit work — every gate call opens a child span and
stamps the same fields the audit event carries.

Span names

| Span | When |
| --- | --- |
| `guardrail.input` | `CheckInbound` |
| `guardrail.context` | `CheckContext` (one span per system message scanned) |
| `guardrail.tool_call` | `CheckToolCall` |
| `guardrail.output` | `CheckOutbound` and `CheckToolOutput`, distinguished by presence of `forge.tool.name` |
| `guardrail.stream` | `CheckStream` (when wired) |
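
The naming rule in the table can be sketched in Go as follows. This is a minimal illustration, not the proposed implementation: `knownGates`, `spanNameForGate`, and `isToolOutputSpan` are hypothetical names, and attributes are modelled as a plain map rather than OTel attribute values.

```go
package main

// knownGates lists the gate values named in the proposal.
var knownGates = map[string]bool{
	"input": true, "context": true, "tool_call": true, "output": true, "stream": true,
}

// spanNameForGate returns "guardrail.<gate>" for a known gate value and
// reports false for anything else.
func spanNameForGate(gate string) (string, bool) {
	if !knownGates[gate] {
		return "", false
	}
	return "guardrail." + gate, true
}

// isToolOutputSpan reports whether a guardrail.output span came from
// CheckToolOutput rather than CheckOutbound, judged only by the presence
// of the forge.tool.name attribute.
func isToolOutputSpan(name string, attrs map[string]string) bool {
	_, hasTool := attrs["forge.tool.name"]
	return name == "guardrail.output" && hasTool
}
```

Deriving the span name from the gate value keeps the span name and `forge.guardrail.gate` from drifting apart.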

Span attributes

New constants in `forge-core/observability/attrs.go` (under the
existing `forge.*` namespace):

| Key | Value | Source |
| --- | --- | --- |
| `forge.guardrail.gate` | `input` / `context` / `tool_call` / `output` / `stream` | `res.Gate`: single source of truth, matches `fields.gate` on the audit event |
| `forge.guardrail.decision` | `allow` / `mask` / `block` / `warn` | `res.Decision` |
| `forge.guardrail.type` | `pii` / `moderation` / `security` / … | First violation `Type` |
| `forge.guardrail.category` | `ssn` / `email` / … | First violation `Category` |
| `forge.guardrail.violation_count` | int | `len(res.Violations)` |
| `forge.tool.name` | string | Already a constant in `attrs.go`; reused for tool_call + tool_output gate spans |
| `forge.guardrail.evidence` | string | Pre-mask content; gated by `TracingConfig.CaptureContent` + `Redact` (issue #130 posture) |
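
A rough sketch of how the non-evidence attributes in the table could be derived from a gate result. The Go identifier names (`AttrGuardrailGate` and friends), the `Violation` and `Result` stand-in types, and the `guardrailAttrs` helper are all assumptions made for illustration; only the attribute key strings and the field names `Gate`, `Decision`, `Violations`, `Type`, and `Category` come from the proposal. Omitting `type` and `category` when there are no violations is also an assumption.

```go
package main

// Attribute keys from the table. The Go identifier names are hypothetical;
// only the string values are taken from the proposal.
const (
	AttrGuardrailGate           = "forge.guardrail.gate"
	AttrGuardrailDecision       = "forge.guardrail.decision"
	AttrGuardrailType           = "forge.guardrail.type"
	AttrGuardrailCategory       = "forge.guardrail.category"
	AttrGuardrailViolationCount = "forge.guardrail.violation_count"
	AttrGuardrailEvidence       = "forge.guardrail.evidence"
	// AttrToolName stands in for the constant that already exists in attrs.go.
	AttrToolName = "forge.tool.name"
)

// Violation and Result are simplified stand-ins for the library types.
type Violation struct {
	Type     string
	Category string
}

type Result struct {
	Gate       string
	Decision   string
	Violations []Violation
}

// guardrailAttrs builds the non-evidence attributes for one gate call.
// toolName is empty for gates that are not tied to a tool.
func guardrailAttrs(res Result, toolName string) map[string]any {
	attrs := map[string]any{
		AttrGuardrailGate:           res.Gate,
		AttrGuardrailDecision:       res.Decision,
		AttrGuardrailViolationCount: len(res.Violations),
	}
	// Type and category describe the first violation only (assumed to be
	// left off the span when nothing was violated).
	if len(res.Violations) > 0 {
		first := res.Violations[0]
		attrs[AttrGuardrailType] = first.Type
		attrs[AttrGuardrailCategory] = first.Category
	}
	if toolName != "" {
		attrs[AttrToolName] = toolName
	}
	return attrs
}
```

With the OTel Go API, each map entry would become an `attribute.String` or `attribute.Int` key-value passed to `span.SetAttributes`.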

Span parent

Spans nest under whatever's active when the engine method is called:

| Gate | Parent span |
| --- | --- |
| `input` | `a2a.tasks/send` (CheckInbound runs in the A2A handler, before the loop starts) |
| `context` | `agent.execute` (BeforeLLMCall hook is inside the loop) |
| `tool_call` | `agent.execute` (BeforeToolExec hook is inside the loop) |
| `output` (final) | `agent.execute` (CheckOutbound runs at the A2A handler exit, AFTER agent.execute closes; needs the parent ctx threaded explicitly OR moves CheckOutbound inside agent.execute's defer) |
| `output` (tool result) | `agent.execute` (AfterToolExec hook) |

Span status

| Decision | OTel status |
| --- | --- |
| `allow` / `warn` / `mask` | OK |
| `block` | Error (with the violation summary as the status description) |

The Error status surfaces blocked invocations as red bars in the trace
UI without needing custom attribute queries.
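
The decision-to-status mapping could look roughly like this. `StatusCode` is a local stand-in so the sketch compiles without the OTel module; in the OTel Go API the equivalents are `codes.Ok` and `codes.Error`, applied via `span.SetStatus(code, description)`. The `type/category` summary format is an assumption, since the proposal does not specify how the violation summary is rendered.

```go
package main

import (
	"fmt"
	"strings"
)

// StatusCode is a stand-in for the OTel status code type.
type StatusCode int

const (
	StatusOK StatusCode = iota
	StatusError
)

// Violation is a simplified stand-in for the library's violation type.
type Violation struct {
	Type     string
	Category string
}

// spanStatus maps a guardrail decision to a status code and description.
// Only "block" yields Error; allow, warn, and mask all map to OK with an
// empty description.
func spanStatus(decision string, violations []Violation) (StatusCode, string) {
	if decision != "block" {
		return StatusOK, ""
	}
	parts := make([]string, 0, len(violations))
	for _, v := range violations {
		parts = append(parts, v.Type+"/"+v.Category)
	}
	// Hypothetical summary format: "guardrail blocked: pii/ssn, pii/email".
	return StatusError, fmt.Sprintf("guardrail blocked: %s", strings.Join(parts, ", "))
}
```

The OTel specification only uses the description with the Error status, so returning an empty description for the OK cases loses nothing.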

Evidence capture parity with #130

`forge.guardrail.evidence` follows the exact same posture as
`gen_ai.input.messages` / `gen_ai.output.messages` / `forge.tool.args`
that #130 established:

  • Default off: `TracingConfig.CaptureContent=false` means the attribute is absent.
  • `CaptureContent=true` + `Redact=true` (default) → `PrepareSpanContent(s, true, MaxBytes)` scrubs vendor secret patterns before stamping.
  • `MaxBytes` (default 4 KiB) trims via the existing `…[truncated:N]` marker.

Same env knobs that already control the OTel content-capture pipeline cover guardrail evidence — no new operator-facing surface.
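
The three rules above can be sketched as one function. This is an illustration under stated assumptions, not the behaviour of `PrepareSpanContent`: the `TracingConfig` struct here is a local stand-in, `secretPattern` is a made-up example pattern rather than the real vendor secret list, and `N` in the truncation marker is assumed to be the number of bytes dropped.

```go
package main

import (
	"fmt"
	"regexp"
)

// TracingConfig is a local stand-in mirroring the three knobs named above.
type TracingConfig struct {
	CaptureContent bool
	Redact         bool
	MaxBytes       int
}

// secretPattern is an illustrative pattern only; the real redaction list
// lives in the existing content-capture pipeline.
var secretPattern = regexp.MustCompile(`sk-[A-Za-z0-9]{8,}`)

// evidenceAttr returns the value to stamp as forge.guardrail.evidence and
// whether the attribute should be set at all.
func evidenceAttr(cfg TracingConfig, content string) (string, bool) {
	if !cfg.CaptureContent {
		return "", false // default off: attribute is absent
	}
	if cfg.Redact {
		content = secretPattern.ReplaceAllString(content, "[REDACTED]")
	}
	if cfg.MaxBytes > 0 && len(content) > cfg.MaxBytes {
		dropped := len(content) - cfg.MaxBytes
		// Byte-based cut; this simple version may split a multi-byte rune.
		content = content[:cfg.MaxBytes] + fmt.Sprintf("…[truncated:%d]", dropped)
	}
	return content, true
}
```

Redacting before truncating means a secret that straddles the cut point is still scrubbed rather than partially emitted.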

For mask decisions, evidence on the SPAN follows the same rule as
evidence on the AUDIT event: post-mask content (the payload the LLM
actually saw). Block / warn decisions carry the original triggering
content because the library never produces a masked variant in those
paths. See `docs/security/guardrails.md#what-evidence-actually-contains`.
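
As a small sketch of that rule, the choice of evidence payload depends only on the decision. `evidenceSource` is a hypothetical helper, and returning no evidence for `allow` is an assumption; the proposal only describes the mask, block, and warn cases.

```go
package main

// evidenceSource picks which payload becomes span evidence.
// mask         -> post-mask content (what the LLM actually saw)
// block / warn -> original triggering content (no masked variant exists)
// anything else, including allow -> no evidence (assumed)
func evidenceSource(decision, original, masked string) (string, bool) {
	switch decision {
	case "mask":
		return masked, true
	case "block", "warn":
		return original, true
	default:
		return "", false
	}
}
```

The selected string would still pass through the `CaptureContent` / `Redact` / `MaxBytes` handling before being stamped on the span.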

Implementation sketch

  • `forge-core/observability/attrs.go`: add the `forge.guardrail.*` constants listed under "Span attributes".
  • `forge-cli/runtime/guardrails_tracing.go` (new): `startGuardrailSpan(ctx, gate, tool)` helper + `finishGuardrailSpan(span, res, decision, content, captureCfg)`.
  • `forge-cli/runtime/guardrails_engine.go`: each Check* method opens a span at the top and stamps attributes + status at the bottom. ~5-10 lines per gate.
  • `forge-cli/runtime/guardrails_engine.go`: wire `observability.TracingConfig` into the engine so the evidence attribute respects `CaptureContent` + `Redact`. Mirrors how PR #154 (feat(otel): honor capture_content + redact on span attributes, closes #130) wired the tracing config onto `LLMExecutor`.
  • Tests with `tracetest.InMemoryExporter` from `go.opentelemetry.io/otel/sdk/trace/tracetest` (same pattern as `loop_spans_content_test.go`) asserting the span name + attributes for each gate.
  • Docs: `docs/core-concepts/observability-tracing.md` gains a "Guardrail spans" section linking to `docs/security/guardrails.md`.
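
The "open at the top, stamp at the bottom" shape of a Check* method might look like the sketch below. It is deliberately simplified and makes several assumptions: `recordedSpan` is an in-memory stand-in for an OTel span, the helpers drop the `ctx`, `content`, and `captureCfg` parameters that the proposed signatures carry, and `checkToolCall` with its `check` callback is a hypothetical wrapper rather than the engine's real method.

```go
package main

// Violation and Result are simplified stand-ins for the library types.
type Violation struct {
	Type     string
	Category string
}

type Result struct {
	Gate       string
	Decision   string
	Violations []Violation
}

// recordedSpan is an in-memory stand-in for an OTel span.
type recordedSpan struct {
	Name    string
	Attrs   map[string]any
	IsError bool
	Ended   bool
}

// startGuardrailSpan opens the span for a gate; tool may be empty.
func startGuardrailSpan(gate, tool string) *recordedSpan {
	span := &recordedSpan{Name: "guardrail." + gate, Attrs: map[string]any{}}
	if tool != "" {
		span.Attrs["forge.tool.name"] = tool
	}
	return span
}

// finishGuardrailSpan stamps the result onto the span and ends it.
func finishGuardrailSpan(span *recordedSpan, res Result) {
	span.Attrs["forge.guardrail.gate"] = res.Gate
	span.Attrs["forge.guardrail.decision"] = res.Decision
	span.Attrs["forge.guardrail.violation_count"] = len(res.Violations)
	if len(res.Violations) > 0 {
		span.Attrs["forge.guardrail.type"] = res.Violations[0].Type
		span.Attrs["forge.guardrail.category"] = res.Violations[0].Category
	}
	span.IsError = res.Decision == "block"
	span.Ended = true
}

// checkToolCall shows the wrapper shape: the deferred finish runs after
// res has been assigned, so every return path stamps the span.
func checkToolCall(tool, args string, check func(string) Result) (res Result, span *recordedSpan) {
	span = startGuardrailSpan("tool_call", tool)
	defer func() { finishGuardrailSpan(span, res) }()
	res = check(args)
	return res, span
}
```

Using a deferred finish with a named result keeps the per-gate wiring small and ensures the span is ended even if the method later gains early returns.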

Out of scope

  • Cardinality limit on `guardrail.context` spans. If a deployment has many system messages it may produce N spans per LLM iteration — that's fine for now (small N in practice), revisit if cardinality complaints arise.
  • StreamGate spans. `CheckStream` exists but isn't auto-wired (Forge's `ExecuteStream` is a buffered wrapper). The span helper is exposed so when real streaming lands in the loop, the wiring is one line.
  • Trace context propagation to downstream guardrail-library calls. The library doesn't consume OTel context today; if it grows to (e.g. when calling external moderation endpoints) we can revisit.

Verification

End-to-end:

  1. Run an agent with `FORGE_OTEL_ENABLED=true` and `FORGE_OTEL_EXPORTER=otlp`, leaving `FORGE_GUARDRAIL_CAPTURE_EVIDENCE` unset (the default).
  2. Send a PII-bearing message. Confirm the trace backend shows a `guardrail.input` child of `a2a.tasks/send` with `forge.guardrail.gate=input`, `forge.guardrail.decision=mask`, `forge.guardrail.type=pii`, `forge.guardrail.category=ssn`, and no `forge.guardrail.evidence` attribute.
  3. Set `FORGE_OTEL_CAPTURE_CONTENT=true` and re-run. Confirm `forge.guardrail.evidence` now carries the redacted + truncated content.
  4. Send an A2A request that triggers an outbound block (in enforce mode). Confirm `guardrail.output` has OTel status `Error` with the violation summary as the description.
