feat(telemetry): add opt-in user.id span attribution #5647

Draft
claude[bot] wants to merge 3 commits into main from feat/5455-user-id-span-attribution

@claude claude Bot commented Jun 25, 2026


Requested by Christopher Burns · Slack thread

Summary

When ToolHive is fronted with authentication, the inbound MCP server span carries protocol, transport, and client attributes but nothing tying a span to the authenticated user — so per-user questions ("which principal invoked this tool or hit this backend?") cannot be answered from traces, even though auth.Identity is already on the request context at the span site. This adds an opt-in, default-off option that emits the authenticated subject as the OTEL-standard user.id span attribute on the inbound MCP server span. With it disabled (the default) nothing changes; with it enabled, user.id is set from the identity's Subject only when an identity is present, so anonymous requests are unaffected. It is default-off because the subject can be personally- or tenant-identifying, and is intentionally never added to any metric instrument (it is high-cardinality).

Closes #5455

Medium level
  • Runtime read — the telemetry middleware now consults a new addUserIDAttribute helper from addMCPAttributes. It returns early unless the feature is enabled, then reads auth.IdentityFromContext(ctx) and sets user.id from identity.Subject, only when an identity with a non-empty subject is present.
  • Config plumbing — a new EnableUserIDAttribute field is threaded end to end: the telemetry Config, the pkg/config OpenTelemetryConfig, and the pkg/runner builders (BuildTelemetryConfigFromAppConfig, MaybeMakeConfig, WithTelemetryConfigFromFlags), mirroring the existing --otel-use-legacy-attributes path.
  • CLI surface — a new --otel-enable-user-id-attribute flag (default false) with config-file fallback, parallel to the legacy-attributes flag.
  • Operator — the MCPTelemetryConfig CRD gains an enableUserIDAttribute field with spectoconfig conversion; regenerated CRD manifests, helm templates, CRD-API and swagger docs follow.
  • Tests — table-driven coverage for the four cases (disabled, enabled+subject, enabled+anonymous, enabled+empty-subject), exercised both directly and via addMCPAttributes; the drift-test mapping is updated.
  • Docs — docs/observability.md documents the attribute, its opt-in nature, and the PII consideration.
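
The runtime read described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's code: Identity, Span, WithIdentity, IdentityFromContext, and userIDFor are hypothetical stand-ins for ToolHive's auth package and an OpenTelemetry span, and the enabled flag is passed as a parameter here rather than read from the middleware's config.

```go
package main

import (
	"context"
	"fmt"
)

// Identity is a hypothetical stand-in for auth.Identity; only Subject is modelled.
type Identity struct {
	Subject string
}

// identityKey is an unexported context key type, which avoids key collisions.
type identityKey struct{}

// WithIdentity stores an identity on the context (assumed shape, not the real API).
func WithIdentity(ctx context.Context, id *Identity) context.Context {
	return context.WithValue(ctx, identityKey{}, id)
}

// IdentityFromContext returns the identity, if a non-nil one is present.
func IdentityFromContext(ctx context.Context) (*Identity, bool) {
	id, ok := ctx.Value(identityKey{}).(*Identity)
	return id, ok && id != nil
}

// Span is a minimal stand-in for a trace span that records string attributes.
type Span struct {
	Attrs map[string]string
}

// SetAttribute records one attribute, allocating the map on first use.
func (s *Span) SetAttribute(key, value string) {
	if s.Attrs == nil {
		s.Attrs = map[string]string{}
	}
	s.Attrs[key] = value
}

// addUserIDAttribute sets user.id only when the feature is enabled and the
// context carries an identity with a non-empty subject.
func addUserIDAttribute(ctx context.Context, span *Span, enabled bool) {
	if !enabled {
		return // default-off: nothing changes
	}
	id, ok := IdentityFromContext(ctx)
	if !ok || id.Subject == "" {
		return // anonymous or empty subject: no attribute
	}
	span.SetAttribute("user.id", id.Subject)
}

// userIDFor runs the helper against a fresh span; a nil id models an
// anonymous request with no identity on the context.
func userIDFor(enabled bool, id *Identity) (string, bool) {
	ctx := context.Background()
	if id != nil {
		ctx = WithIdentity(ctx, id)
	}
	span := &Span{}
	addUserIDAttribute(ctx, span, enabled)
	v, ok := span.Attrs["user.id"]
	return v, ok
}

func main() {
	v, ok := userIDFor(true, &Identity{Subject: "alice"})
	fmt.Println(v, ok)
}
```

The two early returns map onto the four test cases listed above: disabled, enabled with a subject, enabled but anonymous, and enabled with an empty subject.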
Low level
| File | Change |
| --- | --- |
| pkg/telemetry/middleware.go | Adds addUserIDAttribute(ctx, span), called from addMCPAttributes; gated on config.EnableUserIDAttribute, sets user.id from auth.IdentityFromContext subject only when present and non-empty. |
| pkg/telemetry/config.go | Adds Config.EnableUserIDAttribute field (default false) and threads it through MaybeMakeConfig. |
| pkg/telemetry/middleware_test.go | Table-driven tests: off → no attribute (even with identity); on+subject → set; on+anonymous → none; on+empty-subject → none; covered directly and via addMCPAttributes. |
| pkg/config/config.go | Adds OpenTelemetryConfig.EnableUserIDAttribute. |
| pkg/runner/telemetry_config.go | Propagates the flag in BuildTelemetryConfigFromAppConfig. |
| pkg/runner/config_builder.go | Threads the flag through WithTelemetryConfigFromFlags. |
| pkg/runner/config_test.go | Updates builder call sites for the new parameter. |
| cmd/thv/app/run_flags.go | Adds the --otel-enable-user-id-attribute flag (default false) with config-file fallback. |
| cmd/thv/app/run_flags_test.go | Covers the new flag. |
| cmd/thv-operator/api/v1beta1/mcptelemetryconfig_types.go | Adds the enableUserIDAttribute CRD field with +kubebuilder:default=false. |
| cmd/thv-operator/pkg/spectoconfig/telemetry.go | Maps the CRD field to the runtime telemetry config. |
| cmd/thv-operator/pkg/spectoconfig/telemetry_drift_test.go | Updates the spec-to-config drift mapping. |
| deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcptelemetryconfigs.yaml | Regenerated CRD manifest. |
| deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_virtualmcpservers.yaml | Regenerated CRD manifest. |
| deploy/charts/operator-crds/templates/toolhive.stacklok.dev_mcptelemetryconfigs.yaml | Regenerated helm template. |
| deploy/charts/operator-crds/templates/toolhive.stacklok.dev_virtualmcpservers.yaml | Regenerated helm template. |
| docs/operator/crd-api.md | Regenerated CRD API reference. |
| docs/server/docs.go | Regenerated swagger (embedded). |
| docs/server/swagger.json | Regenerated swagger. |
| docs/server/swagger.yaml | Regenerated swagger. |
| docs/observability.md | Documents the attribute, its opt-in nature, and the PII consideration. |
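
As a rough illustration of the plumbing these rows describe, the sketch below carries one boolean from two input layers (a CRD spec and an app config) into a runtime config. All type and function names here are hypothetical stand-ins; only the EnableUserIDAttribute field name and the enableUserIDAttribute JSON name come from the PR description.

```go
package main

import "fmt"

// CRDTelemetrySpec is a hypothetical stand-in for the MCPTelemetryConfig CRD
// spec; only the new field is modelled.
type CRDTelemetrySpec struct {
	EnableUserIDAttribute bool `json:"enableUserIDAttribute,omitempty"`
}

// AppOpenTelemetryConfig is a hypothetical stand-in for the app-level
// OpenTelemetryConfig read from the config file.
type AppOpenTelemetryConfig struct {
	EnableUserIDAttribute bool
}

// TelemetryConfig is a hypothetical stand-in for the runtime telemetry Config
// that the middleware consults.
type TelemetryConfig struct {
	EnableUserIDAttribute bool
}

// fromCRDSpec maps the operator-side field onto the runtime config.
func fromCRDSpec(spec CRDTelemetrySpec) TelemetryConfig {
	return TelemetryConfig{EnableUserIDAttribute: spec.EnableUserIDAttribute}
}

// fromAppConfig maps the CLI/app-config field onto the runtime config.
func fromAppConfig(cfg AppOpenTelemetryConfig) TelemetryConfig {
	return TelemetryConfig{EnableUserIDAttribute: cfg.EnableUserIDAttribute}
}

func main() {
	// The zero value of bool is false, so an unset field means "disabled".
	fmt.Println(fromCRDSpec(CRDTelemetrySpec{}).EnableUserIDAttribute)
	fmt.Println(fromAppConfig(AppOpenTelemetryConfig{EnableUserIDAttribute: true}).EnableUserIDAttribute)
}
```

Because Go's zero value for bool is false, a layer that never sets the field yields the default-off behavior without extra defaulting code.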

Type of change

  • Bug fix
  • New feature
  • Breaking change
  • Refactoring
  • Documentation
  • Other

Test plan

  • task lint-fix passes
  • task test passes (telemetry, config, runner, and operator unit tests green)
  • task build passes
  • New unit tests added covering disabled, enabled+subject, enabled+anonymous, and enabled+empty-subject paths
  • Manually tested

Does this introduce a user-facing change?

Yes — but additive and opt-in. A new --otel-enable-user-id-attribute CLI flag (default false) and a corresponding enableUserIDAttribute field on the MCPTelemetryConfig CRD are added. With both at their default (false), behavior is unchanged: no user attribution is emitted on any span. When enabled, the OTEL-standard user.id attribute appears on inbound MCP server spans for authenticated requests.

Special notes for reviewers

  • golangci-lint could not be run locally in this environment — the available linter binary is built against Go 1.25 and refuses the repo's Go 1.26 target — so task lint-fix is left unchecked above and CI runs the authoritative lint.
  • The only failing unit test observed, TestRunLLMSetup_PartialFailure, is pre-existing and unrelated to this change (it does not touch telemetry, config plumbing, the CLI flag, or the operator CRD).
  • The attribute is deliberately scoped to the inbound MCP server span only; extending it to vMCP backend/client spans (pkg/vmcp/server/telemetry.go) is left as possible follow-up per the issue's open questions.
  • user.id is intentionally kept off all metric instruments due to its high cardinality; this is a span-only attribute.

Add an optional, default-off telemetry feature that sets the OTEL
standard "user.id" span attribute on the inbound MCP server span,
sourced from the authenticated subject (auth.Identity.Subject).

When disabled (the default) behavior is unchanged: no user attribution
lands on any span. When enabled, "user.id" is emitted only when an
authenticated identity is present on the request context, so anonymous
requests are unaffected. The attribute is high-cardinality and is never
added to any metric instrument.

Default-off because the subject can be personally- or tenant-identifying.

Plumbs the toggle through every layer:
- telemetry.Config.EnableUserIDAttribute (consumed by NewHTTPMiddleware)
- addMCPAttributes reads auth.IdentityFromContext when enabled
- CLI flag --otel-enable-user-id-attribute (mirrors
  --otel-use-legacy-attributes) with config-file fallback
- app config OpenTelemetryConfig.EnableUserIDAttribute and the shared
  BuildTelemetryConfigFromAppConfig / MaybeMakeConfig builders
- operator MCPTelemetryConfig CRD field + spectoconfig conversion

Tests cover off (no attribute even with identity), on with subject
(attribute set), and on without subject / anonymous (no attribute),
both directly and through addMCPAttributes. Regenerated CRD manifests,
helm templates, CRD-API and swagger docs; documented the attribute, its
opt-in nature, and the PII consideration in docs/observability.md.

Closes #5455
github-actions Bot added and removed the size/M (Medium PR: 300-599 lines changed) label on Jun 25, 2026
…e CLI docs

Extract the repeated 'flag value or config fallback' bool resolution into a
boolFlagOrConfig helper so getTelemetryFromFlags drops back below the gocyclo
threshold (the new --otel-enable-user-id-attribute fallback had pushed it to 16).

Regenerate docs/cli/thv_run.md so the --otel-enable-user-id-attribute flag is
documented, matching the swagger/docgen verification output.
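
The "flag value or config fallback" resolution mentioned in this commit might look something like the sketch below. The helper's real signature is not shown in this PR page, so the three-argument form is an assumption, as is the use of the standard library flag package (chosen only to keep the example self-contained); flagWasSet and resolveUserIDAttribute are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
)

// boolFlagOrConfig returns the flag value when the flag was set explicitly on
// the command line, and the config-file value otherwise (assumed signature).
func boolFlagOrConfig(flagChanged, flagValue, configValue bool) bool {
	if flagChanged {
		return flagValue
	}
	return configValue
}

// flagWasSet reports whether name was passed explicitly; FlagSet.Visit only
// visits flags that have been set.
func flagWasSet(fs *flag.FlagSet, name string) bool {
	set := false
	fs.Visit(func(f *flag.Flag) {
		if f.Name == name {
			set = true
		}
	})
	return set
}

// resolveUserIDAttribute parses args and applies the config-file fallback.
func resolveUserIDAttribute(args []string, configValue bool) (bool, error) {
	const name = "otel-enable-user-id-attribute"
	fs := flag.NewFlagSet("run", flag.ContinueOnError)
	value := fs.Bool(name, false, "emit user.id on inbound MCP server spans")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return boolFlagOrConfig(flagWasSet(fs, name), *value, configValue), nil
}

func main() {
	// An explicit =false on the command line overrides a true config value.
	got, err := resolveUserIDAttribute([]string{"--otel-enable-user-id-attribute=false"}, true)
	fmt.Println(got, err)
}
```

Tracking whether the flag was set, rather than only reading its value, is what lets an explicit false on the command line win over a true value in the config file.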
codecov Bot commented Jun 25, 2026


Codecov Report

❌ Patch coverage is 91.66667% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 70.35%. Comparing base (5310b0c) to head (98db84b).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pkg/telemetry/config.go | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5647   +/-   ##
=======================================
  Coverage   70.34%   70.35%           
=======================================
  Files         649      649           
  Lines       66101    66114   +13     
=======================================
+ Hits        46500    46515   +15     
- Misses      16253    16255    +2     
+ Partials     3348     3344    -4     




Development

Successfully merging this pull request may close these issues.

Telemetry: optional, default-off authenticated-user attribution (user.id) on the MCP server span
