feat(telemetry): add opt-in user.id span attribution#5647
Draft
claude[bot] wants to merge 3 commits into
Draft
Conversation
Add an optional, default-off telemetry feature that sets the OTEL standard "user.id" span attribute on the inbound MCP server span, sourced from the authenticated subject (auth.Identity.Subject). When disabled (the default) behavior is unchanged: no user attribution lands on any span. When enabled, "user.id" is emitted only when an authenticated identity is present on the request context, so anonymous requests are unaffected. The attribute is high-cardinality and is never added to any metric instrument. Default-off because the subject can be personally- or tenant-identifying. Plumbs the toggle through every layer: - telemetry.Config.EnableUserIDAttribute (consumed by NewHTTPMiddleware) - addMCPAttributes reads auth.IdentityFromContext when enabled - CLI flag --otel-enable-user-id-attribute (mirrors --otel-use-legacy-attributes) with config-file fallback - app config OpenTelemetryConfig.EnableUserIDAttribute and the shared BuildTelemetryConfigFromAppConfig / MaybeMakeConfig builders - operator MCPTelemetryConfig CRD field + spectoconfig conversion Tests cover off (no attribute even with identity), on with subject (attribute set), and on without subject / anonymous (no attribute), both directly and through addMCPAttributes. Regenerated CRD manifests, helm templates, CRD-API and swagger docs; documented the attribute, its opt-in nature, and the PII consideration in docs/observability.md. Closes #5455
…e CLI docs Extract the repeated 'flag value or config fallback' bool resolution into a boolFlagOrConfig helper so getTelemetryFromFlags drops back below the gocyclo threshold (the new --otel-enable-user-id-attribute fallback had pushed it to 16). Regenerate docs/cli/thv_run.md so the --otel-enable-user-id-attribute flag is documented, matching the swagger/docgen verification output.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5647 +/- ##
=======================================
Coverage 70.34% 70.35%
=======================================
Files 649 649
Lines 66101 66114 +13
=======================================
+ Hits 46500 46515 +15
- Misses 16253 16255 +2
+ Partials 3348 3344 -4 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Requested by Christopher Burns · Slack thread
Summary
When ToolHive is fronted with authentication, the inbound MCP server span carries protocol, transport, and client attributes but nothing tying a span to the authenticated user — so per-user questions ("which principal invoked this tool or hit this backend?") cannot be answered from traces, even though
auth.Identityis already on the request context at the span site. This adds an opt-in, default-off option that emits the authenticated subject as the OTEL-standarduser.idspan attribute on the inbound MCP server span. With it disabled (the default) nothing changes; with it enabled,user.idis set from the identity'sSubjectonly when an identity is present, so anonymous requests are unaffected. It is default-off because the subject can be personally- or tenant-identifying, and is intentionally never added to any metric instrument (it is high-cardinality).Closes #5455
Medium level
addUserIDAttributehelper fromaddMCPAttributes. It returns early unless the feature is enabled, then readsauth.IdentityFromContext(ctx)and setsuser.idfromidentity.Subject, only when an identity with a non-empty subject is present.EnableUserIDAttributefield is threaded end to end: the telemetryConfig, thepkg/configOpenTelemetryConfig, and thepkg/runnerbuilders (BuildTelemetryConfigFromAppConfig,MaybeMakeConfig,WithTelemetryConfigFromFlags), mirroring the existing--otel-use-legacy-attributespath.--otel-enable-user-id-attributeflag (default false) with config-file fallback, parallel to the legacy-attributes flag.MCPTelemetryConfigCRD gains anenableUserIDAttributefield withspectoconfigconversion; regenerated CRD manifests, helm templates, CRD-API and swagger docs follow.addMCPAttributes; the drift-test mapping is updated.docs/observability.mddocuments the attribute, its opt-in nature, and the PII consideration.Low level
pkg/telemetry/middleware.goaddUserIDAttribute(ctx, span), called fromaddMCPAttributes; gated onconfig.EnableUserIDAttribute, setsuser.idfromauth.IdentityFromContextsubject only when present and non-empty.pkg/telemetry/config.goConfig.EnableUserIDAttributefield (default false) and threads it throughMaybeMakeConfig.pkg/telemetry/middleware_test.goaddMCPAttributes.pkg/config/config.goOpenTelemetryConfig.EnableUserIDAttribute.pkg/runner/telemetry_config.goBuildTelemetryConfigFromAppConfig.pkg/runner/config_builder.goWithTelemetryConfigFromFlags.pkg/runner/config_test.gocmd/thv/app/run_flags.go--otel-enable-user-id-attributeflag (default false) with config-file fallback.cmd/thv/app/run_flags_test.gocmd/thv-operator/api/v1beta1/mcptelemetryconfig_types.goenableUserIDAttributeCRD field with+kubebuilder:default=false.cmd/thv-operator/pkg/spectoconfig/telemetry.gocmd/thv-operator/pkg/spectoconfig/telemetry_drift_test.godeploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcptelemetryconfigs.yamldeploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_virtualmcpservers.yamldeploy/charts/operator-crds/templates/toolhive.stacklok.dev_mcptelemetryconfigs.yamldeploy/charts/operator-crds/templates/toolhive.stacklok.dev_virtualmcpservers.yamldocs/operator/crd-api.mddocs/server/docs.godocs/server/swagger.jsondocs/server/swagger.yamldocs/observability.mdType of change
Test plan
task lint-fixpassestask testpasses (telemetry, config, runner, and operator unit tests green)task buildpassesDoes this introduce a user-facing change?
Yes — but additive and opt-in. A new
--otel-enable-user-id-attributeCLI flag (default false) and a correspondingenableUserIDAttributefield on theMCPTelemetryConfigCRD are added. With both at their default (false), behavior is unchanged: no user attribution is emitted on any span. When enabled, the OTEL-standarduser.idattribute appears on inbound MCP server spans for authenticated requests.Special notes for reviewers
golangci-lintcould not be run locally in this environment — the available linter binary is built against Go 1.25 and refuses the repo's Go 1.26 target — sotask lint-fixis left unchecked above and CI runs the authoritative lint.TestRunLLMSetup_PartialFailure, is pre-existing and unrelated to this change (it does not touch telemetry, config plumbing, the CLI flag, or the operator CRD).pkg/vmcp/server/telemetry.go) is left as possible follow-up per the issue's open questions.user.idis intentionally kept off all metric instruments due to its high cardinality; this is a span-only attribute.