
Respect global config for OTel tracing and metrics enabled#4326

Open
gkatz2 wants to merge 2 commits intostacklok:mainfrom
gkatz2:fix/otel-global-tracing-metrics-config

Conversation


gkatz2 commented Mar 24, 2026

Summary

  • thv config otel set-tracing-enabled false was silently ignored by thv run because getTelemetryFromFlags had no fallback for TracingEnabled or MetricsEnabled, and buildRunConfig bypassed the fallback entirely for the proxy runner config
  • Users could not globally disable telemetry without passing CLI flags on every invocation

Fixes #4323

Type of change

  • Bug fix

Test plan

  • Unit tests (task test)
  • Linting (task lint-fix)
  • Manual testing (describe below)

Deployed a local build and verified:

  1. With global config tracing-enabled: false and metrics-enabled: false, thv run produces a runconfig with null telemetry (no telemetry initialized)
  2. With explicit --otel-tracing-enabled=true, CLI flag correctly overrides global config
  3. config.yaml now shows tracing-enabled: false instead of silently dropping the field

Changes

  • pkg/config/config.go: Change TracingEnabled/MetricsEnabled from bool with omitempty to *bool without omitempty, matching the existing UseLegacyAttributes pattern
  • cmd/thv/app/run_flags.go: Add config fallback for both fields in getTelemetryFromFlags; use resolved values in buildRunConfig; return nil from createTelemetryConfig when both signals are disabled
  • cmd/thv/app/otel.go: Update setters/getters/unsetters for *bool
  • cmd/thv/app/run_flags_test.go: Update existing tests and add 4 new cases for tracing/metrics fallback

Does this introduce a user-facing change?

thv config otel set-tracing-enabled and set-metrics-enabled now take effect when starting servers with thv run. Previously these settings were silently ignored.

Special notes for reviewers

The *bool pattern (nil = never set, false = explicitly disabled) is necessary because the CLI defaults for these flags are true, unlike Insecure and EnablePrometheusMetricsPath, which default to false. A struct-level comment on OpenTelemetryConfig explains this distinction.
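The precedence that the *bool pattern makes possible can be sketched as a small resolver. This is a minimal sketch, not the PR's code: `resolveEnabled` and `boolPtr` are hypothetical names, and `flagChanged` stands in for however the real code detects an explicitly passed flag (for example, cobra/pflag's `cmd.Flags().Changed(name)`).

```go
package main

import "fmt"

// resolveEnabled (hypothetical name) applies the precedence the *bool
// pattern enables: explicit CLI flag, then explicit global config,
// then the CLI default.
func resolveEnabled(flagChanged, flagValue bool, cfgValue *bool, cliDefault bool) bool {
	if flagChanged {
		return flagValue // an explicitly passed flag always wins
	}
	if cfgValue != nil {
		return *cfgValue // config was set, possibly to an explicit false
	}
	return cliDefault // nil: never configured, so use the CLI default
}

// boolPtr returns a pointer to b, as needed for *bool config fields.
func boolPtr(b bool) *bool { return &b }

func main() {
	// Global config disables tracing and no flag is passed.
	fmt.Println(resolveEnabled(false, true, boolPtr(false), true)) // false
	// --otel-tracing-enabled=true overrides the config.
	fmt.Println(resolveEnabled(true, true, boolPtr(false), true)) // true
	// Nothing configured anywhere: the default of true applies.
	fmt.Println(resolveEnabled(false, true, nil, true)) // true
}
```

With a plain bool field, the second branch could not run: a config value of false would look exactly like an empty config.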

Generated with Claude Code

thv config otel set-tracing-enabled false was silently ignored by
thv run because getTelemetryFromFlags had no fallback for these
two fields, and buildRunConfig bypassed the fallback entirely for
the proxy runner config. Users could not globally disable telemetry
without passing CLI flags on every invocation.

Fixes stacklok#4323

Signed-off-by: Greg Katz <gkatz@indeed.com>
github-actions bot added the size/S label (Small PR: 100-299 lines changed) Mar 24, 2026
amirejaz requested a review from Copilot March 24, 2026 11:55

Copilot AI left a comment


Pull request overview

This PR fixes thv run ignoring global OpenTelemetry tracing-enabled / metrics-enabled settings by introducing proper config fallbacks and representing “unset vs explicitly false” in the global config model.

Changes:

  • Update global OpenTelemetryConfig to use *bool for TracingEnabled/MetricsEnabled to preserve “never set” vs “explicitly disabled”.
  • Extend getTelemetryFromFlags fallbacks to include tracing/metrics enabled, and use the resolved values when building telemetry config.
  • Update config CLI setters/getters/unsetters and expand unit test coverage for telemetry flag/config resolution.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

  • pkg/config/config.go: Switch tracing/metrics enabled fields to *bool and add rationale comment.
  • cmd/thv/app/run_flags.go: Add tracing/metrics config fallback and adjust telemetry creation / legacy runconfig wiring.
  • cmd/thv/app/otel.go: Update config mutation and printing logic to work with *bool.
  • cmd/thv/app/run_flags_test.go: Add/adjust tests for tracing/metrics fallback behavior.


Comment on lines +400 to 404
MetricsEnabled *bool `yaml:"metrics-enabled"`
TracingEnabled *bool `yaml:"tracing-enabled"`
Insecure bool `yaml:"insecure,omitempty"`
EnablePrometheusMetricsPath bool `yaml:"enable-prometheus-metrics-path,omitempty"`
UseLegacyAttributes *bool `yaml:"use-legacy-attributes"`

Copilot AI Mar 24, 2026


MetricsEnabled/TracingEnabled are now *bool but the YAML tags no longer use omitempty. When the OTEL block is present (e.g., endpoint set) and these pointers are nil, yaml.Marshal will emit metrics-enabled: null / tracing-enabled: null, which is user-facing noise and ambiguous given the new nil=never-set semantics. Consider adding omitempty back to the pointer fields (so nil is omitted but explicit false/true is preserved), or otherwise ensuring nil values don’t serialize as null.

Suggested change
-	MetricsEnabled              *bool `yaml:"metrics-enabled"`
-	TracingEnabled              *bool `yaml:"tracing-enabled"`
-	Insecure                    bool  `yaml:"insecure,omitempty"`
-	EnablePrometheusMetricsPath bool  `yaml:"enable-prometheus-metrics-path,omitempty"`
-	UseLegacyAttributes         *bool `yaml:"use-legacy-attributes"`
+	MetricsEnabled              *bool `yaml:"metrics-enabled,omitempty"`
+	TracingEnabled              *bool `yaml:"tracing-enabled,omitempty"`
+	Insecure                    bool  `yaml:"insecure,omitempty"`
+	EnablePrometheusMetricsPath bool  `yaml:"enable-prometheus-metrics-path,omitempty"`
+	UseLegacyAttributes         *bool `yaml:"use-legacy-attributes,omitempty"`

Copilot uses AI. Check for mistakes.
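The three serialization behaviors in play here can be compared with a stdlib-only sketch. The config is written as YAML, but `encoding/json` applies the same `omitempty` rule to pointers (omit only when nil, keep a pointer to false), so it stands in for the YAML library here; the struct names are hypothetical. Per the comment above, the nil-without-omitempty case would appear in config.yaml as `tracing-enabled: null`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Original field: omitempty on a plain bool drops an explicit false.
type plainOmit struct {
	TracingEnabled bool `json:"tracing-enabled,omitempty"`
}

// This PR: *bool without omitempty serializes nil as null.
type ptrNoOmit struct {
	TracingEnabled *bool `json:"tracing-enabled"`
}

// Suggested change: *bool with omitempty drops nil but keeps false.
type ptrOmit struct {
	TracingEnabled *bool `json:"tracing-enabled,omitempty"`
}

// marshal encodes v and panics on error, which cannot happen for these types.
func marshal(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	f := false
	fmt.Println(marshal(plainOmit{TracingEnabled: false})) // {}
	fmt.Println(marshal(ptrNoOmit{}))                      // {"tracing-enabled":null}
	fmt.Println(marshal(ptrOmit{}))                        // {}
	fmt.Println(marshal(ptrOmit{TracingEnabled: &f}))      // {"tracing-enabled":false}
}
```

Only the last variant both hides the never-set case and preserves an explicit false, which is what the PR's manual test 3 relies on.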
Comment on lines 412 to 427
 func getOtelMetricsEnabledCmdFunc(_ *cobra.Command, _ []string) error {
 	configProvider := config.NewDefaultProvider()
 	cfg := configProvider.GetConfig()

-	fmt.Printf("Current OpenTelemetry metrics enabled: %t\n", cfg.OTEL.MetricsEnabled)
+	metricsEnabled := cfg.OTEL.MetricsEnabled != nil && *cfg.OTEL.MetricsEnabled
+	fmt.Printf("Current OpenTelemetry metrics enabled: %t\n", metricsEnabled)
 	return nil
 }

 func unsetOtelMetricsEnabledCmdFunc(_ *cobra.Command, _ []string) error {
 	configProvider := config.NewDefaultProvider()
 	cfg := configProvider.GetConfig()

-	if !cfg.OTEL.MetricsEnabled {
+	if cfg.OTEL.MetricsEnabled == nil || !*cfg.OTEL.MetricsEnabled {
 		fmt.Println("OpenTelemetry metrics enabled is already disabled.")
 		return nil

Copilot AI Mar 24, 2026


The get-otel-metrics-enabled output treats nil as false, but with the new *bool semantics nil means “never set” (and thv run will fall back to the CLI default, which is true). This can mislead users into thinking metrics are globally disabled when the setting is actually unset. Also, unset-otel-metrics-enabled currently returns early when the value is explicitly set to false, which prevents unsetting a previously-disabled value back to “never set”. Adjust the getter to report the unset state (or effective default), and allow unsetting regardless of current value (only short-circuit when already nil).
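A getter that keeps the nil case visible, instead of collapsing it into false, might look like the following. The function name and the wording of the unset message are assumptions; the effective default of true comes from the CLI flag defaults described in the PR, and the same helper would serve the tracing getter.

```go
package main

import "fmt"

// describeEnabled (hypothetical) formats a *bool setting for a get command.
// nil is reported as "not set" together with the effective CLI default,
// so an unset value is never shown as false.
func describeEnabled(signal string, v *bool, cliDefault bool) string {
	if v == nil {
		return fmt.Sprintf("Current OpenTelemetry %s enabled: not set (thv run defaults to %t)", signal, cliDefault)
	}
	return fmt.Sprintf("Current OpenTelemetry %s enabled: %t", signal, *v)
}

func main() {
	f := false
	fmt.Println(describeEnabled("metrics", nil, true))
	fmt.Println(describeEnabled("metrics", &f, true))
}
```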

Comment on lines 460 to 476
 func getOtelTracingEnabledCmdFunc(_ *cobra.Command, _ []string) error {
 	configProvider := config.NewDefaultProvider()
 	cfg := configProvider.GetConfig()

-	fmt.Printf("Current OpenTelemetry tracing enabled: %t\n", cfg.OTEL.TracingEnabled)
+	tracingEnabled := cfg.OTEL.TracingEnabled != nil && *cfg.OTEL.TracingEnabled
+	fmt.Printf("Current OpenTelemetry tracing enabled: %t\n", tracingEnabled)
 	return nil
 }

 func unsetOtelTracingEnabledCmdFunc(_ *cobra.Command, _ []string) error {
 	configProvider := config.NewDefaultProvider()
 	cfg := configProvider.GetConfig()

-	if !cfg.OTEL.TracingEnabled {
+	if cfg.OTEL.TracingEnabled == nil || !*cfg.OTEL.TracingEnabled {
 		fmt.Println("OpenTelemetry tracing enabled is already disabled.")
 		return nil
 	}

Copilot AI Mar 24, 2026


Same issue as metrics: get-otel-tracing-enabled reports false when the config value is nil (unset), even though the effective behavior falls back to the CLI default true. Additionally, unset-otel-tracing-enabled currently refuses to unset when the value is explicitly false, so users can’t revert from “explicitly disabled” to “never set”. Update the getter to distinguish unset vs explicitly false, and change unset to only no-op when already nil.

Comment on lines 711 to 734
@@ -716,7 +728,7 @@ func configureMiddlewareAndOptions(
 			runFlags.JWKSAllowPrivateIP, runFlags.InsecureAllowHTTP, oidcScopes,
 		),
 		runner.WithTelemetryConfigFromFlags(finalOtelEndpoint, runFlags.OtelEnablePrometheusMetricsPath,
-			runFlags.OtelTracingEnabled, runFlags.OtelMetricsEnabled, runFlags.OtelServiceName,
+			finalTracingEnabled, finalMetricsEnabled, runFlags.OtelServiceName,
 			finalOtelSamplingRate, runFlags.OtelHeaders, runFlags.OtelInsecure, finalOtelEnvironmentVariables,
 			runFlags.OtelUseLegacyAttributes,
 		),

Copilot AI Mar 24, 2026


WithTelemetryConfigFromFlags is still receiving several raw runFlags values (e.g., otel-enable-prometheus-metrics-path, otel-insecure, otel-use-legacy-attributes) rather than the resolved values with global-config fallback. This can make the legacy RunConfig.TelemetryConfig diverge from the middleware telemetryConfig (e.g., global Prometheus metrics path enabled won’t be reflected, affecting exported runconfigs/K8s export). Consider deriving these values from telemetryConfig (when non-nil) or passing the resolved outputs from getTelemetryFromFlags into this call.

Comment on lines +1054 to +1059
// If both tracing and metrics are disabled, skip telemetry entirely.
// This allows users to disable telemetry via global config while keeping
// the endpoint configured for later re-enablement.
if !otelTracingEnabled && !otelMetricsEnabled && !otelEnablePrometheusMetricsPath {
return nil
}

Copilot AI Mar 24, 2026


createTelemetryConfig now has a new early-return path when both tracing and metrics are disabled. Given this is the core behavior change enabling global disablement, it should have a unit test asserting that telemetry config becomes nil when otelEndpoint is set but both signals (and Prometheus metrics path) are disabled, and that CLI flags still override config where applicable.
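A table-driven sketch of the cases Copilot asks for is below. It uses hypothetical stand-ins (`telemetryConfigSketch`, `createTelemetryConfigSketch`) that reproduce only the quoted gate, so it illustrates the expected outcomes rather than testing the real function; CLI-over-config precedence is resolved earlier in getTelemetryFromFlags, so the "re-enabled by flag" rows simply pass the already-resolved booleans. The endpoint is a placeholder.

```go
package main

import "fmt"

// telemetryConfigSketch is a hypothetical, trimmed-down stand-in for the
// real telemetry config; only the fields relevant to the gate are modeled.
type telemetryConfigSketch struct {
	Endpoint                    string
	TracingEnabled              bool
	MetricsEnabled              bool
	EnablePrometheusMetricsPath bool
}

// createTelemetryConfigSketch reproduces only the early-return gate quoted
// above: with tracing, metrics, and the Prometheus path all disabled, no
// config is built even though an endpoint is still configured.
func createTelemetryConfigSketch(endpoint string, tracing, metrics, prometheus bool) *telemetryConfigSketch {
	if !tracing && !metrics && !prometheus {
		return nil
	}
	return &telemetryConfigSketch{
		Endpoint:                    endpoint,
		TracingEnabled:              tracing,
		MetricsEnabled:              metrics,
		EnablePrometheusMetricsPath: prometheus,
	}
}

func main() {
	const endpoint = "localhost:4318" // placeholder endpoint
	cases := []struct {
		name                         string
		tracing, metrics, prometheus bool
		wantNil                      bool
	}{
		{"both signals disabled via global config", false, false, false, true},
		{"tracing re-enabled by CLI flag", true, false, false, false},
		{"metrics re-enabled by CLI flag", false, true, false, false},
		{"only Prometheus metrics path enabled", false, false, true, false},
	}
	for _, c := range cases {
		got := createTelemetryConfigSketch(endpoint, c.tracing, c.metrics, c.prometheus)
		if (got == nil) != c.wantNil {
			fmt.Printf("FAIL %s: nil=%t, want %t\n", c.name, got == nil, c.wantNil)
			continue
		}
		fmt.Printf("ok   %s\n", c.name)
	}
}
```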


codecov bot commented Mar 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.45%. Comparing base (35d2fc0) to head (ad0a788).
⚠️ Report is 15 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4326      +/-   ##
==========================================
- Coverage   68.87%   68.45%   -0.43%     
==========================================
  Files         478      479       +1     
  Lines       48306    48645     +339     
==========================================
+ Hits        33272    33298      +26     
- Misses      12320    12378      +58     
- Partials     2714     2969     +255     


Cover the createTelemetryConfig early-return when both
tracing and metrics are disabled with the endpoint still
configured.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Greg Katz <gkatz@indeed.com>
github-actions bot added and removed the size/S label (Small PR: 100-299 lines changed) Mar 24, 2026
 	cfg := configProvider.GetConfig()

-	if !cfg.OTEL.MetricsEnabled {
+	if cfg.OTEL.MetricsEnabled == nil || !*cfg.OTEL.MetricsEnabled {

I think there's a subtle issue with the unset guard here. It treats "not configured" (nil) and "explicitly set to false" the same way — so if someone runs set-metrics-enabled false and later tries to unset it, they get "already disabled" and the explicit false stays in the config file, still overriding the CLI default.

Should we check just == nil instead? That way unset always removes the entry. Same thing applies to unsetOtelTracingEnabledCmdFunc at line 473.
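The proposed guard, short-circuiting only on nil, can be sketched with a double pointer so one helper serves both fields. `otelConfig` and `unsetBool` are hypothetical, the messages are illustrative, and saving the updated config through the provider is left out.

```go
package main

import "fmt"

// otelConfig is a hypothetical stand-in holding only the two *bool fields.
type otelConfig struct {
	MetricsEnabled *bool
	TracingEnabled *bool
}

// unsetBool clears an optional setting. It is a no-op only when the field
// is already nil, so an explicit false can be removed too. It reports
// whether anything changed (the caller would then persist the config).
func unsetBool(field **bool) bool {
	if *field == nil {
		return false
	}
	*field = nil
	return true
}

func main() {
	f := false
	cfg := otelConfig{MetricsEnabled: &f} // e.g. after set-metrics-enabled false
	if unsetBool(&cfg.MetricsEnabled) {
		fmt.Println("OpenTelemetry metrics enabled setting removed.")
	}
	if !unsetBool(&cfg.MetricsEnabled) {
		fmt.Println("OpenTelemetry metrics enabled is not set.")
	}
}
```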

-	otelUseLegacyAttributes bool) (
-	string, float64, []string, bool, bool, bool) {
+	otelUseLegacyAttributes bool, otelTracingEnabled bool, otelMetricsEnabled bool) (
+	string, float64, []string, bool, bool, bool, bool, bool) {

This is now returning 8 positional values with 5 consecutive bools — it's getting pretty easy to accidentally swap them at call sites. Would you be up for pulling these into a small struct in this PR, or would you rather do that as a follow-up?
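One shape the follow-up could take is a named struct in place of the positional return. In this sketch the struct and method names are hypothetical, and mapping the three original bools to insecure, Prometheus metrics path, and legacy attributes is a guess based on the arguments passed to WithTelemetryConfigFromFlags elsewhere in this PR.

```go
package main

import "fmt"

// resolvedTelemetry is a hypothetical replacement for the 8-value positional
// return. Named fields mean two bools cannot be swapped silently at a call site.
type resolvedTelemetry struct {
	Endpoint                    string
	SamplingRate                float64
	EnvironmentVariables        []string
	Insecure                    bool
	EnablePrometheusMetricsPath bool
	UseLegacyAttributes         bool
	TracingEnabled              bool
	MetricsEnabled              bool
}

// anySignalEnabled reports whether telemetry should be set up at all,
// matching the early-return condition discussed in this PR.
func (r resolvedTelemetry) anySignalEnabled() bool {
	return r.TracingEnabled || r.MetricsEnabled || r.EnablePrometheusMetricsPath
}

func main() {
	r := resolvedTelemetry{
		Endpoint:       "localhost:4318", // placeholder value
		SamplingRate:   0.1,
		TracingEnabled: false,
		MetricsEnabled: false,
	}
	fmt.Println(r.anySignalEnabled()) // false
}
```

Passing a single resolvedTelemetry value to both the middleware config and WithTelemetryConfigFromFlags would also keep the two consumers from diverging.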

// Extract resolved tracing/metrics values from the middleware telemetry config.
// These must match what setupTelemetryConfiguration resolved (with global config
// fallbacks) rather than the raw runFlags values, which ignore global config.
finalTracingEnabled := runFlags.OtelTracingEnabled

Heads up — when both signals get disabled via global config, createTelemetryConfig returns nil, and the fallback here ends up using the raw runFlags values (which default to true). So the resolved "disabled" state gets lost. Right now it doesn't cause problems because extractTelemetryValues(nil) returns an empty endpoint and MaybeMakeConfig bails out too, but it feels fragile.

Maybe we could pass the resolved values from getTelemetryFromFlags to both consumers, or just default to false here when telemetryConfig is nil?
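The second option, defaulting to false when there is no telemetry config, could look roughly like this. `telemetryConfig` and `finalSignals` are hypothetical stand-ins rather than the real types; the point is that a nil config yields false for both signals instead of the raw runFlags defaults of true.

```go
package main

import "fmt"

// telemetryConfig is a hypothetical stand-in for the middleware telemetry
// config; only the two signal switches are modeled.
type telemetryConfig struct {
	TracingEnabled bool
	MetricsEnabled bool
}

// finalSignals derives the tracing/metrics values for the legacy RunConfig
// from the resolved middleware config. A nil config means telemetry was not
// set up, so both signals are reported as false.
func finalSignals(tc *telemetryConfig) (tracing, metrics bool) {
	if tc == nil {
		return false, false
	}
	return tc.TracingEnabled, tc.MetricsEnabled
}

func main() {
	t, m := finalSignals(nil)
	fmt.Println(t, m) // false false
	t, m = finalSignals(&telemetryConfig{TracingEnabled: true})
	fmt.Println(t, m) // true false
}
```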


Labels

size/S Small PR: 100-299 lines changed


Development

Successfully merging this pull request may close these issues.

Global OTel tracing-enabled and metrics-enabled config ignored by thv run

3 participants