fix(tracing): properly shutdown tracer provider#5131
fix(tracing): properly shutdown tracer provider#5131siavashs wants to merge 3 commits intoprometheus:mainfrom
Conversation
📝 WalkthroughWalkthroughManager now serializes tracer-provider install/uninstall and shutdown with a mutex. ApplyConfig swaps global provider to noop when disabling or builds-and-swaps when enabling, then shuts down the previous provider while logging shutdown errors as warnings. Stop also serializes and disables before shutdown. Tests cover enable/disable, reload, and failure preservation. Changes
Sequence Diagram(s)sequenceDiagram
participant Manager as Manager (ApplyConfig / Stop)
participant Builder as buildTracerProvider
participant Global as Global TracerProvider
participant Prev as Previous TracerProvider
Manager->>Manager: lock mtx
alt Enable / Reconfigure
Manager->>Builder: build new tracer provider (returns TP + shutdown)
Builder-->>Manager: new TP + shutdown func
Manager->>Global: swap global TP -> new TP
Manager->>Prev: record previous TP
Manager->>Prev: Prev.Shutdown(ctx) (log warnings on error)
Manager->>Manager: set tracingEnabled=true, set shutdownFunc, update config
else Disable
Manager->>Global: swap global TP -> noop TP
Manager->>Prev: record previous TP
Manager->>Prev: Prev.Shutdown(ctx) (log warnings on error)
Manager->>Manager: set tracingEnabled=false, clear shutdownFunc, update config
end
Manager->>Manager: unlock mtx
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tracing/tracing.go (1)
103-136: Please pin the reload state machine with focused tests.This method now has four intentionally different transitions: disabled→enabled, enabled→enabled, enabled→disabled, and build failure keeping the current provider. A table-driven test around
ApplyConfigwould make future ordering changes here much safer.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tracing/tracing.go` around lines 103 - 136, Add table-driven unit tests for ApplyConfig that pin the four state transitions: disabled→enabled, enabled→enabled, enabled→disabled, and build failure (which must leave the current provider/config intact). For each case, arrange initial state on the manager (tracingEnabled, m.config, m.shutdownFunc) and stub or mock buildTracerProvider to return a new tracer provider+shutdown or an error; then call ApplyConfig and assert the expected outcomes: whether tracingEnabled was toggled, otel tracer provider swapped, m.shutdownFunc set/cleared, old shutdown invoked on swap, config updated on success, and that on build failure the previous provider/config/shutdownFunc remain unchanged. Use table-driven subtests to enumerate cases so future reorderings of ApplyConfig are covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tracing/tracing.go`:
- Around line 125-136: Add a sync.Mutex to the Manager struct and use it to
serialize all reads/writes of m.shutdownFunc and m.config in ApplyConfig and
Stop: in ApplyConfig lock the mutex before reading oldShutdown, setting
otel.SetTracerProvider(tp), setting tracingEnabled, assigning m.shutdownFunc =
shutdownFunc and m.config = cfg, then unlock before calling oldShutdown(); in
Stop lock the mutex while checking and clearing m.shutdownFunc and m.config
(capture the shutdownFunc into a local variable), clear tracingEnabled and reset
provider, unlock, then call the captured shutdownFunc outside the lock to avoid
double shutdown and races; ensure all references to m.shutdownFunc and m.config
in both methods are protected by this mutex.
---
Nitpick comments:
In `@tracing/tracing.go`:
- Around line 103-136: Add table-driven unit tests for ApplyConfig that pin the
four state transitions: disabled→enabled, enabled→enabled, enabled→disabled, and
build failure (which must leave the current provider/config intact). For each
case, arrange initial state on the manager (tracingEnabled, m.config,
m.shutdownFunc) and stub or mock buildTracerProvider to return a new tracer
provider+shutdown or an error; then call ApplyConfig and assert the expected
outcomes: whether tracingEnabled was toggled, otel tracer provider swapped,
m.shutdownFunc set/cleared, old shutdown invoked on swap, config updated on
success, and that on build failure the previous provider/config/shutdownFunc
remain unchanged. Use table-driven subtests to enumerate cases so future
reorderings of ApplyConfig are covered.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a7d5f532-b15c-49c4-b1fd-ed1d86ae3035
📒 Files selected for processing (1)
tracing/tracing.go
Fix tracer provider reloads in different scenarios: - disabled -> enabled: install new tracer provider - enabled -> enabled: keep provider up and swap with new one - enabled -> disabled: shutdown old tracer provider, use noop provider - build failure: keep current provider Signed-off-by: Siavash Safi <siavash@cloudflare.com>
eadfab0 to
fd68082
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (2)
tracing/tracing_test.go (2)
146-147: Consider adding a comment explaining the pointer comparison technique.The
fmt.Sprintf("%p", ...)approach for comparing function pointers works correctly but is non-obvious. A brief comment would help future readers understand why value equality (require.Equal(t, shutdownBefore, m.shutdownFunc)) isn't used (functions aren't directly comparable in Go).📝 Suggested documentation
require.NotNil(t, m.shutdownFunc) - // shutdownFunc should be the same as before the failed apply. + // shutdownFunc should be the same instance as before the failed apply. + // Functions aren't directly comparable in Go, so we compare pointer addresses. require.Equal(t, fmt.Sprintf("%p", shutdownBefore), fmt.Sprintf("%p", m.shutdownFunc))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tracing/tracing_test.go` around lines 146 - 147, Add a brief inline comment above the comparison that explains why fmt.Sprintf("%p", ...) is used to compare function pointers (e.g., functions aren't directly comparable in Go) and that the intent is to verify that shutdownBefore and m.shutdownFunc refer to the same function address; reference the symbols shutdownBefore and m.shutdownFunc and place the comment immediately before the require.Equal call so future readers understand why value equality isn't used.
54-57: Cleanup should shut down the tracer provider to avoid resource leaks.The cleanup sets
m.shutdownFunc = nilwithout calling it first. This leaves the tracer provider's resources (e.g., exporters, connections) uncleaned. Compare withTestTracerProviderShutdownat line 246, which properly callsm.Stop().♻️ Suggested cleanup pattern
t.Cleanup(func() { + m.Stop() tracingEnabled.Store(false) - m.shutdownFunc = nil })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tracing/tracing_test.go` around lines 54 - 57, The cleanup currently just sets tracingEnabled.Store(false) and m.shutdownFunc = nil, leaking tracer resources; change the cleanup to invoke the tracer shutdown path before nil-ing it (call m.Stop() or, if Stop is not exposed, call the stored m.shutdownFunc() and handle its error) so the tracer provider and exporters are properly shut down, then set m.shutdownFunc = nil; reference symbols: tracingEnabled, m.shutdownFunc, and the existing TestTracerProviderShutdown pattern (m.Stop()) to mirror the correct shutdown behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tracing/tracing_test.go`:
- Around line 146-147: Add a brief inline comment above the comparison that
explains why fmt.Sprintf("%p", ...) is used to compare function pointers (e.g.,
functions aren't directly comparable in Go) and that the intent is to verify
that shutdownBefore and m.shutdownFunc refer to the same function address;
reference the symbols shutdownBefore and m.shutdownFunc and place the comment
immediately before the require.Equal call so future readers understand why value
equality isn't used.
- Around line 54-57: The cleanup currently just sets tracingEnabled.Store(false)
and m.shutdownFunc = nil, leaking tracer resources; change the cleanup to invoke
the tracer shutdown path before nil-ing it (call m.Stop() or, if Stop is not
exposed, call the stored m.shutdownFunc() and handle its error) so the tracer
provider and exporters are properly shut down, then set m.shutdownFunc = nil;
reference symbols: tracingEnabled, m.shutdownFunc, and the existing
TestTracerProviderShutdown pattern (m.Stop()) to mirror the correct shutdown
behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 19ef68e4-471c-42b6-a95e-4257e67a7623
📒 Files selected for processing (2)
tracing/tracing.gotracing/tracing_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- tracing/tracing.go
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
3767cc3 to
e3d714a
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tracing/tracing_test.go (1)
54-57: Consider calling the shutdown function before clearing it.The cleanup sets
m.shutdownFunc = nilwithout invoking it, which skips the provider's graceful shutdown. While this likely doesn't cause issues in tests (no actual OTLP endpoint), it would be cleaner to call the shutdown or usem.Stop()to properly release resources.Suggested cleanup pattern
t.Cleanup(func() { - tracingEnabled.Store(false) - m.shutdownFunc = nil + if m.shutdownFunc != nil { + _ = m.shutdownFunc() + } + tracingEnabled.Store(false) + m.shutdownFunc = nil })Or simply:
t.Cleanup(func() { - tracingEnabled.Store(false) - m.shutdownFunc = nil + m.Stop() })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tracing/tracing_test.go` around lines 54 - 57, The test cleanup currently clears tracingEnabled and sets m.shutdownFunc = nil without invoking the shutdown; change the cleanup to call the provider shutdown before clearing it (e.g., if m.shutdownFunc != nil { m.shutdownFunc() } or call m.Stop()), then set m.shutdownFunc = nil and tracingEnabled.Store(false) so the provider can gracefully shut down; reference m.shutdownFunc and m.Stop() in the cleanup logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tracing/tracing.go`:
- Line 127: The error string returned by the fmt.Errorf call that currently
reads "Failed to build a new tracer provider: %w" should start with a lowercase
letter per Go convention; update that fmt.Errorf invocation (the one returning
the "Failed to build a new tracer provider" error in tracing.go) to use "failed
to build a new tracer provider: %w" so the error string is not capitalized.
---
Nitpick comments:
In `@tracing/tracing_test.go`:
- Around line 54-57: The test cleanup currently clears tracingEnabled and sets
m.shutdownFunc = nil without invoking the shutdown; change the cleanup to call
the provider shutdown before clearing it (e.g., if m.shutdownFunc != nil {
m.shutdownFunc() } or call m.Stop()), then set m.shutdownFunc = nil and
tracingEnabled.Store(false) so the provider can gracefully shut down; reference
m.shutdownFunc and m.Stop() in the cleanup logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5ff65672-6c45-420a-b467-57051753cebd
📒 Files selected for processing (2)
tracing/tracing.gotracing/tracing_test.go
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tracing/tracing_test.go`:
- Around line 70-72: The test relies on package-global mutable state
tracingEnabled; before asserting require.False(t, tracingEnabled.Load())
explicitly set the global to a known value (e.g., call
tracingEnabled.Store(false) or the package helper that disables tracing) so the
assertion is independent of test order, and optionally restore the previous
value at test teardown; update the test around the require.False check to
explicitly reset tracingEnabled first and then assert.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1886d06a-d142-4d96-b09b-95bfbe28c09b
📒 Files selected for processing (1)
tracing/tracing_test.go
| // Initially disabled. | ||
| require.False(t, tracingEnabled.Load()) | ||
|
|
There was a problem hiding this comment.
Make initial-state assertion independent from test order.
At Line 71, the test assumes tracingEnabled starts as false, but this is package-global mutable state touched by other tests. Please reset state explicitly in this test before asserting to avoid shuffle/order flakiness.
🔧 Minimal hardening patch
func TestTracingEnabledFlagTransitions(t *testing.T) {
+ tracingEnabled.Store(false)
+ otel.SetTracerProvider(noop.NewTracerProvider())
+ t.Cleanup(func() {
+ tracingEnabled.Store(false)
+ otel.SetTracerProvider(noop.NewTracerProvider())
+ })
+
m := NewManager(promslog.NewNopLogger())
t.Cleanup(m.Stop)
// Initially disabled.
require.False(t, tracingEnabled.Load())📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Initially disabled. | |
| require.False(t, tracingEnabled.Load()) | |
| func TestTracingEnabledFlagTransitions(t *testing.T) { | |
| tracingEnabled.Store(false) | |
| otel.SetTracerProvider(noop.NewTracerProvider()) | |
| t.Cleanup(func() { | |
| tracingEnabled.Store(false) | |
| otel.SetTracerProvider(noop.NewTracerProvider()) | |
| }) | |
| m := NewManager(promslog.NewNopLogger()) | |
| t.Cleanup(m.Stop) | |
| // Initially disabled. | |
| require.False(t, tracingEnabled.Load()) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tracing/tracing_test.go` around lines 70 - 72, The test relies on
package-global mutable state tracingEnabled; before asserting require.False(t,
tracingEnabled.Load()) explicitly set the global to a known value (e.g., call
tracingEnabled.Store(false) or the package helper that disables tracing) so the
assertion is independent of test order, and optionally restore the previous
value at test teardown; update the test around the require.False check to
explicitly reset tracingEnabled first and then assert.
|
LGTM, I specifically like that the tests enforce what the change accomplishes 👍 |
Fix tracer provider reloads in different scenarios:
Pull Request Checklist
Please check all the applicable boxes.
benchstatto compare benchmarksWhich user-facing changes does this PR introduce?
Summary by CodeRabbit
Refactor
Bug Fixes
Tests