Open observation scope and stop on non-FeignException errors#3407

Merged
velo merged 1 commit into OpenFeign:master from seonwooj0810:fix/2566-micrometer-catch-all-exceptions-and-open-scope on Jun 11, 2026

@seonwooj0810

Contributor

Summary

MicrometerObservationCapability had two related defects that left users without correct tracing or metrics when their Feign-instrumented call did anything other than complete successfully or fail with a FeignException:

  1. The new observation was never made current (Feign observation span is not put in "scope" - loggers do not see current spanId #2406). The capability called observation.start() but never observation.openScope(). Any ObservationHandler that looks at the current observation — most notably the tracing handler that copies traceId/spanId into the MDC — saw the parent observation while the Feign call ran. Log lines emitted from inside the call were tagged with the wrong span id, breaking distributed-tracing log correlation.
  2. Only FeignException was caught ([BUG] MicrometerObservationCapability not reporting timeouts #2566). The actual clients Feign delegates to throw client-specific types directly: JDK HttpURLConnection throws SocketTimeoutException / ConnectException, Apache HC5 throws HttpHostConnectException, OkHttp throws IOException. None of those are FeignException. They flew past the catch, so the observation was never stopped and observation.error(...) was never called. As a result the timer was never committed, the trace stayed open, and each failed request quietly leaked an observation context.
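
The second defect can be reproduced in miniature with plain JDK types. The sketch below is an assumption about the general shape of the old code, not a copy of it: `SketchFeignException`, `LeakyObservation`, and `NarrowCatchSketch` are hypothetical stand-ins, and no Feign or Micrometer class is involved.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

/** Hypothetical stand-in for feign.FeignException; not the real class. */
class SketchFeignException extends RuntimeException {
    SketchFeignException(String message) { super(message); }
}

/** Hypothetical stand-in for an observation; records only what the sketch needs. */
class LeakyObservation {
    boolean stopped;
    Throwable error;
}

class NarrowCatchSketch {
    interface Call { void run() throws IOException; }

    /** Assumed pre-fix shape: the observation is finished only on success or on the one expected type. */
    static void oldShape(LeakyObservation observation, Call call) throws IOException {
        try {
            call.run();
            observation.stopped = true;
        } catch (SketchFeignException e) {
            observation.error = e;
            observation.stopped = true;
            throw e;
        }
    }

    /** Runs oldShape with a client that times out and returns the observation's final state. */
    static LeakyObservation timeoutThroughOldShape() {
        LeakyObservation observation = new LeakyObservation();
        try {
            oldShape(observation, () -> { throw new SocketTimeoutException("read timed out"); });
        } catch (IOException expected) {
            // The timeout reaches the caller, but the observation was never touched.
        }
        return observation;
    }
}
```

After `timeoutThroughOldShape()` the returned object has `stopped == false` and `error == null`, which is the leaked state described in item 2.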

Both fixes are small and live in the same handful of lines, so this PR addresses them together.

What changed

In MicrometerObservationCapability:

  • The synchronous and asynchronous enrich paths now open the observation as a Scope (try (Observation.Scope scope = observation.openScope()) { … }). The observation is the current observation while the underlying client runs, so MDC-propagating handlers see the Feign span — not the parent.
  • The catch clause widened from FeignException to Throwable. Anything thrown by the underlying client is now captured by observation.error(...) and the observation is always stopped in finally. On the async path the same widening is applied to the synchronous portion of execute and to the whenComplete callback.
  • The async path now unwraps CompletionException before recording, so the error stamped on the observation is the underlying cause rather than the executor wrapper.
  • finalizeObservation collapsed into the inline logic now that the lifecycle is symmetrical.
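
The synchronous lifecycle described in the bullets above can be sketched as follows. This is a minimal illustration under stated assumptions: `RecordingObservation` and `LifecycleSketch` are hypothetical stand-ins that only log the order of calls, and the placement of `catch` and `finally` on the same try-with-resources statement is a guess at the shape, not the merged code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an observation; logs lifecycle events in order. */
class RecordingObservation {
    final List<String> events = new ArrayList<>();

    /** Hypothetical scope type whose close() declares no checked exception. */
    interface Scope extends AutoCloseable {
        @Override void close();
    }

    RecordingObservation start() { events.add("start"); return this; }

    Scope openScope() {
        events.add("scope-open");
        return () -> events.add("scope-close");
    }

    void error(Throwable t) { events.add("error:" + t.getClass().getSimpleName()); }

    void stop() { events.add("stop"); }
}

class LifecycleSketch {
    interface Call<T> { T run() throws Exception; }

    /** Start, open the scope around the call, record any Throwable, always stop. */
    static <T> T observe(RecordingObservation observation, Call<T> call) throws Exception {
        observation.start();
        try (RecordingObservation.Scope scope = observation.openScope()) {
            return call.run();
        } catch (Throwable t) {
            observation.error(t);   // record the real cause before stop()
            throw t;                // precise rethrow: the caller still sees the original type
        } finally {
            observation.stop();     // runs on success and on every failure
        }
    }

    /** Convenience for the sketch: runs a call and returns the recorded event order. */
    static List<String> eventsFor(Call<?> call) {
        RecordingObservation observation = new RecordingObservation();
        try {
            observe(observation, call);
        } catch (Exception expected) {
            // Swallowed here only so the event log can be inspected.
        }
        return observation.events;
    }
}
```

With a call that throws `SocketTimeoutException`, `eventsFor` returns `start, scope-open, scope-close, error:SocketTimeoutException, stop`. In this particular shape Java closes the resource before the `catch` block runs, so the scope is already closed when the error is recorded.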

Why this shape

Observation.Scope is the documented mechanism in Micrometer for making an observation visible to downstream code — handlers, propagators, baggage. start() only marks the observation as in-flight; without openScope() the registry's getCurrentObservation() keeps returning the parent. Opening the scope inside a try-with-resources keeps the scope tied to the call boundary, so a thrown exception still closes it cleanly and the parent observation is correctly restored afterwards.
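
To make the start() versus openScope() distinction concrete, here is a small model of a "current observation" slot. It is an illustration only: `ScopedObservation` and `ScopeSketch` are hypothetical, and the thread-local slot is an assumption chosen for the sketch rather than a statement about Micrometer's internals.

```java
/** Hypothetical stand-in that models "current observation" as a thread-local slot. */
class ScopedObservation {
    private static final ThreadLocal<ScopedObservation> CURRENT = new ThreadLocal<>();

    final String name;

    ScopedObservation(String name) { this.name = name; }

    /** Returns the observation whose scope is open on this thread, or null. */
    static ScopedObservation current() { return CURRENT.get(); }

    /** Marks the observation as in-flight; deliberately leaves CURRENT alone. */
    ScopedObservation start() { return this; }

    /** Makes this observation current; closing the scope restores the previous one. */
    Scope openScope() {
        ScopedObservation previous = CURRENT.get();
        CURRENT.set(this);
        return () -> {
            if (previous == null) CURRENT.remove(); else CURRENT.set(previous);
        };
    }

    /** Hypothetical scope type whose close() declares no checked exception. */
    interface Scope extends AutoCloseable {
        @Override void close();
    }
}

class ScopeSketch {
    /** Returns the names seen as current: after start(), inside the child scope, after it closes. */
    static String[] trace() {
        ScopedObservation parent = new ScopedObservation("parent");
        ScopedObservation child = new ScopedObservation("feign");
        String[] seen = new String[3];
        try (ScopedObservation.Scope parentScope = parent.openScope()) {
            child.start();
            seen[0] = ScopedObservation.current().name;       // start() alone: still the parent
            try (ScopedObservation.Scope childScope = child.openScope()) {
                seen[1] = ScopedObservation.current().name;   // scope open: the child
            }
            seen[2] = ScopedObservation.current().name;       // scope closed: parent restored
        }
        return seen;
    }
}
```

`ScopeSketch.trace()` yields `parent`, `feign`, `parent`, and `ScopedObservation.current()` is null once the outer scope has closed.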

Catching Throwable rather than Exception is deliberate: the contract here is "the observation must be closed no matter what happens inside this block." finally { observation.stop(); } is what makes that true, and catch (Throwable) is what makes sure observation.error(...) gets the right cause attached before stop.
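
The difference between the two catch widths is a plain Java language fact and can be checked without any library. `CatchWidthSketch` below is a hypothetical helper written for this illustration.

```java
class CatchWidthSketch {
    /** Returns true if a catch (Exception) clause would see the given failure. */
    static boolean seenByCatchException(Throwable failure) {
        try {
            throw failure;
        } catch (Exception e) {
            return true;    // IOException, RuntimeException, ...
        } catch (Throwable t) {
            return false;   // Errors such as StackOverflowError land only here
        }
    }
}
```

An `IOException` is seen by `catch (Exception)`, while a `StackOverflowError` or `AssertionError` is not, so only `catch (Throwable)` guarantees that every failure can be passed to the error-recording call before `finally` runs.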

Tests

Added MicrometerObservationCapabilityTest (uses TestObservationRegistry from micrometer-observation-test, already on the classpath via micrometer-test) covering:

  • scopeIsOpenDuringClientExecution: observationRegistry.getCurrentObservation() returns the Feign observation while the underlying Client runs, and is null again afterwards. Locks in the fix for Feign observation span is not put in "scope" - loggers do not see current spanId #2406.
  • recordsNonFeignExceptionThrownByClient / recordsRuntimeExceptionThrownByClient — when the Client throws SocketTimeoutException / RuntimeException, the observation is stopped and the underlying error is recorded. Locks in [BUG] MicrometerObservationCapability not reporting timeouts #2566. (Retryer.NEVER_RETRY is used so the test asserts on the single observation produced, not on retry copies.)
  • asyncScopeIsOpenDuringClientExecution — same for the async path.
  • asyncRecordsNonFeignExceptionFromFailedFuture — a future completing exceptionally with IOException records the original IOException on the observation (not a CompletionException wrapper).
  • asyncRecordsSynchronousExceptionFromClient — synchronous throws from inside AsyncClient.execute also stop the observation.
  • parentObservationIsRestoredAfterCall — confirms the scope correctly returns to the caller's observation, so we aren't accidentally clobbering ambient tracing context.
  • scopeReceivesObservationCarryingFeignContext — verifies the FeignContext is the context delivered to ObservationHandler#onScopeOpened, which is what tracing handlers actually read.
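
The CompletionException unwrapping that the async tests assert on can be illustrated with the JDK alone. In the sketch below, `UnwrapSketch`, `unwrap`, and `observeFailure` are hypothetical names; it assumes the wrapper appears because the callback is attached to a dependent stage, which is one common way a CompletionException reaches a `whenComplete` callback.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.atomic.AtomicReference;

class UnwrapSketch {
    /** Returns the cause of a CompletionException when present, else the throwable itself. */
    static Throwable unwrap(Throwable t) {
        return (t instanceof CompletionException && t.getCause() != null) ? t.getCause() : t;
    }

    /** Returns {raw, unwrapped}: what a dependent stage's callback receives, before and after unwrapping. */
    static Throwable[] observeFailure(Throwable failure) {
        CompletableFuture<String> source = new CompletableFuture<>();
        AtomicReference<Throwable> raw = new AtomicReference<>();
        // thenApply creates a dependent stage; the JDK wraps the source failure in CompletionException.
        source.thenApply(String::trim).whenComplete((value, error) -> raw.set(error));
        // Completing the source runs the registered callbacks on this thread before returning.
        source.completeExceptionally(failure);
        return new Throwable[] { raw.get(), unwrap(raw.get()) };
    }
}
```

Calling `observeFailure(new IOException("boom"))` gives a CompletionException as the raw value and the original IOException after unwrapping, which is the value an error-recording call would want.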

Existing tests still pass: mvn -pl micrometer test → 26 tests, 0 failures across MicrometerCapabilityTest, MicrometerObservationRegistryCapabilityTest, FeignHeaderInstrumentationTest, and the new class.

Fixes #2406
Fixes #2566

`MicrometerObservationCapability` started an `Observation` but never opened
its `Scope`, so the new observation was not the current one while the call
ran. Downstream `ObservationHandler`s — most importantly, tracing handlers
that copy trace/span ids into the MDC — saw the parent observation instead
of the Feign one, so log lines for the outgoing call were tagged with the
wrong span id.

The capability also caught only `FeignException`. Underlying clients
(JDK `HttpURLConnection`, Apache HC5, OkHttp, …) throw client-specific
exceptions — `SocketTimeoutException`, `HttpHostConnectException`,
`IOException` — directly, never wrapped as `FeignException`. Those slipped
past the `catch`, so the observation was never stopped: the error was not
recorded, the timer was never committed, and the observation leaked.

Both the synchronous and asynchronous paths now open the observation as a
`Scope` while the underlying client runs, catch any `Throwable` to record
the error and stop the observation, and unwrap `CompletionException` on the
async path so the recorded error is the cause rather than the wrapper.

Fixes OpenFeign#2406
Fixes OpenFeign#2566

@velo velo left a comment

Member

The production change is covered by focused synchronous and asynchronous observation tests, preserves the public API, and introduces no security concerns. Required checks are green. Thanks!

@velo velo merged commit 16cf312 into OpenFeign:master Jun 11, 2026
3 checks passed