core, opentelemetry: Implement Attempt-Level RPC Delay Observability (Proposal A121)#12807
core, opentelemetry: Implement Attempt-Level RPC Delay Observability (Proposal A121)#12807AgraVator wants to merge 16 commits into
Conversation
This commit implements the plumbing required to propagate delay reason tokens from load balancing policies up to the transport layer and tracers, as specified in the LB policy delay design.
| connectivityState = newState; | ||
| picker = newPicker; | ||
| if (newState == CONNECTING || newState == IDLE) { | ||
| picker = new PriorityPicker(newPicker, priority); |
There was a problem hiding this comment.
appends "px:" to the child delay_type
shivaspeaks
left a comment
There was a problem hiding this comment.
Quick review, LGTM overall. I'll take a deeper look with implementation doc when I'll be back in office.
This change looks to be something that should be consistent in all the languages. Is there a gRFC baking for this? If so link that PR in description?
| PickResult childResult = delegate.pickSubchannel(args); | ||
| if (!childResult.hasResult() && childResult.getDelayReasonToken() != null) { | ||
| return PickResult.withNoResult( | ||
| "priority_" + priority + ":" + childResult.getDelayReasonToken()); |
There was a problem hiding this comment.
A question to understand this better from performance perspective.
This string concatenation happens on the hot path for every buffered RPC. If the priority tree is deep or policies are nested, this may lead to,
Allocation Overhead- repeated string and PickResult allocations on every pickSubchannel call.
Metric Cardinality- These nested tokens (e.g., priority_p0:priority_p1:ring_hash:connecting) are used as metric labels. Highly nested tokens can cause a cardinality explosion.
Is there a way we can cache the concatenated PickResult in the PriorityPicker (if the child's result is also cached/static) to avoid per-pick allocations? I assume we need the new childResult's DelayReasonToken, I'm not sure if that stays static or is dynamic. If it stays static then we can move out or else we should at least create "priority_" + priority + ":" + statically?
There was a problem hiding this comment.
"delayType" will be very limited, only priorities will be appended and that will lead to some cardinality but not too much.
The delayReason will be dynamic
The gRFC was raised today itself, have attached the same to the description.
…dence invariants - Refactor ClientStreamTracer to expose delayTypeStarted(String) and delayReasonAttached(String) - Enhance PickResult with separate delayType and delayReason diagnostic fields - Implement Mark Roth's hybrid telemetry cadence model in DelayedClientTransport.PendingStream - Support channel fallback delay states (client_channel_init, subchannel_state_mismatch, wait_for_ready_failed) - Simplify leaf and container LB policies to emit canonical unified connecting metric labels
# Conflicts: # core/src/main/java/io/grpc/internal/PickFirstLeafLoadBalancer.java
…tive span retention per A121 spec
…etadata across pass-through containers and Category C scenarios
…hannel: waiting for picker' per gRFC spec
…ests with concrete Fake implementations
…an verification and simplify AttributeKey references
This PR implements Attempt-Level RPC Delay Observability across the core channel transport, built-in load balancers, xDS policies, and the OpenTelemetry telemetry plugin, fully aligned with gRPC Proposal A121.
(Note: Name resolution / Call-Level channel initialization delay observability is cleanly isolated in a dedicated companion PR).
What Changed
ClientStreamTracer): Added attempt-scoped queuing delay observability hooks:recordAttemptDelayStart(String delayType, String delayReason)recordAttemptDelayReasonChanged(String delayReason)recordAttemptDelayEnd()LoadBalancer.PickResult): Added canonical low-cardinalitydelayTypeand high-cardinality diagnosticdelayReasonfields, along with factory methodwithNoResult(String delayType, String delayReason).DelayedClientTransport): UpdatedPendingStreamto track active delay categorization and diagnostic strings. ImplementedupdateDelay(...)enforcing Active Span Retention: whendelayTyperemains identical across LB state transitions (e.g., priority policy failover), the active segment continues without stopwatch reset or span re-creation.OpenTelemetry*Module): Implemented duration recording and tracing:grpc.client.attempt.delay.durationhistogram (seconds)."Attempt Delay"child tracing spans carryinggrpc.delay_type."Delay state transition"span events carrying explicitgrpc.delay_typeandgrpc.delay_reasonattributes.GRPC_EXPERIMENTAL_ENABLE_DELAY_OBSERVABILITYfeature flag (defaultfalse).PickFirst/RoundRobin: Emits canonical"connecting"delay types and diagnostic reasons when buffering picks.RingHash/RLS/CDS: Emits explicit control-plane and resolution delay attributes ("rls_lookup_pending","cds_dynamic_discovery").PriorityLoadBalancer: Composes child delay reasons withpX:prefixes (p0:connecting,p1:connecting) to clearly trace priority failover transitions.