[fix][broker] Remove lock contention in delayed delivery stats read paths #25990
nodece wants to merge 1 commit into
Is this in a single topic and single subscription? FYI, there's a limit of tracking up to 30M backlogs (with default BK |
Yes.
I will check this later.
7795eec to c8f77d9
a86ce4a to bdfc915
  immutableBucket.setSnapshotSegments(null);
- immutableBucket.asyncUpdateSnapshotLength();
+ immutableBucket.asyncUpdateSnapshotLength()
+         .thenRun(() -> immutableBuckets.recomputeCounters());
This calls immutableBuckets.recomputeCounters() from an async callback without the tracker lock, while other paths mutate the same TreeRangeMap under that lock. TreeRangeMap is not thread-safe, so this can race with put/remove/clear and produce incorrect counters or fail during concurrent modification.
This removes entries through the mutable asMapOfRanges() view, bypassing ImmutableBucketIndex.remove(). The cached count and totalSnapshotLength can stay stale after clearDelayedMessages(), so later metrics and decisions based on immutableBuckets.count() can observe an inconsistent index state.
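The stale-counter risk described in this comment can be illustrated with a minimal sketch. It assumes a plain java.util.TreeMap as a stand-in for the range map, and CachedLengthIndex, rawView(), and actualTotal() are hypothetical names invented for illustration; this is not the PR's ImmutableBucketIndex.

```java
import java.util.Iterator;
import java.util.TreeMap;

// Hypothetical stand-in for an index that caches an aggregate next to its map.
// Assumption: a TreeMap<Long, Long> (key -> snapshot length) replaces the real range map.
final class CachedLengthIndex {
    private final TreeMap<Long, Long> lengths = new TreeMap<>();
    private long cachedTotal; // only put() and remove() keep this in sync

    void put(long key, long length) {
        Long previous = lengths.put(key, length);
        cachedTotal += length - (previous == null ? 0L : previous);
    }

    void remove(long key) {
        Long removed = lengths.remove(key);
        if (removed != null) {
            cachedTotal -= removed;
        }
    }

    // Exposes the live, mutable map: removals made through it skip the bookkeeping above.
    TreeMap<Long, Long> rawView() {
        return lengths;
    }

    // Clears every entry through the view's iterator, the pattern the review flags.
    void clearThroughView() {
        Iterator<Long> it = rawView().values().iterator();
        while (it.hasNext()) {
            it.next();
            it.remove();
        }
    }

    long cachedTotal() {
        return cachedTotal;
    }

    // Recomputes the aggregate from the map, for comparison with the cached value.
    long actualTotal() {
        long sum = 0L;
        for (long length : lengths.values()) {
            sum += length;
        }
        return sum;
    }
}
```

After two put() calls followed by clearThroughView(), the map is empty but cachedTotal() still reports the old sum, which is the inconsistency the comment warns about. Routing the clear through remove(), or resetting the cached fields in the same step, would keep the two in agreement.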
At a higher level, moving stats reads away from the dispatcher lock looks like the right direction for this PR. The remaining concern is making sure the newly unsynchronized stats paths do not read or mutate tracker internals without a clear concurrency boundary. One possible short-term approach is to use the tracker lock for the small fixed-size stats sections and unsafe |
bdfc915 to 30b06a0
Motivation
Under a large delayed-delivery workload (~500M delayed messages), jstack analysis
shows the dispatcher lock is held while sealing delayed-delivery buckets.
A broker worker thread spends significant CPU time in:
while holding both:
As a result, stats collection threads (Prometheus, admin APIs, getStatsAsync)
are blocked waiting for the dispatcher monitor.
The same pattern affects InMemoryDelayedDeliveryTracker, where getBufferMemoryUsage() iterates the entire TreeMap<Long, TreeMap<Long, Roaring64Bitmap>> while holding the dispatcher lock.
Modifications
1. BucketDelayedDeliveryTracker: lock-free bucket stats via AtomicLong counters
Introduce two AtomicLong counters (bucketsCount and totalSnapshotLengthBytes), maintained alongside the existing TreeRangeMap<Long, ImmutableBucket>:
- putBucket(), removeBucket(), and updateBucketSnapshotLength() that wrap TreeRangeMap operations and update counters atomically
- putBucket() calculates removed bucket lengths before insertion (since TreeRangeMap.put() silently removes/splits overlapping entries) and updates counters by delta
- removeBucket() decrements counters only when removal succeeds
- updateBucketSnapshotLength() updates counters when snapshot length changes asynchronously
- In recoverBucketSnapshot(), recalculate counters after all snapshots are loaded to ensure accuracy (since buckets are created with snapshotLength=0 initially)
- genTopicMetricMap() reads counters without holding any lock
2. InMemoryDelayedDeliveryTracker: lock-free memory usage via delta tracking
Add AtomicLong memoryUsage that is updated by delta at each mutation point (addMessage, getScheduledMessages, clear). getBufferMemoryUsage() returns the cached value directly instead of iterating the nested TreeMap.
3. Dispatcher classes: remove synchronized from stats read paths
In both PersistentDispatcherMultipleConsumers and PersistentDispatcherMultipleConsumersClassic:
- getNumberOfDelayedMessages(), getDelayedTrackerMemoryUsage(), getBucketDelayedIndexStats(), shouldPauseDeliveryForDelayTracker(): removed synchronized; now use Optional.map() with the volatile field
- delayedDeliveryTracker field changed to volatile for safe publication
4. BucketDelayedMessageIndexStats: synchronized metric map generation
Add synchronized to genTopicMetricMap() to ensure atomic reads of multiple fields when generating metrics.
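The delta-based counters described in items 1 and 2 can be sketched roughly as follows. This is only an illustration under stated assumptions: a java.util.TreeMap keyed by a single long stands in for TreeRangeMap<Long, ImmutableBucket>, the stored value is just a snapshot length in bytes, and the class name CountedBucketIndex is hypothetical. The range-splitting behavior of TreeRangeMap.put() is not modeled.

```java
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: writers mutate the map under a lock and adjust counters by delta;
// readers only touch the AtomicLong counters and never take the lock.
final class CountedBucketIndex {
    private final TreeMap<Long, Long> snapshotLengths = new TreeMap<>();
    private final AtomicLong bucketsCount = new AtomicLong();
    private final AtomicLong totalSnapshotLengthBytes = new AtomicLong();

    synchronized void putBucket(long key, long lengthBytes) {
        Long replaced = snapshotLengths.put(key, lengthBytes);
        if (replaced == null) {
            bucketsCount.incrementAndGet();
            totalSnapshotLengthBytes.addAndGet(lengthBytes);
        } else {
            // An existing entry was displaced: apply only the difference.
            totalSnapshotLengthBytes.addAndGet(lengthBytes - replaced);
        }
    }

    // Decrements the counters only when an entry was actually removed.
    synchronized boolean removeBucket(long key) {
        Long removed = snapshotLengths.remove(key);
        if (removed == null) {
            return false;
        }
        bucketsCount.decrementAndGet();
        totalSnapshotLengthBytes.addAndGet(-removed);
        return true;
    }

    synchronized void updateBucketSnapshotLength(long key, long newLengthBytes) {
        Long old = snapshotLengths.replace(key, newLengthBytes);
        if (old != null) {
            totalSnapshotLengthBytes.addAndGet(newLengthBytes - old);
        }
    }

    // Full recalculation, for example after a recovery pass; runs under the same lock.
    synchronized void recomputeCounters() {
        long total = 0L;
        for (long length : snapshotLengths.values()) {
            total += length;
        }
        bucketsCount.set(snapshotLengths.size());
        totalSnapshotLengthBytes.set(total);
    }

    // Lock-free read paths for stats collection.
    long bucketsCount() {
        return bucketsCount.get();
    }

    long totalSnapshotLengthBytes() {
        return totalSnapshotLengthBytes.get();
    }
}
```

One design caveat of this shape: the two counters are read independently, so a reader can observe a count and a total taken at slightly different moments. Where a mutually consistent pair matters, a short critical section around the read is one option.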
5. ImmutableBucket: fix async chain in recover
Fix asyncRecoverBucketSnapshotEntry() to properly chain asyncUpdateSnapshotLength() using thenCompose() instead of thenApply(), ensuring the snapshot length is updated before recovery completes.
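The difference between thenApply() and thenCompose() that item 5 relies on can be shown with a small, self-contained sketch. RecoverChainSketch and its methods are hypothetical and only mimic the shape of the recover chain; the real asyncRecoverBucketSnapshotEntry() and asyncUpdateSnapshotLength() signatures are not reproduced here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of an async step chained onto a recovery future.
final class RecoverChainSketch {
    private final AtomicLong snapshotLength = new AtomicLong();
    // Stays incomplete until finishLengthLookup() is called, simulating slow I/O.
    private final CompletableFuture<Long> pendingLengthLookup = new CompletableFuture<>();

    // Assumed async step: completes only once the length lookup has finished.
    CompletableFuture<Void> asyncUpdateSnapshotLength() {
        return pendingLengthLookup.thenAccept(snapshotLength::set);
    }

    // thenApply wraps the inner future: the outer stage is done as soon as the
    // lambda returns, regardless of whether the length update has finished.
    CompletableFuture<CompletableFuture<Void>> recoverWithThenApply(CompletableFuture<String> entries) {
        return entries.thenApply(ignored -> asyncUpdateSnapshotLength());
    }

    // thenCompose flattens the inner future: the returned stage completes only
    // after the length update completes.
    CompletableFuture<Void> recoverWithThenCompose(CompletableFuture<String> entries) {
        return entries.thenCompose(ignored -> asyncUpdateSnapshotLength());
    }

    void finishLengthLookup(long bytes) {
        pendingLengthLookup.complete(bytes);
    }

    long snapshotLength() {
        return snapshotLength.get();
    }
}
```

With an already completed entries future, the thenApply variant reports isDone() immediately while snapshotLength() is still 0. The thenCompose variant stays pending until finishLengthLookup() runs, so a caller waiting on it never observes the stale length.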
Verifying this change
BucketDelayedDeliveryTrackerTest pass across create, merge, trim, and recover operations