[ISSUE #10065] Fix inconsistent lock/unlock timeout and improve expired lock handling in ReceiptHandleGroup#10255
Open
Wubabalala wants to merge 2 commits intoapache:developfrom
Open
[ISSUE #10065] Fix inconsistent lock/unlock timeout and improve expired lock handling in ReceiptHandleGroup#10255Wubabalala wants to merge 2 commits intoapache:developfrom
Wubabalala wants to merge 2 commits intoapache:developfrom
Conversation
…and check expiry before tryAcquire in ReceiptHandleGroup 1. Align unlock() threshold from *2 to *3 to match lock()'s force-acquire threshold, eliminating a deadlock window of duration T in [2T, 3T) where the semaphore permit is leaked and cannot be force-acquired. 2. Check lock expiry before calling tryAcquire() so that expired locks are force-acquired immediately without blocking for the full timeout. This improves throughput when the queue has accumulated expired items. 3. Extract tryForceAcquire() to avoid duplicating the double-checked locking block between the fast path and the slow path. 4. Add two tests covering the aligned threshold behavior and the pre-check fast path. Fixes apache#10065
…en threshold test - Skip tryForceAcquire when lastLockTimeMs is -1 (never locked), preventing the pre-check from treating uninitialized state as expired and bypassing the semaphore on first lock() call. - Widen the threshold test window (T=200ms, sleep=450ms, lock timeout=10ms) so it cannot accidentally pass via force-acquire after tryAcquire timeout.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ReceiptHandleGroup.HandleDatauses different timeout multipliers forlock()(*3) andunlock()(*2), creating a deadlock window of duration T in the range[2T, 3T):2T(e.g., slow network I/O)unlock()— since elapsed >T*2, it skipssemaphore.release(), leaking the permitlock()—tryAcquire()fails (0 permits), and force-acquire requires elapsed >T*3, which isn't met yetHandleDatathrowProxyException(INTERNAL_SERVER_ERROR)until3TelapsesWith the default
lockTimeoutMsInHandleGroup = 3000ms, this is a 3-second window where the semaphore is stuck.Changes
unlock()threshold from*2to*3to matchlock()'s force-acquire threshold, eliminating the gaptryAcquire()so that expired locks are force-acquired immediately without blocking for the full timeout (addresses the second point in the issue)tryForceAcquire()to deduplicate the double-checked locking block between the fast path and slow pathTest plan
testUnlockReleasesWithinAlignedThreshold— verifies semaphore is properly released when elapsed is in[2T, 3T)(failed with old*2threshold)testLockForceAcquiresBeforeTryAcquire— verifies expired lock is force-acquired in <50ms instead of blocking for the full timeout (old code: 111ms; new code: <5ms)ReceiptHandleGroupTesttests pass (no regression)Fixes #10065