threading: lock-free fast path for SemaphoreSlim.WaitAsync #125452

Open
thomhurst wants to merge 16 commits into dotnet:main from thomhurst:semaphoreslim-cas

Conversation

@thomhurst

Use a lock-free CAS fast path in SemaphoreSlim.WaitAsync to skip the Monitor lock when a permit is immediately available, improving uncontended throughput.

Copilot AI review requested due to automatic review settings March 11, 2026 18:18
@dotnet-policy-service dotnet-policy-service Bot added the community-contribution Indicates that the PR has been added by a community member label Mar 11, 2026
@thomhurst
Author

@EgorBot -intel -amd -arm

  using System.Threading;
  using System.Threading.Tasks;
  using BenchmarkDotNet.Attributes;

  [MemoryDiagnoser]
  public class SemaphoreSlimUncontended
  {
      private SemaphoreSlim _sem = new SemaphoreSlim(1, 1);

      [Benchmark]
      public async Task WaitAsync_Release()
      {
          await _sem.WaitAsync();
          _sem.Release();
      }
  }

Contributor

Copilot AI left a comment

Pull request overview

This PR introduces a lock-free fast path in SemaphoreSlim.WaitAsync that attempts to acquire an available permit via CAS, avoiding taking m_lockObjAndDisposed when uncontended.

Changes:

  • Added a CAS-based fast path to decrement m_currentCount when a permit appears immediately available.
  • Added special-case handling to keep AvailableWaitHandle state consistent if it’s initialized concurrently during the fast-path acquire.
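
The fast path summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `ToySemaphore`, `_count`, `_waiters`, and `TryAcquireFast` are hypothetical names, the retry-on-lost-race loop is an assumption, and the locked slow path and wait-handle handling are omitted.

```csharp
using System.Threading;

// Hypothetical toy type for illustration; not the actual SemaphoreSlim implementation.
public sealed class ToySemaphore
{
    private int _count;    // permits currently available
    private int _waiters;  // stand-in for "a waiter is queued on the slow path"

    public ToySemaphore(int initialCount) => _count = initialCount;

    public int CurrentCount => Volatile.Read(ref _count);

    // Attempts to take one permit without entering any lock.
    // Returns false when the caller should fall back to the locked slow path.
    public bool TryAcquireFast()
    {
        while (true)
        {
            int observed = Volatile.Read(ref _count);

            // No permit available, or queued waiters that must be served first.
            if (observed <= 0 || Volatile.Read(ref _waiters) != 0)
            {
                return false;
            }

            // Store observed - 1 only if the count is still the value we read.
            if (Interlocked.CompareExchange(ref _count, observed - 1, observed) == observed)
            {
                return true;
            }

            // Another thread changed the count first; re-read and try again.
        }
    }

    // Returns one permit (simplified: no waiter wake-up, no max-count check).
    public void Release() => Interlocked.Increment(ref _count);
}
```

With an initial count of 1, the first `TryAcquireFast()` call succeeds and the second returns false until `Release()` is called.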

@EgorBo
Member

EgorBo commented Mar 11, 2026

Doesn't lock itself have fast paths for that?

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Copilot AI review requested due to automatic review settings March 11, 2026 19:36
@thomhurst
Author

@EgorBot -intel -amd -arm

  using System.Threading;
  using System.Threading.Tasks;
  using BenchmarkDotNet.Attributes;

  [MemoryDiagnoser]
  public class SemaphoreSlimUncontended
  {
      private SemaphoreSlim _sem = new SemaphoreSlim(1, 1);

      [Benchmark]
      public async Task WaitAsync_Release()
      {
          await _sem.WaitAsync();
          _sem.Release();
      }
  }

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Comment thread src/libraries/System.Private.CoreLib/src/System/Threading/SemaphoreSlim.cs Outdated
@thomhurst
Author

@EgorBo I think just by not entering the lock we can save some time: EgorBot/Benchmarks#31

Use Interlocked.Add to apply a relative delta to m_currentCount rather
than writing back an absolute snapshot-derived value, so concurrent
lock-free decrements from the WaitAsync fast path are not overwritten.

Replace plain --m_currentCount with a CAS loop to prevent a double grant
when the lock-free WaitAsync fast path decrements m_currentCount between
the > 0 check and the decrement in the slow path.

WaitCore is safe because m_waitCount++ on lock entry blocks the CAS guard
for its entire critical section. WaitAsyncCore has no such protection.

Apply the same CAS-loop pattern to WaitCore's m_currentCount decrement
that was applied to WaitAsyncCore in the previous commit. A fast-path
thread that read m_waitCount = 0 before WaitCore's m_waitCount++ can
still race with WaitCore's check-at-404 / decrement-at-407 sequence.
The CAS loop serializes both operations on m_currentCount atomically.
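
A rough sketch of the two count updates these commit messages describe, under stated assumptions: `CountOps`, `TryDecrementCount(ref int)`, and `ReleasePermits` are hypothetical stand-ins written for illustration, and the real methods operate on the m_currentCount field rather than a `ref` parameter.

```csharp
using System.Threading;

// Hypothetical helpers for illustration; not copied from SemaphoreSlim.cs.
public static class CountOps
{
    // CAS loop: takes one permit only if the count is still positive at the
    // moment of the exchange, so the "> 0" check and the decrement cannot be
    // split by a concurrent lock-free decrement.
    public static bool TryDecrementCount(ref int count)
    {
        int observed = Volatile.Read(ref count);
        while (observed > 0)
        {
            int previous = Interlocked.CompareExchange(ref count, observed - 1, observed);
            if (previous == observed)
            {
                return true; // our decrement was applied
            }

            observed = previous; // lost the race; retry against the newer value
        }

        return false; // no permit left to take
    }

    // Relative update: adds a delta instead of storing a value derived from an
    // earlier snapshot, so concurrent decrements are not overwritten.
    // Returns the count after the addition.
    public static int ReleasePermits(ref int count, int releaseCount)
        => Interlocked.Add(ref count, releaseCount);
}
```

The design point in both helpers is the same: every writer modifies the count through an atomic read-modify-write, so no writer depends on a value it read earlier still being current.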
… stress test

The assert !waitSuccessful || m_currentCount > 0 in WaitCore could fire
spuriously in Debug builds: the lock-free WaitAsync fast path runs outside
the lock, so it can decrement m_currentCount to 0 between
WaitUntilCountOrTimeout returning and the assert executing.

Adds a stress test that races AvailableWaitHandle lazy initialization
against WaitAsync fast-path acquires and verifies the handle is never
signaled when CurrentCount == 0.
Copilot AI review requested due to automatic review settings April 4, 2026 14:15
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread src/libraries/System.Threading/tests/SemaphoreSlimTests.cs Outdated
…phoreSlim

Also fix CS0420 in Release(): Volatile.Read(ref volatile_field) triggers a
compiler error in the coreclr project build; replaced with a plain field read
(already volatile) so the testhost can be rebuilt with the fixed implementation.
Copilot AI review requested due to automatic review settings April 4, 2026 16:47
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread src/libraries/System.Threading/tests/PerformanceTests/SemaphoreSlimBenchmarks.cs Outdated
Comment thread src/libraries/System.Threading/tests/SemaphoreSlimTests.cs Outdated
- Extract duplicated CAS-decrement loop into TryDecrementCount() with
  AggressiveInlining, replacing inline copies in WaitCore and WaitAsyncCore
- Strengthen Assert.InRange to Assert.Equal in NeverUnderflows test
- Add bulk Release(2) concurrent stress test for Interlocked.Add delta math
- Add cancellation-during-fast-path stress test for count integrity
- Use m_currentCount (post-Add) instead of netCount for m_waitHandle.Set()
- Add UncontendedSync and MixedSyncAsync benchmarks for sync path coverage

Prevents the background task from pegging a CPU core in CI while still
exercising concurrent lazy initialization of the wait handle.
Copilot AI review requested due to automatic review settings April 4, 2026 17:20
@thomhurst
Author

@EgorBot -intel -amd -arm

  using System.Threading;
  using System.Threading.Tasks;
  using BenchmarkDotNet.Attributes;

  [MemoryDiagnoser]
  public class SemaphoreSlimUncontended
  {
      private SemaphoreSlim _sem = new SemaphoreSlim(1, 1);

      [Benchmark]
      public async Task WaitAsync_Release()
      {
          await _sem.WaitAsync();
          _sem.Release();
      }
  }

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread src/libraries/System.Threading/tests/SemaphoreSlimTests.cs Outdated
Comment thread src/libraries/System.Threading/tests/PerformanceTests/SemaphoreSlimBenchmarks.cs Outdated
@JulieLeeMSFT
Member

@thomhurst, please address the Copilot feedback.

@JulieLeeMSFT JulieLeeMSFT added the needs-author-action An issue or pull request that requires more info or actions from the author. label Apr 27, 2026
… file

WaitAsync(CancellationToken) returns Task, not Task<bool>; the prior
declaration didn't compile. Removed PerformanceTests/SemaphoreSlimBenchmarks.cs
since it had no csproj and wasn't wired into any project; runtime perf
benchmarks live in dotnet/performance, and the EgorBot inline benchmarks
posted on the PR cover the relevant scenarios.
@thomhurst
Author

thomhurst commented Apr 27, 2026

@JulieLeeMSFT Pushed 526f1c4 to address the remaining Copilot threads

@dotnet-policy-service dotnet-policy-service Bot removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Apr 27, 2026
The comment narrated what the next several lines do; the variable
name and surrounding structure already convey it.

Address review hazards in the WaitAsync CAS fast path:

- Make m_waitCount and m_asyncHead volatile. The fast path reads them without
  the lock; sync-waiter writes inside the lock must publish via
  release semantics rather than depending on the lock release that the
  fast path bypasses. Without this, ARM64 can let the fast path observe
  m_waitCount == 0 while a sync waiter is parked, stealing the slot and
  leaving the waiter blocked.

- Restructure AvailableWaitHandle init to publish-then-reflect:
  publish the handle unsignaled, full barrier, then conditionally Set
  based on the post-publish count read. Closes the race where
  ManualResetEvent allocation overlapped a fast-path CAS, leaving the
  handle Set with count == 0.

- WaitCore: loop instead of falling through with a stale waitSuccessful
  when TryDecrementCount loses to a fast-path acquirer. Fixes the case
  where Wait(Infinite) could return without owning a permit (silently
  dropped by the void overload, lying about acquisition for bool overloads).

- Release: use Interlocked.Add's return value for the MRE.Set sentinel
  so a fast-path decrement racing between the Add and the re-read
  doesn't mask the 0 -> positive transition.

- Strengthen the AvailableWaitHandle init test: allocate a fresh
  SemaphoreSlim per iteration so each iteration is a real attempt at
  the race. With a single semaphore the race only fires on the first
  AvailableWaitHandle access.
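
The publish-then-reflect ordering and the use of Interlocked.Add's return value from the list above might look roughly like this. It is a simplified sketch under stated assumptions: `HandleHolder` and its members are hypothetical, publication uses a CAS here rather than whatever locking the real code uses, and the acquire-side Reset and all waiter bookkeeping are left out.

```csharp
using System.Threading;

// Hypothetical toy type for illustration; not the actual SemaphoreSlim implementation.
public sealed class HandleHolder
{
    private int _count;
    private ManualResetEvent _handle; // lazily created; null until first requested

    public HandleHolder(int initialCount) => _count = initialCount;

    public WaitHandle AvailableWaitHandle
    {
        get
        {
            ManualResetEvent existing = Volatile.Read(ref _handle);
            if (existing != null)
            {
                return existing;
            }

            // Step 1: create the handle unsignaled and publish it.
            var created = new ManualResetEvent(false);
            existing = Interlocked.CompareExchange(ref _handle, created, null);
            if (existing != null)
            {
                created.Dispose(); // another thread published first
                return existing;
            }

            // Step 2: the CompareExchange above acts as a full barrier, so the
            // count read below cannot be reordered before the publication.
            // Step 3: reflect the count as observed after publishing.
            if (Volatile.Read(ref _count) > 0)
            {
                created.Set();
            }

            return created;
        }
    }

    public void Release(int releaseCount)
    {
        // Derive the previous count from Add's return value instead of
        // re-reading the field, which a racing decrement could have changed.
        int after = Interlocked.Add(ref _count, releaseCount);
        if (after - releaseCount == 0)
        {
            Volatile.Read(ref _handle)?.Set(); // 0 -> positive transition
        }
    }
}
```

Creating the event unsignaled means a permit consumed during allocation cannot leave the handle set; the state is decided only by the count read that follows publication.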
Defensive: the prior placement was correct because a thrown OCE always
short-circuits before re-entry, but moving 'oce' (and 'timedOut',
already loop-local) inside makes the freshness invariant explicit and
robust to future edits.
@thomhurst
Author

@EgorBot -intel -amd -arm

  using System.Threading;
  using System.Threading.Tasks;
  using BenchmarkDotNet.Attributes;

  [MemoryDiagnoser]
  public class SemaphoreSlimUncontended
  {
      private SemaphoreSlim _sem = new SemaphoreSlim(1, 1);

      [Benchmark]
      public async Task WaitAsync_Release()
      {
          await _sem.WaitAsync();
          _sem.Release();
      }
  }

Labels

area-System.Threading community-contribution Indicates that the PR has been added by a community member

4 participants