Reapply "A few fixes in the threadpool semaphore. Unify Windows/Unix implementation of LIFO policy." (#125193) #125596
VSadov wants to merge 22 commits into dotnet:main
Tagging subscribers to this area: @agocke, @VSadov
Pull request overview
Reapplies and reworks the ThreadPool LIFO semaphore changes (previously reverted due to NuGet restore regressions) by unifying the blocking/wake implementation across Windows and Unix using OS compare-and-wait primitives (WaitOnAddress / futex) with a monitor fallback.
Changes:
- Adds low-level compare-and-wait interop for Windows (WaitOnAddress) and Linux (futex) and wires them through System.Native/CoreLib.
- Replaces the prior per-OS `LowLevelLifoSemaphore` implementations with a unified managed implementation using a LIFO stack of per-thread blockers plus updated spin/backoff behavior.
- Adjusts worker dispatch heuristics (missed-steal handling) and configuration plumbing (cooperative blocking env var alias).
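To make the "LIFO stack of per-thread blockers" idea concrete, here is a minimal sketch. It is not the PR's implementation: the class name `LifoWaiterStackSketch` and its members are hypothetical, it uses a plain lock and `SemaphoreSlim` in place of the lock-free counts and futex/WaitOnAddress blockers, and it hands each signal directly to the popped waiter, so it does not model spinning or stolen wakes.

```csharp
using System.Collections.Generic;
using System.Threading;

// Illustrative only: a lock-protected LIFO stack of per-thread blockers.
sealed class LifoWaiterStackSketch
{
    private readonly object _gate = new object();
    private readonly Stack<SemaphoreSlim> _waiters = new Stack<SemaphoreSlim>();
    private int _signals;

    // Number of threads currently registered as waiters.
    public int WaiterCount
    {
        get { lock (_gate) { return _waiters.Count; } }
    }

    public void Wait()
    {
        SemaphoreSlim blocker;
        lock (_gate)
        {
            if (_signals > 0)
            {
                // A signal is already available; consume it without blocking.
                _signals--;
                return;
            }

            // Register this thread's blocker on top of the stack.
            blocker = new SemaphoreSlim(0);
            _waiters.Push(blocker);
        }

        // Block outside the lock until Release() pops and wakes this blocker.
        blocker.Wait();
    }

    public void Release()
    {
        SemaphoreSlim toWake;
        lock (_gate)
        {
            if (_waiters.Count == 0)
            {
                // Nobody is waiting; bank the signal for a future Wait().
                _signals++;
                return;
            }

            // Wake the most recently blocked thread (LIFO).
            toWake = _waiters.Pop();
        }

        toWake.Release();
    }
}
```

Waking the most recent waiter first means the thread that blocked last, and is most likely to still have warm caches, is the one that resumes, which is the motivation the review comments give for the LIFO policy.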
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/native/libs/System.Native/pal_threading.h | Adds exported futex-related entrypoints to the System.Native PAL surface. |
| src/native/libs/System.Native/pal_threading.c | Implements Linux futex wait/wake wrappers; provides non-Linux stubs. |
| src/native/libs/System.Native/entrypoints.c | Registers the new futex entrypoints for managed interop. |
| src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs | Adds a 1ms sleep before requesting workers when a steal was missed. |
| src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs | Switches the worker wait to the new LowLevelLifoSemaphore.Wait(timeout, activeThreadCount) signature; removes old spin-limit wiring at the call site. |
| src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.Blocking.cs | Adds the DOTNET_ThreadPool_CooperativeBlocking env var alias for cooperative blocking. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelThreadBlocker.cs | Introduces a portable blocker abstraction (futex/WaitOnAddress or LowLevelMonitor fallback). |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelLifoSemaphore.cs | Replaces OS-specific semaphore core with a unified managed LIFO implementation + updated spin heuristic and wake accounting. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelLifoSemaphore.Windows.cs | Removes the prior Windows IOCP-based LIFO semaphore implementation. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelLifoSemaphore.Unix.cs | Removes the prior Unix WaitSubsystem-based semaphore implementation. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelFutex.Windows.cs | Adds Windows WaitOnAddress-based compare-and-wait wrapper. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelFutex.Unix.cs | Adds Unix futex wrapper (currently Linux-only per comments). |
| src/libraries/System.Private.CoreLib/src/System/Threading/Backoff.cs | Updates exponential backoff to return spin count and reduces max backoff. |
| src/libraries/System.Private.CoreLib/src/System.Private.CoreLib.Shared.projitems | Wires new threading and interop files into CoreLib build (adds/removes Compile items). |
| src/libraries/Common/src/Interop/Windows/Mincore/Interop.WaitOnAddress.cs | Adds LibraryImport declarations for WaitOnAddress/WakeByAddressSingle. |
| src/libraries/Common/src/Interop/Windows/Kernel32/Interop.CriticalSection.cs | Adds SuppressGCTransition on LeaveCriticalSection. |
| src/libraries/Common/src/Interop/Windows/Interop.Libraries.cs | Adds the Synch API-set library constant for WaitOnAddress imports. |
| src/libraries/Common/src/Interop/Unix/System.Native/Interop.LowLevelMonitor.cs | Adds SuppressGCTransition on LowLevelMonitor_Release. |
| src/libraries/Common/src/Interop/Unix/System.Native/Interop.Futex.cs | Adds LibraryImport declarations for futex wait/wake entrypoints. |
| src/coreclr/tools/aot/ILCompiler/reproNative/reproNative.vcxproj | Adds Synchronization.lib to link set for NativeAOT repro project. |
| src/coreclr/nativeaot/BuildIntegration/WindowsAPIs.txt | Allows WaitOnAddress/WakeByAddressSingle through the NativeAOT Windows API allowlist. |
| src/coreclr/nativeaot/BuildIntegration/Microsoft.NETCore.Native.Windows.targets | Adds Synchronization.lib to NativeAOT SDK library list. |
| docs/coding-guidelines/interop-guidelines.md | Updates interop guideline examples to match casing/structure and adds Synch library mention. |
Pull request overview
Reapplies and extends the threadpool semaphore/LIFO-policy changes that were previously reverted due to a NuGet restore performance regression. The change unifies the Windows/Unix implementation around a shared managed LIFO waiter stack and adds low-level wait/wake primitives (Linux futex, Windows WaitOnAddress) plus supporting interop/AOT wiring.
Changes:
- Add Linux futex exports in System.Native and corresponding managed interop; add Windows WaitOnAddress interop and link inputs for NativeAOT.
- Replace platform-specific `LowLevelLifoSemaphore` implementations with a unified managed implementation built on `LowLevelThreadBlocker`.
- Adjust threadpool behavior around missed steals (including a brief delay) and tweak the plumbing of a blocking config switch.
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/native/libs/System.Native/pal_threading.h | Adds futex-related PALEXPORT declarations. |
| src/native/libs/System.Native/pal_threading.c | Implements Linux futex wait/wake syscalls (with non-Linux stubs). |
| src/native/libs/System.Native/entrypoints.c | Exposes futex entrypoints via DllImportEntry. |
| src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPoolWorkQueue.cs | Adds a delay before requesting a worker when missed steals occur. |
| src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs | Switches semaphore construction/Wait signature; minor comment fix. |
| src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.Blocking.cs | Adds env-var name to cooperative blocking config lookup. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelThreadBlocker.cs | Introduces a portable blocker using futex/WaitOnAddress or monitor fallback. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelLifoSemaphore.cs | Reworks semaphore into a single managed implementation with LIFO waiter stack + spin heuristic. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelLifoSemaphore.Windows.cs | Removes prior Windows IOCP-based implementation. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelLifoSemaphore.Unix.cs | Removes prior Unix WaitSubsystem-based implementation. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelFutex.Windows.cs | Adds Windows WaitOnAddress/WakeByAddressSingle wrapper. |
| src/libraries/System.Private.CoreLib/src/System/Threading/LowLevelFutex.Unix.cs | Adds Linux futex wrapper (Linux-only). |
| src/libraries/System.Private.CoreLib/src/System/Threading/Backoff.cs | Changes exponential backoff to return spin count and reduces cap. |
| src/libraries/System.Private.CoreLib/src/System.Private.CoreLib.Shared.projitems | Wires new threading/interop files into CoreLib build and removes old semaphore OS-specific files. |
| src/libraries/Common/src/Interop/Windows/Mincore/Interop.WaitOnAddress.cs | Adds LibraryImport for WaitOnAddress/WakeByAddressSingle. |
| src/libraries/Common/src/Interop/Windows/Kernel32/Interop.CriticalSection.cs | Adds SuppressGCTransition to LeaveCriticalSection. |
| src/libraries/Common/src/Interop/Windows/Interop.Libraries.cs | Adds Libraries.Synch constant for the synch api-set. |
| src/libraries/Common/src/Interop/Unix/System.Native/Interop.LowLevelMonitor.cs | Adds SuppressGCTransition to LowLevelMonitor_Release. |
| src/libraries/Common/src/Interop/Unix/System.Native/Interop.Futex.cs | Adds LibraryImport declarations for System.Native futex exports. |
| src/coreclr/tools/aot/ILCompiler/reproNative/reproNative.vcxproj | Links Synchronization.lib for WaitOnAddress/WakeByAddressSingle. |
| src/coreclr/nativeaot/BuildIntegration/WindowsAPIs.txt | Adds WaitOnAddress/WakeByAddressSingle to the NativeAOT Windows API list. |
| src/coreclr/nativeaot/BuildIntegration/Microsoft.NETCore.Native.Windows.targets | Adds Synchronization.lib to SDK native libraries for NativeAOT. |
| docs/coding-guidelines/interop-guidelines.md | Updates interop naming/examples (e.g., Mincore, Synch). |
…implementation of LIFO policy." (dotnet#125193) This reverts commit 51b1e92.
| private bool WaitAsWaiter(int timeoutMs, bool allowFastWake) | ||
| { | ||
| Debug.Assert(timeoutMs >= -1); | ||
| | ||
| while (true) | ||
| { | ||
| long blockingStart = allowFastWake ? 0 : Stopwatch.GetTimestamp(); | ||
| if (timeoutMs == 0 || !Block(timeoutMs)) | ||
| { | ||
| // Unregister the waiter, but do not decrement wake count, the thread did not observe a wake. | ||
| _separated._counts.InterlockedDecrementWaiterCount(); | ||
| return false; |
WaitAsWaiter loops on spurious/stolen wakes but never recomputes the remaining timeout. For timed waits, repeated spurious wakes can cause the total wait time to exceed the requested timeout (potentially by multiples). Consider tracking a deadline and passing a recomputed remaining timeout to Block(), returning false once it expires.
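The deadline approach suggested here can be sketched as follows. This is an illustration under assumptions, not the PR's code: `TimedWaitSketch` and `WaitWithDeadline` are hypothetical names, a `SemaphoreSlim` stands in for the per-thread `Block(timeoutMs)` call, and the `tryConsume` delegate stands in for the check that a real signal was available after the wake.

```csharp
using System;
using System.Threading;

static class TimedWaitSketch
{
    // Waits until tryConsume() succeeds or the caller's timeout budget is spent.
    // timeoutMs == -1 means wait indefinitely, mirroring the convention in the snippet above.
    public static bool WaitWithDeadline(SemaphoreSlim blocker, Func<bool> tryConsume, int timeoutMs)
    {
        // Compute the deadline once, before the first blocking call.
        long deadlineMs = timeoutMs >= 0 ? Environment.TickCount64 + timeoutMs : 0;

        while (true)
        {
            int remainingMs = timeoutMs;
            if (timeoutMs >= 0)
            {
                // Recompute what is left of the budget on every iteration.
                long leftMs = deadlineMs - Environment.TickCount64;
                remainingMs = leftMs <= 0 ? 0 : (int)Math.Min(leftMs, int.MaxValue);
            }

            if (!blocker.Wait(remainingMs))
            {
                // Timed out without observing a wake.
                return false;
            }

            if (tryConsume())
            {
                // The wake corresponded to a real signal.
                return true;
            }

            // Spurious or stolen wake: loop again with the reduced budget.
        }
    }
}
```

Because the remaining time is derived from a fixed deadline, any number of spurious or stolen wakes shrinks the budget instead of restarting it, so the total wait stays close to the requested `timeoutMs`.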
| #if TARGET_LINUX | ||
| // use OS-provided compare-and-wait API. | ||
| #define USE_FUTEX | ||
| #elif TARGET_WINDOWS | ||
| // fallback to autoreset event | ||
| // On Windows both WaitOnAddress and Condition have nontrivial latency when blocking a thread and that causes regressions. | ||
| // Autoreset event blocks faster. | ||
| // Attempt to use WaitOnAddress resulted in regressions. | ||
| #define USE_FUTEX | ||
| #else | ||
| // fallback to monitor (condvar+mutex) | ||
| #define USE_MONITOR | ||
| #endif |
The preprocessor section for Windows says it falls back to AutoResetEvent (and that WaitOnAddress caused regressions), but it defines USE_FUTEX. Also, USE_EVENT is never defined anywhere in this file, so the AutoResetEvent implementation is currently dead code. Please align the defines/comments (either define USE_EVENT on Windows or update/remove the stale comment and unused code paths).
| @@ -5,36 +5,90 @@ | |||
| using System.Diagnostics.CodeAnalysis; | |||
using System.Diagnostics.CodeAnalysis; appears to be unused in this file. If warnings-as-errors are enabled, this can fail the build with CS8019; otherwise it still adds noise. Consider removing the unused using directive.
| using System.Diagnostics.CodeAnalysis; |
| public bool WaitNoSpin(int timeoutMs) | ||
| { | ||
| return WaitSlow(timeoutMs, 0); |
WaitNoSpin currently just calls WaitSlow(timeoutMs, 0), which still performs the same spinning loop as Wait(). This makes the WorkerThread ‘idle’ path still spin and defeats the intended behavior (and can reintroduce the same kinds of CPU/perf regressions this PR is trying to address). Consider bypassing spinning entirely in WaitNoSpin (e.g., a separate path that immediately registers as a waiter / blocks).
| public bool WaitNoSpin(int timeoutMs) | |
| { | |
| return WaitSlow(timeoutMs, 0); | |
| private bool WaitNoSpinSlow(int timeoutMs) | |
| { | |
| uint collisionCount = 0; | |
| while (true) | |
| { | |
| Counts counts = _separated._counts; | |
| Counts newCounts = counts; | |
| if (counts.SignalCount != 0) | |
| { | |
| newCounts.DecrementSignalCount(); | |
| } | |
| else | |
| { | |
| newCounts.IncrementWaiterCount(); | |
| } | |
| Counts countsBeforeUpdate = _separated._counts.InterlockedCompareExchange(newCounts, counts); | |
| if (countsBeforeUpdate == counts) | |
| { | |
| return counts.SignalCount != 0 || WaitAsWaiter(timeoutMs, allowFastWake: false); | |
| } | |
| Backoff.Exponential(collisionCount++); | |
| } | |
| } | |
| public bool WaitNoSpin(int timeoutMs) | |
| { | |
| Debug.Assert(timeoutMs >= -1); | |
| Counts counts = _separated._counts; | |
| if (counts.SignalCount != 0) | |
| { | |
| Counts newCounts = counts; | |
| newCounts.DecrementSignalCount(); | |
| Counts countsBeforeUpdate = _separated._counts.InterlockedCompareExchange(newCounts, counts); | |
| if (countsBeforeUpdate == counts) | |
| { | |
| return true; | |
| } | |
| } | |
| RuntimeFeature.ThrowIfMultithreadingIsNotSupported(); | |
| return WaitNoSpinSlow(timeoutMs); |
| private bool WaitSlow(int timeoutMs, short tpThreadCount) | ||
| { | ||
| // Now spin briefly with exponential backoff. | ||
| // We use random exponential backoff because: | ||
| // - we do not know how soon a signal appears, but with exponential backoff we will not be more than 2x off the ideal guess | ||
| // - it gives mild preference to the most recent spinners. We want LIFO here so that hot(er) threads keep running. | ||
| // - it is possible that spinning workers prevent non-pool threads from submitting more work to the pool, | ||
| // so we want some workers to sleep earlier than others. | ||
| uint spinCount = Environment.IsSingleProcessor ? 0 : _spinCount; | ||
| for (uint iteration = 0; iteration < spinCount; iteration++) | ||
| _ = tpThreadCount; | ||
| | ||
| int spinsRemaining = _maxSpinCount; | ||
WaitSlow takes tpThreadCount but currently ignores it (_ = tpThreadCount), while the comment above describes a heuristic that dials spin based on available cores / active workers. As written, the spin behavior is always based on _maxSpinCount, which makes the comment misleading and may negate the intended adaptive behavior. Either implement the heuristic using tpThreadCount/_procCount or remove the parameter/comment to avoid shipping an incomplete policy.
| private bool WaitAsWaiter(int timeoutMs, bool allowFastWake) | ||
| { | ||
| Debug.Assert(timeoutMs >= -1); | ||
| | ||
| while (true) | ||
| { | ||
| int startWaitTicks = timeoutMs != -1 ? Environment.TickCount : 0; | ||
| if (timeoutMs == 0 || !WaitCore(timeoutMs)) | ||
| long blockingStart = allowFastWake ? 0 : Stopwatch.GetTimestamp(); | ||
| if (timeoutMs == 0 || !Block(timeoutMs)) | ||
| { | ||
| // Unregister the waiter. The wait subsystem used above guarantees that a thread that wakes due to a timeout does | ||
| // not observe a signal to the object being waited upon. | ||
| // Unregister the waiter, but do not decrement wake count, the thread did not observe a wake. | ||
| _separated._counts.InterlockedDecrementWaiterCount(); | ||
| return false; | ||
| } |
WaitAsWaiter loops on spurious/stolen wakes without recomputing the remaining timeout. This can cause the total wait time to exceed timeoutMs (especially since spurious/stolen wakes are explicitly expected) because each iteration passes the original timeout again. Consider tracking a deadline (as LowLevelThreadBlocker.TimedWait does) and recomputing remaining milliseconds on each iteration / before each Block call.
| while (!blocker.TimedWait(timeoutMs)) | ||
| { | ||
| if (TryRemove(blocker)) | ||
| { | ||
| return false; | ||
| } | ||
| | ||
| // We timed out, but our waiter is already popped. Someone is waking us. | ||
| // We can't leave or the wake could be lost, let's wait again. | ||
| // Give it some extra time. | ||
| timeoutMs = 10; |
In Block(), when TimedWait times out but the node has already been popped, the code resets timeoutMs to 10 and waits again. This can violate the caller’s requested timeout semantics (waiting longer than requested) and also interacts with the missing “remaining timeout” accounting in WaitAsWaiter. Consider passing down a remaining timeout budget/deadline and ensuring the extra waits don’t extend the total beyond the original timeout.
| while (!blocker.TimedWait(timeoutMs)) | |
| { | |
| if (TryRemove(blocker)) | |
| { | |
| return false; | |
| } | |
| // We timed out, but our waiter is already popped. Someone is waking us. | |
| // We can't leave or the wake could be lost, let's wait again. | |
| // Give it some extra time. | |
| timeoutMs = 10; | |
| long timeoutDeadlineMs = timeoutMs >= 0 ? Environment.TickCount64 + timeoutMs : 0; | |
| while (true) | |
| { | |
| int remainingTimeoutMs = timeoutMs; | |
| if (timeoutMs >= 0) | |
| { | |
| long remainingMs = timeoutDeadlineMs - Environment.TickCount64; | |
| remainingTimeoutMs = remainingMs <= 0 ? 0 : (int)Math.Min(remainingMs, int.MaxValue); | |
| } | |
| if (blocker.TimedWait(remainingTimeoutMs)) | |
| { | |
| break; | |
| } | |
| if (TryRemove(blocker)) | |
| { | |
| return false; | |
| } | |
| // We timed out, but our waiter is already popped. Someone is waking us. | |
| // Don't wait past the caller's original timeout budget. If the budget is exhausted, | |
| // treat the already-popped waiter as a successful wake rather than extending the wait. | |
| if (timeoutMs >= 0 && Environment.TickCount64 >= timeoutDeadlineMs) | |
| { | |
| return true; | |
| } |
| internal static bool WaitOnAddressTimeout(int* address, int comparand, int milliseconds) | ||
| { | ||
| Interop.BOOL result = Interop.Mincore.WaitOnAddress(address, &comparand, sizeof(int), milliseconds); | ||
| if (result == Interop.BOOL.TRUE) | ||
| { | ||
| // normal or spurious wake | ||
| return true; | ||
| } | ||
| | ||
| int lastError = Marshal.GetLastWin32Error(); | ||
| Debug.Assert(lastError == Interop.Errors.ERROR_TIMEOUT); | ||
| if (lastError == Interop.Errors.ERROR_TIMEOUT) | ||
| { | ||
| // timeout | ||
| return false; | ||
| } | ||
| | ||
| // in release treat unexpected results as spurious wakes | ||
| return true; |
LowLevelFutex.Windows.WaitOnAddressTimeout reads the last error via Marshal.GetLastWin32Error(), but the underlying P/Invoke is a source-generated [LibraryImport] with SetLastError=true. In this codebase, last-error for LibraryImport should be retrieved via Marshal.GetLastPInvokeError(); using GetLastWin32Error can return a stale value and cause timeouts to be misdetected (treated as spurious wakes).
| // We've consumed a wake, but there was no signal. | ||
| // The semaphore is unfair and spurious/stolen wakes can happen. | ||
| // We will have to wait again. | ||
| break; | ||
| } | ||
| | ||
| // collision, try again. | ||
| // CAS collision, try again. | ||
| Backoff.Exponential(collisionCount++); | ||
| } |
WaitAsWaiter() handles the "consumed a wake but there was no signal" case by breaking to the outer loop and waiting again, but it never reduces the remaining timeout. This can cause Wait(timeoutMs) to block longer than the caller-requested timeout if spurious/stolen wakes occur; consider tracking a deadline and recomputing the remaining timeout before each Block() call (similar to other timed-wait patterns in CoreLib).
| #if TARGET_LINUX | ||
| // use OS-provided compare-and-wait API. | ||
| #define USE_FUTEX | ||
| #elif TARGET_WINDOWS | ||
| // fallback to autoreset event | ||
| // On Windows both WaitOnAddress and Condition have nontrivial latency when blocking a thread and that causes regressions. | ||
| // Autoreset event blocks faster. | ||
| // Attempt to use WaitOnAddress resulted in regressions. | ||
| #define USE_FUTEX | ||
| #else | ||
| // fallback to monitor (condvar+mutex) | ||
| #define USE_MONITOR | ||
| #endif |
The platform selection comments/macros here are inconsistent: the TARGET_WINDOWS branch comment says the implementation should fall back to AutoResetEvent (and even mentions WaitOnAddress regressions), but the code defines USE_FUTEX and there is no platform that defines USE_EVENT, making the AutoResetEvent path unreachable. Please either update the comments/macros to match the intended Windows implementation (e.g., define USE_EVENT on Windows if that’s desired) or remove the dead USE_EVENT code path to avoid confusion.
| // (before trying to block and possibly taking a fast wake path) | ||
| Thread.UninterruptibleSleep0(); | ||
| | ||
| if (timeoutMs == 0 || !Block(timeoutMs)) |
WaitAsWaiter loops and re-enters Block(timeoutMs) after consuming a wake with no signal (spurious/stolen wake), but timeoutMs is never reduced across iterations. This can cause a single Wait(timeoutMs) call to block for longer than the requested timeout. Consider tracking a deadline and recomputing the remaining timeout before each Block call (similar to other timed-wait loops in CoreLib).
| // fallback to autoreset event | ||
| // On Windows both WaitOnAddress and Condition have nontrivial latency when blocking a thread and that causes regressions. | ||
| // Autoreset event blocks faster. | ||
| // Attempt to use WaitOnAddress resulted in regressions. |
The Windows preprocessor branch comment says to fall back to an AutoResetEvent and avoid WaitOnAddress regressions, but the code defines USE_FUTEX and the USE_EVENT code path is never enabled. This is inconsistent and makes it unclear which primitive is intended on Windows. Either define USE_EVENT here (and keep the event-based implementation), or update the comments/conditional logic to match the intended futex/WaitOnAddress usage.
| // fallback to autoreset event | |
| // On Windows both WaitOnAddress and Condition have nontrivial latency when blocking a thread and that causes regressions. | |
| // Autoreset event blocks faster. | |
| // Attempt to use WaitOnAddress resulted in regressions. | |
| // use OS-provided compare-and-wait API. | |
| // On Windows this maps to the LowLevelFutex/WaitOnAddress-based implementation below. |
| // NOTE: Only supported on Linux for now. | ||
| // Most OS have support for futex-like APIs and should be added in the future. | ||
| // (ex: OSX has `os_sync_wait_on_address`, but support may vary by OS version) | ||
| | ||
| /// <summary> |
This file is compiled for all TargetsUnix, but it states futex is only supported on Linux. As written, these methods can be called on macOS/other Unix and will P/Invoke SystemNative_LowLevelFutex_* (which currently routes to native ENOTSUP/assert stubs). Consider restricting compilation to Linux (MSBuild condition or #if TARGET_LINUX) and providing an explicit guard/stub for other Unix targets to prevent accidental use.
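One way to picture the explicit guard is a runtime check, sketched below. This is a swapped-in technique for illustration only: the comment above proposes a compile-time restriction (an MSBuild condition or `#if TARGET_LINUX`), and the names `FutexSupportSketch`, `IsSupported`, and `DescribeUnixBlocker` are hypothetical rather than taken from the PR.

```csharp
using System;

static class FutexSupportSketch
{
    // Hypothetical guard: on Unix targets, only Linux is assumed to have working futex exports.
    public static bool IsSupported => OperatingSystem.IsLinux();

    // Picks which blocking primitive a Unix caller would use under that assumption.
    public static string DescribeUnixBlocker()
    {
        return IsSupported ? "futex" : "monitor (condvar+mutex)";
    }
}
```

A caller that checks `IsSupported` before touching the futex wrappers never reaches the native stubs on other Unix targets. A compile-time condition achieves the same effect without a runtime branch.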
Re: #125193
TODO: need to confirm that NuGet restore performance is ok with the updated change.