gpuserver: named semaphore to fix 100% idle CPU from sched_yield() #1101
Open
antonvnv wants to merge 1 commit into soedinglab:master from
The previous busy-wait loop called sched_yield(), which gives up the thread's timeslice, but the scheduler runs the thread again immediately if no other runnable thread is queued on the same core. On a machine with plenty of cores (typical for GPU servers), the OS has no reason to deschedule the thread, so gpuserver spins at 100% CPU while idle.
Replace the loop with a POSIX named semaphore, so gpuserver blocks in sem_wait() and uses ~0% CPU when idle. sem_wait() is built on an in-kernel futex, so the thread sleeps without repeated wakeups or context switches until the client posts.
Add a GPUSharedMemorySem class to GpuUtil.h that owns the sem_t* internally, so call sites need no #ifdef.
USE_GPU_SEM is enabled automatically when CMake is configured with ENABLE_CUDA=1; disable it with -DUSE_GPU_SEM=OFF.
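A minimal sketch of the pattern described above, assuming Linux with glibc and C++11 or later: a server thread blocks in sem_wait() on a "request" semaphore instead of spinning on sched_yield(), and a client posts a request and then blocks on a "response" semaphore. The class name NamedSemaphore, the function runDemo(), the "/gpusem_demo_*" names, and the two-semaphore request/response handshake are hypothetical illustrations, not the PR's actual GPUSharedMemorySem code. Server and client are threads in one process here only to keep the example self-contained; gpuserver and its clients are separate processes that would open the same semaphore by name.

```cpp
// Sketch only: replaces a sched_yield() spin loop with a blocking POSIX named
// semaphore. NamedSemaphore, runDemo() and the "/gpusem_demo_*" names are
// hypothetical and are NOT the PR's GPUSharedMemorySem API.
#include <semaphore.h>  // sem_open, sem_wait, sem_trywait, sem_post, sem_close, sem_unlink
#include <fcntl.h>      // O_CREAT, mode_t
#include <cassert>
#include <cerrno>
#include <string>
#include <system_error>
#include <thread>
#include <utility>

class NamedSemaphore {
public:
    // The owner (e.g. the server) creates the semaphore with count 0;
    // a non-owner (e.g. a client process) opens an existing one by name.
    explicit NamedSemaphore(std::string name, bool owner = true)
        : name_(std::move(name)), owner_(owner) {
        if (owner_) {
            sem_unlink(name_.c_str());  // drop a stale semaphore left by a crashed run
            sem_ = sem_open(name_.c_str(), O_CREAT, static_cast<mode_t>(0600), 0u);
        } else {
            sem_ = sem_open(name_.c_str(), 0);
        }
        if (sem_ == SEM_FAILED) fail("sem_open");
    }
    ~NamedSemaphore() {
        sem_close(sem_);
        // Named semaphores persist until unlinked, so the owner removes the name.
        if (owner_) sem_unlink(name_.c_str());
    }
    NamedSemaphore(const NamedSemaphore&) = delete;
    NamedSemaphore& operator=(const NamedSemaphore&) = delete;

    // Sleeps in the kernel until the count is positive; no CPU is burned while idle.
    void wait() {
        while (sem_wait(sem_) == -1) {
            if (errno != EINTR) fail("sem_wait");  // retry only if interrupted by a signal
        }
    }
    // Non-blocking variant: returns false instead of sleeping when the count is 0.
    bool tryWait() {
        while (sem_trywait(sem_) == -1) {
            if (errno == EAGAIN) return false;
            if (errno != EINTR) fail("sem_trywait");
        }
        return true;
    }
    // Increments the count and wakes one waiter, if any.
    void post() {
        if (sem_post(sem_) == -1) fail("sem_post");
    }

private:
    [[noreturn]] static void fail(const char* what) {
        throw std::system_error(errno, std::generic_category(), what);
    }
    std::string name_;
    bool owner_;
    sem_t* sem_ = SEM_FAILED;
};

// Hypothetical request/response handshake. Threads stand in for the
// gpuserver and client processes so the sketch runs on its own.
int runDemo(int rounds) {
    NamedSemaphore request("/gpusem_demo_req");
    NamedSemaphore response("/gpusem_demo_resp");
    int served = 0;
    std::thread server([&] {
        for (int i = 0; i < rounds; ++i) {
            request.wait();   // idle server sleeps here instead of spinning on sched_yield()
            ++served;         // placeholder for running the GPU job
            response.post();  // tell the client its result is ready
        }
    });
    for (int i = 0; i < rounds; ++i) {
        request.post();   // client submits work
        response.wait();  // and blocks until the server answers
    }
    server.join();
    return served;
}
```

A wrapper like this is also a natural place to keep the USE_GPU_SEM #ifdef, which is how call sites can stay ifdef-free as described above; that build switch and any fallback path are omitted from this sketch.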
Member
Is there any downside to enabling this by default on all CUDA builds?
Author
It should be enabled by default in this PR for all CUDA builds. To my knowledge there should be no downsides, other than the fact that so far I have only tested it to a limited extent (and it seems to be working fine so far). I'll be testing it more in the coming days.