
gpuserver: named semaphore to fix 100% idle CPU from sched_yield() #1101

Open

antonvnv wants to merge 1 commit into soedinglab:master from antonvnv:gpusem

Conversation

@antonvnv

The previous busy-wait loop used sched_yield(), which yields the thread's timeslice, but the scheduler immediately reruns the thread if no other thread is waiting on the same core. On a machine with enough cores (typical for GPU servers), the OS has no reason to deschedule the thread, so it spins at 100% CPU while idle.
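
For illustration, a minimal sketch of this kind of polling loop (the flag and function names are made up, not the actual gpuserver code):

```cpp
#include <sched.h>
#include <atomic>

// Hypothetical polling loop: spin on a shared-memory flag, yielding
// between checks. sched_yield() gives up the timeslice, but on an
// otherwise idle core the scheduler immediately reruns this thread,
// so it burns 100% of one CPU while waiting.
void wait_for_request(const std::atomic<int>& ready) {
    while (ready.load(std::memory_order_acquire) == 0) {
        sched_yield();
    }
}
```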

Replace it with a POSIX named semaphore so gpuserver blocks in sem_wait() and uses ~0% CPU when idle. sem_wait() is backed by an in-kernel futex, so the waiting thread stays descheduled, consuming no CPU, until the client posts.
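
Roughly, the server side then follows this pattern (a minimal sketch; the semaphore name /gpuserver_demo is made up for illustration and error handling is abbreviated):

```cpp
#include <semaphore.h>
#include <fcntl.h>   // O_CREAT
#include <cstdio>

int main() {
    // Create (or attach to) a named semaphore with initial count 0.
    sem_t* sem = sem_open("/gpuserver_demo", O_CREAT, 0600, 0);
    if (sem == SEM_FAILED) { std::perror("sem_open"); return 1; }

    sem_wait(sem);  // blocks in the kernel, ~0% CPU, until a post arrives
    std::puts("client posted a request");

    sem_close(sem);
    sem_unlink("/gpuserver_demo");
    return 0;
}
// Client side (separate process): sem_open("/gpuserver_demo", 0), then sem_post(sem).
```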

Add GPUSharedMemorySem class to GpuUtil.h that owns the sem_t* internally; call sites are ifdef-free.
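
A minimal sketch of what such an RAII owner could look like (the real class in GpuUtil.h may differ in interface and cleanup policy):

```cpp
#include <semaphore.h>
#include <fcntl.h>
#include <string>
#include <stdexcept>

class GPUSharedMemorySem {
public:
    explicit GPUSharedMemorySem(const std::string& name)
        : name_(name), sem_(sem_open(name.c_str(), O_CREAT, 0600, 0)) {
        if (sem_ == SEM_FAILED)
            throw std::runtime_error("sem_open failed: " + name);
    }
    ~GPUSharedMemorySem() {
        sem_close(sem_);
        sem_unlink(name_.c_str());  // sketch: owner removes the name on teardown
    }
    GPUSharedMemorySem(const GPUSharedMemorySem&) = delete;
    GPUSharedMemorySem& operator=(const GPUSharedMemorySem&) = delete;

    void wait() { sem_wait(sem_); }  // server: block until work arrives
    void post() { sem_post(sem_); }  // client: wake the server

private:
    std::string name_;
    sem_t* sem_;
};
```

Because the sem_t* never escapes the class, a fallback build can substitute a different body without any #ifdef at the call sites.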

USE_GPU_SEM is enabled automatically when ENABLE_CUDA=1 in CMake. Disable it with -DUSE_GPU_SEM=OFF.

@milot-mirdita
Member

Is there any downside to making this default enabled on all CUDA builds?

@antonvnv
Author

> Is there any downside to making this default enabled on all CUDA builds?

It should be enabled by default for all CUDA builds in this PR... To my knowledge there should be no downsides, other than the fact that so far it has only seen limited testing [and it seems to be working fine so far]... I'll be testing it more in the coming days.
