Prep for driving a fork of rayon#45

Merged
NthTensor merged 8 commits into main from improvements_omnibus on Jun 10, 2026

@NthTensor

Owner

This PR includes a bundle of significant improvements, largely driven by the needs of rayon's parallel iterators. This includes:

  • The stack-use optimization described in Optimize StackJob Sizes #35.
  • Support for spawning !Send futures, as requested in Feature: Spawning jobs/tasks to a target thread #31.
  • Implementations of broadcast, spawn_broadcast, and Scope::spawn_broadcast, to more closely parallel rayon_core.
  • A default thread pool, like rayon-core, which allows for "headless" versions of all the parallel ops.
  • Inline-assembly cpu tick counters, to avoid bringing in an extra dependency.
  • Further improvements to the safety comments for jobs and job-refs. I think I am finally happy with them.

Comment thread src/thread_pool.rs Outdated
Comment thread src/thread_pool.rs
Comment thread src/thread_pool.rs
Comment on lines +199 to +202
if waiting_bitmask != 0 {
    let i = waiting_bitmask.trailing_zeros() as usize;
    self.get_member_data().semaphores[i].signal();
}

I'm wondering if this could lead to a race condition. Hear me out:

In latch.rs:245, you have

let state = self.state.swap(ASLEEP, Ordering::Relaxed); //commits to sleeping
if state == LOCKED {
    waiting_bitmask.fetch_or(seat_mask, Ordering::Relaxed); //bit becomes visible here
    atomic_wait::wait(&self.state, ASLEEP); //blocks, there's no re-check of `find_work`
   ...

So as I understand it, it's possible that the worker's find_work comes up empty, a producer pushes a job, the producer reads waiting_bitmask == 0 (as the worker's bit is not set yet), the signal is skipped, and the worker then runs the above code and sleeps. wait() doesn't re-check the queues after announcing sleep, so the job just kinda hangs around in the shared_queue.

I imagine this scenario would be rare as any other awake worker would pop the job, but may be problematic at low numbers of N (esp if it's 1). Thoughts?

Owner Author

Yep, correct! Though I have not seen it lead to any issues in practice.

You are right that checking find_work again after setting the sleepy-bit should resolve this. It might be worth it to try that, and I think rayon does something similar.
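To make the interleaving and the proposed re-check concrete, here is a minimal single-threaded replay of the steps described above. It is only a sketch: `Pool`, `Outcome`, and `replay` are hypothetical names, the bitmask is a plain integer rather than an atomic, and nothing here is taken from the actual latch or thread pool code. It just walks through "worker finds nothing, producer pushes and sees an empty bitmask, worker announces sleep" in that fixed order, with and without a second look at the queue.

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified stand-in for the pool's shared state.
struct Pool {
    shared_queue: VecDeque<u32>,
    waiting_bitmask: u32,
}

impl Pool {
    /// Worker side: try to take a job from the shared queue.
    fn find_work(&mut self) -> Option<u32> {
        self.shared_queue.pop_front()
    }

    /// Producer side: push a job, then signal only if some worker has
    /// already announced that it is sleeping. Returns whether it signaled.
    fn push(&mut self, job: u32) -> bool {
        self.shared_queue.push_back(job);
        self.waiting_bitmask != 0
    }
}

#[derive(Debug, PartialEq)]
enum Outcome {
    /// The worker picked the job up before going to sleep.
    JobRan,
    /// The worker went to sleep with the job still queued and no signal sent.
    Stranded,
}

/// Replays the problematic interleaving for a single worker in seat 0.
/// `recheck` toggles the extra `find_work` call after the sleepy-bit is set.
fn replay(recheck: bool) -> Outcome {
    let seat_mask: u32 = 1 << 0;
    let mut pool = Pool { shared_queue: VecDeque::new(), waiting_bitmask: 0 };

    // 1. The worker looks for work and finds none.
    assert!(pool.find_work().is_none());

    // 2. The producer pushes a job and reads the bitmask before the
    //    worker's bit is visible, so the signal is skipped.
    let signaled = pool.push(42);

    // 3. The worker announces that it is going to sleep.
    pool.waiting_bitmask |= seat_mask;

    // 4. With the re-check, the worker looks once more before blocking.
    if recheck {
        if let Some(_job) = pool.find_work() {
            pool.waiting_bitmask &= !seat_mask; // withdraw the announcement
            return Outcome::JobRan;
        }
    }

    // Without the re-check the worker blocks here; nobody will wake it.
    if !signaled && !pool.shared_queue.is_empty() {
        Outcome::Stranded
    } else {
        Outcome::JobRan
    }
}
```

Calling `replay(false)` models the code as quoted and `replay(true)` models the re-check. In a real concurrent version the memory orderings would likely matter too: "set bit, then read queue" on one side and "push job, then read bitmask" on the other is the classic store-buffering shape, where Relaxed operations allow both sides to miss each other.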

Comment thread src/scope.rs
/// `add_reference`, a direct `fetch_add` on the underlying counter, or the
/// implicit initial increment the scope starts with.
unsafe fn remove_reference(&self) {
    let counter = self.count.fetch_sub(1, Ordering::Relaxed);

@dsgallups Jun 9, 2026

I noticed this separately. Is ordering important here?

Owner Author

Hmmm. No, I don't think so. Any references with lifetime scope are guaranteed not to be dropped until this fetch_sub is observed. The way the scope blocks until the counter decrements to zero establishes a happens-before relationship.

Unless there's another thing you were referring to?

@dsgallups Jun 10, 2026

In weakly ordered memory architectures, isn't it possible for self.completed to be stale? The surrounding non-atomic memory isn't totally ordered, and therefore the comment about it being a live latch may be false?
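For reference, the conventional way to get a happens-before edge directly from a counter like this is a Release decrement paired with an Acquire read on the waiting side, similar in spirit to how `std::sync::Arc` handles its reference count. The sketch below is not the PR's code: `Counter`, `wait_for_zero`, and `run_scope` are hypothetical names, and the spin loop stands in for whatever latch the scope actually blocks on. Whether that latch already provides equivalent synchronization is exactly the open question in this thread.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the scope's reference counter.
struct Counter {
    count: AtomicUsize,
}

impl Counter {
    fn add_reference(&self) {
        // The increment itself does not need to publish anything.
        self.count.fetch_add(1, Ordering::Relaxed);
    }

    fn remove_reference(&self) {
        // Release: every write this thread made before the decrement
        // becomes visible to whoever observes the decrement with Acquire.
        self.count.fetch_sub(1, Ordering::Release);
    }

    fn wait_for_zero(&self) {
        // Acquire pairs with the Release decrements above.
        while self.count.load(Ordering::Acquire) != 0 {
            thread::yield_now();
        }
    }
}

/// Spawns `workers` threads that each write one slot and then drop their
/// reference. Returns what the waiting thread sees once the count hits zero.
fn run_scope(workers: usize) -> Vec<usize> {
    let counter = Arc::new(Counter { count: AtomicUsize::new(0) });
    let slots: Arc<Vec<AtomicUsize>> =
        Arc::new((0..workers).map(|_| AtomicUsize::new(0)).collect());

    let mut handles = Vec::new();
    for i in 0..workers {
        // Take the reference before the worker can possibly release it.
        counter.add_reference();
        let counter = Arc::clone(&counter);
        let slots = Arc::clone(&slots);
        handles.push(thread::spawn(move || {
            // Relaxed payload write, published by the Release decrement.
            slots[i].store(i + 1, Ordering::Relaxed);
            counter.remove_reference();
        }));
    }

    counter.wait_for_zero();
    // Read before joining, so visibility comes from the counter alone.
    let seen: Vec<usize> = slots.iter().map(|s| s.load(Ordering::Relaxed)).collect();
    for handle in handles {
        handle.join().unwrap();
    }
    seen
}
```

With this pairing, `run_scope(4)` observes every worker's write once the count reads zero, because each Release decrement synchronizes with the Acquire load that sees the final value. With Relaxed on both sides, the language gives no such guarantee from the counter itself, so the guarantee would have to come from some other synchronization.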

@NthTensor merged commit 533d115 into main on Jun 10, 2026
4 checks passed
@NthTensor NthTensor deleted the improvements_omnibus branch June 10, 2026 01:20

2 participants