
Use Builtin Atomics for wait/waitAsync/notify #956

Open
komyg wants to merge 11 commits into trynova:main from komyg:fix/use-built-in-atomics

Conversation


komyg (Contributor) commented Feb 21, 2026

Fix: #899

komyg changed the title from "Create Parking Lot Mutex in SharedDataBlock" to "Use Builtin Atomics for wait/waitAsync/notify" on Feb 21, 2026
komyg force-pushed the fix/use-built-in-atomics branch from 4482ead to 4a0935d on February 21, 2026 14:03
komyg requested a review from aapoalas on February 23, 2026 12:23
@aapoalas (Member) left a comment:

> Ok, I've updated the code according to your comment.
>
> Can you take a look if the general idea is good?
>
> Note: it seems that I broke a few tests that were passing, so I still have to fix those.

Yeah, the general idea looks really good to me, excellent stuff! <3

It seems quite likely to me that the reason for the breakage is the too-strict-by-half locking of the mutex, since you're effectively making it so that only one thread can wait or notify concurrently. (I think.) The mutex should only be guarding adding a new waiter and removing an old waiter, but not the waiting itself.

let remaining = deadline.saturating_duration_since(Instant::now());
if remaining.is_zero() {
// Timed out — remove ourselves from the waiter list.
if let Some(list) = guard.get_mut(&byte_index_in_buffer) {
@aapoalas (Member):

issue: I think looping with the guard held here (and above) means that no other thread can start waiting on the SharedDataBlock while we've yet to be woken up. That of course shouldn't be the case; it should be possible for roughly any number of waiters to exist on the same SDB concurrently.

@komyg (Contributor Author):

AFAIK, this shouldn't be an issue, because the lock is released when we call `guard = waiter_record.condvar.wait(guard).unwrap();` a few lines below.

So, if I understand this correctly, we will acquire the lock to check the value inside the datablock and then we release it when we call wait, and it is only reacquired when the thread wakes up.
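The lock handoff described above can be checked with a minimal standalone sketch of the `Mutex`/`Condvar` pattern (the names here are illustrative, not the PR's actual types):

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

fn main() {
    // A shared "notified" flag guarded by a Mutex, plus a Condvar for wakeups.
    let pair = Arc::new((Mutex::new(false), Condvar::new()));
    let pair2 = Arc::clone(&pair);

    let waiter = thread::spawn(move || {
        let (lock, cvar) = &*pair2;
        let mut notified = lock.lock().unwrap();
        // `wait` atomically releases the lock while this thread sleeps, so
        // other threads can lock the same mutex and notify concurrently.
        while !*notified {
            notified = cvar.wait(notified).unwrap();
        }
    });

    // The notifier can take the lock because the waiter released it in `wait`.
    let (lock, cvar) = &*pair;
    *lock.lock().unwrap() = true;
    cvar.notify_one();
    waiter.join().unwrap();
}
```

The lock is only held to check and set the flag; the sleep itself happens with the mutex released, which is the behavior the comment relies on.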

komyg force-pushed the fix/use-built-in-atomics branch from 1cfe244 to 3ca1733 on February 24, 2026 11:37
komyg marked this pull request as ready for review on February 24, 2026 14:45
komyg force-pushed the fix/use-built-in-atomics branch from 9f4eea6 to 0d39787 on February 24, 2026 15:21
komyg requested a review from aapoalas on February 24, 2026 16:16

komyg commented Feb 24, 2026

Hey @aapoalas, I've addressed your feedback, can you review again, please?

@aapoalas (Member) left a comment:

Still some things I'd like changed; most importantly, we want to use std functions as much as possible.

waiter.condvar.notify_one();
n += 1;
}
drop(guard);
@aapoalas (Member):

nitpick: I'd move this comment below step 12 to make it clear this is where/how the critical section ends.

// a. Perform SuspendThisAgent(WL, waiterRecord).
guard
.entry(byte_index_in_buffer)
.or_insert_with(|| WaiterList {
@aapoalas (Member):

nitpick: I think you could derive Default for WaiterList and others and use .or_default() here.
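For illustration, a sketch of that suggestion using a hypothetical `WaiterList` with a single field (the PR's real struct may hold more):

```rust
use std::collections::{HashMap, VecDeque};

// Hypothetical shape of WaiterList; the actual fields in the PR may differ.
#[derive(Default)]
struct WaiterList {
    waiters: VecDeque<u32>,
}

fn main() {
    let mut map: HashMap<usize, WaiterList> = HashMap::new();
    // With Default derived, `.or_default()` replaces the longer
    // `.or_insert_with(|| WaiterList { ... })` closure.
    map.entry(8).or_default().waiters.push_back(1);
    assert_eq!(map[&8].waiters.len(), 1);
}
```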

.push_back(waiter_record.clone());

if t == u64::MAX {
while !waiter_record.notified.load(StdOrdering::Acquire) {
@aapoalas (Member):

issue: We should probably use wait_timeout_while to avoid having to deal with the deadlines and whatnot ourselves. Should simplify the code fairly nicely.
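As a sketch of the suggestion, `Condvar::wait_timeout_while` folds the deadline arithmetic and the wake-up loop into one call (illustrative flag, not the PR's code):

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

fn main() {
    let lock = Mutex::new(false); // false = not yet notified
    let cvar = Condvar::new();

    let guard = lock.lock().unwrap();
    // wait_timeout_while loops internally until the predicate turns false or
    // the timeout elapses, so no manual `deadline.saturating_duration_since`
    // bookkeeping is needed.
    let (guard, result) = cvar
        .wait_timeout_while(guard, Duration::from_millis(50), |notified| !*notified)
        .unwrap();
    // Nobody notifies in this sketch, so the wait times out with the flag
    // still false.
    assert!(result.timed_out());
    assert!(!*guard);
}
```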

let handle = thread::spawn(move || {
// SAFETY: buffer is a cloned SharedDataBlock; non-dangling.
let waiters = unsafe { buffer.get_or_init_waiters() };
let waiter_record = Arc::new(WaiterRecord {
@aapoalas (Member):

suggestion: Could create a WaiterRecord::new_shared() helper function that returns Arc<WaiterRecord>.
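A sketch of the suggested helper, assuming a simplified `WaiterRecord` with only the fields visible in this thread:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Condvar};

// Hypothetical fields; the PR's actual WaiterRecord may hold more state.
struct WaiterRecord {
    notified: AtomicBool,
    condvar: Condvar,
}

impl WaiterRecord {
    /// Convenience constructor returning the record already wrapped in an
    /// Arc, as suggested in the review.
    fn new_shared() -> Arc<WaiterRecord> {
        Arc::new(WaiterRecord {
            notified: AtomicBool::new(false),
            condvar: Condvar::new(),
        })
    }
}

fn main() {
    let record = WaiterRecord::new_shared();
    assert!(!record.notified.load(Ordering::Relaxed));
    let _unused = &record.condvar;
}
```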

.push_back(waiter_record.clone());

if t == u64::MAX {
while !waiter_record.notified.load(StdOrdering::Acquire) {
@aapoalas (Member):

note: Here too we could use wait_while.
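For the untimed path, `Condvar::wait_while` replaces the manual `while !notified` loop; a standalone sketch (illustrative flag only):

```rust
use std::sync::{Condvar, Mutex};

fn main() {
    let lock = Mutex::new(true); // true = "still waiting"
    let cvar = Condvar::new();

    // Simulate a notifier having already flipped the flag.
    *lock.lock().unwrap() = false;

    // wait_while keeps sleeping while the predicate is true; here it is
    // already false, so the call returns immediately. This stands in for
    // the hand-written `while !notified { guard = cvar.wait(guard)... }`.
    let guard = cvar
        .wait_while(lock.lock().unwrap(), |waiting| *waiting)
        .unwrap();
    assert!(!*guard);
}
```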

WaitResult::Ok
} else {
let deadline = Instant::now() + Duration::from_millis(t);
loop {
@aapoalas (Member):

note: Here too we could use wait_timeout_while.

} else {
let deadline = Instant::now() + Duration::from_millis(t);
loop {
if waiter_record.notified.load(StdOrdering::Acquire) {
@aapoalas (Member):

issue: I think we should be fine using just Relaxed for the notification. Waking up is our synchronisation, we only really care about the value of the bool here and it is atomic on its own.
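A standalone sketch of this point: when some other synchronisation edge already orders the store before the load, the flag only needs atomicity, so `Relaxed` suffices. Here a `thread::join` provides that edge; in the PR it would be the condvar wakeup:

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

fn main() {
    let flag = Arc::new(AtomicBool::new(false));
    let flag2 = Arc::clone(&flag);

    let t = thread::spawn(move || {
        // Relaxed store: we only need the bool itself to be atomic.
        flag2.store(true, Ordering::Relaxed);
    });
    // join() establishes the happens-before relationship, so the Relaxed
    // load below is guaranteed to observe the store.
    t.join().unwrap();
    assert!(flag.load(Ordering::Relaxed));
}
```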

};
let data_block = buffer.get_data_block(agent);
// 9. Perform EnterCriticalSection(WL).
// SAFETY: buffer is a valid SharedArrayBuffer it cannot be detached, so the data block is non-dangling.
@aapoalas (Member):

nitpick: Detached and dangling are two different things. 0-sized SAB's are "dangling", ie. they don't have the backing allocation.

@aapoalas (Member):

Oh! I forgot: the ecmascript_futex dependency should be removed.

}

#[cfg(feature = "shared-array-buffer")]
pub struct WaiterRecord {
@aapoalas (Member):

issue: Not public.

/// Result of an `Atomics.wait` or `Atomics.waitAsync` operation.
#[derive(Debug)]
#[cfg(feature = "shared-array-buffer")]
pub enum WaitResult {
@aapoalas (Member):

issue: Not public

}

#[cfg(feature = "shared-array-buffer")]
pub struct WaiterList {
@aapoalas (Member):

issue: Not public

}

#[cfg(feature = "shared-array-buffer")]
pub type SharedWaiterMap = Mutex<std::collections::HashMap<usize, WaiterList>>;
@aapoalas (Member):

issue: Not public

@aapoalas (Member):

Hmm... I've been looking at the code and doing some cleanups and whatnot and ... I think we have a somewhat fundamental issue here in that the async side has been just bollixed by me. It's pretty badly mangled from what it ought to be now that we're actually using a real parking lot instead of Futexes directly.

Importantly: it seems like the critical section must be entered on the main thread, not on the async thread. eg. the NotEqual check and return you've done on the waiter thread inside the critical section: that's not something that is allowed by the spec, ie. the Promise should never resolve with not_equal. It should be done within the critical section indeed, as you've correctly identified, but if the check fails then we should still return synchronously from the waitAsync function with an object { async: false, value: "not-equal" }.

Maybe we can still hack the effect by changing the signal that we use to wait for the waiter thread to startup to being not a boolean but a state enum: Startup, Started, and NotEqual, but there might be a race condition somewhere there...

If we cannot hack it, we have to actually do the spec-things exactly and create a separate timeout job and a separate handler job. Also we'd need to somehow figure out if we're resolving a promise on our own Agent or not...


komyg commented Feb 25, 2026

> Maybe we can still hack the effect by changing the signal that we use to wait for the waiter thread to startup to being not a boolean but a state enum: Startup, Started, and NotEqual, but there might be a race condition somewhere there...
>
> If we cannot hack it, we have to actually do the spec-things exactly and create a separate timeout job and a separate handler job. Also we'd need to somehow figure out if we're resolving a promise on our own Agent or not...

I think we can start with the hack by having `enqueue_atomics_wait_async_job` return an enum with the state, and returning that in the main thread.

By doing that, and addressing your feedback above, I think we will have a solution that is good enough.

Then we can see if it is worth it to take the time to re-write this to be more adherent to the spec.

What do you think?
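The state enum under discussion might look roughly like this (a sketch only; the names are taken from the comments above, and whether this can be made race-free is exactly the open question in the thread):

```rust
// Sketch of the proposed startup-signal states for the waitAsync hack.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WaiterStartup {
    /// Waiter thread has not yet entered the critical section.
    Startup,
    /// Waiter thread entered the critical section and is now waiting.
    Started,
    /// The value check failed; waitAsync must return synchronously on the
    /// main thread with { async: false, value: "not-equal" }.
    NotEqual,
}

fn main() {
    // The main thread would branch on the state returned from the
    // enqueue function: NotEqual means resolve synchronously, never
    // via the Promise.
    let state = WaiterStartup::NotEqual;
    let resolves_synchronously = state == WaiterStartup::NotEqual;
    assert!(resolves_synchronously);
}
```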


Development

Successfully merging this pull request may close these issues.

Atomics wait/waitAsync/notify are not correct
