Use Builtin Atomics for wait/waitAsync/notify #956
komyg wants to merge 11 commits into trynova:main
Conversation
Force-pushed from 4482ead to 4a0935d
aapoalas
left a comment
Ok, I've updated the code according to your comment.
Can you take a look if the general idea is good?
Note: it seems that I broke a few tests that were passing, so I still have to fix those.
Yeah, the general idea looks really good to me, excellent stuff! <3
It seems quite likely to me that the reason for the breakage is the too-strict-by-half locking of the mutex, since you're effectively making it so that only one thread can wait or notify concurrently. (I think.) The mutex should only be guarding adding a new waiter and removing an old waiter, but not the waiting itself.
nova_vm/src/ecmascript/builtins/structured_data/atomics_object.rs (three outdated review threads, resolved)
```rust
let remaining = deadline.saturating_duration_since(Instant::now());
if remaining.is_zero() {
    // Timed out — remove ourselves from the waiter list.
    if let Some(list) = guard.get_mut(&byte_index_in_buffer) {
```
issue: I think looping with the guard held here (and above) means that no other thread can start waiting on the SharedDataBlock while we've yet to be woken up. That of course shouldn't be the case; it should be possible for roughly any number of waiters to exist on the same SDB concurrently.
AFAIK, this shouldn't be an issue, because the lock is released when we call `guard = waiter_record.condvar.wait(guard).unwrap();` a few lines below.
So, if I understand this correctly, we acquire the lock to check the value inside the data block, release it when we call `wait`, and it is only reacquired when the thread wakes up.
nova_vm/src/ecmascript/builtins/structured_data/atomics_object.rs (outdated review thread, resolved)
Force-pushed from 1cfe244 to 3ca1733
Force-pushed from 9f4eea6 to 0d39787
Hey @aapoalas, I've addressed your feedback, can you review again, please?
aapoalas
left a comment
Still some things I'd like changed; most importantly, we want to use std functions as much as possible.
```rust
        waiter.condvar.notify_one();
        n += 1;
    }
    drop(guard);
```
nitpick: I'd move this comment below step 12 to make it clear this is where/how the critical section ends.
```rust
// a. Perform SuspendThisAgent(WL, waiterRecord).
guard
    .entry(byte_index_in_buffer)
    .or_insert_with(|| WaiterList {
```
nitpick: I think you could derive Default for WaiterList and others and use .or_default() here.
```rust
    .push_back(waiter_record.clone());

if t == u64::MAX {
    while !waiter_record.notified.load(StdOrdering::Acquire) {
```
issue: We should probably use wait_timeout_while to avoid having to deal with the deadlines and whatnot ourselves. Should simplify the code fairly nicely.
```rust
let handle = thread::spawn(move || {
    // SAFETY: buffer is a cloned SharedDataBlock; non-dangling.
    let waiters = unsafe { buffer.get_or_init_waiters() };
    let waiter_record = Arc::new(WaiterRecord {
```
suggestion: Could create a WaiterRecord::new_shared() helper function that returns Arc<WaiterRecord>.
```rust
if t == u64::MAX {
    while !waiter_record.notified.load(StdOrdering::Acquire) {
        // ...
    }
    WaitResult::Ok
} else {
    let deadline = Instant::now() + Duration::from_millis(t);
    loop {
        if waiter_record.notified.load(StdOrdering::Acquire) {
```
issue: I think we should be fine using just Relaxed for the notification. Waking up is our synchronisation, we only really care about the value of the bool here and it is atomic on its own.
```rust
};
let data_block = buffer.get_data_block(agent);
// 9. Perform EnterCriticalSection(WL).
// SAFETY: buffer is a valid SharedArrayBuffer it cannot be detached, so the data block is non-dangling.
```
nitpick: Detached and dangling are two different things. 0-sized SABs are "dangling", i.e. they don't have the backing allocation.
Oh! I forgot: the ecmascript_futex dependency should be removed.
```rust
}

#[cfg(feature = "shared-array-buffer")]
pub struct WaiterRecord {
```
```rust
/// Result of an `Atomics.wait` or `Atomics.waitAsync` operation.
#[derive(Debug)]
#[cfg(feature = "shared-array-buffer")]
pub enum WaitResult {
```
```rust
}

#[cfg(feature = "shared-array-buffer")]
pub struct WaiterList {
```
```rust
}

#[cfg(feature = "shared-array-buffer")]
pub type SharedWaiterMap = Mutex<std::collections::HashMap<usize, WaiterList>>;
```
Hmm... I've been looking at the code and doing some cleanups and whatnot and... I think we have a somewhat fundamental issue here in that the async side has been just bollixed by me. It's pretty badly mangled from what it ought to be, now that we're actually using a real parking lot instead of Futexes directly. Importantly: it seems like the critical section must be entered on the main thread, not on the async thread. eg. the

Maybe we can still hack the effect by changing the

If we cannot hack it, we have to actually do the spec-things exactly and create a separate timeout job and a separate handler job. Also we'd need to somehow figure out if we're resolving a promise on our own Agent or not...
I think we can start with the hack by having the

By doing that, and addressing your feedback above, I think we will have a solution that is good enough. Then we can see if it is worth it to take the time to re-write this to be more adherent to the spec. What do you think?
Fix: #899