Enable pthread_create with ASYNCIFY #26933

Open
sbc100 wants to merge 1 commit into emscripten-core:main from sbc100:pthread_create_asyncify
Conversation

Collaborator

@sbc100 sbc100 commented May 13, 2026

This means that users of ASYNCIFY/JSPI no longer need to worry about setting PTHREAD_POOL_SIZE. We can do the same for pthread_join in a followup I think.

See: #9910

@sbc100 sbc100 changed the title Enable pthread_create with ASYNCIFY Enable pthread_create with ASYNCIFY May 13, 2026
@sbc100 sbc100 requested review from RReverser, brendandahl, juj, kripken and tlively and removed request for tlively May 13, 2026 04:12
@sbc100
Copy link
Copy Markdown
Collaborator Author

sbc100 commented May 13, 2026

View with "hide whitespace" to see how simple this change really is. It's a testament to all the work we have done over the years with ASYNCIFY/JSPI that this patch is so small.

@sbc100 sbc100 force-pushed the pthread_create_asyncify branch 2 times, most recently from c59d635 to 08bf02e Compare May 13, 2026 15:16
@sbc100 sbc100 force-pushed the pthread_create_asyncify branch from 08bf02e to af21449 Compare May 13, 2026 16:35
Comment thread src/lib/libpthread.js
Comment on lines +774 to +775
// This is needed in browsers where syncronous worker creation is still not
// possible: <BUG_LINK>
Member

Suggested change
- // This is needed in browsers where syncronous worker creation is still not
- // possible: <BUG_LINK>
+ // This is needed in browsers where synchronous worker creation is still not
+ // possible: <BUG_LINK>

Do you have a bug to link to?

Collaborator Author

I'm hoping @juj might have a link to this?

Comment thread src/postamble.js
sbc100 added a commit to sbc100/emscripten that referenced this pull request May 13, 2026
We don't need/want threads to be `unref`ed until the worker is up and
running. This is important for emscripten-core#26933 since otherwise the node runtime
can shut down while a thread is starting up.

This change also simplifies the code by limiting the number of places we
call `unref` to just one, which also slightly reduces code size.

Split out from emscripten-core#26933
sbc100 added a commit to sbc100/emscripten that referenced this pull request May 13, 2026
We don't need/want threads to be `unref`ed until the worker is up and
running. This is important for emscripten-core#26933 since otherwise the node runtime
can shut down while a thread is starting up.

This change also simplifies the code by limiting the number of places we
call `unref` to just one, which also slightly reduces code size.

Also, remove unused `worker.loaded` assignment.

Split out from emscripten-core#26933
Collaborator

juj commented May 13, 2026

Thanks for the PR!

This is exactly what I had in mind for Unity's use. However, I feel a bit cautious about this behavior occurring unconditionally for Emscripten in general. Developers need to be carefully aware of where JSPI is used in their codebases. The following aspects come to mind:

  1. JSPIfied pthread_create() assumes that every path to calling pthread_create() is guaranteed to have a WebAssembly.promising() in the callstack at the JS->Wasm boundary. For example, if users have their own JS callbacks that call into Wasm and then attempt to launch a thread, pthread_create() will no longer work for them; it will throw. This is a new user requirement for being able to call pthread_create().

  2. Similarly, the above callstack issue has implications for sandwiched JS callstacks: see "The JSPI call stack sandwich problem" #26758. We cannot transform pthread_create() unconditionally to use JSPI in the general case, since the user might arrive at pthread_create() with some JS frames in between. So there is a new user requirement: you can only call pthread_create() if you don't have any sandwiched JS functions in the callstack. The user must have modeled all such sandwich frames and converted those JS functions to async JS functions.

  3. Technically, only a pthread_create() that one intends to synchronously hear back from (sync join with, or sync access a computation result) needs to be JSPIfied. A user might prefer to use the pthread pool even when JSPI is enabled, to improve performance. The pthread Worker pool is also there to host Workers that finished executing a pthread, as a means to fast-start later pthreads. Not needing to JSPI-pause to spawn a new thread improves performance for users who don't need to synchronously join. JSPIfying all pthread_create()s would be a pessimization for those users.

(It seems like, as a small optimization, we would never want to JSPI-pause if a ready Worker already exists in the pool?)

  4. JSPIfied pthread_create() has implications for re-entrancy. The developer will need to analyze all of their event handlers with respect to pthread_create() pause points, and what those mean for re-entrancy from other event handlers in the program. Are there developers who want to enable JSPI, but only for certain call sites (e.g. sync WebGPU buffer map?), and who do not want to JSPIfy pthread_create() (and hence do not want to complicate guarding against re-entrancy related to pthread_create() callsites)?

A somewhat related concern, although not about JSPIfying pthread_create() itself:

  • We had this discussion before about whether Emscripten should implement a globally coordinated JSPI re-entrancy guard. Thinking about this, it feels like this is fundamentally a user-space problem, and not something that Emscripten could solve automatically. This is because which events need to be paused depends on the suspend callsite and on the application itself. As an example, if I did a JSPI sync XHR, then maybe I would still like the emscripten_request_animation_frame_loop() callback to keep firing, so that I could render a loading screen/spinner while the XHR is progressing. But if I JSPIfied a synchronous GPU buffer map, then maybe I don't want rAF()s to run, because I cannot re-enter my renderer. So it seems like different use cases in a single program might call for different event callbacks to be suspended.
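
As a rough illustration of why this looks like a user-space decision, here is a minimal plain-JavaScript sketch of a guard in which each suspend site names the event kinds that must not run while it is paused. Everything in it (`SuspendGuard`, `enter`, `dispatch`, and the event kind strings) is hypothetical and is not an Emscripten API.

```javascript
// Hypothetical user-space guard: each suspend site declares which event
// kinds must not be dispatched while that site is paused.
class SuspendGuard {
  constructor() {
    // event kind -> number of active suspends currently blocking it
    this.blocked = new Map();
  }

  // Block the given event kinds; returns a function that lifts the block.
  enter(kinds) {
    for (const k of kinds) {
      this.blocked.set(k, (this.blocked.get(k) || 0) + 1);
    }
    let released = false;
    return () => {
      if (released) return; // releasing twice is a no-op
      released = true;
      for (const k of kinds) {
        this.blocked.set(k, this.blocked.get(k) - 1);
      }
    };
  }

  // Run `callback` unless its kind is blocked. Returns whether it ran.
  dispatch(kind, callback) {
    if ((this.blocked.get(kind) || 0) > 0) return false;
    callback();
    return true;
  }
}
```

With this shape, a suspended XHR could call `enter(['input'])` and leave `'raf'` running for a spinner, while a suspended GPU buffer map could call `enter(['raf'])` to keep the renderer from being re-entered.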

Ultimately I wonder if this JSPIfied pthread_create() should be a setting? E.g. replace -sPTHREAD_POOL_STRICT with a -sPTHREAD_POOL_EXHAUSTED= which could take values "error" (return an error to caller), "jspi" (JSPIfy new thread creation) and "async" (just asynchronously launch the Worker, application knows it'll be fine)?

Maybe we'd have functions pthread_create_sync(), pthread_create_jspi(), pthread_create_async(), that would implement those respective behaviors, and then the selected value of -sPTHREAD_POOL_EXHAUSTED= would decide what pthread_create() would call out to by default? But users could manually call the other pthread_create_*() functions at every callsite, if they so choose.

This way app developers could optimize certain call sites to explicitly say that "this callsite will not synchronously need to join with the thread, so I can pthread_create_async() here"?
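
To make the proposal above a little more concrete, here is a plain-JavaScript sketch of how a pool-exhausted policy might dispatch. The three policy values are the ones suggested above; everything else (`makeRuntime`, `createThread`, `spawnWorker`, the pool array, the returned objects, and the use of `EAGAIN` as the error code) is an assumption made for illustration and is not Emscripten's actual implementation.

```javascript
// Hypothetical model of a worker pool; not Emscripten's real data structures.
function makeRuntime(poolSize, policy) {
  const pool = [];
  for (let i = 0; i < poolSize; i++) pool.push({ id: i });
  let nextId = poolSize;

  // Stand-in for asynchronous Worker startup.
  const spawnWorker = () => Promise.resolve({ id: nextId++ });

  function createThread() {
    // Fast path: a ready worker exists, so nothing needs to pause.
    if (pool.length > 0) return { status: 'started', worker: pool.pop() };

    switch (policy) {
      case 'error':
        // Report failure to the caller (error code assumed for illustration).
        return { status: 'error', code: 'EAGAIN' };
      case 'jspi':
        // The caller is expected to suspend until this promise settles.
        return { status: 'pending', promise: spawnWorker() };
      case 'async':
        // Start the worker in the background; the caller continues at once.
        spawnWorker();
        return { status: 'started-async' };
      default:
        throw new Error(`unknown policy: ${policy}`);
    }
  }

  return { createThread };
}
```

In this model the per-callsite variants (pthread_create_sync(), pthread_create_jspi(), pthread_create_async()) would correspond to calling the same logic with a fixed policy instead of the configured default.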

Collaborator Author

sbc100 commented May 13, 2026

Thanks for the feedback. I mostly agree with everything you are saying. I do hope we can place some restrictions on the design space here though, and perhaps add settings/refinements in response to real-world user experience.

Regarding adding a "fast-path" to avoid returning to the event loop in just some cases: we originally had this in the design of JSPI itself, but the web standards folks pushed back on this as a bad design in general, and so JSPI entry points will now always go back to the event loop. So I think we are limited by this, in that a given JS import must be either JSPI or not-JSPI. We don't have a sometimes-JSPI option, and I think that's fine. We could still do the separate API design (e.g. pthread_create_xx + pthread_create_yy), but any single import call has to be one or the other.

Regarding the cost of going back to the event loop: for thread creation, which is not normally a super high-frequency activity, the cost of hitting the event loop seems pretty small to me. The JS microtask queue is very fast.

The concerns about callers needing to handle Promise-returning exports are totally valid, but isn't this a general concern about all JSPI usage? There is nothing unique about pthread_create in this argument, right? If your program uses JSPI then your exports will likely be promise-returning. Isn't this just something that users of JSPI are going to have to deal with in any case?

@brendandahl @tlively how realistic is it to have a JSPI-using program that has useful non-promising exports? It seems like as we enable more JSPI usage pretty much all exports are going to end up returning promises. Do we have any kind of mechanism to limit or control this?

Collaborator Author

sbc100 commented May 13, 2026

Since we already have the JSPI_IMPORTS setting perhaps we could reuse this?

Right now JS library functions that are marked as async are automatically added to JSPI_IMPORTS, but perhaps we could have some way to opt out, e.g. -sJSPI_IMPORTS=-pthread_create ?
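
A sketch of how such an opt-out list could be resolved, assuming a comma-separated setting in which a leading `-` removes an entry from the defaults. The `-name` syntax is only the idea floated above, and `resolveJspiImports` and `some_async_import` are hypothetical names, not existing Emscripten behavior.

```javascript
// Hypothetical: apply a setting such as "-pthread_create,my_import" to the
// set of imports that would be JSPI-wrapped by default.
function resolveJspiImports(defaults, setting) {
  const result = new Set(defaults);
  for (const raw of setting.split(',')) {
    const entry = raw.trim();
    if (!entry) continue; // tolerate empty items and trailing commas
    if (entry.startsWith('-')) {
      result.delete(entry.slice(1)); // opt out of a default
    } else {
      result.add(entry); // opt in to an extra import
    }
  }
  return [...result].sort();
}
```

For example, `resolveJspiImports(['pthread_create', 'some_async_import'], '-pthread_create')` would leave only `some_async_import` wrapped.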

Collaborator Author

sbc100 commented May 13, 2026

The problem of viral WebAssembly.promising is likely going to get a lot worse if I land my planned followup, which is to also use JSPI for pthread_join. My plan here is to have a JSPI version of atomic_wait. The initial PR for this is #26941. Once that lands, I was thinking we could have pthread_join use this async form of blocking. We could even have emscripten_futex_wait use this when JSPI is enabled. This would allow us to remove the busy-wait futex that we use on the main thread (only in JSPI mode, of course).

I agree it would be great to make this behaviour opt-in (or opt-out).

(Note: emscripten_futex_wait is a low-level primitive that is designed to be high cost, so I think it may not be unreasonable to run the microtask queue when it's called. It should always be preceded by a contention check and/or a small spin lock.)
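
The "contention check, then a small spin, then the expensive wait" ordering can be sketched with plain JavaScript `Atomics`. This is only an illustration of that ordering; `tryAcquire`, the 0/1 lock encoding, and the spin count are assumptions and do not reflect how Emscripten's locks or emscripten_futex_wait are implemented.

```javascript
// Try to take a lock stored at i32[index] (0 = free, 1 = held).
// Returns 'fast' if the lock was uncontended, 'spin' if it was acquired
// while spinning, or 'wait' if the caller should now fall back to the
// expensive wait primitive.
function tryAcquire(i32, index, maxSpins = 100) {
  // Contention check: a single compare-and-swap on the uncontended path.
  if (Atomics.compareExchange(i32, index, 0, 1) === 0) return 'fast';

  // Small bounded spin before paying for a real wait.
  for (let i = 0; i < maxSpins; i++) {
    if (Atomics.load(i32, index) === 0 &&
        Atomics.compareExchange(i32, index, 0, 1) === 0) {
      return 'spin';
    }
  }
  return 'wait';
}
```

Only the `'wait'` result would reach the high-cost primitive, which is why running the microtask queue at that point may be acceptable.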

Member

tlively commented May 13, 2026

> Regarding adding a "fast-path" to avoid returning to the event loop in just some cases: we originally had this in the design of JSPI itself, but the web standards folks pushed back on this as a bad design in general, and so JSPI entry points will now always go back to the event loop. So I think we are limited by this, in that a given JS import must be either JSPI or not-JSPI. We don't have a sometimes-JSPI option, and I think that's fine.

You can get this sometimes-JSPI behavior by not using JSPI for the import, having it sometimes return a Promise, then if it does return a Promise, passing that Promise to a JSPI-wrapped identity function, e.g. emscripten_promise_await. In other words, you separate the returning of the Promise to WebAssembly and the waiting on the promise using JSPI.
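
The separation described above can be mimicked in plain JavaScript. The following is only an analogy under stated assumptions: no real JSPI or Emscripten APIs are used, `acquireWorker`, `promiseAwait`, and `createThread` are hypothetical names, and `promiseAwait` merely stands in for a JSPI-wrapped identity function such as emscripten_promise_await.

```javascript
// Plain-JavaScript analogy; no real JSPI or Emscripten APIs are used here.
const readyWorkers = ['worker-0']; // hypothetical pool of started workers

// Non-JSPI import: returns a worker directly when one is ready, otherwise
// a Promise for a worker that is still starting.
function acquireWorker() {
  if (readyWorkers.length > 0) return readyWorkers.pop();
  return Promise.resolve('worker-new'); // stand-in for async startup
}

// Stand-in for the JSPI-wrapped identity function: the only place that waits.
async function promiseAwait(promise) {
  return await promise;
}

// Caller: only pays for a suspension when a Promise actually came back.
function createThread(onStarted) {
  const result = acquireWorker();
  if (result instanceof Promise) {
    promiseAwait(result).then(onStarted);
    return 'suspended';
  }
  onStarted(result);
  return 'immediate';
}
```

The first call finds a ready worker and completes synchronously; once the pool is empty, the next call takes the waiting path and its callback runs later.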

> @brendandahl @tlively how realistic is it to have a JSPI-using program that has useful non-promising exports? It seems like as we enable more JSPI usage pretty much all exports are going to end up returning promises. Do we have any kind of mechanism to limit or control this?

Right now I believe this is realistic. We've had large applications tell us that rewriting the entire Wasm/JS boundary to be async is prohibitively expensive, so they would only be able to use JSPI in a targeted way where they can reason about everything reachable from a specific async export.

That being said, for new applications it would be much better to conservatively assume that any Wasm can suspend by default. Perhaps we want one mode where JSPI is enabled, but only used for APIs that explicitly mention it (e.g. pthread_create_jspi) and another mode where arbitrary library calls are allowed to suspend.

Collaborator Author

sbc100 commented May 13, 2026

> That being said, for new applications it would be much better to conservatively assume that any Wasm can suspend by default. Perhaps we want one mode where JSPI is enabled, but only used for APIs that explicitly mention it (e.g. pthread_create_jspi) and another mode where arbitrary library calls are allowed to suspend.

Presumably this means we want some way to opt into JSPI on a per-function basis?

Something like -sJSPI_IMPORTS, but for controlling the JS library functions that are marked with __async?

My feeling is that these should all be on by default, i.e. if you just add -sJSPI to your build it will use it everywhere it can, and you can then opt out for specific things (such as pthread_create) if you choose.

Collaborator

brendandahl commented May 13, 2026

I wonder if we could try to use some of the code from the original asyncify that works backwards from ASYNCIFY_IMPORTS to determine what other functions could be async. From that we could see which wasm exports may be async but are not in the JSPI_EXPORTS list, and error out for those. This would probably lead to a lot of false positives though, since any indirect call is assumed to maybe be async.
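
A minimal sketch of that kind of backwards analysis, written as a fixed-point propagation over a call graph. The graph format, the `'*indirect*'` marker, and `findUnlistedAsyncExports` are all hypothetical and made up for this example; this is not the code used by asyncify.

```javascript
// callGraph maps each function name to the names it calls. The special
// callee '*indirect*' marks an indirect call, which is conservatively
// assumed to be able to reach an async import.
function findUnlistedAsyncExports(callGraph, asyncImports, exportsList, jspiExports) {
  const asyncSet = new Set(asyncImports);
  const mayBeAsync = new Set();

  // Propagate "may suspend" from callees to callers until nothing changes.
  let changed = true;
  while (changed) {
    changed = false;
    for (const [fn, callees] of Object.entries(callGraph)) {
      if (mayBeAsync.has(fn)) continue;
      const reaches = callees.some(
        (c) => c === '*indirect*' || asyncSet.has(c) || mayBeAsync.has(c));
      if (reaches) {
        mayBeAsync.add(fn);
        changed = true;
      }
    }
  }

  // Exports that may suspend but were not declared as promising.
  const listed = new Set(jspiExports);
  return exportsList.filter((e) => mayBeAsync.has(e) && !listed.has(e));
}
```

In this model an export whose only suspicious callee is an indirect call is still reported, which is the false-positive risk mentioned above.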

sbc100 added a commit that referenced this pull request May 13, 2026
We don't need/want threads to be `unref`ed until the worker is up and
running. This is important for #26933 since otherwise the node runtime
can shut down while a thread is starting up.

This change also simplifies the code by limiting the number of places we
call `unref` to just one, which also slightly reduces code size.

Also, remove unused `worker.loaded` assignment.

Split out from #26933
4 participants