feat: experimental cross-process jobserver via POSIX semaphore by Kha · Pull Request #13856 · leanprover/lean4

Kha · 2026-05-26T17:19:43Z

This PR adds an opt-in cross-process parallelism limit to the Lean runtime. When `LEAN_JOB_SEMAPHORE=/name` points at a POSIX named semaphore, `task_manager`'s standard workers acquire a token before running a task and release it after, so the total number of concurrently running standard workers across all participating processes is bounded by the semaphore's initial value.

When the env var is unset, behavior is unchanged and there is no overhead. `Task.get` releases its token while blocked and reacquires before resuming, so a worker waiting on a sub-task cannot starve the global pool. Dedicated workers (priority above `LEAN_MAX_PRIO`) and `LEAN_SYNC_PRIO` tasks bypass the worker loop and so do not consume tokens.

This is intended for experimentation, not production: Linux and macOS only (Windows is a no-op), no `MAKEFLAGS` parsing, no crash-recovery for tokens leaked by killed processes, and no Lake integration.
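The gating described above can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `open_job_semaphore` and `run_gated` are hypothetical helper names, and only the `LEAN_JOB_SEMAPHORE` variable and the POSIX calls (`sem_open`, `sem_wait`, `sem_post`) are taken from the description.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <semaphore.h>

// Hypothetical helper: open the semaphore named by LEAN_JOB_SEMAPHORE.
// Returns nullptr when the variable is unset or the semaphore cannot be
// opened, in which case callers simply run ungated.
inline sem_t* open_job_semaphore() {
    const char* name = std::getenv("LEAN_JOB_SEMAPHORE");
    if (name == nullptr || *name == '\0') return nullptr;
    sem_t* sem = sem_open(name, 0);  // open an existing semaphore, never create
    return sem == SEM_FAILED ? nullptr : sem;
}

// Hypothetical helper: run one task while holding one token.
// A task that throws would leak the token; a real version would use RAII.
template <class F>
void run_gated(sem_t* sem, F&& task) {
    bool held = false;
    if (sem != nullptr) {
        int rc;
        do { rc = sem_wait(sem); } while (rc == -1 && errno == EINTR);
        held = (rc == 0);
    }
    task();
    if (held) {
        int rc = sem_post(sem);  // give the token back to all processes
        assert(rc == 0);
        (void)rc;
    }
}
```

With a semaphore created at value N, at most N callers across all participating processes are inside `task()` at any time; with a null semaphore the helper degrades to a plain call.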

Kha · 2026-05-26T17:19:59Z

!bench

leanprover-radar · 2026-05-26T17:20:22Z

Benchmark results for 387478f against 2cd9863 are in. There are significant results. @Kha

🟥 build exited with code 1
🟥 other exited with code 1

No significant changes detected.

This PR adds an opt-in cross-process parallelism limit to the Lean runtime. When `LEAN_JOB_SEMAPHORE=/name` points at a POSIX named semaphore, `task_manager`'s standard workers acquire a token before running a task and release it after, so the total number of concurrently running standard workers across all participating processes is bounded by the semaphore's initial value. When the env var is unset, behavior is unchanged and there is no overhead. `Task.get` releases its token while blocked and reacquires before resuming, so a worker waiting on a sub-task cannot starve the global pool. Dedicated workers (priority above `LEAN_MAX_PRIO`) and `LEAN_SYNC_PRIO` tasks bypass the worker loop and so do not consume tokens. This is intended for experimentation, not production: Linux and macOS only (Windows is a no-op), no `MAKEFLAGS` parsing, no crash-recovery for tokens leaked by killed processes, and no Lake integration — callers must create and destroy the semaphore themselves.
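The release-while-blocked behavior of `Task.get` in this first commit can be pictured as a scope guard around the blocking wait. The sketch below is an assumption-laden illustration: `token_release_scope` and `blocking_get` are hypothetical names, and the real runtime's wait is replaced by an arbitrary callable.

```cpp
#include <cassert>
#include <cerrno>
#include <semaphore.h>

// Hypothetical RAII guard: gives the caller's token back for the duration
// of a blocking wait and takes one again before the caller resumes.
class token_release_scope {
    sem_t* m_sem;
public:
    explicit token_release_scope(sem_t* sem) : m_sem(sem) {
        if (m_sem == nullptr) return;
        int rc = sem_post(m_sem);  // release: another worker may now run
        assert(rc == 0);
        (void)rc;
    }
    ~token_release_scope() {
        if (m_sem == nullptr) return;
        // Blocking reacquire, retried if a signal interrupts the wait.
        while (sem_wait(m_sem) == -1 && errno == EINTR) {}
    }
    token_release_scope(const token_release_scope&) = delete;
    token_release_scope& operator=(const token_release_scope&) = delete;
};

// Hypothetical stand-in for the blocking part of Task.get.
template <class Wait>
void blocking_get(sem_t* sem, Wait&& wait_for_subtask) {
    token_release_scope scope(sem);
    wait_for_subtask();
}
```

The blocking `sem_wait` in the destructor is the step that the later commits in this PR rework.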

This PR makes the experimental jobserver self-bootstrapping: when no `LEAN_JOB_SEMAPHORE` is set in the environment, `task_manager` now creates a fresh named semaphore (`/lean-jobs-<pid>`) sized to `max_std_workers`, exports the name via `LEAN_JOB_SEMAPHORE` so child processes inherit it, and `sem_unlink`s on exit. `LEAN_JOB_SEMAPHORE_AUTO=N` overrides the size. The creating process does not gate its own workers against the semaphore. The creator is typically an orchestrator (e.g. `lake`) whose workers block on subprocesses; gating it would consume tokens that its child `lean` processes need, deadlocking the pool. Together with the previous patch this means `lake build` participates in cross-process parallelism limiting with no command-line changes.
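A rough sketch of the bootstrap steps named above follows. The helper names (`auto_semaphore_name`, `auto_semaphore_size`, `bootstrap_job_semaphore`) are hypothetical, and the handling of malformed `LEAN_JOB_SEMAPHORE_AUTO` values is an assumption, since the commit message does not specify it.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>
#include <fcntl.h>
#include <semaphore.h>
#include <string>
#include <unistd.h>

// Name scheme from the commit message: /lean-jobs-<pid>.
inline std::string auto_semaphore_name(long pid) {
    return "/lean-jobs-" + std::to_string(pid);
}

// LEAN_JOB_SEMAPHORE_AUTO=N overrides the default size. Assumption: malformed
// or non-positive values fall back to the default.
inline unsigned auto_semaphore_size(unsigned max_std_workers) {
    const char* s = std::getenv("LEAN_JOB_SEMAPHORE_AUTO");
    if (s == nullptr || *s == '\0') return max_std_workers;
    char* end = nullptr;
    long n = std::strtol(s, &end, 10);
    if (*end != '\0' || n <= 0 || n > INT_MAX) return max_std_workers;
    return static_cast<unsigned>(n);
}

// Hypothetical bootstrap: create a semaphore only if none was inherited and
// export its name for child processes. Returns the name the caller should
// sem_unlink on exit, or an empty string if nothing was created.
inline std::string bootstrap_job_semaphore(unsigned max_std_workers) {
    assert(max_std_workers > 0);
    if (std::getenv("LEAN_JOB_SEMAPHORE") != nullptr) return "";
    std::string name = auto_semaphore_name(static_cast<long>(getpid()));
    sem_t* sem = sem_open(name.c_str(), O_CREAT | O_EXCL, 0600,
                          auto_semaphore_size(max_std_workers));
    if (sem == SEM_FAILED) return "";
    sem_close(sem);  // the creator does not gate its own workers
    setenv("LEAN_JOB_SEMAPHORE", name.c_str(), 1);
    return name;
}
```

A caller would keep the returned name and pass it to `sem_unlink` during shutdown, matching the "`sem_unlink`s on exit" step.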

leanprover-bot · 2026-05-26T18:11:24Z

Reference manual CI status:

❗ Reference manual CI can not be attempted yet, as the nightly-testing-2026-05-25 tag does not exist there yet. We will retry when you push more commits. If you rebase your branch onto nightly-with-manual, reference manual CI should run now. You can force reference manual CI using the force-manual-ci label. (2026-05-26 18:11:24)

mathlib-lean-pr-testing · 2026-05-26T19:08:16Z

Mathlib CI status (docs):

✅ Mathlib branch lean-pr-testing-13856 has successfully built against this PR. (2026-05-26 19:08:14) View Log
🟡 Mathlib branch lean-pr-testing-13856 build against this PR didn't complete normally. (2026-05-27 07:50:30) View Log
🟡 Mathlib branch lean-pr-testing-13856 build against this PR was cancelled. (2026-05-27 09:48:18) View Log

This PR avoids a thread-count cascade that the previous prototype provoked under heavy parallel elaboration. The earlier design released and re-acquired tokens through the global semaphore at every `Task.get` boundary; the blocking `sem_wait` on the re-acquire side let further `Task.get` calls inflate `m_max_std_workers` and spawn additional workers, multiplying the live OS thread count and tripping "failed to create thread" under `RLIMIT_NPROC`. In the new design, when a worker calls `Task.get`, it `sem_post`s its token globally so a sibling can pick up the blocked sub-task, then waits. Sibling `release_token` calls in the same process check whether a `Task.get` is actively waiting (registered on `m_parked_cv` after its `m_task_finished_cv.wait` returns) and, if so, hand the freed token directly to that waiter via `m_parked_cv` instead of `sem_post`. The waiter wakes without a blocking `sem_wait`, so the cascade cannot form. Excess releases (`m_parked_tokens >= m_parked_waiters`) still flow back to the global semaphore, so tokens aren't hoarded. Counting waiters only *after* `m_task_finished_cv.wait` is essential: counting them before would route releases to a pool nobody is listening on, starving the global semaphore and deadlocking workers that are blocked in `sem_wait`.
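The routing decision in `release_token` can be modeled in isolation. The following is a single-threaded model of the policy only, assuming the condition implied by "excess releases (`m_parked_tokens >= m_parked_waiters`) still flow back"; the mutex, `m_parked_cv`, and the real `sem_post` are deliberately left out, and `token_router` is a hypothetical name.

```cpp
#include <cassert>

// Single-threaded model of the routing decision. The global semaphore is
// represented by a counter so the policy can be checked without threads.
struct token_router {
    unsigned parked_waiters = 0;  // Task.get callers waiting for a hand-off
    unsigned parked_tokens  = 0;  // tokens handed off but not yet consumed
    unsigned global_posts   = 0;  // stands in for sem_post on the shared semaphore

    // Called when a worker finishes a task and frees its token.
    void release_token() {
        if (parked_tokens < parked_waiters) {
            ++parked_tokens;   // hand the token to an in-process waiter
        } else {
            ++global_posts;    // excess release: back to the global semaphore
        }
    }

    // Called by a woken waiter; true if it received a handed-off token.
    bool take_parked_token() {
        if (parked_tokens == 0) return false;
        assert(parked_waiters > 0);
        --parked_tokens;
        --parked_waiters;
        return true;
    }
};
```

In this model a release with no registered waiter always goes to the global counter, which is the case the next commit identifies as a race.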

Kha · 2026-05-26T21:03:16Z

!bench

leanprover-radar · 2026-05-26T21:03:23Z

Benchmark results for 046409e against 2cd9863 are in. There are significant results. @Kha

🟥 build exited with code 137
🟥 other exited with code 137

No significant changes detected.

This PR fixes a deadlock observed during a stage2 build of Lean and at sem=1 in nested `Task.get` smoke tests. The previous patch routed a freed token to the parked pool only when `m_parked_waiters > 0`, and counted the waiter only after `m_task_finished_cv.wait` returned. But the worker that resolves the sub-task holds the lock continuously through `resolve_core` and `release_token`, so the waiter cannot increment `m_parked_waiters` in between — the release always sees `waiters == 0` and `sem_post`s globally instead. The waiter then woke up, found `m_parked_tokens == 0`, and blocked on `m_parked_cv` forever because no further `release_token` was coming. On wake-up, the waiter now tries the parked pool first (in case another in-process release happened to route there), then attempts a non-blocking `sem_trywait` to recover a token the racing release sent to the global semaphore. Only when both fail does it register as a parked waiter and block on `m_parked_cv`. This handles the race without widening the lock scope or changing the waiter-counting policy.
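The three-step wake-up order described here (parked pool, then `sem_trywait`, then register as a waiter) might look like the sketch below. `parked_pool`, `reclaim_result`, and `reclaim_on_wakeup` are hypothetical names, and the caller is assumed to hold the task manager's lock and to block on `m_parked_cv` itself when `must_park` is returned.

```cpp
#include <cassert>
#include <semaphore.h>

enum class reclaim_result { from_parked_pool, from_global, must_park };

struct parked_pool {
    unsigned parked_waiters = 0;
    unsigned parked_tokens  = 0;
};

// Hypothetical wake-up path, assumed to run with the task manager's lock held.
inline reclaim_result reclaim_on_wakeup(parked_pool& pool, sem_t* global) {
    assert(global != nullptr);
    // 1. Another in-process release may already have routed a token here.
    if (pool.parked_tokens > 0) {
        --pool.parked_tokens;
        return reclaim_result::from_parked_pool;
    }
    // 2. Recover a token that the racing release posted globally.
    //    sem_trywait never blocks; it fails with EAGAIN when the count is zero.
    if (sem_trywait(global) == 0) {
        return reclaim_result::from_global;
    }
    // 3. Only now register as a parked waiter; the caller then blocks.
    ++pool.parked_waiters;
    return reclaim_result::must_park;
}
```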

Kha · 2026-05-27T07:18:35Z

!bench

leanprover-radar · 2026-05-27T07:18:42Z

Benchmark results for f3c3b98 against 2cd9863 are in. There are significant results. @Kha

🟥 build exited with code 137
🟥 other exited with code 137

No significant changes detected.

This PR replaces the parked-pool + `sem_trywait`-fallback design with a simpler approach: when `wait_for` cannot reclaim a token non-blockingly, the worker continues running its task un-gated rather than blocking in `sem_wait`. A thread-local `g_holds_token` flag tracks whether the current worker actually has a token; the worker loop's `release_token` skips its `sem_post` when the flag is false, keeping per-worker token accounting balanced. The previous design either deadlocked (when the in-process parked-pool notification couldn't reach a token that had been taken by another process) or risked re-introducing the original thread-explosion cascade (when the `sem_trywait` fallback hit blocking `sem_wait` under contention). The new design avoids both: no blocking call in `wait_for`'s reclaim, so the cascade can't form; and no in-process-only wakeup, so cross-process token freeing isn't missed. The cost is brief inter-process oversubscription: while a worker runs un-gated, the global cap is exceeded by one. This is bounded per worker by the depth of nested `Task.get` and clears as soon as the worker finishes its current task.
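The token accounting in this final design can be sketched as follows. The flag name `g_holds_token` and the function names `release_token` and `wait_for` come from the commit message; their signatures, the `acquire_token` helper, and the callable standing in for the blocking wait are assumptions made for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <semaphore.h>

// Per-thread flag: does this worker currently hold a token?
thread_local bool g_holds_token = false;

// Worker loop, before running a task: blocking acquire.
inline void acquire_token(sem_t* sem) {
    while (sem_wait(sem) == -1 && errno == EINTR) {}
    g_holds_token = true;
}

// Worker loop, after a task: post only if this thread really has a token,
// so an un-gated worker never adds a token it did not take.
inline void release_token(sem_t* sem) {
    if (!g_holds_token) return;
    g_holds_token = false;
    int rc = sem_post(sem);
    assert(rc == 0);
    (void)rc;
}

// Around the blocking part of Task.get: give the token up, wait, then try
// to take one back without blocking. On failure the worker continues
// un-gated, which is the brief oversubscription the commit message accepts.
template <class Wait>
void wait_for(sem_t* sem, Wait&& block_until_done) {
    release_token(sem);
    block_until_done();
    g_holds_token = (sem_trywait(sem) == 0);
}
```

Because `wait_for` never calls a blocking `sem_wait`, a worker returning from a sub-task cannot stall there; the only blocking acquire left is the one at the top of the worker loop.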

Kha · 2026-05-27T09:15:25Z

!bench

leanprover-radar · 2026-05-27T09:15:32Z

Benchmark results for fefdfc2 against 2cd9863 are in. There are significant results. @Kha

🟥 build exited with code 1
🟥 other exited with code 1

No significant changes detected.

Kha added 2 commits May 26, 2026 17:21

github-actions Bot added the `toolchain-available` label (A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN) May 26, 2026

github-actions Bot added the `mathlib4-nightly-available` label (A branch for this PR exists at leanprover-community/mathlib4-nightly-testing:lean-pr-testing-NNNN) May 26, 2026

mathlib-lean-pr-testing Bot added the `builds-mathlib` label (CI has verified that Mathlib builds against this PR) May 26, 2026

Kha force-pushed the push-sntnnpnmsktm branch from 387478f to 046409e May 26, 2026 21:03

mathlib-nightly-testing Bot pushed a commit to leanprover-community/batteries that referenced this pull request May 27, 2026

Update lean-toolchain for leanprover/lean4#13856

d9004e5

mathlib-nightly-testing Bot pushed a commit to leanprover-community/mathlib4-nightly-testing that referenced this pull request May 27, 2026

Update lean-toolchain for leanprover/lean4#13856

9b93f50

mathlib-nightly-testing Bot pushed a commit to leanprover-community/batteries that referenced this pull request May 27, 2026

Update lean-toolchain for leanprover/lean4#13856

9e8a94a

mathlib-nightly-testing Bot pushed a commit to leanprover-community/mathlib4-nightly-testing that referenced this pull request May 27, 2026

Update lean-toolchain for leanprover/lean4#13856

f6a6306

mathlib-lean-pr-testing Bot removed the `builds-mathlib` label (CI has verified that Mathlib builds against this PR) May 27, 2026

mathlib-nightly-testing Bot pushed a commit to leanprover-community/batteries that referenced this pull request May 27, 2026

Update lean-toolchain for leanprover/lean4#13856

aa915bb

mathlib-nightly-testing Bot pushed a commit to leanprover-community/mathlib4-nightly-testing that referenced this pull request May 27, 2026

Update lean-toolchain for leanprover/lean4#13856

3c5ecf9

Kha wants to merge 5 commits into leanprover:master from Kha:push-sntnnpnmsktm
