ci: dynamically limit parallel build jobs to prevent OOM errors#3441
prasadn1 wants to merge 2 commits into ml-explore:main from
Conversation
zcbenz
left a comment
We tried to limit parallel builds before but it made compilation time unbearably long. The special thing in our case is that we only have a few kernels that take an insane amount of RAM to build, so using swap space ends up being a better solution for us:
mlx/.github/actions/setup-linux/action.yml
Lines 57 to 61 in 211e57b
If swap is the preferred stopgap, I can update this PR to expand the Windows pagefile during setup so it mirrors the Linux workaround. Alternatively, if you want a stricter build-system fix, we could use CMake Job Pools: we could define a heavy_compilation_pool limited to 1 or 2 jobs and explicitly assign only the specific memory-hungry kernels to it. That allows the other 95% of the framework to safely compile at -j$(nproc) while ensuring the heavy kernels serialize. This should prevent both OOMs and swap-thrashing without penalizing total CI time. Is either of these options helpful?
Didn't know about CMake Job Pools before, and it sounds like a perfect solution! Most of the heavy kernels are under
For Windows we are only doing a CPU build though, because the free runner does not have enough disk space for installing the CUDA toolkits and building.
Replaces the global -j parallelism limit with CMake JOB_POOLS. Global -j limits artificially starve the CPU during the shallow parts of the build tree. Instead, this defines a 'heavy_compilation_pool' (max 2 jobs) and explicitly assigns the massive qmm generated targets to it. This allows the vast majority of the framework to compile at -j$(nproc), while mathematically bounding the memory footprint of the heaviest template instantiations, preventing OOMs without relying on OS swap space.
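The mechanism described above can be sketched in CMake (it requires the Ninja generator); the target name `qmm_gen` below is illustrative, standing in for the generated quantized-matmul targets, not an actual MLX target:

```cmake
# Size the pool to bound peak memory: at most 2 heavy translation units
# compile concurrently, regardless of the global -j level.
set_property(GLOBAL PROPERTY JOB_POOLS heavy_compilation_pool=2)

# Everything else still builds at full -j$(nproc); only targets assigned
# to the pool serialize. "qmm_gen" is a hypothetical placeholder here.
set_property(TARGET qmm_gen PROPERTY JOB_POOL_COMPILE heavy_compilation_pool)
```

With this, memory use of the heaviest instantiations is capped at roughly 2 × per-job peak RSS, independent of core count.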
I've pivoted the PR to implement a strict build-system fix using CMake JOB_POOLS (supported by the Ninja generator).
Regarding Windows: do you happen to know which specific files/targets in the core C++ or CPU/Metal backends are causing the memory spikes on the Windows and ARM runners? If we can identify the heaviest targets there, we can assign them to this heavy_compilation_pool as well.
# Define a job pool for heavy template metaprogramming tasks (e.g., quantized matmul)
# Limit to 2 concurrent jobs to prevent OOM on standard GitHub Actions runners (16GB RAM)
set_property(GLOBAL PROPERTY JOB_POOLS heavy_compilation_pool=2)
Can you only enable the job pool when the CI environment variable is true?
The CPU/Metal backends are totally fine building on CI; this is only a CUDA kernel problem.
Gate the job pool and pool assignments behind `if(DEFINED ENV{CI})`
so local developer builds retain full parallelism.
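A minimal sketch of that gating, assuming the pool definition and assignments live in the CUDA backend's CMakeLists (the `qmm_gen` target name is a hypothetical placeholder):

```cmake
# Only constrain parallelism on CI runners; local builds see no pool at all
# and keep full -j parallelism.
if(DEFINED ENV{CI})
  set_property(GLOBAL PROPERTY JOB_POOLS heavy_compilation_pool=2)
  set_property(TARGET qmm_gen PROPERTY JOB_POOL_COMPILE heavy_compilation_pool)
endif()
```

Note that GitHub Actions always sets `CI=true`, so no workflow changes are needed for the gate to take effect there.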
Heavy C++ compilation (especially linking) can exhaust memory on GitHub Actions runners with high core counts but limited RAM, leading to intermittent OOM failures.
This introduces a cross-platform Python script that calculates a safe -j parallel build limit based on the system's available memory, ensuring builds scale safely across different runner types.
Proposed changes
No issue number; pulled this from internal bug tracking.
The Problem:
The current build configuration uses unbounded parallelism based on CPU core count (e.g., -j $(nproc) and %NUMBER_OF_PROCESSORS%). Because heavy C++ compilation and linking for ML kernels require significant RAM per job (often 3-4 GB), runners with high core counts but limited RAM (e.g., 8 cores / 16 GB RAM) launch too many heavy processes simultaneously, exceed the memory limit, and crash the build.
The Solution:
This PR introduces a cross-platform Python utility (.github/scripts/set_cmake_parallel.py) that acts as a "defense in depth" build-system fix.
This keeps the build system hardware-aware: it protects the current 16 GB runners from OOMs while allowing the build to scale up automatically and safely if/when higher-spec 32 GB or 64 GB runners are provisioned.
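The calculation the script performs can be sketched as follows. This is a hypothetical reconstruction of set_cmake_parallel.py, not the PR's actual code; the 4 GB-per-job figure comes from the estimate in the description above, and the memory query shown is the POSIX path (Windows would need a ctypes call instead):

```python
#!/usr/bin/env python3
"""Sketch: pick a safe -j value from cores and total RAM."""
import os

GB_PER_JOB = 4  # assumed peak memory of one heavy compile/link job


def total_ram_gb() -> float:
    # POSIX-only; a real cross-platform script would branch for Windows.
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


def safe_parallelism() -> int:
    cores = os.cpu_count() or 1
    by_memory = int(total_ram_gb() // GB_PER_JOB)
    # Bounded by both CPU and memory; never drop below one job.
    return max(1, min(cores, by_memory))


if __name__ == "__main__":
    # CI would export this as CMAKE_BUILD_PARALLEL_LEVEL before cmake --build.
    print(safe_parallelism())
```

On a 8-core / 16 GB runner this yields min(8, 16 // 4) = 4 jobs, matching the 3-4 GB-per-job budget described above.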
Checklist
Put an x in the boxes that apply.
- I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes