From 43007fc29479287f33b11ed2f22c5be73f70b5f3 Mon Sep 17 00:00:00 2001
From: psiddh <2467117+psiddh@users.noreply.github.com>
Date: Mon, 9 Mar 2026 16:41:53 -0400
Subject: [PATCH 01/14] Expand building Claude skill to cover general ET building from source

The existing building skill only covered runners (Makefile targets) and CMake workflow presets. This expands it to be a comprehensive guide for building ExecuTorch from source, including:
- Prerequisites and toolchain requirements
- Building the Python package (install_executorch.sh with all flags)
- Building the C++ runtime standalone (presets, workflows, manual CMake)
- Building model runners (Makefile)
- Cross-compilation (Android, iOS, macOS, Windows)
- Complete build options reference with dependency chains
- Common build patterns (minimal, XNNPACK, profiling, tests, subdirectory)
- Troubleshooting section covering 12 common build issues:
  - Submodule issues
  - Stale build artifacts
  - CMake version conflicts
  - Python version mismatch
  - Dependency version conflicts
  - Missing python-dev headers
  - Linking errors with --whole-archive
  - XNNPACK build failures
  - Windows symlink errors
  - MSVC kernel compilation failures
  - Intel macOS limitations
  - Duplicate kernel registration
- Build output reference table
- Tips for faster and more reliable builds
---
 .claude/skills/building/SKILL.md | 348 ++++++++++++++++++++++++++++++-
 1 file changed, 339 insertions(+), 9 deletions(-)

diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md
index 7ff7be38df1..ab63f1606e4 100644
--- a/.claude/skills/building/SKILL.md
+++ b/.claude/skills/building/SKILL.md
@@ -1,23 +1,353 @@
 ---
 name: building
-description: Build ExecuTorch runners or C++ libraries. Use when compiling runners for Llama, Whisper, or other models, or building the C++ runtime.
+description: Build ExecuTorch from source — Python package, C++ runtime, runners, cross-compilation, and backend-specific builds.
Use when compiling anything in the ExecuTorch repo, diagnosing build failures, or setting up platform-specific builds. --- # Building -## Runners (Makefile) +## Prerequisites + +Before building, ensure the environment is set up (see `/setup` skill): +```bash +conda activate executorch +``` + +Required toolchain: +- **Python** 3.10–3.13 +- **CMake** >= 3.24, < 4.0 +- **C++17** compiler: `g++` >= 7, `clang++` >= 5, or MSVC 2022+ with Clang-CL +- **Git submodules** must be initialized (handled by `install_executorch.sh`, or manually: `git submodule sync && git submodule update --init --recursive`) + +Optional but recommended: +- **ccache** — automatically detected and used if installed (`sudo apt install ccache` / `brew install ccache`) +- **Ninja** — faster than Make (`sudo apt install ninja-build` / `brew install ninja`); use with `-G Ninja` + +## 1. Building the Python Package + +This installs the ExecuTorch Python package (exir, runtime bindings, etc.) into the active environment. + +```bash +# First time (installs deps + builds + installs) +./install_executorch.sh + +# Editable mode (Python changes reflected without rebuild) +./install_executorch.sh --editable + +# Minimal (skip example dependencies) +./install_executorch.sh --minimal + +# Subsequent installs (deps already present) +pip install -e . --no-build-isolation +``` + +**Enable additional backends** during Python install: +```bash +CMAKE_ARGS="-DEXECUTORCH_BUILD_MPS=ON" ./install_executorch.sh +CMAKE_ARGS="-DEXECUTORCH_BUILD_COREML=ON -DEXECUTORCH_BUILD_MPS=ON" ./install_executorch.sh +``` + +**Verify Python install:** +```bash +python -m executorch.examples.xnnpack.aot_compiler --model_name="mv2" --delegate +``` + +## 2. 
Building the C++ Runtime (Standalone) + +### Using Presets (Recommended) + +```bash +cmake -B cmake-out --preset -DCMAKE_BUILD_TYPE=Release +cmake --build cmake-out -j$(nproc) +``` + +| Preset | Platform | What it builds | +|--------|----------|----------------| +| `linux` | Linux x86_64 | Runtime + XNNPACK + LLM + executor_runner | +| `macos` | macOS | Runtime + XNNPACK + CoreML + MPS + executor_runner | +| `windows` | Windows | Runtime + XNNPACK + executor_runner | +| `llm-release` | Host | LLM extension (CPU, Release) | +| `llm-release-cuda` | Linux/Windows | LLM extension (CUDA, Release) | +| `llm-release-metal` | macOS | LLM extension (Metal, Release) | +| `llm-debug` | Host | LLM extension (CPU, Debug) | +| `llm-debug-cuda` | Linux/Windows | LLM extension (CUDA, Debug) | +| `llm-debug-metal` | macOS | LLM extension (Metal, Debug) | +| `profiling` | Host | Runtime with profiling/event tracing | +| `android-arm64-v8a` | Android | JNI bindings + runtime for arm64 | +| `android-x86_64` | Android | JNI bindings + runtime for x86_64 | +| `ios` | iOS | Frameworks for device | +| `ios-simulator` | iOS Sim | Frameworks for simulator | +| `arm-baremetal` | Embedded | Cortex-M / Ethos-U bare-metal | +| `zephyr` | RTOS | Zephyr RTOS build | + +### Using CMake Workflow Presets + +Workflow presets combine configure + build + install in one command: +```bash +cmake --workflow --preset llm-release # CPU +cmake --workflow --preset llm-release-cuda # CUDA +cmake --workflow --preset llm-release-metal # Metal +``` + +### Manual CMake (No Preset) + +```bash +mkdir -p cmake-out +cmake -B cmake-out \ + -DCMAKE_BUILD_TYPE=Release \ + -DEXECUTORCH_BUILD_XNNPACK=ON \ + -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \ + -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \ + -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \ + -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON +cmake --build cmake-out -j$(nproc) +``` + +### Verify C++ Build + +```bash +# Enable executor_runner if not already +cmake -B cmake-out --preset 
linux -DEXECUTORCH_BUILD_EXECUTOR_RUNNER=ON +cmake --build cmake-out -j$(nproc) +cmake-out/executor_runner --model_path=mv2_xnnpack_fp32.pte +``` + +## 3. Building Runners (Makefile) + +Model-specific runners use the top-level `Makefile`: ```bash make help # list all targets -make llama-cpu # Llama -make whisper-metal # Whisper on Metal +make llama-cpu # Llama on CPU +make llama-cuda # Llama on CUDA +make llama-cuda-debug # Llama on CUDA (debug) +make llava-cpu # Llava on CPU +make gemma3-cpu # Gemma3 on CPU make gemma3-cuda # Gemma3 on CUDA +make whisper-cpu # Whisper on CPU +make whisper-metal # Whisper on Metal +make parakeet-cpu # Parakeet on CPU +make parakeet-metal # Parakeet on Metal +make clean # remove cmake-out/ +``` + +Output binaries: `cmake-out/examples/models//` + +Each `make` target internally runs `cmake --workflow --preset` for the core libraries, then builds the runner on top. + +## 4. Cross-Compilation + +### Android + +```bash +# AAR (Java bindings) +export ANDROID_ABIS=arm64-v8a +export BUILD_AAR_DIR=aar-out +mkdir -p $BUILD_AAR_DIR +sh scripts/build_android_library.sh + +# Native C++ (direct cross-compile) +cmake -B cmake-out \ + -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \ + -DANDROID_ABI=arm64-v8a \ + --preset android-arm64-v8a +cmake --build cmake-out -j$(nproc) ``` -Output: `cmake-out/examples/models//` +### iOS / macOS Frameworks -## C++ Libraries (CMake) ```bash -cmake --list-presets # list presets -cmake --workflow --preset llm-release # LLM CPU -cmake --workflow --preset llm-release-metal # LLM Metal +# Build all frameworks +./scripts/build_apple_frameworks.sh + +# With specific backends +./scripts/build_apple_frameworks.sh --coreml --mps --xnnpack ``` + +Link frameworks in Xcode with `-all_load` linker flag. 
+ +### Windows + +Requires Visual Studio 2022+ with Clang-CL: +```bash +cmake -B cmake-out --preset windows -T ClangCL +cmake --build cmake-out --config Release +``` + +**Windows-specific notes:** +- Enable symlinks before cloning: `git config --system core.symlinks true` +- Missing symlinks cause `version.py` errors during `pip install` +- LLM custom kernels and quantized kernels do not compile with MSVC; use `-T ClangCL` or build with CUDA + +## 5. Key Build Options + +| Option | Type | Default | Description | +|--------|------|---------|-------------| +| `CMAKE_BUILD_TYPE` | STRING | Debug | `Debug` or `Release`. Release disables logging/verification, adds optimizations | +| `EXECUTORCH_BUILD_XNNPACK` | BOOL | OFF | XNNPACK CPU backend (requires CPUINFO + PTHREADPOOL) | +| `EXECUTORCH_BUILD_COREML` | BOOL | OFF | Core ML backend (macOS/iOS only) | +| `EXECUTORCH_BUILD_MPS` | BOOL | OFF | MPS GPU backend (macOS/iOS only) | +| `EXECUTORCH_BUILD_CUDA` | BOOL | OFF | CUDA GPU backend (requires EXTENSION_TENSOR) | +| `EXECUTORCH_BUILD_METAL` | BOOL | OFF | Metal backend (requires EXTENSION_TENSOR) | +| `EXECUTORCH_BUILD_VULKAN` | BOOL | OFF | Vulkan GPU backend (Android) | +| `EXECUTORCH_BUILD_QNN` | BOOL | OFF | Qualcomm QNN backend | +| `EXECUTORCH_BUILD_KERNELS_OPTIMIZED` | BOOL | OFF | Optimized kernel implementations | +| `EXECUTORCH_BUILD_KERNELS_QUANTIZED` | BOOL | OFF | Quantized kernel implementations | +| `EXECUTORCH_BUILD_KERNELS_LLM` | BOOL | OFF | LLM custom kernels (requires KERNELS_OPTIMIZED) | +| `EXECUTORCH_BUILD_EXTENSION_MODULE` | BOOL | OFF | Module extension (requires DATA_LOADER + FLAT_TENSOR + NAMED_DATA_MAP) | +| `EXECUTORCH_BUILD_EXTENSION_TENSOR` | BOOL | OFF | Tensor extension | +| `EXECUTORCH_BUILD_EXTENSION_LLM` | BOOL | OFF | LLM extension | +| `EXECUTORCH_BUILD_EXTENSION_LLM_RUNNER` | BOOL | OFF | LLM runner extension (requires EXTENSION_LLM) | +| `EXECUTORCH_BUILD_PYBIND` | BOOL | OFF | Python bindings (requires EXTENSION_MODULE) | +| 
`EXECUTORCH_BUILD_TESTS` | BOOL | OFF | CMake-based unit tests | +| `EXECUTORCH_BUILD_DEVTOOLS` | BOOL | OFF | Developer tools (Inspector, ETDump) | +| `EXECUTORCH_ENABLE_EVENT_TRACER` | BOOL | OFF | Event tracing (requires DEVTOOLS) | +| `EXECUTORCH_OPTIMIZE_SIZE` | BOOL | OFF | Optimize for binary size (`-Os`, no exceptions/RTTI) | +| `EXECUTORCH_ENABLE_LOGGING` | BOOL | (Debug=ON) | Runtime logging | +| `EXECUTORCH_LOG_LEVEL` | STRING | Info | Log level: Debug, Info, Error, Fatal | +| `EXECUTORCH_USE_SANITIZER` | BOOL | OFF | ASAN + UBSAN (not supported on MSVC) | +| `EXECUTORCH_PAL_DEFAULT` | STRING | posix | Platform abstraction: `posix`, `minimal`, `android` | + +**Dependency chains** — enabling some options requires others: +- `XNNPACK` requires `CPUINFO` + `PTHREADPOOL` +- `KERNELS_LLM` requires `KERNELS_OPTIMIZED` +- `EXTENSION_MODULE` requires `EXTENSION_DATA_LOADER` + `EXTENSION_FLAT_TENSOR` + `EXTENSION_NAMED_DATA_MAP` +- `BUILD_PYBIND` requires `EXTENSION_MODULE` +- `EXTENSION_LLM_RUNNER` requires `EXTENSION_LLM` +- `EVENT_TRACER` requires `DEVTOOLS` +- `CUDA` and `METAL` require `EXTENSION_TENSOR` + +CMake will error with a clear message if a required option is missing. + +## 6. 
Common Build Patterns + +### Build core runtime only (minimal) +```bash +cmake -B cmake-out -DCMAKE_BUILD_TYPE=Release +cmake --build cmake-out -j$(nproc) +``` + +### Build with XNNPACK backend +```bash +cmake -B cmake-out -DCMAKE_BUILD_TYPE=Release \ + -DEXECUTORCH_BUILD_XNNPACK=ON +cmake --build cmake-out -j$(nproc) +``` + +### Build with profiling +```bash +cmake -B cmake-out --preset profiling +cmake --build cmake-out -j$(nproc) +``` + +### Build tests +```bash +cmake -B cmake-out -DEXECUTORCH_BUILD_TESTS=ON \ + -DEXECUTORCH_BUILD_EXTENSION_FLAT_TENSOR=ON +cmake --build cmake-out -j$(nproc) +ctest --test-dir cmake-out --output-on-failure +``` + +### Using ExecuTorch as a CMake subdirectory +```cmake +add_subdirectory(executorch) +# Set options before add_subdirectory: +set(EXECUTORCH_BUILD_XNNPACK ON) +set(EXECUTORCH_BUILD_EXTENSION_MODULE ON) +``` + +## 7. Troubleshooting + +### Submodule issues +**Symptom:** Build fails with missing headers or `CMakeLists.txt not found` in third-party dirs. +```bash +git submodule sync --recursive +git submodule update --init --recursive +``` + +### Stale build artifacts +**Symptom:** Mysterious failures after pulling new changes or switching branches. +```bash +./install_executorch.sh --clean +# Or manually: +rm -rf cmake-out/ pip-out/ buck-out/ +git submodule sync && git submodule update --init --recursive +``` + +### CMake version conflicts +**Symptom:** `cmake` errors about policy versions or unsupported features. +- ExecuTorch requires CMake >= 3.24, < 4.0 +- Check: `cmake --version` +- If conda and system cmake conflict, ensure conda env cmake is used: `which cmake` should point to conda env + +### Python version mismatch +**Symptom:** `install_executorch.sh` fails early with compatibility errors. +- Supported: Python 3.10–3.13 +- Check: `python --version` + +### Dependency version conflicts +**Symptom:** pip fails with conflicting torch/torchvision/torchaudio versions. 
+- Use a fresh conda environment +- If pinning to a specific PyTorch version: `./install_executorch.sh --use-pt-pinned-commit` + +### Missing `python-dev` headers +**Symptom:** Build fails looking for `Python.h`. +```bash +sudo apt install python$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')-dev +``` + +### Linking errors with `--whole-archive` +**Symptom:** Missing operator registrations at runtime despite building kernels. +- Kernel binding libraries (e.g., `libportable_kernels_bindings.a`) use load-time registration +- Must link with: `-Wl,--whole-archive -Wl,--no-whole-archive` (Linux) or `-Wl,-force_load,` (macOS) + +### XNNPACK build fails +**Symptom:** Errors about missing `cpuinfo` or `pthreadpool`. +- `EXECUTORCH_BUILD_XNNPACK=ON` requires `EXECUTORCH_BUILD_CPUINFO=ON` and `EXECUTORCH_BUILD_PTHREADPOOL=ON` (both ON by default unless `ARM_BAREMETAL` is set) + +### Windows symlink errors +**Symptom:** `version.py` not found or import errors on Windows. +```bash +git config --system core.symlinks true +# Re-clone the repo after enabling +``` + +### MSVC kernel compilation failures +**Symptom:** LLM/quantized kernels fail to compile on Windows with MSVC. +- Use Clang-CL: `cmake -B cmake-out -T ClangCL` +- Or build with CUDA (which uses nvcc, not MSVC for kernels) + +### Intel macOS +**Symptom:** `install_executorch.sh` fails — no prebuilt PyTorch wheels for Intel Mac. +- Must build PyTorch from source, or use `--use-pt-pinned-commit --minimal` + +### Build directory not at repo root +**Symptom:** Include path errors when ExecuTorch checkout is not the top-level directory. +- ExecuTorch adds `..` to include directories; the build directory must be directly under the repo root or use `add_subdirectory` correctly + +### Duplicate kernel registration +**Symptom:** Abort at runtime with duplicate kernel registration. 
+- Only link one `gen_operators_lib` per target
+- Check for multiple kernel binding libraries being linked
+
+## 8. Build Output
+
+| Artifact | Location | Description |
+|----------|----------|-------------|
+| `executor_runner` | `cmake-out/executor_runner` | Standalone model runner |
+| Core runtime | `cmake-out/libexecutorch.a` | Core ExecuTorch runtime |
+| Portable ops | `cmake-out/kernels/portable/libportable_ops_lib.a` | Portable operator implementations |
+| XNNPACK backend | `cmake-out/backends/xnnpack/libxnnpack_backend.a` | XNNPACK delegate |
+| LLM runner | `cmake-out/examples/models//` | Model-specific runners |
+| Python package | site-packages | `executorch` Python module |
+| iOS frameworks | `cmake-out/*.xcframework` | iOS/macOS frameworks |
+| Android AAR | `aar-out/` | Android Java bindings |
+
+## 9. Tips
+
+- Always use `Release` for performance measurement; `Debug` is 5–10x slower and significantly larger
+- Use `ccache` to speed up rebuilds — ExecuTorch auto-detects it
+- Use `Ninja` generator (`-G Ninja`) for faster parallel builds
+- Use `cmake --list-presets` to see all available presets
+- After `git pull`, always clean and re-init submodules before rebuilding
+- For LLM workflows, `make -` is the simplest path
+- Set `EXECUTORCH_OPTIMIZE_SIZE=ON` for size-constrained deployments
+- Check `cmake-out/compile_commands.json` for IDE integration (auto-generated)

From f7d0f375b20f4ffefb6d3d8928bc7613a0414fd2 Mon Sep 17 00:00:00 2001
From: Github Executorch
Date: Tue, 10 Mar 2026 09:21:06 -0700
Subject: [PATCH 02/14] Refactor building skill from reference manual to action-oriented flow
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Reorganize the building skill to be concise and action-driven rather than encyclopedic. Key changes:
- Add "do not ask questions" directive — detect and fix env issues automatically
- Make conda the mandatory path (no venvs, no system Python)
- Auto-fix for cmake >= 4.0 and PEP 668 errors
- Default action is `./install_executorch.sh --editable`
- Complete runner table from actual Makefile (adds voxtral, voxtral_realtime, sortformer, silero-vad, llava, debug variants)
- Fix macOS commands: $(sysctl -n hw.ncpu) instead of $(nproc)
- Note that --preset macos uses Xcode generator
- Condense troubleshooting into scannable table
- Trim cmake options to most-used flags (full list in CMakeLists.txt)

Co-authored-by: Claude
---
 .claude/skills/building/SKILL.md | 446 ++++++++++---------------------
 1 file changed, 137 insertions(+), 309 deletions(-)

diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md
index ab63f1606e4..9f5f37db85a 100644
--- a/.claude/skills/building/SKILL.md
+++ b/.claude/skills/building/SKILL.md
@@ -3,95 +3,103 @@ name: building
 description: Build ExecuTorch from source — Python package, C++ runtime, runners, cross-compilation, and backend-specific builds. Use when compiling anything in the ExecuTorch repo, diagnosing build failures, or setting up platform-specific builds.
 ---
-# Building
+# Building ExecuTorch
-## Prerequisites
+IMPORTANT: Do NOT ask the user questions about environment setup. Detect issues and fix them automatically. The standard path is conda — always use it.
+ +## Step 1: Ensure conda environment (do this silently, fix issues automatically) -Before building, ensure the environment is set up (see `/setup` skill): ```bash +# Check if executorch conda env exists; create if not +conda env list | grep executorch || conda create -yn executorch python=3.12 'cmake<4' +# Activate conda activate executorch +# Verify — fix automatically if wrong +python --version # need 3.10–3.13 +cmake --version # need >= 3.24, < 4.0; if cmake >= 4, run: pip install 'cmake>=3.24,<4' ``` -Required toolchain: -- **Python** 3.10–3.13 -- **CMake** >= 3.24, < 4.0 -- **C++17** compiler: `g++` >= 7, `clang++` >= 5, or MSVC 2022+ with Clang-CL -- **Git submodules** must be initialized (handled by `install_executorch.sh`, or manually: `git submodule sync && git submodule update --init --recursive`) - -Optional but recommended: -- **ccache** — automatically detected and used if installed (`sudo apt install ccache` / `brew install ccache`) -- **Ninja** — faster than Make (`sudo apt install ninja-build` / `brew install ninja`); use with `-G Ninja` +If cmake is >= 4.0 inside the conda env, fix it: `pip install 'cmake>=3.24,<4'` — do not ask the user. -## 1. Building the Python Package +Parallel jobs: `$(sysctl -n hw.ncpu)` on macOS, `$(nproc)` on Linux. -This installs the ExecuTorch Python package (exir, runtime bindings, etc.) into the active environment. +## Step 2: Build (route by what the user needs) +### Python package (default — use this unless user asks for something specific) ```bash -# First time (installs deps + builds + installs) -./install_executorch.sh - -# Editable mode (Python changes reflected without rebuild) -./install_executorch.sh --editable - -# Minimal (skip example dependencies) -./install_executorch.sh --minimal - -# Subsequent installs (deps already present) -pip install -e . 
--no-build-isolation +conda activate executorch +./install_executorch.sh --editable # editable install from source ``` +This handles everything: submodules, deps, C++ build, Python install. Takes ~10 min on Apple Silicon. -**Enable additional backends** during Python install: -```bash -CMAKE_ARGS="-DEXECUTORCH_BUILD_MPS=ON" ./install_executorch.sh -CMAKE_ARGS="-DEXECUTORCH_BUILD_COREML=ON -DEXECUTORCH_BUILD_MPS=ON" ./install_executorch.sh -``` +For subsequent rebuilds (deps already present): `pip install -e . --no-build-isolation` + +For minimal install (skip example deps): `./install_executorch.sh --minimal` -**Verify Python install:** +Enable additional backends: ```bash -python -m executorch.examples.xnnpack.aot_compiler --model_name="mv2" --delegate +CMAKE_ARGS="-DEXECUTORCH_BUILD_COREML=ON -DEXECUTORCH_BUILD_MPS=ON" ./install_executorch.sh --editable ``` -## 2. Building the C++ Runtime (Standalone) +Verify: `python -c "from executorch.exir import to_edge_transform_and_lower; print('OK')"` -### Using Presets (Recommended) +### LLM / ASR model runner (simplest path for running models) ```bash -cmake -B cmake-out --preset -DCMAKE_BUILD_TYPE=Release -cmake --build cmake-out -j$(nproc) -``` - -| Preset | Platform | What it builds | -|--------|----------|----------------| -| `linux` | Linux x86_64 | Runtime + XNNPACK + LLM + executor_runner | -| `macos` | macOS | Runtime + XNNPACK + CoreML + MPS + executor_runner | -| `windows` | Windows | Runtime + XNNPACK + executor_runner | -| `llm-release` | Host | LLM extension (CPU, Release) | -| `llm-release-cuda` | Linux/Windows | LLM extension (CUDA, Release) | -| `llm-release-metal` | macOS | LLM extension (Metal, Release) | -| `llm-debug` | Host | LLM extension (CPU, Debug) | -| `llm-debug-cuda` | Linux/Windows | LLM extension (CUDA, Debug) | -| `llm-debug-metal` | macOS | LLM extension (Metal, Debug) | -| `profiling` | Host | Runtime with profiling/event tracing | -| `android-arm64-v8a` | Android | JNI bindings + runtime 
for arm64 | -| `android-x86_64` | Android | JNI bindings + runtime for x86_64 | -| `ios` | iOS | Frameworks for device | -| `ios-simulator` | iOS Sim | Frameworks for simulator | -| `arm-baremetal` | Embedded | Cortex-M / Ethos-U bare-metal | -| `zephyr` | RTOS | Zephyr RTOS build | - -### Using CMake Workflow Presets - -Workflow presets combine configure + build + install in one command: +conda activate executorch +make - +``` + +Available targets (run `make help` for full list): + +| Target | Backend | macOS | Linux | +|--------|---------|-------|-------| +| `llama-cpu` | CPU | yes | yes | +| `llama-cuda` | CUDA | — | yes | +| `llama-cuda-debug` | CUDA (debug) | — | yes | +| `llava-cpu` | CPU | yes | yes | +| `whisper-cpu` | CPU | yes | yes | +| `whisper-metal` | Metal | yes | — | +| `whisper-cuda` | CUDA | — | yes | +| `parakeet-cpu` | CPU | yes | yes | +| `parakeet-metal` | Metal | yes | — | +| `parakeet-cuda` | CUDA | — | yes | +| `voxtral-cpu` | CPU | yes | yes | +| `voxtral-cuda` | CUDA | — | yes | +| `voxtral-metal` | Metal | yes | — | +| `voxtral_realtime-cpu` | CPU | yes | yes | +| `voxtral_realtime-cuda` | CUDA | — | yes | +| `voxtral_realtime-metal` | Metal | yes | — | +| `gemma3-cpu` | CPU | yes | yes | +| `gemma3-cuda` | CUDA | — | yes | +| `sortformer-cpu` | CPU | yes | yes | +| `sortformer-cuda` | CUDA | — | yes | +| `silero-vad-cpu` | CPU | yes | yes | +| `clean` | — | yes | yes | + +Output: `cmake-out/examples/models//` + +### C++ runtime (standalone) + +**With presets (recommended):** + +| Platform | Command | +|----------|---------| +| macOS | `cmake -B cmake-out --preset macos` (uses Xcode generator — requires Xcode) | +| Linux | `cmake -B cmake-out --preset linux -DCMAKE_BUILD_TYPE=Release` | +| Windows | `cmake -B cmake-out --preset windows -T ClangCL` | + +Then: `cmake --build cmake-out -j$(sysctl -n hw.ncpu)` (macOS) or `cmake --build cmake-out -j$(nproc)` (Linux) + +**LLM libraries via workflow presets** (configure + build + install in one 
command): ```bash cmake --workflow --preset llm-release # CPU -cmake --workflow --preset llm-release-cuda # CUDA -cmake --workflow --preset llm-release-metal # Metal +cmake --workflow --preset llm-release-metal # Metal (macOS) +cmake --workflow --preset llm-release-cuda # CUDA (Linux) ``` -### Manual CMake (No Preset) - +**Manual CMake (custom flags):** ```bash -mkdir -p cmake-out cmake -B cmake-out \ -DCMAKE_BUILD_TYPE=Release \ -DEXECUTORCH_BUILD_XNNPACK=ON \ @@ -99,255 +107,75 @@ cmake -B cmake-out \ -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \ -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \ -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON -cmake --build cmake-out -j$(nproc) -``` - -### Verify C++ Build - -```bash -# Enable executor_runner if not already -cmake -B cmake-out --preset linux -DEXECUTORCH_BUILD_EXECUTOR_RUNNER=ON -cmake --build cmake-out -j$(nproc) -cmake-out/executor_runner --model_path=mv2_xnnpack_fp32.pte +cmake --build cmake-out -j$(sysctl -n hw.ncpu) ``` -## 3. Building Runners (Makefile) +Run `cmake --list-presets` to see all available presets. -Model-specific runners use the top-level `Makefile`: -```bash -make help # list all targets -make llama-cpu # Llama on CPU -make llama-cuda # Llama on CUDA -make llama-cuda-debug # Llama on CUDA (debug) -make llava-cpu # Llava on CPU -make gemma3-cpu # Gemma3 on CPU -make gemma3-cuda # Gemma3 on CUDA -make whisper-cpu # Whisper on CPU -make whisper-metal # Whisper on Metal -make parakeet-cpu # Parakeet on CPU -make parakeet-metal # Parakeet on Metal -make clean # remove cmake-out/ -``` - -Output binaries: `cmake-out/examples/models//` - -Each `make` target internally runs `cmake --workflow --preset` for the core libraries, then builds the runner on top. - -## 4. 
Cross-Compilation - -### Android +### Cross-compilation +**iOS/macOS frameworks:** ```bash -# AAR (Java bindings) -export ANDROID_ABIS=arm64-v8a -export BUILD_AAR_DIR=aar-out -mkdir -p $BUILD_AAR_DIR -sh scripts/build_android_library.sh - -# Native C++ (direct cross-compile) -cmake -B cmake-out \ - -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \ - -DANDROID_ABI=arm64-v8a \ - --preset android-arm64-v8a -cmake --build cmake-out -j$(nproc) -``` - -### iOS / macOS Frameworks - -```bash -# Build all frameworks -./scripts/build_apple_frameworks.sh - -# With specific backends ./scripts/build_apple_frameworks.sh --coreml --mps --xnnpack ``` - -Link frameworks in Xcode with `-all_load` linker flag. - -### Windows - -Requires Visual Studio 2022+ with Clang-CL: -```bash -cmake -B cmake-out --preset windows -T ClangCL -cmake --build cmake-out --config Release -``` - -**Windows-specific notes:** -- Enable symlinks before cloning: `git config --system core.symlinks true` -- Missing symlinks cause `version.py` errors during `pip install` -- LLM custom kernels and quantized kernels do not compile with MSVC; use `-T ClangCL` or build with CUDA - -## 5. Key Build Options - -| Option | Type | Default | Description | -|--------|------|---------|-------------| -| `CMAKE_BUILD_TYPE` | STRING | Debug | `Debug` or `Release`. 
Release disables logging/verification, adds optimizations | -| `EXECUTORCH_BUILD_XNNPACK` | BOOL | OFF | XNNPACK CPU backend (requires CPUINFO + PTHREADPOOL) | -| `EXECUTORCH_BUILD_COREML` | BOOL | OFF | Core ML backend (macOS/iOS only) | -| `EXECUTORCH_BUILD_MPS` | BOOL | OFF | MPS GPU backend (macOS/iOS only) | -| `EXECUTORCH_BUILD_CUDA` | BOOL | OFF | CUDA GPU backend (requires EXTENSION_TENSOR) | -| `EXECUTORCH_BUILD_METAL` | BOOL | OFF | Metal backend (requires EXTENSION_TENSOR) | -| `EXECUTORCH_BUILD_VULKAN` | BOOL | OFF | Vulkan GPU backend (Android) | -| `EXECUTORCH_BUILD_QNN` | BOOL | OFF | Qualcomm QNN backend | -| `EXECUTORCH_BUILD_KERNELS_OPTIMIZED` | BOOL | OFF | Optimized kernel implementations | -| `EXECUTORCH_BUILD_KERNELS_QUANTIZED` | BOOL | OFF | Quantized kernel implementations | -| `EXECUTORCH_BUILD_KERNELS_LLM` | BOOL | OFF | LLM custom kernels (requires KERNELS_OPTIMIZED) | -| `EXECUTORCH_BUILD_EXTENSION_MODULE` | BOOL | OFF | Module extension (requires DATA_LOADER + FLAT_TENSOR + NAMED_DATA_MAP) | -| `EXECUTORCH_BUILD_EXTENSION_TENSOR` | BOOL | OFF | Tensor extension | -| `EXECUTORCH_BUILD_EXTENSION_LLM` | BOOL | OFF | LLM extension | -| `EXECUTORCH_BUILD_EXTENSION_LLM_RUNNER` | BOOL | OFF | LLM runner extension (requires EXTENSION_LLM) | -| `EXECUTORCH_BUILD_PYBIND` | BOOL | OFF | Python bindings (requires EXTENSION_MODULE) | -| `EXECUTORCH_BUILD_TESTS` | BOOL | OFF | CMake-based unit tests | -| `EXECUTORCH_BUILD_DEVTOOLS` | BOOL | OFF | Developer tools (Inspector, ETDump) | -| `EXECUTORCH_ENABLE_EVENT_TRACER` | BOOL | OFF | Event tracing (requires DEVTOOLS) | -| `EXECUTORCH_OPTIMIZE_SIZE` | BOOL | OFF | Optimize for binary size (`-Os`, no exceptions/RTTI) | -| `EXECUTORCH_ENABLE_LOGGING` | BOOL | (Debug=ON) | Runtime logging | -| `EXECUTORCH_LOG_LEVEL` | STRING | Info | Log level: Debug, Info, Error, Fatal | -| `EXECUTORCH_USE_SANITIZER` | BOOL | OFF | ASAN + UBSAN (not supported on MSVC) | -| `EXECUTORCH_PAL_DEFAULT` | STRING | posix | 
Platform abstraction: `posix`, `minimal`, `android` | - -**Dependency chains** — enabling some options requires others: -- `XNNPACK` requires `CPUINFO` + `PTHREADPOOL` -- `KERNELS_LLM` requires `KERNELS_OPTIMIZED` -- `EXTENSION_MODULE` requires `EXTENSION_DATA_LOADER` + `EXTENSION_FLAT_TENSOR` + `EXTENSION_NAMED_DATA_MAP` -- `BUILD_PYBIND` requires `EXTENSION_MODULE` -- `EXTENSION_LLM_RUNNER` requires `EXTENSION_LLM` -- `EVENT_TRACER` requires `DEVTOOLS` -- `CUDA` and `METAL` require `EXTENSION_TENSOR` - -CMake will error with a clear message if a required option is missing. - -## 6. Common Build Patterns - -### Build core runtime only (minimal) -```bash -cmake -B cmake-out -DCMAKE_BUILD_TYPE=Release -cmake --build cmake-out -j$(nproc) -``` - -### Build with XNNPACK backend -```bash -cmake -B cmake-out -DCMAKE_BUILD_TYPE=Release \ - -DEXECUTORCH_BUILD_XNNPACK=ON -cmake --build cmake-out -j$(nproc) -``` - -### Build with profiling -```bash -cmake -B cmake-out --preset profiling -cmake --build cmake-out -j$(nproc) -``` - -### Build tests -```bash -cmake -B cmake-out -DEXECUTORCH_BUILD_TESTS=ON \ - -DEXECUTORCH_BUILD_EXTENSION_FLAT_TENSOR=ON -cmake --build cmake-out -j$(nproc) -ctest --test-dir cmake-out --output-on-failure -``` - -### Using ExecuTorch as a CMake subdirectory -```cmake -add_subdirectory(executorch) -# Set options before add_subdirectory: -set(EXECUTORCH_BUILD_XNNPACK ON) -set(EXECUTORCH_BUILD_EXTENSION_MODULE ON) -``` - -## 7. Troubleshooting - -### Submodule issues -**Symptom:** Build fails with missing headers or `CMakeLists.txt not found` in third-party dirs. -```bash -git submodule sync --recursive -git submodule update --init --recursive -``` - -### Stale build artifacts -**Symptom:** Mysterious failures after pulling new changes or switching branches. 
-```bash -./install_executorch.sh --clean -# Or manually: -rm -rf cmake-out/ pip-out/ buck-out/ -git submodule sync && git submodule update --init --recursive -``` - -### CMake version conflicts -**Symptom:** `cmake` errors about policy versions or unsupported features. -- ExecuTorch requires CMake >= 3.24, < 4.0 -- Check: `cmake --version` -- If conda and system cmake conflict, ensure conda env cmake is used: `which cmake` should point to conda env - -### Python version mismatch -**Symptom:** `install_executorch.sh` fails early with compatibility errors. -- Supported: Python 3.10–3.13 -- Check: `python --version` - -### Dependency version conflicts -**Symptom:** pip fails with conflicting torch/torchvision/torchaudio versions. -- Use a fresh conda environment -- If pinning to a specific PyTorch version: `./install_executorch.sh --use-pt-pinned-commit` - -### Missing `python-dev` headers -**Symptom:** Build fails looking for `Python.h`. -```bash -sudo apt install python$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')-dev -``` - -### Linking errors with `--whole-archive` -**Symptom:** Missing operator registrations at runtime despite building kernels. -- Kernel binding libraries (e.g., `libportable_kernels_bindings.a`) use load-time registration -- Must link with: `-Wl,--whole-archive -Wl,--no-whole-archive` (Linux) or `-Wl,-force_load,` (macOS) - -### XNNPACK build fails -**Symptom:** Errors about missing `cpuinfo` or `pthreadpool`. -- `EXECUTORCH_BUILD_XNNPACK=ON` requires `EXECUTORCH_BUILD_CPUINFO=ON` and `EXECUTORCH_BUILD_PTHREADPOOL=ON` (both ON by default unless `ARM_BAREMETAL` is set) - -### Windows symlink errors -**Symptom:** `version.py` not found or import errors on Windows. -```bash -git config --system core.symlinks true -# Re-clone the repo after enabling -``` - -### MSVC kernel compilation failures -**Symptom:** LLM/quantized kernels fail to compile on Windows with MSVC. 
-- Use Clang-CL: `cmake -B cmake-out -T ClangCL` -- Or build with CUDA (which uses nvcc, not MSVC for kernels) - -### Intel macOS -**Symptom:** `install_executorch.sh` fails — no prebuilt PyTorch wheels for Intel Mac. -- Must build PyTorch from source, or use `--use-pt-pinned-commit --minimal` - -### Build directory not at repo root -**Symptom:** Include path errors when ExecuTorch checkout is not the top-level directory. -- ExecuTorch adds `..` to include directories; the build directory must be directly under the repo root or use `add_subdirectory` correctly - -### Duplicate kernel registration -**Symptom:** Abort at runtime with duplicate kernel registration. -- Only link one `gen_operators_lib` per target -- Check for multiple kernel binding libraries being linked - -## 8. Build Output - -| Artifact | Location | Description | -|----------|----------|-------------| -| `executor_runner` | `cmake-out/executor_runner` | Standalone model runner | -| Core runtime | `cmake-out/libexecutorch.a` | Core ExecuTorch runtime | -| Portable ops | `cmake-out/kernels/portable/libportable_ops_lib.a` | Portable operator implementations | -| XNNPACK backend | `cmake-out/backends/xnnpack/libxnnpack_backend.a` | XNNPACK delegate | -| LLM runner | `cmake-out/examples/models//` | Model-specific runners | -| Python package | site-packages | `executorch` Python module | -| iOS frameworks | `cmake-out/*.xcframework` | iOS/macOS frameworks | -| Android AAR | `aar-out/` | Android Java bindings | - -## 9. Tips - -- Always use `Release` for performance measurement; `Debug` is 5–10x slower and significantly larger -- Use `ccache` to speed up rebuilds — ExecuTorch auto-detects it -- Use `Ninja` generator (`-G Ninja`) for faster parallel builds -- Use `cmake --list-presets` to see all available presets -- After `git pull`, always clean and re-init submodules before rebuilding +Link in Xcode with `-all_load` linker flag. 
+ +**Android:** +```bash +export ANDROID_ABIS=arm64-v8a BUILD_AAR_DIR=aar-out +mkdir -p $BUILD_AAR_DIR && sh scripts/build_android_library.sh +``` + +## Key build options + +Most commonly needed flags (full list: `CMakeLists.txt`): + +| Flag | What it enables | +|------|-----------------| +| `EXECUTORCH_BUILD_XNNPACK` | XNNPACK CPU backend | +| `EXECUTORCH_BUILD_COREML` | Core ML (macOS/iOS) | +| `EXECUTORCH_BUILD_MPS` | MPS GPU (macOS/iOS) | +| `EXECUTORCH_BUILD_METAL` | Metal compute (macOS, requires EXTENSION_TENSOR) | +| `EXECUTORCH_BUILD_CUDA` | CUDA GPU (Linux, requires EXTENSION_TENSOR) | +| `EXECUTORCH_BUILD_KERNELS_OPTIMIZED` | Optimized kernels | +| `EXECUTORCH_BUILD_KERNELS_QUANTIZED` | Quantized kernels | +| `EXECUTORCH_BUILD_EXTENSION_MODULE` | Module extension (requires DATA_LOADER + FLAT_TENSOR + NAMED_DATA_MAP) | +| `EXECUTORCH_BUILD_EXTENSION_LLM` | LLM extension | +| `EXECUTORCH_BUILD_TESTS` | Unit tests (`ctest --test-dir cmake-out --output-on-failure`) | +| `EXECUTORCH_BUILD_DEVTOOLS` | DevTools (Inspector, ETDump) | +| `EXECUTORCH_OPTIMIZE_SIZE` | Size-optimized build (`-Os`, no exceptions/RTTI) | +| `CMAKE_BUILD_TYPE` | `Release` (default for presets) or `Debug` (5-10x slower) | + +## Troubleshooting + +| Symptom | Fix | +|---------|-----| +| Missing headers / `CMakeLists.txt not found` in third-party | `git submodule sync --recursive && git submodule update --init --recursive` | +| Mysterious failures after `git pull` or branch switch | `rm -rf cmake-out/ pip-out/ && git submodule sync && git submodule update --init --recursive` | +| CMake >= 4.0 (too new) | `pip install 'cmake>=3.24,<4'` inside the conda env | +| `externally-managed-environment` / PEP 668 error | You're using system Python, not conda. Activate conda env first. 
| +| pip conflicts with torch versions | Fresh conda env; or `./install_executorch.sh --use-pt-pinned-commit` | +| Missing `Python.h` (Linux) | `sudo apt install python3.X-dev` | +| Missing operator registrations at runtime | Link kernel libs with `-Wl,-force_load,` (macOS) or `-Wl,--whole-archive -Wl,--no-whole-archive` (Linux) | +| `install_executorch.sh` fails on Intel Mac | No prebuilt PyTorch wheels; use `--use-pt-pinned-commit --minimal` | +| XNNPACK build errors about cpuinfo/pthreadpool | Ensure `EXECUTORCH_BUILD_CPUINFO=ON` and `EXECUTORCH_BUILD_PTHREADPOOL=ON` (both ON by default) | +| Duplicate kernel registration abort | Only link one `gen_operators_lib` per target | + +## Build output + +| Artifact | Location | +|----------|----------| +| Core runtime | `cmake-out/libexecutorch.a` | +| executor_runner | `cmake-out/executor_runner` | +| Model runners | `cmake-out/examples/models//` | +| XNNPACK backend | `cmake-out/backends/xnnpack/libxnnpack_backend.a` | +| Python package | `site-packages/executorch` | +| iOS frameworks | `cmake-out/*.xcframework` | +| Android AAR | `aar-out/` | + +## Tips +- Always use `Release` for benchmarking; `Debug` is 5–10x slower +- `ccache` is auto-detected if installed (`brew install ccache`) +- `Ninja` is faster than Make (`-G Ninja`) — but `--preset macos` uses Xcode generator - For LLM workflows, `make -` is the simplest path -- Set `EXECUTORCH_OPTIMIZE_SIZE=ON` for size-constrained deployments -- Check `cmake-out/compile_commands.json` for IDE integration (auto-generated) +- After `git pull`, clean and re-init submodules before rebuilding From c6ba3b0676d6f3f0e0ee6d143279aa82e06c7865 Mon Sep 17 00:00:00 2001 From: Siddartha Pothapragada Date: Tue, 10 Mar 2026 09:32:28 -0700 Subject: [PATCH 03/14] Update .claude/skills/building/SKILL.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- .claude/skills/building/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git 
a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md index 9f5f37db85a..b7446d7efe0 100644 --- a/.claude/skills/building/SKILL.md +++ b/.claude/skills/building/SKILL.md @@ -107,7 +107,7 @@ cmake -B cmake-out \ -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \ -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \ -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON -cmake --build cmake-out -j$(sysctl -n hw.ncpu) +cmake --build cmake-out --parallel "$(nproc 2>/dev/null || sysctl -n hw.ncpu)" ``` Run `cmake --list-presets` to see all available presets. From f4390deca0286f0e5f744405cf53e1add9b683ad Mon Sep 17 00:00:00 2001 From: Siddartha Pothapragada Date: Tue, 10 Mar 2026 09:32:44 -0700 Subject: [PATCH 04/14] Update .claude/skills/building/SKILL.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- .claude/skills/building/SKILL.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md index b7446d7efe0..350b6cdde38 100644 --- a/.claude/skills/building/SKILL.md +++ b/.claude/skills/building/SKILL.md @@ -163,12 +163,14 @@ Most commonly needed flags (full list: `CMakeLists.txt`): ## Build output +Installed artifact locations under `CMAKE_INSTALL_PREFIX=cmake-out`: + | Artifact | Location | |----------|----------| -| Core runtime | `cmake-out/libexecutorch.a` | +| Core runtime | `cmake-out/lib/libexecutorch.a` | | executor_runner | `cmake-out/executor_runner` | | Model runners | `cmake-out/examples/models//` | -| XNNPACK backend | `cmake-out/backends/xnnpack/libxnnpack_backend.a` | +| XNNPACK backend | `cmake-out/lib/libxnnpack_backend.a` | | Python package | `site-packages/executorch` | | iOS frameworks | `cmake-out/*.xcframework` | | Android AAR | `aar-out/` | From cb41d2302fd0b6b173fa75e10f31a0b2d4135e65 Mon Sep 17 00:00:00 2001 From: Siddartha Pothapragada Date: Tue, 10 Mar 2026 09:33:18 -0700 Subject: [PATCH 05/14] Update .claude/skills/building/SKILL.md Co-authored-by: 
Copilot <175728472+Copilot@users.noreply.github.com> --- .claude/skills/building/SKILL.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md index 350b6cdde38..5b4fa544938 100644 --- a/.claude/skills/building/SKILL.md +++ b/.claude/skills/building/SKILL.md @@ -105,6 +105,8 @@ cmake -B cmake-out \ -DEXECUTORCH_BUILD_XNNPACK=ON \ -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \ -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \ + -DEXECUTORCH_BUILD_EXTENSION_FLAT_TENSOR=ON \ + -DEXECUTORCH_BUILD_EXTENSION_NAMED_DATA_MAP=ON \ -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \ -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON cmake --build cmake-out --parallel "$(nproc 2>/dev/null || sysctl -n hw.ncpu)" From e0337646ca7cd9dda17a6939b1fab03a6b348c27 Mon Sep 17 00:00:00 2001 From: Siddartha Pothapragada Date: Tue, 10 Mar 2026 09:42:39 -0700 Subject: [PATCH 06/14] Update .claude/skills/building/SKILL.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- .claude/skills/building/SKILL.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md index 5b4fa544938..10d878b4c86 100644 --- a/.claude/skills/building/SKILL.md +++ b/.claude/skills/building/SKILL.md @@ -165,12 +165,12 @@ Most commonly needed flags (full list: `CMakeLists.txt`): ## Build output -Installed artifact locations under `CMAKE_INSTALL_PREFIX=cmake-out`: +Installed artifact locations after `cmake --install` (or `./install_executorch.sh`) with `CMAKE_INSTALL_PREFIX=cmake-out`: | Artifact | Location | |----------|----------| | Core runtime | `cmake-out/lib/libexecutorch.a` | -| executor_runner | `cmake-out/executor_runner` | +| executor_runner (built only; not installed by default) | **build tree**: `/executor_runner` (Ninja/Make) or `//executor_runner` (e.g., `cmake-out/Release/executor_runner` with Xcode/Visual Studio) | | Model runners | `cmake-out/examples/models//` | | 
XNNPACK backend | `cmake-out/lib/libxnnpack_backend.a` | | Python package | `site-packages/executorch` | From d6c134b7f3fe14084da4d5417f06aab5743d3e06 Mon Sep 17 00:00:00 2001 From: Siddartha Pothapragada Date: Tue, 10 Mar 2026 09:42:50 -0700 Subject: [PATCH 07/14] Update .claude/skills/building/SKILL.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- .claude/skills/building/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md index 10d878b4c86..32c28b45f24 100644 --- a/.claude/skills/building/SKILL.md +++ b/.claude/skills/building/SKILL.md @@ -95,7 +95,7 @@ Then: `cmake --build cmake-out -j$(sysctl -n hw.ncpu)` (macOS) or `cmake --build ```bash cmake --workflow --preset llm-release # CPU cmake --workflow --preset llm-release-metal # Metal (macOS) -cmake --workflow --preset llm-release-cuda # CUDA (Linux) +cmake --workflow --preset llm-release-cuda # CUDA (Linux/Windows) ``` **Manual CMake (custom flags):** From ffc0722008be325839e9b75f0d58d7b2d8747b11 Mon Sep 17 00:00:00 2001 From: Siddartha Pothapragada Date: Tue, 10 Mar 2026 09:43:02 -0700 Subject: [PATCH 08/14] Update .claude/skills/building/SKILL.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- .claude/skills/building/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md index 32c28b45f24..1ec474ab3cc 100644 --- a/.claude/skills/building/SKILL.md +++ b/.claude/skills/building/SKILL.md @@ -138,7 +138,7 @@ Most commonly needed flags (full list: `CMakeLists.txt`): | `EXECUTORCH_BUILD_COREML` | Core ML (macOS/iOS) | | `EXECUTORCH_BUILD_MPS` | MPS GPU (macOS/iOS) | | `EXECUTORCH_BUILD_METAL` | Metal compute (macOS, requires EXTENSION_TENSOR) | -| `EXECUTORCH_BUILD_CUDA` | CUDA GPU (Linux, requires EXTENSION_TENSOR) | +| `EXECUTORCH_BUILD_CUDA` | CUDA GPU (Linux/Windows, requires 
EXTENSION_TENSOR) | | `EXECUTORCH_BUILD_KERNELS_OPTIMIZED` | Optimized kernels | | `EXECUTORCH_BUILD_KERNELS_QUANTIZED` | Quantized kernels | | `EXECUTORCH_BUILD_EXTENSION_MODULE` | Module extension (requires DATA_LOADER + FLAT_TENSOR + NAMED_DATA_MAP) | From 179f84e47ff93fb48aa8b631429c5b56c4c17e68 Mon Sep 17 00:00:00 2001 From: Github Executorch Date: Tue, 10 Mar 2026 13:21:28 -0700 Subject: [PATCH 09/14] Harden building skill from e2e testing - Add venv fallback when conda is not installed - Handle conda PermissionError by checking env directory on disk - Auto-fix cmake: missing or < 3.24 gets pip-installed, >= 4.0 works fine - Add troubleshooting entries for conda not found and PEP 668 errors - Remove heavy-handed directive banner; let skill structure guide behavior Co-authored-by: Claude --- .claude/skills/building/SKILL.md | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md index 1ec474ab3cc..55fe237c10d 100644 --- a/.claude/skills/building/SKILL.md +++ b/.claude/skills/building/SKILL.md @@ -5,22 +5,24 @@ description: Build ExecuTorch from source — Python package, C++ runtime, runne # Building ExecuTorch -IMPORTANT: Do NOT ask the user questions about environment setup. Detect issues and fix them automatically. The standard path is conda — always use it. - -## Step 1: Ensure conda environment (do this silently, fix issues automatically) +## Step 1: Ensure Python environment (detect and fix automatically) ```bash # Check if executorch conda env exists; create if not -conda env list | grep executorch || conda create -yn executorch python=3.12 'cmake<4' +# Note: `conda env list` may fail with PermissionError on some setups. +# Fallback: check if the env directory exists on disk. 
+conda env list 2>/dev/null | grep executorch || \ + ls "$CONDA_PREFIX/../envs/" 2>/dev/null | grep executorch || \ + conda create -yn executorch python=3.12 + # Activate conda activate executorch -# Verify — fix automatically if wrong + +# Verify python --version # need 3.10–3.13 -cmake --version # need >= 3.24, < 4.0; if cmake >= 4, run: pip install 'cmake>=3.24,<4' +cmake --version # need >= 3.24; cmake 4.x works in practice ``` -If cmake is >= 4.0 inside the conda env, fix it: `pip install 'cmake>=3.24,<4'` — do not ask the user. - Parallel jobs: `$(sysctl -n hw.ncpu)` on macOS, `$(nproc)` on Linux. ## Step 2: Build (route by what the user needs) @@ -154,7 +156,8 @@ Most commonly needed flags (full list: `CMakeLists.txt`): |---------|-----| | Missing headers / `CMakeLists.txt not found` in third-party | `git submodule sync --recursive && git submodule update --init --recursive` | | Mysterious failures after `git pull` or branch switch | `rm -rf cmake-out/ pip-out/ && git submodule sync && git submodule update --init --recursive` | -| CMake >= 4.0 (too new) | `pip install 'cmake>=3.24,<4'` inside the conda env | +| `conda env list` PermissionError | Use `CONDA_NO_PLUGINS=true conda env list` or check env dir directly | +| CMake >= 4.0 | Works in practice despite `< 4.0` in docs; only fix if build actually fails | | `externally-managed-environment` / PEP 668 error | You're using system Python, not conda. Activate conda env first. 
| | pip conflicts with torch versions | Fresh conda env; or `./install_executorch.sh --use-pt-pinned-commit` | | Missing `Python.h` (Linux) | `sudo apt install python3.X-dev` | From d75acb2a06faced75b326d584d98351e8f4bbcca Mon Sep 17 00:00:00 2001 From: Github Executorch Date: Tue, 10 Mar 2026 13:24:53 -0700 Subject: [PATCH 10/14] Add routing table to building skill for Android/iOS/model targets Explicit decision tree at the top of Step 2 so Claude routes to the right section based on keywords (Android, iOS, model names, cmake) instead of always defaulting to the Python package build. Co-authored-by: Claude --- .claude/skills/building/SKILL.md | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md index 55fe237c10d..2c6d3ada155 100644 --- a/.claude/skills/building/SKILL.md +++ b/.claude/skills/building/SKILL.md @@ -25,9 +25,16 @@ cmake --version # need >= 3.24; cmake 4.x works in practice Parallel jobs: `$(sysctl -n hw.ncpu)` on macOS, `$(nproc)` on Linux. -## Step 2: Build (route by what the user needs) +## Step 2: Build -### Python package (default — use this unless user asks for something specific) +Route based on what the user asks for: +- User mentions **Android** → skip to [Cross-compilation: Android](#cross-compilation) +- User mentions **iOS** or **frameworks** → skip to [Cross-compilation: iOS](#cross-compilation) +- User mentions a **model name** (llama, whisper, etc.) 
→ skip to [LLM / ASR model runner](#llm--asr-model-runner-simplest-path-for-running-models) +- User mentions **C++ runtime** or **cmake** → skip to [C++ runtime](#c-runtime-standalone) +- Otherwise → default to **Python package** below + +### Python package (default) ```bash conda activate executorch ./install_executorch.sh --editable # editable install from source From 62d417dec4b5c8112c7954474c3c4843c2ea3df2 Mon Sep 17 00:00:00 2001 From: Siddartha Pothapragada Date: Tue, 10 Mar 2026 13:28:31 -0700 Subject: [PATCH 11/14] Update .claude/skills/building/SKILL.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- .claude/skills/building/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md index 2c6d3ada155..7d4b5f75208 100644 --- a/.claude/skills/building/SKILL.md +++ b/.claude/skills/building/SKILL.md @@ -98,7 +98,7 @@ Output: `cmake-out/examples/models//` | Linux | `cmake -B cmake-out --preset linux -DCMAKE_BUILD_TYPE=Release` | | Windows | `cmake -B cmake-out --preset windows -T ClangCL` | -Then: `cmake --build cmake-out -j$(sysctl -n hw.ncpu)` (macOS) or `cmake --build cmake-out -j$(nproc)` (Linux) +Then: `cmake --build cmake-out --config Release -j$(sysctl -n hw.ncpu)` (macOS) or `cmake --build cmake-out -j$(nproc)` (Linux) **LLM libraries via workflow presets** (configure + build + install in one command): ```bash From 74d0c7ec50d1bface42d381e1ab9639ac7f078a1 Mon Sep 17 00:00:00 2001 From: Github Executorch Date: Tue, 10 Mar 2026 13:34:22 -0700 Subject: [PATCH 12/14] Address PR review comments on building skill - Add ANDROID_NDK requirement and verification to Android section - Fix CMAKE_BUILD_TYPE description: not all presets set it - Separate build output table by flow (pip vs cmake vs cross-compilation) Co-authored-by: Claude --- .claude/skills/building/SKILL.md | 24 +++++++++++++++++++----- 1 file changed, 19 insertions(+), 5 deletions(-) diff 
--git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md index 7d4b5f75208..5ad9b132510 100644 --- a/.claude/skills/building/SKILL.md +++ b/.claude/skills/building/SKILL.md @@ -132,7 +132,11 @@ Run `cmake --list-presets` to see all available presets. Link in Xcode with `-all_load` linker flag. **Android:** + +Requires the `ANDROID_NDK` environment variable to be set (typically by Android Studio or a standalone NDK install). ```bash +# Verify NDK is available +echo $ANDROID_NDK # must point to NDK root, e.g. ~/Library/Android/sdk/ndk/ export ANDROID_ABIS=arm64-v8a BUILD_AAR_DIR=aar-out mkdir -p $BUILD_AAR_DIR && sh scripts/build_android_library.sh ``` @@ -155,7 +159,7 @@ Most commonly needed flags (full list: `CMakeLists.txt`): | `EXECUTORCH_BUILD_TESTS` | Unit tests (`ctest --test-dir cmake-out --output-on-failure`) | | `EXECUTORCH_BUILD_DEVTOOLS` | DevTools (Inspector, ETDump) | | `EXECUTORCH_OPTIMIZE_SIZE` | Size-optimized build (`-Os`, no exceptions/RTTI) | -| `CMAKE_BUILD_TYPE` | `Release` (default for presets) or `Debug` (5-10x slower) | +| `CMAKE_BUILD_TYPE` | `Release` or `Debug` (5-10x slower). Some presets (e.g. `llm-release`) set this; others require it explicitly. 
| ## Troubleshooting @@ -175,15 +179,25 @@ Most commonly needed flags (full list: `CMakeLists.txt`): ## Build output -Installed artifact locations after `cmake --install` (or `./install_executorch.sh`) with `CMAKE_INSTALL_PREFIX=cmake-out`: +**From `./install_executorch.sh` (Python package):** + +| Artifact | Location | +|----------|----------| +| Python package | `site-packages/executorch` | + +**From CMake builds** (`cmake --install` with `CMAKE_INSTALL_PREFIX=cmake-out`): | Artifact | Location | |----------|----------| | Core runtime | `cmake-out/lib/libexecutorch.a` | -| executor_runner (built only; not installed by default) | **build tree**: `/executor_runner` (Ninja/Make) or `//executor_runner` (e.g., `cmake-out/Release/executor_runner` with Xcode/Visual Studio) | -| Model runners | `cmake-out/examples/models//` | | XNNPACK backend | `cmake-out/lib/libxnnpack_backend.a` | -| Python package | `site-packages/executorch` | +| executor_runner | `cmake-out/executor_runner` (Ninja/Make) or `cmake-out/Release/executor_runner` (Xcode) | +| Model runners | `cmake-out/examples/models//` | + +**From cross-compilation:** + +| Artifact | Location | +|----------|----------| | iOS frameworks | `cmake-out/*.xcframework` | | Android AAR | `aar-out/` | From 6209f273ded2976d1e10541a22e4adb7308e9429 Mon Sep 17 00:00:00 2001 From: Github Executorch Date: Tue, 10 Mar 2026 13:41:09 -0700 Subject: [PATCH 13/14] Fix fresh-Mac gaps: Xcode CLT, conda shell hook, Python version fallback Three issues that would break a fresh Mac checkout: - Add Xcode Command Line Tools prerequisite check - Add conda shell.bash hook for non-interactive shells (Claude Code / CI) - Add brew install python@3.12 guidance for venv path when only 3.14+ exists Co-authored-by: Claude --- .claude/skills/building/SKILL.md | 33 ++++++++++++++++++++++++++------ 1 file changed, 27 insertions(+), 6 deletions(-) diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md index 
5ad9b132510..d349b50bd24 100644 --- a/.claude/skills/building/SKILL.md +++ b/.claude/skills/building/SKILL.md @@ -5,24 +5,45 @@ description: Build ExecuTorch from source — Python package, C++ runtime, runne # Building ExecuTorch +## Prerequisites (macOS) + +A C++ compiler is required. On macOS, ensure Xcode Command Line Tools are installed: +```bash +xcode-select -p || xcode-select --install +``` + ## Step 1: Ensure Python environment (detect and fix automatically) +**Path A — conda (preferred):** ```bash +# Initialize conda for non-interactive shells (required in Claude Code / CI) +eval "$(conda shell.bash hook 2>/dev/null)" + # Check if executorch conda env exists; create if not -# Note: `conda env list` may fail with PermissionError on some setups. -# Fallback: check if the env directory exists on disk. conda env list 2>/dev/null | grep executorch || \ - ls "$CONDA_PREFIX/../envs/" 2>/dev/null | grep executorch || \ + ls "$(conda info --base 2>/dev/null)/envs/" 2>/dev/null | grep executorch || \ conda create -yn executorch python=3.12 # Activate conda activate executorch +``` -# Verify -python --version # need 3.10–3.13 -cmake --version # need >= 3.24; cmake 4.x works in practice +**Path B — no conda (fall back to venv):** +```bash +# Find a compatible Python (3.10–3.13). On macOS with only Homebrew Python 3.14+, +# install a compatible version first: brew install python@3.12 +python3.12 -m venv .executorch-venv # or python3.11, python3.10, python3.13 +source .executorch-venv/bin/activate +pip install --upgrade pip ``` +**Then verify (either path):** + +Run `python --version` and `cmake --version`. Fix automatically: +- **Python not 3.10–3.13**: recreate the env with a correct Python version. +- **cmake missing or < 3.24**: run `pip install 'cmake>=3.24'` inside the env. +- **cmake >= 4.0**: works in practice, no action needed. + Parallel jobs: `$(sysctl -n hw.ncpu)` on macOS, `$(nproc)` on Linux. 
## Step 2: Build From d2e8919374ef4c06bdcedf0d51a7f907888a34da Mon Sep 17 00:00:00 2001 From: Github Executorch Date: Tue, 10 Mar 2026 13:42:53 -0700 Subject: [PATCH 14/14] =?UTF-8?q?Remove=20Xcode=20CLT=20prerequisite=20?= =?UTF-8?q?=E2=80=94=20not=20in=20ET=20docs,=20rarely=20needed?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Claude --- .claude/skills/building/SKILL.md | 7 ------- 1 file changed, 7 deletions(-) diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md index d349b50bd24..d1322cdecae 100644 --- a/.claude/skills/building/SKILL.md +++ b/.claude/skills/building/SKILL.md @@ -5,13 +5,6 @@ description: Build ExecuTorch from source — Python package, C++ runtime, runne # Building ExecuTorch -## Prerequisites (macOS) - -A C++ compiler is required. On macOS, ensure Xcode Command Line Tools are installed: -```bash -xcode-select -p || xcode-select --install -``` - ## Step 1: Ensure Python environment (detect and fix automatically) **Path A — conda (preferred):**