From 43007fc29479287f33b11ed2f22c5be73f70b5f3 Mon Sep 17 00:00:00 2001
From: psiddh <2467117+psiddh@users.noreply.github.com>
Date: Mon, 9 Mar 2026 16:41:53 -0400
Subject: [PATCH 01/14] Expand building Claude skill to cover general ET
 building from source

The existing building skill only covered runners (Makefile targets) and
CMake workflow presets. This expands it to be a comprehensive guide for
building ExecuTorch from source, including:

- Prerequisites and toolchain requirements
- Building the Python package (install_executorch.sh with all flags)
- Building the C++ runtime standalone (presets, workflows, manual CMake)
- Building model runners (Makefile)
- Cross-compilation (Android, iOS, macOS, Windows)
- Complete build options reference with dependency chains
- Common build patterns (minimal, XNNPACK, profiling, tests, subdirectory)
- Troubleshooting section covering 13 common build issues:
  - Submodule issues
  - Stale build artifacts
  - CMake version conflicts
  - Python version mismatch
  - Dependency version conflicts
  - Missing python-dev headers
  - Linking errors with --whole-archive
  - XNNPACK build failures
  - Windows symlink errors
  - MSVC kernel compilation failures
  - Intel macOS limitations
  - Build directory not at repo root
  - Duplicate kernel registration
- Build output reference table
- Tips for faster and more reliable builds
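
As a rough illustration of the toolchain requirements documented in the skill
(Python 3.10–3.13, CMake >= 3.24 and < 4.0), a version gate could be sketched
in POSIX shell as below. The helper names `ver_num` and `in_range` are
hypothetical; they are not part of the repository or of
`install_executorch.sh`, and the sketch compares only the major and minor
components.

```bash
# Hypothetical helper: turn "MAJOR.MINOR[.PATCH]" into MAJOR*1000 + MINOR
# so that versions can be compared as plain integers.
ver_num() {
  major=${1%%.*}
  rest=${1#*.}
  # No dot in the input (e.g. "4"): treat the minor component as 0.
  [ "$rest" = "$1" ] && rest=0
  minor=${rest%%.*}
  echo $(( major * 1000 + minor ))
}

# Hypothetical helper: succeed when MIN <= VERSION < MAX_EXCLUSIVE.
# Usage: in_range VERSION MIN MAX_EXCLUSIVE
in_range() {
  v=$(ver_num "$1")
  [ "$v" -ge "$(ver_num "$2")" ] && [ "$v" -lt "$(ver_num "$3")" ]
}

# Demonstration with literal versions (no tools need to be installed).
in_range 3.24.1 3.24 4.0 && echo "cmake 3.24.1: ok"
in_range 4.0.2  3.24 4.0 || echo "cmake 4.0.2: out of range"
in_range 3.12   3.10 3.14 && echo "python 3.12: ok"
```

In practice the first argument would come from the tool itself, for example
`in_range "$(cmake --version | awk 'NR==1 {print $3}')" 3.24 4.0`.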
---
 .claude/skills/building/SKILL.md | 348 ++++++++++++++++++++++++++++++-
 1 file changed, 339 insertions(+), 9 deletions(-)
diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md
index 7ff7be38df1..ab63f1606e4 100644
--- a/.claude/skills/building/SKILL.md
+++ b/.claude/skills/building/SKILL.md
@@ -1,23 +1,353 @@
 ---
 name: building
-description: Build ExecuTorch runners or C++ libraries. Use when compiling runners for Llama, Whisper, or other models, or building the C++ runtime.
+description: Build ExecuTorch from source — Python package, C++ runtime, runners, cross-compilation, and backend-specific builds. Use when compiling anything in the ExecuTorch repo, diagnosing build failures, or setting up platform-specific builds.
 ---
 
 # Building
 
-## Runners (Makefile)
+## Prerequisites
+
+Before building, ensure the environment is set up (see `/setup` skill):
+```bash
+conda activate executorch
+```
+
+Required toolchain:
+- **Python** 3.10–3.13
+- **CMake** >= 3.24, < 4.0
+- **C++17** compiler: `g++` >= 7, `clang++` >= 5, or MSVC 2022+ with Clang-CL
+- **Git submodules** must be initialized (handled by `install_executorch.sh`, or manually: `git submodule sync && git submodule update --init --recursive`)
+
+Optional but recommended:
+- **ccache** — automatically detected and used if installed (`sudo apt install ccache` / `brew install ccache`)
+- **Ninja** — faster than Make (`sudo apt install ninja-build` / `brew install ninja`); use with `-G Ninja`
+
+## 1. Building the Python Package
+
+This installs the ExecuTorch Python package (exir, runtime bindings, etc.) into the active environment.
+
+```bash
+# First time (installs deps + builds + installs)
+./install_executorch.sh
+
+# Editable mode (Python changes reflected without rebuild)
+./install_executorch.sh --editable
+
+# Minimal (skip example dependencies)
+./install_executorch.sh --minimal
+
+# Subsequent installs (deps already present)
+pip install -e . --no-build-isolation
+```
+
+**Enable additional backends** during Python install:
+```bash
+CMAKE_ARGS="-DEXECUTORCH_BUILD_MPS=ON" ./install_executorch.sh
+CMAKE_ARGS="-DEXECUTORCH_BUILD_COREML=ON -DEXECUTORCH_BUILD_MPS=ON" ./install_executorch.sh
+```
+
+**Verify Python install:**
+```bash
+python -m executorch.examples.xnnpack.aot_compiler --model_name="mv2" --delegate
+```
+
+## 2. Building the C++ Runtime (Standalone)
+
+### Using Presets (Recommended)
+
+```bash
+cmake -B cmake-out --preset <preset> -DCMAKE_BUILD_TYPE=Release
+cmake --build cmake-out -j$(nproc)
+```
+
+| Preset | Platform | What it builds |
+|--------|----------|----------------|
+| `linux` | Linux x86_64 | Runtime + XNNPACK + LLM + executor_runner |
+| `macos` | macOS | Runtime + XNNPACK + CoreML + MPS + executor_runner |
+| `windows` | Windows | Runtime + XNNPACK + executor_runner |
+| `llm-release` | Host | LLM extension (CPU, Release) |
+| `llm-release-cuda` | Linux/Windows | LLM extension (CUDA, Release) |
+| `llm-release-metal` | macOS | LLM extension (Metal, Release) |
+| `llm-debug` | Host | LLM extension (CPU, Debug) |
+| `llm-debug-cuda` | Linux/Windows | LLM extension (CUDA, Debug) |
+| `llm-debug-metal` | macOS | LLM extension (Metal, Debug) |
+| `profiling` | Host | Runtime with profiling/event tracing |
+| `android-arm64-v8a` | Android | JNI bindings + runtime for arm64 |
+| `android-x86_64` | Android | JNI bindings + runtime for x86_64 |
+| `ios` | iOS | Frameworks for device |
+| `ios-simulator` | iOS Sim | Frameworks for simulator |
+| `arm-baremetal` | Embedded | Cortex-M / Ethos-U bare-metal |
+| `zephyr` | RTOS | Zephyr RTOS build |
+
+### Using CMake Workflow Presets
+
+Workflow presets combine configure + build + install in one command:
+```bash
+cmake --workflow --preset llm-release        # CPU
+cmake --workflow --preset llm-release-cuda   # CUDA
+cmake --workflow --preset llm-release-metal  # Metal
+```
+
+### Manual CMake (No Preset)
+
+```bash
+mkdir -p cmake-out
+cmake -B cmake-out \
+  -DCMAKE_BUILD_TYPE=Release \
+  -DEXECUTORCH_BUILD_XNNPACK=ON \
+  -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
+  -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
+  -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
+  -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON
+cmake --build cmake-out -j$(nproc)
+```
+
+### Verify C++ Build
+
+```bash
+# Enable executor_runner if not already
+cmake -B cmake-out --preset linux -DEXECUTORCH_BUILD_EXECUTOR_RUNNER=ON
+cmake --build cmake-out -j$(nproc)
+cmake-out/executor_runner --model_path=mv2_xnnpack_fp32.pte
+```
+
+## 3. Building Runners (Makefile)
+
+Model-specific runners use the top-level `Makefile`:
 ```bash
 make help              # list all targets
-make llama-cpu         # Llama
-make whisper-metal     # Whisper on Metal
+make llama-cpu         # Llama on CPU
+make llama-cuda        # Llama on CUDA
+make llama-cuda-debug  # Llama on CUDA (debug)
+make llava-cpu         # Llava on CPU
+make gemma3-cpu        # Gemma3 on CPU
 make gemma3-cuda       # Gemma3 on CUDA
+make whisper-cpu       # Whisper on CPU
+make whisper-metal     # Whisper on Metal
+make parakeet-cpu      # Parakeet on CPU
+make parakeet-metal    # Parakeet on Metal
+make clean             # remove cmake-out/
+```
+
+Output binaries: `cmake-out/examples/models/<model>/<runner>`
+
+Each `make` target internally runs `cmake --workflow --preset` for the core libraries, then builds the runner on top.
+
+## 4. Cross-Compilation
+
+### Android
+
+```bash
+# AAR (Java bindings)
+export ANDROID_ABIS=arm64-v8a
+export BUILD_AAR_DIR=aar-out
+mkdir -p $BUILD_AAR_DIR
+sh scripts/build_android_library.sh
+
+# Native C++ (direct cross-compile)
+cmake -B cmake-out \
+  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
+  -DANDROID_ABI=arm64-v8a \
+  --preset android-arm64-v8a
+cmake --build cmake-out -j$(nproc)
 ```
 
-Output: `cmake-out/examples/models/<model>/<runner>`
+### iOS / macOS Frameworks
 
-## C++ Libraries (CMake)
 ```bash
-cmake --list-presets                    # list presets
-cmake --workflow --preset llm-release   # LLM CPU
-cmake --workflow --preset llm-release-metal  # LLM Metal
+# Build all frameworks
+./scripts/build_apple_frameworks.sh
+
+# With specific backends
+./scripts/build_apple_frameworks.sh --coreml --mps --xnnpack
 ```
+
+Link frameworks in Xcode with `-all_load` linker flag.
+
+### Windows
+
+Requires Visual Studio 2022+ with Clang-CL:
+```bash
+cmake -B cmake-out --preset windows -T ClangCL
+cmake --build cmake-out --config Release
+```
+
+**Windows-specific notes:**
+- Enable symlinks before cloning: `git config --system core.symlinks true`
+- Missing symlinks cause `version.py` errors during `pip install`
+- LLM custom kernels and quantized kernels do not compile with MSVC; use `-T ClangCL` or build with CUDA
+
+## 5. Key Build Options
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `CMAKE_BUILD_TYPE` | STRING | Debug | `Debug` or `Release`. Release disables logging/verification, adds optimizations |
+| `EXECUTORCH_BUILD_XNNPACK` | BOOL | OFF | XNNPACK CPU backend (requires CPUINFO + PTHREADPOOL) |
+| `EXECUTORCH_BUILD_COREML` | BOOL | OFF | Core ML backend (macOS/iOS only) |
+| `EXECUTORCH_BUILD_MPS` | BOOL | OFF | MPS GPU backend (macOS/iOS only) |
+| `EXECUTORCH_BUILD_CUDA` | BOOL | OFF | CUDA GPU backend (requires EXTENSION_TENSOR) |
+| `EXECUTORCH_BUILD_METAL` | BOOL | OFF | Metal backend (requires EXTENSION_TENSOR) |
+| `EXECUTORCH_BUILD_VULKAN` | BOOL | OFF | Vulkan GPU backend (Android) |
+| `EXECUTORCH_BUILD_QNN` | BOOL | OFF | Qualcomm QNN backend |
+| `EXECUTORCH_BUILD_KERNELS_OPTIMIZED` | BOOL | OFF | Optimized kernel implementations |
+| `EXECUTORCH_BUILD_KERNELS_QUANTIZED` | BOOL | OFF | Quantized kernel implementations |
+| `EXECUTORCH_BUILD_KERNELS_LLM` | BOOL | OFF | LLM custom kernels (requires KERNELS_OPTIMIZED) |
+| `EXECUTORCH_BUILD_EXTENSION_MODULE` | BOOL | OFF | Module extension (requires DATA_LOADER + FLAT_TENSOR + NAMED_DATA_MAP) |
+| `EXECUTORCH_BUILD_EXTENSION_TENSOR` | BOOL | OFF | Tensor extension |
+| `EXECUTORCH_BUILD_EXTENSION_LLM` | BOOL | OFF | LLM extension |
+| `EXECUTORCH_BUILD_EXTENSION_LLM_RUNNER` | BOOL | OFF | LLM runner extension (requires EXTENSION_LLM) |
+| `EXECUTORCH_BUILD_PYBIND` | BOOL | OFF | Python bindings (requires EXTENSION_MODULE) |
+| `EXECUTORCH_BUILD_TESTS` | BOOL | OFF | CMake-based unit tests |
+| `EXECUTORCH_BUILD_DEVTOOLS` | BOOL | OFF | Developer tools (Inspector, ETDump) |
+| `EXECUTORCH_ENABLE_EVENT_TRACER` | BOOL | OFF | Event tracing (requires DEVTOOLS) |
+| `EXECUTORCH_OPTIMIZE_SIZE` | BOOL | OFF | Optimize for binary size (`-Os`, no exceptions/RTTI) |
+| `EXECUTORCH_ENABLE_LOGGING` | BOOL | (Debug=ON) | Runtime logging |
+| `EXECUTORCH_LOG_LEVEL` | STRING | Info | Log level: Debug, Info, Error, Fatal |
+| `EXECUTORCH_USE_SANITIZER` | BOOL | OFF | ASAN + UBSAN (not supported on MSVC) |
+| `EXECUTORCH_PAL_DEFAULT` | STRING | posix | Platform abstraction: `posix`, `minimal`, `android` |
+
+**Dependency chains** — enabling some options requires others:
+- `XNNPACK` requires `CPUINFO` + `PTHREADPOOL`
+- `KERNELS_LLM` requires `KERNELS_OPTIMIZED`
+- `EXTENSION_MODULE` requires `EXTENSION_DATA_LOADER` + `EXTENSION_FLAT_TENSOR` + `EXTENSION_NAMED_DATA_MAP`
+- `BUILD_PYBIND` requires `EXTENSION_MODULE`
+- `EXTENSION_LLM_RUNNER` requires `EXTENSION_LLM`
+- `EVENT_TRACER` requires `DEVTOOLS`
+- `CUDA` and `METAL` require `EXTENSION_TENSOR`
+
+CMake will error with a clear message if a required option is missing.
+
+## 6. Common Build Patterns
+
+### Build core runtime only (minimal)
+```bash
+cmake -B cmake-out -DCMAKE_BUILD_TYPE=Release
+cmake --build cmake-out -j$(nproc)
+```
+
+### Build with XNNPACK backend
+```bash
+cmake -B cmake-out -DCMAKE_BUILD_TYPE=Release \
+  -DEXECUTORCH_BUILD_XNNPACK=ON
+cmake --build cmake-out -j$(nproc)
+```
+
+### Build with profiling
+```bash
+cmake -B cmake-out --preset profiling
+cmake --build cmake-out -j$(nproc)
+```
+
+### Build tests
+```bash
+cmake -B cmake-out -DEXECUTORCH_BUILD_TESTS=ON \
+  -DEXECUTORCH_BUILD_EXTENSION_FLAT_TENSOR=ON
+cmake --build cmake-out -j$(nproc)
+ctest --test-dir cmake-out --output-on-failure
+```
+
+### Using ExecuTorch as a CMake subdirectory
+```cmake
+# Set options before add_subdirectory so they take effect at configure time:
+set(EXECUTORCH_BUILD_XNNPACK ON)
+set(EXECUTORCH_BUILD_EXTENSION_MODULE ON)
+add_subdirectory(executorch)
+```
+
+## 7. Troubleshooting
+
+### Submodule issues
+**Symptom:** Build fails with missing headers or `CMakeLists.txt not found` in third-party dirs.
+```bash
+git submodule sync --recursive
+git submodule update --init --recursive
+```
+
+### Stale build artifacts
+**Symptom:** Mysterious failures after pulling new changes or switching branches.
+```bash
+./install_executorch.sh --clean
+# Or manually:
+rm -rf cmake-out/ pip-out/ buck-out/
+git submodule sync && git submodule update --init --recursive
+```
+
+### CMake version conflicts
+**Symptom:** `cmake` errors about policy versions or unsupported features.
+- ExecuTorch requires CMake >= 3.24, < 4.0
+- Check: `cmake --version`
+- If conda and system cmake conflict, ensure conda env cmake is used: `which cmake` should point to conda env
+
+### Python version mismatch
+**Symptom:** `install_executorch.sh` fails early with compatibility errors.
+- Supported: Python 3.10–3.13
+- Check: `python --version`
+
+### Dependency version conflicts
+**Symptom:** pip fails with conflicting torch/torchvision/torchaudio versions.
+- Use a fresh conda environment
+- If pinning to a specific PyTorch version: `./install_executorch.sh --use-pt-pinned-commit`
+
+### Missing `python-dev` headers
+**Symptom:** Build fails looking for `Python.h`.
+```bash
+sudo apt install python$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')-dev
+```
+
+### Linking errors with `--whole-archive`
+**Symptom:** Missing operator registrations at runtime despite building kernels.
+- Kernel binding libraries (e.g., `libportable_kernels_bindings.a`) use load-time registration
+- Must link with: `-Wl,--whole-archive <lib> -Wl,--no-whole-archive` (Linux) or `-Wl,-force_load,<lib>` (macOS)
+
+### XNNPACK build fails
+**Symptom:** Errors about missing `cpuinfo` or `pthreadpool`.
+- `EXECUTORCH_BUILD_XNNPACK=ON` requires `EXECUTORCH_BUILD_CPUINFO=ON` and `EXECUTORCH_BUILD_PTHREADPOOL=ON` (both ON by default unless `ARM_BAREMETAL` is set)
+
+### Windows symlink errors
+**Symptom:** `version.py` not found or import errors on Windows.
+```bash
+git config --system core.symlinks true
+# Re-clone the repo after enabling
+```
+
+### MSVC kernel compilation failures
+**Symptom:** LLM/quantized kernels fail to compile on Windows with MSVC.
+- Use Clang-CL: `cmake -B cmake-out -T ClangCL`
+- Or build with CUDA (which uses nvcc, not MSVC for kernels)
+
+### Intel macOS
+**Symptom:** `install_executorch.sh` fails — no prebuilt PyTorch wheels for Intel Mac.
+- Must build PyTorch from source, or use `--use-pt-pinned-commit --minimal`
+
+### Build directory not at repo root
+**Symptom:** Include path errors when ExecuTorch checkout is not the top-level directory.
+- ExecuTorch adds `..` to include directories; the build directory must be directly under the repo root or use `add_subdirectory` correctly
+
+### Duplicate kernel registration
+**Symptom:** Abort at runtime with duplicate kernel registration.
+- Only link one `gen_operators_lib` per target
+- Check for multiple kernel binding libraries being linked
+
+## 8. Build Output
+
+| Artifact | Location | Description |
+|----------|----------|-------------|
+| `executor_runner` | `cmake-out/executor_runner` | Standalone model runner |
+| Core runtime | `cmake-out/libexecutorch.a` | Core ExecuTorch runtime |
+| Portable ops | `cmake-out/kernels/portable/libportable_ops_lib.a` | Portable operator implementations |
+| XNNPACK backend | `cmake-out/backends/xnnpack/libxnnpack_backend.a` | XNNPACK delegate |
+| LLM runner | `cmake-out/examples/models/<model>/<runner>` | Model-specific runners |
+| Python package | site-packages | `executorch` Python module |
+| iOS frameworks | `cmake-out/*.xcframework` | iOS/macOS frameworks |
+| Android AAR | `aar-out/` | Android Java bindings |
+
+## 9. Tips
+
+- Always use `Release` for performance measurement; `Debug` is 5–10x slower and significantly larger
+- Use `ccache` to speed up rebuilds — ExecuTorch auto-detects it
+- Use `Ninja` generator (`-G Ninja`) for faster parallel builds
+- Use `cmake --list-presets` to see all available presets
+- After `git pull`, always clean and re-init submodules before rebuilding
+- For LLM workflows, `make <model>-<backend>` is the simplest path
+- Set `EXECUTORCH_OPTIMIZE_SIZE=ON` for size-constrained deployments
+- Check `cmake-out/compile_commands.json` for IDE integration (auto-generated)

From f7d0f375b20f4ffefb6d3d8928bc7613a0414fd2 Mon Sep 17 00:00:00 2001
From: Github Executorch <github_executorch@arm.com>
Date: Tue, 10 Mar 2026 09:21:06 -0700
Subject: [PATCH 02/14] Refactor building skill from reference manual to
 action-oriented flow
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Reorganize the building skill to be concise and action-driven rather
than encyclopedic. Key changes:
- Add "do not ask questions" directive — detect and fix env issues automatically
- Make conda the mandatory path (no venvs, no system Python)
- Auto-fix for cmake >= 4.0 and PEP 668 errors
- Default action is `./install_executorch.sh --editable`
- Complete runner table from actual Makefile (adds voxtral, voxtral_realtime,
  sortformer, silero-vad, llava, debug variants)
- Fix macOS commands: $(sysctl -n hw.ncpu) instead of $(nproc)
- Note that --preset macos uses Xcode generator
- Condense troubleshooting into scannable table
- Trim cmake options to most-used flags (full list in CMakeLists.txt)
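
The macOS/Linux split for parallel jobs noted above can be folded into one
helper. This is a minimal sketch, not something the skill or the Makefile
ships: the function name `detect_jobs` is hypothetical, and the fallback to a
single job when `nproc` is unavailable is an assumption.

```bash
# Hypothetical helper: print the number of parallel build jobs to use.
detect_jobs() {
  case "$(uname -s)" in
    # macOS has no nproc by default; ask the kernel for the CPU count.
    Darwin) sysctl -n hw.ncpu ;;
    # Linux and others: use nproc, falling back to 1 if it is missing.
    *)      nproc 2>/dev/null || echo 1 ;;
  esac
}

echo "parallel jobs: $(detect_jobs)"
```

A build step could then be written once for both platforms as
`cmake --build cmake-out -j"$(detect_jobs)"`.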

Co-authored-by: Claude <noreply@anthropic.com>
---
 .claude/skills/building/SKILL.md | 446 ++++++++++---------------------
 1 file changed, 137 insertions(+), 309 deletions(-)

diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md
index ab63f1606e4..9f5f37db85a 100644
--- a/.claude/skills/building/SKILL.md
+++ b/.claude/skills/building/SKILL.md
@@ -3,95 +3,103 @@ name: building
 description: Build ExecuTorch from source — Python package, C++ runtime, runners, cross-compilation, and backend-specific builds. Use when compiling anything in the ExecuTorch repo, diagnosing build failures, or setting up platform-specific builds.
 ---
 
-# Building
+# Building ExecuTorch
 
-## Prerequisites
+IMPORTANT: Do NOT ask the user questions about environment setup. Detect issues and fix them automatically. The standard path is conda — always use it.
+
+## Step 1: Ensure conda environment (do this silently, fix issues automatically)
 
-Before building, ensure the environment is set up (see `/setup` skill):
 ```bash
+# Check if executorch conda env exists; create if not
+conda env list | grep -q '^executorch ' || conda create -yn executorch python=3.12 'cmake>=3.24,<4'
+# Activate
 conda activate executorch
+# Verify — fix automatically if wrong
+python --version          # need 3.10–3.13
+cmake --version           # need >= 3.24, < 4.0; if cmake >= 4, run: pip install 'cmake>=3.24,<4'
 ```
 
-Required toolchain:
-- **Python** 3.10–3.13
-- **CMake** >= 3.24, < 4.0
-- **C++17** compiler: `g++` >= 7, `clang++` >= 5, or MSVC 2022+ with Clang-CL
-- **Git submodules** must be initialized (handled by `install_executorch.sh`, or manually: `git submodule sync && git submodule update --init --recursive`)
-
-Optional but recommended:
-- **ccache** — automatically detected and used if installed (`sudo apt install ccache` / `brew install ccache`)
-- **Ninja** — faster than Make (`sudo apt install ninja-build` / `brew install ninja`); use with `-G Ninja`
+If cmake is >= 4.0 inside the conda env, fix it: `pip install 'cmake>=3.24,<4'` — do not ask the user.
 
-## 1. Building the Python Package
+Parallel jobs: `$(sysctl -n hw.ncpu)` on macOS, `$(nproc)` on Linux.
 
-This installs the ExecuTorch Python package (exir, runtime bindings, etc.) into the active environment.
+## Step 2: Build (route by what the user needs)
 
+### Python package (default — use this unless user asks for something specific)
 ```bash
-# First time (installs deps + builds + installs)
-./install_executorch.sh
-
-# Editable mode (Python changes reflected without rebuild)
-./install_executorch.sh --editable
-
-# Minimal (skip example dependencies)
-./install_executorch.sh --minimal
-
-# Subsequent installs (deps already present)
-pip install -e . --no-build-isolation
+conda activate executorch
+./install_executorch.sh --editable    # editable install from source
 ```
+This handles everything: submodules, deps, C++ build, Python install. Takes ~10 min on Apple Silicon.
 
-**Enable additional backends** during Python install:
-```bash
-CMAKE_ARGS="-DEXECUTORCH_BUILD_MPS=ON" ./install_executorch.sh
-CMAKE_ARGS="-DEXECUTORCH_BUILD_COREML=ON -DEXECUTORCH_BUILD_MPS=ON" ./install_executorch.sh
-```
+For subsequent rebuilds (deps already present): `pip install -e . --no-build-isolation`
+
+For minimal install (skip example deps): `./install_executorch.sh --minimal`
 
-**Verify Python install:**
+Enable additional backends:
 ```bash
-python -m executorch.examples.xnnpack.aot_compiler --model_name="mv2" --delegate
+CMAKE_ARGS="-DEXECUTORCH_BUILD_COREML=ON -DEXECUTORCH_BUILD_MPS=ON" ./install_executorch.sh --editable
 ```
 
-## 2. Building the C++ Runtime (Standalone)
+Verify: `python -c "from executorch.exir import to_edge_transform_and_lower; print('OK')"`
 
-### Using Presets (Recommended)
+### LLM / ASR model runner (simplest path for running models)
 
 ```bash
-cmake -B cmake-out --preset <preset> -DCMAKE_BUILD_TYPE=Release
-cmake --build cmake-out -j$(nproc)
-```
-
-| Preset | Platform | What it builds |
-|--------|----------|----------------|
-| `linux` | Linux x86_64 | Runtime + XNNPACK + LLM + executor_runner |
-| `macos` | macOS | Runtime + XNNPACK + CoreML + MPS + executor_runner |
-| `windows` | Windows | Runtime + XNNPACK + executor_runner |
-| `llm-release` | Host | LLM extension (CPU, Release) |
-| `llm-release-cuda` | Linux/Windows | LLM extension (CUDA, Release) |
-| `llm-release-metal` | macOS | LLM extension (Metal, Release) |
-| `llm-debug` | Host | LLM extension (CPU, Debug) |
-| `llm-debug-cuda` | Linux/Windows | LLM extension (CUDA, Debug) |
-| `llm-debug-metal` | macOS | LLM extension (Metal, Debug) |
-| `profiling` | Host | Runtime with profiling/event tracing |
-| `android-arm64-v8a` | Android | JNI bindings + runtime for arm64 |
-| `android-x86_64` | Android | JNI bindings + runtime for x86_64 |
-| `ios` | iOS | Frameworks for device |
-| `ios-simulator` | iOS Sim | Frameworks for simulator |
-| `arm-baremetal` | Embedded | Cortex-M / Ethos-U bare-metal |
-| `zephyr` | RTOS | Zephyr RTOS build |
-
-### Using CMake Workflow Presets
-
-Workflow presets combine configure + build + install in one command:
+conda activate executorch
+make <model>-<backend>
+```
+
+Available targets (run `make help` for full list):
+
+| Target | Backend | macOS | Linux |
+|--------|---------|-------|-------|
+| `llama-cpu` | CPU | yes | yes |
+| `llama-cuda` | CUDA | — | yes |
+| `llama-cuda-debug` | CUDA (debug) | — | yes |
+| `llava-cpu` | CPU | yes | yes |
+| `whisper-cpu` | CPU | yes | yes |
+| `whisper-metal` | Metal | yes | — |
+| `whisper-cuda` | CUDA | — | yes |
+| `parakeet-cpu` | CPU | yes | yes |
+| `parakeet-metal` | Metal | yes | — |
+| `parakeet-cuda` | CUDA | — | yes |
+| `voxtral-cpu` | CPU | yes | yes |
+| `voxtral-cuda` | CUDA | — | yes |
+| `voxtral-metal` | Metal | yes | — |
+| `voxtral_realtime-cpu` | CPU | yes | yes |
+| `voxtral_realtime-cuda` | CUDA | — | yes |
+| `voxtral_realtime-metal` | Metal | yes | — |
+| `gemma3-cpu` | CPU | yes | yes |
+| `gemma3-cuda` | CUDA | — | yes |
+| `sortformer-cpu` | CPU | yes | yes |
+| `sortformer-cuda` | CUDA | — | yes |
+| `silero-vad-cpu` | CPU | yes | yes |
+| `clean` | — | yes | yes |
+
+Output: `cmake-out/examples/models/<model>/<runner>`
+
+### C++ runtime (standalone)
+
+**With presets (recommended):**
+
+| Platform | Command |
+|----------|---------|
+| macOS | `cmake -B cmake-out --preset macos` (uses Xcode generator — requires Xcode) |
+| Linux | `cmake -B cmake-out --preset linux -DCMAKE_BUILD_TYPE=Release` |
+| Windows | `cmake -B cmake-out --preset windows -T ClangCL` |
+
+Then: `cmake --build cmake-out -j$(sysctl -n hw.ncpu)` (macOS) or `cmake --build cmake-out -j$(nproc)` (Linux)
+
+**LLM libraries via workflow presets** (configure + build + install in one command):
 ```bash
 cmake --workflow --preset llm-release        # CPU
-cmake --workflow --preset llm-release-cuda   # CUDA
-cmake --workflow --preset llm-release-metal  # Metal
+cmake --workflow --preset llm-release-metal  # Metal (macOS)
+cmake --workflow --preset llm-release-cuda   # CUDA (Linux)
 ```
 
-### Manual CMake (No Preset)
-
+**Manual CMake (custom flags):**
 ```bash
-mkdir -p cmake-out
 cmake -B cmake-out \
   -DCMAKE_BUILD_TYPE=Release \
   -DEXECUTORCH_BUILD_XNNPACK=ON \
@@ -99,255 +107,75 @@ cmake -B cmake-out \
   -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
   -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
   -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON
-cmake --build cmake-out -j$(nproc)
-```
-
-### Verify C++ Build
-
-```bash
-# Enable executor_runner if not already
-cmake -B cmake-out --preset linux -DEXECUTORCH_BUILD_EXECUTOR_RUNNER=ON
-cmake --build cmake-out -j$(nproc)
-cmake-out/executor_runner --model_path=mv2_xnnpack_fp32.pte
+cmake --build cmake-out -j$(sysctl -n hw.ncpu)   # macOS; use -j$(nproc) on Linux
 ```
 
-## 3. Building Runners (Makefile)
+Run `cmake --list-presets` to see all available presets.
 
-Model-specific runners use the top-level `Makefile`:
-```bash
-make help              # list all targets
-make llama-cpu         # Llama on CPU
-make llama-cuda        # Llama on CUDA
-make llama-cuda-debug  # Llama on CUDA (debug)
-make llava-cpu         # Llava on CPU
-make gemma3-cpu        # Gemma3 on CPU
-make gemma3-cuda       # Gemma3 on CUDA
-make whisper-cpu       # Whisper on CPU
-make whisper-metal     # Whisper on Metal
-make parakeet-cpu      # Parakeet on CPU
-make parakeet-metal    # Parakeet on Metal
-make clean             # remove cmake-out/
-```
-
-Output binaries: `cmake-out/examples/models/<model>/<runner>`
-
-Each `make` target internally runs `cmake --workflow --preset` for the core libraries, then builds the runner on top.
-
-## 4. Cross-Compilation
-
-### Android
+### Cross-compilation
 
+**iOS/macOS frameworks:**
 ```bash
-# AAR (Java bindings)
-export ANDROID_ABIS=arm64-v8a
-export BUILD_AAR_DIR=aar-out
-mkdir -p $BUILD_AAR_DIR
-sh scripts/build_android_library.sh
-
-# Native C++ (direct cross-compile)
-cmake -B cmake-out \
-  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
-  -DANDROID_ABI=arm64-v8a \
-  --preset android-arm64-v8a
-cmake --build cmake-out -j$(nproc)
-```
-
-### iOS / macOS Frameworks
-
-```bash
-# Build all frameworks
-./scripts/build_apple_frameworks.sh
-
-# With specific backends
 ./scripts/build_apple_frameworks.sh --coreml --mps --xnnpack
 ```
-
-Link frameworks in Xcode with `-all_load` linker flag.
-
-### Windows
-
-Requires Visual Studio 2022+ with Clang-CL:
-```bash
-cmake -B cmake-out --preset windows -T ClangCL
-cmake --build cmake-out --config Release
-```
-
-**Windows-specific notes:**
-- Enable symlinks before cloning: `git config --system core.symlinks true`
-- Missing symlinks cause `version.py` errors during `pip install`
-- LLM custom kernels and quantized kernels do not compile with MSVC; use `-T ClangCL` or build with CUDA
-
-## 5. Key Build Options
-
-| Option | Type | Default | Description |
-|--------|------|---------|-------------|
-| `CMAKE_BUILD_TYPE` | STRING | Debug | `Debug` or `Release`. Release disables logging/verification, adds optimizations |
-| `EXECUTORCH_BUILD_XNNPACK` | BOOL | OFF | XNNPACK CPU backend (requires CPUINFO + PTHREADPOOL) |
-| `EXECUTORCH_BUILD_COREML` | BOOL | OFF | Core ML backend (macOS/iOS only) |
-| `EXECUTORCH_BUILD_MPS` | BOOL | OFF | MPS GPU backend (macOS/iOS only) |
-| `EXECUTORCH_BUILD_CUDA` | BOOL | OFF | CUDA GPU backend (requires EXTENSION_TENSOR) |
-| `EXECUTORCH_BUILD_METAL` | BOOL | OFF | Metal backend (requires EXTENSION_TENSOR) |
-| `EXECUTORCH_BUILD_VULKAN` | BOOL | OFF | Vulkan GPU backend (Android) |
-| `EXECUTORCH_BUILD_QNN` | BOOL | OFF | Qualcomm QNN backend |
-| `EXECUTORCH_BUILD_KERNELS_OPTIMIZED` | BOOL | OFF | Optimized kernel implementations |
-| `EXECUTORCH_BUILD_KERNELS_QUANTIZED` | BOOL | OFF | Quantized kernel implementations |
-| `EXECUTORCH_BUILD_KERNELS_LLM` | BOOL | OFF | LLM custom kernels (requires KERNELS_OPTIMIZED) |
-| `EXECUTORCH_BUILD_EXTENSION_MODULE` | BOOL | OFF | Module extension (requires DATA_LOADER + FLAT_TENSOR + NAMED_DATA_MAP) |
-| `EXECUTORCH_BUILD_EXTENSION_TENSOR` | BOOL | OFF | Tensor extension |
-| `EXECUTORCH_BUILD_EXTENSION_LLM` | BOOL | OFF | LLM extension |
-| `EXECUTORCH_BUILD_EXTENSION_LLM_RUNNER` | BOOL | OFF | LLM runner extension (requires EXTENSION_LLM) |
-| `EXECUTORCH_BUILD_PYBIND` | BOOL | OFF | Python bindings (requires EXTENSION_MODULE) |
-| `EXECUTORCH_BUILD_TESTS` | BOOL | OFF | CMake-based unit tests |
-| `EXECUTORCH_BUILD_DEVTOOLS` | BOOL | OFF | Developer tools (Inspector, ETDump) |
-| `EXECUTORCH_ENABLE_EVENT_TRACER` | BOOL | OFF | Event tracing (requires DEVTOOLS) |
-| `EXECUTORCH_OPTIMIZE_SIZE` | BOOL | OFF | Optimize for binary size (`-Os`, no exceptions/RTTI) |
-| `EXECUTORCH_ENABLE_LOGGING` | BOOL | (Debug=ON) | Runtime logging |
-| `EXECUTORCH_LOG_LEVEL` | STRING | Info | Log level: Debug, Info, Error, Fatal |
-| `EXECUTORCH_USE_SANITIZER` | BOOL | OFF | ASAN + UBSAN (not supported on MSVC) |
-| `EXECUTORCH_PAL_DEFAULT` | STRING | posix | Platform abstraction: `posix`, `minimal`, `android` |
-
-**Dependency chains** — enabling some options requires others:
-- `XNNPACK` requires `CPUINFO` + `PTHREADPOOL`
-- `KERNELS_LLM` requires `KERNELS_OPTIMIZED`
-- `EXTENSION_MODULE` requires `EXTENSION_DATA_LOADER` + `EXTENSION_FLAT_TENSOR` + `EXTENSION_NAMED_DATA_MAP`
-- `BUILD_PYBIND` requires `EXTENSION_MODULE`
-- `EXTENSION_LLM_RUNNER` requires `EXTENSION_LLM`
-- `EVENT_TRACER` requires `DEVTOOLS`
-- `CUDA` and `METAL` require `EXTENSION_TENSOR`
-
-CMake will error with a clear message if a required option is missing.
-
-## 6. Common Build Patterns
-
-### Build core runtime only (minimal)
-```bash
-cmake -B cmake-out -DCMAKE_BUILD_TYPE=Release
-cmake --build cmake-out -j$(nproc)
-```
-
-### Build with XNNPACK backend
-```bash
-cmake -B cmake-out -DCMAKE_BUILD_TYPE=Release \
-  -DEXECUTORCH_BUILD_XNNPACK=ON
-cmake --build cmake-out -j$(nproc)
-```
-
-### Build with profiling
-```bash
-cmake -B cmake-out --preset profiling
-cmake --build cmake-out -j$(nproc)
-```
-
-### Build tests
-```bash
-cmake -B cmake-out -DEXECUTORCH_BUILD_TESTS=ON \
-  -DEXECUTORCH_BUILD_EXTENSION_FLAT_TENSOR=ON
-cmake --build cmake-out -j$(nproc)
-ctest --test-dir cmake-out --output-on-failure
-```
-
-### Using ExecuTorch as a CMake subdirectory
-```cmake
-# Set options before add_subdirectory so they take effect at configure time:
-set(EXECUTORCH_BUILD_XNNPACK ON)
-set(EXECUTORCH_BUILD_EXTENSION_MODULE ON)
-add_subdirectory(executorch)
-```
-
-## 7. Troubleshooting
-
-### Submodule issues
-**Symptom:** Build fails with missing headers or `CMakeLists.txt not found` in third-party dirs.
-```bash
-git submodule sync --recursive
-git submodule update --init --recursive
-```
-
-### Stale build artifacts
-**Symptom:** Mysterious failures after pulling new changes or switching branches.
-```bash
-./install_executorch.sh --clean
-# Or manually:
-rm -rf cmake-out/ pip-out/ buck-out/
-git submodule sync && git submodule update --init --recursive
-```
-
-### CMake version conflicts
-**Symptom:** `cmake` errors about policy versions or unsupported features.
-- ExecuTorch requires CMake >= 3.24, < 4.0
-- Check: `cmake --version`
-- If conda and system cmake conflict, ensure conda env cmake is used: `which cmake` should point to conda env
-
-### Python version mismatch
-**Symptom:** `install_executorch.sh` fails early with compatibility errors.
-- Supported: Python 3.10–3.13
-- Check: `python --version`
-
-### Dependency version conflicts
-**Symptom:** pip fails with conflicting torch/torchvision/torchaudio versions.
-- Use a fresh conda environment
-- If pinning to a specific PyTorch version: `./install_executorch.sh --use-pt-pinned-commit`
-
-### Missing `python-dev` headers
-**Symptom:** Build fails looking for `Python.h`.
-```bash
-sudo apt install python$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')-dev
-```
-
-### Linking errors with `--whole-archive`
-**Symptom:** Missing operator registrations at runtime despite building kernels.
-- Kernel binding libraries (e.g., `libportable_kernels_bindings.a`) use load-time registration
-- Must link with: `-Wl,--whole-archive <lib> -Wl,--no-whole-archive` (Linux) or `-Wl,-force_load,<lib>` (macOS)
-
-### XNNPACK build fails
-**Symptom:** Errors about missing `cpuinfo` or `pthreadpool`.
-- `EXECUTORCH_BUILD_XNNPACK=ON` requires `EXECUTORCH_BUILD_CPUINFO=ON` and `EXECUTORCH_BUILD_PTHREADPOOL=ON` (both ON by default unless `ARM_BAREMETAL` is set)
-
-### Windows symlink errors
-**Symptom:** `version.py` not found or import errors on Windows.
-```bash
-git config --system core.symlinks true
-# Re-clone the repo after enabling
-```
-
-### MSVC kernel compilation failures
-**Symptom:** LLM/quantized kernels fail to compile on Windows with MSVC.
-- Use Clang-CL: `cmake -B cmake-out -T ClangCL`
-- Or build with CUDA (which uses nvcc, not MSVC for kernels)
-
-### Intel macOS
-**Symptom:** `install_executorch.sh` fails — no prebuilt PyTorch wheels for Intel Mac.
-- Must build PyTorch from source, or use `--use-pt-pinned-commit --minimal`
-
-### Build directory not at repo root
-**Symptom:** Include path errors when ExecuTorch checkout is not the top-level directory.
-- ExecuTorch adds `..` to include directories; the build directory must be directly under the repo root or use `add_subdirectory` correctly
-
-### Duplicate kernel registration
-**Symptom:** Abort at runtime with duplicate kernel registration.
-- Only link one `gen_operators_lib` per target
-- Check for multiple kernel binding libraries being linked
-
-## 8. Build Output
-
-| Artifact | Location | Description |
-|----------|----------|-------------|
-| `executor_runner` | `cmake-out/executor_runner` | Standalone model runner |
-| Core runtime | `cmake-out/libexecutorch.a` | Core ExecuTorch runtime |
-| Portable ops | `cmake-out/kernels/portable/libportable_ops_lib.a` | Portable operator implementations |
-| XNNPACK backend | `cmake-out/backends/xnnpack/libxnnpack_backend.a` | XNNPACK delegate |
-| LLM runner | `cmake-out/examples/models/<model>/<runner>` | Model-specific runners |
-| Python package | site-packages | `executorch` Python module |
-| iOS frameworks | `cmake-out/*.xcframework` | iOS/macOS frameworks |
-| Android AAR | `aar-out/` | Android Java bindings |
-
-## 9. Tips
-
-- Always use `Release` for performance measurement; `Debug` is 5–10x slower and significantly larger
-- Use `ccache` to speed up rebuilds — ExecuTorch auto-detects it
-- Use `Ninja` generator (`-G Ninja`) for faster parallel builds
-- Use `cmake --list-presets` to see all available presets
-- After `git pull`, always clean and re-init submodules before rebuilding
+Link in Xcode with `-all_load` linker flag.
+
+**Android:**
+```bash
+export ANDROID_ABIS=arm64-v8a BUILD_AAR_DIR=aar-out
+mkdir -p $BUILD_AAR_DIR && sh scripts/build_android_library.sh
+```
+
+## Key build options
+
+Most commonly needed flags (full list: `CMakeLists.txt`):
+
+| Flag | What it enables |
+|------|-----------------|
+| `EXECUTORCH_BUILD_XNNPACK` | XNNPACK CPU backend |
+| `EXECUTORCH_BUILD_COREML` | Core ML (macOS/iOS) |
+| `EXECUTORCH_BUILD_MPS` | MPS GPU (macOS/iOS) |
+| `EXECUTORCH_BUILD_METAL` | Metal compute (macOS, requires EXTENSION_TENSOR) |
+| `EXECUTORCH_BUILD_CUDA` | CUDA GPU (Linux, requires EXTENSION_TENSOR) |
+| `EXECUTORCH_BUILD_KERNELS_OPTIMIZED` | Optimized kernels |
+| `EXECUTORCH_BUILD_KERNELS_QUANTIZED` | Quantized kernels |
+| `EXECUTORCH_BUILD_EXTENSION_MODULE` | Module extension (requires DATA_LOADER + FLAT_TENSOR + NAMED_DATA_MAP) |
+| `EXECUTORCH_BUILD_EXTENSION_LLM` | LLM extension |
+| `EXECUTORCH_BUILD_TESTS` | Unit tests (`ctest --test-dir cmake-out --output-on-failure`) |
+| `EXECUTORCH_BUILD_DEVTOOLS` | DevTools (Inspector, ETDump) |
+| `EXECUTORCH_OPTIMIZE_SIZE` | Size-optimized build (`-Os`, no exceptions/RTTI) |
+| `CMAKE_BUILD_TYPE` | `Release` (default for presets) or `Debug` (5-10x slower) |
+
+## Troubleshooting
+
+| Symptom | Fix |
+|---------|-----|
+| Missing headers / `CMakeLists.txt not found` in third-party | `git submodule sync --recursive && git submodule update --init --recursive` |
+| Mysterious failures after `git pull` or branch switch | `rm -rf cmake-out/ pip-out/ && git submodule sync && git submodule update --init --recursive` |
+| CMake >= 4.0 (too new) | `pip install 'cmake>=3.24,<4'` inside the conda env |
+| `externally-managed-environment` / PEP 668 error | You're using system Python, not conda. Activate conda env first. |
+| pip conflicts with torch versions | Fresh conda env; or `./install_executorch.sh --use-pt-pinned-commit` |
+| Missing `Python.h` (Linux) | `sudo apt install python3.X-dev` |
+| Missing operator registrations at runtime | Link kernel libs with `-Wl,-force_load,<lib>` (macOS) or `-Wl,--whole-archive <lib> -Wl,--no-whole-archive` (Linux) |
+| `install_executorch.sh` fails on Intel Mac | No prebuilt PyTorch wheels; use `--use-pt-pinned-commit --minimal` |
+| XNNPACK build errors about cpuinfo/pthreadpool | Ensure `EXECUTORCH_BUILD_CPUINFO=ON` and `EXECUTORCH_BUILD_PTHREADPOOL=ON` (both ON by default) |
+| Duplicate kernel registration abort | Only link one `gen_operators_lib` per target |
+
+## Build output
+
+| Artifact | Location |
+|----------|----------|
+| Core runtime | `cmake-out/libexecutorch.a` |
+| executor_runner | `cmake-out/executor_runner` |
+| Model runners | `cmake-out/examples/models/<model>/<runner>` |
+| XNNPACK backend | `cmake-out/backends/xnnpack/libxnnpack_backend.a` |
+| Python package | `site-packages/executorch` |
+| iOS frameworks | `cmake-out/*.xcframework` |
+| Android AAR | `aar-out/` |
+
+## Tips
+- Always use `Release` for benchmarking; `Debug` is 5–10x slower
+- `ccache` is auto-detected if installed (`brew install ccache`)
+- `Ninja` is faster than Make (`-G Ninja`) — but `--preset macos` uses Xcode generator
 - For LLM workflows, `make <model>-<backend>` is the simplest path
-- Set `EXECUTORCH_OPTIMIZE_SIZE=ON` for size-constrained deployments
-- Check `cmake-out/compile_commands.json` for IDE integration (auto-generated)
+- After `git pull`, clean and re-init submodules before rebuilding

From c6ba3b0676d6f3f0e0ee6d143279aa82e06c7865 Mon Sep 17 00:00:00 2001
From: Siddartha Pothapragada <sidart@meta.com>
Date: Tue, 10 Mar 2026 09:32:28 -0700
Subject: [PATCH 03/14] Update .claude/skills/building/SKILL.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 .claude/skills/building/SKILL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md
index 9f5f37db85a..b7446d7efe0 100644
--- a/.claude/skills/building/SKILL.md
+++ b/.claude/skills/building/SKILL.md
@@ -107,7 +107,7 @@ cmake -B cmake-out \
   -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
   -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
   -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON
-cmake --build cmake-out -j$(sysctl -n hw.ncpu)
+cmake --build cmake-out --parallel "$(nproc 2>/dev/null || sysctl -n hw.ncpu)"
 ```
 
 Run `cmake --list-presets` to see all available presets.

From f4390deca0286f0e5f744405cf53e1add9b683ad Mon Sep 17 00:00:00 2001
From: Siddartha Pothapragada <sidart@meta.com>
Date: Tue, 10 Mar 2026 09:32:44 -0700
Subject: [PATCH 04/14] Update .claude/skills/building/SKILL.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 .claude/skills/building/SKILL.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md
index b7446d7efe0..350b6cdde38 100644
--- a/.claude/skills/building/SKILL.md
+++ b/.claude/skills/building/SKILL.md
@@ -163,12 +163,14 @@ Most commonly needed flags (full list: `CMakeLists.txt`):
 
 ## Build output
 
+Installed artifact locations under `CMAKE_INSTALL_PREFIX=cmake-out`:
+
 | Artifact | Location |
 |----------|----------|
-| Core runtime | `cmake-out/libexecutorch.a` |
+| Core runtime | `cmake-out/lib/libexecutorch.a` |
 | executor_runner | `cmake-out/executor_runner` |
 | Model runners | `cmake-out/examples/models/<model>/<runner>` |
-| XNNPACK backend | `cmake-out/backends/xnnpack/libxnnpack_backend.a` |
+| XNNPACK backend | `cmake-out/lib/libxnnpack_backend.a` |
 | Python package | `site-packages/executorch` |
 | iOS frameworks | `cmake-out/*.xcframework` |
 | Android AAR | `aar-out/` |

From cb41d2302fd0b6b173fa75e10f31a0b2d4135e65 Mon Sep 17 00:00:00 2001
From: Siddartha Pothapragada <sidart@meta.com>
Date: Tue, 10 Mar 2026 09:33:18 -0700
Subject: [PATCH 05/14] Update .claude/skills/building/SKILL.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 .claude/skills/building/SKILL.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md
index 350b6cdde38..5b4fa544938 100644
--- a/.claude/skills/building/SKILL.md
+++ b/.claude/skills/building/SKILL.md
@@ -105,6 +105,8 @@ cmake -B cmake-out \
   -DEXECUTORCH_BUILD_XNNPACK=ON \
   -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
   -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
+  -DEXECUTORCH_BUILD_EXTENSION_FLAT_TENSOR=ON \
+  -DEXECUTORCH_BUILD_EXTENSION_NAMED_DATA_MAP=ON \
   -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
   -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON
 cmake --build cmake-out --parallel "$(nproc 2>/dev/null || sysctl -n hw.ncpu)"

From e0337646ca7cd9dda17a6939b1fab03a6b348c27 Mon Sep 17 00:00:00 2001
From: Siddartha Pothapragada <sidart@meta.com>
Date: Tue, 10 Mar 2026 09:42:39 -0700
Subject: [PATCH 06/14] Update .claude/skills/building/SKILL.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 .claude/skills/building/SKILL.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md
index 5b4fa544938..10d878b4c86 100644
--- a/.claude/skills/building/SKILL.md
+++ b/.claude/skills/building/SKILL.md
@@ -165,12 +165,12 @@ Most commonly needed flags (full list: `CMakeLists.txt`):
 
 ## Build output
 
-Installed artifact locations under `CMAKE_INSTALL_PREFIX=cmake-out`:
+Installed artifact locations after `cmake --install` (or `./install_executorch.sh`) with `CMAKE_INSTALL_PREFIX=cmake-out`:
 
 | Artifact | Location |
 |----------|----------|
 | Core runtime | `cmake-out/lib/libexecutorch.a` |
-| executor_runner | `cmake-out/executor_runner` |
+| executor_runner (built only; not installed by default) | **build tree**: `<build-dir>/executor_runner` (Ninja/Make) or `<build-dir>/<config>/executor_runner` (e.g., `cmake-out/Release/executor_runner` with Xcode/Visual Studio) |
 | Model runners | `cmake-out/examples/models/<model>/<runner>` |
 | XNNPACK backend | `cmake-out/lib/libxnnpack_backend.a` |
 | Python package | `site-packages/executorch` |

From d6c134b7f3fe14084da4d5417f06aab5743d3e06 Mon Sep 17 00:00:00 2001
From: Siddartha Pothapragada <sidart@meta.com>
Date: Tue, 10 Mar 2026 09:42:50 -0700
Subject: [PATCH 07/14] Update .claude/skills/building/SKILL.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 .claude/skills/building/SKILL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md
index 10d878b4c86..32c28b45f24 100644
--- a/.claude/skills/building/SKILL.md
+++ b/.claude/skills/building/SKILL.md
@@ -95,7 +95,7 @@ Then: `cmake --build cmake-out -j$(sysctl -n hw.ncpu)` (macOS) or `cmake --build
 ```bash
 cmake --workflow --preset llm-release        # CPU
 cmake --workflow --preset llm-release-metal  # Metal (macOS)
-cmake --workflow --preset llm-release-cuda   # CUDA (Linux)
+cmake --workflow --preset llm-release-cuda   # CUDA (Linux/Windows)
 ```
 
 **Manual CMake (custom flags):**

From ffc0722008be325839e9b75f0d58d7b2d8747b11 Mon Sep 17 00:00:00 2001
From: Siddartha Pothapragada <sidart@meta.com>
Date: Tue, 10 Mar 2026 09:43:02 -0700
Subject: [PATCH 08/14] Update .claude/skills/building/SKILL.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 .claude/skills/building/SKILL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md
index 32c28b45f24..1ec474ab3cc 100644
--- a/.claude/skills/building/SKILL.md
+++ b/.claude/skills/building/SKILL.md
@@ -138,7 +138,7 @@ Most commonly needed flags (full list: `CMakeLists.txt`):
 | `EXECUTORCH_BUILD_COREML` | Core ML (macOS/iOS) |
 | `EXECUTORCH_BUILD_MPS` | MPS GPU (macOS/iOS) |
 | `EXECUTORCH_BUILD_METAL` | Metal compute (macOS, requires EXTENSION_TENSOR) |
-| `EXECUTORCH_BUILD_CUDA` | CUDA GPU (Linux, requires EXTENSION_TENSOR) |
+| `EXECUTORCH_BUILD_CUDA` | CUDA GPU (Linux/Windows, requires EXTENSION_TENSOR) |
 | `EXECUTORCH_BUILD_KERNELS_OPTIMIZED` | Optimized kernels |
 | `EXECUTORCH_BUILD_KERNELS_QUANTIZED` | Quantized kernels |
 | `EXECUTORCH_BUILD_EXTENSION_MODULE` | Module extension (requires DATA_LOADER + FLAT_TENSOR + NAMED_DATA_MAP) |

From 179f84e47ff93fb48aa8b631429c5b56c4c17e68 Mon Sep 17 00:00:00 2001
From: Github Executorch <github_executorch@arm.com>
Date: Tue, 10 Mar 2026 13:21:28 -0700
Subject: [PATCH 09/14] Harden building skill from e2e testing

- Add venv fallback when conda is not installed
- Handle conda PermissionError by checking env directory on disk
- Auto-fix cmake: missing or < 3.24 gets pip-installed, >= 4.0 works fine
- Add troubleshooting entries for conda not found and PEP 668 errors
- Remove heavy-handed directive banner; let skill structure guide behavior
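
The cmake rule in the third bullet can be sketched as a small shell check. This is only an illustration and not code from the skill: `cmake_version_ok` is a hypothetical helper name, and it assumes a version string of the form `MAJOR.MINOR[.PATCH]`, as printed on the first line of `cmake --version`.

```bash
# Hypothetical helper (not part of the skill): succeeds when a version string
# of the form MAJOR.MINOR[.PATCH] is at least 3.24. Versions >= 4.0 pass too.
cmake_version_ok() {
  _major="${1%%.*}"      # text before the first dot
  _rest="${1#*.}"        # text after the first dot
  _minor="${_rest%%.*}"  # text before the next dot
  [ "$_major" -gt 3 ] || { [ "$_major" -eq 3 ] && [ "$_minor" -ge 24 ]; }
}

# Demonstrate on literal versions; no cmake installation is needed here.
for v in 3.22.1 3.24.0 3.31.6 4.0.1; do
  if cmake_version_ok "$v"; then
    echo "$v: ok"
  else
    echo "$v: too old, would run: pip install 'cmake>=3.24'"
  fi
done
```

In a real environment the input would presumably come from something like `cmake --version | awk 'NR==1 {print $3}'`, with an empty result treated as "cmake missing".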

Co-authored-by: Claude <noreply@anthropic.com>
---
 .claude/skills/building/SKILL.md | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md
index 1ec474ab3cc..55fe237c10d 100644
--- a/.claude/skills/building/SKILL.md
+++ b/.claude/skills/building/SKILL.md
@@ -5,22 +5,24 @@ description: Build ExecuTorch from source — Python package, C++ runtime, runne
 
 # Building ExecuTorch
 
-IMPORTANT: Do NOT ask the user questions about environment setup. Detect issues and fix them automatically. The standard path is conda — always use it.
-
-## Step 1: Ensure conda environment (do this silently, fix issues automatically)
+## Step 1: Ensure Python environment (detect and fix automatically)
 
 ```bash
 # Check if executorch conda env exists; create if not
-conda env list | grep executorch || conda create -yn executorch python=3.12 'cmake<4'
+# Note: `conda env list` may fail with PermissionError on some setups.
+# Fallback: check if the env directory exists on disk.
+conda env list 2>/dev/null | grep executorch || \
+  ls "$CONDA_PREFIX/../envs/" 2>/dev/null | grep executorch || \
+  conda create -yn executorch python=3.12
+
 # Activate
 conda activate executorch
-# Verify — fix automatically if wrong
+
+# Verify
 python --version          # need 3.10–3.13
-cmake --version           # need >= 3.24, < 4.0; if cmake >= 4, run: pip install 'cmake>=3.24,<4'
+cmake --version           # need >= 3.24; cmake 4.x works in practice
 ```
 
-If cmake is >= 4.0 inside the conda env, fix it: `pip install 'cmake>=3.24,<4'` — do not ask the user.
-
 Parallel jobs: `$(sysctl -n hw.ncpu)` on macOS, `$(nproc)` on Linux.
 
 ## Step 2: Build (route by what the user needs)
@@ -154,7 +156,8 @@ Most commonly needed flags (full list: `CMakeLists.txt`):
 |---------|-----|
 | Missing headers / `CMakeLists.txt not found` in third-party | `git submodule sync --recursive && git submodule update --init --recursive` |
 | Mysterious failures after `git pull` or branch switch | `rm -rf cmake-out/ pip-out/ && git submodule sync && git submodule update --init --recursive` |
-| CMake >= 4.0 (too new) | `pip install 'cmake>=3.24,<4'` inside the conda env |
+| `conda env list` PermissionError | Use `CONDA_NO_PLUGINS=true conda env list` or check env dir directly |
+| CMake >= 4.0 | Works in practice despite `< 4.0` in docs; only fix if build actually fails |
 | `externally-managed-environment` / PEP 668 error | You're using system Python, not conda. Activate conda env first. |
 | pip conflicts with torch versions | Fresh conda env; or `./install_executorch.sh --use-pt-pinned-commit` |
 | Missing `Python.h` (Linux) | `sudo apt install python3.X-dev` |

From d75acb2a06faced75b326d584d98351e8f4bbcca Mon Sep 17 00:00:00 2001
From: Github Executorch <github_executorch@arm.com>
Date: Tue, 10 Mar 2026 13:24:53 -0700
Subject: [PATCH 10/14] Add routing table to building skill for
 Android/iOS/model targets

Explicit decision tree at the top of Step 2 so Claude routes to the
right section based on keywords (Android, iOS, model names, cmake)
instead of always defaulting to the Python package build.
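
The routing order can be pictured as a first-match-wins `case` statement. This is a rough sketch under stated assumptions, not something the skill executes: `route_build_request` and the printed labels are hypothetical, and `llama` and `whisper` stand in for the open-ended list of model names.

```bash
# Hypothetical sketch of the routing order; the printed labels are
# illustrative section names, not commands.
route_build_request() {
  # Lowercase the request so matching is case-insensitive.
  _req="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$_req" in
    *android*)          echo "cross-compilation: android" ;;
    *ios*|*framework*)  echo "cross-compilation: ios" ;;
    *llama*|*whisper*)  echo "model runner" ;;
    *c++*|*cmake*)      echo "c++ runtime" ;;
    *)                  echo "python package" ;;
  esac
}

route_build_request "Build the Android AAR"    # prints: cross-compilation: android
route_build_request "build llama with cmake"   # prints: model runner
route_build_request "install executorch"       # prints: python package
```

Because the first matching branch wins, a request that names both a model and cmake routes to the model runner, mirroring the order of the bullets in the skill. Plain substring matching is crude (for example, `*ios*` also matches unrelated words that contain "ios"), so this only illustrates the precedence, not a robust classifier.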

Co-authored-by: Claude <noreply@anthropic.com>
---
 .claude/skills/building/SKILL.md | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md
index 55fe237c10d..2c6d3ada155 100644
--- a/.claude/skills/building/SKILL.md
+++ b/.claude/skills/building/SKILL.md
@@ -25,9 +25,16 @@ cmake --version           # need >= 3.24; cmake 4.x works in practice
 
 Parallel jobs: `$(sysctl -n hw.ncpu)` on macOS, `$(nproc)` on Linux.
 
-## Step 2: Build (route by what the user needs)
+## Step 2: Build
 
-### Python package (default — use this unless user asks for something specific)
+Route based on what the user asks for:
+- User mentions **Android** → skip to [Cross-compilation: Android](#cross-compilation)
+- User mentions **iOS** or **frameworks** → skip to [Cross-compilation: iOS](#cross-compilation)
+- User mentions a **model name** (llama, whisper, etc.) → skip to [LLM / ASR model runner](#llm--asr-model-runner-simplest-path-for-running-models)
+- User mentions **C++ runtime** or **cmake** → skip to [C++ runtime](#c-runtime-standalone)
+- Otherwise → default to **Python package** below
+
+### Python package (default)
 ```bash
 conda activate executorch
 ./install_executorch.sh --editable    # editable install from source

From 62d417dec4b5c8112c7954474c3c4843c2ea3df2 Mon Sep 17 00:00:00 2001
From: Siddartha Pothapragada <sidart@meta.com>
Date: Tue, 10 Mar 2026 13:28:31 -0700
Subject: [PATCH 11/14] Update .claude/skills/building/SKILL.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 .claude/skills/building/SKILL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md
index 2c6d3ada155..7d4b5f75208 100644
--- a/.claude/skills/building/SKILL.md
+++ b/.claude/skills/building/SKILL.md
@@ -98,7 +98,7 @@ Output: `cmake-out/examples/models/<model>/<runner>`
 | Linux | `cmake -B cmake-out --preset linux -DCMAKE_BUILD_TYPE=Release` |
 | Windows | `cmake -B cmake-out --preset windows -T ClangCL` |
 
-Then: `cmake --build cmake-out -j$(sysctl -n hw.ncpu)` (macOS) or `cmake --build cmake-out -j$(nproc)` (Linux)
+Then: `cmake --build cmake-out --config Release -j$(sysctl -n hw.ncpu)` (macOS) or `cmake --build cmake-out -j$(nproc)` (Linux)
 
 **LLM libraries via workflow presets** (configure + build + install in one command):
 ```bash

From 74d0c7ec50d1bface42d381e1ab9639ac7f078a1 Mon Sep 17 00:00:00 2001
From: Github Executorch <github_executorch@arm.com>
Date: Tue, 10 Mar 2026 13:34:22 -0700
Subject: [PATCH 12/14] Address PR review comments on building skill

- Add ANDROID_NDK requirement and verification to Android section
- Fix CMAKE_BUILD_TYPE description: not all presets set it
- Separate build output table by flow (pip vs cmake vs cross-compilation)

Co-authored-by: Claude <noreply@anthropic.com>
---
 .claude/skills/building/SKILL.md | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md
index 7d4b5f75208..5ad9b132510 100644
--- a/.claude/skills/building/SKILL.md
+++ b/.claude/skills/building/SKILL.md
@@ -132,7 +132,11 @@ Run `cmake --list-presets` to see all available presets.
 Link in Xcode with `-all_load` linker flag.
 
 **Android:**
+
+Requires the `ANDROID_NDK` environment variable to be set (typically by Android Studio or a standalone NDK install).
 ```bash
+# Verify NDK is available
+echo "$ANDROID_NDK"         # must print the NDK root, e.g. ~/Library/Android/sdk/ndk/<version>
 export ANDROID_ABIS=arm64-v8a BUILD_AAR_DIR=aar-out
 mkdir -p $BUILD_AAR_DIR && sh scripts/build_android_library.sh
 ```
@@ -155,7 +159,7 @@ Most commonly needed flags (full list: `CMakeLists.txt`):
 | `EXECUTORCH_BUILD_TESTS` | Unit tests (`ctest --test-dir cmake-out --output-on-failure`) |
 | `EXECUTORCH_BUILD_DEVTOOLS` | DevTools (Inspector, ETDump) |
 | `EXECUTORCH_OPTIMIZE_SIZE` | Size-optimized build (`-Os`, no exceptions/RTTI) |
-| `CMAKE_BUILD_TYPE` | `Release` (default for presets) or `Debug` (5-10x slower) |
+| `CMAKE_BUILD_TYPE` | `Release` or `Debug` (5-10x slower). Some presets (e.g. `llm-release`) set this; others require it explicitly. |
 
 ## Troubleshooting
 
@@ -175,15 +179,25 @@ Most commonly needed flags (full list: `CMakeLists.txt`):
 
 ## Build output
 
-Installed artifact locations after `cmake --install` (or `./install_executorch.sh`) with `CMAKE_INSTALL_PREFIX=cmake-out`:
+**From `./install_executorch.sh` (Python package):**
+
+| Artifact | Location |
+|----------|----------|
+| Python package | `site-packages/executorch` |
+
+**From CMake builds** (`cmake --install` with `CMAKE_INSTALL_PREFIX=cmake-out`):
 
 | Artifact | Location |
 |----------|----------|
 | Core runtime | `cmake-out/lib/libexecutorch.a` |
-| executor_runner (built only; not installed by default) | **build tree**: `<build-dir>/executor_runner` (Ninja/Make) or `<build-dir>/<config>/executor_runner` (e.g., `cmake-out/Release/executor_runner` with Xcode/Visual Studio) |
-| Model runners | `cmake-out/examples/models/<model>/<runner>` |
 | XNNPACK backend | `cmake-out/lib/libxnnpack_backend.a` |
-| Python package | `site-packages/executorch` |
+| executor_runner | `cmake-out/executor_runner` (Ninja/Make) or `cmake-out/Release/executor_runner` (Xcode) |
+| Model runners | `cmake-out/examples/models/<model>/<runner>` |
+
+**From cross-compilation:**
+
+| Artifact | Location |
+|----------|----------|
 | iOS frameworks | `cmake-out/*.xcframework` |
 | Android AAR | `aar-out/` |
 

From 6209f273ded2976d1e10541a22e4adb7308e9429 Mon Sep 17 00:00:00 2001
From: Github Executorch <github_executorch@arm.com>
Date: Tue, 10 Mar 2026 13:41:09 -0700
Subject: [PATCH 13/14] Fix fresh-Mac gaps: Xcode CLT, conda shell hook, Python
 version fallback

Three issues that would break a fresh Mac checkout:
- Add Xcode Command Line Tools prerequisite check
- Add conda shell.bash hook for non-interactive shells (Claude Code / CI)
- Add brew install python@3.12 guidance for venv path when only 3.14+ exists
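
The Python version fallback in the last bullet amounts to picking the first compatible interpreter on PATH. A minimal sketch, assuming the interpreters are installed under the usual `python3.X` names: `first_available` is a hypothetical helper, and the preference order is an assumption (any of 3.10 to 3.13 is acceptable).

```bash
# Hypothetical helper: print the first candidate command found on PATH,
# or return non-zero if none of them exists.
first_available() {
  for _cmd in "$@"; do
    if command -v "$_cmd" >/dev/null 2>&1; then
      echo "$_cmd"
      return 0
    fi
  done
  return 1
}

# Only report what would happen; nothing is created or installed here.
if py="$(first_available python3.12 python3.11 python3.13 python3.10)"; then
  echo "would create the venv with: $py -m venv .executorch-venv"
else
  echo "no compatible Python found; on macOS: brew install python@3.12"
fi
```

The sketch only prints the command it would run, so it is safe to try on a machine that has no compatible Python at all.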

Co-authored-by: Claude <noreply@anthropic.com>
---
 .claude/skills/building/SKILL.md | 33 ++++++++++++++++++++++++++------
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md
index 5ad9b132510..d349b50bd24 100644
--- a/.claude/skills/building/SKILL.md
+++ b/.claude/skills/building/SKILL.md
@@ -5,24 +5,45 @@ description: Build ExecuTorch from source — Python package, C++ runtime, runne
 
 # Building ExecuTorch
 
+## Prerequisites (macOS)
+
+A C++ compiler is required. On macOS, ensure Xcode Command Line Tools are installed:
+```bash
+xcode-select -p || xcode-select --install
+```
+
 ## Step 1: Ensure Python environment (detect and fix automatically)
 
+**Path A — conda (preferred):**
 ```bash
+# Initialize conda for non-interactive shells (required in Claude Code / CI)
+eval "$(conda shell.bash hook 2>/dev/null)"
+
 # Check if executorch conda env exists; create if not
-# Note: `conda env list` may fail with PermissionError on some setups.
-# Fallback: check if the env directory exists on disk.
 conda env list 2>/dev/null | grep executorch || \
-  ls "$CONDA_PREFIX/../envs/" 2>/dev/null | grep executorch || \
+  ls "$(conda info --base 2>/dev/null)/envs/" 2>/dev/null | grep executorch || \
   conda create -yn executorch python=3.12
 
 # Activate
 conda activate executorch
+```
 
-# Verify
-python --version          # need 3.10–3.13
-cmake --version           # need >= 3.24; cmake 4.x works in practice
+**Path B — no conda (fall back to venv):**
+```bash
+# Find a compatible Python (3.10–3.13). On macOS with only Homebrew Python 3.14+,
+# install a compatible version first: brew install python@3.12
+python3.12 -m venv .executorch-venv   # or python3.11, python3.10, python3.13
+source .executorch-venv/bin/activate
+pip install --upgrade pip
 ```
 
+**Then verify (either path):**
+
+Run `python --version` and `cmake --version`. Fix automatically:
+- **Python not 3.10–3.13**: recreate the env with a correct Python version.
+- **cmake missing or < 3.24**: run `pip install 'cmake>=3.24'` inside the env.
+- **cmake >= 4.0**: works in practice, no action needed.
+
 Parallel jobs: `$(sysctl -n hw.ncpu)` on macOS, `$(nproc)` on Linux.
 
 ## Step 2: Build

From d2e8919374ef4c06bdcedf0d51a7f907888a34da Mon Sep 17 00:00:00 2001
From: Github Executorch <github_executorch@arm.com>
Date: Tue, 10 Mar 2026 13:42:53 -0700
Subject: [PATCH 14/14] =?UTF-8?q?Remove=20Xcode=20CLT=20prerequisite=20?=
 =?UTF-8?q?=E2=80=94=20not=20in=20ET=20docs,=20rarely=20needed?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Claude <noreply@anthropic.com>
---
 .claude/skills/building/SKILL.md | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/.claude/skills/building/SKILL.md b/.claude/skills/building/SKILL.md
index d349b50bd24..d1322cdecae 100644
--- a/.claude/skills/building/SKILL.md
+++ b/.claude/skills/building/SKILL.md
@@ -5,13 +5,6 @@ description: Build ExecuTorch from source — Python package, C++ runtime, runne
 
 # Building ExecuTorch
 
-## Prerequisites (macOS)
-
-A C++ compiler is required. On macOS, ensure Xcode Command Line Tools are installed:
-```bash
-xcode-select -p || xcode-select --install
-```
-
 ## Step 1: Ensure Python environment (detect and fix automatically)
 
 **Path A — conda (preferred):**