ENH: Add VKFFT_BACKEND=4 (Level Zero) and =5 (Metal) backends; CUDA 13 + multi-ICD OpenCL fixes by hjmjohnson · Pull Request #73 · InsightSoftwareConsortium/ITKVkFFTBackend

hjmjohnson · 2026-05-20T18:56:33Z

Add Level Zero (VKFFT_BACKEND=4) and Metal (VKFFT_BACKEND=5) backends, fix CUDA 13 build, fix a multi-ICD OpenCL CL_INVALID_CONTEXT, and auto-disable Level Zero tests on hosts with no Intel L0 driver. Default OpenCL behavior is unchanged.

Validated on real hardware

+--------+----------------------------+----------------------+---------+-----+--------+--------+----------+
|  Host  |          Backend           |      Build dir       | Defined | Run | Passed | Failed | Disabled |
+--------+----------------------------+----------------------+---------+-----+--------+--------+----------+
| macOS  | VKFFT_BACKEND=5 Metal      | cmake-build-metal/   |      30 |  19 |     19 |      0 |   11 [a] |
| macOS  | VKFFT_BACKEND=3 OpenCL     | cmake-build-release/ |      30 |  19 |     19 |      0 |   11 [a] |
+--------+----------------------------+----------------------+---------+-----+--------+--------+----------+
| Linux  | VKFFT_BACKEND=1 CUDA       | build-cuda/          |      30 |  30 |     30 |      0 |        0 |
| Linux  | VKFFT_BACKEND=3 OpenCL     | build-cl/            |      30 |  30 |     30 |      0 |        0 |
| Linux  | VKFFT_BACKEND=4 Level Zero | build-lz/            |      30 |   8 |      8 |      0 |   22 [b] |
+--------+----------------------------+----------------------+---------+-----+--------+--------+----------+

[a] Apple-arm64 Metal Shading Language has no FP64 -- 11 double-precision tests
    skipped via _vkfft_disable_on_unsupported_fp64() in test/CMakeLists.txt.
[b] Test host (Ice Lake Xeon + 2x NVIDIA RTX 6000 Ada, no Intel iGPU) fails the
    configure-time zeInit/zeDriverGet probe -> 22 FFT-touching tests
    auto-DISABLED. Lightweight factory / KWStyle tests still run.

macOS: Apple M-series + Intel CPU OpenCL.
Linux: Ubuntu 24.04 + CUDA 13.2 + NVIDIA driver 595.71.05 + pixi conda toolchain (~/src/ITK/.pixi/envs/cxx).

Per-commit summary

Commit	Title
`ce40cb8`	`ENH: Add VKFFT_BACKEND=5 (Metal) support for Apple platforms` -- VkFFT v1.3.4, metal-cpp via FetchContent, `BUILD_VKFFT=ON` default.
`1ae617e`	`COMP: Fix VKFFT_BACKEND=1 (CUDA) build against CUDA 13 toolkit` -- `cuCtxCreate` gained a 4th `CUctxCreateParams*` argument in CUDA 13; wrap call with `#if CUDA_VERSION >= 13000`. Also add `find_package(CUDAToolkit)` so `CUDAToolkit_INCLUDE_DIRS` is propagated to the consumer (the CUDA branch was previously a bare `# pass`).
`04e88aa`	`ENH: Add VKFFT_BACKEND=4 (Level Zero) support` -- `find_path`/`find_library` for `ze_loader` + headers; extends `VkGPU` with `driver`/`device`/`context`/`commandQueue`/`commandQueueID`; full `ConfigureBackend`/`PerformFFT`/`ReleaseBackend` implementation modeled on VkFFT's own `utils_VkFFT.cpp` Level Zero path.
`5172445`	`ENH: Cover every test in both float and double precision` -- parameterises every FFT test by precision (doubles ctest count from ~14 to ~30); adds `_vkfft_disable_on_unsupported_fp64()` for Apple-arm64 Metal.
`0ea4440`	`BUG: Fix CL_INVALID_CONTEXT when multiple OpenCL ICDs are loaded` -- `clCreateContext(NULL, ...)` binds to an implementation-defined platform when multiple ICDs are visible; pass an explicit `CL_CONTEXT_PLATFORM` property. Restores OpenCL 30 / 30 on hosts where both NVIDIA's CUDA-OpenCL and Intel CPU OpenCL ICDs are installed.
`30b8bbe`	`ENH: Auto-disable VKFFT_BACKEND=4 FFT tests when no Level Zero driver is present` -- configure-time `try_run` probe (`zeInit` + `zeDriverGet`); when zero drivers are discovered, mark FFT-touching tests `DISABLED TRUE` instead of letting them fail. Keeps KWStyle / factory / global-config tests enabled.

Local-build invocation notes

All three Linux builds were configured inside the same pixi conda env that ITK was built with (pixi run --environment cxx ... from ~/src/ITK). External-build consumers on the same host must either use the same toolchain or have their own ITK build that matches their compiler -- the conda sysroot's libm.so linker script references /lib64/..., which is absent on Ubuntu (the equivalent libraries live under /lib/x86_64-linux-gnu/...).

All three builds install their .so to the same path in the shared ITK build tree (build/lib/libitkVkFFTBackend-6.0.so.1); after switching backend, rebuild the chosen backend's VkFFTBackend + VkFFTBackendTestDriver targets immediately before ctest, otherwise the runtime .so may belong to a different backend.

CUDA needs -DCUDA_USE_STATIC_CUDA_RUNTIME=OFF and a linker -L to the CUDA driver stubs directory (<cuda-root>/targets/x86_64-linux/lib/stubs) so the conda toolchain can resolve -lcuda against the driver stub.

OpenCL needs OpenCL_INCLUDE_DIR pointed at a Khronos headers checkout (e.g. /tmp/OpenCL-Headers) and OpenCL_LIBRARY at the CUDA-shipped libOpenCL.so.1 ICD loader.

Level Zero needs LevelZero_INCLUDE_DIR=$HOME/local/include and LevelZero_LIBRARY=$HOME/local/lib/libze_loader.so plus a -I$HOME/local/include/level_zero flag so VkFFT's unqualified #include <ze_api.h> resolves.

Adds native Apple Metal backend support via VkFFT's VKFFT_BACKEND=5 code path, exposed through the standard VKFFT_BACKEND cache option. On Apple platforms, consumers (ANTs, BRAINSTools, Slicer) can now target Metal directly instead of going through the deprecated OpenCL-on-Metal translation layer.

CMakeLists.txt
- Document and accept VKFFT_BACKEND=5 (Metal).
- Bump VkFFT pin to v1.3.4 (first release whose Metal codegen compiles cleanly on macOS arm64).
- Flip BUILD_VKFFT default to ON because v1.3+ split vkFFT.h into a multi-file tree under vkFFT/vkFFT_Structs/, vkFFT/vkFFT_CodeGen/ that the single-header URL download cannot satisfy.
- On VKFFT_BACKEND=5: require APPLE, bump CMAKE_CXX_STANDARD to 17 (metal-cpp requirement), FetchContent metal-cpp from the bkaradzic mirror at metal-cpp_macOS15_iOS18, and link the Metal/Foundation/QuartzCore system frameworks.

itkVkDefinitions.h
- Add METAL=5 alongside the existing VULKAN/CUDA/HIP/OPENCL aliases.

itkVkCommon.h
- Pre-include metal-cpp's Foundation/Metal/QuartzCore headers under VKFFT_BACKEND==METAL so vkFFT.h's internal *_PRIVATE_IMPLEMENTATION re-include is a header-guard no-op in every translation unit that pulls in itkVkCommon.h. Without this, metal-cpp's selector storage (CA::Private::Selector::s_k* etc.) gets emitted in every TU and the link step fails with duplicate-symbol errors.
- Extend VkGPU with MTL::Device* and MTL::CommandQueue* members and the matching operator!= branch.

itkVkCommon.cxx
- Define NS_/MTL_/CA_PRIVATE_IMPLEMENTATION exactly once (this TU), before including the metal-cpp headers, so the storage symbols are emitted exactly once.
- ConfigureBackend: enumerate Metal devices via MTL::CopyAllDevices(), pick by device_id, retain the device, and create a command queue.
- Wire VkFFTConfiguration's Metal fields. Unlike CUDA/OpenCL, which take pointer-to-pointer (void**, cl_device_id*), Metal expects single pointers (MTL::Device*, MTL::CommandQueue*), so the device assignment is moved inside the per-backend branch.
- PerformFFT: allocate MTL::Buffer in MTL::ResourceStorageModeShared (Apple unified-memory shared storage), memcpy host->device into buffer->contents(), drive VkFFTAppend via a commandBuffer + computeCommandEncoder pair, waitUntilCompleted, memcpy back, and release the buffers.
- ReleaseBackend: release the queue and device.

Tested on macOS 26.5 / Apple Silicon: VKFFT_BACKEND=5 builds cleanly, otool -L confirms Metal/Foundation/QuartzCore linkage on the resulting dylib, and 4 VkFFT FFT round-trip tests pass on Metal (itkVkForwardInverseFFTImageFilterTest, itkVkForwardInverse1DFFTImageFilterTest, itkVkHalfHermitianFFTImageFilterTest, itkVkGlobalConfigurationTest). Double-precision tests still fail because Metal Shading Language does not support FP64 on Apple Silicon (program_source: incomplete type 'double2' reserved name); consumers must select PrecisionEnum::FLOAT on Metal. VKFFT_BACKEND=3 (OpenCL) build remains green.

Two pre-existing breakages prevented the CUDA backend from building against CUDA >= 13.0:

- cuCtxCreate gained a fourth argument (CUctxCreateParams*) in CUDA 13.0. The legacy three-arg call is the cuCtxCreate_v2 symbol; the unsuffixed cuCtxCreate macro now expands to cuCtxCreate_v4. Wrap the call so CUDA 13+ supplies nullptr for the new params slot while earlier toolkits keep the three-arg form.
- The VKFFT_BACKEND=1 CMake branch was a bare "# pass", so the CUDA Toolkit's include directory was never added to the module's include path and vkFFT.h's "#include <nvrtc.h>" failed unless the consumer manually exported CPATH. Pull in find_package(CUDAToolkit) and append CUDAToolkit_INCLUDE_DIRS to VkFFTBackend_SYSTEM_INCLUDE_DIRS.

Verified on Ubuntu 24.04 with CUDA 13.2 toolkit + driver 595.71.05: backend=1 builds cleanly and all 15 VkFFTBackend ctests pass on the local NVIDIA GPU.

Adds the oneAPI Level Zero (L0) backend via VkFFT's VKFFT_BACKEND=4 code path, exposed through the standard VKFFT_BACKEND cache option. On platforms with an Intel L0 GPU runtime (Intel iGPUs and dGPUs on Linux/Windows) consumers can now target Level Zero directly without going through OpenCL or SYCL.

CMakeLists.txt / itk-module-init.cmake
- Document and accept VKFFT_BACKEND=4 (Level Zero).
- find_path(level_zero/ze_api.h) and find_library(ze_loader); fail with a clear FATAL_ERROR if either is missing. LEVEL_ZERO_ROOT and CMPLR_ROOT env vars are honored as HINTS so oneAPI installs work out of the box.
- target_link_libraries(VkFFTBackend PUBLIC ${LevelZero_LIBRARY}) and a SYSTEM include directive so downstream consumers do not need to re-resolve the loader.

itkVkDefinitions.h
- Add LEVEL_ZERO=4 alongside the existing aliases.

itkVkCommon.h
- Pre-include <level_zero/ze_api.h> under VKFFT_BACKEND==LEVEL_ZERO so that VkFFT's own #include <ze_api.h> resolves consistently.
- Extend VkGPU with ze_driver_handle_t / ze_device_handle_t / ze_context_handle_t / ze_command_queue_handle_t / commandQueueID members and the matching operator!= branch.

itkVkCommon.cxx
- ConfigureBackend: zeInit -> enumerate drivers and devices, pick by device_id, create a ze_context, locate a queue group that supports both COMPUTE and COPY flags, and create a default ze_command_queue. Wire VkFFTConfiguration's L0 fields (device / context / commandQueue as pointer-to-handle, commandQueueID by value).
- PerformFFT: allocate device buffers with zeMemAllocDevice; copy host->device via an immediate command list (zeCommandListCreateImmediate + zeCommandListAppendMemoryCopy + zeCommandQueueSynchronize); launch via a regular command list, zeCommandListClose, zeCommandQueueExecuteCommandLists, and zeCommandQueueSynchronize; copy device->host with another immediate command list; free with zeMemFree using the same C2C / inverse-R2H aliasing rules as the OpenCL / Metal paths.
- ReleaseBackend: zeCommandQueueDestroy + zeContextDestroy.

Validated on Linux x86_64 with the Level Zero loader v1.28.6 built from oneapi-src/level-zero: backend=4 configures, compiles, and links cleanly. Runtime device enumeration fails on the test host as expected because no Intel L0 driver (intel-level-zero-gpu) is installed; on a system with the Intel GPU runtime the same binary will discover the device. Backends 1 (CUDA), 3 (OpenCL), and 5 (Metal) builds are unchanged.
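The queue-group step in ConfigureBackend can be illustrated with a minimal sketch. It is not the code from this PR: the function name and the two flag constants are hypothetical stand-ins (the constants are assumed to mirror ZE_COMMAND_QUEUE_GROUP_PROPERTY_FLAG_COMPUTE and ZE_COMMAND_QUEUE_GROUP_PROPERTY_FLAG_COPY from ze_api.h), and the flags vector stands in for what zeDeviceGetCommandQueueGroupProperties would report, so the sketch builds without the Level Zero SDK.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Assumption: stand-ins for the COMPUTE and COPY queue-group flag bits
// declared in ze_api.h; real code would use the header's enumerators.
constexpr std::uint32_t kQueueGroupCompute = 1u << 0;
constexpr std::uint32_t kQueueGroupCopy = 1u << 1;

// Hypothetical helper: return the ordinal of the first queue group whose
// flags contain both COMPUTE and COPY, or std::nullopt if none qualifies.
// groupFlags[i] holds the flags reported for queue group ordinal i.
inline std::optional<std::uint32_t>
findComputeCopyQueueGroup(const std::vector<std::uint32_t> & groupFlags)
{
  constexpr std::uint32_t required = kQueueGroupCompute | kQueueGroupCopy;
  for (std::uint32_t ordinal = 0; ordinal < groupFlags.size(); ++ordinal)
  {
    if ((groupFlags[ordinal] & required) == required)
    {
      return ordinal;
    }
  }
  return std::nullopt;
}
```

Under these assumptions, the returned ordinal is the value that would be stored as commandQueueID and used when creating the command queue; an empty result would be treated as a configuration failure.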

Refactor the VkFFTBackend test suite so every logical test runs in both single (float) and double precision, exposing per-precision regressions and giving downstream consumers (ANTs, BRAINSTools) confidence that either pixel type works end-to-end through the FFT path.

Approach
--------
Each .cxx test source is templated on a PrecisionType parameter and its former main() body is hoisted into runFooTest<PrecisionType>(). The test driver entry function inspects argv[1] for "float" or "double" and dispatches accordingly:

    template <typename PrecisionType>
    int runFooTest(...) { /* body, using PrecisionType */ }

    int itkVkFooTest(int argc, char * argv[])
    {
      const std::string precision{ (argc > 1) ? argv[1] : "float" };
      if (precision == "double") return runFooTest<double>(...);
      if (precision == "float") return runFooTest<float>(...);
      std::cerr << "Unknown precision '" << precision << "'.\n";
      return EXIT_FAILURE;
    }

13 test sources are converted in this pattern: the two factory tests, GlobalConfigurationTest, the three round-trip tests (ForwardInverseFFT, ForwardInverse1DFFT, HalfHermitianFFT), the two ComplexToComplex tests, the three baseline-comparison tests (ComplexToComplex1DFFT/Forward1DFFT/Inverse1DFFT BaselineTest), the MultiResolutionPyramid test, and DiscreteGaussianImageFilterTest.

CMake registration
------------------
test/CMakeLists.txt is rewritten to register paired itk_add_test(NAME ...Float) and itk_add_test(NAME ...Double) entries that pass the precision string as the first test argument.

A helper _vkfft_disable_on_unsupported_fp64(<test_name>) marks each Double variant as DISABLED on Apple-arm64, where the GPU has no FP64 hardware and VkFFT's shader compile would fail with VKFFT_ERROR_FAILED_TO_COMPILE_PROGRAM (4031). Apple-arm64 detection uses APPLE AND CMAKE_SYSTEM_PROCESSOR=arm64. Linux/x86_64 and Windows CI continue to exercise the Double variants in full.

Factory-only tests (FFTImageFilterFactoryTest, MultiResolutionPyramidImageFilterFactoryTest, GlobalConfigurationTest) do not launch a GPU kernel and therefore run unguarded on every platform, exercising the C++ template instantiation surface for both precisions even where FP64 GPU execution is unavailable.

Per-precision baselines
-----------------------
Where --compare references a baseline image, the Float and Double ctest entries write to precision-suffixed output paths so the same filename does not collide:

    *TestOutputFloat.mha
    *TestOutputDouble.mha

For Float variants whose baseline does not yet exist in ITKData (ComplexToComplexFFTImageFilterTestFloat and ComplexToComplex1DFFTImageFilterSizesTestFloat), --compare is intentionally omitted with a TODO so the test still exercises the FFT pipeline via internal assertions. Once float baselines are uploaded to ITKData the --compare DATA{...Float.mha} lines can be re-added.

Local validation (Apple Silicon, macOS 26.5, M-series GPU)
----------------------------------------------------------
Both backends pass identically on this hardware:

    VKFFT_BACKEND=5 (Metal):  19/19 enabled tests pass, 11 disabled
    VKFFT_BACKEND=3 (OpenCL): 19/19 enabled tests pass, 11 disabled

The matching pass/skip signature across the two backends is strong evidence the wrapper layer is forwarding correctly to VkFFT's per-backend kernels.
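For readers who want to try the dispatch pattern outside the test driver, here is a self-contained sketch. The names runExampleTest and exampleTestMain are hypothetical, and the templated body is only a placeholder arithmetic check with a precision-dependent tolerance; it does not run an FFT or reproduce any test from this PR.

```cpp
#include <cmath>
#include <cstdlib>
#include <iostream>
#include <limits>
#include <string>

// Hypothetical templated test body: checks that sqrt(2) squared returns to 2
// within a few units of machine epsilon for the selected precision.
template <typename PrecisionType>
int
runExampleTest()
{
  const PrecisionType value = std::sqrt(PrecisionType(2));
  const PrecisionType error = std::abs(value * value - PrecisionType(2));
  const PrecisionType tolerance = PrecisionType(4) * std::numeric_limits<PrecisionType>::epsilon();
  return (error <= tolerance) ? EXIT_SUCCESS : EXIT_FAILURE;
}

// Hypothetical entry point: argv[1] selects the precision and defaults to
// "float" when absent; any other string is rejected.
inline int
exampleTestMain(int argc, char * argv[])
{
  const std::string precision{ (argc > 1) ? argv[1] : "float" };
  if (precision == "double")
  {
    return runExampleTest<double>();
  }
  if (precision == "float")
  {
    return runExampleTest<float>();
  }
  std::cerr << "Unknown precision '" << precision << "'.\n";
  return EXIT_FAILURE;
}
```

Keeping the precision as a runtime string lets one compiled test source back two ctest entries, which is what the paired Float/Double registration relies on.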

clCreateContext was called with properties=NULL. With multiple ICDs visible to the loader (e.g. NVIDIA CUDA's OpenCL + Intel CPU OpenCL on Linux, or Apple's two platforms on macOS) the loader binds the returned context to an implementation-defined platform, which may not match the platform that m_VkGPU.device was obtained from. The code compiles and clCreateContext itself returns CL_SUCCESS, but the next clCreateBuffer dispatches through the wrong ICD and fails with CL_INVALID_CONTEXT (-34), which VkFFT surfaces as VKFFT_ERROR_FAILED_TO_INITIALIZE (4001).

Pass an explicit CL_CONTEXT_PLATFORM property so the loader binds the context to the platform we already selected. Single-ICD hosts are unaffected.

Tested on Linux x86_64 with NVIDIA's OpenCL ICD + Intel CPU OpenCL ICD both visible to libOpenCL.so.1 (typical CUDA-toolkit install): 30 / 30 VkFFTBackend ctests now pass on backend=3, up from 14 / 30. CUDA (1) and Level Zero (4) builds are unchanged.
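As a rough illustration of the fix, the sketch below builds the zero-terminated property list that would replace the NULL first argument of clCreateContext. It is not the code from this PR: ContextProperty, kContextPlatform, and makePlatformProperties are hypothetical stand-ins (assumed to mirror cl_context_properties and CL_CONTEXT_PLATFORM from CL/cl.h) so the example compiles without the OpenCL SDK.

```cpp
#include <array>
#include <cstdint>

// Assumption: stand-ins for the OpenCL typedef and constant. Real code would
// include <CL/cl.h> and use cl_context_properties and CL_CONTEXT_PLATFORM.
using ContextProperty = std::intptr_t;
constexpr ContextProperty kContextPlatform = 0x1084;

// Hypothetical helper: build the {key, value, 0} property list that names
// the platform explicitly. 'platform' stands in for the cl_platform_id that
// the selected device was enumerated from.
inline std::array<ContextProperty, 3>
makePlatformProperties(const void * platform)
{
  return { kContextPlatform, reinterpret_cast<ContextProperty>(platform), 0 };
}

// In real code the list's data() pointer would be passed as the first
// argument of clCreateContext in place of NULL.
```

The list is terminated with 0 because OpenCL reads property lists as key/value pairs until it reaches a zero key.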

… is present

The Level Zero loader (libze_loader) can be installed via apt/dnf without any underlying GPU runtime (Intel iGPU/dGPU). On hosts with no Intel L0 driver (CPU-only servers, NVIDIA-only workstations, virtualised environments), every FFT-touching test fails at zeInit with ZE_RESULT_ERROR_UNINITIALIZED (0x78000001), producing 22 spurious red checks per CDash submission.

Add a configure-time try_run probe that calls zeInit + zeDriverGet against the loader we already located. The probe runs only when BUILD_TESTING is ON. If zero drivers come back, export VkFFTBackend_LEVEL_ZERO_RUNTIME_AVAILABLE=FALSE; the test/CMakeLists.txt guard then marks every FFT-exercising ctest entry DISABLED. The lightweight KWStyle / factory / global-configuration / instantiation tests are kept enabled because they execute no GPU code.

Hosts with a real Intel L0 driver are unaffected: the probe succeeds, the flag is TRUE, and every test runs as before. Backends 1/2/3/5 are also unaffected; the guard only fires for VKFFT_BACKEND==4.

Tested on cortex (Ice Lake Xeon + 2x NVIDIA RTX 6000 Ada, no Intel iGPU): backend=4 now reports 8 / 8 passed (KWStyle + factory + configuration + instantiation) with 22 disabled, instead of 8 / 30 with 22 failed.

@latest

….07.22

Update the build-test-package workflow so its pinned versions match the underlying ITK 5.4.6 release that PR #73 targets:

- itk-git-tag, itk-wheel-tag v5.3.0 -> v5.4.6
- opencl-icd-loader-git-tag v2021.04.29 -> v2025.07.22
- opencl-headers-git-tag v2021.04.29 -> v2025.07.22
- macos runner macos-14 -> macos-15
- actions/download-artifact v2 -> v4 (required for upload-artifact@v4)
- lukka/get-cmake v3.22.2 -> @latest

Restrict Python wheel matrices to the supported interpreter set:

- Linux: ["37","38","39","310","311"] -> ["310","311"]
- Windows: ["9","10","11"] -> ["10","11"]

Drops Python 3.7-3.9 wheel production; aligns with the project's pyproject.toml requires-python >= 3.10 (3.9 is kept only as the C++ build job's pip-for-ninja interpreter, which is incidental).

itk-python-package-tag is left at the existing pinned commit; bumping it is a separate change that requires verifying the wheel-build scripts against ITK 5.4.6 first.

The "Install pocl" step is intentionally unchanged: the conda-forge pocl path was working previously and is preferred for the C++ OpenCL test job.

@latest

….07.22 Update the build-test-package workflow so its pinned versions match the underlying ITK 5.4.6 release that PR #73 targets: - itk-git-tag, itk-wheel-tag v5.3.0 -> v5.4.6 - opencl-icd-loader-git-tag v2021.04.29 -> v2025.07.22 - opencl-headers-git-tag v2021.04.29 -> v2025.07.22 - macos runner macos-14 -> macos-15 - actions/download-artifact v2 -> v4 (required for upload-artifact@v4) - lukka/get-cmake v3.22.2 -> @latest Restrict Python wheel matrices to the supported interpreter set: - Linux: ["37","38","39","310","311"] -> ["310","311"] - Windows: ["9","10","11"] -> ["10","11"] Drops Python 3.7-3.9 wheel production; aligns with the project's pyproject.toml requires-python >= 3.10 (3.9 is kept only as the C++ build job's pip-for-ninja interpreter, which is incidental). itk-python-package-tag is left at the existing pinned commit; bumping it is a separate change that requires verifying the wheel-build scripts against ITK 5.4.6 first. The "Install pocl" step is intentionally unchanged — the conda-forge pocl path was working previously and is preferred for the C++ OpenCL test job.

@latest

….07.22 Update the build-test-package workflow so its pinned versions match the underlying ITK 5.4.6 release that PR #73 targets: - itk-git-tag, itk-wheel-tag v5.3.0 -> v5.4.6 - opencl-icd-loader-git-tag v2021.04.29 -> v2025.07.22 - opencl-headers-git-tag v2021.04.29 -> v2025.07.22 - macos runner macos-14 -> macos-15 - actions/download-artifact v2 -> v4 (required for upload-artifact@v4) - lukka/get-cmake v3.22.2 -> @latest Restrict Python wheel matrices to the supported interpreter set: - Linux: ["37","38","39","310","311"] -> ["310","311"] - Windows: ["9","10","11"] -> ["10","11"] Drops Python 3.7-3.9 wheel production; aligns with the project's pyproject.toml requires-python >= 3.10 (3.9 is kept only as the C++ build job's pip-for-ninja interpreter, which is incidental). itk-python-package-tag is left at the existing pinned commit; bumping it is a separate change that requires verifying the wheel-build scripts against ITK 5.4.6 first. The "Install pocl" step is intentionally unchanged — the conda-forge pocl path was working previously and is preferred for the C++ OpenCL test job.

@latest

….07.22 Update the build-test-package workflow so its pinned versions match the underlying ITK 5.4.6 release that PR #73 targets: - itk-git-tag, itk-wheel-tag v5.3.0 -> v5.4.6 - opencl-icd-loader-git-tag v2021.04.29 -> v2025.07.22 - opencl-headers-git-tag v2021.04.29 -> v2025.07.22 - macos runner macos-14 -> macos-15 - actions/download-artifact v2 -> v4 (required for upload-artifact@v4) - lukka/get-cmake v3.22.2 -> @latest Restrict Python wheel matrices to the supported interpreter set: - Linux: ["37","38","39","310","311"] -> ["310","311"] - Windows: ["9","10","11"] -> ["10","11"] Drops Python 3.7-3.9 wheel production; aligns with the project's pyproject.toml requires-python >= 3.10 (3.9 is kept only as the C++ build job's pip-for-ninja interpreter, which is incidental). itk-python-package-tag is left at the existing pinned commit; bumping it is a separate change that requires verifying the wheel-build scripts against ITK 5.4.6 first. The "Install pocl" step is intentionally unchanged — the conda-forge pocl path was working previously and is preferred for the C++ OpenCL test job.

@latest

….07.22 Update the build-test-package workflow so its pinned versions match the underlying ITK 5.4.6 release that PR #73 targets: - itk-git-tag, itk-wheel-tag v5.3.0 -> v5.4.6 - opencl-icd-loader-git-tag v2021.04.29 -> v2025.07.22 - opencl-headers-git-tag v2021.04.29 -> v2025.07.22 - macos runner macos-14 -> macos-15 - actions/download-artifact v2 -> v4 (required for upload-artifact@v4) - lukka/get-cmake v3.22.2 -> @latest Restrict Python wheel matrices to the supported interpreter set: - Linux: ["37","38","39","310","311"] -> ["310","311"] - Windows: ["9","10","11"] -> ["10","11"] Drops Python 3.7-3.9 wheel production; aligns with the project's pyproject.toml requires-python >= 3.10 (3.9 is kept only as the C++ build job's pip-for-ninja interpreter, which is incidental). itk-python-package-tag is left at the existing pinned commit; bumping it is a separate change that requires verifying the wheel-build scripts against ITK 5.4.6 first. The "Install pocl" step is intentionally unchanged — the conda-forge pocl path was working previously and is preferred for the C++ OpenCL test job.

@latest

….07.22 Update the build-test-package workflow so its pinned versions match the underlying ITK 5.4.6 release that PR #73 targets: - itk-git-tag, itk-wheel-tag v5.3.0 -> v5.4.6 - opencl-icd-loader-git-tag v2021.04.29 -> v2025.07.22 - opencl-headers-git-tag v2021.04.29 -> v2025.07.22 - macos runner macos-14 -> macos-15 - actions/download-artifact v2 -> v4 (required for upload-artifact@v4) - lukka/get-cmake v3.22.2 -> @latest Restrict Python wheel matrices to the supported interpreter set: - Linux: ["37","38","39","310","311"] -> ["310","311"] - Windows: ["9","10","11"] -> ["10","11"] Drops Python 3.7-3.9 wheel production; aligns with the project's pyproject.toml requires-python >= 3.10 (3.9 is kept only as the C++ build job's pip-for-ninja interpreter, which is incidental). itk-python-package-tag is left at the existing pinned commit; bumping it is a separate change that requires verifying the wheel-build scripts against ITK 5.4.6 first. The "Install pocl" step is intentionally unchanged — the conda-forge pocl path was working previously and is preferred for the C++ OpenCL test job.

@latest

….07.22 Update the build-test-package workflow so its pinned versions match the underlying ITK 5.4.6 release that PR #73 targets: - itk-git-tag, itk-wheel-tag v5.3.0 -> v5.4.6 - opencl-icd-loader-git-tag v2021.04.29 -> v2025.07.22 - opencl-headers-git-tag v2021.04.29 -> v2025.07.22 - macos runner macos-14 -> macos-15 - actions/download-artifact v2 -> v4 (required for upload-artifact@v4) - lukka/get-cmake v3.22.2 -> @latest Restrict Python wheel matrices to the supported interpreter set: - Linux: ["37","38","39","310","311"] -> ["310","311"] - Windows: ["9","10","11"] -> ["10","11"] Drops Python 3.7-3.9 wheel production; aligns with the project's pyproject.toml requires-python >= 3.10 (3.9 is kept only as the C++ build job's pip-for-ninja interpreter, which is incidental). itk-python-package-tag is left at the existing pinned commit; bumping it is a separate change that requires verifying the wheel-build scripts against ITK 5.4.6 first. The "Install pocl" step is intentionally unchanged — the conda-forge pocl path was working previously and is preferred for the C++ OpenCL test job.

@latest

….07.22 Update the build-test-package workflow so its pinned versions match the underlying ITK 5.4.6 release that PR #73 targets: - itk-git-tag, itk-wheel-tag v5.3.0 -> v5.4.6 - opencl-icd-loader-git-tag v2021.04.29 -> v2025.07.22 - opencl-headers-git-tag v2021.04.29 -> v2025.07.22 - macos runner macos-14 -> macos-15 - actions/download-artifact v2 -> v4 (required for upload-artifact@v4) - lukka/get-cmake v3.22.2 -> @latest Restrict Python wheel matrices to the supported interpreter set: - Linux: ["37","38","39","310","311"] -> ["310","311"] - Windows: ["9","10","11"] -> ["10","11"] Drops Python 3.7-3.9 wheel production; aligns with the project's pyproject.toml requires-python >= 3.10 (3.9 is kept only as the C++ build job's pip-for-ninja interpreter, which is incidental). itk-python-package-tag is left at the existing pinned commit; bumping it is a separate change that requires verifying the wheel-build scripts against ITK 5.4.6 first. The "Install pocl" step is intentionally unchanged — the conda-forge pocl path was working previously and is preferred for the C++ OpenCL test job.

@latest

….07.22 Update the build-test-package workflow so its pinned versions match the underlying ITK 5.4.6 release that PR #73 targets: - itk-git-tag, itk-wheel-tag v5.3.0 -> v5.4.6 - opencl-icd-loader-git-tag v2021.04.29 -> v2025.07.22 - opencl-headers-git-tag v2021.04.29 -> v2025.07.22 - macos runner macos-14 -> macos-15 - actions/download-artifact v2 -> v4 (required for upload-artifact@v4) - lukka/get-cmake v3.22.2 -> @latest Restrict Python wheel matrices to the supported interpreter set: - Linux: ["37","38","39","310","311"] -> ["310","311"] - Windows: ["9","10","11"] -> ["10","11"] Drops Python 3.7-3.9 wheel production; aligns with the project's pyproject.toml requires-python >= 3.10 (3.9 is kept only as the C++ build job's pip-for-ninja interpreter, which is incidental). itk-python-package-tag is left at the existing pinned commit; bumping it is a separate change that requires verifying the wheel-build scripts against ITK 5.4.6 first. The "Install pocl" step is intentionally unchanged — the conda-forge pocl path was working previously and is preferred for the C++ OpenCL test job.

@latest

….07.22 Update the build-test-package workflow so its pinned versions match the underlying ITK 5.4.6 release that PR #73 targets: - itk-git-tag, itk-wheel-tag v5.3.0 -> v5.4.6 - opencl-icd-loader-git-tag v2021.04.29 -> v2025.07.22 - opencl-headers-git-tag v2021.04.29 -> v2025.07.22 - macos runner macos-14 -> macos-15 - actions/download-artifact v2 -> v4 (required for upload-artifact@v4) - lukka/get-cmake v3.22.2 -> @latest Restrict Python wheel matrices to the supported interpreter set: - Linux: ["37","38","39","310","311"] -> ["310","311"] - Windows: ["9","10","11"] -> ["10","11"] Drops Python 3.7-3.9 wheel production; aligns with the project's pyproject.toml requires-python >= 3.10 (3.9 is kept only as the C++ build job's pip-for-ninja interpreter, which is incidental). itk-python-package-tag is left at the existing pinned commit; bumping it is a separate change that requires verifying the wheel-build scripts against ITK 5.4.6 first. The "Install pocl" step is intentionally unchanged — the conda-forge pocl path was working previously and is preferred for the C++ OpenCL test job.

@latest

….07.22 Update the build-test-package workflow so its pinned versions match the underlying ITK 5.4.6 release that PR #73 targets: - itk-git-tag, itk-wheel-tag v5.3.0 -> v5.4.6 - opencl-icd-loader-git-tag v2021.04.29 -> v2025.07.22 - opencl-headers-git-tag v2021.04.29 -> v2025.07.22 - macos runner macos-14 -> macos-15 - actions/download-artifact v2 -> v4 (required for upload-artifact@v4) - lukka/get-cmake v3.22.2 -> @latest Restrict Python wheel matrices to the supported interpreter set: - Linux: ["37","38","39","310","311"] -> ["310","311"] - Windows: ["9","10","11"] -> ["10","11"] Drops Python 3.7-3.9 wheel production; aligns with the project's pyproject.toml requires-python >= 3.10 (3.9 is kept only as the C++ build job's pip-for-ninja interpreter, which is incidental). itk-python-package-tag is left at the existing pinned commit; bumping it is a separate change that requires verifying the wheel-build scripts against ITK 5.4.6 first. The "Install pocl" step is intentionally unchanged — the conda-forge pocl path was working previously and is preferred for the C++ OpenCL test job.

@latest

….07.22 Update the build-test-package workflow so its pinned versions match the underlying ITK 5.4.6 release that PR #73 targets: - itk-git-tag, itk-wheel-tag v5.3.0 -> v5.4.6 - opencl-icd-loader-git-tag v2021.04.29 -> v2025.07.22 - opencl-headers-git-tag v2021.04.29 -> v2025.07.22 - macos runner macos-14 -> macos-15 - actions/download-artifact v2 -> v4 (required for upload-artifact@v4) - lukka/get-cmake v3.22.2 -> @latest Restrict Python wheel matrices to the supported interpreter set: - Linux: ["37","38","39","310","311"] -> ["310","311"] - Windows: ["9","10","11"] -> ["10","11"] Drops Python 3.7-3.9 wheel production; aligns with the project's pyproject.toml requires-python >= 3.10 (3.9 is kept only as the C++ build job's pip-for-ninja interpreter, which is incidental). itk-python-package-tag is left at the existing pinned commit; bumping it is a separate change that requires verifying the wheel-build scripts against ITK 5.4.6 first. The "Install pocl" step is intentionally unchanged — the conda-forge pocl path was working previously and is preferred for the C++ OpenCL test job.

@latest

….07.22 Update the build-test-package workflow so its pinned versions match the underlying ITK 5.4.6 release that PR #73 targets: - itk-git-tag, itk-wheel-tag v5.3.0 -> v5.4.6 - opencl-icd-loader-git-tag v2021.04.29 -> v2025.07.22 - opencl-headers-git-tag v2021.04.29 -> v2025.07.22 - macos runner macos-14 -> macos-15 - actions/download-artifact v2 -> v4 (required for upload-artifact@v4) - lukka/get-cmake v3.22.2 -> @latest Restrict Python wheel matrices to the supported interpreter set: - Linux: ["37","38","39","310","311"] -> ["310","311"] - Windows: ["9","10","11"] -> ["10","11"] Drops Python 3.7-3.9 wheel production; aligns with the project's pyproject.toml requires-python >= 3.10 (3.9 is kept only as the C++ build job's pip-for-ninja interpreter, which is incidental). itk-python-package-tag is left at the existing pinned commit; bumping it is a separate change that requires verifying the wheel-build scripts against ITK 5.4.6 first. The "Install pocl" step is intentionally unchanged — the conda-forge pocl path was working previously and is preferred for the C++ OpenCL test job.

hjmjohnson · 2026-05-22T17:27:46Z

PR #74 (stacked on this branch) modernizes the CI environment and makes the hosted runners exercise real FFTs wherever the hardware/runtime allows. Previously, hosted CI only compiled the module and ran the lint smoke tests.

Environmental tool updates

- ITK v5.4.6, runner matrix ubuntu-24.04 / macos-15 / windows-2022
- OpenCL ICD loader + headers v2025.07.22, CL_TARGET_OPENCL_VERSION=300
- Python 3.9 setup, current actions/* (checkout@v5, setup-python@v5, lukka/get-cmake@latest, upload/download-artifact@v4)
- ITK built with only the module's declared dependency set (ITK_BUILD_DEFAULT_MODULES=OFF) to keep each leg cheap
- Self-hosted GPU/notebook workflows gated behind workflow_dispatch / a gpu-ci label so they no longer queue indefinitely (see CI: Self-hosted GPU runners needed to fully test VkFFT backends #75)
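
As a rough illustration of the gating described in the last item, the decision can be written as a small predicate. This is a sketch under assumptions: the real gate lives in the workflow YAML, which is not shown here, and the function name `should_run_gpu_job` and the use of the `pull_request` event name are hypothetical.

```python
# Hypothetical predicate mirroring the described gate: self-hosted GPU jobs
# run only on a manual dispatch or when the pull request carries "gpu-ci".

def should_run_gpu_job(event_name, pr_labels):
    """Return True when a self-hosted GPU job should be queued."""
    if event_name == "workflow_dispatch":
        # Manually triggered runs always go ahead.
        return True
    # Assumption: label-based opt-in applies to pull_request events only.
    return event_name == "pull_request" and "gpu-ci" in pr_labels


print(should_run_gpu_job("workflow_dispatch", []))               # True
print(should_run_gpu_job("pull_request", ["gpu-ci"]))            # True
print(should_run_gpu_job("pull_request", ["type:Enhancement"]))  # False
print(should_run_gpu_job("push", ["gpu-ci"]))                    # False
```

With a gate of this shape, an ordinary push or unlabeled pull request never waits on a self-hosted runner that may not exist.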

Run tests where possible (functional FFT coverage on hosted CI)

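+----------------------------+------------------------------------------------------------------------+
| Leg                        | Coverage                                                               |
+----------------------------+------------------------------------------------------------------------+
| macos-15 Metal             | Builds with `BUILD_TESTING=ON` and runs the FFT tests on the Apple     |
|                            | Silicon GPU (real on-hardware FFT correctness)                         |
+----------------------------+------------------------------------------------------------------------+
| ubuntu-24.04 OpenCL (pocl) | Runs a pocl-safe FFT round-trip subset (radix-2/3/5/7 + Bluestein      |
|                            | primes 11/13, capped at size 16); pocl mis-computes the size-19        |
|                            | Bluestein inverse, so larger/baseline cases stay on real GPUs          |
+----------------------------+------------------------------------------------------------------------+
| CUDA / Level Zero          | Compile + link coverage on hosted runners (no GPU needed to build)     |
+----------------------------+------------------------------------------------------------------------+
| macOS/Windows OpenCL       | Lint smoke tests (no verified pocl device)                             |
+----------------------------+------------------------------------------------------------------------+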
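
For readers unfamiliar with what a "round-trip subset" checks, here is a minimal reference sketch in pure Python. It does not call VkFFT, ITK, or pocl; it uses a naive O(n^2) DFT only to show the shape of the check (forward transform, inverse transform, compare with the input) over sizes whose prime factors are limited to 2, 3, 5, 7, 11 and 13 and that are capped at 16. The names `dft`, `round_trip_error` and `is_subset_size` are hypothetical, and the exact sizes and tolerances used by the tests in PR #74 are not shown in this thread.

```python
import cmath

ALLOWED_PRIMES = (2, 3, 5, 7, 11, 13)  # radix-2/3/5/7 plus Bluestein primes 11/13
MAX_SIZE = 16                          # cap quoted in the table above


def dft(values, inverse=False):
    """Naive discrete Fourier transform (reference only, O(n^2))."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for j, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * cmath.pi * j * k / n)
        # The inverse transform carries the 1/n normalization.
        out.append(acc / n if inverse else acc)
    return out


def round_trip_error(n):
    """Largest absolute difference between a signal and inverse(forward(signal))."""
    signal = [complex(i % 5 - 2, (3 * i) % 7 - 3) for i in range(n)]
    recovered = dft(dft(signal), inverse=True)
    return max(abs(a - b) for a, b in zip(signal, recovered))


def is_subset_size(n):
    """True if 2 <= n <= MAX_SIZE and n factors entirely into ALLOWED_PRIMES."""
    if n < 2 or n > MAX_SIZE:
        return False
    for p in ALLOWED_PRIMES:
        while n % p == 0:
            n //= p
    return n == 1


sizes = [n for n in range(2, 20) if is_subset_size(n)]
worst = max(round_trip_error(n) for n in sizes)
print(sizes)
print(worst < 1e-9)
```

Under these assumptions the size filter admits every length from 2 to 16 and excludes 17 and 19, which is consistent with the size-19 Bluestein case being left to real GPUs.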

Full-size and baseline FFT correctness continue to run on the self-hosted GPU runner. Several genuine module fixes fell out of this (Level Zero loader/header discovery, the <ze_api.h> include path, OpenCL device-less-platform handling).

hjmjohnson added 4 commits May 20, 2026 11:42

hjmjohnson changed the title ~~ENH: Add VKFFT_BACKEND=4 (Level Zero) and =5 (Metal) backends; fix CUDA 13~~ ENH: Add VKFFT_BACKEND=5 (Metal), =4 (Level Zero), CUDA-13 fix, and float+double test coverage May 20, 2026

hjmjohnson added 2 commits May 20, 2026 15:52

hjmjohnson changed the title ~~ENH: Add VKFFT_BACKEND=5 (Metal), =4 (Level Zero), CUDA-13 fix, and float+double test coverage~~ ENH: Add VKFFT_BACKEND=4 (Level Zero) and =5 (Metal) backends; CUDA 13 + multi-ICD OpenCL fixes May 20, 2026

hjmjohnson requested a review from thewtex May 20, 2026 21:06

hjmjohnson marked this pull request as ready for review May 20, 2026 21:08

This was referenced May 20, 2026

ENH: Bring CI workflow current — ITK 5.4.6, macos-15, OpenCL ICD 2025.07.22 hjmjohnson/ITKVkFFTBackend#1

Closed

ENH: Bring CI workflow current — ITK 5.4.6, macos-15, OpenCL ICD 2025.07.22 #74

Open

hjmjohnson mentioned this pull request May 22, 2026

CI: Self-hosted GPU runners needed to fully test VkFFT backends #75

Open

hjmjohnson requested a review from Leengit May 22, 2026 17:52

hjmjohnson wants to merge 6 commits into InsightSoftwareConsortium:main from hjmjohnson:enh/vkfft-backend-5-metal