ENH: Add VKFFT_BACKEND=4 (Level Zero) and =5 (Metal) backends; CUDA 13 + multi-ICD OpenCL fixes #73
Open
hjmjohnson wants to merge 6 commits into
Adds native Apple Metal backend support via VkFFT's VKFFT_BACKEND=5
code path, exposed through the standard VKFFT_BACKEND cache option.
On Apple platforms, consumers (ANTs, BRAINSTools, Slicer) can now
target Metal directly instead of going through the deprecated
OpenCL-on-Metal translation layer.
CMakeLists.txt
- Document and accept VKFFT_BACKEND=5 (Metal).
- Bump VkFFT pin to v1.3.4 (first release whose Metal codegen
compiles cleanly on macOS arm64).
- Flip BUILD_VKFFT default to ON because v1.3+ split vkFFT.h into
a multi-file tree under vkFFT/vkFFT_Structs/ and vkFFT/vkFFT_CodeGen/
that the single-header URL download cannot provide.
- On VKFFT_BACKEND=5: require APPLE, bump CMAKE_CXX_STANDARD to 17
(metal-cpp requirement), FetchContent metal-cpp from the
bkaradzic mirror at metal-cpp_macOS15_iOS18, and link the
Metal/Foundation/QuartzCore system frameworks.
itkVkDefinitions.h
- Add METAL=5 alongside the existing VULKAN/CUDA/HIP/OPENCL aliases.
itkVkCommon.h
- Pre-include metal-cpp's Foundation/Metal/QuartzCore headers under
VKFFT_BACKEND==METAL so vkFFT.h's internal *_PRIVATE_IMPLEMENTATION
re-include is a header-guard no-op in every translation unit that
pulls in itkVkCommon.h. Without this, metal-cpp's selector storage
(CA::Private::Selector::s_k* etc.) gets emitted in every TU and
the link step fails with duplicate-symbol errors.
- Extend VkGPU with MTL::Device* and MTL::CommandQueue* members
and the matching operator!= branch.
itkVkCommon.cxx
- Define NS_/MTL_/CA_PRIVATE_IMPLEMENTATION exactly once (this TU),
before including the metal-cpp headers, so the storage symbols
are emitted exactly once.
- ConfigureBackend: enumerate Metal devices via
MTL::CopyAllDevices(), pick by device_id, retain the device,
and create a command queue.
- Wire VkFFTConfiguration's Metal fields. Unlike CUDA/OpenCL, which
take pointer-to-pointer (void**, cl_device_id*), Metal expects
single pointers (MTL::Device*, MTL::CommandQueue*), so the device
assignment is moved inside the per-backend branch.
- PerformFFT: allocate MTL::Buffer in MTL::ResourceStorageModeShared
(Apple unified-memory shared storage), memcpy host->device into
buffer->contents(), drive VkFFTAppend via a commandBuffer +
computeCommandEncoder pair, waitUntilCompleted, memcpy back, and
release the buffers.
- ReleaseBackend: release the queue and device.
Tested on macOS 26.5 / Apple Silicon: VKFFT_BACKEND=5 builds cleanly,
otool -L confirms Metal/Foundation/QuartzCore linkage on the resulting
dylib, and 4 VkFFT tests pass on Metal (the FFT round-trip tests
itkVkForwardInverseFFTImageFilterTest, itkVkForwardInverse1DFFTImageFilterTest,
and itkVkHalfHermitianFFTImageFilterTest, plus itkVkGlobalConfigurationTest).
Double-precision tests still fail because the Metal Shading Language does
not support FP64 on Apple Silicon (program_source: incomplete type
'double2' reserved name); consumers must select PrecisionEnum::FLOAT on
Metal.
VKFFT_BACKEND=3 (OpenCL) build remains green.
Two pre-existing breakages prevented the CUDA backend from building against CUDA >= 13.0:

- cuCtxCreate gained a fourth argument (CUctxCreateParams*) in CUDA 13.0.
The legacy three-argument call is the cuCtxCreate_v3 symbol; the unsuffixed
cuCtxCreate macro now expands to cuCtxCreate_v4. Wrap the call so CUDA 13+
supplies nullptr for the new params slot while earlier toolkits keep the
three-argument form.
- The VKFFT_BACKEND=1 CMake branch was a bare "# pass", so the CUDA Toolkit's
include directory was never added to the module's include path, and
vkFFT.h's "#include <nvrtc.h>" failed unless the consumer manually exported
CPATH. Pull in find_package(CUDAToolkit) and append CUDAToolkit_INCLUDE_DIRS
to VkFFTBackend_SYSTEM_INCLUDE_DIRS.

Verified on Ubuntu 24.04 with the CUDA 13.2 toolkit + driver 595.71.05:
backend=1 builds cleanly and all 15 VkFFTBackend ctests pass on the local
NVIDIA GPU.
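The version gate described above can be sketched as follows. This is a standalone illustration, not the module's code. The two stand-in functions, ContextParams, and g_lastCall are hypothetical placeholders for the CUDA driver API. CUDA_VERSION is defaulted to 13020 only so the sketch compiles without the toolkit (cuda.h encodes versions as major * 1000 + minor * 10, so 13.2 is 13020). Check the real position of the params argument against the cuCtxCreate prototype in the installed cuda.h.

```cpp
#include <string>

// In real code CUDA_VERSION comes from <cuda.h>; defaulted here (assumption)
// so the sketch builds without the CUDA toolkit.
#ifndef CUDA_VERSION
#define CUDA_VERSION 13020
#endif

// Hypothetical stand-ins for the driver entry points; they only record which
// form was called so the dispatch can be observed.
struct ContextParams {}; // placeholder for CUctxCreateParams
static std::string g_lastCall;

int createContextThreeArg(void ** ctx, unsigned int /*flags*/, int /*device*/)
{
  g_lastCall = "three-arg";
  *ctx = nullptr;
  return 0;
}

int createContextFourArg(void ** ctx, ContextParams * params, unsigned int /*flags*/, int /*device*/)
{
  g_lastCall = (params == nullptr) ? "four-arg, null params" : "four-arg, params";
  *ctx = nullptr;
  return 0;
}

// One call site; the preprocessor picks the form that matches the toolkit.
int createContextCompat(void ** ctx, unsigned int flags, int device)
{
#if CUDA_VERSION >= 13000
  // CUDA 13+: supply nullptr for the new params slot, as the commit does.
  return createContextFourArg(ctx, nullptr, flags, device);
#else
  return createContextThreeArg(ctx, flags, device);
#endif
}
```

Because the branch is resolved at compile time, pre-13 builds never reference the four-argument form, and the rest of ConfigureBackend stays unchanged.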
Adds the oneAPI Level Zero (L0) backend via VkFFT's VKFFT_BACKEND=4
code path, exposed through the standard VKFFT_BACKEND cache option.
On platforms with an Intel L0 GPU runtime (Intel iGPUs and dGPUs on
Linux/Windows), consumers can now target Level Zero directly without
going through OpenCL or SYCL.
CMakeLists.txt / itk-module-init.cmake
- Document and accept VKFFT_BACKEND=4 (Level Zero).
- find_path(level_zero/ze_api.h) and find_library(ze_loader); fail
with a clear FATAL_ERROR if either is missing. LEVEL_ZERO_ROOT
and CMPLR_ROOT env vars are honored as HINTS so oneAPI installs
work out of the box.
- target_link_libraries(VkFFTBackend PUBLIC ${LevelZero_LIBRARY})
and a SYSTEM include directive so downstream consumers do not
need to re-resolve the loader.
itkVkDefinitions.h
- Add LEVEL_ZERO=4 alongside the existing aliases.
itkVkCommon.h
- Pre-include <level_zero/ze_api.h> under VKFFT_BACKEND==LEVEL_ZERO
so that VkFFT's own #include <ze_api.h> resolves consistently.
- Extend VkGPU with ze_driver_handle_t / ze_device_handle_t /
ze_context_handle_t / ze_command_queue_handle_t / commandQueueID
members and the matching operator!= branch.
itkVkCommon.cxx
- ConfigureBackend: zeInit -> enumerate drivers and devices, pick
by device_id, create a ze_context, locate a queue group that
supports both COMPUTE and COPY flags, and create a default
ze_command_queue. Wire VkFFTConfiguration's L0 fields (device /
context / commandQueue as pointer-to-handle, commandQueueID by
value).
- PerformFFT: allocate device buffers with zeMemAllocDevice; copy
host->device via an immediate command list
(zeCommandListCreateImmediate + zeCommandListAppendMemoryCopy +
zeCommandQueueSynchronize); launch via a regular command list,
zeCommandListClose, zeCommandQueueExecuteCommandLists, and
zeCommandQueueSynchronize; copy device->host with another
immediate command list; free with zeMemFree using the same
C2C / inverse-R2H aliasing rules as the OpenCL / Metal paths.
- ReleaseBackend: zeCommandQueueDestroy + zeContextDestroy.
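The queue-group search in ConfigureBackend can be sketched without the Level Zero headers. The two constants mirror ZE_COMMAND_QUEUE_GROUP_PROPERTY_FLAG_COMPUTE (bit 0) and ZE_COMMAND_QUEUE_GROUP_PROPERTY_FLAG_COPY (bit 1) from ze_api.h. The plain vector of flags stands in for the ze_command_queue_group_properties_t array that zeDeviceGetCommandQueueGroupProperties fills. findComputeCopyQueueGroup is a hypothetical helper for illustration only.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Mirrors of the ze_api.h bit values (assumption: kept in sync with the header).
constexpr std::uint32_t kQueueGroupCompute = 1u << 0; // ..._PROPERTY_FLAG_COMPUTE
constexpr std::uint32_t kQueueGroupCopy = 1u << 1;    // ..._PROPERTY_FLAG_COPY

// Returns the ordinal of the first queue group whose flags include both
// COMPUTE and COPY, or std::nullopt if the device exposes no such group.
std::optional<std::uint32_t> findComputeCopyQueueGroup(const std::vector<std::uint32_t> & groupFlags)
{
  constexpr std::uint32_t required = kQueueGroupCompute | kQueueGroupCopy;
  for (std::uint32_t ordinal = 0; ordinal < groupFlags.size(); ++ordinal)
  {
    if ((groupFlags[ordinal] & required) == required)
    {
      return ordinal;
    }
  }
  return std::nullopt;
}
```

In the real backend, the returned ordinal would be passed as ze_command_queue_desc_t::ordinal when the command queue is created. Requiring both bits lets a single queue serve both the FFT kernel launches and the host/device copies.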
Validated on Linux x86_64 with the Level Zero loader v1.28.6 built
from oneapi-src/level-zero: backend=4 configures, compiles, and
links cleanly. Runtime device enumeration fails on the test host, as
expected, because no Intel L0 driver (intel-level-zero-gpu) is
installed; on a system with the Intel GPU runtime the same binary
should discover the device. Builds for backends 1 (CUDA), 3 (OpenCL),
and 5 (Metal) are unchanged.
Refactor the VkFFTBackend test suite so every logical test runs in both
single (float) and double precision, exposing per-precision regressions
and giving downstream consumers (ANTs, BRAINSTools) confidence that
either pixel type works end-to-end through the FFT path.
Approach
--------
Each .cxx test source is templated on a PrecisionType parameter and its
former main() body is hoisted into runFooTest<PrecisionType>(). The
test driver entry function inspects argv[1] for "float" or "double" and
dispatches accordingly:

    #include <cstdlib>
    #include <iostream>
    #include <string>

    template <typename PrecisionType>
    int runFooTest(int argc, char * argv[]) { /* former main() body, using PrecisionType */ return EXIT_SUCCESS; }

    int itkVkFooTest(int argc, char * argv[])
    {
      const std::string precision{ (argc > 1) ? argv[1] : "float" };
      if (precision == "double") { return runFooTest<double>(argc, argv); }
      if (precision == "float") { return runFooTest<float>(argc, argv); }
      std::cerr << "Unknown precision '" << precision << "'.\n";
      return EXIT_FAILURE;
    }

13 test sources are converted in this pattern: the two factory tests,
GlobalConfigurationTest, the three round-trip tests
(ForwardInverseFFT, ForwardInverse1DFFT, HalfHermitianFFT), the two
ComplexToComplex tests, the three baseline-comparison tests
(ComplexToComplex1DFFT/Forward1DFFT/Inverse1DFFT BaselineTest), the
MultiResolutionPyramid test, and DiscreteGaussianImageFilterTest.
CMake registration
------------------
test/CMakeLists.txt is rewritten to register paired
itk_add_test(NAME ...Float) and itk_add_test(NAME ...Double) entries
that pass the precision string as the first test argument. A helper
_vkfft_disable_on_unsupported_fp64(<test_name>)
marks each Double variant as DISABLED on Apple-arm64, where the GPU
has no FP64 hardware and VkFFT's shader compile would fail with
VKFFT_ERROR_FAILED_TO_COMPILE_PROGRAM (4031). Apple-arm64 detection
uses APPLE AND CMAKE_SYSTEM_PROCESSOR=arm64. Linux/x86_64 and Windows
CI continue to exercise the Double variants in full.
Factory-only tests (FFTImageFilterFactoryTest,
MultiResolutionPyramidImageFilterFactoryTest, GlobalConfigurationTest)
do not launch a GPU kernel and therefore run unguarded on every
platform, exercising the C++ template instantiation surface for both
precisions even where FP64 GPU execution is unavailable.
Per-precision baselines
-----------------------
Where --compare references a baseline image, the Float and Double
ctest entries write to precision-suffixed output paths so the two
variants never write to the same file:
*TestOutputFloat.mha
*TestOutputDouble.mha
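Suppose a test body ever needed to derive these names itself instead of receiving them from test/CMakeLists.txt. A small trait keyed on PrecisionType would keep the Float/Double spelling in one place. PrecisionSuffix and makeOutputFileName are hypothetical helpers, not part of this PR, which chooses the paths in CMake.

```cpp
#include <string>

// Hypothetical trait: the file-name suffix for each supported PrecisionType.
// The primary template is declared but never defined.
template <typename PrecisionType>
struct PrecisionSuffix;

template <>
struct PrecisionSuffix<float>
{
  static constexpr const char * value = "Float";
};

template <>
struct PrecisionSuffix<double>
{
  static constexpr const char * value = "Double";
};

// Builds "<stem>TestOutput<Float|Double>.mha".
template <typename PrecisionType>
std::string makeOutputFileName(const std::string & stem)
{
  return stem + "TestOutput" + PrecisionSuffix<PrecisionType>::value + ".mha";
}
```

Leaving the primary template undefined makes an unsupported PrecisionType (for example long double) fail to compile rather than produce a mis-named output file.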
For Float variants whose baseline does not yet exist in ITKData
(ComplexToComplexFFTImageFilterTestFloat and
ComplexToComplex1DFFTImageFilterSizesTestFloat), --compare is
intentionally omitted with a TODO so the test still exercises the
FFT pipeline via internal assertions. Once float baselines are
uploaded to ITKData the --compare DATA{...Float.mha} lines can be
re-added.
Local validation (Apple Silicon, macOS 26.5, M-series GPU)
----------------------------------------------------------
Both backends pass identically on this hardware:
VKFFT_BACKEND=5 (Metal): 19/19 enabled tests pass, 11 disabled
VKFFT_BACKEND=3 (OpenCL): 19/19 enabled tests pass, 11 disabled
The matching pass/skip signature across the two backends is strong
evidence the wrapper layer is forwarding correctly to VkFFT's
per-backend kernels.
clCreateContext was called with properties=NULL. With multiple ICDs visible to the loader (e.g. NVIDIA CUDA's OpenCL + Intel CPU OpenCL on Linux, or Apple's two platforms on macOS), the loader binds the returned context to an implementation-defined platform, which may not match the platform that m_VkGPU.device was obtained from. The code compiles and clCreateContext itself returns CL_SUCCESS, but the next clCreateBuffer dispatches through the wrong ICD and fails with CL_INVALID_CONTEXT (-34), which VkFFT surfaces as VKFFT_ERROR_FAILED_TO_INITIALIZE (4001).

Pass an explicit CL_CONTEXT_PLATFORM property so the loader binds the context to the platform we already selected. Single-ICD hosts are unaffected.

Tested on Linux x86_64 with NVIDIA's OpenCL ICD + Intel CPU OpenCL ICD both visible to libOpenCL.so.1 (typical CUDA-toolkit install): 30 / 30 VkFFTBackend ctests now pass on backend=3, up from 14 / 30. CUDA (1) and Level Zero (4) builds are unchanged.
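Here is a standalone sketch of the property list the fix passes to clCreateContext. So that it compiles without an OpenCL SDK, it mirrors cl_context_properties (an intptr_t) and CL_CONTEXT_PLATFORM (0x1084) from the Khronos headers. Real code should include <CL/cl.h> and use those names directly. makePlatformProperties is a hypothetical helper.

```cpp
#include <array>
#include <cstdint>

// Standalone mirrors of <CL/cl.h> (assumption: cl_context_properties is intptr_t
// and CL_CONTEXT_PLATFORM is 0x1084, as in the Khronos headers).
using context_properties = std::intptr_t;
constexpr context_properties kContextPlatform = 0x1084;

// Builds the zero-terminated {key, value, 0} list that pins a new context to the
// platform the device was enumerated from. With the real headers this corresponds to:
//   cl_context_properties props[] = { CL_CONTEXT_PLATFORM, (cl_context_properties)platform, 0 };
//   cl_context ctx = clCreateContext(props, 1, &device, nullptr, nullptr, &err);
std::array<context_properties, 3> makePlatformProperties(context_properties platformHandle)
{
  return { kContextPlatform, platformHandle, 0 };
}
```

The list is read as key/value pairs up to the terminating 0, so the platform handle must immediately follow CL_CONTEXT_PLATFORM. Passing NULL instead leaves the platform choice to the implementation, which is the multi-ICD failure described above.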
… is present

The Level Zero loader (libze_loader) can be installed via apt/dnf without any underlying GPU runtime (Intel iGPU/dGPU). On hosts with no Intel L0 driver (CPU-only servers, NVIDIA-only workstations, virtualised environments), every FFT-touching test fails at zeInit with ZE_RESULT_ERROR_UNINITIALIZED (0x78000001), producing 22 spurious red checks per CDash submission.

Add a configure-time try_run probe that calls zeInit + zeDriverGet against the loader we already located. The probe runs only when BUILD_TESTING is ON. If zero drivers come back, export VkFFTBackend_LEVEL_ZERO_RUNTIME_AVAILABLE=FALSE; the test/CMakeLists.txt guard then marks every FFT-exercising ctest entry DISABLED. The lightweight KWStyle / factory / global-configuration / instantiation tests are kept enabled because they execute no GPU code.

Hosts with a real Intel L0 driver are unaffected: the probe succeeds, the flag is TRUE, and every test runs as before. Backends 1/2/3/5 are also unaffected, since the guard only fires for VKFFT_BACKEND==4.

Tested on cortex (Ice Lake Xeon + 2x NVIDIA RTX 6000 Ada, no Intel iGPU): backend=4 now reports 8 / 8 passed (KWStyle + factory + configuration + instantiation) with 22 disabled, instead of 8 / 30 with 22 failed.
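The probe's pass/fail decision can be sketched independently of the loader. Here the two callables stand in for zeInit(0) and zeDriverGet(&count, nullptr), and kSuccess mirrors ZE_RESULT_SUCCESS (0). levelZeroProbeExitCode and these aliases are hypothetical; the real try_run source would call the Level Zero functions directly.

```cpp
#include <cstdint>
#include <functional>

// Stand-ins (assumption): the callables replace zeInit(0) and
// zeDriverGet(&count, nullptr) so the logic can run without a loader.
using InitFn = std::function<std::uint32_t()>;
using DriverCountFn = std::function<std::uint32_t(std::uint32_t *)>;
constexpr std::uint32_t kSuccess = 0; // mirrors ZE_RESULT_SUCCESS

// Exit code for the configure-time probe: 0 means "a Level Zero driver is usable".
int levelZeroProbeExitCode(const InitFn & init, const DriverCountFn & driverGet)
{
  if (init() != kSuccess)
  {
    return 1; // e.g. ZE_RESULT_ERROR_UNINITIALIZED on hosts with only the loader
  }
  std::uint32_t count = 0;
  if (driverGet(&count) != kSuccess || count == 0)
  {
    return 1; // loader initialized but reported no drivers
  }
  return 0;
}
```

On the CMake side, a nonzero run result from the probe would map to VkFFTBackend_LEVEL_ZERO_RUNTIME_AVAILABLE=FALSE. Checking the driver count in addition to zeInit's result also covers loaders that initialize but report no drivers.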
This was referenced May 20, 2026
hjmjohnson added a commit that referenced this pull request on May 20, 2026:

….07.22

Update the build-test-package workflow so its pinned versions match the underlying ITK 5.4.6 release that PR #73 targets:
- itk-git-tag, itk-wheel-tag v5.3.0 -> v5.4.6
- opencl-icd-loader-git-tag v2021.04.29 -> v2025.07.22
- opencl-headers-git-tag v2021.04.29 -> v2025.07.22
- macos runner macos-14 -> macos-15
- actions/download-artifact v2 -> v4 (required for upload-artifact@v4)
- lukka/get-cmake v3.22.2 -> @latest

Restrict Python wheel matrices to the supported interpreter set:
- Linux: ["37","38","39","310","311"] -> ["310","311"]
- Windows: ["9","10","11"] -> ["10","11"]

Drops Python 3.7-3.9 wheel production; aligns with the project's pyproject.toml requires-python >= 3.10 (3.9 is kept only as the C++ build job's pip-for-ninja interpreter, which is incidental).

itk-python-package-tag is left at the existing pinned commit; bumping it is a separate change that requires verifying the wheel-build scripts against ITK 5.4.6 first.

The "Install pocl" step is intentionally unchanged: the conda-forge pocl path was working previously and is preferred for the C++ OpenCL test job.
PR #74 (stacked on this branch) modernizes the CI environment and makes the hosted runners exercise real FFTs wherever the hardware/runtime allows. Previously, hosted CI only compiled the module and ran the lint smoke tests.

Environmental tool updates
Run tests where possible (functional FFT coverage on hosted CI)
Full-size and baseline FFT correctness continue to run on the self-hosted GPU runner. Several genuine module fixes fell out of this (Level Zero loader/header discovery, the |
Add Level Zero (VKFFT_BACKEND=4) and Metal (VKFFT_BACKEND=5) backends, fix the CUDA 13 build, fix a multi-ICD OpenCL CL_INVALID_CONTEXT, and auto-disable Level Zero tests on hosts with no Intel L0 driver. Default OpenCL behavior is unchanged.

Validated on real hardware
- macOS: Apple M-series + Intel CPU OpenCL.
- Linux: Ubuntu 24.04 + CUDA 13.2 + NVIDIA driver 595.71.05 + pixi conda toolchain (~/src/ITK/.pixi/envs/cxx).

Per-commit summary
- ce40cb8 ENH: Add VKFFT_BACKEND=5 (Metal) support for Apple platforms -- VkFFT v1.3.4, metal-cpp via FetchContent, BUILD_VKFFT=ON default.
- 1ae617e COMP: Fix VKFFT_BACKEND=1 (CUDA) build against CUDA 13 toolkit -- cuCtxCreate gained a 4th CUctxCreateParams* argument in CUDA 13; wrap the call with #if CUDA_VERSION >= 13000. Also add find_package(CUDAToolkit) so CUDAToolkit_INCLUDE_DIRS is propagated to the consumer (the CUDA branch was previously a bare # pass).
- 04e88aa ENH: Add VKFFT_BACKEND=4 (Level Zero) support -- find_path/find_library for ze_loader + headers; extends VkGPU with driver/device/context/commandQueue/commandQueueID; full ConfigureBackend/PerformFFT/ReleaseBackend implementation modeled on VkFFT's own utils_VkFFT.cpp Level Zero path.
- 5172445 ENH: Cover every test in both float and double precision -- parameterises every FFT test by precision (doubles ctest count from ~14 to ~30); adds _vkfft_disable_on_unsupported_fp64() for Apple-arm64 Metal.
- 0ea4440 BUG: Fix CL_INVALID_CONTEXT when multiple OpenCL ICDs are loaded -- clCreateContext(NULL, ...) binds to an implementation-defined platform when multiple ICDs are visible; pass an explicit CL_CONTEXT_PLATFORM property. Restores OpenCL 30 / 30 on hosts where both NVIDIA's CUDA-OpenCL and Intel CPU OpenCL ICDs are installed.
- 30b8bbe ENH: Auto-disable VKFFT_BACKEND=4 FFT tests when no Level Zero driver is present -- configure-time try_run probe (zeInit + zeDriverGet); when zero drivers are discovered, mark FFT-touching tests DISABLED TRUE instead of letting them fail. Keeps KWStyle / factory / global-config tests enabled.

Local-build invocation notes
All three Linux builds were configured inside the same pixi conda env that ITK was built with (pixi run --environment cxx ... from ~/src/ITK). External-build consumers on the same host must either use the same toolchain or have their own ITK build that matches their compiler -- the conda sysroot's libm.so linker script references /lib64/..., which does not exist on Ubuntu (Ubuntu uses /lib/x86_64-linux-gnu/...).

All three builds install their .so to the same path in the shared ITK build tree (build/lib/libitkVkFFTBackend-6.0.so.1). After switching backend, rebuild the chosen backend's VkFFTBackend + VkFFTBackendTestDriver targets immediately before running ctest; otherwise the runtime .so may belong to a different backend.

- CUDA needs -DCUDA_USE_STATIC_CUDA_RUNTIME=OFF and a linker -L to the CUDA driver stubs directory (<cuda-root>/targets/x86_64-linux/lib/stubs) so the conda toolchain can resolve -lcuda against the driver stub.
- OpenCL needs OpenCL_INCLUDE_DIR pointed at a Khronos headers checkout (e.g. /tmp/OpenCL-Headers) and OpenCL_LIBRARY pointed at the CUDA-shipped libOpenCL.so.1 ICD loader.
- Level Zero needs LevelZero_INCLUDE_DIR=$HOME/local/include and LevelZero_LIBRARY=$HOME/local/lib/libze_loader.so, plus a -I$HOME/local/include/level_zero flag so VkFFT's unqualified #include <ze_api.h> resolves.