ENH: Add VKFFT_BACKEND=4 (Level Zero) and =5 (Metal) backends; CUDA 13 + multi-ICD OpenCL fixes#73

Open: hjmjohnson wants to merge 6 commits into InsightSoftwareConsortium:main from hjmjohnson:enh/vkfft-backend-5-metal

@hjmjohnson hjmjohnson commented May 20, 2026

Add Level Zero (VKFFT_BACKEND=4) and Metal (VKFFT_BACKEND=5) backends, fix CUDA 13 build, fix a multi-ICD OpenCL CL_INVALID_CONTEXT, and auto-disable Level Zero tests on hosts with no Intel L0 driver. Default OpenCL behavior is unchanged.

Validated on real hardware

+--------+----------------------------+----------------------+---------+-----+--------+--------+----------+
|  Host  |          Backend           |      Build dir       | Defined | Run | Passed | Failed | Disabled |
+--------+----------------------------+----------------------+---------+-----+--------+--------+----------+
| macOS  | VKFFT_BACKEND=5 Metal      | cmake-build-metal/   |      30 |  19 |     19 |      0 |   11 [a] |
| macOS  | VKFFT_BACKEND=3 OpenCL     | cmake-build-release/ |      30 |  19 |     19 |      0 |   11 [a] |
+--------+----------------------------+----------------------+---------+-----+--------+--------+----------+
| Linux  | VKFFT_BACKEND=1 CUDA       | build-cuda/          |      30 |  30 |     30 |      0 |        0 |
| Linux  | VKFFT_BACKEND=3 OpenCL     | build-cl/            |      30 |  30 |     30 |      0 |        0 |
| Linux  | VKFFT_BACKEND=4 Level Zero | build-lz/            |      30 |   8 |      8 |      0 |   22 [b] |
+--------+----------------------------+----------------------+---------+-----+--------+--------+----------+

[a] Apple-arm64 Metal Shading Language has no FP64 -- 11 double-precision tests
    skipped via _vkfft_disable_on_unsupported_fp64() in test/CMakeLists.txt.
[b] Test host (Ice Lake Xeon + 2x NVIDIA RTX 6000 Ada, no Intel iGPU) fails the
    configure-time zeInit/zeDriverGet probe -> 22 FFT-touching tests
    auto-DISABLED. Lightweight factory / KWStyle tests still run.

macOS: Apple M-series + Intel CPU OpenCL.
Linux: Ubuntu 24.04 + CUDA 13.2 + NVIDIA driver 595.71.05 + pixi conda toolchain (~/src/ITK/.pixi/envs/cxx).

Per-commit summary

| Commit | Title |
|---|---|
| ce40cb8 | ENH: Add VKFFT_BACKEND=5 (Metal) support for Apple platforms -- VkFFT v1.3.4, metal-cpp via FetchContent, BUILD_VKFFT=ON default. |
| 1ae617e | COMP: Fix VKFFT_BACKEND=1 (CUDA) build against CUDA 13 toolkit -- cuCtxCreate gained a 4th CUctxCreateParams* argument in CUDA 13; wrap call with #if CUDA_VERSION >= 13000. Also add find_package(CUDAToolkit) so CUDAToolkit_INCLUDE_DIRS is propagated to the consumer (the CUDA branch was previously a bare # pass). |
| 04e88aa | ENH: Add VKFFT_BACKEND=4 (Level Zero) support -- find_path/find_library for ze_loader + headers; extends VkGPU with driver/device/context/commandQueue/commandQueueID; full ConfigureBackend/PerformFFT/ReleaseBackend implementation modeled on VkFFT's own utils_VkFFT.cpp Level Zero path. |
| 5172445 | ENH: Cover every test in both float and double precision -- parameterises every FFT test by precision (doubles ctest count from ~14 to ~30); adds _vkfft_disable_on_unsupported_fp64() for Apple-arm64 Metal. |
| 0ea4440 | BUG: Fix CL_INVALID_CONTEXT when multiple OpenCL ICDs are loaded -- clCreateContext(NULL, ...) binds to an implementation-defined platform when multiple ICDs are visible; pass an explicit CL_CONTEXT_PLATFORM property. Restores OpenCL 30 / 30 on hosts where both NVIDIA's CUDA-OpenCL and Intel CPU OpenCL ICDs are installed. |
| 30b8bbe | ENH: Auto-disable VKFFT_BACKEND=4 FFT tests when no Level Zero driver is present -- configure-time try_run probe (zeInit + zeDriverGet); when zero drivers are discovered, mark FFT-touching tests DISABLED TRUE instead of letting them fail. Keeps KWStyle / factory / global-config tests enabled. |

Local-build invocation notes

All three Linux builds were configured inside the same pixi conda env that ITK was built with (pixi run --environment cxx ... from ~/src/ITK). External-build consumers on the same host must either use the same toolchain or have their own ITK build that matches their compiler -- the conda sysroot's libm.so linker script references /lib64/... which is absent on Ubuntu (/lib/x86_64-linux-gnu/...).

All three builds install their .so to the same path in the shared ITK build tree (build/lib/libitkVkFFTBackend-6.0.so.1); after switching backend, rebuild the chosen backend's VkFFTBackend + VkFFTBackendTestDriver targets immediately before ctest, otherwise the runtime .so may belong to a different backend.

CUDA needs -DCUDA_USE_STATIC_CUDA_RUNTIME=OFF and a linker -L to the CUDA driver stubs directory (<cuda-root>/targets/x86_64-linux/lib/stubs) so the conda toolchain can resolve -lcuda against the driver stub.

OpenCL needs OpenCL_INCLUDE_DIR pointed at a Khronos headers checkout (e.g. /tmp/OpenCL-Headers) and OpenCL_LIBRARY at the CUDA-shipped libOpenCL.so.1 ICD loader.

Level Zero needs LevelZero_INCLUDE_DIR=$HOME/local/include and LevelZero_LIBRARY=$HOME/local/lib/libze_loader.so plus a -I$HOME/local/include/level_zero flag so VkFFT's unqualified #include <ze_api.h> resolves.

Adds native Apple Metal backend support via VkFFT's VKFFT_BACKEND=5
code path, exposed through the standard VKFFT_BACKEND cache option.
On Apple platforms consumers (ANTs, BRAINSTools, Slicer) can now
target Metal directly instead of going through the deprecated
OpenCL-on-Metal translation layer.

CMakeLists.txt
  - Document and accept VKFFT_BACKEND=5 (Metal).
  - Bump VkFFT pin to v1.3.4 (first release whose Metal codegen
    compiles cleanly on macOS arm64).
  - Flip BUILD_VKFFT default to ON because v1.3+ split vkFFT.h into
    a multi-file tree under vkFFT/vkFFT_Structs/, vkFFT/vkFFT_CodeGen/
    that the single-header URL download cannot satisfy.
  - On VKFFT_BACKEND=5: require APPLE, bump CMAKE_CXX_STANDARD to 17
    (metal-cpp requirement), FetchContent metal-cpp from the
    bkaradzic mirror at metal-cpp_macOS15_iOS18, and link the
    Metal/Foundation/QuartzCore system frameworks.

itkVkDefinitions.h
  - Add METAL=5 alongside the existing VULKAN/CUDA/HIP/OPENCL aliases.

itkVkCommon.h
  - Pre-include metal-cpp's Foundation/Metal/QuartzCore headers under
    VKFFT_BACKEND==METAL so vkFFT.h's internal *_PRIVATE_IMPLEMENTATION
    re-include is a header-guard no-op in every translation unit that
    pulls in itkVkCommon.h. Without this, metal-cpp's selector storage
    (CA::Private::Selector::s_k* etc.) gets emitted in every TU and
    the link step fails with duplicate-symbol errors.
  - Extend VkGPU with MTL::Device* and MTL::CommandQueue* members
    and the matching operator!= branch.

itkVkCommon.cxx
  - Define NS_/MTL_/CA_PRIVATE_IMPLEMENTATION exactly once (this TU),
    before including the metal-cpp headers, so the storage symbols
    are emitted exactly once.
  - ConfigureBackend: enumerate Metal devices via
    MTL::CopyAllDevices(), pick by device_id, retain the device,
    and create a command queue.
  - Wire VkFFTConfiguration's Metal fields. Unlike CUDA/OpenCL which
    take pointer-to-pointer (void**, cl_device_id*), Metal expects
    single pointers (MTL::Device*, MTL::CommandQueue*), so the device
    assignment is moved inside the per-backend branch.
  - PerformFFT: allocate MTL::Buffer in MTL::ResourceStorageModeShared
    (Apple unified-memory shared storage), memcpy host->device into
    buffer->contents(), drive VkFFTAppend via a commandBuffer +
    computeCommandEncoder pair, waitUntilCompleted, memcpy back, and
    release the buffers.
  - ReleaseBackend: release the queue and device.

Tested on macOS 26.5 / Apple Silicon: VKFFT_BACKEND=5 builds cleanly,
otool -L confirms Metal/Foundation/QuartzCore linkage on the resulting
dylib, and 4 VkFFT FFT round-trip tests pass on Metal
(itkVkForwardInverseFFTImageFilterTest,
itkVkForwardInverse1DFFTImageFilterTest,
itkVkHalfHermitianFFTImageFilterTest, itkVkGlobalConfigurationTest).
Double-precision tests still fail because Metal Shading Language does
not support FP64 on Apple Silicon (program_source: incomplete type
'double2' reserved name); consumers must select PrecisionEnum::FLOAT
on Metal.

VKFFT_BACKEND=3 (OpenCL) build remains green.

Two pre-existing breakages prevented the CUDA backend from building
against CUDA >= 13.0:

- cuCtxCreate takes an additional CUctxCreateParams* argument (four
  arguments in total) as of CUDA 13.0. The legacy three-argument form
  is the cuCtxCreate_v2 symbol; the unsuffixed cuCtxCreate macro now
  expands to cuCtxCreate_v4. Wrap the call so CUDA 13+ supplies nullptr
  for the new params slot while earlier toolkits keep the
  three-argument form.

- The VKFFT_BACKEND=1 CMake branch was a bare "# pass", so the CUDA
  Toolkit's include directory was never added to the module's include
  path and vkFFT.h's "#include <nvrtc.h>" failed unless the consumer
  manually exported CPATH. Pull in find_package(CUDAToolkit) and
  append CUDAToolkit_INCLUDE_DIRS to VkFFTBackend_SYSTEM_INCLUDE_DIRS.
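
The version guard from the first item can be sketched as follows. This is a minimal illustration, not the code in the commit: the CUDA types and the cuCtxCreate entry point are replaced by small mocks so the example compiles without a toolkit, and the argument order (context, params, flags, device) is an assumption that should be checked against the toolkit's own cuda.h. createContext and g_ctxCreateArgCount are hypothetical names.

```cpp
#include <cassert>

// Stand-ins for the few <cuda.h> declarations this sketch needs.
// They are hypothetical mocks so the example compiles without a CUDA toolkit.
#ifndef CUDA_VERSION
#  define CUDA_VERSION 13000 // pretend to build against a CUDA 13.0 toolkit
#endif
using CUresult = int;
using CUdevice = int;
struct CUctx_st;
using CUcontext = CUctx_st *;
struct CUctxCreateParams
{};
constexpr CUresult CUDA_SUCCESS = 0;

// Records how many arguments the mock entry point received (demo only).
static int g_ctxCreateArgCount = 0;

#if CUDA_VERSION >= 13000
// Mock of the four-argument form that the unsuffixed name maps to in CUDA 13.
inline CUresult
cuCtxCreate(CUcontext * ctx, CUctxCreateParams *, unsigned int, CUdevice)
{
  assert(ctx != nullptr);
  *ctx = nullptr;
  g_ctxCreateArgCount = 4;
  return CUDA_SUCCESS;
}
#else
// Mock of the legacy three-argument form used by earlier toolkits.
inline CUresult
cuCtxCreate(CUcontext * ctx, unsigned int, CUdevice)
{
  assert(ctx != nullptr);
  *ctx = nullptr;
  g_ctxCreateArgCount = 3;
  return CUDA_SUCCESS;
}
#endif

// Hypothetical wrapper: the only place that knows about the signature change.
inline CUresult
createContext(CUcontext * ctx, CUdevice device)
{
#if CUDA_VERSION >= 13000
  return cuCtxCreate(ctx, nullptr, 0, device); // nullptr selects default params
#else
  return cuCtxCreate(ctx, 0, device);
#endif
}
```

Keeping the preprocessor branch inside one wrapper means the rest of the backend calls createContext(&ctx, device) unchanged on both toolkit generations.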

Verified on Ubuntu 24.04 with CUDA 13.2 toolkit + driver 595.71.05:
backend=1 builds cleanly and all 15 VkFFTBackend ctests pass on the
local NVIDIA GPU.

Adds the oneAPI Level Zero (L0) backend via VkFFT's VKFFT_BACKEND=4
code path, exposed through the standard VKFFT_BACKEND cache option.
On platforms with an Intel L0 GPU runtime (Intel iGPUs and dGPUs on
Linux/Windows) consumers can now target Level Zero directly without
going through OpenCL or SYCL.

CMakeLists.txt / itk-module-init.cmake
  - Document and accept VKFFT_BACKEND=4 (Level Zero).
  - find_path(level_zero/ze_api.h) and find_library(ze_loader); fail
    with a clear FATAL_ERROR if either is missing. LEVEL_ZERO_ROOT
    and CMPLR_ROOT env vars are honored as HINTS so oneAPI installs
    work out of the box.
  - target_link_libraries(VkFFTBackend PUBLIC ${LevelZero_LIBRARY})
    and a SYSTEM include directive so downstream consumers do not
    need to re-resolve the loader.

itkVkDefinitions.h
  - Add LEVEL_ZERO=4 alongside the existing aliases.

itkVkCommon.h
  - Pre-include <level_zero/ze_api.h> under VKFFT_BACKEND==LEVEL_ZERO
    so that VkFFT's own #include <ze_api.h> resolves consistently.
  - Extend VkGPU with ze_driver_handle_t / ze_device_handle_t /
    ze_context_handle_t / ze_command_queue_handle_t / commandQueueID
    members and the matching operator!= branch.

itkVkCommon.cxx
  - ConfigureBackend: zeInit -> enumerate drivers and devices, pick
    by device_id, create a ze_context, locate a queue group that
    supports both COMPUTE and COPY flags, and create a default
    ze_command_queue. Wire VkFFTConfiguration's L0 fields (device /
    context / commandQueue as pointer-to-handle, commandQueueID by
    value).
  - PerformFFT: allocate device buffers with zeMemAllocDevice; copy
    host->device via an immediate command list
    (zeCommandListCreateImmediate + zeCommandListAppendMemoryCopy +
    zeCommandQueueSynchronize); launch via a regular command list,
    zeCommandListClose, zeCommandQueueExecuteCommandLists, and
    zeCommandQueueSynchronize; copy device->host with another
    immediate command list; free with zeMemFree using the same
    C2C / inverse-R2H aliasing rules as the OpenCL / Metal paths.
  - ReleaseBackend: zeCommandQueueDestroy + zeContextDestroy.
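
The queue-group selection step in ConfigureBackend can be sketched without the Level Zero headers. This is an illustration under stated assumptions: the two flag constants are assumed to mirror ZE_COMMAND_QUEUE_GROUP_PROPERTY_FLAG_COMPUTE and ZE_COMMAND_QUEUE_GROUP_PROPERTY_FLAG_COPY from ze_api.h, the input vector stands in for the flags field of each entry returned by zeDeviceGetCommandQueueGroupProperties, and findComputeCopyGroup is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Assumed to mirror the bit values declared in ze_api.h.
constexpr std::uint32_t kComputeFlag = 1u << 0; // ..._FLAG_COMPUTE
constexpr std::uint32_t kCopyFlag = 1u << 1;    // ..._FLAG_COPY

// Returns the ordinal of the first queue group whose flags include both
// COMPUTE and COPY, or std::nullopt when no such group exists.
inline std::optional<std::uint32_t>
findComputeCopyGroup(const std::vector<std::uint32_t> & groupFlags)
{
  constexpr std::uint32_t required = kComputeFlag | kCopyFlag;
  for (std::uint32_t i = 0; i < groupFlags.size(); ++i)
  {
    if ((groupFlags[i] & required) == required)
    {
      return i;
    }
  }
  return std::nullopt;
}
```

The returned ordinal would play the role of commandQueueID; an empty result is the case where the backend has to report that no usable queue group was found.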

Validated on Linux x86_64 with the Level Zero loader v1.28.6 built
from oneapi-src/level-zero: backend=4 configures, compiles, and
links cleanly. Runtime device enumeration fails on the test host as
expected because no Intel L0 driver (intel-level-zero-gpu) is
installed; on a system with the Intel GPU runtime the same binary
will discover the device. Backends 1 (CUDA), 3 (OpenCL), and 5
(Metal) builds are unchanged.

Refactor the VkFFTBackend test suite so every logical test runs in both
single (float) and double precision, exposing per-precision regressions
and giving downstream consumers (ANTs, BRAINSTools) confidence that
either pixel type works end-to-end through the FFT path.

Approach
--------

Each .cxx test source is templated on a PrecisionType parameter and its
former main() body is hoisted into runFooTest<PrecisionType>(). The
test driver entry function inspects argv[1] for "float" or "double" and
dispatches accordingly:

    template <typename PrecisionType>
    int runFooTest(...) { /* body, using PrecisionType */ }

    int itkVkFooTest(int argc, char * argv[]) {
      const std::string precision{ (argc > 1) ? argv[1] : "float" };
      if (precision == "double") return runFooTest<double>(...);
      if (precision == "float")  return runFooTest<float>(...);
      std::cerr << "Unknown precision '" << precision << "'.\n";
      return EXIT_FAILURE;
    }

13 test sources are converted in this pattern: the two factory tests,
GlobalConfigurationTest, the three round-trip tests
(ForwardInverseFFT, ForwardInverse1DFFT, HalfHermitianFFT), the two
ComplexToComplex tests, the three baseline-comparison tests
(ComplexToComplex1DFFT/Forward1DFFT/Inverse1DFFT BaselineTest), the
MultiResolutionPyramid test, and DiscreteGaussianImageFilterTest.
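
A compilable instance of the dispatch pattern above is sketched here. The test body is a hypothetical stand-in (a multiply-then-divide round trip instead of a forward and inverse FFT), and the names runRoundTripSketch and itkVkRoundTripSketchTest are invented for illustration; only the argv[1] dispatch mirrors the pattern described in this commit.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <iostream>
#include <limits>
#include <string>
#include <vector>

// Hypothetical templated test body: a trivial round trip that should
// reproduce its input within a precision-dependent tolerance.
template <typename PrecisionType>
int
runRoundTripSketch()
{
  const std::vector<PrecisionType> input{ 1, 2, 3, 4 };
  const PrecisionType tolerance = 16 * std::numeric_limits<PrecisionType>::epsilon();
  for (const PrecisionType value : input)
  {
    const PrecisionType roundTrip = (value * PrecisionType(3)) / PrecisionType(3);
    if (std::abs(roundTrip - value) > tolerance)
    {
      return EXIT_FAILURE;
    }
  }
  return EXIT_SUCCESS;
}

// Test-driver style entry point: argv[1] selects the precision,
// defaulting to "float" when no argument is given.
int
itkVkRoundTripSketchTest(int argc, char * argv[])
{
  assert(argv != nullptr);
  const std::string precision{ (argc > 1) ? argv[1] : "float" };
  if (precision == "double")
  {
    return runRoundTripSketch<double>();
  }
  if (precision == "float")
  {
    return runRoundTripSketch<float>();
  }
  std::cerr << "Unknown precision '" << precision << "'.\n";
  return EXIT_FAILURE;
}
```

Because both instantiations are referenced from the entry point, the compiler builds the float and double variants even on hosts where only one of them is later executed.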

CMake registration
------------------

test/CMakeLists.txt is rewritten to register paired
itk_add_test(NAME ...Float) and itk_add_test(NAME ...Double) entries
that pass the precision string as the first test argument. A helper

    _vkfft_disable_on_unsupported_fp64(<test_name>)

marks each Double variant as DISABLED on Apple-arm64, where the GPU
has no FP64 hardware and VkFFT's shader compile would fail with
VKFFT_ERROR_FAILED_TO_COMPILE_PROGRAM (4031). Apple-arm64 detection
uses APPLE AND CMAKE_SYSTEM_PROCESSOR=arm64. Linux/x86_64 and Windows
CI continue to exercise the Double variants in full.

Factory-only tests (FFTImageFilterFactoryTest,
MultiResolutionPyramidImageFilterFactoryTest, GlobalConfigurationTest)
do not launch a GPU kernel and therefore run unguarded on every
platform, exercising the C++ template instantiation surface for both
precisions even where FP64 GPU execution is unavailable.

Per-precision baselines
-----------------------

Where --compare references a baseline image, the Float and Double
ctest entries write to precision-suffixed output paths so the same
filename does not collide:
    *TestOutputFloat.mha
    *TestOutputDouble.mha
For Float variants whose baseline does not yet exist in ITKData
(ComplexToComplexFFTImageFilterTestFloat and
ComplexToComplex1DFFTImageFilterSizesTestFloat), --compare is
intentionally omitted with a TODO so the test still exercises the
FFT pipeline via internal assertions. Once float baselines are
uploaded to ITKData the --compare DATA{...Float.mha} lines can be
re-added.

Local validation (Apple Silicon, macOS 26.5, M-series GPU)
----------------------------------------------------------

Both backends pass identically on this hardware:

    VKFFT_BACKEND=5 (Metal)   — 19/19 enabled tests pass, 11 disabled
    VKFFT_BACKEND=3 (OpenCL)  — 19/19 enabled tests pass, 11 disabled

The matching pass/skip signature across the two backends is strong
evidence the wrapper layer is forwarding correctly to VkFFT's
per-backend kernels.

hjmjohnson changed the title from "ENH: Add VKFFT_BACKEND=4 (Level Zero) and =5 (Metal) backends; fix CUDA 13" to "ENH: Add VKFFT_BACKEND=5 (Metal), =4 (Level Zero), CUDA-13 fix, and float+double test coverage" May 20, 2026

clCreateContext was called with properties=NULL. With multiple ICDs visible
to the loader (e.g. NVIDIA CUDA's OpenCL + Intel CPU OpenCL on Linux, or
Apple's two platforms on macOS) the loader binds the returned context to an
implementation-defined platform, which may not match the platform that
m_VkGPU.device was obtained from. The code compiles and clCreateContext
itself returns CL_SUCCESS, but the next clCreateBuffer dispatches through
the wrong ICD and fails with CL_INVALID_CONTEXT (-34), which VkFFT surfaces
as VKFFT_ERROR_FAILED_TO_INITIALIZE (4001).

Pass an explicit CL_CONTEXT_PLATFORM property so the loader binds the
context to the platform we already selected. Single-ICD hosts are
unaffected.
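
A minimal sketch of the explicit property list follows. It uses the real <CL/cl.h> declarations when that header is on the include path and otherwise falls back to stand-ins that are assumed to mirror the Khronos definitions; makeContextProperties is a hypothetical helper name, not a function from this module.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

#if defined(__has_include)
#  if __has_include(<CL/cl.h>)
#    ifndef CL_TARGET_OPENCL_VERSION
#      define CL_TARGET_OPENCL_VERSION 120
#    endif
#    include <CL/cl.h>
#    define SKETCH_HAVE_OPENCL 1
#  endif
#endif
#ifndef SKETCH_HAVE_OPENCL
// Stand-ins assumed to mirror the Khronos header, used only when it is absent.
struct _cl_platform_id;
using cl_platform_id = _cl_platform_id *;
using cl_context_properties = std::intptr_t;
constexpr cl_context_properties CL_CONTEXT_PLATFORM = 0x1084;
#endif

// Builds the zero-terminated (key, value) list that names the platform the
// device was enumerated from, for use as clCreateContext's first argument.
inline std::array<cl_context_properties, 3>
makeContextProperties(cl_platform_id platform)
{
  assert(platform != nullptr);
  return { { static_cast<cl_context_properties>(CL_CONTEXT_PLATFORM),
             reinterpret_cast<cl_context_properties>(platform),
             0 } };
}
```

The array's data() pointer would then replace the NULL first argument, as in clCreateContext(props.data(), 1, &device, nullptr, nullptr, &status), so the context is tied to the platform that produced the device.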

Tested on Linux x86_64 with NVIDIA's OpenCL ICD + Intel CPU OpenCL ICD both
visible to libOpenCL.so.1 (typical CUDA-toolkit install): 30 / 30
VkFFTBackend ctests now pass on backend=3, up from 14 / 30. CUDA (1) and
Level Zero (4) builds are unchanged.
… is present

The Level Zero loader (libze_loader) can be installed via apt/dnf without
any underlying GPU runtime (Intel iGPU/dGPU). On hosts with no Intel L0
driver — CPU-only servers, NVIDIA-only workstations, virtualised
environments — every FFT-touching test fails at zeInit with
ZE_RESULT_ERROR_UNINITIALIZED (0x78000001), producing 22 spurious red
checks per CDash submission.

Add a configure-time try_run probe that calls zeInit + zeDriverGet against
the loader we already located. The probe runs only when BUILD_TESTING is
ON. If zero drivers come back, export
VkFFTBackend_LEVEL_ZERO_RUNTIME_AVAILABLE=FALSE; the test/CMakeLists.txt
guard then marks every FFT-exercising ctest entry DISABLED. The lightweight
KWStyle / factory / global-configuration / instantiation tests are kept
enabled because they execute no GPU code.

Hosts with a real Intel L0 driver are unaffected: the probe succeeds, the
flag is TRUE, and every test runs as before. Backends 1/2/3/5 are also
unaffected — the guard only fires for VKFFT_BACKEND==4.
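
The decision made by the probe can be sketched independently of the Level Zero headers. The two callables stand in for zeInit and zeDriverGet (the latter in its count-only form, with a null handle array), and levelZeroRuntimeAvailable is a hypothetical name; how the actual try_run program reports its result to CMake is not shown in this message, so the sketch stops at the boolean.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Callable shapes assumed to match zeInit(flags) and
// zeDriverGet(&count, handles); results are compared with ZE_RESULT_SUCCESS (0).
using InitFn = std::function<int(std::uint32_t flags)>;
using DriverGetFn = std::function<int(std::uint32_t * count, void ** drivers)>;

// Returns true only when initialization succeeds and at least one driver
// is reported; any failure is treated as "no runtime available".
inline bool
levelZeroRuntimeAvailable(const InitFn & init, const DriverGetFn & driverGet)
{
  assert(init && driverGet);
  constexpr int kSuccess = 0;
  if (init(0) != kSuccess)
  {
    return false;
  }
  std::uint32_t driverCount = 0;
  if (driverGet(&driverCount, nullptr) != kSuccess)
  {
    return false;
  }
  return driverCount > 0;
}
```

In a real probe the two arguments would be thin lambdas around the loader's entry points, and the boolean would be turned into the process exit code that the configure step inspects.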

Tested on cortex (Ice Lake Xeon + 2x NVIDIA RTX 6000 Ada, no Intel iGPU):
backend=4 now reports 8 / 8 passed (KWStyle + factory + configuration +
instantiation) with 22 disabled, instead of 8 / 30 with 22 failed.

hjmjohnson changed the title from "ENH: Add VKFFT_BACKEND=5 (Metal), =4 (Level Zero), CUDA-13 fix, and float+double test coverage" to "ENH: Add VKFFT_BACKEND=4 (Level Zero) and =5 (Metal) backends; CUDA 13 + multi-ICD OpenCL fixes" May 20, 2026
@hjmjohnson hjmjohnson requested a review from thewtex May 20, 2026 21:06
@hjmjohnson hjmjohnson marked this pull request as ready for review May 20, 2026 21:08
hjmjohnson added a commit that referenced this pull request May 20, 2026
….07.22

Update the build-test-package workflow so its pinned versions match the
underlying ITK 5.4.6 release that PR #73 targets:

- itk-git-tag, itk-wheel-tag        v5.3.0       -> v5.4.6
- opencl-icd-loader-git-tag         v2021.04.29  -> v2025.07.22
- opencl-headers-git-tag            v2021.04.29  -> v2025.07.22
- macos runner                      macos-14     -> macos-15
- actions/download-artifact          v2          -> v4 (required for
                                                    upload-artifact@v4)
- lukka/get-cmake                    v3.22.2     -> @latest

Restrict Python wheel matrices to the supported interpreter set:

- Linux:    ["37","38","39","310","311"]  -> ["310","311"]
- Windows:  ["9","10","11"]               -> ["10","11"]

Drops Python 3.7-3.9 wheel production; aligns with the project's
pyproject.toml requires-python >= 3.10 (3.9 is kept only as the C++
build job's pip-for-ninja interpreter, which is incidental).

itk-python-package-tag is left at the existing pinned commit; bumping
it is a separate change that requires verifying the wheel-build scripts
against ITK 5.4.6 first.

The "Install pocl" step is intentionally unchanged — the conda-forge
pocl path was working previously and is preferred for the C++ OpenCL
test job.
hjmjohnson added a commit that referenced this pull request May 21, 2026
….07.22

Update the build-test-package workflow so its pinned versions match the
underlying ITK 5.4.6 release that PR #73 targets:

- itk-git-tag, itk-wheel-tag        v5.3.0       -> v5.4.6
- opencl-icd-loader-git-tag         v2021.04.29  -> v2025.07.22
- opencl-headers-git-tag            v2021.04.29  -> v2025.07.22
- macos runner                      macos-14     -> macos-15
- actions/download-artifact          v2          -> v4 (required for
                                                    upload-artifact@v4)
- lukka/get-cmake                    v3.22.2     -> @latest

Restrict Python wheel matrices to the supported interpreter set:

- Linux:    ["37","38","39","310","311"]  -> ["310","311"]
- Windows:  ["9","10","11"]               -> ["10","11"]

Drops Python 3.7-3.9 wheel production; aligns with the project's
pyproject.toml requires-python >= 3.10 (3.9 is kept only as the C++
build job's pip-for-ninja interpreter, which is incidental).

itk-python-package-tag is left at the existing pinned commit; bumping
it is a separate change that requires verifying the wheel-build scripts
against ITK 5.4.6 first.

The "Install pocl" step is intentionally unchanged — the conda-forge
pocl path was working previously and is preferred for the C++ OpenCL
test job.
hjmjohnson added a commit that referenced this pull request May 21, 2026
….07.22

Update the build-test-package workflow so its pinned versions match the
underlying ITK 5.4.6 release that PR #73 targets:

- itk-git-tag, itk-wheel-tag        v5.3.0       -> v5.4.6
- opencl-icd-loader-git-tag         v2021.04.29  -> v2025.07.22
- opencl-headers-git-tag            v2021.04.29  -> v2025.07.22
- macos runner                      macos-14     -> macos-15
- actions/download-artifact          v2          -> v4 (required for
                                                    upload-artifact@v4)
- lukka/get-cmake                    v3.22.2     -> @latest

Restrict Python wheel matrices to the supported interpreter set:

- Linux:    ["37","38","39","310","311"]  -> ["310","311"]
- Windows:  ["9","10","11"]               -> ["10","11"]

Drops Python 3.7-3.9 wheel production; aligns with the project's
pyproject.toml requires-python >= 3.10 (3.9 is kept only as the C++
build job's pip-for-ninja interpreter, which is incidental).

itk-python-package-tag is left at the existing pinned commit; bumping
it is a separate change that requires verifying the wheel-build scripts
against ITK 5.4.6 first.

The "Install pocl" step is intentionally unchanged — the conda-forge
pocl path was working previously and is preferred for the C++ OpenCL
test job.
hjmjohnson added a commit that referenced this pull request May 21, 2026
….07.22

Update the build-test-package workflow so its pinned versions match the
underlying ITK 5.4.6 release that PR #73 targets:

- itk-git-tag, itk-wheel-tag        v5.3.0       -> v5.4.6
- opencl-icd-loader-git-tag         v2021.04.29  -> v2025.07.22
- opencl-headers-git-tag            v2021.04.29  -> v2025.07.22
- macos runner                      macos-14     -> macos-15
- actions/download-artifact          v2          -> v4 (required for
                                                    upload-artifact@v4)
- lukka/get-cmake                    v3.22.2     -> @latest

Restrict Python wheel matrices to the supported interpreter set:

- Linux:    ["37","38","39","310","311"]  -> ["310","311"]
- Windows:  ["9","10","11"]               -> ["10","11"]

Drops Python 3.7-3.9 wheel production; aligns with the project's
pyproject.toml requires-python >= 3.10 (3.9 is kept only as the C++
build job's pip-for-ninja interpreter, which is incidental).

itk-python-package-tag is left at the existing pinned commit; bumping
it is a separate change that requires verifying the wheel-build scripts
against ITK 5.4.6 first.

The "Install pocl" step is intentionally unchanged — the conda-forge
pocl path was working previously and is preferred for the C++ OpenCL
test job.
@hjmjohnson
Member Author

PR #74 (stacked on this branch) modernizes the CI environment and makes the hosted runners exercise real FFTs wherever the hardware/runtime allows. Previously, hosted CI only compiled the module and ran the lint smoke tests.

Environmental tool updates
  • ITK v5.4.6, runner matrix ubuntu-24.04 / macos-15 / windows-2022
  • OpenCL ICD loader + headers v2025.07.22, CL_TARGET_OPENCL_VERSION=300
  • Python 3.9 setup, current actions/* (checkout@v5, setup-python@v5, lukka/get-cmake@latest, upload/download-artifact@v4)
  • ITK built with only the module's declared dependency set (ITK_BUILD_DEFAULT_MODULES=OFF) to keep each leg cheap
  • Self-hosted GPU/notebook workflows gated behind workflow_dispatch / a gpu-ci label so they no longer queue indefinitely (see #75, "CI: Self-hosted GPU runners needed to fully test VkFFT backends")

Run tests where possible (functional FFT coverage on hosted CI)

| Leg | Coverage |
|---|---|
| macos-15 Metal | Builds with BUILD_TESTING=ON and runs the FFT tests on the Apple Silicon GPU (real on-hardware FFT correctness) |
| ubuntu-24.04 OpenCL (pocl) | Runs a pocl-safe FFT round-trip subset (radix-2/3/5/7 + Bluestein primes 11/13, capped at size 16); pocl mis-computes the size-19 Bluestein inverse, so larger/baseline cases stay on real GPUs |
| CUDA / Level Zero | Compile + link coverage on hosted runners (no GPU needed to build) |
| macOS/Windows OpenCL | Lint smoke tests (no verified pocl device) |

Full-size and baseline FFT correctness continue to run on the self-hosted GPU runner. Several genuine module fixes fell out of this (Level Zero loader/header discovery, the <ze_api.h> include path, OpenCL device-less-platform handling).
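
For readers unfamiliar with the term, a round-trip check applies a forward transform followed by the inverse and compares the result with the input. The sketch below shows the idea with a naive reference DFT in pure Python. It does not call VkFFT or ITK, and the size list, test signal, and tolerance are illustrative assumptions rather than the subset the CI actually runs.

```python
import cmath


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (reference only, not VkFFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        acc = 0j
        for j, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * cmath.pi * j * k / n)
        # The inverse transform carries the 1/n normalization.
        result.append(acc / n if inverse else acc)
    return result


def round_trip_error(n):
    """Largest absolute difference between a signal and inverse(forward(signal))."""
    # Deterministic, non-trivial complex test signal (hypothetical choice).
    signal = [complex((3 * j) % 7 - 3, (5 * j) % 11 - 5) for j in range(n)]
    restored = dft(dft(signal), inverse=True)
    return max(abs(a - b) for a, b in zip(signal, restored))


# Hypothetical sizes: small radix-2/3/5/7 lengths plus the primes 11 and 13,
# all capped at 16 as described in the comment above.
SIZES = [2, 3, 5, 7, 11, 13, 16]
TOLERANCE = 1e-9  # assumed threshold for double-precision arithmetic

for size in SIZES:
    assert round_trip_error(size) < TOLERANCE, f"round trip failed for n={size}"
```

A backend that miscomputes one direction for a given size, as reported for pocl at size 19, would show up here as an error far above the tolerance.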

@hjmjohnson hjmjohnson requested a review from Leengit May 22, 2026 17:52