Skip to content

kernel-builder: improve GPU arch handling#579

Open
danieldk wants to merge 3 commits into
mainfrom
gpu-archs-cleanup
Open

kernel-builder: improve GPU arch handling#579
danieldk wants to merge 3 commits into
mainfrom
gpu-archs-cleanup

Conversation

@danieldk
Copy link
Copy Markdown
Member

Add a bunch of improvements to the GPU arch handling code:

  • Completely remove arch.nix. This file was originally used to for compiling Torch and for determining supported archs list. However, we repackage Torch binaries and we use our own arch list.
  • Completely separate CUDA and HIP CMake code generation. This is cleaner and fixes an issue where when compiling with ROCm, the CUDA code would also get called, since it was not guarded.
  • Improve per-kernel GPU arch reporting.

danieldk added 2 commits May 22, 2026 09:52
Add a bunch of improvements to the GPU arch handling code:

- Completely remove `arch.nix`. This file was originally used to for
  compiling Torch and for determining supported archs list. However,
  we repackage Torch binaries and we use our own arch list.
- Completely separate CUDA and HIP CMake code generation. This is
  cleaner and fixes an issue where when compiling with ROCm, the CUDA
  code would also get called, since it was not guarded.
- Improve per-kernel GPU arch reporting.
else()
set(_KERNEL_ARCHS "${CUDA_KERNEL_ARCHS}")
endif()
message(STATUS "CUDA kernel: ${KERNEL_NAME}, capabilities: ${_KERNEL_ARCHS}")
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only real change here, the rest is just moving the code out of the conditional block due to the CUDA/HIP split.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to split the CUDA and HIP functions into their own scripts and use them here or too much moving around?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that in the big CMake refactor a few months back, but in the end we need all those functions anyway (since the kernel may be multi-backend), so I decided to put them together and do the variable substitution in cuda.cmake, cpu.cmake, xpu.cmake, etc.

else()
set(_KERNEL_ARCHS "${ROCM_ARCHS}")
endif()
message(STATUS "ROCm kernel: ${KERNEL_NAME}, archs: ${_KERNEL_ARCHS}")
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only real change here, the rest is just moving the code out of the conditional block due to the CUDA/HIP split.

#
# Note: this is defined as a macro since it updates `CMAKE_CUDA_FLAGS`.
#
macro(override_gpu_arches GPU_ARCHES GPU_LANG GPU_SUPPORTED_ARCHES)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused now, remove.

Copy link
Copy Markdown
Member

@sayakpaul sayakpaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, left some questions.

@@ -0,0 +1,10 @@
if(GPU_LANG STREQUAL "HIP")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice separation.

else()
set(_KERNEL_ARCHS "${CUDA_KERNEL_ARCHS}")
endif()
message(STATUS "CUDA kernel: ${KERNEL_NAME}, capabilities: ${_KERNEL_ARCHS}")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to split the CUDA and HIP functions into their own scripts and use them here or too much moving around?

Comment on lines 96 to -108
cuda_capabilities.as_deref(),
None,
cuda_flags.as_deref(),
None,
cuda_minver.as_ref(),
),
Kernel::Rocm {
rocm_archs,
hip_flags,
..
} => (
None,
rocm_archs.as_deref(),
None,
hip_flags.as_deref(),
None,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice. Way less confusing.

Comment thread nix-builder/lib/extension/torch/arch.nix Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants