kernel-builder: improve GPU arch handling #579
Add a bunch of improvements to the GPU arch handling code:

- Completely remove `arch.nix`. This file was originally used for compiling Torch and for determining the list of supported archs. However, we repackage Torch binaries and use our own arch list.
- Completely separate CUDA and HIP CMake code generation. This is cleaner and fixes an issue where the CUDA code would also get called when compiling with ROCm, since it was not guarded.
- Improve per-kernel GPU arch reporting.
else()
  set(_KERNEL_ARCHS "${CUDA_KERNEL_ARCHS}")
endif()
message(STATUS "CUDA kernel: ${KERNEL_NAME}, capabilities: ${_KERNEL_ARCHS}")
This is the only real change here, the rest is just moving the code out of the conditional block due to the CUDA/HIP split.
Would it make sense to split the CUDA and HIP functions into their own scripts and use them here, or would that be too much moving around?
I considered that in the big CMake refactor a few months back, but in the end we need all those functions anyway (since the kernel may be multi-backend), so I decided to put them together and do the variable substitution in cuda.cmake, cpu.cmake, xpu.cmake, etc.
else()
  set(_KERNEL_ARCHS "${ROCM_ARCHS}")
endif()
message(STATUS "ROCm kernel: ${KERNEL_NAME}, archs: ${_KERNEL_ARCHS}")
This is the only real change here, the rest is just moving the code out of the conditional block due to the CUDA/HIP split.
#
# Note: this is defined as a macro since it updates `CMAKE_CUDA_FLAGS`.
#
macro(override_gpu_arches GPU_ARCHES GPU_LANG GPU_SUPPORTED_ARCHES)
sayakpaul left a comment
Awesome, left some questions.
@@ -0,0 +1,10 @@
if(GPU_LANG STREQUAL "HIP")
        cuda_capabilities.as_deref(),
        None,
        cuda_flags.as_deref(),
        None,
        cuda_minver.as_ref(),
    ),
    Kernel::Rocm {
        rocm_archs,
        hip_flags,
        ..
    } => (
        None,
        rocm_archs.as_deref(),
        None,
        hip_flags.as_deref(),
        None,
Very nice. Way less confusing.
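For readers without the full diff, here is a minimal, self-contained sketch of the pattern the hunk above appears to use: each backend variant fills only its own slots of a tuple and leaves the other backend's slots as `None`. Everything beyond the field names visible in the hunk is an assumption made for illustration: the simplified `Kernel` enum, the `String` type for `cuda_minver`, the `BackendSettings` alias, and the `backend_settings` function name are hypothetical and not taken from the PR.

```rust
/// Hypothetical, simplified stand-in for the builder's kernel description.
/// The real type presumably has more fields and variants.
#[derive(Debug)]
enum Kernel {
    Cuda {
        cuda_capabilities: Option<Vec<String>>,
        cuda_flags: Option<Vec<String>>,
        // Assumption: a plain string; the real code may use a version type.
        cuda_minver: Option<String>,
    },
    Rocm {
        rocm_archs: Option<Vec<String>>,
        hip_flags: Option<Vec<String>>,
    },
}

/// Borrowed view of the backend-specific settings (hypothetical alias).
/// Slots that do not apply to a backend are `None`.
type BackendSettings<'a> = (
    Option<&'a [String]>, // CUDA capabilities
    Option<&'a [String]>, // ROCm archs
    Option<&'a [String]>, // CUDA flags
    Option<&'a [String]>, // HIP flags
    Option<&'a String>,   // minimum CUDA version
);

/// Split a kernel's settings by backend so that CUDA and ROCm values
/// never share a slot (hypothetical helper name).
fn backend_settings(kernel: &Kernel) -> BackendSettings<'_> {
    match kernel {
        Kernel::Cuda {
            cuda_capabilities,
            cuda_flags,
            cuda_minver,
            ..
        } => (
            cuda_capabilities.as_deref(),
            None,
            cuda_flags.as_deref(),
            None,
            cuda_minver.as_ref(),
        ),
        Kernel::Rocm {
            rocm_archs,
            hip_flags,
            ..
        } => (
            None,
            rocm_archs.as_deref(),
            None,
            hip_flags.as_deref(),
            None,
        ),
    }
}
```

With this shape, a ROCm kernel can never leak a value into a CUDA slot (or the reverse), which is likely what makes the split easier to follow than a shared "archs" and "flags" pair.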