
add ROCm backend support #79

Open
alantsev wants to merge 1 commit into antirez:main from alantsev:rocm

@alantsev

add ROCm backend support

This PR adds ROCm support as a third GPU backend alongside the existing CUDA and Metal backends.

  • Makefile: Added GPU_BACKEND=rocm, which builds with hipcc and is configurable via ROCM_PATH and ROCM_ARCH. Existing CUDA and Metal builds are unchanged.
  • ds4_cuda.cu: Now builds for both CUDA and ROCm from a single source, selecting platform-specific code with #ifdef __HIP_PLATFORM_AMD__. The FULL_WARP_MASK and MASK_T definitions handle the warp-width difference (NVIDIA warps are 32 lanes; AMD wavefronts are 32 or 64 lanes depending on the architecture).
  • ds4_rocm.h (new): Compatibility header that maps CUDA runtime and cuBLAS calls to their HIP equivalents, plus AMD implementations of the __vcmpne4, __vsub4, and __dp4a intrinsics.
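For reviewing the AMD replacements in ds4_rocm.h, it can help to have the reference semantics of the three CUDA intrinsics written out. The sketch below is a plain-C host model, not the code from this PR, and the ref_* names are hypothetical. It follows the documented CUDA behavior: __vcmpne4 sets each byte to 0xff where the inputs differ and 0x00 where they match, __vsub4 subtracts byte-wise with wraparound and no borrow between bytes, and the signed-int __dp4a adds the four signed 8-bit products to an accumulator.

```c
#include <stdint.h>

/* Hypothetical host-side reference models of CUDA byte-SIMD intrinsics.
 * Not the ds4_rocm.h implementations; useful for checking a port. */

/* __vcmpne4: per byte, 0xff where a and b differ, 0x00 where equal. */
uint32_t ref_vcmpne4(uint32_t a, uint32_t b) {
    uint32_t r = 0;
    for (unsigned i = 0; i < 4; i++) {
        unsigned shift = 8u * i;
        uint32_t x = (a >> shift) & 0xffu;
        uint32_t y = (b >> shift) & 0xffu;
        if (x != y)
            r |= (uint32_t)0xffu << shift;
    }
    return r;
}

/* __vsub4: per-byte subtraction modulo 256, no saturation,
 * and no borrow propagating into the neighbouring byte. */
uint32_t ref_vsub4(uint32_t a, uint32_t b) {
    uint32_t r = 0;
    for (unsigned i = 0; i < 4; i++) {
        unsigned shift = 8u * i;
        uint32_t d = (((a >> shift) & 0xffu) - ((b >> shift) & 0xffu)) & 0xffu;
        r |= d << shift;
    }
    return r;
}

/* Sign-extend the low 8 bits of v without implementation-defined casts. */
static int32_t sext8(uint32_t v) {
    int32_t x = (int32_t)(v & 0xffu);
    return x > 127 ? x - 256 : x;
}

/* __dp4a (int overload): c + sum of the four signed 8-bit lane products. */
int32_t ref_dp4a(int32_t a, int32_t b, int32_t c) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    int32_t acc = c;
    for (unsigned i = 0; i < 4; i++) {
        unsigned shift = 8u * i;
        acc += sext8(ua >> shift) * sext8(ub >> shift);
    }
    return acc;
}
```

For example, ref_vsub4(0x00000000, 0x00000001) yields 0x000000ff (the low byte wraps, the others stay zero), and ref_dp4a(0x01020304, 0x010203FF, 5) treats the low byte 0xFF of the second argument as -1. Comparing a device implementation against models like these over random 32-bit inputs is a cheap way to catch byte-order or sign-extension mistakes.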

Build

make GPU_BACKEND=rocm
make GPU_BACKEND=rocm ROCM_ARCH=gfx942

Benchmarks on an AMD Radeon 8060S (gfx1151) with ROCm 6.x (throughput in tokens per second):

ctx_tokens | prefill_tps | gen_tps
     2,048 |       82.36 |    7.40
     8,192 |       80.59 |    7.03
    16,384 |       80.51 |    6.69
    32,768 |       79.62 |    6.10
    49,152 |       78.77 |    5.59
    65,536 |       78.04 |    5.24
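The table shows generation throughput degrading faster than prefill as context grows. A small helper over the numbers above makes the ratios explicit; the array and function names are illustrative only and are not part of ds4. By this arithmetic, at 65,536 tokens prefill keeps about 94.8% of its 2,048-token rate while generation keeps about 70.8%, which is roughly 135 ms per generated token at 2,048 tokens versus about 191 ms at 65,536.

```c
/* Rows copied from the benchmark table (Radeon 8060S, gfx1151, ROCm 6.x).
 * Names are illustrative, not taken from the ds4 code base. */
const int    ctx_tokens[]  = {2048, 8192, 16384, 32768, 49152, 65536};
const double prefill_tps[] = {82.36, 80.59, 80.51, 79.62, 78.77, 78.04};
const double gen_tps[]     = {7.40, 7.03, 6.69, 6.10, 5.59, 5.24};

/* Fraction of the shortest-context (2,048-token) throughput kept at row i. */
double retained(const double *tps, unsigned i) {
    return tps[i] / tps[0];
}

/* Average milliseconds per generated token at row i. */
double ms_per_token(unsigned i) {
    return 1000.0 / gen_tps[i];
}
```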

@alantsev changed the title from "Add ROCm (AMD GPU) backend support" to "add ROCm backend support" on May 11, 2026
@alantsev force-pushed the rocm branch 3 times, most recently from b5a7126 to 6e4c3f8, on May 12, 2026 10:33
@janimo commented May 12, 2026

@alantsev nice, tested on Ubuntu 26.04/Strix Halo with both the distro's ROCm 7.1 and AMD's 7.2.3 packages.
I'm getting slightly lower prefill numbers in the bench than you (~70 tokens/s).

@ucjonathan

There are a lot of these $2-3k machines out there; my friend has one. Glad you guys are doing the work to support this platform as well. I'm impressed that the community has gone from Apple Metal -> CUDA -> ROCm support in about a week. Someone will need to get a Strix machine into @antirez's hands before this gets merged, though. I really think this project needs some corporate-level hardware sponsorship from Apple / Nvidia / AMD / Intel etc.


3 participants