docs(gpu): add ROCm 7.x and RDNA 3.5 / Strix Halo (gfx1151) to GPU acceleration guide #9229

Open
walcz-de wants to merge 2 commits into mudler:master from walcz-de:pr/docs-gfx1151-rocm7

Conversation

@walcz-de
Contributor

@walcz-de walcz-de commented Apr 4, 2026

Summary

  • Adds a dedicated AMD RDNA 3.5 / Strix Halo (gfx1151) section with kernel boot params, required env vars, and a Docker Compose example
  • Updates ROCm requirements to mention ROCm 7.x alongside 6.x
  • Adds Ubuntu 24.04 to the tested OS list
  • Documents the AMDGPU_TARGETS default (11 rocWMMA-compatible architectures) and when to override it
  • Adds ROCm version column and gfx1151 / Radeon 8060S entry to the verified devices table
  • Fixes typo: "deditated" → "dedicated"

Background

AMD Ryzen AI MAX+ (Strix Halo) APUs with an integrated Radeon 8060S (gfx1151 / RDNA 3.5) expose up to 96 GB of unified VRAM to the GPU. They require ROCm 7.x (support for this GPU is not available in ROCm 6.x) and two runtime env vars to work correctly:

HSA_OVERRIDE_GFX_VERSION=11.5.1   # tells HSA runtime to use gfx1151 code objects
ROCBLAS_USE_HIPBLASLT=1            # prefer hipBLASLt over rocBLAS GEMM

The companion build PR (feat(rocm) #9230) explains the AMDGPU_TARGETS default in detail: the default covers the 11 GPU architectures supported by the rocWMMA library, which is required for -DGGML_HIP_ROCWMMA_FATTN=ON (~50% FlashAttention speedup). GPUs outside that list (gfx803, gfx900, gfx906, gfx1012, gfx1030–1032, gfx1103, gfx1152) can still use ROCm 7.x but require a custom build with the full arch list and without the rocWMMA optimisation.

Test plan

  • Docs render correctly in Hugo / the LocalAI docs site
  • Docker Compose example is functional on a Strix Halo system with the build-support PR applied

🤖 Generated with Claude Code

walcz-de added 2 commits April 4, 2026 17:01
- Fix typo: "deditated" → "dedicated", "ROCm6" → "ROCm"
- Add ROCm 7.x to requirements (alongside ROCm 6.x)
- Add Ubuntu 24.04 to tested OS list
- Add AMD Strix Halo / gfx1151 section with kernel params,
  required env vars (HSA_OVERRIDE_GFX_VERSION, ROCBLAS_USE_HIPBLASLT),
  and Docker Compose example
- Add gfx1151 to the list of compiled GPU targets
- Add ROCm version column to verified devices table
- Add gfx1151 / Radeon 8060S (ROCm 7.11.0) as verified device
…warning

- Add all 4 required env vars (HSA_OVERRIDE_GFX_VERSION, ROCBLAS_USE_HIPBLASLT,
  HSA_XNACK=1, HSA_ENABLE_SDMA=0) with descriptions in a table
- Fix Docker Compose example to use the ROCm 7.x image tag (-gpu-hipblas-rocm7),
  not the ROCm 6.x image
- Add explicit warning: GGML_CUDA_ENABLE_UNIFIED_MEMORY must NOT be set
  (even =0 activates hipMallocManaged due to getenv != nullptr check)
- Add --force-recreate note (docker restart does not update container env)
- Add tested hardware note (Geekom A9 Mega / Ryzen AI MAX+ 395)
@walcz-de
Contributor Author

walcz-de commented Apr 4, 2026

Context for reviewers

This is the documentation companion to #9230.

On the kernel boot parameters
The parameters listed (iommu=pt amdgpu.gttsize=126976 ttm.pages_limit=32505856) are what I'm running on the test machine (128 GB RAM, Ubuntu 24.04). Without iommu=pt the GPU is not accessible from the container. The gttsize/pages_limit values allocate most system RAM as GTT (Graphics Translation Table) memory — necessary to give the integrated GPU access to the full memory pool. Other Strix Halo systems with different RAM capacities may need different values; I've noted that in the wording.
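
For other RAM capacities, the two GTT values can be derived from a single target size. The sketch below is a rough helper, not part of this PR. It assumes amdgpu.gttsize is given in MiB, ttm.pages_limit counts 4 KiB pages, and that both should describe the same pool. Under those assumptions the values above correspond to a 124 GiB GTT pool. The function names and the choice of how much RAM to leave for the OS are hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
PAGE_SIZE = 4096  # assumption: 4 KiB kernel pages


def gtt_params(gtt_gib):
    """Return the two GTT-related boot parameter values for a pool of gtt_gib GiB."""
    gtt_bytes = gtt_gib * GIB
    return {
        "amdgpu.gttsize": gtt_bytes // MIB,         # assumed unit: MiB
        "ttm.pages_limit": gtt_bytes // PAGE_SIZE,  # assumed unit: pages
    }


def kernel_cmdline(gtt_gib):
    """Build the kernel command-line fragment, keeping iommu=pt as in the PR."""
    params = gtt_params(gtt_gib)
    return "iommu=pt " + " ".join(f"{k}={v}" for k, v in params.items())


if __name__ == "__main__":
    # 124 GiB reproduces the values used on the 128 GB test machine.
    print(kernel_cmdline(124))
```

On a machine with less RAM, calling kernel_cmdline with a smaller GiB figure gives a starting point; the right headroom for the host OS is something to verify on the actual system.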

On the environment variables
All four env vars listed (HSA_OVERRIDE_GFX_VERSION, ROCBLAS_USE_HIPBLASLT, HSA_XNACK, HSA_ENABLE_SDMA) are confirmed working and necessary on my hardware. I did not set them unconditionally in the Dockerfile (PR #9230) because HSA_OVERRIDE_GFX_VERSION=11.5.1 would break ROCm 6.x users with a different GPU — it overrides the hardware detection. Users building the ROCm 7.x image for gfx1151 need to set these themselves, or the maintainers can decide to bake them into the image for the -rocm7 variant.
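
Since the variables are left to the user, a small preflight check can catch a missing or mistyped one before the container starts serving. This is only a sketch: the expected values are the ones quoted in this PR, while the REQUIRED table and check_env function are hypothetical names and not part of LocalAI.

```python
import os

# Values as listed in this PR for gfx1151 on ROCm 7.x.
REQUIRED = {
    "HSA_OVERRIDE_GFX_VERSION": "11.5.1",
    "ROCBLAS_USE_HIPBLASLT": "1",
    "HSA_XNACK": "1",
    "HSA_ENABLE_SDMA": "0",
}


def check_env(env=None):
    """Return a list of human-readable problems; an empty list means all four match."""
    env = os.environ if env is None else env
    problems = []
    for name, expected in REQUIRED.items():
        actual = env.get(name)
        if actual is None:
            problems.append(f"{name} is not set (expected {expected})")
        elif actual != expected:
            problems.append(f"{name}={actual} (expected {expected})")
    return problems


if __name__ == "__main__":
    for problem in check_env():
        print("WARNING:", problem)
```

Running it inside the container (for example via docker exec) checks the environment the process actually sees, which matters because docker restart does not pick up changed compose values.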

On the GGML_CUDA_ENABLE_UNIFIED_MEMORY warning
This one bit me hard during development — the variable name says "CUDA" but it controls HIP's hipMallocManaged path too. The getenv() != nullptr check means setting it to 0 in a compose file is as bad as setting it to 1. Worth calling out explicitly for anyone trying to tune memory allocation.
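
The difference between a presence check and a value check is easy to see in a few lines. The following is a Python analogue of the getenv() != nullptr behaviour described above, not the actual ggml code; enabled_by_value is a hypothetical stricter check shown only for contrast.

```python
VAR = "GGML_CUDA_ENABLE_UNIFIED_MEMORY"


def enabled_by_presence(env):
    """Analogue of `getenv(VAR) != nullptr`: any value, even "0", enables it."""
    return env.get(VAR) is not None


def enabled_by_value(env):
    """Hypothetical stricter check: only a non-empty, non-"0" value enables it."""
    return env.get(VAR, "0") not in ("", "0")


if __name__ == "__main__":
    for env in ({}, {VAR: "0"}, {VAR: "1"}):
        print(env, enabled_by_presence(env), enabled_by_value(env))
```

With a presence check, the only safe configuration is to leave the variable out of the compose file entirely.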

@walcz-de walcz-de changed the title from "docs(gpu): add AMD Strix Halo / gfx1151 (ROCm 7.x) to GPU acceleration guide" to "docs(gpu): add ROCm 7.x and RDNA 3.5 / Strix Halo (gfx1151) to GPU acceleration guide" Apr 4, 2026
@walcz-de walcz-de marked this pull request as draft April 4, 2026 22:18
@walcz-de walcz-de marked this pull request as ready for review April 4, 2026 23:15