docs(gpu): add ROCm 7.x and RDNA 3.5 / Strix Halo (gfx1151) to GPU acceleration guide by walcz-de · Pull Request #9229 · mudler/LocalAI

walcz-de · 2026-04-04T15:07:13Z

Summary

- Adds a dedicated AMD RDNA 3.5 / Strix Halo (gfx1151) section with kernel boot params, required env vars, and a Docker Compose example
- Updates ROCm requirements to mention ROCm 7.x alongside 6.x
- Adds Ubuntu 24.04 to the tested OS list
- Documents the AMDGPU_TARGETS default (11 rocWMMA-compatible architectures) and when to override it
- Adds a ROCm version column and a gfx1151 / Radeon 8060S entry to the verified devices table
- Fixes typo: "deditated" → "dedicated"

Background

AMD Ryzen AI MAX+ (Strix Halo) APUs with an integrated Radeon 8060S (gfx1151 / RDNA 3.5) expose up to 96 GB of unified VRAM to the GPU, but they require ROCm 7.x (gfx1151 support is not available in ROCm 6.x) and two runtime env vars to work correctly:

HSA_OVERRIDE_GFX_VERSION=11.5.1   # tells HSA runtime to use gfx1151 code objects
ROCBLAS_USE_HIPBLASLT=1            # prefer hipBLASLt over rocBLAS GEMM

The companion build PR (feat(rocm) #9230) explains the AMDGPU_TARGETS default in detail: the default covers the 11 GPU architectures supported by the rocWMMA library, which is required for -DGGML_HIP_ROCWMMA_FATTN=ON (~50% FlashAttention speedup). GPUs outside that list (gfx803, gfx900, gfx906, gfx1012, gfx1030–1032, gfx1103, gfx1152) can still use ROCm 7.x but require a custom build with the full arch list and without the rocWMMA optimisation.

Test plan

- Docs render correctly in Hugo / the LocalAI docs site
- Docker Compose example is functional on a Strix Halo system with the build-support PR applied

🤖 Generated with Claude Code

- Fix typo: "deditated" → "dedicated", "ROCm6" → "ROCm"
- Add ROCm 7.x to requirements (alongside ROCm 6.x)
- Add Ubuntu 24.04 to tested OS list
- Add AMD Strix Halo / gfx1151 section with kernel params, required env vars (HSA_OVERRIDE_GFX_VERSION, ROCBLAS_USE_HIPBLASLT), and Docker Compose example
- Add gfx1151 to the list of compiled GPU targets
- Add ROCm version column to verified devices table
- Add gfx1151 / Radeon 8060S (ROCm 7.11.0) as verified device

…warning
- Add all 4 required env vars (HSA_OVERRIDE_GFX_VERSION, ROCBLAS_USE_HIPBLASLT, HSA_XNACK=1, HSA_ENABLE_SDMA=0) with descriptions in a table
- Fix Docker Compose example to use the ROCm 7.x image tag (-gpu-hipblas-rocm7), not the ROCm 6.x image
- Add explicit warning: GGML_CUDA_ENABLE_UNIFIED_MEMORY must NOT be set (even =0 activates hipMallocManaged due to getenv != nullptr check)
- Add --force-recreate note (docker restart does not update container env)
- Add tested hardware note (Geekom A9 Mega / Ryzen AI MAX+ 395)

walcz-de · 2026-04-04T15:44:04Z

Context for reviewers

This is the documentation companion to #9230.

On the kernel boot parameters
The parameters listed (iommu=pt amdgpu.gttsize=126976 ttm.pages_limit=32505856) are what I'm running on the test machine (128 GB RAM, Ubuntu 24.04). Without iommu=pt the GPU is not accessible from the container. The gttsize/pages_limit values allocate most system RAM as GTT (Graphics Translation Table) memory — necessary to give the integrated GPU access to the full memory pool. Other Strix Halo systems with different RAM capacities may need different values; I've noted that in the wording.
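For systems with a different RAM capacity, the two size values can be derived from a single target. The sketch below assumes that amdgpu.gttsize is expressed in MiB and that ttm.pages_limit counts 4 KiB pages; under that reading, both values above describe the same 124 GiB pool. The 124 GiB target is inferred from the numbers, and the function names are illustrative only.

```python
# Sketch: derive amdgpu.gttsize and ttm.pages_limit from one GTT target size.
# Assumptions (not stated in the PR): gttsize is in MiB, pages are 4 KiB.
MIB_PER_GIB = 1024
KIB_PER_MIB = 1024
PAGE_SIZE_KIB = 4


def gtt_boot_params(gtt_gib: int) -> dict:
    """Return both kernel parameters for a GTT pool of gtt_gib GiB."""
    gtt_mib = gtt_gib * MIB_PER_GIB
    pages = gtt_mib * KIB_PER_MIB // PAGE_SIZE_KIB
    return {"amdgpu.gttsize": gtt_mib, "ttm.pages_limit": pages}


def kernel_cmdline(gtt_gib: int) -> str:
    """Format the parameters the way they appear on the kernel command line."""
    params = gtt_boot_params(gtt_gib)
    return "iommu=pt " + " ".join(f"{k}={v}" for k, v in params.items())


# 124 GiB reproduces the values used on the 128 GB test machine.
print(kernel_cmdline(124))
# iommu=pt amdgpu.gttsize=126976 ttm.pages_limit=32505856
```

How much RAM to leave for the host on smaller machines is a judgment call; the test machine keeps 4 GiB of its 128 GB outside the GTT pool.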

On the environment variables
All four env vars listed (HSA_OVERRIDE_GFX_VERSION, ROCBLAS_USE_HIPBLASLT, HSA_XNACK, HSA_ENABLE_SDMA) are confirmed working and necessary on my hardware. I did not set them unconditionally in the Dockerfile (PR #9230) because HSA_OVERRIDE_GFX_VERSION=11.5.1 would break ROCm 6.x users with a different GPU — it overrides the hardware detection. Users building the ROCm 7.x image for gfx1151 need to set these themselves, or the maintainers can decide to bake them into the image for the -rocm7 variant.
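Since the variables are not baked into the image, a small preflight check can catch a missing or mistyped one before a model is loaded. This is a minimal sketch, not part of LocalAI: the check_env helper is hypothetical, and the expected values are simply the ones listed in this PR for gfx1151.

```python
import os

# Expected values as listed in this PR for gfx1151 on ROCm 7.x.
REQUIRED_ENV = {
    "HSA_OVERRIDE_GFX_VERSION": "11.5.1",
    "ROCBLAS_USE_HIPBLASLT": "1",
    "HSA_XNACK": "1",
    "HSA_ENABLE_SDMA": "0",
}


def check_env(env) -> list:
    """Return one message per variable that is missing or has another value."""
    problems = []
    for name, expected in REQUIRED_ENV.items():
        actual = env.get(name)
        if actual is None:
            problems.append(f"{name} is not set (expected {expected})")
        elif actual != expected:
            problems.append(f"{name}={actual} (expected {expected})")
    return problems


if __name__ == "__main__":
    # Run inside the container to inspect its actual environment.
    for problem in check_env(os.environ):
        print("WARNING:", problem)
```

Running it inside the container (rather than on the host) matters, because the container environment only changes when the container is recreated.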

On the GGML_CUDA_ENABLE_UNIFIED_MEMORY warning
This one bit me hard during development — the variable name says "CUDA" but it controls HIP's hipMallocManaged path too. The getenv() != nullptr check means setting it to 0 in a compose file is as bad as setting it to 1. Worth calling out explicitly for anyone trying to tune memory allocation.
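The difference between a presence check and a value check is easy to miss, so here is a small Python illustration of the behaviour described above. It models the getenv() != nullptr test only; it is not the actual ggml code, and the function name is made up for the example.

```python
VAR = "GGML_CUDA_ENABLE_UNIFIED_MEMORY"


def unified_memory_requested(env) -> bool:
    """Model of a C-style `getenv(VAR) != nullptr` test: presence, not value."""
    return env.get(VAR) is not None


# Any assignment counts as "set", including 0 and the empty string.
print(unified_memory_requested({VAR: "1"}))  # True
print(unified_memory_requested({VAR: "0"}))  # True
print(unified_memory_requested({VAR: ""}))   # True

# Only leaving the variable out entirely keeps the managed path off.
print(unified_memory_requested({}))          # False
```

In a compose file this means the variable has to be removed from the environment section, not set to 0.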

walcz-de added 2 commits April 4, 2026 17:01

walcz-de changed the title ~~docs(gpu): add AMD Strix Halo / gfx1151 (ROCm 7.x) to GPU acceleration guide~~ docs(gpu): add ROCm 7.x and RDNA 3.5 / Strix Halo (gfx1151) to GPU acceleration guide Apr 4, 2026

walcz-de marked this pull request as draft April 4, 2026 22:18

walcz-de marked this pull request as ready for review April 4, 2026 23:15

walcz-de wants to merge 2 commits into mudler:master from walcz-de:pr/docs-gfx1151-rocm7
