Skip to content

[Experiment] ROCm backend#2300

Open
NripeshN wants to merge 221 commits intoml-explore:mainfrom
NripeshN:rocm-support
Open

[Experiment] ROCm backend#2300
NripeshN wants to merge 221 commits intoml-explore:mainfrom
NripeshN:rocm-support

Conversation

@NripeshN
Copy link
Copy Markdown
Contributor

@NripeshN NripeshN commented Jun 16, 2025

Experiment with ROCm backend.

install MLX with ROCm backend using:

mkdir build && cd build
cmake -DMLX_BUILD_ROCM=ON \
      -DCMAKE_PREFIX_PATH=/opt/rocm \
      -DCMAKE_HIP_ARCHITECTURES="gfx90a;gfx1100" \
      ..
make -j$(nproc)

closes #2556

Inspired by @zcbenz

@NripeshN NripeshN changed the title [Experiment] ROCm backend initial push [Experiment] ROCm backend Jun 16, 2025
@lin72h
Copy link
Copy Markdown

lin72h commented Jun 17, 2025

What an unexpected and amazing surprise! I'm absolutely thrilled.

@NripeshN
Copy link
Copy Markdown
Contributor Author

@awni
What do you think of this PR? Does this have the potential to be merged into main? I can turn this PR from experimental to WIP if so.

@angeloskath
Copy link
Copy Markdown
Member

I think this is good to stay as an experiment branch for some time while we work on core and CUDA. I don't think we have the bandwidth to merge this for a few months at least. Sorry if this is disappointing @NripeshN I don't mean to discourage you working on it.

@akshat2602
Copy link
Copy Markdown

I would love to see the ROCm backend get more traction. The new AI series of processors by AMD have a similar advantage to Apple Silicon with unified memory and getting MLX to run on those processors would be neat.

@countradooku
Copy link
Copy Markdown

Stole my idea :(

@goniz
Copy link
Copy Markdown

goniz commented Jan 22, 2026

How is this even possible for such an awesome PR to be left like this?

Copilot AI review requested due to automatic review settings January 24, 2026 17:08
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds experimental ROCm backend support to MLX, enabling execution on AMD GPUs. The implementation mirrors the CUDA backend structure, providing HIP-based implementations of core operations, memory management, and device handling.

Changes:

  • Added ROCm backend infrastructure with device management, memory allocation, and stream handling
  • Implemented HIP kernels for unary, binary, ternary operations, reductions, normalization (softmax, layer_norm, rms_norm), RoPE, and sorting
  • Updated build system (CMake) to support ROCm compilation with configurable GPU architectures

Reviewed changes

Copilot reviewed 59 out of 59 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
CMakeLists.txt Added MLX_BUILD_ROCM option and ROCm library detection
mlx/CMakeLists.txt Integrated ROCm backend build configuration
mlx/device.cpp Added ROCm device availability checks
mlx/backend/rocm/*.hip HIP kernel implementations for various operations
mlx/backend/rocm/device.* ROCm device and stream management
mlx/backend/rocm/allocator.* ROCm-specific memory allocator using HIP unified memory
mlx/backend/rocm/worker.* Async task execution worker for stream synchronization
mlx/backend/rocm/utils.* HIP utility functions and error handling
mlx/backend/rocm/jit_module.* JIT compilation support using HIPRTC
mlx/backend/rocm/device/*.hpp Device-side utility functions and type definitions
mlx/backend/rocm/CMakeLists.txt ROCm backend build configuration

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread mlx/backend/rocm/softmax.hip Outdated
Comment thread mlx/backend/rocm/device.cpp Outdated
Comment thread mlx/backend/rocm/layer_norm.hip Outdated
Comment thread mlx/backend/rocm/rope.hip Outdated
Comment thread mlx/backend/rocm/softmax.hip Outdated
Comment thread mlx/backend/rocm/allocator.cpp Outdated
Comment thread CMakeLists.txt Outdated
Comment thread mlx/backend/rocm/binary.hip Outdated
Comment thread mlx/backend/rocm/rms_norm.hip Outdated
Comment thread mlx/backend/rocm/layer_norm.hip Outdated
…ather, scatter, logsumexp, random bits generation, and sorting. Introduce new kernels for efficient computation and integrate with existing ROCm utilities. Update CMake configuration to include new source files and dependencies. Enhance error handling and ensure compatibility with different data types. This commit significantly expands the functionality of the ROCm backend.
@goniz
Copy link
Copy Markdown

goniz commented Jan 24, 2026

👑👑👑

@NripeshN
Copy link
Copy Markdown
Contributor Author

Can anyone run

CMAKE_ARGS="-DMLX_BUILD_ROCM=ON" pip install -e .
CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DMLX_ROCM_ARCHITECTURES={based on your GPU}" pip install -e .

Replace {based on your GPU} with your GPU architecture

You can run

rocm-smi

to get your GPU information

@goniz
Copy link
Copy Markdown

goniz commented Jan 24, 2026

I'm getting this CMake error:

CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DMLX_ROCM_ARCHITECTURES=gfx1151" pip install -e .

      -- Configuring done (4.8s)
      CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
      Please set them or make sure they are set and tested correctly in the CMake files:
      /home/goniz/Work/mlx/LAPACK_INCLUDE_DIRS
         used as include directory in directory /home/goniz/Work/mlx
      
      CMake Error in CMakeLists.txt:
        HIP_ARCHITECTURES is empty for target "mlx".
      
      
      CMake Error in CMakeLists.txt:
        HIP_ARCHITECTURES is empty for target "mlx".
      
      
      -- Generating done (0.0s)
      CMake Generate step failed.  Build files cannot be regene
rated correctly.

Running on Strix Halo (gfx1151)

@NripeshN
Copy link
Copy Markdown
Contributor Author

I'm getting this CMake error:

CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DMLX_ROCM_ARCHITECTURES=gfx1151" pip install -e .
     -- Configuring done (4.8s)
     CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
     Please set them or make sure they are set and tested correctly in the CMake files:
     /home/goniz/Work/mlx/LAPACK_INCLUDE_DIRS
        used as include directory in directory /home/goniz/Work/mlx
     
     CMake Error in CMakeLists.txt:
       HIP_ARCHITECTURES is empty for target "mlx".
     
     
     CMake Error in CMakeLists.txt:
       HIP_ARCHITECTURES is empty for target "mlx".
     
     
     -- Generating done (0.0s)
     CMake Generate step failed.  Build files cannot be regene
rated correctly.

Running on Strix Halo (gfx1151)

Could you retry with the latest push please (p.s. keep your fingers crossed while it compiles, worked for me 138th time)😅

… string formatting, replacing fmt library usage. Remove unused event.cpp file. Update kernel name generation and parameter formatting for consistency.
@goniz
Copy link
Copy Markdown

goniz commented Jan 25, 2026

  Created wheel for mlx: filename=mlx-0.30.4.dev20260125+cadf18c1-0.editable-cp314-cp314-linux_x86_64.whl size=4722 sha256=72c664adbfc4fb9ec317522a8d83b84f85d599d08bd691d7fec3abfdb6f3a5e9
  Stored in directory: /tmp/pip-ephem-wheel-cache-nt7w6bq0/wheels/8a/63/d1/d7d629a5ff73457822bb71aa527c083674bb19ca314735cd05
Successfully built mlx
Installing collected packages: mlx
Successfully installed mlx-0.30.4.dev20260125+cadf18c1

Now what can I test? 😍

@goniz
Copy link
Copy Markdown

goniz commented Jan 25, 2026

I'm getting this:

ImportError: /home/goniz/Work/mlx/python/mlx/lib/libmlx.so: undefined symbol: _ZN3mlx4core11Convolution8eval_gpuERKSt6vectorINS0_5arrayESaIS3_EERS3_

@NripeshN
Copy link
Copy Markdown
Contributor Author

I'm getting this:

ImportError: /home/goniz/Work/mlx/python/mlx/lib/libmlx.so: undefined symbol: _ZN3mlx4core11Convolution8eval_gpuERKSt6vectorINS0_5arrayESaIS3_EERS3_

I forgot to test the Python build my bad, can you try it now?

Unfortunately I might not be able to help after it compiles, I don't have an AMD GPU to run tests😔 I've tried replicating most things from cuda, so hopefully it works

@Geramy
Copy link
Copy Markdown

Geramy commented Mar 31, 2026

Here is my attempt to build it on Pop!_OS 22.04 for use with my gfx1103 card

After installing CMake 4.3.1 from source (because a later version than 3.22.1 installed via apt was needed for this build)

$ which cmake; /usr/local/bin/cmake --version
/usr/local/bin/cmake
cmake version 4.3.1

CMake suite maintained and supported by Kitware (kitware.com/cmake).

Then, when I tried to build from with:

$ mkdir build && cd build
$ /usr/local/bin/cmake -DGGML_BLAS=ON \
                     -DGGML_BLAS_VENDOR=OpenBLAS \
                     -DMLX_BUILD_ROCM=ON \
                     -DCMAKE_PREFIX_PATH=/opt/rocm \
                     -DCMAKE_HIP_ARCHITECTURES="gfx1103" ..

I got this

-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Building MLX for x86_64 processor on Linux
-- Setting CMAKE_HIP_ARCHITECTURES to: gfx1103
-- Found HIP compiler: /opt/rocm/bin/hipcc
-- Accelerate not found, using default backend.
-- Looking for sgemm_
-- Looking for sgemm_ - not found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
CMake Error at /usr/local/share/cmake-4.3/Modules/FindPackageHandleStandardArgs.cmake:290 (message):
  Could NOT find BLAS (missing: BLAS_LIBRARIES)
Call Stack (most recent call first):
  /usr/local/share/cmake-4.3/Modules/FindPackageHandleStandardArgs.cmake:654 (_FPHSA_FAILURE_MESSAGE)
  /usr/local/share/cmake-4.3/Modules/FindBLAS.cmake:1509 (find_package_handle_standard_args)
  /usr/local/share/cmake-4.3/Modules/FindLAPACK.cmake:275 (find_package)
  /usr/local/share/cmake-4.3/Modules/FindLAPACK.cmake:321 (_lapack_find_dependency)
  CMakeLists.txt:326 (find_package)

After installing OpenBLAS via:

sudo apt install libopenblas-dev

I tried again:

$ /usr/local/bin/cmake -DGGML_BLAS=ON \
                                         -DGGML_BLAS_VENDOR=OpenBLAS \
                                         -DMLX_BUILD_ROCM=ON \
                                         -DCMAKE_PREFIX_PATH=/opt/rocm \
                                         -DCMAKE_HIP_ARCHITECTURES="gfx1103" ..
-- Building MLX for x86_64 processor on Linux
-- Setting CMAKE_HIP_ARCHITECTURES to: gfx1103
-- Found HIP compiler: /opt/rocm/bin/hipcc
-- Accelerate not found, using default backend.
-- Looking for sgemm_
-- Looking for sgemm_ - found
-- Found BLAS: /usr/lib/x86_64-linux-gnu/libopenblas.so
-- Looking for cheev_
-- Looking for cheev_ - found
-- Found LAPACK: /usr/lib/x86_64-linux-gnu/libopenblas.so;-lm;-ldl
-- Lapack lib /usr/lib/x86_64-linux-gnu/libopenblas.so-lm-ldl
-- Lapack include LAPACK_INCLUDE_DIRS-NOTFOUND
-- Blas lib /usr/lib/x86_64-linux-gnu/libopenblas.so
-- Blas include /usr/include/x86_64-linux-gnu
-- Downloading json
-- Using the multi-header code from _deps/json-src/include/
-- Downloading gguflib
CMake Warning (dev) at /opt/rocm/lib/cmake/hip/hip-config-amd.cmake:86 (message):
  amdgpu-arch failed with error Failed to get device count
Call Stack (most recent call first):
  /opt/rocm/lib/cmake/hip/hip-config.cmake:149 (include)
  mlx/backend/rocm/CMakeLists.txt:8 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

and the output is 
CMake Warning (dev) at /opt/rocm/lib/cmake/hip/hip-config-amd.cmake:86 (message):
  amdgpu-arch failed with error Failed to get device count
Call Stack (most recent call first):
  /opt/rocm/lib/cmake/hip/hip-config.cmake:149 (include)
  /usr/local/share/cmake-4.3/Modules/CMakeFindDependencyMacro.cmake:93 (find_package)
  /usr/local/share/cmake-4.3/Modules/CMakeFindDependencyMacro.cmake:125 (__find_dependency_common)
  /opt/rocm/lib/cmake/rocblas/rocblas-config.cmake:90 (find_dependency)
  mlx/backend/rocm/CMakeLists.txt:9 (find_package)
This warning is for project developers.  Use -Wno-dev to suppress it.

and the output is 
CMake Error at mlx/backend/rocm/CMakeLists.txt:13 (find_package):
  Could not find a package configuration file provided by "rocwmma" with any
  of the following names:

    rocwmma.cps
    rocwmmaConfig.cmake
    rocwmma-config.cmake

  Add the installation prefix of "rocwmma" to CMAKE_PREFIX_PATH or set
  "rocwmma_DIR" to a directory containing one of the above files.  If
  "rocwmma" provides a separate development package or SDK, be sure it has
  been installed.

You need to install rocm 7.2 at least I believe.Thats probably your primary problem and secondary is to specify the HIP_PLATFORM=amd I believe.

Geramy added 2 commits March 31, 2026 16:23
…, SliceUpdate donation

Arch detection and tuning:
- RocmArchTier enum: RDNA 2/3/3.5/4/CDNA with fine-grained gfx detection
- HWInfo struct: CU count, SIMDs, L2 size, WMMA capability from hipDeviceProp
- ArchTuning: per-arch kernel parameters (QMV tile_n, crossover thresholds)
- Runtime TILE_N for qmv_tiled_kernel via kernel argument (no template bloat)
- MLX_ROCM_QMV_TILE_N env var for manual tuning

WMMA flash attention:
- flash_attention_wmma.hip: rocwmma 16x16x16 tiled kernel for bf16/fp16
- Dispatches for prefill (qL > 4) on supported head dims (64/128/256)
- Integrated into ScaledDotProductAttention dispatch

Arena allocator (DecodeArena):
- Deterministic bump allocator for HIP Graph capture
- Hooked into RocmAllocator malloc/free path
- Proven: 18 KB per decode step with stable addresses

SliceUpdate donation:
- Skip base array copy when input has unique ownership (refcount==1)
- Helps prefill path (200 donated during prompt processing)

GPU memcpy:
- mlx_gpu_memcpy_async (extern C) for direct KV cache writes
- gpu_arena/gpu_graph wrapper functions for engine integration
…V weight loads

custom_kernel.cpp: The hip_kernel lambda's 8th parameter was named
`verbose` but the CustomKernelFunction typedef passes
ensure_row_contiguous in that slot. When gated_delta.cpp called with
ensure_row_contiguous=true, it triggered a cout dump of the full
compiled kernel source — polluting stdout and appearing as model
output on every MoE inference (Qwen3.5-35B, Coder-Next).

qdequant.hpp: Add load_weight_vec<BITS>() helper that loads PPT
uint32 words via a single wide vector load (uint2 for 4-bit,
uint4 for 8-bit) instead of PPT scalar loads.

qmv_tiled_kernel.hip: Use load_weight_vec in both qmv_tiled_kernel
and gather_qmv_tiled_kernel with a warp-uniform branch to separate
the vectorized fast path from the bounds-checked tail path.
@chimezie
Copy link
Copy Markdown
Contributor

chimezie commented Apr 1, 2026

After ensuring I have ROCm, I'm still unable to build (within a rocm/dev-ubuntu-22.04:7.1.1-complete container) after following these steps to ensure it is installed properly:

# cmake -DGGML_BLAS=ON \
                -DGGML_BLAS_VENDOR=OpenBLAS \
                -DMLX_BUILD_ROCM=ON \
                -DCMAKE_PREFIX_PATH=/opt/rocm \
                -DCMAKE_HIP_ARCHITECTURES="gfx1103" ..
-- Building MLX for x86_64 processor on Linux
-- Setting CMAKE_HIP_ARCHITECTURES to: gfx1103
-- Found HIP compiler: /opt/rocm/bin/hipcc
-- Accelerate not found, using default backend.
-- Lapack lib /usr/lib/x86_64-linux-gnu/libopenblas.so-lm-ldl
-- Lapack include LAPACK_INCLUDE_DIRS-NOTFOUND
-- Blas lib /usr/lib/x86_64-linux-gnu/libopenblas.so
-- Blas include /usr/include/x86_64-linux-gnu
-- Downloading json
-- Using the multi-header code from /home/chimezie/Projects/mlx-lm-rocm-support/build/_deps/json-src/include/
-- Downloading gguflib
CMake Error at mlx/backend/rocm/CMakeLists.txt:13 (find_package):
  Could not find a package configuration file provided by "rocwmma" with any
  of the following names:

    rocwmma.cps
    rocwmmaConfig.cmake
    rocwmma-config.cmake

  Add the installation prefix of "rocwmma" to CMAKE_PREFIX_PATH or set
  "rocwmma_DIR" to a directory containing one of the above files.  If
  "rocwmma" provides a separate development package or SDK, be sure it has
  been installed.

-- Configuring incomplete, errors occurred!

@Geramy
Copy link
Copy Markdown

Geramy commented Apr 1, 2026

rocwmma

try this
sudo apt-get install rocwmma-dev

@chimezie
Copy link
Copy Markdown
Contributor

chimezie commented Apr 2, 2026

Still no joy:

# sudo apt install libblas-dev liblapack-dev git rocwmma-dev        
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libblas-dev is already the newest version (3.10.0-2ubuntu1).
liblapack-dev is already the newest version (3.10.0-2ubuntu1).
git is already the newest version (1:2.34.1-1ubuntu1.17).
rocwmma-dev is already the newest version (2.1.0.70101-38~22.04).
0 upgraded, 0 newly installed, 0 to remove and 57 not upgraded.
# cmake -DGGML_BLAS=ON \
                -DGGML_BLAS_VENDOR=OpenBLAS \
                -DMLX_BUILD_ROCM=ON \
                -DCMAKE_PREFIX_PATH=/opt/rocm \
               -DCMAKE_HIP_ARCHITECTURES="gfx1103" ..
-- Building MLX for x86_64 processor on Linux
-- Setting CMAKE_HIP_ARCHITECTURES to: gfx1103
-- Found HIP compiler: /opt/rocm/bin/hipcc
-- Accelerate not found, using default backend.
-- Lapack lib /usr/lib/x86_64-linux-gnu/liblapack.so/usr/lib/x86_64-linux-gnu/libblas.so
-- Lapack include LAPACK_INCLUDE_DIRS-NOTFOUND
-- Blas lib /usr/lib/x86_64-linux-gnu/libblas.so
-- Blas include /usr/include/x86_64-linux-gnu
-- Downloading json
-- Using the multi-header code from _deps/json-src/include/
-- Downloading gguflib
CMake Error at mlx/backend/rocm/CMakeLists.txt:13 (find_package):
  Could not find a package configuration file provided by "rocwmma" with any
  of the following names:

    rocwmma.cps
    rocwmmaConfig.cmake
    rocwmma-config.cmake

  Add the installation prefix of "rocwmma" to CMAKE_PREFIX_PATH or set
  "rocwmma_DIR" to a directory containing one of the above files.  If
  "rocwmma" provides a separate development package or SDK, be sure it has
  been installed.

-- Configuring incomplete, errors occurred!

@Geramy
Copy link
Copy Markdown

Geramy commented Apr 2, 2026

Still no joy:

# sudo apt install libblas-dev liblapack-dev git rocwmma-dev        
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libblas-dev is already the newest version (3.10.0-2ubuntu1).
liblapack-dev is already the newest version (3.10.0-2ubuntu1).
git is already the newest version (1:2.34.1-1ubuntu1.17).
rocwmma-dev is already the newest version (2.1.0.70101-38~22.04).
0 upgraded, 0 newly installed, 0 to remove and 57 not upgraded.
# cmake -DGGML_BLAS=ON \
                -DGGML_BLAS_VENDOR=OpenBLAS \
                -DMLX_BUILD_ROCM=ON \
                -DCMAKE_PREFIX_PATH=/opt/rocm \
               -DCMAKE_HIP_ARCHITECTURES="gfx1103" ..
-- Building MLX for x86_64 processor on Linux
-- Setting CMAKE_HIP_ARCHITECTURES to: gfx1103
-- Found HIP compiler: /opt/rocm/bin/hipcc
-- Accelerate not found, using default backend.
-- Lapack lib /usr/lib/x86_64-linux-gnu/liblapack.so/usr/lib/x86_64-linux-gnu/libblas.so
-- Lapack include LAPACK_INCLUDE_DIRS-NOTFOUND
-- Blas lib /usr/lib/x86_64-linux-gnu/libblas.so
-- Blas include /usr/include/x86_64-linux-gnu
-- Downloading json
-- Using the multi-header code from _deps/json-src/include/
-- Downloading gguflib
CMake Error at mlx/backend/rocm/CMakeLists.txt:13 (find_package):
  Could not find a package configuration file provided by "rocwmma" with any
  of the following names:

    rocwmma.cps
    rocwmmaConfig.cmake
    rocwmma-config.cmake

  Add the installation prefix of "rocwmma" to CMAKE_PREFIX_PATH or set
  "rocwmma_DIR" to a directory containing one of the above files.  If
  "rocwmma" provides a separate development package or SDK, be sure it has
  been installed.

-- Configuring incomplete, errors occurred!

Can you try this out?

  1. Add the AMD ROCm GPG Key
    First, securely add AMD's package signing key:

Bash
sudo mkdir --parents --mode=0755 /etc/apt/keyrings
wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/rocm.gpg
2. Add the ROCm 7.2 Repository & Set Pinning
Pop!_OS 24.04 uses the Ubuntu "Noble" base, so we use the noble repository. We also need to set the apt pin priority (which System76 explicitly recommends) so your system pulls ROCm packages properly without overriding critical Pop!_OS packages:

Bash
sudo tee /etc/apt/sources.list.d/rocm.list << EOF
deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/7.2 noble main
EOF

sudo tee /etc/apt/preferences.d/rocm-pin-600 << EOF
Package: *
Pin: release o=repo.radeon.com
Pin-Priority: 600
EOF

sudo apt update
3. Install ROCm (User-Space Only)
Now, install the core ROCm compute libraries and the specific development headers MLX needs. By just installing rocm, apt pulls the HIP runtime and math libraries without touching your kernel:

Bash
sudo apt install rocm
sudo apt install rocwmma-dev hipblaslt-dev rocblas-dev rocm-core rocm-llvm
4. Assign GPU Permissions
For the ROCm libraries to talk to your GPU without sudo, your user needs to be in the render and video groups:

Bash
sudo usermod -a -G render,video $LOGNAME

If you dont like this method you could install the rocm manually with my bash script which brings in the newest. 7.12 bleeding edge.

FILTER_GFX=gfx1151 is set for me I have set it to gfx1103 for you already.

#!/bin/bash

# --- Configuration ---
BUCKET_URL="https://therock-nightly-tarball.s3.amazonaws.com"
INSTALL_DIR="/opt/rocm"
TEMP_DIR="/tmp/rocm_install_$(date +%s)"
FILTER_GFX="gfx110X"
FILTER_VER="7.12"
SERVICE_NAME="lemonade-server"

# Detect the real user (who ran sudo) to ensure build artifacts aren't owned by root
REAL_USER=${SUDO_USER:-$USER}
USER_HOME=$(getent passwd "$REAL_USER" | cut -d: -f6)
PROJECT_DIR="$USER_HOME/Documents/Programming/llama.rocm/llama.cpp"

# --- Functions ---
echo_step() { echo -e "\n\033[1;34m[INFO]\033[0m $1"; }
echo_err()  { echo -e "\n\033[1;31m[ERROR]\033[0m $1"; }

# Check Root
if [ "$EUID" -ne 0 ]; then 
    echo_err "Please run as root (sudo ./script.sh)"
    exit 1
fi

# ==========================================
# PART 1: Install ROCm (Requires Root)
# ==========================================

echo_step "Fetching release list from $BUCKET_URL..."
FILE_LIST=$(curl -s "$BUCKET_URL")

LATEST_FILE=$(echo "$FILE_LIST" | grep -oP '(?<=<Key>).*?(?=</Key>)' | grep "$FILTER_GFX" | grep "$FILTER_VER" | sort -r | head -n 1)

if [ -z "$LATEST_FILE" ]; then
    echo_err "No files found matching filters."
    exit 1
fi

echo_step "Found latest build: $LATEST_FILE"
mkdir -p "$TEMP_DIR"
cd "$TEMP_DIR"

echo_step "Downloading..."
curl -O "$BUCKET_URL/$LATEST_FILE"

if [ -d "$INSTALL_DIR" ]; then
    echo_step "Removing old ROCm install..."
    rm -rf "$INSTALL_DIR"
fi
mkdir -p "$INSTALL_DIR"

echo_step "Extracting to $INSTALL_DIR..."
tar -xvf "$LATEST_FILE" -C "$INSTALL_DIR" --strip-components=1 > /dev/null
rm -rf "$TEMP_DIR"

echo_step "Done! ROCm updated."

@Geramy
Copy link
Copy Markdown

Geramy commented Apr 2, 2026

I'm getting this:

ImportError: /home/goniz/Work/mlx/python/mlx/lib/libmlx.so: undefined symbol: _ZN3mlx4core11Convolution8eval_gpuERKSt6vectorINS0_5arrayESaIS3_EERS3_

I forgot to test the Python build my bad, can you try it now?

Unfortunately I might not be able to help after it compiles, I don't have an AMD GPU to run tests😔 I've tried replicating most things from cuda, so hopefully it works

Do you know what we have to do in order to get this merged into MLX?

@Geramy
Copy link
Copy Markdown

Geramy commented Apr 2, 2026

@zcbenz Thank you for helping me on the other issue regarding float8 I was hoping you could lend me a hand here on this PR and let me know if there is anyone at MLX that can help give us guidance on getting this PR to land in main. Whatever is needed let me know and i'll do what I can. If need be I'll maintain the ROCm backend.

@chimezie
Copy link
Copy Markdown
Contributor

chimezie commented Apr 3, 2026

I installed Cmake 4.3.1 and ran rocm-7.12.sh with a filter of FILTER_GFX="gfx110X" in a clean container of the rocm/dev-ubuntu-22.04 image

Then make -j$(nproc) fails (after a successful run of cmake) with the following error:

# make -j$(nproc)
[  1%] Built target cpu_compiled_preamble
[  1%] Compiling HIP source /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip
[  1%] Built target mlx_version
[  3%] Built target gguflib
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:3:
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/device.h:5:
In file included from /path/to/mlx-lm-rocm-support/mlx/array.h:13:
In file included from /path/to/mlx-lm-rocm-support/mlx/event.h:8:
In file included from /path/to/mlx-lm-rocm-support/mlx/stream.h:8:
/path/to/mlx-lm-rocm-support/mlx/device.h:28:42: warning: defaulted comparison operators are a C++20 extension [-Wc++20-extensions]
   28 |   bool operator==(const Device&) const = default;
      |                                          ^
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:3:
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/device.h:5:
In file included from /path/to/mlx-lm-rocm-support/mlx/array.h:13:
In file included from /path/to/mlx-lm-rocm-support/mlx/event.h:8:
/path/to/mlx-lm-rocm-support/mlx/stream.h:18:42: warning: defaulted comparison operators are a C++20 extension [-Wc++20-extensions]
   18 |   bool operator==(const Stream&) const = default;
      |                                          ^
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:3:
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/device.h:15:
In file included from /opt/rocm/include/thrust/execution_policy.h:24:
In file included from /opt/rocm/include/thrust/detail/config.h:22:
In file included from /opt/rocm/include/thrust/detail/config/config.h:23:
In file included from /opt/rocm/include/thrust/detail/config/libcxx.h:44:
In file included from /opt/rocm/include/cuda/std/version:31:
In file included from /opt/rocm/include/cuda/std/detail/__config:67:
/opt/rocm/include/cuda/std/detail/libcxx/include/__config:371:12: warning: Assuming 100 MHz realtime clock rate (TSC) for gfx1100/gfx1101 (according to the RDNA3 ISA). Timing-related APIs (e.g., chrono) or sleep instructions may behave incorrectly! [-W#warnings]
  371 | #          warning Assuming 100 MHz realtime clock rate (TSC) for gfx1100/gfx1101 (according to the RDNA3 ISA). Timing-related APIs (e.g., chrono) or sleep instructions may behave incorrectly!
      |            ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:14:25: error: redefinition of 'operator new'
   14 | __device__ inline void* operator new(size_t, void* p) noexcept { return p; }
      |                         ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:95:25: note: previous definition is here
   95 | __device__ inline void *operator new(__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT {
      |                         ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:15:25: error: redefinition of 'operator new[]'
   15 | __device__ inline void* operator new[](size_t, void* p) noexcept { return p; }
      |                         ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:98:25: note: previous definition is here
   98 | __device__ inline void *operator new[](__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT {
      |                         ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:16:24: error: redefinition of 'operator delete'
   16 | __device__ inline void operator delete(void*, void*) noexcept {}
      |                        ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:102:24: note: previous definition is here
  102 | __device__ inline void operator delete(void *, void *) CUDA_NOEXCEPT {}
      |                        ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:17:24: error: redefinition of 'operator delete[]'
   17 | __device__ inline void operator delete[](void*, void*) noexcept {}
      |                        ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:103:24: note: previous definition is here
  103 | __device__ inline void operator delete[](void *, void *) CUDA_NOEXCEPT {}
      |                        ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:49:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<bool>>' requested here
   49 |     MLX_INTERNAL_DTYPE_SWITCH_CASE(bool_, bool);
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<signed char>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:21:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   21 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(int8, int8_t);     \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<short>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:22:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   22 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(int16, int16_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<int>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:23:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   23 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(int32, int32_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<long>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:24:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   24 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(int64, int64_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned char>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:25:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   25 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(uint8, uint8_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned short>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:26:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   26 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(uint16, uint16_t); \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned int>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:27:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   27 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(uint32, uint32_t); \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned long>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:28:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   28 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(uint64, uint64_t)
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<mlx::core::_MLX_Float16>>' requested here
   51 |     MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:31:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
   31 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(float16, float16_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<mlx::core::_MLX_BFloat16>>' requested here
   51 |     MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:32:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
   32 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(bfloat16, bfloat16_t); \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<float>>' requested here
   51 |     MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:33:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
   33 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(float32, float);       \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<double>>' requested here
   51 |     MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:34:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
   34 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(float64, double)
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 warnings and 4 errors generated when compiling for gfx1103.
failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx1103 -O3  -c -x hip /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip -o "/path/to/mlx-lm-rocm-support/build/mlx/backend/rocm/hip_objs/sort.o" -fPIC -DMLX_USE_ROCM -I/path/to/mlx-lm-rocm-support -I/opt/rocm/include -I/usr/include/c++/11 -I/usr/include/x86_64-linux-gnu/c++/11 -I/usr/include/c++/11/backward -I/usr/include/x86_64-linux-gnu -I/usr/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -I/opt/rocm/include -std=c++17 -parallel-jobs=8
make[2]: *** [mlx/backend/rocm/CMakeFiles/mlx_rocm_kernels_lib.dir/build.make:254: mlx/backend/rocm/hip_objs/sort.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1549: mlx/backend/rocm/CMakeFiles/mlx_rocm_kernels_lib.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

Geramy added 2 commits April 3, 2026 17:55
ROCm 7.12 (clang 22) provides __device__ placement new/delete
via cuda_wrappers/new, causing redefinition errors. Guard with
__CLANG_CUDA_WRAPPERS_NEW so the manual definitions are only
compiled on older ROCm versions that lack them.
@Geramy
Copy link
Copy Markdown

Geramy commented Apr 4, 2026

I installed Cmake 4.3.1 and ran rocm-7.12.sh with a filter of FILTER_GFX="gfx110X" in a clean container of the rocm/dev-ubuntu-22.04 image

Then make -j$(nproc) fails (after a successful run of cmake) with the following error:

# make -j$(nproc)
[  1%] Built target cpu_compiled_preamble
[  1%] Compiling HIP source /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip
[  1%] Built target mlx_version
[  3%] Built target gguflib
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:3:
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/device.h:5:
In file included from /path/to/mlx-lm-rocm-support/mlx/array.h:13:
In file included from /path/to/mlx-lm-rocm-support/mlx/event.h:8:
In file included from /path/to/mlx-lm-rocm-support/mlx/stream.h:8:
/path/to/mlx-lm-rocm-support/mlx/device.h:28:42: warning: defaulted comparison operators are a C++20 extension [-Wc++20-extensions]
   28 |   bool operator==(const Device&) const = default;
      |                                          ^
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:3:
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/device.h:5:
In file included from /path/to/mlx-lm-rocm-support/mlx/array.h:13:
In file included from /path/to/mlx-lm-rocm-support/mlx/event.h:8:
/path/to/mlx-lm-rocm-support/mlx/stream.h:18:42: warning: defaulted comparison operators are a C++20 extension [-Wc++20-extensions]
   18 |   bool operator==(const Stream&) const = default;
      |                                          ^
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:3:
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/device.h:15:
In file included from /opt/rocm/include/thrust/execution_policy.h:24:
In file included from /opt/rocm/include/thrust/detail/config.h:22:
In file included from /opt/rocm/include/thrust/detail/config/config.h:23:
In file included from /opt/rocm/include/thrust/detail/config/libcxx.h:44:
In file included from /opt/rocm/include/cuda/std/version:31:
In file included from /opt/rocm/include/cuda/std/detail/__config:67:
/opt/rocm/include/cuda/std/detail/libcxx/include/__config:371:12: warning: Assuming 100 MHz realtime clock rate (TSC) for gfx1100/gfx1101 (according to the RDNA3 ISA). Timing-related APIs (e.g., chrono) or sleep instructions may behave incorrectly! [-W#warnings]
  371 | #          warning Assuming 100 MHz realtime clock rate (TSC) for gfx1100/gfx1101 (according to the RDNA3 ISA). Timing-related APIs (e.g., chrono) or sleep instructions may behave incorrectly!
      |            ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:14:25: error: redefinition of 'operator new'
   14 | __device__ inline void* operator new(size_t, void* p) noexcept { return p; }
      |                         ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:95:25: note: previous definition is here
   95 | __device__ inline void *operator new(__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT {
      |                         ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:15:25: error: redefinition of 'operator new[]'
   15 | __device__ inline void* operator new[](size_t, void* p) noexcept { return p; }
      |                         ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:98:25: note: previous definition is here
   98 | __device__ inline void *operator new[](__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT {
      |                         ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:16:24: error: redefinition of 'operator delete'
   16 | __device__ inline void operator delete(void*, void*) noexcept {}
      |                        ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:102:24: note: previous definition is here
  102 | __device__ inline void operator delete(void *, void *) CUDA_NOEXCEPT {}
      |                        ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:17:24: error: redefinition of 'operator delete[]'
   17 | __device__ inline void operator delete[](void*, void*) noexcept {}
      |                        ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:103:24: note: previous definition is here
  103 | __device__ inline void operator delete[](void *, void *) CUDA_NOEXCEPT {}
      |                        ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:49:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<bool>>' requested here
   49 |     MLX_INTERNAL_DTYPE_SWITCH_CASE(bool_, bool);
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<signed char>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:21:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   21 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(int8, int8_t);     \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<short>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:22:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   22 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(int16, int16_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<int>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:23:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   23 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(int32, int32_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<long>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:24:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   24 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(int64, int64_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned char>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:25:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   25 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(uint8, uint8_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned short>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:26:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   26 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(uint16, uint16_t); \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned int>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:27:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   27 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(uint32, uint32_t); \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned long>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:28:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   28 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(uint64, uint64_t)
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<mlx::core::_MLX_Float16>>' requested here
   51 |     MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:31:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
   31 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(float16, float16_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<mlx::core::_MLX_BFloat16>>' requested here
   51 |     MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:32:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
   32 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(bfloat16, bfloat16_t); \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<float>>' requested here
   51 |     MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:33:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
   33 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(float32, float);       \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<double>>' requested here
   51 |     MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:34:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
   34 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(float64, double)
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 warnings and 4 errors generated when compiling for gfx1103.
failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx1103 -O3  -c -x hip /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip -o "/path/to/mlx-lm-rocm-support/build/mlx/backend/rocm/hip_objs/sort.o" -fPIC -DMLX_USE_ROCM -I/path/to/mlx-lm-rocm-support -I/opt/rocm/include -I/usr/include/c++/11 -I/usr/include/x86_64-linux-gnu/c++/11 -I/usr/include/c++/11/backward -I/usr/include/x86_64-linux-gnu -I/usr/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -I/opt/rocm/include -std=c++17 -parallel-jobs=8
make[2]: *** [mlx/backend/rocm/CMakeFiles/mlx_rocm_kernels_lib.dir/build.make:254: mlx/backend/rocm/hip_objs/sort.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1549: mlx/backend/rocm/CMakeFiles/mlx_rocm_kernels_lib.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

I installed Cmake 4.3.1 and ran rocm-7.12.sh with a filter of FILTER_GFX="gfx110X" in a clean container of the rocm/dev-ubuntu-22.04 image

Then make -j$(nproc) fails (after a successful run of cmake) with the following error:

# make -j$(nproc)
[  1%] Built target cpu_compiled_preamble
[  1%] Compiling HIP source /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip
[  1%] Built target mlx_version
[  3%] Built target gguflib
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:3:
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/device.h:5:
In file included from /path/to/mlx-lm-rocm-support/mlx/array.h:13:
In file included from /path/to/mlx-lm-rocm-support/mlx/event.h:8:
In file included from /path/to/mlx-lm-rocm-support/mlx/stream.h:8:
/path/to/mlx-lm-rocm-support/mlx/device.h:28:42: warning: defaulted comparison operators are a C++20 extension [-Wc++20-extensions]
   28 |   bool operator==(const Device&) const = default;
      |                                          ^
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:3:
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/device.h:5:
In file included from /path/to/mlx-lm-rocm-support/mlx/array.h:13:
In file included from /path/to/mlx-lm-rocm-support/mlx/event.h:8:
/path/to/mlx-lm-rocm-support/mlx/stream.h:18:42: warning: defaulted comparison operators are a C++20 extension [-Wc++20-extensions]
   18 |   bool operator==(const Stream&) const = default;
      |                                          ^
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:3:
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/device.h:15:
In file included from /opt/rocm/include/thrust/execution_policy.h:24:
In file included from /opt/rocm/include/thrust/detail/config.h:22:
In file included from /opt/rocm/include/thrust/detail/config/config.h:23:
In file included from /opt/rocm/include/thrust/detail/config/libcxx.h:44:
In file included from /opt/rocm/include/cuda/std/version:31:
In file included from /opt/rocm/include/cuda/std/detail/__config:67:
/opt/rocm/include/cuda/std/detail/libcxx/include/__config:371:12: warning: Assuming 100 MHz realtime clock rate (TSC) for gfx1100/gfx1101 (according to the RDNA3 ISA). Timing-related APIs (e.g., chrono) or sleep instructions may behave incorrectly! [-W#warnings]
  371 | #          warning Assuming 100 MHz realtime clock rate (TSC) for gfx1100/gfx1101 (according to the RDNA3 ISA). Timing-related APIs (e.g., chrono) or sleep instructions may behave incorrectly!
      |            ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:14:25: error: redefinition of 'operator new'
   14 | __device__ inline void* operator new(size_t, void* p) noexcept { return p; }
      |                         ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:95:25: note: previous definition is here
   95 | __device__ inline void *operator new(__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT {
      |                         ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:15:25: error: redefinition of 'operator new[]'
   15 | __device__ inline void* operator new[](size_t, void* p) noexcept { return p; }
      |                         ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:98:25: note: previous definition is here
   98 | __device__ inline void *operator new[](__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT {
      |                         ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:16:24: error: redefinition of 'operator delete'
   16 | __device__ inline void operator delete(void*, void*) noexcept {}
      |                        ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:102:24: note: previous definition is here
  102 | __device__ inline void operator delete(void *, void *) CUDA_NOEXCEPT {}
      |                        ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:17:24: error: redefinition of 'operator delete[]'
   17 | __device__ inline void operator delete[](void*, void*) noexcept {}
      |                        ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:103:24: note: previous definition is here
  103 | __device__ inline void operator delete[](void *, void *) CUDA_NOEXCEPT {}
      |                        ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:49:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<bool>>' requested here
   49 |     MLX_INTERNAL_DTYPE_SWITCH_CASE(bool_, bool);
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<signed char>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:21:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   21 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(int8, int8_t);     \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<short>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:22:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   22 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(int16, int16_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<int>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:23:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   23 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(int32, int32_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<long>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:24:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   24 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(int64, int64_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned char>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:25:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   25 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(uint8, uint8_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned short>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:26:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   26 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(uint16, uint16_t); \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned int>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:27:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   27 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(uint32, uint32_t); \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned long>>' requested here
   50 |     MLX_INTERNAL_DTYPE_SWITCH_INTS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:28:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
   28 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(uint64, uint64_t)
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<mlx::core::_MLX_Float16>>' requested here
   51 |     MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:31:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
   31 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(float16, float16_t);   \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<mlx::core::_MLX_BFloat16>>' requested here
   51 |     MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:32:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
   32 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(bfloat16, bfloat16_t); \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<float>>' requested here
   51 |     MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:33:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
   33 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(float32, float);       \
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  453 |             rocprim::radix_sort_pairs(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
  454 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  455 |                 vals_tmp, vals_sorted,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  456 |                 indices_in, indices_out,
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~
  457 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
  437 |         encoder.launch_kernel([&](hipStream_t hip_stream) {
      |                                                           ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<double>>' requested here
   51 |     MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
      |     ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:34:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
   34 |   MLX_INTERNAL_DTYPE_SWITCH_CASE(float64, double)
      |   ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
   17 |     f(type_identity<TYPE>{});                       \
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
  432 |     dispatch_all_types(in.dtype(), [&](auto type_tag) {
      |     ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  486 |               rocprim::radix_sort_pairs(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
  487 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  488 |                   vals_tmp, vals_sorted,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  489 |                   indices_in, indices_out,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~
  490 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  511 |             rocprim::radix_sort_keys(
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~
  512 |                 nullptr, temp_bytes,
      |                 ~~~~~~~~~~~~~~~~~~~~
  513 |                 vals_in, vals_out_buf,
      |                 ~~~~~~~~~~~~~~~~~~~~~~
  514 |                 N, 0, sizeof(ValT) * 8, hip_stream);
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
  525 |               rocprim::radix_sort_keys(
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
  526 |                   temp_storage, temp_bytes,
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~
  527 |                   vals_in, vals_out_buf,
      |                   ~~~~~~~~~~~~~~~~~~~~~~
  528 |                   N, 0, sizeof(ValT) * 8, hip_stream);
      |                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 warnings and 4 errors generated when compiling for gfx1103.
failed to execute:/opt/rocm/lib/llvm/bin/clang++  --offload-arch=gfx1103 -O3  -c -x hip /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip -o "/path/to/mlx-lm-rocm-support/build/mlx/backend/rocm/hip_objs/sort.o" -fPIC -DMLX_USE_ROCM -I/path/to/mlx-lm-rocm-support -I/opt/rocm/include -I/usr/include/c++/11 -I/usr/include/x86_64-linux-gnu/c++/11 -I/usr/include/c++/11/backward -I/usr/include/x86_64-linux-gnu -I/usr/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -I/opt/rocm/include -std=c++17 -parallel-jobs=8
make[2]: *** [mlx/backend/rocm/CMakeFiles/mlx_rocm_kernels_lib.dir/build.make:254: mlx/backend/rocm/hip_objs/sort.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1549: mlx/backend/rocm/CMakeFiles/mlx_rocm_kernels_lib.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

Please try the newest version, I have added a guard which should help.

@chimezie
Copy link
Copy Markdown
Contributor

chimezie commented Apr 4, 2026

I completed the build:

[..snip..]
[ 98%] Built target linear_regression
[ 98%] Built target metal_capture
[ 98%] Built target test_teardown
[ 98%] Built target logistic_regression
[ 98%] Built target tutorial
[ 98%] Built target distributed
[100%] Linking CXX executable tests
[100%] Built target tests
root@pop-os:/path/to/mlx-lm-rocm-support/build# python
bash: python: command not found
root@pop-os:/path/to/mlx-lm-rocm-support/build# curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env   # or: export PATH="$HOME/.local/bin:$PATH"
downloading uv 0.11.3 x86_64-unknown-linux-gnu
installing to /root/.local/bin
  uv
  uvx
everything's installed!

To add $HOME/.local/bin to your PATH, either restart your shell or run:

    source $HOME/.local/bin/env (sh, bash, zsh)
    source $HOME/.local/bin/env.fish (fish)
root@pop-os:/path/to/mlx-lm-rocm-support/build# uv --version
uv 0.11.3 (x86_64-unknown-linux-gnu)
root@pop-os:/path/to/mlx-lm-rocm-support/build# uv venv /opt/mlx-env --python 3.12
source /opt/mlx-env/bin/activate
Using CPython 3.12.13
Creating virtual environment at: /opt/mlx-env
Activate with: source /opt/mlx-env/bin/activate

But failed to import mlx_lm or run mlx_lm.chat :

# uv run --active ipython
warning: No `requires-python` value found in the workspace. Defaulting to `>=3.12`.
Python 3.12.13 (main, Mar 24 2026, 22:49:22) [Clang 22.1.1 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.12.0 -- An enhanced Interactive Python. Type '?' for help.
Tip: Use `F2` or %edit with no arguments to open an empty editor with a temporary file.

In [1]: import mlx_lm
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[1], line 1
----> 1 import mlx_lm

File /path/to/mlx_lm/__init__.py:9
      5 from ._version import __version__
      7 os.environ["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = "1"
----> 9 from .convert import convert
     10 from .generate import batch_generate, generate, stream_generate
     11 from .utils import load

File /path/to/mlx_lm/convert.py:7
      4 from pathlib import Path
      5 from typing import Callable, Optional, Union
----> 7 import mlx.core as mx
      8 import mlx.nn as nn
      9 from mlx.utils import tree_map_with_path

ImportError: /path/to/mlx-lm-rocm-support/python/mlx/core.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3mlx4core4fast10hip_kernelERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS7_SaIS7_EESE_S9_S9_bi

In [2]: exit
(mlx-env) root@pop-os:/path/to/mlx-lm-rocm-support# CMAKE_BUILD_PARALLEL_LEVEL=$(nproc)   uv pip install -e ".[dev]" --no-build-isolation
Using Python 3.12.13 environment at: /opt/mlx-env
Resolved 42 packages in 299ms
      Built mlx @ /path/to//mlx-lm-rocm-support
Prepared 1 package in 1m 43s
Uninstalled 1 package in 0.44ms
Installed 1 package in 2ms
 ~ mlx==0.31.2.dev20260404+a866ff4f (from file:///path/to/mlx-lm-rocm-support)
(mlx-env) root@pop-os:/path/to/mlx-lm-rocm-support# pushd
/path/to/mlx-lm /path/to/mlx-lm-rocm-support
(mlx-env) root@pop-os:/path/to/mlx-lm# git pull; CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) uv pip install -U .  
Already up to date.
Using Python 3.12.13 environment at: /opt/mlx-env
Resolved 32 packages in 791ms
      Built mlx-lm @ file:///path/to/mlx-lm
Prepared 1 package in 392ms
Uninstalled 1 package in 1ms
Installed 1 package in 10ms
 ~ mlx-lm==0.31.2 (from file:///path/to/mlx-lm)
(mlx-env) root@pop-os:/path/to/mlx-lm# mlx_lm.
mlx_lm.awq            mlx_lm.chat           mlx_lm.dynamic_quant  mlx_lm.generate       mlx_lm.manage         mlx_lm.share          
mlx_lm.benchmark      mlx_lm.convert        mlx_lm.evaluate       mlx_lm.gptq           mlx_lm.perplexity     mlx_lm.upload         
mlx_lm.cache_prompt   mlx_lm.dwq            mlx_lm.fuse           mlx_lm.lora           mlx_lm.server         
(mlx-env) root@pop-os:/path/to/mlx-lm# mlx_lm.chat --help
Traceback (most recent call last):
  File "/opt/mlx-env/bin/mlx_lm.chat", line 4, in <module>
    from mlx_lm.chat import main
  File "/opt/mlx-env/lib/python3.12/site-packages/mlx_lm/__init__.py", line 9, in <module>
    from .convert import convert
  File "/opt/mlx-env/lib/python3.12/site-packages/mlx_lm/convert.py", line 7, in <module>
    import mlx.core as mx
ImportError: /path/to/mlx-lm-rocm-support/python/mlx/core.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3mlx4core4fast10hip_kernelERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS7_SaIS7_EESE_S9_S9_bi
(mlx-env) root@pop-os:/path/to/mlx-lm# 

When MLX_BUILD_ROCM=OFF, the Python binding unconditionally references
mlx::core::fast::hip_kernel but no_rocm.cpp only stubbed
rocm::is_available(). Add a throwing stub matching the pattern used by
no_metal.cpp and no_cuda.cpp.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Geramy
Copy link
Copy Markdown

Geramy commented Apr 4, 2026

# uv run --active ipython
warning: No `requires-python` value found in the workspace. Defaulting to `>=3.12`.
Python 3.12.13 (main, Mar 24 2026, 22:49:22) [Clang 22.1.1 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.12.0 -- An enhanced Interactive Python. Type '?' for help.
Tip: Use `F2` or %edit with no arguments to open an empty editor with a temporary file.

In [1]: import mlx_lm
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[1], line 1
----> 1 import mlx_lm

File /path/to/mlx_lm/__init__.py:9
      5 from ._version import __version__
      7 os.environ["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = "1"
----> 9 from .convert import convert
     10 from .generate import batch_generate, generate, stream_generate
     11 from .utils import load

File /path/to/mlx_lm/convert.py:7
      4 from pathlib import Path
      5 from typing import Callable, Optional, Union
----> 7 import mlx.core as mx
      8 import mlx.nn as nn
      9 from mlx.utils import tree_map_with_path

ImportError: /path/to/mlx-lm-rocm-support/python/mlx/core.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3mlx4core4fast10hip_kernelERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS7_SaIS7_EESE_S9_S9_bi

In [2]: exit
(mlx-env) root@pop-os:/path/to/mlx-lm-rocm-support# CMAKE_BUILD_PARALLEL_LEVEL=$(nproc)   uv pip install -e ".[dev]" --no-build-isolation
Using Python 3.12.13 environment at: /opt/mlx-env
Resolved 42 packages in 299ms
      Built mlx @ /path/to//mlx-lm-rocm-support
Prepared 1 package in 1m 43s
Uninstalled 1 package in 0.44ms
Installed 1 package in 2ms
 ~ mlx==0.31.2.dev20260404+a866ff4f (from file:///path/to/mlx-lm-rocm-support)
(mlx-env) root@pop-os:/path/to/mlx-lm-rocm-support# pushd
/path/to/mlx-lm /path/to/mlx-lm-rocm-support
(mlx-env) root@pop-os:/path/to/mlx-lm# git pull; CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) uv pip install -U .  
Already up to date.
Using Python 3.12.13 environment at: /opt/mlx-env
Resolved 32 packages in 791ms
      Built mlx-lm @ file:///path/to/mlx-lm
Prepared 1 package in 392ms
Uninstalled 1 package in 1ms
Installed 1 package in 10ms
 ~ mlx-lm==0.31.2 (from file:///path/to/mlx-lm)
(mlx-env) root@pop-os:/path/to/mlx-lm# mlx_lm.
mlx_lm.awq            mlx_lm.chat           mlx_lm.dynamic_quant  mlx_lm.generate       mlx_lm.manage         mlx_lm.share          
mlx_lm.benchmark      mlx_lm.convert        mlx_lm.evaluate       mlx_lm.gptq           mlx_lm.perplexity     mlx_lm.upload         
mlx_lm.cache_prompt   mlx_lm.dwq            mlx_lm.fuse           mlx_lm.lora           mlx_lm.server         
(mlx-env) root@pop-os:/path/to/mlx-lm# mlx_lm.chat --help
Traceback (most recent call last):
  File "/opt/mlx-env/bin/mlx_lm.chat", line 4, in <module>
    from mlx_lm.chat import main
  File "/opt/mlx-env/lib/python3.12/site-packages/mlx_lm/__init__.py", line 9, in <module>
    from .convert import convert
  File "/opt/mlx-env/lib/python3.12/site-packages/mlx_lm/convert.py", line 7, in <module>
    import mlx.core as mx
ImportError: /path/to/mlx-lm-rocm-support/python/mlx/core.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3mlx4core4fast10hip_kernelERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS7_SaIS7_EESE_S9_S9_bi
(mlx-env) root@pop-os:/path/to/mlx-lm# 

please try again we where missing no-rocm stubs.

@Geramy
Copy link
Copy Markdown

Geramy commented Apr 4, 2026

@chimezie Please rebuild with these instructions.

Building mlx for ROCm

Prerequisites

ROCm packages installed (via apt or your distro's package manager):

  • hip-dev, rocblas-dev, rocthrust-dev, rocprim-dev, hiprand-dev, rocwmma-dev
  • hiprtc-dev, hipblaslt-dev (for JIT and optimized GEMM)
  • OpenBLAS: libopenblas-dev

Build command

CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DCMAKE_HIP_ARCHITECTURES=gfx1103 -DBLA_VENDOR=OpenBLAS -DCMAKE_BUILD_TYPE=RelWithDebInfo"

CMAKE_BUILD_PARALLEL_LEVEL=$(nproc)
uv pip install -e ".[dev]" --no-build-isolation

Architecture targeting

Replace gfx1150 with your GPU's architecture. Common values:

  │ Family  │            Architecture            │                    GPU                     │
  ├─────────┼────────────────────────────────────┼────────────────────────────────────────────┤
  │ CDNA    │ gfx908                             │ MI100                                      │
  ├─────────┼────────────────────────────────────┼────────────────────────────────────────────┤
  │ CDNA2   │ gfx90a                             │ MI200                                      │
  ├─────────┼────────────────────────────────────┼────────────────────────────────────────────┤
  │ CDNA3   │ gfx942                             │ MI300                                      │
  ├─────────┼────────────────────────────────────┼────────────────────────────────────────────┤
  │ RDNA1   │ gfx1010, gfx1011, gfx1012          │ RX 5000 series                             │
  ├─────────┼────────────────────────────────────┼────────────────────────────────────────────┤
  │ RDNA2   │ gfx1030, gfx1031, gfx1032          │ RX 6000 series                             │
  ├─────────┼────────────────────────────────────┼────────────────────────────────────────────┤
  │ RDNA3   │ gfx1100, gfx1101, gfx1102, gfx1103 │ RX 7900 / 7600 / iGPU (Phoenix/Hawk Point) │
  ├─────────┼────────────────────────────────────┼────────────────────────────────────────────┤
  │ RDNA3.5 │ gfx1150, gfx1151, gfx1152          │ Ryzen AI / Radeon 8060S                    │
  ├─────────┼────────────────────────────────────┼────────────────────────────────────────────┤
  │ RDNA4   │ gfx1200, gfx1201                   │ RX 9000 series                             │
  └─────────┴────────────────────────────────────┴────────────────────────────────────────────┘

Omit -DCMAKE_HIP_ARCHITECTURES=... to build for all supported architectures (slow — compiles 17 targets).

The CI does it this way

From .github/workflows/build_rocm.yml:

CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DMLX_ROCM_ARCHITECTURES=gfx1151 -DBLA_VENDOR=OpenBLAS -DCMAKE_BUILD_TYPE=RelWithDebInfo"

The key flag you were missing was -DMLX_BUILD_ROCM=ON inside CMAKE_ARGS. Without it, setup.py passes the default
MLX_BUILD_ROCM=OFF to cmake and you get the no_rocm.cpp stub instead of the real backend.

@chimezie
Copy link
Copy Markdown
Contributor

chimezie commented Apr 4, 2026

# git pull
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 6 (delta 4), reused 6 (delta 4), pack-reused 0 (from 0)
Unpacking objects: 100% (6/6), 933 bytes | 466.00 KiB/s, done.
From https://github.com/NripeshN/mlx
   a866ff4f..71d03e59  rocm-support -> origin/rocm-support
Updating a866ff4f..71d03e59
Fast-forward
 mlx/backend/rocm/no_rocm.cpp | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)
# git pull; CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) uv pip install -U .
Already up to date.
Using Python 3.11.15 environment at: /opt/mlx-env
Resolved 1 package in 891ms
      Built mlx @ file:///path/to/mlx-lm-rocm-support
Prepared 1 package in 35.10s
Uninstalled 1 package in 7ms
Installed 1 package in 7ms
 - mlx==0.31.2.dev20260404+a866ff4f (from file:///path/to/mlx-lm-rocm-support)
 + mlx==0.31.2.dev20260404+71d03e59 (from file:///path/to/mlx-lm-rocm-support)
(mlx-env) root@pop-os:/path/to/mlx-lm-rocm-support# uv run --active ipython
warning: No `requires-python` value found in the workspace. Defaulting to `>=3.11`.
Python 3.11.15 (main, Mar 24 2026, 22:50:29) [Clang 22.1.1 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.10.1 -- An enhanced Interactive Python. Type '?' for help.
Tip: IPython supports combining unicode identifiers, eg F\vec<tab> will become F⃗, useful for physics equations. Play with \dot \ddot and others.

In [1]: import mlx_lm

In [2]: exit
# mlx_lm.chat --model mlx-community/Qwen3-1.7B-4bit 
Downloading (incomplete total...): 0.00B [00:00, ?B/s]                                                                                                                                       Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.                                | 0/9 [00:00<?, ?it/s]
Fetching 9 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:09<00:00,  1.04s/it]
Download complete: : 984MB [00:09, 105MB/s]              ██████████████████████████▉                                                                            | 4/9 [00:09<00:13,  2.75s/it]
[INFO] Starting chat session with mlx-community/Qwen3-1.7B-4bit.
The command list:
- 'q' to exit
- 'r' to reset the chat
- 'h' to display these commands
>> What is JSON?
<think>
Okay, so the user is asking, "What is JSON?" Let me start by breaking this down. JSON stands for JavaScript Object Notation. I remember that JSON is a data format that's used to represent data in a structured

@chimezie
Copy link
Copy Markdown
Contributor

chimezie commented Apr 6, 2026

Following your build instructions:

 CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DCMAKE_HIP_ARCHITECTURES=gfx1103 -DBLA_VENDOR=OpenBLAS -DCMAKE_BUILD_TYPE=RelWithDebInfo" CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DCMAKE_HIP_ARCHITECTURES=gfx1103 -DBLA_VENDOR=OpenBLAS -DCMAKE_BUILD_TYPE=RelWithDebInfo" uv pip install -e ".[dev]" --no-build-isolation
Using Python 3.11.15 environment at: /opt/mlx-env
Resolved 42 packages in 183ms
Installed 1 package in 13ms
  x Failed to build `mlx @ file:///path/to/mlx-lm-rocm-support`
[.. snip ..]
     -- Downloading gguflib
      -- Configuring incomplete, errors occurred!
      [stderr]
      CMake Error at mlx/backend/rocm/CMakeLists.txt:8 (find_package):
        Could not find a package configuration file provided by "hip" with any of
        the following names:
          hip.cps
         hipConfig.cmake
          hip-config.cmake

        Add the installation prefix of "hip" to CMAKE_PREFIX_PATH or set "hip_DIR"
        to a directory containing one of the above files.  If "hip" provides a
        separate development package or SDK, be sure it has been installed.
[..snip..]
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['cmake', '/path/to/mlx-lm-rocm-support', '-DCMAKE_INSTALL_PREFIX=/tmp/tmp6q2djk76.build-lib/mlx',
      '-DMLX_PYTHON_BINDINGS_OUTPUT_DIRECTORY=/tmp/tmp6q2djk76.build-lib/mlx', '-DCMAKE_BUILD_TYPE=Release', '-DMLX_BUILD_PYTHON_BINDINGS=ON', '-DMLX_BUILD_TESTS=OFF',
      '-DMLX_BUILD_BENCHMARKS=OFF', '-DMLX_BUILD_EXAMPLES=OFF', '-DBUILD_SHARED_LIBS=ON', '-DMLX_BUILD_ROCM=ON', '-DCMAKE_HIP_ARCHITECTURES=gfx1103', '-DBLA_VENDOR=OpenBLAS',
      '-DCMAKE_BUILD_TYPE=RelWithDebInfo', '-DMLX_VERSION=0.31.2.dev20260406+71d03e59']' returned non-zero exit status 1.
      An error occurred when building editable wheel for mlx.

All the other packages installed fine except hiprtc-dev

# sudo apt-get install hiprtc-dev                
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package hiprtc-dev
# sudo apt install -y hip-dev rocblas-dev rocthrust-dev rocprim-dev hiprand-dev rocwmma-dev libopenblas-dev hipblaslt-dev   
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libopenblas-dev is already the newest version (0.3.20+ds-1).
hip-dev is already the newest version (7.2.26015.70200-43~22.04).
hipblaslt-dev is already the newest version (1.2.1.70200-43~22.04).
hiprand-dev is already the newest version (3.1.0.70200-43~22.04).
rocblas-dev is already the newest version (5.2.0.70200-43~22.04).
rocprim-dev is already the newest version (4.2.0.70200-43~22.04).
rocthrust-dev is already the newest version (4.2.0.70200-43~22.04).
rocwmma-dev is already the newest version (2.2.0.70200-43~22.04).
0 upgraded, 0 newly installed, 0 to remove and 40 not upgraded.

@Geramy
Copy link
Copy Markdown

Geramy commented Apr 6, 2026

Following your build instructions:

 CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DCMAKE_HIP_ARCHITECTURES=gfx1103 -DBLA_VENDOR=OpenBLAS -DCMAKE_BUILD_TYPE=RelWithDebInfo" CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DCMAKE_HIP_ARCHITECTURES=gfx1103 -DBLA_VENDOR=OpenBLAS -DCMAKE_BUILD_TYPE=RelWithDebInfo" uv pip install -e ".[dev]" --no-build-isolation
Using Python 3.11.15 environment at: /opt/mlx-env
Resolved 42 packages in 183ms
Installed 1 package in 13ms
  x Failed to build `mlx @ file:///path/to/mlx-lm-rocm-support`
[.. snip ..]
     -- Downloading gguflib
      -- Configuring incomplete, errors occurred!
      [stderr]
      CMake Error at mlx/backend/rocm/CMakeLists.txt:8 (find_package):
        Could not find a package configuration file provided by "hip" with any of
        the following names:
          hip.cps
         hipConfig.cmake
          hip-config.cmake

        Add the installation prefix of "hip" to CMAKE_PREFIX_PATH or set "hip_DIR"
        to a directory containing one of the above files.  If "hip" provides a
        separate development package or SDK, be sure it has been installed.
[..snip..]
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['cmake', '/path/to/mlx-lm-rocm-support', '-DCMAKE_INSTALL_PREFIX=/tmp/tmp6q2djk76.build-lib/mlx',
      '-DMLX_PYTHON_BINDINGS_OUTPUT_DIRECTORY=/tmp/tmp6q2djk76.build-lib/mlx', '-DCMAKE_BUILD_TYPE=Release', '-DMLX_BUILD_PYTHON_BINDINGS=ON', '-DMLX_BUILD_TESTS=OFF',
      '-DMLX_BUILD_BENCHMARKS=OFF', '-DMLX_BUILD_EXAMPLES=OFF', '-DBUILD_SHARED_LIBS=ON', '-DMLX_BUILD_ROCM=ON', '-DCMAKE_HIP_ARCHITECTURES=gfx1103', '-DBLA_VENDOR=OpenBLAS',
      '-DCMAKE_BUILD_TYPE=RelWithDebInfo', '-DMLX_VERSION=0.31.2.dev20260406+71d03e59']' returned non-zero exit status 1.
      An error occurred when building editable wheel for mlx.

All the other packages installed fine except hiprtc-dev

# sudo apt-get install hiprtc-dev                
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package hiprtc-dev
# sudo apt install -y hip-dev rocblas-dev rocthrust-dev rocprim-dev hiprand-dev rocwmma-dev libopenblas-dev hipblaslt-dev   
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libopenblas-dev is already the newest version (0.3.20+ds-1).
hip-dev is already the newest version (7.2.26015.70200-43~22.04).
hipblaslt-dev is already the newest version (1.2.1.70200-43~22.04).
hiprand-dev is already the newest version (3.1.0.70200-43~22.04).
rocblas-dev is already the newest version (5.2.0.70200-43~22.04).
rocprim-dev is already the newest version (4.2.0.70200-43~22.04).
rocthrust-dev is already the newest version (4.2.0.70200-43~22.04).
rocwmma-dev is already the newest version (2.2.0.70200-43~22.04).
0 upgraded, 0 newly installed, 0 to remove and 40 not upgraded.

running my rocm install script installs all the other things you need, you shouldnt need to install hip or anything.
typically hip is inside /var/opt/rocm/hip or something like that.

@Geramy
Copy link
Copy Markdown

Geramy commented Apr 6, 2026

@chimezie this is my cmake out.

[bookmarks] Loaded 0 bookmarks
[proc] Executing command: /usr/bin/cmake --version
[proc] Executing command: /usr/bin/cmake -E capabilities
[kit] Successfully loaded 5 kits from /home/geramyl/.local/share/CMakeTools/cmake-tools-kits.json
[variant] Loaded new set of variants
[proc] Executing command: /usr/bin/gcc -v
[main] Configuring project: lemon-mlx-engine
[proc] Executing command: /usr/bin/cmake -DCMAKE_BUILD_TYPE:STRING=Debug -DGGML_HIP:STRING=ON -DGGML_HIP_UMA:STRING=ON -DGGML_HIP_ROCWMMA_FATTN:STRING=ON -DAMDGPU_TARGETS:STRING=gfx1151 -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=TRUE -DCMAKE_C_COMPILER:FILEPATH=/usr/bin/gcc -DCMAKE_CXX_COMPILER:FILEPATH=/usr/bin/g++ --no-warn-unused-cli -S /home/geramyl/Development/lemon-mlx-engine -B /home/geramyl/Development/lemon-mlx-engine/build -G Ninja
[cmake] Not searching for unused variables given on the command line.
[cmake] -- Building MLX for x86_64 processor on Linux
[cmake] -- Found CCache: /usr/bin/ccache
[cmake] -- Setting CMAKE_HIP_ARCHITECTURES to: gfx1151
[cmake] -- Found HIP compiler: /opt/rocm/bin/hipcc
[cmake] -- Accelerate not found, using default backend.
[cmake] -- Lapack lib /usr/lib/x86_64-linux-gnu/liblapack.so/usr/lib/x86_64-linux-gnu/libblas.so
[cmake] -- Lapack include /usr/include
[cmake] -- Blas lib /usr/lib/x86_64-linux-gnu/libblas.so
[cmake] -- Blas include /usr/include/x86_64-linux-gnu
[cmake] -- Downloading json
[cmake] -- Using the multi-header code from /home/geramyl/Development/lemon-mlx-engine/build/_deps/json-src/include/
[cmake] -- Downloading gguflib
[cmake] -- ROCm backend using HIP architectures: gfx1151
[cmake] -- HIP include flags: -I/home/geramyl/Development/lemon-mlx-engine/build/_deps/mlx-src;-I/opt/rocm/include;-I/usr/include/c++/13;-I/usr/include/x86_64-linux-gnu/c++/13;-I/usr/include/c++/13/backward;-I/usr/include/x86_64-linux-gnu;-I/usr/include;-I/opt/rocm/include;-I/opt/rocm/include/;-I/opt/rocm/include;-I/opt/rocm/include;-I/opt/rocm/include;-I/opt/rocm/include/hiprand;-I/opt/rocm/include;-I/opt/rocm/include
[cmake] -- ROCm libraries: rocblas=/opt/rocm/lib/librocblas.so.5.3, hiprand=/opt/rocm/lib/libhiprand.so.1.1, amdhip64=/opt/rocm/lib/libamdhip64.so, hiprtc=/opt/rocm/lib/libhiprtc.so, hipblaslt=/opt/rocm/lib/libhipblaslt.so
[cmake] -- {fmt} version: 12.1.0
[cmake] -- Build type: Debug
[cmake] -- system-nameLinux
[cmake] -- VERSION: 0.2.2
[cmake] -- clang-tidy not found
[cmake] -- cppcheck found: /usr/bin/cppcheck
[cmake] -- Configuring done (2.5s)
[cmake] -- Generating done (0.0s)
[cmake] -- Build files have been written to: /home/geramyl/Development/lemon-mlx-engine/build

I think you need to delete the build folder and reconfigure cmake you might have a stale cmake cache

@chimezie
Copy link
Copy Markdown
Contributor

chimezie commented Apr 6, 2026

After installing these in the rocm/dev-ubuntu-22.04:7.2 image:

  • hip-dev
  • rocblas-dev
  • rocthrust-dev
  • rocprim-dev
  • hiprand-dev
  • rocwmma-dev
  • hipblaslt-dev (for JIT and optimized GEMM)
  • libopenblas-dev

Built with:

CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DMLX_ROCM_ARCHITECTURES=gfx1103 -DBLA_VENDOR=OpenBLAS -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_PREFIX_PATH=/opt/rocm" CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) uv pip install -e ".[dev]" --no-build-isolation	

I was able to run :

# time mlx_lm.generate --model mlx-community/Qwen3-1.7B-4bit --prompt "Show a Python function that implements factorial"
Fetching 9 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 41989.70it/s]
Download complete: : 0.00B [00:00, ?B/s]                                                                                           | 0/9 [00:00<?, ?it/s]
==========
Warning: rocBLAS does not support GPU architecture 'gfx1103'. Matrix multiplication operations will not be available. Supported architectures: gfx908, gfx90a, gfx942, gfx950, gfx1030, gfx1100, gfx1101, gfx1102, gfx1150, gfx1151, gfx1200, gfx1201.
<think>
Okay, I need to write a Python function that calculates the factorial of a number. Let me think about how to approach this. 

First, what's a factorial? The factorial of a number n, denoted as n!, is the product of all positive integers up to n. But wait, 0! is defined as 1, right? So the base case is when the input is 0, return 1. 

But what about numbers less than zero? Oh,
==========
Prompt: 15 tokens, 21.038 tokens-per-sec
Generation: 100 tokens, 32.853 tokens-per-sec
Peak memory: 1.084 GB

real	0m8.042s
user	0m6.512s
sys	0m6.980s

@Geramy
Copy link
Copy Markdown

Geramy commented Apr 6, 2026

After installing these in the rocm/dev-ubuntu-22.04:7.2 image:

  • hip-dev

  • rocblas-dev

  • rocthrust-dev

  • rocprim-dev

  • hiprand-dev

  • rocwmma-dev

  • hipblaslt-dev (for JIT and optimized GEMM)

  • libopenblas-dev

Built with:


CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DMLX_ROCM_ARCHITECTURES=gfx1103 -DBLA_VENDOR=OpenBLAS -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_PREFIX_PATH=/opt/rocm" CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) uv pip install -e ".[dev]" --no-build-isolation	

I was able to run :


console

# time mlx_lm.generate --model mlx-community/Qwen3-1.7B-4bit --prompt "Show a Python function that implements factorial"

Fetching 9 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 41989.70it/s]

Download complete: : 0.00B [00:00, ?B/s]                                                                                           | 0/9 [00:00<?, ?it/s]

==========

Warning: rocBLAS does not support GPU architecture 'gfx1103'. Matrix multiplication operations will not be available. Supported architectures: gfx908, gfx90a, gfx942, gfx950, gfx1030, gfx1100, gfx1101, gfx1102, gfx1150, gfx1151, gfx1200, gfx1201.

<think>

Okay, I need to write a Python function that calculates the factorial of a number. Let me think about how to approach this. 



First, what's a factorial? The factorial of a number n, denoted as n!, is the product of all positive integers up to n. But wait, 0! is defined as 1, right? So the base case is when the input is 0, return 1. 



But what about numbers less than zero? Oh,

==========

Prompt: 15 tokens, 21.038 tokens-per-sec

Generation: 100 tokens, 32.853 tokens-per-sec

Peak memory: 1.084 GB



real	0m8.042s

user	0m6.512s

sys	0m6.980s

Awesome looks like you are in business! We don't use rocBLAS we use rocBLASlt rocBLAS can be a fallback though but looks like you are good and we have fallbacks for when rocBLAS isn't available also.

@chimezie
Copy link
Copy Markdown
Contributor

chimezie commented Apr 6, 2026

I'm not sure why this is the case, but despite being able to run mlx_lm.*, I cannot import mlx.core:

# uv run ipython
warning: `VIRTUAL_ENV=/opt/mlx-env` does not match the project environment path `.venv` and will be ignored; use `--active` to target the active environment instead
Using CPython 3.14.3
Creating virtual environment at: .venv
warning: No `requires-python` value found in the workspace. Defaulting to `>=3.14`.
/opt/mlx-env/lib/python3.14/site-packages/IPython/core/interactiveshell.py:986: UserWarning: Attempting to work in a virtualenv. If you encounter problems, please install IPython inside the virtualenv.
  warn(
Python 3.14.3 (main, Mar 24 2026, 22:50:36) [Clang 22.1.1 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.12.0 -- An enhanced Interactive Python. Type '?' for help.
Tip: Use `ipython --help-all | less` to view all the IPython configuration options.
In [1]: import mlx
In [2]: import mlx.core as mx
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[2], line 1
----> 1 import mlx.core as mx

ImportError: libmlx.so: cannot open shared object file: No such file or directory

@Geramy
Copy link
Copy Markdown

Geramy commented Apr 6, 2026

I'm not sure why this is the case, but despite being able to run mlx_lm.*, I cannot import mlx.core:

# uv run ipython
warning: `VIRTUAL_ENV=/opt/mlx-env` does not match the project environment path `.venv` and will be ignored; use `--active` to target the active environment instead
Using CPython 3.14.3
Creating virtual environment at: .venv
warning: No `requires-python` value found in the workspace. Defaulting to `>=3.14`.
/opt/mlx-env/lib/python3.14/site-packages/IPython/core/interactiveshell.py:986: UserWarning: Attempting to work in a virtualenv. If you encounter problems, please install IPython inside the virtualenv.
  warn(
Python 3.14.3 (main, Mar 24 2026, 22:50:36) [Clang 22.1.1 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.12.0 -- An enhanced Interactive Python. Type '?' for help.
Tip: Use `ipython --help-all | less` to view all the IPython configuration options.
In [1]: import mlx
In [2]: import mlx.core as mx
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[2], line 1
----> 1 import mlx.core as mx

ImportError: libmlx.so: cannot open shared object file: No such file or directory

Yeah i'm not sure I don't do a whole lot of python myself. I'm more of a C/C++ person, or flutter.
Maybe verify the library is in the correct directory on your system or in the python script directory your executing from. Basically it seems python is not able to find the mlx compiled binary file/files.

rocwmma static-asserts on gfx103X (RDNA 2) which lacks WMMA support.
Detect WMMA-capable targets (gfx11xx/gfx12xx) at CMake time and
conditionally compile flash_attention_wmma.hip and link rocwmma.
@Geramy
Copy link
Copy Markdown

Geramy commented Apr 16, 2026

If anyone wants to test it here for rocm/amd check out the lemon-mlx-engine which is the switch port. So AMD users can have a easy to go solution. https://github.com/lemonade-sdk/lemon-mlx-engine/releases
Soon to be merged into lemonade-sdk as well for one click go and chat ability, this should allow us to get more test cases going.

soloish90 and others added 3 commits April 22, 2026 21:53
Add explicit tiled QMV launch cases for 8-bit affine quantization in the
ROCm quantized matmul path.

This fixes 8-bit models being left off the tiled fast path and restores
correct, faster decode behavior for tested Qwen 8-bit models.
Fix: broken/missing 8-bit inference in tiled QMV path
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add ROCm Support for AMD GPUs

10 participants