What an unexpected and amazing surprise! I'm absolutely thrilled. |
|
@awni |
|
I think this is good to stay as an experiment branch for some time while we work on core and CUDA. I don't think we have the bandwidth to merge this for a few months at least. Sorry if this is disappointing, @NripeshN; I don't mean to discourage you from working on it. |
|
I would love to see the ROCm backend get more traction. The new AI series of processors by AMD have a similar advantage to Apple Silicon with unified memory and getting MLX to run on those processors would be neat. |
|
Stole my idea :( |
|
How is it even possible for such an awesome PR to be left like this?
Pull request overview
This PR adds experimental ROCm backend support to MLX, enabling execution on AMD GPUs. The implementation mirrors the CUDA backend structure, providing HIP-based implementations of core operations, memory management, and device handling.
Changes:
- Added ROCm backend infrastructure with device management, memory allocation, and stream handling
- Implemented HIP kernels for unary, binary, ternary operations, reductions, normalization (softmax, layer_norm, rms_norm), RoPE, and sorting
- Updated build system (CMake) to support ROCm compilation with configurable GPU architectures
Reviewed changes
Copilot reviewed 59 out of 59 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| CMakeLists.txt | Added MLX_BUILD_ROCM option and ROCm library detection |
| mlx/CMakeLists.txt | Integrated ROCm backend build configuration |
| mlx/device.cpp | Added ROCm device availability checks |
| mlx/backend/rocm/*.hip | HIP kernel implementations for various operations |
| mlx/backend/rocm/device.* | ROCm device and stream management |
| mlx/backend/rocm/allocator.* | ROCm-specific memory allocator using HIP unified memory |
| mlx/backend/rocm/worker.* | Async task execution worker for stream synchronization |
| mlx/backend/rocm/utils.* | HIP utility functions and error handling |
| mlx/backend/rocm/jit_module.* | JIT compilation support using HIPRTC |
| mlx/backend/rocm/device/*.hpp | Device-side utility functions and type definitions |
| mlx/backend/rocm/CMakeLists.txt | ROCm backend build configuration |
…ather, scatter, logsumexp, random bits generation, and sorting. Introduce new kernels for efficient computation and integrate with existing ROCm utilities. Update CMake configuration to include new source files and dependencies. Enhance error handling and ensure compatibility with different data types. This commit significantly expands the functionality of the ROCm backend.
|
👑👑👑 |
|
Can anyone run CMAKE_ARGS="-DMLX_BUILD_ROCM=ON" pip install -e .
CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DMLX_ROCM_ARCHITECTURES={based on your GPU}" pip install -e .
Replace {based on your GPU} with your GPU architecture. You can run rocm-smi to get your GPU information. |
|
I'm getting this CMake error: Running on Strix Halo (gfx1151) |
Could you retry with the latest push, please? (P.S. keep your fingers crossed while it compiles; it worked for me on the 138th try) 😅 |
… string formatting, replacing fmt library usage. Remove unused event.cpp file. Update kernel name generation and parameter formatting for consistency.
Now what can I test? 😍 |
|
I'm getting this: |
I forgot to test the Python build, my bad. Can you try it now? Unfortunately, I might not be able to help after it compiles; I don't have an AMD GPU to run tests 😔. I've tried replicating most things from CUDA, so hopefully it works. |
I believe you need to install at least ROCm 7.2. That's probably your primary problem; the secondary one is that you need to set HIP_PLATFORM=amd, I believe. |
…, SliceUpdate donation
Arch detection and tuning:
- RocmArchTier enum: RDNA 2/3/3.5/4/CDNA with fine-grained gfx detection
- HWInfo struct: CU count, SIMDs, L2 size, WMMA capability from hipDeviceProp
- ArchTuning: per-arch kernel parameters (QMV tile_n, crossover thresholds)
- Runtime TILE_N for qmv_tiled_kernel via kernel argument (no template bloat)
- MLX_ROCM_QMV_TILE_N env var for manual tuning
WMMA flash attention:
- flash_attention_wmma.hip: rocwmma 16x16x16 tiled kernel for bf16/fp16
- Dispatches for prefill (qL > 4) on supported head dims (64/128/256)
- Integrated into ScaledDotProductAttention dispatch
Arena allocator (DecodeArena):
- Deterministic bump allocator for HIP Graph capture
- Hooked into RocmAllocator malloc/free path
- Proven: 18 KB per decode step with stable addresses
SliceUpdate donation:
- Skip base array copy when input has unique ownership (refcount==1)
- Helps prefill path (200 donated during prompt processing)
GPU memcpy:
- mlx_gpu_memcpy_async (extern C) for direct KV cache writes
- gpu_arena/gpu_graph wrapper functions for engine integration
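The DecodeArena bullets above describe a bump allocator whose addresses repeat from one decode step to the next, which is the property graph capture relies on. A minimal host-side sketch of that idea follows, assuming a fixed-capacity buffer that is rewound once per step. The class name `BumpArena`, its methods, and the 256-byte alignment are illustrative assumptions and are not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration of a deterministic bump allocator.
// Not the PR's DecodeArena: names, alignment and capacity are assumptions.
class BumpArena {
 public:
  explicit BumpArena(std::size_t capacity, std::size_t alignment = 256)
      : buffer_(capacity), alignment_(alignment) {}

  // Hand out the next slice, aligned relative to the start of the buffer,
  // or nullptr when the arena is exhausted.
  void* allocate(std::size_t size) {
    std::size_t start = round_up(offset_);
    if (start > buffer_.size() || size > buffer_.size() - start) {
      return nullptr;
    }
    offset_ = start + size;
    return buffer_.data() + start;
  }

  // Individual frees are no-ops; the whole arena is rewound once per step.
  void reset() { offset_ = 0; }

  std::size_t used() const { return offset_; }

 private:
  std::size_t round_up(std::size_t n) const {
    return (n + alignment_ - 1) / alignment_ * alignment_;
  }

  std::vector<std::uint8_t> buffer_;
  std::size_t alignment_;
  std::size_t offset_ = 0;
};
```

Replaying the same sequence of requests after `reset()` yields the same addresses, which is what the commit message means by "stable addresses".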
…V weight loads
custom_kernel.cpp: The hip_kernel lambda's 8th parameter was named `verbose` but the CustomKernelFunction typedef passes ensure_row_contiguous in that slot. When gated_delta.cpp called with ensure_row_contiguous=true, it triggered a cout dump of the full compiled kernel source, polluting stdout and appearing as model output on every MoE inference (Qwen3.5-35B, Coder-Next).
qdequant.hpp: Add load_weight_vec<BITS>() helper that loads PPT uint32 words via a single wide vector load (uint2 for 4-bit, uint4 for 8-bit) instead of PPT scalar loads.
qmv_tiled_kernel.hip: Use load_weight_vec in both qmv_tiled_kernel and gather_qmv_tiled_kernel with a warp-uniform branch to separate the vectorized fast path from the bounds-checked tail path.
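The custom_kernel.cpp issue comes down to positional binding: a callable stored behind a function typedef receives its arguments by slot, so whatever the lambda calls its 8th parameter, it gets the value the caller put in the 8th slot. A reduced sketch with three parameters instead of eight is shown here. `KernelFn`, `make_kernel`, and the log vector are hypothetical stand-ins, not the PR's types.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the typedef: the third slot is documented
// as ensure_row_contiguous.
using KernelFn = std::function<void(
    const std::string& /*name*/,
    int /*grid*/,
    bool /*ensure_row_contiguous*/)>;

// The lambda names its third parameter `verbose`. The name is local to the
// lambda; the value still comes from the caller's third argument.
inline KernelFn make_kernel(std::vector<std::string>& log) {
  return [&log](const std::string& name, int grid, bool verbose) {
    (void)grid;  // unused in this sketch
    if (verbose) {
      log.push_back("source dump for " + name);
    }
  };
}
```

A caller that passes `true` meaning "ensure row contiguous" therefore triggers the dump, which matches the stdout pollution described in the commit message. Giving the lambda parameter the name the typedef documents makes this kind of mismatch easier to spot.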
|
After ensuring I have ROCm, I'm still unable to build (within a rocm/dev-ubuntu-22.04:7.1.1-complete container) after following these steps to ensure it is installed properly:
# cmake -DGGML_BLAS=ON \
-DGGML_BLAS_VENDOR=OpenBLAS \
-DMLX_BUILD_ROCM=ON \
-DCMAKE_PREFIX_PATH=/opt/rocm \
-DCMAKE_HIP_ARCHITECTURES="gfx1103" ..
-- Building MLX for x86_64 processor on Linux
-- Setting CMAKE_HIP_ARCHITECTURES to: gfx1103
-- Found HIP compiler: /opt/rocm/bin/hipcc
-- Accelerate not found, using default backend.
-- Lapack lib /usr/lib/x86_64-linux-gnu/libopenblas.so-lm-ldl
-- Lapack include LAPACK_INCLUDE_DIRS-NOTFOUND
-- Blas lib /usr/lib/x86_64-linux-gnu/libopenblas.so
-- Blas include /usr/include/x86_64-linux-gnu
-- Downloading json
-- Using the multi-header code from /home/chimezie/Projects/mlx-lm-rocm-support/build/_deps/json-src/include/
-- Downloading gguflib
CMake Error at mlx/backend/rocm/CMakeLists.txt:13 (find_package):
Could not find a package configuration file provided by "rocwmma" with any
of the following names:
rocwmma.cps
rocwmmaConfig.cmake
rocwmma-config.cmake
Add the installation prefix of "rocwmma" to CMAKE_PREFIX_PATH or set
"rocwmma_DIR" to a directory containing one of the above files. If
"rocwmma" provides a separate development package or SDK, be sure it has
been installed.
-- Configuring incomplete, errors occurred! |
try this |
|
Still no joy:
# sudo apt install libblas-dev liblapack-dev git rocwmma-dev
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libblas-dev is already the newest version (3.10.0-2ubuntu1).
liblapack-dev is already the newest version (3.10.0-2ubuntu1).
git is already the newest version (1:2.34.1-1ubuntu1.17).
rocwmma-dev is already the newest version (2.1.0.70101-38~22.04).
0 upgraded, 0 newly installed, 0 to remove and 57 not upgraded.
# cmake -DGGML_BLAS=ON \
-DGGML_BLAS_VENDOR=OpenBLAS \
-DMLX_BUILD_ROCM=ON \
-DCMAKE_PREFIX_PATH=/opt/rocm \
-DCMAKE_HIP_ARCHITECTURES="gfx1103" ..
-- Building MLX for x86_64 processor on Linux
-- Setting CMAKE_HIP_ARCHITECTURES to: gfx1103
-- Found HIP compiler: /opt/rocm/bin/hipcc
-- Accelerate not found, using default backend.
-- Lapack lib /usr/lib/x86_64-linux-gnu/liblapack.so/usr/lib/x86_64-linux-gnu/libblas.so
-- Lapack include LAPACK_INCLUDE_DIRS-NOTFOUND
-- Blas lib /usr/lib/x86_64-linux-gnu/libblas.so
-- Blas include /usr/include/x86_64-linux-gnu
-- Downloading json
-- Using the multi-header code from _deps/json-src/include/
-- Downloading gguflib
CMake Error at mlx/backend/rocm/CMakeLists.txt:13 (find_package):
Could not find a package configuration file provided by "rocwmma" with any
of the following names:
rocwmma.cps
rocwmmaConfig.cmake
rocwmma-config.cmake
Add the installation prefix of "rocwmma" to CMAKE_PREFIX_PATH or set
"rocwmma_DIR" to a directory containing one of the above files. If
"rocwmma" provides a separate development package or SDK, be sure it has
been installed.
-- Configuring incomplete, errors occurred! |
Can you try this out?
sudo tee /etc/apt/preferences.d/rocm-pin-600 << EOF
sudo apt update
If you don't like this method, you could install ROCm manually with my bash script, which brings in the newest, bleeding-edge 7.12. FILTER_GFX=gfx1151 is set for me; I have already set it to gfx1103 for you. |
Do you know what we have to do in order to get this merged into MLX? |
|
@zcbenz Thank you for helping me on the other issue regarding float8. I was hoping you could lend me a hand on this PR and let me know if there is anyone at MLX who can give us guidance on getting it to land in main. Whatever is needed, let me know and I'll do what I can. If need be, I'll maintain the ROCm backend. |
|
I installed CMake 4.3.1 and ran
Then
# make -j$(nproc)
[ 1%] Built target cpu_compiled_preamble
[ 1%] Compiling HIP source /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip
[ 1%] Built target mlx_version
[ 3%] Built target gguflib
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:3:
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/device.h:5:
In file included from /path/to/mlx-lm-rocm-support/mlx/array.h:13:
In file included from /path/to/mlx-lm-rocm-support/mlx/event.h:8:
In file included from /path/to/mlx-lm-rocm-support/mlx/stream.h:8:
/path/to/mlx-lm-rocm-support/mlx/device.h:28:42: warning: defaulted comparison operators are a C++20 extension [-Wc++20-extensions]
28 | bool operator==(const Device&) const = default;
| ^
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:3:
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/device.h:5:
In file included from /path/to/mlx-lm-rocm-support/mlx/array.h:13:
In file included from /path/to/mlx-lm-rocm-support/mlx/event.h:8:
/path/to/mlx-lm-rocm-support/mlx/stream.h:18:42: warning: defaulted comparison operators are a C++20 extension [-Wc++20-extensions]
18 | bool operator==(const Stream&) const = default;
| ^
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:3:
In file included from /path/to/mlx-lm-rocm-support/mlx/backend/rocm/device.h:15:
In file included from /opt/rocm/include/thrust/execution_policy.h:24:
In file included from /opt/rocm/include/thrust/detail/config.h:22:
In file included from /opt/rocm/include/thrust/detail/config/config.h:23:
In file included from /opt/rocm/include/thrust/detail/config/libcxx.h:44:
In file included from /opt/rocm/include/cuda/std/version:31:
In file included from /opt/rocm/include/cuda/std/detail/__config:67:
/opt/rocm/include/cuda/std/detail/libcxx/include/__config:371:12: warning: Assuming 100 MHz realtime clock rate (TSC) for gfx1100/gfx1101 (according to the RDNA3 ISA). Timing-related APIs (e.g., chrono) or sleep instructions may behave incorrectly! [-W#warnings]
371 | # warning Assuming 100 MHz realtime clock rate (TSC) for gfx1100/gfx1101 (according to the RDNA3 ISA). Timing-related APIs (e.g., chrono) or sleep instructions may behave incorrectly!
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:14:25: error: redefinition of 'operator new'
14 | __device__ inline void* operator new(size_t, void* p) noexcept { return p; }
| ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:95:25: note: previous definition is here
95 | __device__ inline void *operator new(__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:15:25: error: redefinition of 'operator new[]'
15 | __device__ inline void* operator new[](size_t, void* p) noexcept { return p; }
| ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:98:25: note: previous definition is here
98 | __device__ inline void *operator new[](__SIZE_TYPE__, void *__ptr) CUDA_NOEXCEPT {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:16:24: error: redefinition of 'operator delete'
16 | __device__ inline void operator delete(void*, void*) noexcept {}
| ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:102:24: note: previous definition is here
102 | __device__ inline void operator delete(void *, void *) CUDA_NOEXCEPT {}
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:17:24: error: redefinition of 'operator delete[]'
17 | __device__ inline void operator delete[](void*, void*) noexcept {}
| ^
/opt/rocm/lib/llvm/lib/clang/22/include/cuda_wrappers/new:103:24: note: previous definition is here
103 | __device__ inline void operator delete[](void *, void *) CUDA_NOEXCEPT {}
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:49:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<bool>>' requested here
49 | MLX_INTERNAL_DTYPE_SWITCH_CASE(bool_, bool);
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<signed char>>' requested here
50 | MLX_INTERNAL_DTYPE_SWITCH_INTS();
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:21:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
21 | MLX_INTERNAL_DTYPE_SWITCH_CASE(int8, int8_t); \
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<short>>' requested here
50 | MLX_INTERNAL_DTYPE_SWITCH_INTS();
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:22:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
22 | MLX_INTERNAL_DTYPE_SWITCH_CASE(int16, int16_t); \
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<int>>' requested here
50 | MLX_INTERNAL_DTYPE_SWITCH_INTS();
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:23:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
23 | MLX_INTERNAL_DTYPE_SWITCH_CASE(int32, int32_t); \
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<long>>' requested here
50 | MLX_INTERNAL_DTYPE_SWITCH_INTS();
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:24:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
24 | MLX_INTERNAL_DTYPE_SWITCH_CASE(int64, int64_t); \
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned char>>' requested here
50 | MLX_INTERNAL_DTYPE_SWITCH_INTS();
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:25:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
25 | MLX_INTERNAL_DTYPE_SWITCH_CASE(uint8, uint8_t); \
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned short>>' requested here
50 | MLX_INTERNAL_DTYPE_SWITCH_INTS();
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:26:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
26 | MLX_INTERNAL_DTYPE_SWITCH_CASE(uint16, uint16_t); \
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned int>>' requested here
50 | MLX_INTERNAL_DTYPE_SWITCH_INTS();
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:27:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
27 | MLX_INTERNAL_DTYPE_SWITCH_CASE(uint32, uint32_t); \
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:50:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<unsigned long>>' requested here
50 | MLX_INTERNAL_DTYPE_SWITCH_INTS();
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:28:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_INTS'
28 | MLX_INTERNAL_DTYPE_SWITCH_CASE(uint64, uint64_t)
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<mlx::core::_MLX_Float16>>' requested here
51 | MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:31:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
31 | MLX_INTERNAL_DTYPE_SWITCH_CASE(float16, float16_t); \
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<mlx::core::_MLX_BFloat16>>' requested here
51 | MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:32:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
32 | MLX_INTERNAL_DTYPE_SWITCH_CASE(bfloat16, bfloat16_t); \
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<float>>' requested here
51 | MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:33:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
33 | MLX_INTERNAL_DTYPE_SWITCH_CASE(float32, float); \
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
453 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
454 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
455 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
456 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
457 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
437 | encoder.launch_kernel([&](hipStream_t hip_stream) {
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:51:5: note: in instantiation of function template specialization 'mlx::core::(anonymous namespace)::gpu_sort(const Stream &, array, array &, int, bool)::(anonymous class)::operator()<mlx::core::type_identity<double>>' requested here
51 | MLX_INTERNAL_DTYPE_SWITCH_FLOATS();
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:34:3: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_FLOATS'
34 | MLX_INTERNAL_DTYPE_SWITCH_CASE(float64, double)
| ^
/path/to/mlx-lm-rocm-support/mlx/dtype_utils.h:17:5: note: expanded from macro 'MLX_INTERNAL_DTYPE_SWITCH_CASE'
17 | f(type_identity<TYPE>{}); \
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:5: note: in instantiation of function template specialization 'mlx::core::dispatch_all_types<(lambda at /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:432:36)>' requested here
432 | dispatch_all_types(in.dtype(), [&](auto type_tag) {
| ^
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:486:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
486 | rocprim::radix_sort_pairs(
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
487 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
488 | vals_tmp, vals_sorted,
| ~~~~~~~~~~~~~~~~~~~~~~
489 | indices_in, indices_out,
| ~~~~~~~~~~~~~~~~~~~~~~~~
490 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:511:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
511 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
512 | nullptr, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~
513 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
514 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
525 | rocprim::radix_sort_keys(
| ^~~~~~~~~~~~~~~~~~~~~~~~~
526 | temp_storage, temp_bytes,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
527 | vals_in, vals_out_buf,
| ~~~~~~~~~~~~~~~~~~~~~~
528 | N, 0, sizeof(ValT) * 8, hip_stream);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
55 warnings and 4 errors generated when compiling for gfx1103.
failed to execute:/opt/rocm/lib/llvm/bin/clang++ --offload-arch=gfx1103 -O3 -c -x hip /path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip -o "/path/to/mlx-lm-rocm-support/build/mlx/backend/rocm/hip_objs/sort.o" -fPIC -DMLX_USE_ROCM -I/path/to/mlx-lm-rocm-support -I/opt/rocm/include -I/usr/include/c++/11 -I/usr/include/x86_64-linux-gnu/c++/11 -I/usr/include/c++/11/backward -I/usr/include/x86_64-linux-gnu -I/usr/include -I/opt/rocm/include -I/opt/rocm/include/ -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include -I/opt/rocm/include/hiprand -I/opt/rocm/include -I/opt/rocm/include -std=c++17 -parallel-jobs=8
make[2]: *** [mlx/backend/rocm/CMakeFiles/mlx_rocm_kernels_lib.dir/build.make:254: mlx/backend/rocm/hip_objs/sort.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1549: mlx/backend/rocm/CMakeFiles/mlx_rocm_kernels_lib.dir/all] Error 2
make: *** [Makefile:146: all] Error 2 |
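A log like the one above is easier to triage once repeated diagnostics are collapsed: in this excerpt the same four `sort.hip` call sites (lines 453, 486, 511 and 525) are reported once per dtype instantiation. The following is a minimal sketch of such a summary, assuming clang's usual `path:line:col: kind: message` layout; the helper name `summarize_diagnostics` is made up for illustration and is not part of MLX or ROCm.

```python
import re
from collections import Counter

# Matches clang diagnostics of the form "path:line:col: warning: message".
# "note:" lines are deliberately ignored; they only give instantiation context.
DIAG_RE = re.compile(
    r"^(?P<file>[^\s:]+):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<kind>warning|error): (?P<msg>.*)$"
)


def summarize_diagnostics(log_text):
    """Count how often each unique (kind, file name, line, column) site appears."""
    counts = Counter()
    for raw in log_text.splitlines():
        match = DIAG_RE.match(raw.strip())
        if match:
            file_name = match["file"].rsplit("/", 1)[-1]
            site = (match["kind"], file_name, int(match["line"]), int(match["col"]))
            counts[site] += 1
    return counts


# A few lines in the same shape as the log above.
sample = """\
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:453:13: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:437:59: note: while substituting into a lambda expression here
/path/to/mlx-lm-rocm-support/mlx/backend/rocm/sort.hip:525:15: warning: ignoring return value of type 'hipError_t' declared with 'nodiscard' attribute [-Wunused-value]
"""

summary = summarize_diagnostics(sample)
for (kind, name, line, col), n in sorted(summary.items()):
    print(f"{kind} {name}:{line}:{col} x{n}")
```

Filtering on `kind == "error"` in the same summary is a quick way to find the few errors among the many warnings.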
ROCm 7.12 (clang 22) provides __device__ placement new/delete via cuda_wrappers/new, causing redefinition errors. Guard with __CLANG_CUDA_WRAPPERS_NEW so the manual definitions are only compiled on older ROCm versions that lack them.
Please try the newest version; I have added a guard that should help. |
|
I completed the build: [..snip..]
[ 98%] Built target linear_regression
[ 98%] Built target metal_capture
[ 98%] Built target test_teardown
[ 98%] Built target logistic_regression
[ 98%] Built target tutorial
[ 98%] Built target distributed
[100%] Linking CXX executable tests
[100%] Built target tests
root@pop-os:/path/to/mlx-lm-rocm-support/build# python
bash: python: command not found
root@pop-os:/path/to/mlx-lm-rocm-support/build# curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env # or: export PATH="$HOME/.local/bin:$PATH"
downloading uv 0.11.3 x86_64-unknown-linux-gnu
installing to /root/.local/bin
uv
uvx
everything's installed!
To add $HOME/.local/bin to your PATH, either restart your shell or run:
source $HOME/.local/bin/env (sh, bash, zsh)
source $HOME/.local/bin/env.fish (fish)
root@pop-os:/path/to/mlx-lm-rocm-support/build# uv --version
uv 0.11.3 (x86_64-unknown-linux-gnu)
root@pop-os:/path/to/mlx-lm-rocm-support/build# uv venv /opt/mlx-env --python 3.12
source /opt/mlx-env/bin/activate
Using CPython 3.12.13
Creating virtual environment at: /opt/mlx-env
Activate with: source /opt/mlx-env/bin/activate
But it failed to import:
# uv run --active ipython
warning: No `requires-python` value found in the workspace. Defaulting to `>=3.12`.
Python 3.12.13 (main, Mar 24 2026, 22:49:22) [Clang 22.1.1 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.12.0 -- An enhanced Interactive Python. Type '?' for help.
Tip: Use `F2` or %edit with no arguments to open an empty editor with a temporary file.
In [1]: import mlx_lm
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
Cell In[1], line 1
----> 1 import mlx_lm
File /path/to/mlx_lm/__init__.py:9
5 from ._version import __version__
7 os.environ["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = "1"
----> 9 from .convert import convert
10 from .generate import batch_generate, generate, stream_generate
11 from .utils import load
File /path/to/mlx_lm/convert.py:7
4 from pathlib import Path
5 from typing import Callable, Optional, Union
----> 7 import mlx.core as mx
8 import mlx.nn as nn
9 from mlx.utils import tree_map_with_path
ImportError: /path/to/mlx-lm-rocm-support/python/mlx/core.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3mlx4core4fast10hip_kernelERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS7_SaIS7_EESE_S9_S9_bi
In [2]: exit
(mlx-env) root@pop-os:/path/to/mlx-lm-rocm-support# CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) uv pip install -e ".[dev]" --no-build-isolation
Using Python 3.12.13 environment at: /opt/mlx-env
Resolved 42 packages in 299ms
Built mlx @ /path/to//mlx-lm-rocm-support
Prepared 1 package in 1m 43s
Uninstalled 1 package in 0.44ms
Installed 1 package in 2ms
~ mlx==0.31.2.dev20260404+a866ff4f (from file:///path/to/mlx-lm-rocm-support)
(mlx-env) root@pop-os:/path/to/mlx-lm-rocm-support# pushd
/path/to/mlx-lm /path/to/mlx-lm-rocm-support
(mlx-env) root@pop-os:/path/to/mlx-lm# git pull; CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) uv pip install -U .
Already up to date.
Using Python 3.12.13 environment at: /opt/mlx-env
Resolved 32 packages in 791ms
Built mlx-lm @ file:///path/to/mlx-lm
Prepared 1 package in 392ms
Uninstalled 1 package in 1ms
Installed 1 package in 10ms
~ mlx-lm==0.31.2 (from file:///path/to/mlx-lm)
(mlx-env) root@pop-os:/path/to/mlx-lm# mlx_lm.
mlx_lm.awq mlx_lm.chat mlx_lm.dynamic_quant mlx_lm.generate mlx_lm.manage mlx_lm.share
mlx_lm.benchmark mlx_lm.convert mlx_lm.evaluate mlx_lm.gptq mlx_lm.perplexity mlx_lm.upload
mlx_lm.cache_prompt mlx_lm.dwq mlx_lm.fuse mlx_lm.lora mlx_lm.server
(mlx-env) root@pop-os:/path/to/mlx-lm# mlx_lm.chat --help
Traceback (most recent call last):
File "/opt/mlx-env/bin/mlx_lm.chat", line 4, in <module>
from mlx_lm.chat import main
File "/opt/mlx-env/lib/python3.12/site-packages/mlx_lm/__init__.py", line 9, in <module>
from .convert import convert
File "/opt/mlx-env/lib/python3.12/site-packages/mlx_lm/convert.py", line 7, in <module>
import mlx.core as mx
ImportError: /path/to/mlx-lm-rocm-support/python/mlx/core.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3mlx4core4fast10hip_kernelERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorIS7_SaIS7_EESE_S9_S9_bi
(mlx-env) root@pop-os:/path/to/mlx-lm# |
When MLX_BUILD_ROCM=OFF, the Python binding unconditionally references mlx::core::fast::hip_kernel but no_rocm.cpp only stubbed rocm::is_available(). Add a throwing stub matching the pattern used by no_metal.cpp and no_cuda.cpp. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Please try again; we were missing the no-ROCm stubs. |
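The undefined-symbol error above comes from a caller that always references `hip_kernel` while a ROCm-less build never defines it. The fix described in the commit message is a stub that exists in every build and throws when called. The real stub is C++ in `no_rocm.cpp`; the snippet below only illustrates the same pattern in Python, and every name in it (`BackendUnavailable`, `make_stub`, `ROCM_BUILT`, `real_hip_kernel`) is hypothetical.

```python
class BackendUnavailable(RuntimeError):
    """Raised when a backend-specific entry point is called without that backend."""


def make_stub(name, backend):
    """Return a callable that always raises, standing in for a missing backend function."""

    def stub(*args, **kwargs):
        raise BackendUnavailable(
            f"{name} requires the {backend} backend, which was not built."
        )

    stub.__name__ = name
    return stub


# Hypothetical build-time switch, playing the role of MLX_BUILD_ROCM.
ROCM_BUILT = False


def real_hip_kernel(*args, **kwargs):
    """Placeholder for the real implementation that only exists in ROCm builds."""
    return "kernel object"


# The name is always defined, so callers can reference it unconditionally;
# only calling it without the backend fails, and it fails with a clear message.
hip_kernel = real_hip_kernel if ROCM_BUILT else make_stub("hip_kernel", "ROCm")

try:
    hip_kernel("my_kernel")
except BackendUnavailable as exc:
    print(exc)
```

The point of the pattern is that the failure moves from load time (an unresolved symbol on import) to call time (an exception that names the missing backend).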
|
@chimezie Please rebuild with these instructions.
Building mlx for ROCm
Prerequisites
ROCm packages installed (via apt or your distro's package manager):
Build command
CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DCMAKE_HIP_ARCHITECTURES=gfx1103 -DBLA_VENDOR=OpenBLAS -DCMAKE_BUILD_TYPE=RelWithDebInfo"
Architecture targeting
Replace gfx1103 with your GPU's architecture. Common values:
Omit -DCMAKE_HIP_ARCHITECTURES=... to build for all supported architectures (slow: compiles 17 targets).
The CI does it this way
From .github/workflows/build_rocm.yml:
CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DMLX_ROCM_ARCHITECTURES=gfx1151 -DBLA_VENDOR=OpenBLAS -DCMAKE_BUILD_TYPE=RelWithDebInfo"
The key flag you were missing was -DMLX_BUILD_ROCM=ON inside CMAKE_ARGS. Without it, setup.py passes the default |
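Since the whole configuration travels in one `CMAKE_ARGS` string, it can help to assemble that string from named options so the key `MLX_BUILD_ROCM=ON` flag is not lost. A minimal sketch, using only the option names and values quoted in the instructions above; the helper `cmake_args` is an illustrative name, not something setup.py provides.

```python
import shlex


def cmake_args(options):
    """Render a dict of CMake cache options as one '-DKEY=VALUE ...' string."""
    return " ".join(f"-D{key}={value}" for key, value in options.items())


# Values taken from the build command quoted above; swap in your own architecture.
options = {
    "MLX_BUILD_ROCM": "ON",
    "CMAKE_HIP_ARCHITECTURES": "gfx1103",
    "BLA_VENDOR": "OpenBLAS",
    "CMAKE_BUILD_TYPE": "RelWithDebInfo",
}

args = cmake_args(options)

# Fail early if the flag that enables the ROCm backend is absent.
if "-DMLX_BUILD_ROCM=ON" not in args.split():
    raise SystemExit("MLX_BUILD_ROCM=ON is missing from CMAKE_ARGS")

# shlex.quote makes the value safe to paste into a shell as one word.
print(f"CMAKE_ARGS={shlex.quote(args)}")
```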
|
|
Following your build instructions:
CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DCMAKE_HIP_ARCHITECTURES=gfx1103 -DBLA_VENDOR=OpenBLAS -DCMAKE_BUILD_TYPE=RelWithDebInfo"
CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DCMAKE_HIP_ARCHITECTURES=gfx1103 -DBLA_VENDOR=OpenBLAS -DCMAKE_BUILD_TYPE=RelWithDebInfo" uv pip install -e ".[dev]" --no-build-isolation
Using Python 3.11.15 environment at: /opt/mlx-env
Resolved 42 packages in 183ms
Installed 1 package in 13ms
x Failed to build `mlx @ file:///path/to/mlx-lm-rocm-support`
[.. snip ..]
-- Downloading gguflib
-- Configuring incomplete, errors occurred!
[stderr]
CMake Error at mlx/backend/rocm/CMakeLists.txt:8 (find_package):
Could not find a package configuration file provided by "hip" with any of
the following names:
hip.cps
hipConfig.cmake
hip-config.cmake
Add the installation prefix of "hip" to CMAKE_PREFIX_PATH or set "hip_DIR"
to a directory containing one of the above files. If "hip" provides a
separate development package or SDK, be sure it has been installed.
[..snip..]
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['cmake', '/path/to/mlx-lm-rocm-support', '-DCMAKE_INSTALL_PREFIX=/tmp/tmp6q2djk76.build-lib/mlx',
'-DMLX_PYTHON_BINDINGS_OUTPUT_DIRECTORY=/tmp/tmp6q2djk76.build-lib/mlx', '-DCMAKE_BUILD_TYPE=Release', '-DMLX_BUILD_PYTHON_BINDINGS=ON', '-DMLX_BUILD_TESTS=OFF',
'-DMLX_BUILD_BENCHMARKS=OFF', '-DMLX_BUILD_EXAMPLES=OFF', '-DBUILD_SHARED_LIBS=ON', '-DMLX_BUILD_ROCM=ON', '-DCMAKE_HIP_ARCHITECTURES=gfx1103', '-DBLA_VENDOR=OpenBLAS',
'-DCMAKE_BUILD_TYPE=RelWithDebInfo', '-DMLX_VERSION=0.31.2.dev20260406+71d03e59']' returned non-zero exit status 1.
An error occurred when building editable wheel for mlx.
All the other packages installed fine except
# sudo apt-get install hiprtc-dev
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package hiprtc-dev
# sudo apt install -y hip-dev rocblas-dev rocthrust-dev rocprim-dev hiprand-dev rocwmma-dev libopenblas-dev hipblaslt-dev
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libopenblas-dev is already the newest version (0.3.20+ds-1).
hip-dev is already the newest version (7.2.26015.70200-43~22.04).
hipblaslt-dev is already the newest version (1.2.1.70200-43~22.04).
hiprand-dev is already the newest version (3.1.0.70200-43~22.04).
rocblas-dev is already the newest version (5.2.0.70200-43~22.04).
rocprim-dev is already the newest version (4.2.0.70200-43~22.04).
rocthrust-dev is already the newest version (4.2.0.70200-43~22.04).
rocwmma-dev is already the newest version (2.2.0.70200-43~22.04).
0 upgraded, 0 newly installed, 0 to remove and 40 not upgraded. |
Running my ROCm install script installs everything else you need; you shouldn't need to install hip or anything. |
|
@chimezie this is my CMake output.
[bookmarks] Loaded 0 bookmarks
I think you need to delete the build folder and reconfigure CMake; you might have a stale CMake cache. |
|
After installing these in the
Built with:
CMAKE_ARGS="-DMLX_BUILD_ROCM=ON -DMLX_ROCM_ARCHITECTURES=gfx1103 -DBLA_VENDOR=OpenBLAS -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_PREFIX_PATH=/opt/rocm" CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) uv pip install -e ".[dev]" --no-build-isolation
I was able to run:
# time mlx_lm.generate --model mlx-community/Qwen3-1.7B-4bit --prompt "Show a Python function that implements factorial"
Fetching 9 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 41989.70it/s]
Download complete: : 0.00B [00:00, ?B/s] | 0/9 [00:00<?, ?it/s]
==========
Warning: rocBLAS does not support GPU architecture 'gfx1103'. Matrix multiplication operations will not be available. Supported architectures: gfx908, gfx90a, gfx942, gfx950, gfx1030, gfx1100, gfx1101, gfx1102, gfx1150, gfx1151, gfx1200, gfx1201.
<think>
Okay, I need to write a Python function that calculates the factorial of a number. Let me think about how to approach this.
First, what's a factorial? The factorial of a number n, denoted as n!, is the product of all positive integers up to n. But wait, 0! is defined as 1, right? So the base case is when the input is 0, return 1.
But what about numbers less than zero? Oh,
==========
Prompt: 15 tokens, 21.038 tokens-per-sec
Generation: 100 tokens, 32.853 tokens-per-sec
Peak memory: 1.084 GB
real 0m8.042s
user 0m6.512s
sys 0m6.980s |
Awesome, looks like you are in business! We don't use rocBLAS; we use hipBLASLt. rocBLAS can be a fallback, but it looks like you are good, and we also have fallbacks for when rocBLAS isn't available. |
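The rocBLAS warning in the log above boils down to a membership test of the device architecture against a fixed list. The sketch below reproduces that test with the list copied from the warning text. The function names are illustrative, and the handling of feature suffixes such as `:sramecc+:xnack-` is an assumption about how architecture names may be reported, not something shown in this thread.

```python
# Architectures copied verbatim from the rocBLAS warning in the log above.
ROCBLAS_ARCHS = frozenset({
    "gfx908", "gfx90a", "gfx942", "gfx950", "gfx1030", "gfx1100",
    "gfx1101", "gfx1102", "gfx1150", "gfx1151", "gfx1200", "gfx1201",
})


def base_arch(name):
    """Drop anything after the first ':' (assumed feature suffixes) and normalize case."""
    return name.split(":", 1)[0].strip().lower()


def rocblas_supports(name):
    """True if the architecture appears in the list printed by the warning."""
    return base_arch(name) in ROCBLAS_ARCHS


for arch in ("gfx1103", "gfx1151", "gfx90a:sramecc+:xnack-"):
    status = "supported" if rocblas_supports(arch) else "fallback needed"
    print(f"{arch} -> {status}")
```

For `gfx1103`, the architecture used in this thread, the check fails, which matches the warning even though generation still ran.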
|
I'm not sure why this is the case, but despite being able to run mlx_lm.*, I cannot import mlx.core:
# uv run ipython
warning: `VIRTUAL_ENV=/opt/mlx-env` does not match the project environment path `.venv` and will be ignored; use `--active` to target the active environment instead
Using CPython 3.14.3
Creating virtual environment at: .venv
warning: No `requires-python` value found in the workspace. Defaulting to `>=3.14`.
/opt/mlx-env/lib/python3.14/site-packages/IPython/core/interactiveshell.py:986: UserWarning: Attempting to work in a virtualenv. If you encounter problems, please install IPython inside the virtualenv.
warn(
Python 3.14.3 (main, Mar 24 2026, 22:50:36) [Clang 22.1.1 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.12.0 -- An enhanced Interactive Python. Type '?' for help.
Tip: Use `ipython --help-all | less` to view all the IPython configuration options.
In [1]: import mlx
In [2]: import mlx.core as mx
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
Cell In[2], line 1
----> 1 import mlx.core as mx
ImportError: libmlx.so: cannot open shared object file: No such file or directory |
Yeah, I'm not sure; I don't do a whole lot of Python myself. I'm more of a C/C++ person, or Flutter. |
rocwmma static-asserts on gfx103X (RDNA 2) which lacks WMMA support. Detect WMMA-capable targets (gfx11xx/gfx12xx) at CMake time and conditionally compile flash_attention_wmma.hip and link rocwmma.
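The rule in this commit message (gfx11xx and gfx12xx are WMMA-capable, gfx103X is not) can be written as a small predicate. The actual detection happens in CMake; this Python sketch only mirrors the naming rule, and how the real build treats a mixed architecture list is not stated here, so the partitioning below is an assumption for illustration.

```python
import re


def supports_wmma(arch):
    """True for gfx11xx / gfx12xx names, the families the commit treats as WMMA-capable."""
    return re.fullmatch(r"gfx1[12][0-9a-f]{2}", arch.strip().lower()) is not None


def split_targets(archs):
    """Partition a CMake-style ';'-separated architecture list by WMMA capability."""
    names = [a for a in archs.split(";") if a]
    wmma = [a for a in names if supports_wmma(a)]
    other = [a for a in names if not supports_wmma(a)]
    return wmma, other


wmma, other = split_targets("gfx1030;gfx1103;gfx1151;gfx1200;gfx90a")
print("WMMA-capable:", wmma)
print("no WMMA:", other)
```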
|
If anyone wants to test this on ROCm/AMD, check out lemon-mlx-engine, which is the switch port, so AMD users can have an easy, ready-to-go solution. https://github.com/lemonade-sdk/lemon-mlx-engine/releases |
Add explicit tiled QMV launch cases for 8-bit affine quantization in the ROCm quantized matmul path. This fixes 8-bit models being left off the tiled fast path and restores correct, faster decode behavior for tested Qwen 8-bit models.
Fix: broken/missing 8-bit inference in tiled QMV path
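For context on what a quantized matrix-vector (QMV) kernel has to compute, here is a slow reference in plain Python. It assumes the usual affine scheme, where each group of weights shares one scale and one bias and a weight is recovered as `scale * q + bias`. The group size, the data layout, and the function names are illustrative assumptions; this is not the tiled ROCm kernel, only a baseline a fast path could be compared against.

```python
def dequantize_affine(q, scales, biases, group_size):
    """Recover approximate weights: w[i] = scales[g] * q[i] + biases[g], with g = i // group_size."""
    return [
        scales[i // group_size] * q[i] + biases[i // group_size]
        for i in range(len(q))
    ]


def qmv_reference(q_rows, scales_rows, biases_rows, x, group_size):
    """Quantized matrix-vector product y = W x, dequantizing one row at a time."""
    y = []
    for q, s, b in zip(q_rows, scales_rows, biases_rows):
        w = dequantize_affine(q, s, b, group_size)
        y.append(sum(wi * xi for wi, xi in zip(w, x)))
    return y


# Two rows of four 8-bit codes (0..255); group_size 2 gives two (scale, bias) pairs per row.
q_rows = [[0, 255, 128, 64], [10, 20, 30, 40]]
scales_rows = [[0.5, 0.25], [1.0, 2.0]]
biases_rows = [[-1.0, 0.0], [0.0, -4.0]]
x = [1.0, 2.0, 0.0, 4.0]

result = qmv_reference(q_rows, scales_rows, biases_rows, x, group_size=2)
print(result)
```

Comparing a kernel's output against a reference like this for 8-bit inputs is one way to catch a bit width that silently falls off the fast path.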
Experiment with ROCm backend.
Install MLX with the ROCm backend using:
closes #2556
Inspired by @zcbenz