[CUDA] Fix qmm_naive K-tail dispatch for FP quantized kernels #3445
Open
Lyxot wants to merge 2 commits into ml-explore:main from
Conversation
Pull request overview
Fixes a CUDA quantized-matmul (`qmm_naive`) dispatch bug, addressing issue #3444: for FP-quantized modes the wrong K-tail specialization was chosen. The fix aligns the residue check with the kernel's actual K tiling, `max(64, group_size)`.
Changes:
- Compute `tile_k = max(64, group_size)` inside `qmm_naive`.
- Change the `HasKResidue` dispatch condition from `k % group_size` to `k % tile_k`.
Fixes: #3444
Root Cause
In qmm.cu, `qmm_naive` selected the `HasKResidue` specialization using `k % group_size`, but the kernel tiles the K dimension using `max(64, group_size)`.

For FP quantization modes:
- `mxfp4`/`mxfp8` use `group_size = 32`
- `nvfp4` uses `group_size = 16`

So shapes like `K = 544` satisfy `K % group_size == 0` but `K % 64 != 0`. The old dispatch therefore selected the no-residue specialization even when the kernel still had a real K tail.
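To make the mismatch concrete, here is a small standalone check of the failing shape from the issue (plain C++ for illustration, not MLX code):

```cpp
#include <algorithm>
#include <cstdio>

int main() {
  const int k = 544;
  // group_size is 32 for mxfp4/mxfp8 and 16 for nvfp4
  const int group_sizes[] = {32, 16};
  for (int group_size : group_sizes) {
    const int tile_k = std::max(64, group_size);
    std::printf("group_size=%d: k %% group_size = %d, k %% tile_k = %d\n",
                group_size, k % group_size, k % tile_k);
  }
  // Both group sizes divide 544 evenly, but the real 64-wide K tile
  // leaves a residue of 32, so the old check wrongly selected the
  // no-residue specialization.
  return 0;
}
```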
Fix
Change the residue check to match the kernel's actual K tiling: compute `tile_k = max(64, group_size)` and test `k % tile_k` instead of `k % group_size`.
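A minimal sketch of the corrected dispatch, assuming a simplified call shape (the function and helper names below are illustrative, not the actual code in qmm.cu):

```cpp
#include <algorithm>

// Sketch only: dispatch_qmm_naive and its structure are hypothetical;
// the real dispatch in qmm.cu differs.
void dispatch_qmm_naive(int k, int group_size) {
  // The kernel tiles K in steps of max(64, group_size), so the
  // residue check must use the same tile size.
  const int tile_k = std::max(64, group_size);
  const bool has_k_residue = (k % tile_k) != 0;  // was: (k % group_size) != 0
  if (has_k_residue) {
    // launch the HasKResidue = true kernel specialization
  } else {
    // launch the HasKResidue = false kernel specialization
  }
}
```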