
[CUDA] Fix qmm_naive K-tail dispatch for FP quantized kernels #3445

Open
Lyxot wants to merge 2 commits into ml-explore:main from Lyxot:cuda/qmm-naive-k-tail-fix

Conversation

Contributor

Lyxot commented Apr 23, 2026

Fixes: #3444

Root Cause

In qmm.cu, qmm_naive selected the HasKResidue specialization using:

bool has_k_residue = k % group_size != 0;

but the kernel tiles the K dimension using max(64, group_size).

For FP quantization modes:

  • mxfp4 / mxfp8 use group_size = 32
  • nvfp4 uses group_size = 16

So shapes like K=544 satisfy:

  • K % group_size == 0
  • K % 64 != 0

The old dispatch therefore selected the no-residue specialization even when the kernel still had a real K tail.
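
To make the mismatch concrete, here is a minimal standalone sketch of the two checks (plain C++ with illustrative values; the actual dispatch code lives in qmm.cu):

#include <algorithm>
#include <cstdio>

int main() {
  int k = 544;          // K from the reported shapes
  int group_size = 32;  // mxfp4 / mxfp8

  // Old dispatch: residue judged against the quantization group size.
  bool old_has_k_residue = k % group_size != 0;  // 544 % 32 == 0 -> false

  // Kernel tiling: K is actually walked in steps of max(64, group_size).
  int tile_k = std::max(64, group_size);         // 64
  bool real_k_tail = k % tile_k != 0;            // 544 % 64 == 32 -> true

  printf("old check: %d, real K tail: %d\n", old_has_k_residue, real_k_tail);
}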

Fix

Change the residue check to match the kernel's actual K tiling:
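
A sketch of the corrected dispatch, following the tile_k computation described in the review overview below (assuming k and group_size are plain ints; the surrounding code in qmm.cu may differ):

int tile_k = std::max(64, group_size);  // matches the kernel's K tile
bool has_k_residue = k % tile_k != 0;   // true whenever a real K tail exists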

Copilot AI review requested due to automatic review settings April 23, 2026 13:09

Copilot AI left a comment


Pull request overview

Fixes a CUDA quantized-matmul (qmm_naive) dispatch bug where the wrong K-tail specialization was chosen for FP-quantized modes by aligning the residue check with the kernel’s actual K tiling (max(64, group_size)), addressing issue #3444.

Changes:

  • Compute tile_k = max(64, group_size) inside qmm_naive.
  • Change the HasKResidue dispatch condition from k % group_size to k % tile_k.


Comment thread: mlx/backend/cuda/quantized/qmm/qmm.cu
Collaborator

zcbenz left a comment


Nice fix, thanks!



Development

Successfully merging this pull request may close these issues.

[BUG] qmm_naive picks the wrong K-tail path for some FP quantized shapes
