
Add 1-bit affine quantization support#3478

Closed
dusterbloom wants to merge 4 commits into ml-explore:main from dusterbloom:feat/bits-1-affine-quantization

Conversation

@dusterbloom

Summary

Adds support for bits=1 to affine_quantize / affine_dequantize /
quantized_matmul, enabling 1.25-bpw model serving in MLX (1 bit/weight +
fp16 scale + bias per group of 128 input columns).
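With this change the 1-bit path goes through the same Python API as the other bit widths. A minimal usage sketch, assuming the public mx.quantize / mx.dequantize / mx.quantized_matmul entry points (bits=1 is only accepted once this PR is applied; exact defaults may differ):

```python
import mlx.core as mx

group_size, bits = 128, 1

# 1 bit per weight + one fp16 scale and one fp16 bias per group of 128
# input columns -> 1 + (2 * 16) / 128 = 1.25 bits per weight.
w = mx.random.normal(shape=(1024, 1024)).astype(mx.float16)
w_q, scales, biases = mx.quantize(w, group_size=group_size, bits=bits)

# Dequantize (dispatches to the new affine_dequantize_*_gs_128_b_1 kernel) ...
w_hat = mx.dequantize(w_q, scales, biases, group_size=group_size, bits=bits)

# ... or multiply directly against the packed weights.
x = mx.random.normal(shape=(1, 1024)).astype(mx.float16)
y = mx.quantized_matmul(
    x, w_q, scales, biases,
    transpose=True, group_size=group_size, bits=bits,
)
```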

The 3 commits (originally authored by Pasha Khosravi, cherry-picked from
research work) are:

| SHA | Author | Description |
| --- | --- | --- |
| 309f947 | Pasha Khosravi | Add 1-bit affine quantization support — relaxes the bits < 2 guard, adds the bit-0/bit-1 → w_min/w_max codepath, ships the Metal kernel affine_dequantize_*_gs_128_b_1, adds Python tests |
| 17edfc9 | Pasha Khosravi | Guard fast-path Metal kernel dispatch for 1-bit quantization (1-line fix) |
| 06ee46c | Pasha Khosravi | Fix qmv_fast tail iteration for non-aligned K (26-line correctness fix) |

Total: 11 files changed, +484 / −98 in the kernel commit, plus 27 lines of
tail-iteration / dispatch hardening.
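
For reference, the 1-bit affine codepath reduces each group to a two-value codebook: ŵ = scale · q + bias with q ∈ {0, 1}, so bit 0 decodes to bias (≈ the group's w_min) and bit 1 to scale + bias (≈ w_max). A tiny pure-Python sketch of that mapping (the LSB-first byte packing used here is an illustrative assumption, not the layout of the Metal kernel):

```python
import numpy as np

def dequantize_1bit_ref(packed, scales, biases, group_size=128):
    """Reference 1-bit affine dequantization: w_hat = scale * q + bias, q in {0, 1}."""
    q = np.unpackbits(packed, bitorder="little").astype(np.float32)  # one bit per weight
    groups = q.reshape(-1, group_size)                               # one (scale, bias) per group
    # bit 0 -> bias (~group minimum), bit 1 -> scale + bias (~group maximum)
    return (groups * scales.reshape(-1, 1) + biases.reshape(-1, 1)).reshape(-1)
```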

Motivation: 1.25-bpw production models

Several models in the wild are now shipping in 1.25-bpw (1 bit/weight +
group-128 affine scale/bias) format:

  • prism-ml/Bonsai-1.7B-mlx-1bit (~260 MB residency)
  • prism-ml/Bonsai-8B-mlx-1bit (~1.25 GB residency)
  • prism-ml/Bonsai-4B-mlx-1bit

These checkpoints declare quantization.bits == 1 in config.json and
require the affine_dequantize_*_gs_128_b_1 Metal kernel that this PR adds.
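
For concreteness, a loader only needs the quantization block of config.json to select the 1-bit kernel parameters. The snippet below shows the shape of that block (keys follow the usual MLX checkpoint convention; the exact contents of the Bonsai configs are an assumption here):

```python
import json

# Illustrative quantization block as it would appear in a 1.25-bpw
# checkpoint's config.json (assumed contents, following MLX convention).
config = json.loads('{"quantization": {"group_size": 128, "bits": 1}}')

q = config["quantization"]
assert q["bits"] == 1 and q["group_size"] == 128
# A loader forwards these two fields to mx.quantized_matmul / mx.dequantize,
# which is what routes execution to affine_dequantize_*_gs_128_b_1.
```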

Validated end-to-end

The kernels have been integrated into the higgs inference engine
(PR #142) and validated on
real Bonsai-8B inference:

| Metric | Bonsai-8B (M4 base, 32 GB) |
| --- | --- |
| Decode tps (median, 3 trials, max_tokens=200, T=0) | 61.10 tok/s |
| Decode tps stdev | 0.32 tok/s |
| TTFT | 323 ms |
| Output coherence (greedy 3-word smoke) | "Hello. World. Friend." ✅ |

Runtime parity was validated against the original feat/magic-canvas research
branch: performance matches within thermal noise on M4.

Test coverage

The first commit (309f947) ships:

  • python/tests/test_quantized.py — adds test_quantize_1bit covering
    affine_quantize round-trip + quantized_matmul correctness for the
    1-bit path (96 lines added)
  • python/tests/cuda_skip.py — explicit skip entry for the new test on CUDA

The two follow-up commits are correctness fixes guarded by the existing
mlx_quantize / mlx_qmm test suite.
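
A hedged sketch of the kind of check test_quantize_1bit performs — a round-trip through quantize/dequantize plus quantized_matmul agreement with a plain matmul against the dequantized weights (the actual test body in 309f947 may differ):

```python
import unittest
import mlx.core as mx

class TestQuantize1Bit(unittest.TestCase):
    def test_round_trip_and_qmm(self):
        group_size, bits = 128, 1
        w = mx.random.normal(shape=(512, 512))
        x = mx.random.normal(shape=(2, 512))

        w_q, scales, biases = mx.quantize(w, group_size=group_size, bits=bits)
        w_hat = mx.dequantize(w_q, scales, biases, group_size=group_size, bits=bits)
        self.assertEqual(w_hat.shape, w.shape)

        # quantized_matmul must agree with a matmul on the dequantized weights.
        y_ref = x @ w_hat.T
        y_q = mx.quantized_matmul(
            x, w_q, scales, biases,
            transpose=True, group_size=group_size, bits=bits,
        )
        self.assertTrue(mx.allclose(y_ref, y_q, atol=1e-4).item())

if __name__ == "__main__":
    unittest.main()
```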

Downstream chain

Once this PR merges, two follow-up PRs drop the fork dependency for downstream
Rust consumers:

  1. ml-explore/mlx-c — bump submodule to a merged-mlx SHA, no signature
    changes anticipated (the v0.6.0-3 C bindings already accept global_scale
    as a nullable mlx_array)
  2. oxideai/mlx-rs — 12-line plumbing fix in src/ops/quantization.rs
    to pass null global_scale arrays through mlx_quantize /
    mlx_dequantize / mlx_qqmm (matches the v0.6.0-3 C signature)

Acknowledgements

All three commits authored by Pasha Khosravi.
This PR is a packaging step to bring his research work into upstream MLX so
production model loaders can drop their fork chains.

Cherry-picks are clean against current main (the branch base sits on
upstream ce45c52 "[CUDA] Use qmv kernel for fp quantizations (#3239)").

🤖 PR prepared with Claude Code

@zcbenz
Collaborator

zcbenz commented May 5, 2026

Closing as a duplicate of #3161.

@zcbenz closed this May 5, 2026
