
[TRITON] Add MXFP4 quantization support to triton unified attention kernel #2012

Draft
amd-xiaoyu12 wants to merge 2 commits into ROCm:main from amd-xiaoyu12:xiaoyu/unified_attention_mxfp4

Conversation

@amd-xiaoyu12

Motivation

This PR adds MXFP4 quantization support to the unified attention kernel:

- Add Q-MXFP4, QK-MXFP4, and PV-MXFP4 quantization modes (0-3)
- Automatic fallback for incompatible HEAD_SIZE_PADDED
- Compatibility check: requires HEAD_SIZE_PADDED >= 32 and divisible by 32 (see the sketch after this list)
- Support smoothed quantization with mean subtraction for better accuracy
- Add comprehensive tests
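
The PR doesn't include these names verbatim, but a minimal Python-side sketch of the compatibility gate and automatic fallback described above might look like this (function names are illustrative assumptions, not the PR's actual API):

```python
def mxfp4_compatible(head_size_padded: int) -> bool:
    # MXFP4 groups values into 32-element blocks that share one scale,
    # hence the minimum size and divisibility requirement.
    return head_size_padded >= 32 and head_size_padded % 32 == 0

def resolve_mxfp4_mode(requested_mode: int, head_size_padded: int) -> int:
    # Automatic fallback: drop to mode 0 (no quantization) when the padded
    # head size cannot be tiled into 32-element MXFP4 blocks.
    if requested_mode > 0 and not mxfp4_compatible(head_size_padded):
        return 0
    return requested_mode
```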

Files modified:
- aiter/ops/triton/_triton_kernels/attention/unified_attention.py (+184 lines)
- aiter/ops/triton/attention/unified_attention.py (+24 lines)

Files added:
- op_tests/triton_tests/attention/test_unified_attention_mxfp4.py (comprehensive test suite)
- op_tests/op_benchmarks/triton/bench_mxfp4_attention.py (usage example with benchmarks)

MXFP4 modes:
- Mode 0: Original (no quantization, baseline)
- Mode 1: Native MXFP4 QK
- Mode 2: Smoothed MXFP4 QK (recommended)
- Mode 3+: Smoothed MXFP4 QK + PV
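
The PR doesn't spell out the smoothing math, but a minimal sketch of block-scaled MXFP4 quantization with mean subtraction (the "smoothed" variant in modes 2 and 3) could look like the following; the function name, power-of-two scale formula, and clamping are assumptions, not the kernel's actual code:

```python
import torch

def smoothed_mxfp4_sketch(x: torch.Tensor, block: int = 32):
    # Subtract the per-row mean so values center around zero before
    # quantization; the mean is kept in higher precision and folded back
    # into the attention computation (assumption).
    mean = x.mean(dim=-1, keepdim=True)
    centered = x - mean
    # MXFP4 shares one power-of-two (E8M0) scale per 32-element block.
    blocks = centered.reshape(*centered.shape[:-1], -1, block)
    amax = blocks.abs().amax(dim=-1, keepdim=True).clamp_min(1e-12)
    # 6.0 is the largest magnitude representable in FP4 (E2M1).
    scale = torch.exp2(torch.ceil(torch.log2(amax / 6.0)))
    q = torch.clamp(blocks / scale, -6.0, 6.0)  # would be rounded/cast to FP4
    return q, scale, mean
```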

Usage: set the MXFP4_OPTION environment variable or pass the use_native_fp4 parameter.
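
For illustration (argument names other than MXFP4_OPTION and use_native_fp4 are placeholders; the wrapper's full signature is not reproduced here):

```python
import os

# Option 1: select the mode process-wide via the environment variable.
os.environ["MXFP4_OPTION"] = "2"  # Mode 2: smoothed MXFP4 QK (recommended)

# Option 2: pass the flag explicitly per call (sketch; other arguments elided).
# out = unified_attention(q, k, v, ..., use_native_fp4=True)
```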

Test Plan

Evaluated with Llama3 8B, Qwen3 32B, and Qwen3 Thinking 30B on GSM8K.

Test Result

(accuracy results attached as an image in the original PR; not reproduced here)

Submission Checklist

@amd-xiaoyu12 amd-xiaoyu12 requested a review from azaidy February 11, 2026 18:22
@azaidy azaidy requested a review from cagrikymk February 11, 2026 19:12
@cagrikymk cagrikymk changed the title Add MXFP4 quantization support to triton unified attention kernel [TRITON] Add MXFP4 quantization support to triton unified attention kernel Feb 11, 2026
