
Conversation


@sarckk sarckk commented Nov 10, 2025

Purpose

BugFix: #27058 seems to have missed the import for aiter.gemm_a8w8_blockscale.

BugFix: #24490 changed the logic so that we always run the quant op. Fix: run the quant op only if input_scale is None:

```python
if input_scale is not None:
    q_input = input_2d
# MI350 case uses triton kernel
if (
    not current_platform.is_fp8_fnuz()
    and rocm_aiter_ops.is_triton_gemm_w8a8_tuned(n, k)
):
    q_input, input_scale = per_token_group_quant_fp8(
        input_2d,
        self.act_quant_group_shape.col,
        column_major_scales=False,
        use_ue8m0=False,
    )
    return rocm_aiter_ops.triton_gemm_a8w8_blockscale(
        q_input,
        weight,
        input_scale,
        weight_scale,
        input_2d.dtype,
    )
# MI300 uses tuned AITER ASM/C++ kernel
else:
    q_input, input_scale = rocm_aiter_ops.per_1x128_fp8_quant(input_2d)
    return rocm_aiter_ops.gemm_w8a8_blockscale(
        q_input,
        weight,
        input_scale,
        weight_scale,
        input_2d.dtype,
    )
```

BugFix: correct invocation of aiter ops
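To make the missing-import bug concrete: when a symbol is imported only inside one branch of a function, Python treats the name as function-local everywhere in that function, so the other branch fails at call time. A minimal standalone sketch (stand-in names only, not vLLM's actual code):

```python
def run_gemm(use_triton: bool) -> int:
    """Stand-in for a dispatch function whose kernel import only
    happens on one branch (the failure mode this PR fixes)."""
    if use_triton:
        # stand-in for: from aiter.ops.triton... import gemm_a8w8_blockscale
        from operator import mul as gemm_kernel
        return gemm_kernel(2, 3)
    # bug: the equivalent `from aiter import gemm_a8w8_blockscale` is
    # missing here, so the name is unbound on this path
    return gemm_kernel(2, 3)  # raises NameError (UnboundLocalError)


run_gemm(True)  # fine: the import ran on this branch
try:
    run_gemm(False)
except NameError as exc:
    print(f"missing-import bug: {exc}")
```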

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the rocm Related to AMD ROCm label Nov 10, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request correctly addresses a missing import for gemm_a8w8_blockscale within a conditional block, preventing a NameError in the rocm_aiter_gemm_w8a8_blockscale_impl function. However, the current implementation strategy for gemm_a8w8_blockscale introduces an ambiguity, as the same symbol is dynamically bound to different functions (from aiter and aiter.ops.triton.gemm_a8w8_blockscale) depending on the execution path. While Python's scoping rules allow this, it can lead to confusion and potential subtle bugs if the functions are not perfectly interchangeable. A clearer approach would involve using distinct aliases for these imports to explicitly differentiate the intended function in each branch.
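The distinct-alias approach suggested above could look roughly like the following sketch. The aiter imports are replaced with local stand-ins so the example runs anywhere; the real import lines are shown in comments:

```python
def pick_blockscale_kernel(use_triton: bool):
    """Bind each implementation to its own alias so a reader can tell
    which kernel a call site refers to. Sketch only: stand-in callables
    replace the real aiter imports, which need a ROCm build."""
    if use_triton:
        # real code would be:
        # from aiter.ops.triton.gemm_a8w8_blockscale import (
        #     gemm_a8w8_blockscale as triton_gemm_a8w8_blockscale,
        # )
        def triton_gemm_a8w8_blockscale(*args):
            return "triton"

        return triton_gemm_a8w8_blockscale
    # real code would be:
    # from aiter import gemm_a8w8_blockscale as asm_gemm_a8w8_blockscale
    def asm_gemm_a8w8_blockscale(*args):
        return "asm"

    return asm_gemm_a8w8_blockscale


print(pick_blockscale_kernel(True)())   # triton
print(pick_blockscale_kernel(False)())  # asm
```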


wuhuikx commented Nov 10, 2025

@HaiShaw @tjtanaa could you please help review?

@houseroad houseroad added ready-for-merge Indicate this PR is ready to be merged by the maintainers, used by reviewers without merge access. ready ONLY add when PR is ready to merge/full CI is needed labels Nov 10, 2025

@yewentao256 yewentao256 left a comment


Thanks for the work!


mergify bot commented Nov 10, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @sarckk.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 10, 2025
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
@sarckk sarckk force-pushed the fix-missing-aiter-import branch from 059e08f to e667127 Compare November 10, 2025 16:44
@mergify mergify bot removed the needs-rebase label Nov 10, 2025

@tjtanaa tjtanaa left a comment


@sarckk A better approach would be to move all the gemm_a8w8_blockscale imports into a separate if-else block up front. This handles both MI300x and MI355x:

```diff
+    use_triton = not current_platform.is_fp8_fnuz() and is_aiter_triton_kernel_tuned(n, k)
+
+    if use_triton:
+        from aiter.ops.triton.gemm_a8w8_blockscale import gemm_a8w8_blockscale
+    else:
+        from aiter import gemm_a8w8_blockscale

     if input_scale is not None:
         q_input = input_2d
     elif not current_platform.is_fp8_fnuz() and is_aiter_triton_kernel_tuned(n, k):
-        from aiter.ops.triton.gemm_a8w8_blockscale import gemm_a8w8_blockscale

         # MI350 case uses triton kernel
         q_input, input_scale = per_token_group_quant_fp8(
             input_2d,
             group_size,
             column_major_scales=False,
             use_ue8m0=False,
         )
     else:
         # MI300 uses tuned AITER ASM/C++ kernel
         import aiter as rocm_aiter
-        from aiter import gemm_a8w8_blockscale, get_hip_quant
+        from aiter import get_hip_quant
         aiter_per1x128_quant = get_hip_quant(rocm_aiter.QuantType.per_1x128)
         q_input, input_scale = aiter_per1x128_quant(
             input_2d.contiguous(), quant_dtype=rocm_aiter.dtypes.fp8
         )
```

Signed-off-by: Yong Hoon Shin <yhshin@meta.com>

sarckk commented Nov 10, 2025

@tjtanaa addressed your comments, although it seems the real issue was that the use_triton branch should not have been in an elif? It looks like your #24490 changed this, so I think we no longer need this PR:

```python
if input_scale is not None:
    q_input = input_2d
# MI350 case uses triton kernel
if (
    not current_platform.is_fp8_fnuz()
    and rocm_aiter_ops.is_triton_gemm_w8a8_tuned(n, k)
):
    q_input, input_scale = per_token_group_quant_fp8(
        input_2d,
        self.act_quant_group_shape.col,
        column_major_scales=False,
        use_ue8m0=False,
    )
    return rocm_aiter_ops.triton_gemm_a8w8_blockscale(
        q_input,
        weight,
        input_scale,
        weight_scale,
        input_2d.dtype,
    )
# MI300 uses tuned AITER ASM/C++ kernel
else:
    q_input, input_scale = rocm_aiter_ops.per_1x128_fp8_quant(input_2d)
    return rocm_aiter_ops.gemm_w8a8_blockscale(
        q_input,
        weight,
        input_scale,
        weight_scale,
        input_2d.dtype,
    )
```

```python
if use_triton:
    gemm_w8a8_blockscale_op = rocm_aiter_ops.triton_gemm_a8w8_blockscale
else:
    gemm_w8a8_blockscale_op = rocm_aiter_ops.gemm_a8w8_blockscale
```
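Completing that pattern: the chosen callable is then invoked at a single call site, which sidesteps the rebinding ambiguity flagged in the earlier review. A runnable sketch with stand-ins for the two rocm_aiter_ops kernels (the real ops need a ROCm build):

```python
def gemm_blockscale_dispatch(use_triton: bool, *gemm_args):
    # stand-ins for rocm_aiter_ops.triton_gemm_a8w8_blockscale and
    # rocm_aiter_ops.gemm_a8w8_blockscale; real code binds the actual ops
    def triton_kernel(*args):
        return ("triton", args)

    def asm_kernel(*args):
        return ("asm", args)

    gemm_w8a8_blockscale_op = triton_kernel if use_triton else asm_kernel
    # one call site regardless of which kernel was selected
    return gemm_w8a8_blockscale_op(*gemm_args)


print(gemm_blockscale_dispatch(True, "q_input", "weight")[0])   # triton
print(gemm_blockscale_dispatch(False, "q_input", "weight")[0])  # asm
```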
@hongxiayang left a comment

Any reason the gemm_a8w8_blockscale_bpreshuffle is not used?

@hongxiayang the gemm_a8w8_blockscale_bpreshuffle is in another PR that is not yet merged; moreover, the AITER commit pinned in Dockerfile.rocm_base does not have that kernel.

Signed-off-by: Yong Hoon Shin <yhshin@meta.com>

@tjtanaa tjtanaa left a comment


LGTM. Thank you for your work @sarckk

@tjtanaa tjtanaa enabled auto-merge (squash) November 10, 2025 21:35
@tjtanaa tjtanaa merged commit 0211435 into vllm-project:main Nov 10, 2025
54 checks passed
@sarckk sarckk deleted the fix-missing-aiter-import branch November 10, 2025 23:14
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Nov 13, 2025
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
