[Bugfix] Fix persistent_masked_m_silu_mul_quant tests #28366
Conversation
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Code Review
This pull request introduces two bug fixes for the persistent_masked_m_silu_mul_quant kernel. The first fix handles cases where the number of groups is not divisible by the number of warps by falling back to a 1-warp configuration. The second fix adds a parameter that controls the ue8m0 scale ceiling, aligning the kernel's behavior with the reference implementation used in the tests. The changes are logical and well-targeted. I've found one critical issue in an updated validation check that needs to be addressed.
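To illustrate the second fix, here is a minimal sketch of how a ue8m0-style scale differs from a plain FP8 scale. The function name, signature, and the fp8_max value of 448.0 (the E4M3 maximum) are assumptions for illustration, not the kernel's actual code; the key idea is that a ue8m0 scale is rounded up to a power of two, so the kernel and the reference path must agree on whether that ceiling is applied.

```python
import math

def quant_scale(amax: float, fp8_max: float = 448.0, use_ue8m0: bool = False) -> float:
    # Hypothetical per-group scale computation for FP8 quantization.
    # Plain path: scale is the exact ratio of the group's absolute max
    # to the FP8 representable maximum.
    scale = amax / fp8_max
    if use_ue8m0:
        # ue8m0 path: the scale is stored as an exponent only (no mantissa),
        # so it is rounded UP to the nearest power of two. Assumes amax > 0.
        scale = 2.0 ** math.ceil(math.log2(scale))
    return scale
```

If the kernel ceils the exponent but the reference implementation does not (or vice versa), the two paths produce different quantized values and the test comparison fails, which is why the new parameter exposes this choice.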
💡 Codex Review
Here are some automated review suggestions for this pull request.
…8366)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>

…8366)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Purpose
Fixes the failing test case (8, 128, 128 * 33, fp8_dtype). The failure occurs because when the hidden size is >= 4096 we launch a kernel with 8 warps, and the kernel requires that NUM_GROUPS (hidden_size / 128) divides evenly among the warps. For this case, the PR simply falls back to the 1-warp configuration.

Test Plan
Ran test_silu_mul_fp8_quant_deep_gemm.py locally.

Test Result
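The warp-count fallback described above can be sketched as follows. The function name and the exact launch heuristic (8 warps at hidden_size >= 4096, groups of 128) are taken from the PR description; treating it as a standalone helper is an assumption for illustration, since the real logic lives inside the kernel launch code.

```python
def select_num_warps(hidden_size: int, group_size: int = 128) -> int:
    # Hypothetical sketch of the launch heuristic: prefer 8 warps for
    # large hidden sizes, but only when the number of 128-wide groups
    # splits evenly across the warps; otherwise fall back to 1 warp.
    num_groups = hidden_size // group_size
    num_warps = 8 if hidden_size >= 4096 else 1
    if num_groups % num_warps != 0:
        num_warps = 1  # fallback: groups don't divide evenly among warps
    return num_warps
```

For the failing case, hidden_size = 128 * 33 = 4224 clears the 4096 threshold, but 33 groups cannot be split evenly across 8 warps, so the fallback selects 1 warp.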