【0.11.0-dev】optimization of kimi-k2 in cann8.3 #4555
base: v0.11.0-dev
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces an optimization for the 'kimi' model on CANN 8.3 by enabling a fused MoE gating kernel. The changes are applied across several files related to MoE, including different quantization paths. My review includes a critical comment about an inconsistency in one of the quantization files that could lead to incorrect behavior, and a high-severity comment about code duplication that impacts maintainability. Addressing these points will improve the robustness and clarity of the implementation.
    if global_num_experts == 256 or (global_num_experts == 384 and
            torch.version.cann.startswith("8.3")):
There's an inconsistency in how the model type is determined here. This file checks global_num_experts directly, while other files in this PR (e.g., torchair_w8a8_dynamic.py) check against the effective number of experts (global_num_experts - global_redundant_expert_num). This could lead to incorrect kernel selection if global_redundant_expert_num is non-zero, which would be a bug. The logic should be consistent across all files. The original logic if global_num_experts == 256: was also likely incorrect for the same reason.
Suggested change:
-    if global_num_experts == 256 or (global_num_experts == 384 and
-            torch.version.cann.startswith("8.3")):
+    if (global_num_experts - global_redundant_expert_num == 256) or \
+            ((global_num_experts - global_redundant_expert_num == 384) and torch.version.cann.startswith("8.3")):
    is_deepseek_v3_r1 = global_num_experts - global_redundant_expert_num == 256
    is_kimi = global_num_experts - global_redundant_expert_num == 384
    # NOTE: now npu_moe_gating_top_k can support `group_count=256` pattern, and `group_count=384` pattern in cann8.3
    if is_deepseek_v3_r1 or (is_kimi and torch.version.cann.startswith("8.3")):
The logic to identify deepseek_v3_r1 and kimi models using magic numbers (256, 384), and the check for CANN version 8.3, is duplicated across multiple files (experts_selector.py, torchair_fused_moe.py, torchair_w8a8_dynamic.py, and torchair_w4a8_dynamic.py). This makes the code harder to maintain and increases the risk of inconsistencies when adding support for new models or CANN versions. Consider centralizing this logic into a helper function or a configuration object for better maintainability and readability.
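One way to act on this review comment is to collect the expert-count and CANN-version checks behind a single predicate that all four files can import. The sketch below is illustrative, not code from the PR; the function name `use_fused_gating_topk` and its parameters are hypothetical, and the CANN version is passed in as a string so the logic stays testable off-device (the real call sites would pass `torch.version.cann`).

```python
def use_fused_gating_topk(global_num_experts: int,
                          global_redundant_expert_num: int,
                          cann_version: str) -> bool:
    """Return True if the fused npu_moe_gating_top_k kernel can be used.

    group_count=256 (DeepSeek-V3/R1 layout) is supported on all CANN
    versions covered here; group_count=384 (kimi layout) requires CANN 8.3.
    """
    # Always compare the effective expert count, so redundant experts
    # do not change which kernel is selected.
    effective_experts = global_num_experts - global_redundant_expert_num
    is_deepseek_v3_r1 = effective_experts == 256
    is_kimi = effective_experts == 384
    return is_deepseek_v3_r1 or (is_kimi and cann_version.startswith("8.3"))
```

With this helper, each call site reduces to `if use_fused_gating_topk(...)`, and adding a new supported `group_count` or CANN version is a one-line change in one place.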
    is_kimi = global_num_experts - global_redundant_expert_num == 384
    # NOTE: now npu_moe_gating_top_k can support `group_count=256` pattern, and `group_count=384` pattern in cann8.3
    if is_deepseek_v3_r1 or (is_kimi and torch.version.cann.startswith("8.3")):
There is no need to check the expert count here; just remove the check.
This PR only modifies the torchair part; [experts_selector.py] is changed by PR4352.
Force-pushed 820697b to 492d78c
Force-pushed f0edcbd to b78b7a9
Signed-off-by: Levi-JQ <yujinqi2@huawei.com>
Force-pushed b78b7a9 to 3c32524
Force-pushed 9271e19 to 5e779f1
Signed-off-by: Levi-JQ <yujinqi2@huawei.com>
Force-pushed 5e779f1 to 9facab7
wangxiyuan left a comment
I think this is already done by c4a11a7?
This PR is for torchair.
What this PR does / why we need it?
In CANN 8.3, the npu_moe_gating_top_k operator supports an expert count of 384, so kimi can use this operator to get better performance.
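The selection logic this description implies can be sketched as a simple dispatch: use the fused gating kernel when the expert layout and CANN version allow it, and fall back to the generic gating path otherwise. The function and path names below are illustrative (they are not identifiers from the PR), and the CANN version is a plain string argument for readability.

```python
def select_gating_path(effective_experts: int, cann_version: str) -> str:
    """Pick a gating implementation for the given MoE expert layout.

    CANN 8.3 extends npu_moe_gating_top_k support from group_count=256
    (DeepSeek-V3/R1) to also cover group_count=384 (kimi).
    """
    if effective_experts == 256 or (effective_experts == 384
                                    and cann_version.startswith("8.3")):
        return "npu_moe_gating_top_k"    # fused gating kernel
    return "generic_softmax_topk"        # unfused fallback path
```

Before CANN 8.3, a 384-expert model would take the fallback path; on 8.3 it gets the fused kernel, which is the performance win this PR targets.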
Does this PR introduce any user-facing change?
How was this patch tested?