[XPU] [Optimization] [EP] EP communication optimization. #5145

zccjjj · 2025-11-20T09:21:44Z

Motivation

In centralized inference scenarios, use low-latency communication operators for decode-only (pure D) batches and high-throughput communication operators for batches that contain prefill (P) requests.
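The following minimal Python sketch makes the split concrete. It is an illustration, not the PR's implementation: it routes expert-parallel dispatch to a latency-oriented path or a throughput-oriented path based on the MoE phase. `low_latency_dispatch`, `high_throughput_dispatch`, `DISPATCH_BY_PHASE`, and `ep_dispatch` are hypothetical names. The two dispatch functions only return labels in place of real all-to-all communication.

```python
from typing import Callable, Dict, List


def low_latency_dispatch(tokens: List[int]) -> str:
    # Hypothetical stand-in for a latency-optimized all-to-all (decode-only batches)
    return f"low_latency({len(tokens)} tokens)"


def high_throughput_dispatch(tokens: List[int]) -> str:
    # Hypothetical stand-in for a bandwidth-optimized all-to-all (batches with prefill)
    return f"high_throughput({len(tokens)} tokens)"


# Map the MoE phase to the communication path it should use
DISPATCH_BY_PHASE: Dict[str, Callable[[List[int]], str]] = {
    "decode": low_latency_dispatch,
    "prefill": high_throughput_dispatch,
}


def ep_dispatch(phase: str, tokens: List[int]) -> str:
    """Pick the dispatch implementation for the current batch's phase."""
    try:
        return DISPATCH_BY_PHASE[phase](tokens)
    except KeyError:
        raise ValueError(f"unknown MoE phase: {phase!r}") from None
```

Keeping the choice in a single lookup table means adding another path only touches one mapping, and an unexpected phase fails loudly instead of silently falling back.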

Modifications

Usage or Command

export MOE_FFN_USE_DENSE_INPUT=1

Accuracy Tests

Checklist

Add at least one tag to the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code and run pre-commit before committing.
Add unit tests. If no unit tests are added, explain why in this PR.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2025-11-20T09:21:59Z

Thanks for your contribution!

zhupengyang · 2025-11-20T09:33:43Z

fastdeploy/model_executor/layers/backends/xpu/moe/ep.py

        self.group = group
        self.num_local_experts = num_experts // ep_size
-        self.deepep_engine = None
+        self.deepep_engine = None  # deepep_engine only calls dispatch and combine


Please write all comments in English.
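`deepep_engine` is used only for `dispatch` and `combine`. To illustrate what those two steps do, here is a single-process Python sketch. The names are hypothetical and this is not the DeepEP or FastDeploy API. `dispatch` groups each token's top-k copies by expert and records where each copy went. `combine` uses that record to gather the expert outputs back and take the gate-weighted sum per token. In real EP, the per-expert buckets live on different ranks and are exchanged with all-to-all communication.

```python
from typing import Dict, List, Tuple

# (token index, k-th routing choice, expert id, position inside that expert's bucket)
Handle = List[Tuple[int, int, int, int]]


def dispatch(tokens: List[float], topk_ids: List[List[int]]) -> Tuple[Dict[int, List[float]], Handle]:
    """Group each token's top-k copies by expert and remember where each copy went."""
    buckets: Dict[int, List[float]] = {}
    handle: Handle = []
    for t, experts in enumerate(topk_ids):
        for k, e in enumerate(experts):
            bucket = buckets.setdefault(e, [])
            handle.append((t, k, e, len(bucket)))
            bucket.append(tokens[t])
    return buckets, handle


def combine(expert_out: Dict[int, List[float]], handle: Handle,
            topk_weights: List[List[float]], num_tokens: int) -> List[float]:
    """Gather expert outputs back to their tokens and take the gate-weighted sum."""
    out = [0.0] * num_tokens
    for t, k, e, pos in handle:
        out[t] += topk_weights[t][k] * expert_out[e][pos]
    return out


if __name__ == "__main__":
    buckets, handle = dispatch([1.0, 2.0], [[0, 1], [1, 0]])
    # Toy "experts": expert e multiplies its inputs by (e + 1)
    expert_out = {e: [(e + 1) * x for x in xs] for e, xs in buckets.items()}
    print(combine(expert_out, handle, [[0.5, 0.5], [0.25, 0.75]], num_tokens=2))  # [1.5, 2.5]
```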

zhupengyang · 2025-11-20T09:43:37Z

fastdeploy/worker/xpu_model_runner.py

+        if_only_decode = self.only_decode()
+        if (
+            self.fd_config.scheduler_config.splitwise_role == "mixed"
+        ):  # Centralized scenario: phase is initialized to prefill by default; at runtime, different types of batches can switch the phase here
+            self.fd_config.model_config.moe_phase.phase = "decode" if if_only_decode else "prefill"
+


only_decoder=self.forward_meta.len_info_cpu[0]<=0
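The reviewer's suggestion derives the decode-only flag directly from `forward_meta.len_info_cpu[0]`. Below is a minimal sketch of the resulting phase switch. It assumes, without confirmation from the PR, that `len_info_cpu[0]` counts prefill work in the current batch. `MoEPhase` and `update_moe_phase` are hypothetical stand-ins for `moe_phase` and the model-runner logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MoEPhase:
    # Hypothetical stand-in for model_config.moe_phase; centralized runs start in prefill
    phase: str = "prefill"


def update_moe_phase(splitwise_role: str, len_info_cpu: List[int], moe_phase: MoEPhase) -> bool:
    # Assumption: index 0 holds the amount of prefill work in this batch
    only_decode = len_info_cpu[0] <= 0
    if splitwise_role == "mixed":
        # Only centralized (mixed) deployments switch phase per batch
        moe_phase.phase = "decode" if only_decode else "prefill"
    return only_decode
```

In a disaggregated deployment (`splitwise_role` other than `"mixed"`), the phase keeps whatever value it was configured with.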

zhupengyang · 2025-11-20T09:46:08Z

fastdeploy/model_executor/layers/backends/xpu/moe/fused_moe.py

            permute_input,
            token_nums_per_expert,
-            valid_token_num,
+            max(1, valid_token_num),  # ensure it is non-zero even during an empty run


Support the valid_token_num=0 case in the moe_expert_ffn operator.
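The sketch below shows what handling an empty run inside the operator could look like, as an alternative to clamping with `max(1, valid_token_num)`. In EP, a rank still takes part in dispatch and combine even when it receives no tokens, so the expert FFN can legitimately see zero valid tokens. Clamping to 1 presumably makes the kernel process a row that carries no real token. `moe_expert_ffn_ref` is a hypothetical pure-Python reference, not the XPU `moe_expert_ffn` kernel, and its "expert" is just a per-expert scale.

```python
from typing import List, Sequence


def moe_expert_ffn_ref(permute_input: Sequence[Sequence[float]],
                       token_nums_per_expert: Sequence[int],
                       valid_token_num: int,
                       expert_scales: Sequence[float]) -> List[List[float]]:
    """Hypothetical reference: each expert scales its contiguous slice of tokens."""
    if valid_token_num == 0:
        # Empty run: no tokens were dispatched to this rank, so there is nothing to compute
        return []
    if sum(token_nums_per_expert) != valid_token_num:
        raise ValueError("token_nums_per_expert must sum to valid_token_num")
    out: List[List[float]] = []
    start = 0
    for expert_id, n in enumerate(token_nums_per_expert):
        scale = expert_scales[expert_id]
        # Tokens are already permuted so each expert's tokens are contiguous
        for row in permute_input[start:start + n]:
            out.append([scale * x for x in row])
        start += n
    return out
```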

fastdeploy/model_executor/layers/backends/xpu/moe/ep.py

codecov-commenter · 2025-11-21T04:41:51Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@6471dad).

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #5145   +/-   ##
==========================================
  Coverage           ?   57.86%           
==========================================
  Files              ?      317           
  Lines              ?    38315           
  Branches           ?     5727           
==========================================
  Hits               ?    22171           
  Misses             ?    14380           
  Partials           ?     1764

Flag	Coverage Δ
diff	`57.86% <ø> (?)`

Flags with carried forward coverage won't be shown.

zhupengyang reviewed Nov 20, 2025


zccjjj force-pushed the develop branch from 09ec64b to 524e8f0 November 21, 2025 03:28

zccjjj force-pushed the develop branch from 524e8f0 to 887efe2 November 21, 2025 06:18

zccjjj added 12 commits November 21, 2025 08:46

- experiment 2: something wrong with dispatch op syn in apply_ep_decode (d46aad6)
- success experience (0704a69)
- delete debug message printer (17c9512)
- delete useless printer (d129305)
- check only_decode by exist_prefill (1ef1e97)
- support both distribute and mixed infer (a832e90)
- delete unused printer (f2ed7a4)
- revise note (2fbdd85)
- change some name of variable (fc48d17)
- CI (83d21a0)
- refactor DeepEPEngine (8ba8d08)
- change some methods'position (604c1d9)

zccjjj force-pushed the develop branch from 887efe2 to 604c1d9 November 21, 2025 08:46

zccjjj added 2 commits November 21, 2025 10:40

- check whether decode only using shared memory and barrier (244b8fe)
- delete printer (4ee933c)
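The commit "check whether decode only using shared memory and barrier" suggests that ranks agree on a global decode-only flag before choosing a communication path. This presumably matters because, if some ranks took the low-latency path and others the high-throughput path, their collective calls would not match. The sketch below simulates that agreement. Threads stand in for ranks, a Python list stands in for the shared-memory buffer, and `threading.Barrier` provides the synchronization. It illustrates the pattern under these assumptions and is not the PR's code. `all_ranks_decode_only` is a hypothetical name.

```python
import threading
from typing import List


def all_ranks_decode_only(world_size: int, has_prefill: List[bool]) -> List[bool]:
    """Simulate world_size ranks agreeing on a global decode-only flag."""
    flags = [False] * world_size  # one slot per rank (stands in for shared memory)
    barrier = threading.Barrier(world_size)
    results = [False] * world_size

    def rank_main(rank: int) -> None:
        flags[rank] = has_prefill[rank]  # publish this rank's local view
        barrier.wait()  # every rank has written before anyone reads
        results[rank] = not any(flags)  # decode-only only if no rank has prefill
        barrier.wait()  # keep the buffer stable until everyone has read it (matters when reused)

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because every rank reads the same buffer after the same barrier, all ranks reach the same decision. A single rank with prefill work switches the whole group to the high-throughput path.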
