[Engine] Revert TTFT optimize (#6680) and add EP batched token scheduler by zoooo0820 · Pull Request #7791 · PaddlePaddle/FastDeploy

zoooo0820 · 2026-05-12T08:53:00Z

Motivation

通过实际测试，#6680 导致k3_kl_mean 相比branch release2.6上涨50倍；由于相关部分已经被多次修改，代码已经无法自动通过git revert

Modifications

Revert 结果

无冲突文件（8个，git 自动处理）

docs/usage/environment_variables.md — 恢复了 FD_ENGINE_FORWARD_SIGNAL 环境变量文档
docs/zh/usage/environment_variables.md — 同上（中文版）
fastdeploy/envs.py — 恢复了 FD_ENGINE_FORWARD_SIGNAL 环境变量定义
fastdeploy/scheduler/dp_scheduler.py — 恢复了原始调度器逻辑
fastdeploy/splitwise/internal_adapter_utils.py — 恢复了原始逻辑
tests/ci_use/metrics/test_metrics.py — 恢复了测试
tests/engine/test_common_engine.py — 恢复了测试
tests/scheduler/test_dp_scheduler.py — 恢复了测试
tests/splitwise/test_internal_adapter_utils.py — 恢复了测试

有冲突文件（2个，手动解决）

fastdeploy/engine/common_engine.py（1处冲突）：
移除 engine_forward_signal IPCSignal
恢复调度器循环原始逻辑
移除 EP 空 task 批处理 else 分支
使用 batch_request 变量名（后续 commit 的重命名，非 PR #6680 引入）
fastdeploy/worker/worker_process.py（3处冲突）：

移除 engine_forward_signal 信号初始化（同时未恢复已被 PR #7299 删除的 gpu_cache_lock）
移除 engine_forward_signal.value[0] = 1 赋值
移除 EP barrier（paddle.distributed.barrier(ep_group)）
移除 EP idle else 分支
保留了后续 commit 的改进：_get_exist_task_flag()、BatchRequest.from_tasks()、batch_request

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-05-12T08:53:10Z

Thanks for your contribution!

PaddlePaddle-bot · 2026-05-12T09:50:52Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-12 17:48:07

CI报告基于以下代码生成（30分钟更新一次）:

PR commit: 0bad507
Merge base: 3faf93e (branch: develop)
查看完整 Diff
CI 详情

1 任务总览

CI 有 1 个 Required 任务失败，另有 1 个运行中、4 个等待中，合并被阻塞，请优先处理 Required 失败任务。

总执行（rerun次数）	总任务	✅ 通过	❌ 失败	⏳ 运行中	⏸️ 等待中	跳过
28(0)	28	18	3	2	5	0

2 任务状态汇总

2.1 Required任务 : 2/8 通过

必选任务阻塞合并，失败需优先处理。

状态	任务	耗时	根因	修复建议	日志	重跑
❌	`Approval`	8s	PR问题：修改受限文件需2组特定RD审批	请 jiangjiajun等/zhouchong等在PR审批	Job	-
⏳	`Run Base Tests / base_tests`	-	运行中	-	Job	-
⏸️	`Extracted partial CE model tasks to run in CI. / run_ce_cases`	-	等待中	-	-	-
⏸️	`Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage`	-	等待中	-	-	-
⏸️	`Run Four Cards Tests / run_4_cards_tests`	-	等待中	-	-	-
⏸️	`Run Stable Tests / stable_tests`	-	等待中	-	-	-
✅	其余 2 个必选任务通过	-	-	-	-	-

2.2 可选任务 — 16/20 通过

可选任务不阻塞合并，失败仅供参考。

状态	任务	耗时	日志	重跑
❌	`Run iluvatar Tests / run_iluvatar_cases`	7m10s	Job	-
❌	`Check PR Template`	14s	Job	-
⏳	`Trigger Jenkins for PR`	-	Job	-
⏸️	`CI_HPU`	-	-	-
✅	其余 16 个可选任务通过	-	-	-

3 失败详情（仅 required）

Approval — 流程审批（置信度: 高）

Approval

状态: ❌ 失败
错误类型: 流程审批
置信度: 高
根因摘要: PR修改受限文件及日志行为，需2组特定RD审批
分析器: 通用分析(fallback)

根因详情:
检查脚本 scripts/check_approval.sh 发现 PR 存在 2 项未满足的审批要求：① 修改了 fastdeploy/envs.py，需要 @jiangjiajun / @liuyuanle / @chenjian26 / @wanglongzhi 中至少一人审批；② 新增了日志调用（.info / .debug），需要 @zhouchong / @zhangyongyue 中至少一人审批。两项审批均未满足，脚本以 exit code 6 退出。

关键日志:

0. You must have one FastDeploy RD (Jiang-Jia-Jun(jiangjiajun), yuanlehome(liuyuanle), rainyfly(chenjian26), Wanglongzhi2001(wanglongzhi)) approval for modifying [fastdeploy/envs.py].
1. You must have one FastDeploy RD (xyxinyang(zhouchong), zyyzghb(zhangyongyue)) approval for modifying logging behavior (.info/.debug/.error/log_request).
There are 2 approved errors.

修复建议:

请 @jiangjiajun、@liuyuanle、@chenjian26 或 @wanglongzhi 对 fastdeploy/envs.py 的修改进行审批
请 @zhouchong 或 @zhangyongyue 对日志行为修改（新增 .info/.debug 调用）进行审批

修复建议摘要: 请相关RD（jiangjiajun等/zhouchong等）在PR上审批

链接: 查看日志

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-05-12 18:08:51

📋 Review 摘要

PR 概述：回滚 EP 模式 TTFT 优化（删除 engine_forward_signal 机制），并在 dp_scheduler 中引入基于超时的批量请求收集策略
变更范围：engine/common_engine.py、scheduler/dp_scheduler.py、worker/worker_process.py、splitwise/
影响面 Tag：[Engine] [Scheduler] [PD Disaggregation]

📝 PR 规范检查

标题 [for test] revert ttft optimize 中 [for test] 不属于 checklist §D1 官方 Tag 列表；PR 描述各必填段落均为空（仅保留模板注释）。

标题建议（可直接复制）：

[Engine] Revert TTFT optimize and add EP batched token scheduler

PR 描述建议（可直接复制）：

## Motivation
回滚 TTFT（Time to First Token）优化。该优化通过 `engine_forward_signal` IPC 信号在 EP（Expert Parallel）模式下同步调度器与 Worker 前向执行状态。本次回滚删除该信号机制，并在 `dp_scheduler` 中引入基于 `FD_EP_BATCHED_TOKEN_TIMEOUT` 超时的批量请求收集策略作为替代方案。

## Modifications
- `fastdeploy/engine/common_engine.py`：删除 `engine_forward_signal` IPCSignal；重构调度循环，先检查 `exist_tasks()` 再提交请求拉取；移除 EP 模式下发送空任务批到 Worker 的逻辑；`RuntimeError` 处理仅保留在 mixed 分支
- `fastdeploy/scheduler/dp_scheduler.py`：新增资源前置检查；将单请求拉取改为带超时的批量请求收集循环，引入 `FD_EP_BATCHED_TOKEN_TIMEOUT` 环境变量（默认 0.1s）
- `fastdeploy/envs.py` / `docs/`：新增 `FD_EP_BATCHED_TOKEN_TIMEOUT` 环境变量
- `fastdeploy/splitwise/internal_adapter_utils.py`：移除 `ENABLE_V1_KVCACHE_SCHEDULER` 条件分支，简化 `unhandled_request_num` 计算
- `fastdeploy/worker/worker_process.py`：删除 `engine_forward_signal`；将 `_run_eplb` 移至 event loop 顶部；删除 EP prefill barrier 及空任务 assert 断言
- `tests/`：同步更新 `test_common_engine.py`、`test_dp_scheduler.py`、`test_internal_adapter_utils.py`、`test_metrics.py`

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

问题

级别	文件	概述
🔴 Bug	`fastdeploy/engine/common_engine.py:1091`	非 mixed 模式下 `get_request_pool.submit()` 缺少 `RuntimeError` 捕获，线程池关闭时调度线程会崩溃
🔴 Bug	`fastdeploy/scheduler/dp_scheduler.py:161`	`current_prefill_tokens` / `required_total_blocks` 在请求被拒绝前已累加，外层 while 的 token 预算检查可能被误触发，导致后续可调度的小请求被跳过
❓ 疑问	`fastdeploy/scheduler/dp_scheduler.py:183`	日志信息与实际语义相反：`len(requests)==0` 说明无任何请求入队，但日志显示"has put all just-pulled request into the queue"

总体评价

此 PR 回滚了 EP 模式的 TTFT 优化并引入新的批量调度策略，核心逻辑变动较大。dp_scheduler.py 的计数器提前累加是正确性 Bug，可能导致部分有效请求被漏调度；common_engine.py 非 mixed 路径遗漏异常处理存在稳定性风险。建议修复两处 P1 问题后再合入。

PaddlePaddle-bot · 2026-05-12T10:12:20Z

+                if self.cfg.scheduler_config.splitwise_role != "mixed":
+                    if not is_fetching:
                        is_fetching = True
                        get_request_pool.submit(_fetch_request)


🔴 Bug 非 mixed 模式下 get_request_pool.submit(_fetch_request) 缺少 RuntimeError 捕获。

mixed 模式的 else 分支保留了完整的 try/except 处理（线程池关闭时优雅退出），但非 mixed 路径没有。当引擎关闭时，ThreadPoolExecutor 会抛出 RuntimeError: cannot schedule new futures after shutdown，将直接导致调度线程崩溃。

建议修复：

if self.cfg.scheduler_config.splitwise_role != "mixed": if not is_fetching: try: is_fetching = True get_request_pool.submit(_fetch_request) except RuntimeError as e: if "shutdown" in str(e): self.llm_logger.info("Thread pool shutdown detected, exiting scheduler loop") break else: raise

PaddlePaddle-bot · 2026-05-12T10:12:20Z

+                    for request_id in batch_ids:
+                        request = self.requests[request_id]
+                        required_input_blocks = self.calc_required_blocks(request.prompt_tokens_ids_len, block_size)
+                        current_prefill_tokens += request.prompt_tokens_ids_len


🔴 Bug current_prefill_tokens 和 required_total_blocks 在请求被拒绝之前已完成累加，当 required_total_blocks > available_blocks 触发 break 时，这两个计数器已包含被拒绝请求的数据。

外层 while 循环下一次迭代时，wait_for 仍会返回同一批 batch_ids（游标未推进），进入 for 循环后计数器会被再次累加，导致：

current_prefill_tokens 虚高，可能误触发外层 current_prefill_tokens > max_num_batched_tokens 检查，提前退出循环，后续体积较小、本可调度的请求（如 req3）被跳过。

外层循环只能靠超时（FD_EP_BATCHED_TOKEN_TIMEOUT = 0.1s）退出，浪费等待时间。

建议在 break 之前检查而非之后累加：

if required_total_blocks + required_input_blocks + reserved_output_blocks > available_blocks: break # 先判断，通过后再累加 current_prefill_tokens += request.prompt_tokens_ids_len required_total_blocks += required_input_blocks + reserved_output_blocks

PaddlePaddle-bot · 2026-05-12T10:12:20Z

+        if batch_ids:
+            if len(batch_ids) > 0 and len(requests) == 0:
+                self.scheduler_logger.debug(
+                    f"Scheduler has put all just-pulled request into the queue: {len(batch_ids)}"


❓ 疑问 日志信息语义与条件相反。

此处触发条件为 len(batch_ids) > 0 and len(requests) == 0，即拉取到了请求 ID 但 一个请求都没有入队（全部因资源不足被拒绝）。然而日志显示 "Scheduler has put all just-pulled request into the queue"，容易误导排查方向。

建议改为：

self.scheduler_logger.debug( f"No requests could be scheduled due to insufficient resources, pending={len(batch_ids)}" )

codecov-commenter · 2026-05-12T10:42:39Z

Codecov Report

❌ Patch coverage is 63.63636% with 16 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@3faf93e). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/scheduler/dp_scheduler.py	68.96%	3 Missing and 6 partials ⚠️
fastdeploy/engine/common_engine.py	50.00%	3 Missing and 4 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #7791   +/-   ##
==========================================
  Coverage           ?   63.83%           
==========================================
  Files              ?      458           
  Lines              ?    63515           
  Branches           ?     9731           
==========================================
  Hits               ?    40543           
  Misses             ?    20201           
  Partials           ?     2771

Flag	Coverage Δ
GPU	`72.42% <63.63%> (?)`
XPU	`7.20% <2.27%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

revert ttft optimize

d44052f

zoooo0820 had a problem deploying to Metax_ci May 12, 2026 08:53 — with GitHub Actions Error

This comment was marked as outdated.

Sign in to view

revert other files

0bad507

zoooo0820 had a problem deploying to Metax_ci May 12, 2026 09:33 — with GitHub Actions Failure

This comment was marked as outdated.

Sign in to view

PaddlePaddle-bot suggested changes May 12, 2026

View reviewed changes

zoooo0820 changed the title ~~[for test] revert ttft optimize~~ [Engine] Revert TTFT optimize (#6680) and add EP batched token scheduler May 12, 2026

liyonghua0910 approved these changes May 13, 2026

View reviewed changes

Jiang-Jia-Jun approved these changes May 13, 2026

View reviewed changes

Jiang-Jia-Jun merged commit a83a685 into PaddlePaddle:develop May 13, 2026
55 of 61 checks passed

zoooo0820 deleted the revert_ttft branch May 13, 2026 11:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Engine] Revert TTFT optimize (#6680) and add EP batched token scheduler#7791

[Engine] Revert TTFT optimize (#6680) and add EP batched token scheduler#7791
Jiang-Jia-Jun merged 2 commits into
PaddlePaddle:developfrom
zoooo0820:revert_ttft

zoooo0820 commented May 12, 2026 •

edited

Loading

Uh oh!

paddle-bot Bot commented May 12, 2026

Uh oh!

This comment was marked as outdated.

Uh oh!

PaddlePaddle-bot commented May 12, 2026

Approval

Uh oh!

This comment was marked as outdated.

Uh oh!

PaddlePaddle-bot left a comment

Uh oh!

PaddlePaddle-bot May 12, 2026

Uh oh!

PaddlePaddle-bot May 12, 2026

Uh oh!

PaddlePaddle-bot May 12, 2026

Uh oh!

codecov-commenter commented May 12, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

zoooo0820 commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Modifications

Usage or Command

Accuracy Tests

Checklist

Uh oh!

paddle-bot Bot commented May 12, 2026

Uh oh!

This comment was marked as outdated.

Uh oh!

PaddlePaddle-bot commented May 12, 2026

1 任务总览

2 任务状态汇总

2.1 Required任务 : 2/8 通过

2.2 可选任务 — 16/20 通过

3 失败详情（仅 required）

Approval

Uh oh!

This comment was marked as outdated.

Uh oh!

PaddlePaddle-bot left a comment

Choose a reason for hiding this comment

📋 Review 摘要

📝 PR 规范检查

问题

总体评价

Uh oh!

PaddlePaddle-bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

PaddlePaddle-bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

PaddlePaddle-bot May 12, 2026

Choose a reason for hiding this comment

Uh oh!

codecov-commenter commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

zoooo0820 commented May 12, 2026 •

edited

Loading

codecov-commenter commented May 12, 2026 •

edited

Loading