
[Engine] Revert TTFT optimize (#6680) and add EP batched token scheduler#7791

Merged
Jiang-Jia-Jun merged 2 commits into PaddlePaddle:develop from zoooo0820:revert_ttft on May 13, 2026

@zoooo0820 (Collaborator) commented May 12, 2026

Motivation

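Actual testing showed that #6680 made k3_kl_mean rise 50-fold compared with the release2.6 branch. Because the affected code has since been modified many times, the change can no longer be undone automatically with git revert.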

Modifications

Revert results

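  • Files without conflicts (8, handled automatically by git):
    • docs/usage/environment_variables.md: reverted the FD_ENGINE_FORWARD_SIGNAL environment variable documentation
    • docs/zh/usage/environment_variables.md: same as above (Chinese version)
    • fastdeploy/envs.py: reverted the FD_ENGINE_FORWARD_SIGNAL environment variable definition
    • fastdeploy/scheduler/dp_scheduler.py: restored the original scheduler logic
    • fastdeploy/splitwise/internal_adapter_utils.py: restored the original logic
    • tests/ci_use/metrics/test_metrics.py: restored the tests
    • tests/engine/test_common_engine.py: restored the tests
    • tests/scheduler/test_dp_scheduler.py: restored the tests
    • tests/splitwise/test_internal_adapter_utils.py: restored the tests
  • Files with conflicts (2, resolved manually):
    • fastdeploy/engine/common_engine.py (1 conflict):
      • Removed the engine_forward_signal IPCSignal
      • Restored the original scheduler loop logic
      • Removed the else branch that batches empty EP tasks
      • Used the batch_request variable name (a rename from a later commit, not introduced by PR #6680)
    • fastdeploy/worker/worker_process.py (3 conflicts):
      • Removed the engine_forward_signal initialization (without restoring gpu_cache_lock, which PR #7299 had already deleted)
      • Removed the engine_forward_signal.value[0] = 1 assignment
      • Removed the EP barrier (paddle.distributed.barrier(ep_group))
      • Removed the EP idle else branch
      • Kept improvements from later commits: _get_exist_task_flag(), BatchRequest.from_tasks(), batch_request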

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot (bot) commented May 12, 2026

Thanks for your contribution!

PaddlePaddle-bot

This comment was marked as outdated.

@PaddlePaddle-bot

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-12 17:48:07

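This CI report was generated from the following code (updated every 30 minutes):

1 Task overview

CI has 1 Required task failing, plus 1 running and 4 pending, so merging is blocked. Please address the failed Required task first.

Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Pending | Skipped
28 (0) | 28 | 18 | 3 | 2 | 5 | 0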

2 任务状态汇总

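2.1 Required tasks: 2/8 passed

Required tasks block merging; address their failures first.

Status | Task | Duration | Root cause | Suggested fix | Log | Rerun
❌ | Approval | 8s | PR issue: modifies restricted files and needs approval from 2 specific RD groups | Ask jiangjiajun et al. / zhouchong et al. to approve the PR | Job | -
⏳ | Run Base Tests / base_tests | - | Running | - | Job | -
⏸️ | Extracted partial CE model tasks to run in CI. / run_ce_cases | - | Pending | - | - | -
⏸️ | Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage | - | Pending | - | - | -
⏸️ | Run Four Cards Tests / run_4_cards_tests | - | Pending | - | - | -
⏸️ | Run Stable Tests / stable_tests | - | Pending | - | - | -
The remaining 2 required tasks passed.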

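2.2 Optional tasks: 16/20 passed

Optional tasks do not block merging; their failures are for reference only.

Status | Task | Duration | Log | Rerun
❌ | Run iluvatar Tests / run_iluvatar_cases | 7m10s | Job | -
❌ | Check PR Template | 14s | Job | -
⏳ | Trigger Jenkins for PR | - | Job | -
⏸️ | CI_HPU | - | - | -
The remaining 16 optional tasks passed.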

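3 Failure details (Required only)

Approval: process approval (confidence: high)

Approval

  • Status: ❌ Failed
  • Error type: process approval
  • Confidence: high
  • Root-cause summary: the PR modifies a restricted file and logging behavior, so it needs approval from 2 specific RD groups
  • Analyzer: generic analysis (fallback)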

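Root-cause details:
The check script scripts/check_approval.sh found 2 unmet approval requirements: (1) the PR modifies fastdeploy/envs.py, which requires approval from at least one of @jiangjiajun / @liuyuanle / @chenjian26 / @wanglongzhi; (2) the PR adds logging calls (.info / .debug), which requires approval from at least one of @zhouchong / @zhangyongyue. Neither approval was present, so the script exited with exit code 6.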
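As a reading aid, the two rules reported by the check can be re-expressed as a small Python function. This is a hypothetical sketch, not the logic of scripts/check_approval.sh (a shell script whose exact matching rules are not shown here): the names `missing_approvals`, `ENVS_APPROVERS`, and `LOGGING_APPROVERS`, and the assumption that any added diff line calling `.info(`, `.debug(`, `.error(`, or `log_request(` counts as a logging change, are illustrative only.

```python
import re

# Approver groups as listed in the check log (hypothetical constant names).
ENVS_APPROVERS = frozenset({"jiangjiajun", "liuyuanle", "chenjian26", "wanglongzhi"})
LOGGING_APPROVERS = frozenset({"zhouchong", "zhangyongyue"})

# Assumption: an added diff line ("+...") that calls a logging method is a logging change.
LOG_CALL = re.compile(r"^\+.*(\.(info|debug|error)\(|log_request\()")


def missing_approvals(changed_files, added_lines, approved_by):
    """Return a description of every triggered approval rule that has no approver yet."""
    rules = []
    if "fastdeploy/envs.py" in changed_files:
        rules.append(("modifying fastdeploy/envs.py", ENVS_APPROVERS))
    if any(LOG_CALL.search(line) for line in added_lines):
        rules.append(("modifying logging behavior", LOGGING_APPROVERS))
    approved = set(approved_by)
    # A rule is satisfied as soon as any one member of its group has approved.
    return [desc for desc, group in rules if not group & approved]


# A PR that touches envs.py and adds a .debug call, with no approvals yet: both rules fail.
print(missing_approvals(
    {"fastdeploy/envs.py", "fastdeploy/scheduler/dp_scheduler.py"},
    ['+        self.scheduler_logger.debug("...")'],
    approved_by=[],
))  # ['modifying fastdeploy/envs.py', 'modifying logging behavior']
```

One approval from each group is enough to clear the corresponding rule, which matches the two-item fix list in this report.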

Key logs:

0. You must have one FastDeploy RD (Jiang-Jia-Jun(jiangjiajun), yuanlehome(liuyuanle), rainyfly(chenjian26), Wanglongzhi2001(wanglongzhi)) approval for modifying [fastdeploy/envs.py].
1. You must have one FastDeploy RD (xyxinyang(zhouchong), zyyzghb(zhangyongyue)) approval for modifying logging behavior (.info/.debug/.error/log_request).
There are 2 approved errors.

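Suggested fix:

  1. Ask @jiangjiajun, @liuyuanle, @chenjian26, or @wanglongzhi to approve the change to fastdeploy/envs.py
  2. Ask @zhouchong or @zhangyongyue to approve the logging-behavior change (new .info/.debug calls)

Suggested-fix summary: ask the relevant RDs (jiangjiajun et al. / zhouchong et al.) to approve the PR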

Link: View logs

PaddlePaddle-bot

This comment was marked as outdated.

@PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-05-12 18:08:51

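📋 Review summary

PR overview: reverts the EP-mode TTFT optimization (removing the engine_forward_signal mechanism) and introduces a timeout-based batched request collection strategy in dp_scheduler
Scope of changes: engine/common_engine.py, scheduler/dp_scheduler.py, worker/worker_process.py, splitwise/
Affected tags: [Engine] [Scheduler] [PD Disaggregation]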
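The timeout-based batched collection summarized above can be sketched as follows. This is a minimal, hypothetical model, not the dp_scheduler.py code: `Request`, `collect_batch`, `batched_token_timeout`, and the use of a `queue.Queue` are assumptions. Only the variable name FD_EP_BATCHED_TOKEN_TIMEOUT, its 0.1s default (as stated in this review), and the notion of a `max_num_batched_tokens` budget come from this page. The loop keeps pulling requests until either the token budget is used up or the timeout elapses, trading up to one timeout of extra latency for larger prefill batches.

```python
import os
import queue
import time
from dataclasses import dataclass


@dataclass
class Request:
    request_id: str
    prompt_len: int  # prompt length in tokens


def batched_token_timeout() -> float:
    # Assumption: seconds as a float with a 0.1 default, as stated in the review;
    # the actual parsing in fastdeploy/envs.py may differ.
    return float(os.getenv("FD_EP_BATCHED_TOKEN_TIMEOUT", "0.1"))


def collect_batch(pending: queue.Queue, max_num_batched_tokens: int, timeout: float):
    """Pull requests until the token budget is reached or `timeout` seconds pass."""
    batch, batched_tokens = [], 0
    deadline = time.monotonic() + timeout
    while batched_tokens < max_num_batched_tokens:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # deadline reached: hand over whatever has been collected
        try:
            req = pending.get(timeout=remaining)
        except queue.Empty:
            break  # no new request arrived before the deadline
        batch.append(req)
        batched_tokens += req.prompt_len
    return batch, batched_tokens


pending = queue.Queue()
for i, n in enumerate([100, 200, 300]):
    pending.put(Request(f"req{i}", n))
batch, tokens = collect_batch(pending, max_num_batched_tokens=250, timeout=batched_token_timeout())
print([r.request_id for r in batch], tokens)  # ['req0', 'req1'] 300 (with the variable unset)
```

In this sketch the budget is only checked between pulls, so the last request can overshoot it (300 > 250 above). A real scheduler also needs a per-request admission check, which is where the counter-ordering issue raised in this review comes in.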

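📝 PR convention check

Title: in [for test] revert ttft optimize, the [for test] prefix is not in the official tag list (checklist §D1). All required sections of the PR description are empty (only the template comments remain).

Suggested title (ready to copy):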

  • [Engine] Revert TTFT optimize and add EP batched token scheduler

Suggested PR description (ready to copy):

## Motivation
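Revert the TTFT (Time to First Token) optimization. That optimization used the `engine_forward_signal` IPC signal to synchronize the scheduler with the worker's forward-execution state in EP (Expert Parallel) mode. This revert removes the signal mechanism and, as a replacement, introduces a batched request collection strategy in `dp_scheduler` bounded by the `FD_EP_BATCHED_TOKEN_TIMEOUT` timeout.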

## Modifications
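- `fastdeploy/engine/common_engine.py`: remove the `engine_forward_signal` IPCSignal; restructure the scheduling loop to check `exist_tasks()` before submitting a request fetch; remove the logic that sends empty task batches to the worker in EP mode; keep `RuntimeError` handling only in the mixed branch
- `fastdeploy/scheduler/dp_scheduler.py`: add an up-front resource check; replace single-request fetching with a timeout-bounded batched request collection loop, introducing the `FD_EP_BATCHED_TOKEN_TIMEOUT` environment variable (default 0.1s)
- `fastdeploy/envs.py` / `docs/`: add the `FD_EP_BATCHED_TOKEN_TIMEOUT` environment variable
- `fastdeploy/splitwise/internal_adapter_utils.py`: remove the `ENABLE_V1_KVCACHE_SCHEDULER` conditional branch and simplify the `unhandled_request_num` calculation
- `fastdeploy/worker/worker_process.py`: remove `engine_forward_signal`; move `_run_eplb` to the top of the event loop; remove the EP prefill barrier and the empty-task assert
- `tests/`: update `test_common_engine.py`, `test_dp_scheduler.py`, `test_internal_adapter_utils.py`, and `test_metrics.py` accordingly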

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

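Issues

Severity | File | Summary
🔴 Bug | fastdeploy/engine/common_engine.py:1091 | In non-mixed mode, get_request_pool.submit() does not catch RuntimeError, so the scheduling thread crashes when the thread pool is shut down
🔴 Bug | fastdeploy/scheduler/dp_scheduler.py:161 | current_prefill_tokens / required_total_blocks are accumulated before a request is rejected, so the token-budget check in the outer while loop can fire spuriously and later small requests that could have been scheduled are skipped
❓ Question | fastdeploy/scheduler/dp_scheduler.py:183 | The log message contradicts the actual semantics: len(requests)==0 means no request was enqueued, but the log says "has put all just-pulled request into the queue"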

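Overall assessment

This PR reverts the EP-mode TTFT optimization and introduces a new batched scheduling strategy, so the core logic changes substantially. The early counter accumulation in dp_scheduler.py is a correctness bug that can cause some valid requests to be left unscheduled, and the missing exception handling on the non-mixed path in common_engine.py is a stability risk. Recommend fixing these two P1 issues before merging.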

if self.cfg.scheduler_config.splitwise_role != "mixed":
    if not is_fetching:
        is_fetching = True
        get_request_pool.submit(_fetch_request)

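🔴 Bug In non-mixed mode, get_request_pool.submit(_fetch_request) does not catch RuntimeError.

The else branch for mixed mode keeps the full try/except handling (exiting gracefully when the thread pool is shut down), but the non-mixed path does not. When the engine shuts down, ThreadPoolExecutor raises RuntimeError: cannot schedule new futures after shutdown, which crashes the scheduling thread.

Suggested fix: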

if self.cfg.scheduler_config.splitwise_role != "mixed":
    if not is_fetching:
        try:
            is_fetching = True
            get_request_pool.submit(_fetch_request)
        except RuntimeError as e:
            if "shutdown" in str(e):
                self.llm_logger.info("Thread pool shutdown detected, exiting scheduler loop")
                break
            else:
                raise
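The crash path is ordinary Python behavior rather than anything FastDeploy-specific: after ThreadPoolExecutor.shutdown(), submit() raises RuntimeError (in CPython the message is "cannot schedule new futures after shutdown"). The standalone sketch below reproduces it and applies the same message check as the suggested fix; `try_submit` is a hypothetical helper, not part of FastDeploy.

```python
from concurrent.futures import ThreadPoolExecutor


def try_submit(pool: ThreadPoolExecutor, fn) -> bool:
    """Submit fn; return False instead of raising if the pool is already shut down."""
    try:
        pool.submit(fn)
        return True
    except RuntimeError as e:
        if "shutdown" in str(e):
            return False  # caller should leave its scheduling loop
        raise  # any other RuntimeError is still a real error


pool = ThreadPoolExecutor(max_workers=1)
print(try_submit(pool, lambda: None))  # True
pool.shutdown(wait=True)
print(try_submit(pool, lambda: None))  # False
```

Matching on the message text mirrors the suggestion and keeps the change small, but it depends on the exception wording; an explicit shutdown flag checked before submit() would be a sturdier alternative.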

for request_id in batch_ids:
    request = self.requests[request_id]
    required_input_blocks = self.calc_required_blocks(request.prompt_tokens_ids_len, block_size)
    current_prefill_tokens += request.prompt_tokens_ids_len

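🔴 Bug current_prefill_tokens and required_total_blocks are accumulated *before* the request is rejected. When required_total_blocks > available_blocks triggers the break, both counters already include the rejected request.

On the next iteration of the outer while loop, wait_for still returns the same batch_ids (the cursor has not advanced), and the for loop accumulates the counters again. This leads to:

  1. current_prefill_tokens becomes inflated and can spuriously trigger the outer current_prefill_tokens > max_num_batched_tokens check, exiting the loop early so that later, smaller requests that could have been scheduled (such as req3) are skipped.
  2. Otherwise, the outer loop can only exit via the timeout (FD_EP_BATCHED_TOKEN_TIMEOUT = 0.1s), wasting wait time.

Suggested fix: check the budget first, and accumulate only after the check passes: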

if required_total_blocks + required_input_blocks + reserved_output_blocks > available_blocks:
    break  # check first; accumulate only after the check passes
current_prefill_tokens += request.prompt_tokens_ids_len
required_total_blocks += required_input_blocks + reserved_output_blocks
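The effect of the ordering can be reproduced with a stripped-down admission loop. In this hypothetical sketch each request is a (prompt tokens, required blocks) pair, with the review's required_input_blocks + reserved_output_blocks folded into one number; the function names and sizes are made up for illustration. Both versions admit the same first request, but only the check-first version leaves the counters describing what was actually admitted.

```python
def admit_accumulate_first(requests, available_blocks):
    """Order criticized in the review: counters grow before the resource check."""
    tokens = blocks = 0
    admitted = []
    for idx, (n_tokens, n_blocks) in enumerate(requests):
        tokens += n_tokens
        blocks += n_blocks
        if blocks > available_blocks:
            break  # the rejected request is already included in tokens and blocks
        admitted.append(idx)
    return admitted, tokens, blocks


def admit_check_first(requests, available_blocks):
    """Suggested order: check first, then accumulate only for admitted requests."""
    tokens = blocks = 0
    admitted = []
    for idx, (n_tokens, n_blocks) in enumerate(requests):
        if blocks + n_blocks > available_blocks:
            break  # counters still describe only the admitted requests
        tokens += n_tokens
        blocks += n_blocks
        admitted.append(idx)
    return admitted, tokens, blocks


# (prompt tokens, required blocks) per request; only 5 blocks are free.
reqs = [(1000, 2), (3000, 6), (200, 1)]
print(admit_accumulate_first(reqs, 5))  # ([0], 4000, 8)
print(admit_check_first(reqs, 5))       # ([0], 1000, 2)
```

With the accumulate-first order, the reported 4000 prefill tokens include the 3000-token request that was rejected, so an outer check such as current_prefill_tokens > max_num_batched_tokens would fire for any budget below 4000 even though only 1000 tokens were scheduled.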

if batch_ids:
    if len(batch_ids) > 0 and len(requests) == 0:
        self.scheduler_logger.debug(
            f"Scheduler has put all just-pulled request into the queue: {len(batch_ids)}"

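❓ Question The log message contradicts its condition.

This branch fires when len(batch_ids) > 0 and len(requests) == 0, i.e. request IDs were pulled but *not a single request* was enqueued (all were rejected for insufficient resources). Yet the log says "Scheduler has put all just-pulled request into the queue", which can send troubleshooting in the wrong direction.

Suggested change: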

self.scheduler_logger.debug(
    f"No requests could be scheduled due to insufficient resources, pending={len(batch_ids)}"
)
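If the same branch is later widened to report partial progress as well, tying the message to the counts avoids this kind of mismatch. The helper below is a hypothetical sketch: its name, the partial-scheduling case, and the wording of the extra messages are assumptions; only the insufficient-resources wording comes from the suggestion above.

```python
import logging

scheduler_logger = logging.getLogger("scheduler")  # stand-in for self.scheduler_logger


def schedule_outcome_message(num_pulled: int, num_enqueued: int) -> str:
    """Pick a log message that matches what the scheduling pass actually did."""
    if num_pulled == 0:
        return "No new requests were pulled"
    if num_enqueued == 0:
        return f"No requests could be scheduled due to insufficient resources, pending={num_pulled}"
    if num_enqueued < num_pulled:
        return (
            f"Scheduled {num_enqueued} of {num_pulled} just-pulled requests, "
            f"pending={num_pulled - num_enqueued}"
        )
    return f"Scheduler has put all just-pulled requests into the queue: {num_pulled}"


scheduler_logger.debug(schedule_outcome_message(num_pulled=3, num_enqueued=0))
```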

@codecov-commenter commented May 12, 2026

Codecov Report

❌ Patch coverage is 63.63636% with 16 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@3faf93e).

Files with missing lines | Patch % | Lines
fastdeploy/scheduler/dp_scheduler.py | 68.96% | 3 Missing and 6 partials ⚠️
fastdeploy/engine/common_engine.py | 50.00% | 3 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7791   +/-   ##
==========================================
  Coverage           ?   63.83%           
==========================================
  Files              ?      458           
  Lines              ?    63515           
  Branches           ?     9731           
==========================================
  Hits               ?    40543           
  Misses             ?    20201           
  Partials           ?     2771           
Flag | Coverage Δ
GPU | 72.42% <63.63%> (?)
XPU | 7.20% <2.27%> (?)
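The headline numbers can be cross-checked by hand, assuming Codecov's usual definition of coverage as hits / (hits + misses + partials), with partially covered lines counted as not covered. The snippet reproduces the project figure from the Coverage Diff table and solves for the number of covered patch lines implied by 63.63636% with 16 lines missing or partial (3 + 6 in dp_scheduler.py, 3 + 4 in common_engine.py).

```python
def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Coverage ratio with partial lines counted as not covered."""
    return 100.0 * hits / (hits + misses + partials)


# Project totals from the Coverage Diff table.
print(f"{coverage_pct(40543, 20201, 2771):.2f}%")  # 63.83%

# Patch: 16 lines missing or partial; solve hits / (hits + 16) = 0.6363636 for hits.
not_covered = 3 + 6 + 3 + 4
ratio = 0.6363636
patch_hits = round(ratio * not_covered / (1 - ratio))
print(patch_hits, patch_hits + not_covered)  # 28 44
```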

Flags with carried forward coverage won't be shown.


@zoooo0820 changed the title from [for test] revert ttft optimize to [Engine] Revert TTFT optimize (#6680) and add EP batched token scheduler May 12, 2026
@Jiang-Jia-Jun merged commit a83a685 into PaddlePaddle:develop May 13, 2026
55 of 61 checks passed
@zoooo0820 deleted the revert_ttft branch May 13, 2026 11:41

Labels: None yet
Projects: None yet
5 participants