[Cherry-pick][PD Disaggregation] Limit prefill fetch num with FD_MAX_INFLIGHT_PREFILL(#7981) by Sunny-bot1 · Pull Request #8081 · PaddlePaddle/FastDeploy

Sunny-bot1 · 2026-06-26T05:33:08Z

Motivation

Modifications

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

In PD disaggregation mode, different ranks may receive cache_info at different times. When consume_signals receives a layer0 signal, some ranks already find the engine_idx in idx_cache_task_dict (ready) while others do not (pending). Different ranks therefore put different batch_engine_signals into the queue, which leads to mismatched finish_send_cache_barrier.wait() calls and a deadlock. Fix: route all layer0 signals uniformly through pending_layer0_signals, then immediately recover any that already have cache_info registered. Each recovered signal is put into the queue on its own (as a single-request batch), so all ranks see identical batch granularity regardless of when recovery happens.
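The routing rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `pending_layer0_signals` and `idx_cache_task_dict` are the names from the description, while `on_layer0_signals`, `recover_pending`, and the plain list used as the queue are hypothetical stand-ins, not FastDeploy's code. It simulates two ranks that receive `cache_info` at different times and shows that both end up emitting the same single-request batches.

```python
from typing import Dict, List


def recover_pending(
    idx_cache_task_dict: Dict[int, object],
    pending_layer0_signals: List[int],
    batch_queue: List[List[int]],
) -> None:
    """Hypothetical helper: move pending signals whose cache_info is registered.

    Each recovered signal becomes its own single-request batch, so batch
    granularity does not depend on when cache_info arrived on this rank.
    """
    still_pending = []
    for engine_idx in pending_layer0_signals:
        if engine_idx in idx_cache_task_dict:
            batch_queue.append([engine_idx])  # one request per batch
        else:
            still_pending.append(engine_idx)
    pending_layer0_signals[:] = still_pending


def on_layer0_signals(
    signals: List[int],
    idx_cache_task_dict: Dict[int, object],
    pending_layer0_signals: List[int],
    batch_queue: List[List[int]],
) -> None:
    """Hypothetical handler: no "ready" fast path, everything goes via pending."""
    pending_layer0_signals.extend(signals)
    recover_pending(idx_cache_task_dict, pending_layer0_signals, batch_queue)


# Rank A already has cache_info for both requests when the signals arrive.
rank_a_cache = {0: "info0", 1: "info1"}
rank_a_pending: List[int] = []
rank_a_queue: List[List[int]] = []
on_layer0_signals([0, 1], rank_a_cache, rank_a_pending, rank_a_queue)

# Rank B only has cache_info for request 0; request 1's arrives later.
rank_b_cache = {0: "info0"}
rank_b_pending: List[int] = []
rank_b_queue: List[List[int]] = []
on_layer0_signals([0, 1], rank_b_cache, rank_b_pending, rank_b_queue)
rank_b_cache[1] = "info1"  # late cache_info registration
recover_pending(rank_b_cache, rank_b_pending, rank_b_queue)

# Both ranks emit the same batches, so their barrier waits line up.
print(rank_a_queue, rank_b_queue)  # [[0], [1]] [[0], [1]]
```

With the old "ready vs. pending" split, rank A could have queued `[0, 1]` as one batch while rank B queued `[0]` and later `[1]`; forcing single-request batches removes that divergence.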

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-06-26 13:41:58

📋 Review Summary

PR overview: Adds an FD_MAX_INFLIGHT_PREFILL cap on how many requests a PD prefill instance fetches.
Scope of changes: fastdeploy/engine/common_engine_prepare_mixin.py, fastdeploy/envs.py
Affected tags: [Engine] [PD Disaggregation]

Issues

Severity	File	Summary
🔴 Bug	`fastdeploy/engine/common_engine_prepare_mixin.py:93`	Multiple prefill fetch threads dequeue against the same remaining quota at the same time, so `FD_MAX_INFLIGHT_PREFILL` can in practice be exceeded

📝 PR Convention Check

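The capitalization of the Cherry-Pick tag in the current PR title does not match the template, and the PR description still contains the template's placeholder comments with its core sections left empty. We suggest replacing them with the copyable content below.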

Suggested title (ready to copy):

[Cherry-Pick][PD Disaggregation] Limit prefill fetch num with FD_MAX_INFLIGHT_PREFILL(#7981)

Suggested PR description (ready to copy):

## Motivation
Limit the number of in-flight requests a PD Disaggregation prefill instance fetches at once, so that prefill fetching too many requests does not amplify resource pressure.

## Modifications
- In `fastdeploy/engine/common_engine_prepare_mixin.py`, read `FD_MAX_INFLIGHT_PREFILL` and, in `_fetch_request_prefill()`, cap the `batch` of this round's `scheduler.get_requests()` call based on the current number of running prefill requests.
- In `fastdeploy/envs.py`, add the environment variable `FD_MAX_INFLIGHT_PREFILL` with a default value of `20`.
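For reference, the capped fetch described in this list could look roughly like the following. It is a sketch, not the PR's code: `fetch_request_prefill`, `read_max_inflight_prefill`, and `StubScheduler` are hypothetical, parsing the variable with `os.getenv` and a default of 20 is an assumption about how `envs.py` exposes it, and the sketch returns the fetched list rather than the bool the real method returns.

```python
import os
from typing import List


def read_max_inflight_prefill() -> int:
    # Assumption: parsed as an int with a default of 20, per the description.
    return int(os.getenv("FD_MAX_INFLIGHT_PREFILL", "20"))


class StubScheduler:
    """Hypothetical stand-in that hands out up to `batch` queued request ids."""

    def __init__(self, queued: int) -> None:
        self.queued = list(range(queued))

    def get_requests(self, batch: int) -> List[int]:
        taken, self.queued = self.queued[:batch], self.queued[batch:]
        return taken


def fetch_request_prefill(running: List[int], scheduler: StubScheduler, max_inflight: int) -> List[int]:
    """Fetch at most `max_inflight - len(running)` requests this round."""
    available_for_new = max_inflight - len(running)
    if available_for_new <= 0:
        return []  # already at the cap: do not pull from the scheduler
    return scheduler.get_requests(batch=available_for_new)


sched = StubScheduler(queued=10)
print(fetch_request_prefill(list(range(18)), sched, max_inflight=20))  # [0, 1]
print(fetch_request_prefill(list(range(20)), sched, max_inflight=20))  # []
```

Note that the quota check reads `len(running)` without any lock, which is the point the bot review flags as racy when several fetch threads run it at once.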

## Usage or Command
Set the environment variable to control the cap on in-flight prefill requests, for example:
`FD_MAX_INFLIGHT_PREFILL=20`

## Accuracy Tests
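N/A. This PR does not touch model numerics. The current diff adds no unit tests; add a regression test verifying that multiple fetch threads calling `_fetch_request_prefill()` concurrently do not exceed `FD_MAX_INFLIGHT_PREFILL` before checking the unit-test item.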

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

Overall Assessment

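Adding the env var and capping the fetch batch is consistent with the PR's goal, but the current implementation does not count the pending requests of concurrent fetch threads toward the cap, so the limit is ineffective under the default configuration. We recommend fixing this concurrent accounting issue first and adding a regression test verifying that multiple fetch threads pulling at the same time do not exceed the cap.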

PaddlePaddle-bot · 2026-06-26T05:43:55Z

    def _fetch_request_prefill(self) -> bool:
        """Fetch and prepare requests for a prefill instance. Returns True if tasks were fetched."""
+        max_inflight_prefill = envs.FD_MAX_INFLIGHT_PREFILL
+        inflight_prefill = len(self.resource_manager.running)


🔴 Bug: Computing the remaining quota from len(self.resource_manager.running) cannot enforce the limit in the default multi-fetch-thread setup.

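In the prefill role, _prepare_request_v1() starts multiple _fetch_loop threads according to FD_PREFILL_PREPARE_REQ_THREAD_NUM (3 threads by default). Each thread reads the same running length and calls scheduler.get_requests(batch=available_for_new) before its requests reach add_request_in_p(), and requests that have already been dequeued but are still requesting D-side resources or being preprocessed asynchronously are not yet in running. So with FD_MAX_INFLIGHT_PREFILL=20, the 3 threads can each pull 20 requests at the same time: the actual in-flight count becomes 60, the limit fails, and pressure on PD-side resources keeps growing.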
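The race can be reproduced deterministically with a small simulation. Assumptions: the stub scheduler, the `dequeued` list, and the `threading.Barrier` that forces every thread to read `len(running)` before any of them dequeues are illustrative only; the constants mirror the default values quoted in this comment.

```python
import threading
from typing import List

MAX_INFLIGHT_PREFILL = 20  # mirrors the default of FD_MAX_INFLIGHT_PREFILL
NUM_FETCH_THREADS = 3      # mirrors the default fetch thread count cited above


class StubScheduler:
    """Hypothetical thread-safe queue; only the quota check is racy here."""

    def __init__(self, queued: int) -> None:
        self._queued = list(range(queued))
        self._lock = threading.Lock()

    def get_requests(self, batch: int) -> List[int]:
        with self._lock:
            taken, self._queued = self._queued[:batch], self._queued[batch:]
            return taken


running: List[int] = []   # requests only land here after add_request_in_p()
dequeued: List[int] = []  # fetched but not yet running
dequeued_lock = threading.Lock()
all_read = threading.Barrier(NUM_FETCH_THREADS)
scheduler = StubScheduler(queued=100)


def fetch_loop_once() -> None:
    inflight_prefill = len(running)  # every thread sees 0
    all_read.wait()                  # force the worst-case interleaving
    available_for_new = MAX_INFLIGHT_PREFILL - inflight_prefill
    batch = scheduler.get_requests(batch=available_for_new)
    with dequeued_lock:
        dequeued.extend(batch)


threads = [threading.Thread(target=fetch_loop_once) for _ in range(NUM_FETCH_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(dequeued))  # 60, three times the intended cap of 20
```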

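Suggested fix:
Put "check the remaining quota" and "record the used quota" in the same shared critical section. For example, maintain a lock-protected pending counter at the prefill fetch entry point: reserve available_for_new before dequeuing, and release it when a request fails or completes. Alternatively, move the FD_MAX_INFLIGHT_PREFILL check into ResourceManagerV1, share the lock with preallocate_resource_in_p() / add_request_in_p(), and also count pending requests that have been dequeued but have not yet entered running toward the cap.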
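A minimal sketch of the first suggested fix, a lock-protected reservation counter, follows. `InflightPrefillBudget`, `fetch_with_budget`, and `StubScheduler` are hypothetical names; where FastDeploy would call `release()` for failed or finished requests is not shown and would have to be wired into the real request lifecycle. Under the same three-thread simultaneous start, the total fetched stays at the cap.

```python
import threading
from typing import List


class InflightPrefillBudget:
    """Hypothetical lock-protected count of in-flight prefill slots.

    A slot is taken before dequeuing and held until the request fails or
    finishes, so dequeued-but-not-yet-running requests count toward the cap.
    """

    def __init__(self, max_inflight: int) -> None:
        self._max = max_inflight
        self._used = 0
        self._lock = threading.Lock()

    def reserve(self, want: int) -> int:
        """Atomically grant up to `want` slots and return how many were granted."""
        with self._lock:
            granted = max(0, min(want, self._max - self._used))
            self._used += granted
            return granted

    def release(self, n: int) -> None:
        """Give back slots (unused reservation, failed or finished request)."""
        with self._lock:
            if n > self._used:
                raise ValueError("releasing more slots than were reserved")
            self._used -= n

    @property
    def used(self) -> int:
        with self._lock:
            return self._used


class StubScheduler:
    """Hypothetical thread-safe request queue standing in for the scheduler."""

    def __init__(self, queued: int) -> None:
        self._queued = list(range(queued))
        self._lock = threading.Lock()

    def get_requests(self, batch: int) -> List[int]:
        with self._lock:
            taken, self._queued = self._queued[:batch], self._queued[batch:]
            return taken


def fetch_with_budget(budget: InflightPrefillBudget, scheduler: StubScheduler, want: int) -> List[int]:
    granted = budget.reserve(want)  # check and record in one critical section
    if granted == 0:
        return []
    batch = scheduler.get_requests(batch=granted)
    budget.release(granted - len(batch))  # scheduler may return fewer than reserved
    return batch


# Same worst case as before: three fetch threads start at the same moment.
budget = InflightPrefillBudget(max_inflight=20)
scheduler = StubScheduler(queued=100)
start = threading.Barrier(3)
fetched: List[int] = []
fetched_lock = threading.Lock()


def fetch_loop_once() -> None:
    start.wait()
    batch = fetch_with_budget(budget, scheduler, want=20)
    with fetched_lock:
        fetched.extend(batch)


threads = [threading.Thread(target=fetch_loop_once) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(fetched), budget.used)  # 20 20
```

Because `reserve()` both checks and records the quota under one lock, no thread can treat slots already claimed by another thread as free. The second suggested option, moving the check into `ResourceManagerV1`, would apply the same idea using the lock shared with `preallocate_resource_in_p()` / `add_request_in_p()`.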

codecov-commenter · 2026-06-26T06:54:04Z

Codecov Report

❌ Patch coverage is 40.00000% with 3 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/2.6@82c7c7a). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/engine/common_engine_prepare_mixin.py	40.00%	3 Missing ⚠️

Additional details and impacted files

@@              Coverage Diff               @@
##             release/2.6    #8081   +/-   ##
==============================================
  Coverage               ?   71.48%           
==============================================
  Files                  ?      386           
  Lines                  ?    55795           
  Branches               ?     8765           
==============================================
  Hits                   ?    39885           
  Misses                 ?    13104           
  Partials               ?     2806

Flag	Coverage Δ
GPU	`71.48% <40.00%> (?)`


PaddlePaddle-bot · 2026-06-27T04:08:14Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-27 12:06:58 UTC+08:00

CI report generated from the following code (updated every 30 minutes):
PR commit: cbbb9c6 | Merge base: 82c7c7a (branch: release/2.6)

1. Required jobs: 8/10 passed

Total runs (reruns)	Total jobs	✅ Passed	❌ Failed	⏳ Running	⏸️ Waiting	Skipped
36(0)	36	31	5	0	0	0

Job	Error type	Confidence	Log
`Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage`	PR issue	High	Job
`Approval`	Approval required	High	Job

2. Failure details

🔴 Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: PR issue (confidence: high)

Error type: PR issue | Confidence: high
Analyzer: generic analysis (fallback)
Failed test cases:

Test case	Error summary
`tests/engine/test_common_engine.py::TestCommonEngineAdditionalCoverage::test_schedule_request_to_worker_v1_prefill_continuous_wait_async_none`	`_fetch_request_prefill` now reads `resource_manager.running`, but the test's DummyRM lacks that attribute, causing an AttributeError

Key log lines:

fastdeploy/engine/common_engine_prepare_mixin.py:93
inflight_prefill = len(self.resource_manager.running)
AttributeError: 'DummyRM' object has no attribute 'running'

Root cause summary: after the PR added the concurrency limit, the test stub lacks `running`.

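This PR adds FD_MAX_INFLIGHT_PREFILL throttling logic to _fetch_request_prefill() in fastdeploy/engine/common_engine_prepare_mixin.py:92-100 and reads len(self.resource_manager.running) directly. The failing test uses the DummyRM returned by _make_v1_prefill_continuous_rm() in tests/engine/test_common_engine.py:336-364, which only defines fields such as waiting, real_bsz, add_request_in_p, and pre_recycle_resource and was not updated with a running attribute, so an AttributeError is raised before the new logic is reached. The real ResourceManagerV1 initializes self.running in fastdeploy/engine/sched/resource_manager_v1.py:181, so the current CI failure looks more like a test stub that was not updated after this PR introduced the new behavior.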

Suggested fix:

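Add self.running = [] to _make_v1_prefill_continuous_rm() / DummyRM.__init__ in tests/engine/test_common_engine.py, and construct an existing in-flight request count per scenario where needed.
Add an assertion covering that when len(resource_manager.running) >= FD_MAX_INFLIGHT_PREFILL, _fetch_request_prefill() returns False and does not pull further requests from the scheduler.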
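The stub fix and the suggested assertion could be combined into a regression test along these lines. This is a hypothetical sketch: it exercises a stand-in `fetch_request_prefill` function that mirrors the described check instead of importing FastDeploy's `_fetch_request_prefill()`, and `DummyRM` here keeps only the fields the sketch needs. A real test would likely patch `envs.FD_MAX_INFLIGHT_PREFILL` and call the engine method itself.

```python
import unittest
from unittest import mock


class DummyRM:
    """Test stub; the suggested fix is to give it a `running` list."""

    def __init__(self, running_count: int = 0) -> None:
        self.waiting = []
        self.running = [object() for _ in range(running_count)]


def fetch_request_prefill(resource_manager, scheduler, max_inflight: int) -> bool:
    """Hypothetical mirror of the capped fetch; True if tasks were fetched."""
    available_for_new = max_inflight - len(resource_manager.running)
    if available_for_new <= 0:
        return False
    tasks = scheduler.get_requests(batch=available_for_new)
    return len(tasks) > 0


class FetchPrefillCapTest(unittest.TestCase):
    def test_at_cap_does_not_pull(self):
        scheduler = mock.Mock()
        rm = DummyRM(running_count=20)
        self.assertFalse(fetch_request_prefill(rm, scheduler, max_inflight=20))
        scheduler.get_requests.assert_not_called()

    def test_below_cap_pulls_remaining_quota(self):
        scheduler = mock.Mock()
        scheduler.get_requests.return_value = ["req"]
        rm = DummyRM(running_count=18)
        self.assertTrue(fetch_request_prefill(rm, scheduler, max_inflight=20))
        scheduler.get_requests.assert_called_once_with(batch=2)
```

It can be run with `pytest` or `python -m unittest`. The concurrent case raised in the bot review would additionally need several threads calling the fetch at once, as in the race simulation sketched for that comment.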

Related changes: fastdeploy/engine/common_engine_prepare_mixin.py:92-100, fastdeploy/envs.py:199; related test stub: tests/engine/test_common_engine.py:336-364

🔴 Approval: approval required (confidence: high)

This job requires manual approval; CI will continue only after approval is granted.

kevincheng2 and others added 3 commits June 10, 2026 17:16

cp 7981 (c94fa98)

Merge branch 'release/2.6' into fd_26_fix_prefill_num (cbbb9c6)

PaddlePaddle-bot suggested changes Jun 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Cherry-pick][PD Disaggregation] Limit prefill fetch num with FD_MAX_INFLIGHT_PREFILL(#7981)#8081

[Cherry-pick][PD Disaggregation] Limit prefill fetch num with FD_MAX_INFLIGHT_PREFILL(#7981)#8081

Sunny-bot1 wants to merge 3 commits into PaddlePaddle:release/2.6 from Sunny-bot1:fd_26_fix_prefill_num

4 participants
