[Cherry-Pick][BugFix][PD Disaggregation] Fix garbled text in PD disaggregation by adding early return in prefix cache insertion (#7797) #7802
…rn when no new tokens to cache (PaddlePaddle#7797)
- Add early return check when can_cache_computed_tokens <= num_cached_tokens
- Avoid unnecessary cache insertion operations that cause garbled output
- Only affects PD disaggregation scenarios
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review |
2026-05-13 15:19:43
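📋 Review Summary
PR overview: Cherry-pick of #7797. Adds an early-return check to `update_cache_blocks` for the PD disaggregation scenario, fixing garbled output caused by a missing condition on prefix cache insertion.
Change scope: fastdeploy/cache_manager/prefix_cache_manager.py
Impact tags: [KVCache] [PD Disaggregation] [BugFix]
📝 PR Convention Check
The PR description is missing the ## Accuracy Tests section, and the method name in ## Modifications is wrong (written as insert_prefix_cache; the actual method is update_cache_blocks).
Suggested PR description (can be copied directly):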
## Motivation
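Fix garbled output caused by a missing condition on prefix cache insertion in the PD disaggregation scenario. When `can_cache_computed_tokens <= num_cached_tokens`, there are no new tokens to cache, but the subsequent cache insertion still runs, which scrambles the blocks.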
## Modifications
- `fastdeploy/cache_manager/prefix_cache_manager.py`: Add an early-return check to the `update_cache_blocks` method. When `can_cache_computed_tokens <= num_cached_tokens`, return immediately to avoid redundant cache operations when there are no new tokens to cache.
## Usage or Command
Start the PD disaggregation services to test:
bash run_router.sh
bash run_p.sh
bash run_d.sh
## Accuracy Tests
N/A
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. No unit tests added as this is a guard condition that is covered by existing integration tests.
- [ ] Provide accuracy results.
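- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Issues
| Level | File | Summary |
|---|---|---|
| 📝 PR convention | None | The PR description is missing the ## Accuracy Tests section, and the method name in Modifications is wrong (insert_prefix_cache → update_cache_blocks) |
Overall Assessment
The code change is concise and correct. The `<=` guard is placed before the leaf_req_map modification and the lock acquisition, which effectively avoids the scrambled blocks in the PD disaggregation scenario. The only gap is structural, in the PR description, and it does not block merging.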
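The guard discussed in the assessment can be illustrated with a minimal sketch. This is not the code from `prefix_cache_manager.py`: the class `ToyPrefixCache`, its fields `cache_info` and `insert_calls`, the simplified method signature, and the rule that only whole blocks are cacheable are all assumptions made for illustration. Only the names `can_cache_computed_tokens`, `num_cached_tokens`, `leaf_req_map`, and `update_cache_blocks` come from the PR discussion.

```python
import threading
from collections import defaultdict


class ToyPrefixCache:
    """Hypothetical stand-in for a prefix cache manager (not FastDeploy's class)."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.lock = threading.Lock()
        self.leaf_req_map = defaultdict(set)  # leaf node id -> request ids
        self.cache_info = {}  # request id -> (leaf node id, num_cached_tokens)
        self.insert_calls = 0  # counts real insertions, for the demo only

    def update_cache_blocks(self, req_id, num_computed_tokens):
        leaf, num_cached_tokens = self.cache_info.get(req_id, ("root", 0))
        # Assumption for this sketch: only whole blocks can be cached.
        can_cache_computed_tokens = (
            num_computed_tokens - num_computed_tokens % self.block_size
        )

        # Guard: nothing new to cache, so return before touching
        # leaf_req_map or acquiring the lock.
        if can_cache_computed_tokens <= num_cached_tokens:
            return False

        with self.lock:
            self.leaf_req_map[leaf].discard(req_id)
            new_leaf = f"{req_id}@{can_cache_computed_tokens}"
            self.leaf_req_map[new_leaf].add(req_id)
            self.cache_info[req_id] = (new_leaf, can_cache_computed_tokens)
            self.insert_calls += 1
        return True


cache = ToyPrefixCache(block_size=4)
print(cache.update_cache_blocks("r1", 9))   # two full blocks (8 tokens) get cached
print(cache.update_cache_blocks("r1", 10))  # still only 8 cacheable tokens: guard returns
print(cache.insert_calls)
```

In this toy version the second call leaves `leaf_req_map` and `cache_info` untouched, which is the property the review highlights: the check runs before any shared state is modified.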
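CI report generated from the following code (updated every 30 minutes):
1 Task overview
⏳ CI in progress: 4 Required tasks running, 0 Required tasks failed, no failures currently blocking merge.
2 Task status summary
2.1 Required tasks: 4/8 passed
2.2 Optional tasks: 20/23 passed
3 Failure details (required only)
No failed required tasks.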
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## release/2.6 #7802 +/- ##
==============================================
Coverage ? 72.89%
==============================================
Files ? 379
Lines ? 54056
Branches ? 8455
==============================================
Hits ? 39402
Misses ? 11866
Partials ? 2788
Thanks for your contribution!
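Cherry-pick of #7797 (authored by @kevincheng2) to `release/2.6`.
devPR: #7797
## Motivation
Fix garbled output caused by a missing condition on prefix cache insertion in the PD disaggregation scenario. When `can_cache_computed_tokens <= num_cached_tokens`, there are no new tokens to cache, but the subsequent cache insertion still runs, which scrambles the blocks.
## Modifications
- `fastdeploy/cache_manager/prefix_cache_manager.py`: Add an early-return check to the `insert_prefix_cache` method. When `can_cache_computed_tokens <= num_cached_tokens`, return immediately to avoid redundant cache operations when there are no new tokens to cache.
## Usage or Command
Start the PD disaggregation services to test:
## Checklist
- Format your code, run `pre-commit` before commit.
- If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.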