[BugFix][PD Disaggregation] Fix garbled text in PD disaggregation by adding early return in prefix cache insertion #7797
…rn when no new tokens to cache
- Add early return check when can_cache_computed_tokens <= num_cached_tokens
- Avoid unnecessary cache insertion operations that cause garbled output
- Only affects PD disaggregation scenarios
Thanks for your contribution!
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review |
2026-05-12 21:11:52
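📋 Review Summary
PR overview: Adds an early-return check to the update_cache_blocks method, fixing the block corruption (garbled output) that a missing condition on prefix cache insertion caused in PD disaggregation scenarios.
Scope of changes: fastdeploy/cache_manager/prefix_cache_manager.py
Impact tags: [KVCache] [PD Disaggregation]
📝 PR Convention Check
The PR has the following convention issues:
- Title contains multiple tags: [BugFix][PD Disaggregation] has two tags, while the convention requires exactly one official tag.
- Description template is missing the ## Accuracy Tests section: the PR body lacks this required section.
Suggested title (can be copied directly):
[BugFix] Fix garbled text in PD disaggregation by adding early return in prefix cache insertion
Suggested PR description (can be copied directly; it must reproduce the full structure of the checklist §D2 template):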
## Motivation
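Fix garbled output in PD disaggregation scenarios caused by a missing condition on prefix cache insertion. When `can_cache_computed_tokens <= num_cached_tokens`, there are no new tokens to cache, yet the subsequent cache insertion still runs and scrambles the blocks.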
## Modifications
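- `fastdeploy/cache_manager/prefix_cache_manager.py`: Add an early-return check to the `update_cache_blocks` method that returns immediately when `can_cache_computed_tokens <= num_cached_tokens`, avoiding redundant cache operations when there are no new tokens to cache.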
## Usage or Command
Start the PD disaggregation service for testing:
```bash
# Start the router
bash run_router.sh
# Start the P node
bash run_p.sh
# Start the D node
bash run_d.sh
```
## Accuracy Tests
N/A
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. No unit tests added as this is a guard condition that is covered by existing integration tests.
- [ ] Provide accuracy results.
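- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Issues
No blocking issues found.
Overall assessment
The fix is logically correct: the early-return check prevents redundant cache operations when can_cache_computed_tokens <= num_cached_tokens, which resolves the block corruption in PD disaggregation scenarios. Consider reducing the title to a single tag and adding the ## Accuracy Tests section.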
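The guard described in this review can be illustrated with a minimal sketch. It is not the FastDeploy implementation: `ToyPrefixCache`, `block_size`, `inserted`, and the round-down-to-block-boundary rule are hypothetical stand-ins chosen for illustration. Only the names `update_cache_blocks`, `can_cache_computed_tokens`, and `num_cached_tokens` come from the PR discussion.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPrefixCache:
    """Hypothetical stand-in for a prefix cache manager (not FastDeploy's class)."""

    block_size: int = 4
    # Records the (start, end) token ranges that were actually inserted.
    inserted: list = field(default_factory=list)

    def update_cache_blocks(self, num_computed_tokens: int, num_cached_tokens: int) -> bool:
        # Assumption: only whole blocks are cacheable, so round down
        # to the nearest block boundary.
        can_cache_computed_tokens = (
            num_computed_tokens - num_computed_tokens % self.block_size
        )
        # Early return: nothing beyond the already-cached prefix is cacheable,
        # so skip the insertion entirely.
        if can_cache_computed_tokens <= num_cached_tokens:
            return False
        self.inserted.append((num_cached_tokens, can_cache_computed_tokens))
        return True


cache = ToyPrefixCache(block_size=4)
skipped = cache.update_cache_blocks(num_computed_tokens=10, num_cached_tokens=8)
updated = cache.update_cache_blocks(num_computed_tokens=13, num_cached_tokens=8)
```

In this toy model the first call is skipped because rounding 10 down to a block boundary gives 8, which is not greater than the 8 tokens already cached. The second call inserts the range from token 8 to token 12.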
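CI report generated from the following code (updated every 30 minutes):
1 Task overview
2 Task status summary
2.1 Required tasks: 7/10 passed
2.2 Optional tasks: 23/27 passed
3 Failure details (required only)
Run Base Tests / base_tests: test case failure (confidence: low)
Summary of suggested fix: check the job log to confirm whether the failing cases are related to the early-return change.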
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## develop #7797 +/- ##
==========================================
Coverage ? 63.22%
==========================================
Files ? 459
Lines ? 64104
Branches ? 9824
==========================================
Hits ? 40532
Misses ? 20795
Partials ? 2777
✅ Cherry-pick successful! Created PR: #7802
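Motivation
Fix garbled output in PD disaggregation scenarios caused by a missing condition on prefix cache insertion. When `can_cache_computed_tokens <= num_cached_tokens`, there are no new tokens to cache, yet the subsequent cache insertion still runs and scrambles the blocks.
Modifications
`fastdeploy/cache_manager/prefix_cache_manager.py`: Add an early-return check to the `insert_prefix_cache` method that returns immediately when `can_cache_computed_tokens <= num_cached_tokens`, avoiding redundant cache operations when there are no new tokens to cache.
Usage or Command
Start the PD disaggregation service for testing:
Checklist
- Format your code, run `pre-commit` before commit.
- If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.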