[Do Not Merge] Save model pdparams for testing dynamic loading #7789
Conversation
Added functionality to save model parameters and configuration files when FD_SAVE_PDPARAMS is set. Includes handling for saving directory and rank-based file copying.
Thanks for your contribution!
Pull request overview
This PR adds an environment-variable-triggered "export weights/config" capability to GPUModelRunner.load_model(): when FD_SAVE_PDPARAMS=1 is set, model parameters (plus selected config/tokenizer files) are saved to a specified directory, for testing and debugging the dynamic-loading feature.
Changes:
- Adds weight/config export logic controlled by FD_SAVE_PDPARAMS + FD_SAVE_DIR (shard files are written per rank)
- rank 0 additionally copies selected config and tokenizer files from the model directory to the export directory
- Attempts to include proposer/MTP model parameters in the saved weights

Additional notes (manual follow-up needed):
- The PR title does not carry a tag as the repository requires (e.g. [Feature] ...), and "【Do Not Merge】" in the title violates the merge conventions.
- The Motivation/Modifications/Usage sections of the PR description template are not filled in; the newly added environment variables should also be documented (e.g. in docs/usage/environment_variables.md) to avoid hidden switches.
```python
# Usage: FD_SAVE_PDPARAMS=1 FD_SAVE_DIR=/path/to/output python -m fastdeploy...
if os.getenv("FD_SAVE_PDPARAMS", "0") == "1":
    import shutil
    import glob as glob_mod

    visible_devices = os.getenv("CUDA_VISIBLE_DEVICES", "0").split(",")
    meta_src_id = int(visible_devices[int(os.getenv("FLAGS_selected_gpus", "0"))])
    rank = paddle.distributed.get_rank()

    # Determine save directory: FD_SAVE_DIR or default to model directory
    save_dir = os.getenv("FD_SAVE_DIR", self.fd_config.model_config.model)
    os.makedirs(save_dir, exist_ok=True)

    # Copy config and tokenizer files (only rank 0 to avoid race)
    if rank == 0:
        src_dir = self.fd_config.model_config.model
        copy_patterns = [
            "config.json", "generation_config.json",
            "tokenizer*", "added_tokens.json",
            "special_tokens_map.json", "chat_template*",
        ]
        for pattern in copy_patterns:
            for f in glob_mod.glob(os.path.join(src_dir, pattern)):
                dst = os.path.join(save_dir, os.path.basename(f))
                if not os.path.exists(dst):
                    shutil.copy2(f, dst)

    # Save model weights (main model + proposer/MTP model if exists)
    model_state_dict = self.model.state_dict()
    if hasattr(self, 'proposer') and self.proposer is not None and hasattr(self.proposer, 'model'):
        proposer_state_dict = self.proposer.model.state_dict()
        model_state_dict.update(proposer_state_dict)
        logger.info(f"Including proposer model weights ({len(proposer_state_dict)} params)")

    clean_state_dict = {
        k: paddle.to_tensor(v.contiguous().numpy())
        for k, v in model_state_dict.items()
    }
    model_path = os.path.join(
        save_dir,
        f"model_state.tp{rank}.{meta_src_id}.pdparams",
    )
    paddle.save(clean_state_dict, model_path, safetensors=True)
    del clean_state_dict
    logger.info(f"Saved model state dict to {model_path}")
```
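The rank-0 file-copy step can be exercised in isolation. The sketch below (a standalone helper with the hypothetical name `copy_config_files`, using the same glob patterns as the diff) copies matching config/tokenizer files and skips files already present in the destination:

```python
import glob
import os
import shutil
import tempfile

def copy_config_files(src_dir: str, save_dir: str) -> list:
    """Copy config/tokenizer files matching the PR's patterns; skip existing ones."""
    copy_patterns = [
        "config.json", "generation_config.json",
        "tokenizer*", "added_tokens.json",
        "special_tokens_map.json", "chat_template*",
    ]
    copied = []
    for pattern in copy_patterns:
        for f in glob.glob(os.path.join(src_dir, pattern)):
            dst = os.path.join(save_dir, os.path.basename(f))
            if not os.path.exists(dst):
                shutil.copy2(f, dst)
                copied.append(os.path.basename(f))
    return copied

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        for name in ("config.json", "tokenizer_config.json", "model.bin"):
            with open(os.path.join(src, name), "w") as fh:
                fh.write("{}")
        # model.bin matches no pattern, so only the two config files are copied
        print(sorted(copy_config_files(src, dst)))  # ['config.json', 'tokenizer_config.json']
```

Because existing destination files are skipped, running the copy twice is a no-op the second time, which is what makes the rank-0 guard safe to retry.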
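Each rank writes its own shard using the `model_state.tp{rank}.{meta_src_id}.pdparams` naming scheme. A small pure-Python sketch (hypothetical helper names; `rank` and `meta_src_id` are passed in rather than read from paddle or the environment) shows the round trip between a shard path and its identifiers, which a loader-side test could reuse:

```python
import os
import re

def shard_path(save_dir: str, rank: int, meta_src_id: int) -> str:
    """Build the per-rank shard filename used by the save logic."""
    return os.path.join(save_dir, f"model_state.tp{rank}.{meta_src_id}.pdparams")

def parse_shard_name(filename: str):
    """Recover (rank, gpu_id) from a shard filename; None if it does not match."""
    m = re.fullmatch(r"model_state\.tp(\d+)\.(\d+)\.pdparams", os.path.basename(filename))
    return (int(m.group(1)), int(m.group(2))) if m else None

print(shard_path("/tmp/out", 0, 3))                    # /tmp/out/model_state.tp0.3.pdparams
print(parse_shard_name("model_state.tp1.2.pdparams"))  # (1, 2)
```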
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review | 2026-05-12 15:35:34
📋 Review summary
PR overview: adds support for the FD_SAVE_PDPARAMS environment variable; after load_model() completes, model weights are saved as pdparams files, for testing the RL dynamic-loading feature.
Changed files: fastdeploy/worker/gpu_model_runner.py
Impact tags: [RL] [Engine]
📝 PR convention check
The title uses full-width Chinese brackets and carries no official tag; every section of the description is empty or placeholder-only and must be filled in.
Suggested title (copy-paste ready):
[RL] Add FD_SAVE_PDPARAMS environment variable to save model pdparams for RL dynamic-loading tests (Do Not Merge)
Suggested PR description (copy-paste ready; it must reproduce the full structure of the checklist §D2 template):
## Motivation
Developing and testing the RL dynamic-loading feature requires saving the pdparams weight file for each TP rank ahead of time, to verify that the `DynamicWeightManager` loading flow is correct.
## Modifications
- `fastdeploy/worker/gpu_model_runner.py`: adds save logic at the end of the `load_model()` method
  - When the environment variable `FD_SAVE_PDPARAMS=1` is set, weights are saved automatically after model loading completes
  - `FD_SAVE_DIR` selects the output directory (defaults to the model directory)
  - rank 0 copies config / tokenizer files
  - Every rank saves its own TP weight shard, named `model_state.tp{rank}.{gpu_id}.pdparams`
  - If a proposer (speculative decoding) exists, its weights are merged in and saved as well
## Usage or Command
```bash
FD_SAVE_PDPARAMS=1 FD_SAVE_DIR=/path/to/output python -m fastdeploy ...
```
## Accuracy Tests
N/A (this PR is a test utility and does not affect model inference accuracy)
## Checklist
- [ ] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

Issues
| Level | File | Summary |
|---|---|---|
| 🔴 Bug | fastdeploy/worker/gpu_model_runner.py:1435 | visible_devices can be indexed out of range: an IndexError is raised when FLAGS_selected_gpus ≥ len(visible_devices) |
| 🟡 Suggestion | fastdeploy/worker/gpu_model_runner.py:1464 | .contiguous().numpy() + paddle.to_tensor() makes two full copies and can easily OOM on large models |
Overall assessment
This is a temporary debugging PR (Do Not Merge) and the overall logic is clear, but the visible_devices out-of-range access is a P0 bug that must be fixed before merging; the tensor conversion should also be optimized to save memory.
```python
import glob as glob_mod

visible_devices = os.getenv("CUDA_VISIBLE_DEVICES", "0").split(",")
meta_src_id = int(visible_devices[int(os.getenv("FLAGS_selected_gpus", "0"))])
```
🔴 Bug: visible_devices can be indexed out of range; an IndexError is raised when the value of FLAGS_selected_gpus is ≥ len(visible_devices).
For example, CUDA_VISIBLE_DEVICES=0 with FLAGS_selected_gpus=1 triggers it; likewise, when CUDA_VISIBLE_DEVICES is an empty string, split(',') yields [''] and int('') raises as well.
Suggested fix:

```python
visible_devices = os.getenv("CUDA_VISIBLE_DEVICES", "0").split(",")
local_rank = int(os.getenv("FLAGS_selected_gpus", "0"))
if local_rank < len(visible_devices) and visible_devices[local_rank].strip():
    meta_src_id = int(visible_devices[local_rank].strip())
else:
    meta_src_id = local_rank  # fallback
```
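The suggested fix can be lifted into a standalone helper (hypothetical name `resolve_meta_src_id`, taking the two environment values as arguments) so the edge cases from the comment are easy to unit-test without touching the environment:

```python
def resolve_meta_src_id(cuda_visible_devices: str, flags_selected_gpus: str) -> int:
    """Map the process-local GPU index to a physical device id.

    Falls back to the local rank itself when CUDA_VISIBLE_DEVICES is empty
    or shorter than the local index (the cases that raised IndexError /
    ValueError in the original code).
    """
    visible_devices = cuda_visible_devices.split(",")
    local_rank = int(flags_selected_gpus or "0")
    if local_rank < len(visible_devices) and visible_devices[local_rank].strip():
        return int(visible_devices[local_rank].strip())
    return local_rank  # fallback

# The two failure cases from the review comment, plus the normal path:
print(resolve_meta_src_id("0", "1"))    # 1  (index out of range -> fallback)
print(resolve_meta_src_id("", "0"))     # 0  (empty string -> fallback)
print(resolve_meta_src_id("2,3", "1"))  # 3  (normal mapping)
```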
```python
logger.info(f"Including proposer model weights ({len(proposer_state_dict)} params)")

clean_state_dict = {
    k: paddle.to_tensor(v.contiguous().numpy())
```

🟡 Suggestion: .contiguous().numpy() followed by paddle.to_tensor() performs two full GPU→CPU→GPU copies; for a large model (tens of GB of parameters) this can easily OOM and doubles the time spent.
paddle.save can save GPU tensors directly, with no numpy round-trip. Suggested simplification:

```python
clean_state_dict = {k: v.contiguous() for k, v in model_state_dict.items()}
```

Or pass model_state_dict in directly (even better if it is already contiguous).
CI report generated from the code below (updated every 30 minutes):

1 Task overview
2 required checks failed, 3 required checks are running, and 1 required check is pending; fix the failures, then wait for the remaining tasks to finish.

2 Task status summary
2.1 Required tasks: 4/10 passed
2.2 Optional tasks: 27/31 passed

3 Failure details (required only)
Approval — code-approval check (confidence: high)
Fix summary: request approval for this PR from xyxinyang or zyyzghb
Pre Commit — code-format check (confidence: high)
Fix summary: run pre-commit locally to fix code formatting, then push again
Motivation
Developing and testing the RL dynamic-loading feature requires saving the pdparams weight file for each TP rank ahead of time, to verify that the `DynamicWeightManager` loading flow is correct.

Modifications
- `fastdeploy/worker/gpu_model_runner.py`: adds save logic at the end of the `load_model()` method
- When `FD_SAVE_PDPARAMS=1` is set, weights are saved automatically after model loading completes
- `FD_SAVE_DIR` selects the output directory (defaults to the model directory)
- Each TP shard is named `model_state.tp{rank}.{gpu_id}.pdparams`

Usage or Command

```bash
FD_SAVE_PDPARAMS=1 FD_SAVE_DIR=/path/to/output python -m fastdeploy ...
```

Accuracy Tests
N/A (this PR is a test utility and does not affect model inference accuracy)

Checklist
- [ ] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`, `[APIServer]`, `[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.