[grpo] set default load_format auto #9100
Conversation
Code Review
This pull request updates the default load_format for vLLM engines from 'dummy' to 'auto' across several rollout-related modules. The review feedback highlights that existing code comments in swift/pipelines/infer/rollout.py and swift/rlhf_trainers/rollout_mixin.py are now stale and need to be updated to reflect the new default. Furthermore, there is a suggestion to improve weight synchronization by including buffers, which would address the performance regression introduced by loading from disk and potentially allow for a return to the faster 'dummy' load format.
```diff
 # as they will be synced from the trainer process.
 # This will accelerate the rollout speed.
-load_format = engine_kwargs.pop('load_format', 'dummy')
+load_format = engine_kwargs.pop('load_format', 'auto')
```
The comment block on lines 452-454 is now stale as it explicitly mentions using 'dummy' to accelerate rollout speed by preventing disk loading. Since the default is being changed to 'auto', this comment should be updated to reflect the new behavior and the reason for it (e.g., ensuring buffer weights are initialized), while acknowledging the performance trade-off of loading from disk.
```diff
 # Use load_format from vllm_engine_kwargs if provided, otherwise default to 'dummy'
 vllm_engine_kwargs = self.args.vllm_engine_kwargs or {}
-load_format = vllm_engine_kwargs.pop('load_format', 'dummy')
+load_format = vllm_engine_kwargs.pop('load_format', 'auto')
```
The comment on line 236 still refers to 'dummy' as the default value. It should be updated to 'auto' to match the code change.
Additionally, while using 'auto' is a valid workaround for missing buffer weights in the synchronization logic, it introduces a performance regression by loading the entire model from disk during initialization. A more efficient long-term solution would be to include buffers in the weight synchronization logic (e.g., in split_batches and _collect_state_dict_for_vllm), which would allow reverting to the 'dummy' load format for faster rollout engine startup.
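As a rough illustration of the suggested long-term fix, a weight-collection step could cover registered buffers as well as parameters. This is a minimal sketch, not the actual `split_batches` or `_collect_state_dict_for_vllm` implementation; the helper name `collect_params_and_buffers` is hypothetical:

```python
import torch
import torch.nn as nn

def collect_params_and_buffers(model: nn.Module) -> dict:
    # named_parameters() alone misses registered buffers (e.g. precomputed
    # caches), which is why a 'dummy'-initialized engine that only receives
    # synced parameters can produce garbage output.
    state = {name: p.detach() for name, p in model.named_parameters()}
    state.update({name: b.detach() for name, b in model.named_buffers()})
    return state

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 2)
        # A buffer: part of the model state, but not a parameter.
        self.register_buffer('scale', torch.ones(2))

model = Toy()
state = collect_params_and_buffers(model)
print(sorted(state))  # ['linear.bias', 'linear.weight', 'scale']
```

Syncing a state dict built this way would initialize buffer weights on the rollout engine, which in turn would make the faster 'dummy' load format safe again.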
I have tried setting the vLLM load format to auto from the command line. Even though it fixes the garbage model output at iteration 0, weight sync is the actual root cause and GRPO training is still not making progress.
#9096
Some models, such as Gemma4, have buffer weights that are not covered by the current weight synchronization logic. Previously, the default load_format was set to dummy, which caused these buffer weights to be incorrectly initialized. Set load_format to auto to work around this issue.
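In effect, the change only moves the default used when the caller does not pass a load format explicitly; 'dummy' remains available as an opt-in. A sketch of that resolution logic, following the `engine_kwargs.pop` pattern shown in the diff above:

```python
def resolve_load_format(engine_kwargs: dict) -> str:
    # After this PR, omitting load_format means real weights (and therefore
    # buffers) are loaded from disk; 'dummy' can still be requested for a
    # faster startup when full weight sync is known to be sufficient.
    return engine_kwargs.pop('load_format', 'auto')

print(resolve_load_format({}))                        # auto
print(resolve_load_format({'load_format': 'dummy'}))  # dummy
```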