[grpo] set default load_format auto #9100

Open
hjh0119 wants to merge 2 commits into modelscope:main from hjh0119:set-load-format

Conversation

Collaborator

hjh0119 commented Apr 14, 2026

#9096

Some models, such as Gemma4, have buffer weights that are not covered by the current weight synchronization logic. Previously, the default load_format was set to dummy, which caused these buffer weights to be incorrectly initialized. Set load_format to auto to work around this issue.
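The change itself is a one-line default swap. A minimal sketch of the pattern, assuming a plain `engine_kwargs` dict (the real attribute and module names in swift may differ):

```python
# Hypothetical helper illustrating the default change; not swift's actual code.
def build_engine_kwargs(engine_kwargs: dict) -> dict:
    # Previously the default was 'dummy' (skip loading real weights from disk,
    # since parameters are synced from the trainer). 'auto' loads real weights,
    # so buffer weights not covered by weight sync are correctly initialized.
    load_format = engine_kwargs.pop('load_format', 'auto')
    return {'load_format': load_format, **engine_kwargs}

print(build_engine_kwargs({}))                        # falls back to 'auto'
print(build_engine_kwargs({'load_format': 'dummy'}))  # explicit override wins
```

An explicitly passed `load_format` still takes precedence, so users who accept the buffer-initialization risk can restore the faster `dummy` startup from the command line.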

Contributor

gemini-code-assist bot left a comment


Code Review

This pull request updates the default load_format for vLLM engines from 'dummy' to 'auto' across several rollout-related modules. The review feedback highlights that existing code comments in swift/pipelines/infer/rollout.py and swift/rlhf_trainers/rollout_mixin.py are now stale and need to be updated to reflect the new default. Furthermore, there is a suggestion to improve weight synchronization by including buffers, which would address the performance regression introduced by loading from disk and potentially allow for a return to the faster 'dummy' load format.

  # as they will be synced from the trainer process.
  # This will accelerate the rollout speed.
- load_format = engine_kwargs.pop('load_format', 'dummy')
+ load_format = engine_kwargs.pop('load_format', 'auto')
Contributor


medium

The comment block on lines 452-454 is now stale as it explicitly mentions using 'dummy' to accelerate rollout speed by preventing disk loading. Since the default is being changed to 'auto', this comment should be updated to reflect the new behavior and the reason for it (e.g., ensuring buffer weights are initialized), while acknowledging the performance trade-off of loading from disk.
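One way the stale comment block could be rewritten to match the new default (a suggested wording, not the author's; `engine_kwargs` stands in for the surrounding context):

```python
engine_kwargs = {}  # placeholder; in swift this comes from the caller

# Default to 'auto' so real weights are loaded from disk at startup. This is
# slower than 'dummy', but ensures buffer weights that are not covered by the
# trainer's weight-sync path are correctly initialized.
load_format = engine_kwargs.pop('load_format', 'auto')
```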

  # Use load_format from vllm_engine_kwargs if provided, otherwise default to 'dummy'
  vllm_engine_kwargs = self.args.vllm_engine_kwargs or {}
- load_format = vllm_engine_kwargs.pop('load_format', 'dummy')
+ load_format = vllm_engine_kwargs.pop('load_format', 'auto')
Contributor


medium

The comment on line 236 still refers to 'dummy' as the default value. It should be updated to 'auto' to match the code change.

Additionally, while using 'auto' is a valid workaround for missing buffer weights in the synchronization logic, it introduces a performance regression by loading the entire model from disk during initialization. A more efficient long-term solution would be to include buffers in the weight synchronization logic (e.g., in split_batches and _collect_state_dict_for_vllm), which would allow reverting to the 'dummy' load format for faster rollout engine startup.
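The suggested longer-term fix, syncing buffers alongside parameters, could look roughly like this. This is a hedged sketch using plain PyTorch APIs; `collect_state_for_vllm` is a hypothetical name and not swift's actual `_collect_state_dict_for_vllm` implementation:

```python
import torch
import torch.nn as nn

def collect_state_for_vllm(model: nn.Module) -> dict:
    # Parameters alone miss registered buffers (e.g. rotary-embedding caches
    # or normalization statistics). Including both would let the rollout
    # engine start from 'dummy' and still receive every tensor via sync.
    state = {name: p.detach() for name, p in model.named_parameters()}
    state.update({name: b.detach() for name, b in model.named_buffers()})
    return state

# Example: a module with both parameters and a registered buffer.
class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 2)
        self.register_buffer('scale', torch.ones(2))

state = collect_state_for_vllm(Demo())
print(sorted(state))  # buffer 'scale' appears alongside the linear weights
```

With buffers included in the synced state, reverting the default to `'dummy'` would recover the faster rollout-engine startup that this PR trades away.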


AceHao commented Apr 14, 2026

I have tried setting the vLLM load format to auto from the command line. Even though it fixes the garbage model output at iteration 0, weight sync is the actual root cause, and GRPO training is still not making progress.

