Fix ZeRO-3: Use per-param dtype for output buffers in _allgather_params_coalesced by albertvillanova · Pull Request #8073 · deepspeedai/DeepSpeed

albertvillanova · 2026-06-17T08:35:13Z

This PR fixes _allgather_params_coalesced in partition_parameters.py so that each output buffer (flat_tensor) is allocated with the dtype of its corresponding parameter in param_list, instead of always using the dtype of the first parameter.

Fix #8072.

Problem

_allgather_params_coalesced allocates all output buffers using the dtype of the first parameter in param_list:

# before
for psize in partition_sizes:
    flat_tensor = torch.empty(tensor_size, dtype=param_list[0].ds_tensor.dtype, ...)

This assumes that every persistent parameter gathered together shares the same dtype. Before 0.19.2 the assumption held only incidentally: _configure_distributed_model called module.bfloat16() unconditionally, normalising all persistent parameters (including PEFT LoRA adapters) to a uniform dtype.

PR #8066 "Mixed-precision: per-policy param/buffer dtype cast (preserve fp32 buffers)" (commit b919284) correctly stopped casting ZeRO-Init model params, but in doing so it exposed the latent bug. PEFT's default autocast_adapter_dtype=True keeps LoRA adapters in fp32 even when the base model is bf16, so persistent_parameters ends up with mixed dtypes (bf16 base-model params plus fp32 LoRA params). The resulting mismatch between a bf16 output buffer and an fp32 input tensor raises:

TypeError: output tensor must have the same type as input tensor
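The failure can be illustrated without GPUs or torch.distributed using a simplified, stdlib-only model. This is a sketch under stated assumptions, not DeepSpeed code: FakeParam, allocate_buffers_first_dtype and check_gather are hypothetical names, dtypes are plain strings, and check_gather only mimics the collective's requirement that output and input dtypes match.

```python
from dataclasses import dataclass


@dataclass
class FakeParam:
    """Hypothetical stand-in for a ZeRO-3 param's local shard (param.ds_tensor)."""
    dtype: str  # e.g. "bfloat16" or "float32"
    numel: int  # number of elements in this rank's partition


def allocate_buffers_first_dtype(param_list, num_partitions):
    """Models the pre-fix loop: every buffer takes param_list[0]'s dtype."""
    partition_sizes = [p.numel for p in param_list]
    buffers = []
    for psize in partition_sizes:
        tensor_size = psize * num_partitions
        buffers.append((tensor_size, param_list[0].dtype))
    return buffers


def check_gather(param_list, buffers):
    """Models the collective's same-dtype check between output and input."""
    for param, (_, out_dtype) in zip(param_list, buffers):
        if out_dtype != param.dtype:
            raise TypeError("output tensor must have the same type as input tensor")


# Uniform dtypes (the situation after module.bfloat16() before 0.19.2): no error.
uniform = [FakeParam("bfloat16", 4), FakeParam("bfloat16", 2)]
check_gather(uniform, allocate_buffers_first_dtype(uniform, num_partitions=2))

# bf16 base weight followed by an fp32 LoRA adapter: the second buffer is bf16.
mixed = [FakeParam("bfloat16", 4), FakeParam("float32", 2)]
try:
    check_gather(mixed, allocate_buffers_first_dtype(mixed, num_partitions=2))
except TypeError as err:
    print(f"TypeError: {err}")
```

Only the mixed list fails, which matches the observation that the bug stayed latent while all persistent parameters were cast to a single dtype.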

Solution

Allocate each output buffer with the dtype of its own parameter:

# after
for i, psize in enumerate(partition_sizes):
    flat_tensor = torch.empty(tensor_size, dtype=param_list[i].ds_tensor.dtype, ...)

This removes the shared-dtype assumption at the source rather than relying on upstream callers to normalise dtypes before calling _allgather_params_coalesced.
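In the same simplified, stdlib-only model (hypothetical names, string dtypes, no real collective), the fixed allocation gives each buffer the dtype of its own parameter. The sketch iterates with zip instead of indexing with i; this is only a stylistic alternative to the one-line change in the PR, and it is equivalent under the assumption, true in the model, that partition_sizes is built from param_list in the same order.

```python
from dataclasses import dataclass


@dataclass
class FakeParam:
    """Hypothetical stand-in for a ZeRO-3 param's local shard (param.ds_tensor)."""
    dtype: str
    numel: int


def allocate_buffers_per_param(param_list, num_partitions):
    """Models the fixed loop: each buffer uses its own parameter's dtype."""
    partition_sizes = [p.numel for p in param_list]
    buffers = []
    for param, psize in zip(param_list, partition_sizes):
        tensor_size = psize * num_partitions
        buffers.append((tensor_size, param.dtype))
    return buffers


mixed = [FakeParam("bfloat16", 4), FakeParam("float32", 2)]
buffers = allocate_buffers_per_param(mixed, num_partitions=2)
# Every output buffer now matches the dtype of the input it will receive.
assert all(dtype == p.dtype for p, (_, dtype) in zip(mixed, buffers))
print(buffers)  # [(8, 'bfloat16'), (4, 'float32')]
```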

Changes

Changed the output-buffer dtype in _allgather_params_coalesced from param_list[0].ds_tensor.dtype to param_list[i].ds_tensor.dtype, so that parameter lists with mixed dtypes are gathered correctly.

Signed-off-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>

albertvillanova requested review from tjruwase and tohtana as code owners June 17, 2026 08:35

Use per-param dtype in _allgather_params_coalesced output buffers

c43a5a5

Signed-off-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>

albertvillanova force-pushed the fix-8072 branch from b6007b5 to c43a5a5 Compare June 17, 2026 08:38

albertvillanova changed the title from "Fix zero3: Use per-param dtype for output buffers in _allgather_params_coalesced" to "Fix ZeRO-3: Use per-param dtype for output buffers in _allgather_params_coalesced" Jun 17, 2026

albertvillanova mentioned this pull request Jun 17, 2026

Fix ZeRO-3 + PEFT mixed-dtype error for core trainers huggingface/trl#6091

Merged

albertvillanova wants to merge 1 commit into deepspeedai:master from albertvillanova:fix-8072
