
Conversation

@a120092009 (Contributor) commented Nov 11, 2025

What does this PR do?

Adds Cambricon MLU support to diffusers. Without MLU support, the context parallelism test fails with the error "Supported strategies are: balanced, cpu".

Before submitting

Who can review?

@sayakpaul @yiyixuxu
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Test Code

import torch
# Depending on the Cambricon PyTorch build, `import torch_mlu` may also be
# required to register the MLU backend with PyTorch.
from diffusers import AutoModel, QwenImagePipeline, ContextParallelConfig

try:
    # "cncl" is Cambricon's collective-communication backend (the MLU counterpart of NCCL)
    torch.distributed.init_process_group("cncl")
    rank = torch.distributed.get_rank()
    # Bind each rank to one MLU device
    device = torch.device("mlu", rank % torch.mlu.device_count())
    torch.mlu.set_device(device)

    # ulysses_degree=2 shards the attention sequence across 2 ranks (context parallelism)
    transformer = AutoModel.from_pretrained(
        "/data/sd/sd_models/hf_models/Qwen/Qwen-Image",
        subfolder="transformer",
        torch_dtype=torch.bfloat16,
        parallel_config=ContextParallelConfig(ulysses_degree=2),
        device_map="mlu",
    )
    pipeline = QwenImagePipeline.from_pretrained(
        "/data/sd/sd_models/hf_models/Qwen/Qwen-Image",
        transformer=transformer,
        torch_dtype=torch.bfloat16,
        device_map="mlu",
    )

    prompt = """
    cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
    highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
    """

    # Must specify generator so all ranks start with same latents (or pass your own)
    generator = torch.Generator().manual_seed(42)
    image = pipeline(prompt, num_inference_steps=50, generator=generator).images[0]

    if rank == 0:
        image.save("output.png")

except Exception as e:
    print(f"An error occurred: {e}")
    torch.distributed.breakpoint()
    raise

finally:
    if torch.distributed.is_initialized():
        torch.distributed.destroy_process_group()

Result: [two output images attached to the comment]

@sayakpaul (Member)

@a120092009 thanks for your PR!

Could you talk a bit about the performance of this new accelerator? Any advantages over regular NVIDIA GPUs, etc.?

@a120092009 (Contributor, Author)

Cambricon is a well-known AI-chip company in China.

For detailed product information, please see our homepage: https://www.cambricon.com/

@yiyixuxu (Collaborator) left a comment:

thanks for the PR!

return "xpu"
elif torch.backends.mps.is_available():
return "mps"
elif torch.mlu.is_available():
@yiyixuxu (Collaborator):

Suggested change
- elif torch.mlu.is_available():
+ elif is_mlu_available:

also need to import this
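
For context, here is a minimal sketch of the device-detection pattern under discussion, assuming the helper is the is_mlu_available utility from accelerate.utils; the actual import location, function name, and surrounding code in the PR may differ:

import torch
from accelerate.utils import is_mlu_available  # assumed source of the helper

def detect_accelerator() -> str:
    # Hypothetical helper mirroring the elif chain shown in the review diff.
    if torch.cuda.is_available():
        return "cuda"
    elif hasattr(torch, "xpu") and torch.xpu.is_available():
        return "xpu"
    elif torch.backends.mps.is_available():
        return "mps"
    elif is_mlu_available():
        return "mlu"
    return "cpu"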

@a120092009 (Contributor, Author):

fixed

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yiyixuxu (Collaborator)

@bot /style

@github-actions (bot) commented Nov 12, 2025

Style bot fixed some files and pushed the changes.

@yiyixuxu yiyixuxu merged commit aecf0c5 into huggingface:main Nov 12, 2025
10 of 11 checks passed