[Pipelines] Add DreamLite text-to-image and image-edit pipelines#13815

Draft
Carlofkl wants to merge 3 commits into
huggingface:mainfrom
Carlofkl:feature/dreamlite-integration

Conversation

@Carlofkl

Context

This PR integrates DreamLite — ByteDance's text-to-image / image-edit diffusion model — into diffusers, following an invitation from @NielsRogge to release the model on the Hub in diffusers format.

Related issue: ByteVisionLab/DreamLite#3 (comment)

Model cards (public, ungated):

Both repos use a diffusers branch (loaded via revision="diffusers") to keep the original ByteDance-internal main branch intact for backward compatibility with existing users.

What's added

src/diffusers/
├── models/unets/unet_dreamlite.py            # DreamLiteUNetModel
├── pipelines/dreamlite/
│   ├── __init__.py
│   ├── pipeline_dreamlite.py                  # DreamLitePipeline (3-branch dual CFG)
│   ├── pipeline_dreamlite_mobile.py           # DreamLiteMobilePipeline (distilled)
│   └── pipeline_output.py
└── (registered in src/diffusers/__init__.py, models/__init__.py,
    pipelines/__init__.py, utils/dummy_*.py)

docs/source/en/api/pipelines/dreamlite.md
tests/pipelines/dreamlite/
├── test_pipeline_dreamlite.py
└── test_pipeline_dreamlite_mobile.py

Architecture highlights

  • DreamLiteUNetModel — UNet-based denoiser conditioned on Qwen3-VL text/vision embeddings.
  • DreamLitePipeline — runs 3 forward passes per step (text-cond / image-cond / uncond) and combines them with a dual-CFG schedule for high-fidelity text-to-image and image edit.
  • DreamLiteMobilePipeline — distilled single-pass variant; no CFG; designed for on-device inference. Pairs with AutoencoderTiny.
  • Both pipelines use FlowMatchEulerDiscreteScheduler.
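
The three-branch combination described for DreamLitePipeline can be sketched as follows. The PR description does not give the exact formula, so this assumes an InstructPix2Pix-style dual guidance rule; the function name `dual_cfg` and the parameters `image_scale` and `text_scale` are hypothetical and may not match the pipeline's actual arguments or schedule.

```python
from typing import Sequence


def dual_cfg(
    uncond: Sequence[float],
    image_cond: Sequence[float],
    text_cond: Sequence[float],
    image_scale: float,
    text_scale: float,
) -> list:
    """Combine three noise predictions element-wise.

    Assumed rule (InstructPix2Pix-style, not confirmed by the PR):
        out = uncond + image_scale * (image_cond - uncond)
                     + text_scale  * (text_cond  - image_cond)
    """
    if not (len(uncond) == len(image_cond) == len(text_cond)):
        raise ValueError("all three branch outputs must have the same length")
    return [
        u + image_scale * (i - u) + text_scale * (t - i)
        for u, i, t in zip(uncond, image_cond, text_cond)
    ]


# Toy values standing in for flattened UNet outputs at a single step.
combined = dual_cfg(
    uncond=[0.0, 1.0],
    image_cond=[1.0, 1.0],
    text_cond=[2.0, 3.0],
    image_scale=1.5,
    text_scale=4.0,
)
print(combined)
```

With both scales set to 1.0 the rule collapses to the text-conditioned prediction, which is a quick way to check that a guidance implementation of this shape is wired correctly.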

Testing

  • Loading smoke test against carlofkl/DreamLite-base with revision="diffusers" — all 6 sub-modules resolve to the correct diffusers.* namespace.
  • Inference smoke test — generates a 1024×1024 image in ~0.6s/step on a single A800; output stats sane (std≈93, no NaN/Inf).
  • Standard pipeline tests in tests/pipelines/dreamlite/.
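
The output-statistics check in the inference smoke test could look roughly like the sketch below. It assumes pixel values on a 0-255 scale; the helper names `output_stats` and `looks_sane` and the `min_std` threshold are hypothetical and are not taken from the PR's test code.

```python
import math
import statistics
from typing import Iterable


def output_stats(pixels: Iterable[float]) -> dict:
    """Report whether all values are finite and their population std."""
    values = list(pixels)
    finite = all(math.isfinite(v) for v in values)
    # A std is only meaningful when no NaN/Inf is present.
    std = statistics.pstdev(values) if finite else float("nan")
    return {"finite": finite, "std": std}


def looks_sane(stats: dict, min_std: float = 1.0) -> bool:
    """Flag outputs that contain NaN/Inf or are almost constant.

    `min_std` is an assumed threshold: a near-zero std usually means a
    blank or collapsed image.
    """
    return stats["finite"] and stats["std"] >= min_std


stats = output_stats([0.0, 255.0, 0.0, 255.0])
print(stats, looks_sane(stats))
```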

Before submitting

Who can review?

cc @sayakpaul @yiyixuxu @DN6 — thanks in advance for the review!

Carlofkl added 3 commits May 27, 2026 11:38
Add ByteDance's DreamLite model family to diffusers. DreamLite is a
UNet-based diffusion model that supports both text-to-image generation
and reference-image editing through a shared 3-branch dual-CFG design.
Two pipelines are shipped:

* DreamLitePipeline           - full 3-branch dual CFG (negative,
                                reference, prompt); supports T2I and
                                I2I editing at 1024x1024.
* DreamLiteMobilePipeline     - distilled single-branch variant for
                                on-device inference; no CFG.

New model code (all isolated under *_dreamlite.py / unet_dreamlite.py
to avoid touching shared upstream files):

* models/transformers/transformer_2d_dreamlite.py - DreamLite 2D
  transformer block.
* models/unets/unet_dreamlite.py                  - DreamLiteUNetModel.
* models/unets/unet_2d_blocks_dreamlite.py        - DreamLite-specific
  down/up/mid blocks.
* models/resnet_dreamlite.py                      - DreamLite ResNet
  variants.
* models/attention_processor.py                   - add
  DreamLiteAttnProcessor2_0 (pure addition, no existing processor
  modified).

Pipeline + tests + docs:

* pipelines/dreamlite/{__init__.py, pipeline_dreamlite.py,
  pipeline_dreamlite_mobile.py, pipeline_output.py}.
* tests/pipelines/dreamlite/{test_pipeline_dreamlite.py,
  test_pipeline_dreamlite_mobile.py} with the standard
  PipelineTesterMixin suite; setUp/tearDown automatically patch
  encode_prompt with a fake so that MagicMock text encoders work without
  per-test boilerplate.
* Skip 8 mixin tests that don't apply to DreamLite (MagicMock
  serialisation, custom attention processor, encode_prompt return
  shape, batch_size > 1 sweep), mirroring SD3 / Flux conventions.
* docs/source/en/api/pipelines/dreamlite.md + _toctree.yml entry
  (alphabetically between DiT and EasyAnimate).
* Register exports in 6 __init__.py files.
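
The setUp/tearDown patching and the skipped mixin tests can be illustrated with the standard library alone. This is a minimal sketch, not the PR's test code: `ToyPipeline`, `fake_encode_prompt`, and the (embeddings, mask) return shape are all hypothetical stand-ins.

```python
import unittest
from unittest import mock


class ToyPipeline:
    """Hypothetical pipeline whose text encoder is a MagicMock."""

    def __init__(self):
        self.text_encoder = mock.MagicMock()

    def encode_prompt(self, prompt):
        # A MagicMock encoder cannot produce usable embeddings.
        raise RuntimeError("needs a real text encoder")


def fake_encode_prompt(self, prompt):
    # Assumed return shape: (embeddings, attention mask) as nested lists.
    prompts = [prompt] if isinstance(prompt, str) else list(prompt)
    embeds = [[[0.0] * 4 for _ in range(3)] for _ in prompts]
    mask = [[1] * 3 for _ in prompts]
    return embeds, mask


class ToyPipelineTests(unittest.TestCase):
    def setUp(self):
        # Patching the class attribute with a plain function keeps normal
        # method binding, so every test sees the fake without extra setup.
        self._patcher = mock.patch.object(ToyPipeline, "encode_prompt", fake_encode_prompt)
        self._patcher.start()

    def tearDown(self):
        self._patcher.stop()

    def test_encode_prompt_is_patched(self):
        embeds, mask = ToyPipeline().encode_prompt("a cat")
        self.assertEqual(len(embeds), 1)
        self.assertEqual(mask, [[1, 1, 1]])

    @unittest.skip("toy example of a mixin test that does not apply")
    def test_not_applicable(self):
        self.fail("never runs")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToyPipelineTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Starting the patcher in setUp and stopping it in tearDown restores the original method after every test, so the fake cannot leak into unrelated test classes.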

Two real bugs surfaced by the mixin test suite are fixed in this
commit:

* num_images_per_prompt > 1: prompt_embeds and text_attention_mask
  are now repeated along the batch dimension in both pipelines'
  T2I and I2I branches before being passed to the UNet.
* vae=None: __init__ now guards the encoder_block_out_channels
  lookup so encode_prompt can be tested in isolation per
  PipelineTesterMixin convention.
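
A rough sketch of the two fixes, written without torch so it runs anywhere. The fallback value 8, the `2 ** (len(channels) - 1)` formula, and the consecutive (interleaved) repeat order are assumptions based on common diffusers conventions; the PR text does not spell them out, and the helper names are hypothetical.

```python
from types import SimpleNamespace
from typing import Optional

DEFAULT_VAE_SCALE_FACTOR = 8  # assumed fallback, not taken from the PR


def vae_scale_factor(vae: Optional[object]) -> int:
    """Return the spatial downscale factor, tolerating `vae=None`."""
    if vae is None:
        # Guard: skip the config lookup so the pipeline can be built
        # without a VAE, e.g. when testing encode_prompt in isolation.
        return DEFAULT_VAE_SCALE_FACTOR
    channels = vae.config.encoder_block_out_channels
    return 2 ** (len(channels) - 1)


def repeat_per_prompt(rows: list, num_images_per_prompt: int) -> list:
    """Repeat each batch entry consecutively (like an interleaved repeat on dim 0)."""
    return [row for row in rows for _ in range(num_images_per_prompt)]


# Toy VAE exposing only the config attribute that the guard reads.
tiny_vae = SimpleNamespace(
    config=SimpleNamespace(encoder_block_out_channels=(64, 64, 64, 64))
)

prompt_embeds = ["embeds_for_prompt_0", "embeds_for_prompt_1"]
text_attention_mask = ["mask_for_prompt_0", "mask_for_prompt_1"]

# Both tensors must be expanded the same way so rows stay aligned.
expanded_embeds = repeat_per_prompt(prompt_embeds, 2)
expanded_mask = repeat_per_prompt(text_attention_mask, 2)
print(vae_scale_factor(None), vae_scale_factor(tiny_vae), expanded_embeds)
```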

The SlowTests that load the real checkpoint run at 1024x1024, the only
resolution DreamLite is trained for.

Test result: 27 passed, 50 skipped, 0 failed on CPU fast suite.
make style && make quality: clean.

The `carlofkl/DreamLite-{base,mobile}` Hub repos host two flavours of the
same checkpoint:

* `main` branch      - keeps `model_index.json` pointing at ByteDance's
                       internal package path so the original (non-diffusers)
                       reference code can still load these weights.
* `diffusers` branch - rewrites the `unet` entry of `model_index.json` to
                       `["diffusers", "DreamLiteUNetModel"]` so this
                       integration loads correctly from `diffusers`.
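
The difference between the two branches amounts to one rewritten entry in `model_index.json`. The sketch below illustrates that rewrite on an in-memory dict; the `main`-branch package path is a placeholder because the PR does not state it, and `to_diffusers_index` is a hypothetical helper, not part of the integration.

```python
import json


def to_diffusers_index(model_index: dict) -> dict:
    """Return a copy of `model_index` whose `unet` entry points at diffusers."""
    rewritten = dict(model_index)
    rewritten["unet"] = ["diffusers", "DreamLiteUNetModel"]
    return rewritten


# Hypothetical `main`-branch index; the internal package path is a placeholder.
main_index = {
    "_class_name": "DreamLitePipeline",
    "unet": ["internal_package.placeholder", "DreamLiteUNetModel"],
    "scheduler": ["diffusers", "FlowMatchEulerDiscreteScheduler"],
}

diffusers_index = to_diffusers_index(main_index)
print(json.dumps(diffusers_index, indent=2))
```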

This commit pins every `from_pretrained(...)` call shipped with the
diffusers integration (docs examples, pipeline docstrings, SlowTests) to
`revision="diffusers"`. Local-override env vars (DREAMLITE_BASE_PATH /
DREAMLITE_MOBILE_PATH) still bypass the revision pin.
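
One way the revision pin and the local override could interact is sketched below. The environment variable names and repo ids come from this commit message; the helper `resolve_checkpoint` and its return shape are hypothetical.

```python
import os
from typing import Mapping, Optional, Tuple

HUB_REPOS = {"base": "carlofkl/DreamLite-base", "mobile": "carlofkl/DreamLite-mobile"}
ENV_VARS = {"base": "DREAMLITE_BASE_PATH", "mobile": "DREAMLITE_MOBILE_PATH"}


def resolve_checkpoint(
    variant: str, env: Optional[Mapping[str, str]] = None
) -> Tuple[str, dict]:
    """Return (path_or_repo_id, extra kwargs) for a `from_pretrained` call."""
    env = os.environ if env is None else env
    local_path = env.get(ENV_VARS[variant])
    if local_path:
        # A local checkout has no Hub branches, so the revision pin is dropped.
        return local_path, {}
    return HUB_REPOS[variant], {"revision": "diffusers"}


source, kwargs = resolve_checkpoint("base", env={})
print(source, kwargs)
# Intended use, assuming the pipeline class from this PR is importable:
#   pipe = DreamLitePipeline.from_pretrained(source, **kwargs)
```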
…ts after rebase

Mechanical changes after rebasing onto current `main`:

* `pipeline_dreamlite.py::retrieve_timesteps` — re-synced from
  `diffusers.pipelines.flux.pipeline_flux.retrieve_timesteps` (PEP 604
  type hints, expanded docstring, plus the new
  `accepts_timesteps` / `accept_sigmas` introspection guards). DreamLite's
  default code path uses `num_inference_steps` (uniform schedule) and never
  passes custom `timesteps` / `sigmas`, so the added guards are dead code
  for this pipeline and behaviour is unchanged.
* `dummy_pt_objects.py` / `dummy_torch_and_transformers_objects.py` —
  registered the dummy classes auto-generated by `make fix-copies` for
  `DreamLiteTransformer2DModel`, `DreamLiteUNetModel`, `DreamLitePipeline`,
  `DreamLiteMobilePipeline`, `DreamLitePipelineOutput`.

Generated by `make fix-copies`. No hand edits.
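
The introspection guards mentioned for `retrieve_timesteps` follow a simple pattern: inspect the scheduler's `set_timesteps` signature before forwarding a custom schedule. The sketch below shows that pattern with a toy scheduler; `accepts_kwarg`, `check_custom_schedule`, and `UniformOnlyScheduler` are hypothetical names, and the real function also goes on to call `set_timesteps`.

```python
import inspect


def accepts_kwarg(method, name: str) -> bool:
    """True if `method` declares a parameter called `name`."""
    return name in inspect.signature(method).parameters


def check_custom_schedule(scheduler, timesteps=None, sigmas=None) -> None:
    """Reject custom schedules that the scheduler cannot consume."""
    if timesteps is not None and sigmas is not None:
        raise ValueError("pass only one of `timesteps` or `sigmas`")
    if timesteps is not None and not accepts_kwarg(scheduler.set_timesteps, "timesteps"):
        raise ValueError(f"{type(scheduler).__name__} does not support custom timesteps")
    if sigmas is not None and not accepts_kwarg(scheduler.set_timesteps, "sigmas"):
        raise ValueError(f"{type(scheduler).__name__} does not support custom sigmas")


class UniformOnlyScheduler:
    """Toy scheduler that only supports a uniform schedule."""

    def set_timesteps(self, num_inference_steps, device=None):
        self.num_inference_steps = num_inference_steps


# Default path: neither `timesteps` nor `sigmas` is passed, so no guard fires.
check_custom_schedule(UniformOnlyScheduler())
```

Because the guards only run when a custom schedule is supplied, a pipeline that always passes `num_inference_steps` never reaches them.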
@github-actions (bot) added the size/L (PR with diff > 200 LOC), documentation, models, tests, utils, and pipelines labels and removed the size/L label on May 27, 2026

Labels

documentation (Improvements or additions to documentation), models, pipelines, tests, utils
