[Pipelines] Add DreamLite text-to-image and image-edit pipelines #13815
Draft
Carlofkl wants to merge 3 commits into
Add ByteDance's DreamLite model family to diffusers. DreamLite is a
UNet-based diffusion model that supports both text-to-image generation
and reference-image editing through a shared 3-branch dual-CFG design.
Two pipelines are shipped:
* DreamLitePipeline - full 3-branch dual CFG (negative, reference,
  prompt); supports T2I and I2I editing at 1024x1024.
* DreamLiteMobilePipeline - distilled single-branch variant for
  on-device inference; no CFG.
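How the three branches are combined is not spelled out in this description. As a rough sketch only, assuming an InstructPix2Pix-style rule with two guidance scales (an assumption, not confirmed to be DreamLite's formula), the combination could look like the following. The names `dual_cfg`, `image_scale` and `text_scale` are hypothetical.

```python
import numpy as np


def dual_cfg(noise_neg, noise_ref, noise_prompt, image_scale, text_scale):
    """Combine three noise predictions using two guidance scales.

    Assumed rule (InstructPix2Pix-style), NOT taken from the DreamLite code:
        out = neg + image_scale * (ref - neg) + text_scale * (prompt - ref)
    """
    return (
        noise_neg
        + image_scale * (noise_ref - noise_neg)
        + text_scale * (noise_prompt - noise_ref)
    )


# Toy arrays standing in for UNet outputs of shape (batch, channels, h, w).
neg = np.zeros((1, 4, 2, 2))
ref = np.ones((1, 4, 2, 2))
prompt = np.full((1, 4, 2, 2), 3.0)

guided = dual_cfg(neg, ref, prompt, image_scale=1.5, text_scale=2.0)
```

Under this assumed rule, setting both scales to 1.0 returns the prompt branch unchanged, which is a convenient sanity check for any dual-CFG implementation of this shape.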
New model code (all isolated under *_dreamlite.py / unet_dreamlite.py
to avoid touching shared upstream files):
* models/transformers/transformer_2d_dreamlite.py - DreamLite 2D
  transformer block.
* models/unets/unet_dreamlite.py - DreamLiteUNetModel.
* models/unets/unet_2d_blocks_dreamlite.py - DreamLite-specific
  down/up/mid blocks.
* models/resnet_dreamlite.py - DreamLite ResNet variants.
* models/attention_processor.py - add DreamLiteAttnProcessor2_0
  (pure addition, no existing processor modified).
Pipeline + tests + docs:
* pipelines/dreamlite/{__init__.py, pipeline_dreamlite.py,
  pipeline_dreamlite_mobile.py, pipeline_output.py}.
* tests/pipelines/dreamlite/{test_pipeline_dreamlite.py,
  test_pipeline_dreamlite_mobile.py} with the standard
  PipelineTesterMixin suite; setUp/tearDown auto-patches encode_prompt
  with a fake so MagicMock text encoders work without per-test
  boilerplate.
* Skip 8 mixin tests that don't apply to DreamLite (MagicMock
  serialisation, custom attention processor, encode_prompt return
  shape, batch_size > 1 sweep), mirroring SD3 / Flux conventions.
* docs/source/en/api/pipelines/dreamlite.md + _toctree.yml entry
  (alphabetically between DiT and EasyAnimate).
* Register exports in 6 __init__.py files.
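The setUp/tearDown patching described above can be illustrated with `unittest.mock.patch.object`. This is a minimal sketch on a toy class, not the PR's test code: `ToyPipeline`, `fake_encode_prompt` and the 8-element placeholder embedding are all hypothetical.

```python
import unittest
from unittest import mock


class ToyPipeline:
    """Hypothetical stand-in for a pipeline whose text encoder is a MagicMock."""

    def __init__(self):
        self.text_encoder = mock.MagicMock()

    def encode_prompt(self, prompt):
        # A real implementation would call the text encoder; a MagicMock
        # cannot produce usable embeddings, so tests replace this method.
        raise RuntimeError("real text encoder required")

    def __call__(self, prompt):
        return self.encode_prompt(prompt)


def fake_encode_prompt(self, prompt):
    # Deterministic placeholder "embeddings": one row of zeros per prompt.
    prompts = [prompt] if isinstance(prompt, str) else list(prompt)
    return [[0.0] * 8 for _ in prompts]


class ToyPipelineTests(unittest.TestCase):
    def setUp(self):
        # Patch at class level so every pipeline built inside a test is covered.
        self._patcher = mock.patch.object(
            ToyPipeline, "encode_prompt", fake_encode_prompt
        )
        self._patcher.start()

    def tearDown(self):
        # Restore the original method so other test classes are unaffected.
        self._patcher.stop()

    def test_call_uses_fake(self):
        out = ToyPipeline()("a cat")
        self.assertEqual(len(out), 1)
        self.assertEqual(len(out[0]), 8)
```

Starting the patcher in `setUp` and stopping it in `tearDown` keeps individual test methods free of mocking boilerplate, which is the stated goal.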
Two real bugs surfaced by the mixin test suite are fixed in this
commit:
* num_images_per_prompt > 1: prompt_embeds and text_attention_mask
  are now repeated along the batch dimension in both pipelines'
  T2I and I2I branches before being passed to the UNet.
* vae=None: __init__ now guards the encoder_block_out_channels
  lookup so encode_prompt can be tested in isolation per
  PipelineTesterMixin convention.
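The num_images_per_prompt fix amounts to repeating per-prompt tensors so their batch size matches the latents. A minimal sketch of that idea with NumPy arrays follows; the helper name `repeat_for_images` and the toy shapes are hypothetical, and the pipelines themselves operate on torch tensors (where `torch.repeat_interleave(x, n, dim=0)` plays the same role).

```python
import numpy as np


def repeat_for_images(prompt_embeds, text_attention_mask, num_images_per_prompt):
    """Repeat each prompt's rows so the batch matches the latent batch size."""
    # np.repeat keeps copies of one prompt adjacent: [p0, p0, p1, p1, ...]
    embeds = np.repeat(prompt_embeds, num_images_per_prompt, axis=0)
    mask = np.repeat(text_attention_mask, num_images_per_prompt, axis=0)
    return embeds, mask


# Two prompts, three tokens each, four embedding channels (toy sizes).
prompt_embeds = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)
text_attention_mask = np.array([[1, 1, 0], [1, 1, 1]])

embeds, mask = repeat_for_images(prompt_embeds, text_attention_mask, 2)
```

The embeddings and the mask must be repeated the same way, otherwise the mask rows no longer line up with the prompts they describe.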
SlowTests real-checkpoint resolution is set to 1024x1024 (the only
size DreamLite is trained for).
Test result: 27 passed, 50 skipped, 0 failed on CPU fast suite.
make style && make quality: clean.
The `carlofkl/DreamLite-{base,mobile}` Hub repos host two flavours of the
same checkpoint:
* `main` branch - keeps `model_index.json` pointing at ByteDance's
  internal package path so the original (non-diffusers) reference
  code can still load these weights.
* `diffusers` branch - rewrites the `unet` entry of `model_index.json`
  to `["diffusers", "DreamLiteUNetModel"]` so this integration loads
  correctly from `diffusers`.
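The difference between the two branches can be pictured as a one-key rewrite of `model_index.json`. The sketch below is illustrative only: the `main`-branch dictionary is hypothetical (the real internal package path is not shown here, so `some_internal_package.unet` is a placeholder), and `point_unet_at_diffusers` is a made-up helper name.

```python
import json


def point_unet_at_diffusers(model_index):
    """Return a copy of a model_index.json dict with the unet entry rewritten."""
    patched = dict(model_index)  # shallow copy; the input dict is left untouched
    patched["unet"] = ["diffusers", "DreamLiteUNetModel"]
    return patched


# Hypothetical `main`-branch content; the package path is a placeholder.
main_index = {
    "unet": ["some_internal_package.unet", "DreamLiteUNetModel"],
}

diffusers_index = point_unet_at_diffusers(main_index)
print(json.dumps(diffusers_index, indent=2))
```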
This commit pins every `from_pretrained(...)` call shipped with the
diffusers integration (docs examples, pipeline docstrings, SlowTests) to
`revision="diffusers"`. Local-override env vars (DREAMLITE_BASE_PATH /
DREAMLITE_MOBILE_PATH) still bypass the revision pin.
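The interaction between the revision pin and the local-override variables might be sketched as follows. `resolve_checkpoint` is a hypothetical helper, not code from this PR; only the repo id, the environment variable names and `revision="diffusers"` come from the text above.

```python
import os


def resolve_checkpoint(repo_id, env_var, environ=None):
    """Return (path_or_repo_id, kwargs for from_pretrained)."""
    environ = os.environ if environ is None else environ
    local_path = environ.get(env_var)
    if local_path:
        # Local override: a directory on disk has no Hub revision to pin.
        return local_path, {}
    return repo_id, {"revision": "diffusers"}


source, kwargs = resolve_checkpoint(
    "carlofkl/DreamLite-base", "DREAMLITE_BASE_PATH", environ={}
)

# With the pipelines from this PR installed, loading would then be:
#   from diffusers import DreamLitePipeline
#   pipe = DreamLitePipeline.from_pretrained(source, **kwargs)
```

Passing `environ` explicitly keeps the helper easy to test without touching the process environment.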
…ts after rebase
Mechanical changes after rebasing onto current `main`:
* `pipeline_dreamlite.py::retrieve_timesteps` - re-synced from
  `diffusers.pipelines.flux.pipeline_flux.retrieve_timesteps` (PEP 604
  type hints, expanded docstring, plus the new `accepts_timesteps` /
  `accept_sigmas` introspection guards). DreamLite's default code path
  uses `num_inference_steps` (uniform schedule) and never passes custom
  `timesteps` / `sigmas`, so the added guards are dead code for this
  pipeline; behaviour is unchanged.
* `dummy_pt_objects.py` / `dummy_torch_and_transformers_objects.py` -
  registered the dummy classes for `DreamLiteTransformer2DModel`,
  `DreamLiteUNetModel`, `DreamLitePipeline`, `DreamLiteMobilePipeline`,
  `DreamLitePipelineOutput`. Generated by `make fix-copies`; no hand
  edits.
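For readers unfamiliar with the introspection guards mentioned above, the idea is to check whether a scheduler's `set_timesteps` accepts a given keyword before passing it. The following is a simplified sketch of that pattern, not the upstream `retrieve_timesteps`; both scheduler classes and the helpers `accepts_kwarg` / `set_schedule` are hypothetical.

```python
import inspect


def accepts_kwarg(scheduler, name):
    """True if scheduler.set_timesteps has a parameter called `name`."""
    return name in inspect.signature(scheduler.set_timesteps).parameters


class UniformOnlyScheduler:
    # Hypothetical scheduler that only supports a uniform schedule.
    def set_timesteps(self, num_inference_steps, device=None):
        self.timesteps = list(range(num_inference_steps - 1, -1, -1))


class CustomScheduleScheduler:
    # Hypothetical scheduler that also accepts explicit timesteps.
    def set_timesteps(self, num_inference_steps=None, device=None, timesteps=None):
        if timesteps is not None:
            self.timesteps = list(timesteps)
        else:
            self.timesteps = list(range(num_inference_steps - 1, -1, -1))


def set_schedule(scheduler, num_inference_steps=None, timesteps=None):
    if timesteps is not None:
        # Guard: fail early with a clear message instead of a TypeError.
        if not accepts_kwarg(scheduler, "timesteps"):
            raise ValueError(
                f"{type(scheduler).__name__} does not support custom timesteps"
            )
        scheduler.set_timesteps(timesteps=timesteps)
    else:
        # Default path: the guard above is never reached.
        scheduler.set_timesteps(num_inference_steps)
    return scheduler.timesteps
```

A caller that only ever passes `num_inference_steps`, as described for DreamLite, always takes the `else` branch, which is why the guards do not change behaviour there.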
Context
This PR integrates DreamLite, ByteDance's text-to-image / image-edit diffusion model, into
diffusers, following an invitation from @NielsRogge to release the model on the Hub in `diffusers` format.
Related issue: ByteVisionLab/DreamLite#3 (comment)
Model cards (public, ungated):
Both repos use a `diffusers` branch (loaded via `revision="diffusers"`) to keep the original ByteDance-internal `main` branch intact for backward compatibility with existing users.
What's added
Architecture highlights
* `DreamLiteUNetModel` - UNet-based denoiser conditioned on Qwen3-VL text/vision embeddings.
* `DreamLitePipeline` - runs 3 forward passes per step (text-cond / image-cond / uncond) and combines them with a dual-CFG schedule for high-fidelity text-to-image generation and image editing.
* `DreamLiteMobilePipeline` - distilled single-pass variant; no CFG; designed for on-device inference. Pairs with `AutoencoderTiny`.
* `FlowMatchEulerDiscreteScheduler`.
Testing
* `carlofkl/DreamLite-base` with `revision="diffusers"` - all 6 sub-modules resolve to the correct `diffusers.*` namespace.
* std≈93, no NaN/Inf).
* `tests/pipelines/dreamlite/`.
Before submitting
Who can review?
cc @sayakpaul @yiyixuxu @DN6 — thanks in advance for the review!