feat: ernie image/turbo #9115

Draft
Pfannkuchensack wants to merge 9 commits into invoke-ai:main from Pfannkuchensack:feature/ernie-image

Conversation

@Pfannkuchensack
Collaborator

Summary

Adds Baidu ERNIE-Image and ERNIE-Image-Turbo (HF, HF Turbo) as a new BaseModelType, mirroring the FLUX.2 Klein and Z-Image integration patterns.

The two checkpoints share the same ErnieImageTransformer2DModel architecture (3072 hidden, 24 layers, 24 heads, Mistral3 text encoder, AutoencoderKLFlux2 VAE) and only differ in inference defaults (50 steps + CFG 4.0 vs. 8 steps + CFG 1.0 for Turbo), so they live under one BaseModelType.ErnieImage without a variant enum. The optional 3B Mistral3-based prompt enhancer that ships with the pipeline is wired through with a UI toggle.
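To make the shared-architecture / different-defaults split concrete, the distinction boils down to something like the following (an illustrative sketch only; the dict and function names are hypothetical, not Invoke's actual config fields):

```python
# Illustrative only: both checkpoints share ErnieImageTransformer2DModel and
# differ solely in their default inference settings. Key names are hypothetical.
ERNIE_IMAGE_DEFAULTS = {
    "baidu/ERNIE-Image": {"steps": 50, "cfg_scale": 4.0},
    "baidu/ERNIE-Image-Turbo": {"steps": 8, "cfg_scale": 1.0},
}


def default_settings(model_name: str) -> dict:
    """Look up per-checkpoint defaults; fall back to the non-Turbo values."""
    return ERNIE_IMAGE_DEFAULTS.get(model_name, ERNIE_IMAGE_DEFAULTS["baidu/ERNIE-Image"])
```

Since the architecture is identical, this is why a single BaseModelType.ErnieImage without a variant enum is sufficient.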

This PR builds on top of #8859 (transformers 5.1+) and additionally bumps diffusers 0.36.0 → 0.38.0, which is the first release containing ErnieImagePipeline and ErnieImageTransformer2DModel.

Backend changes

  • BaseModelType.ErnieImage, ModelType.PromptEnhancer, plus two new SubModelTypes (pe, pe_tokenizer) for the bundled prompt enhancer.
  • Main_Diffusers_ErnieImage_Config and Main_Checkpoint_ErnieImage_Config with state-dict-based detection (x_embedder + text_proj + adaLN_modulation).
  • A diffusers loader that loads transformer / vae / text_encoder / tokenizer plus optional pe / pe_tokenizer from a single pipeline directory.
  • invokeai/backend/ernie_image/ with sampling utilities (2×2 patchify, BN normalize/denormalize, sigma schedule, padded text packing) and a rectified-flow denoise loop supporting Euler / Heun / LCM (reusing the FlowMatch* schedulers from FLUX).
  • Five new invocations: ernie_image_model_loader, ernie_image_text_encoder (with prompt-enhancer toggle), ernie_image_denoise, ernie_image_vae_encode, ernie_image_vae_decode.
  • ErnieImageConditioningInfo + conditioning field/output, including pickle allowlist for the disk serializer.
  • Generation modes ernie_image_{txt2img,img2img,inpaint,outpaint}.
  • Starter models for baidu/ERNIE-Image and baidu/ERNIE-Image-Turbo, plus a STARTER_BUNDLES entry.
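As a rough sketch of the 2×2 patchify/unpatchify layout transform in the sampling utilities (shown with NumPy for brevity; the real helpers operate on torch tensors, and these function names are assumed):

```python
import numpy as np


def patchify_2x2(latents: np.ndarray) -> np.ndarray:
    """Fold each 2x2 spatial block into the channel dim: (B, C, H, W) -> (B, 4C, H/2, W/2)."""
    b, c, h, w = latents.shape
    x = latents.reshape(b, c, h // 2, 2, w // 2, 2)
    return x.transpose(0, 1, 3, 5, 2, 4).reshape(b, c * 4, h // 2, w // 2)


def unpatchify_2x2(patched: np.ndarray) -> np.ndarray:
    """Inverse of patchify_2x2: (B, 4C, H/2, W/2) -> (B, C, H, W)."""
    b, c4, h2, w2 = patched.shape
    c = c4 // 4
    x = patched.reshape(b, c, 2, 2, h2, w2)
    return x.transpose(0, 1, 4, 2, 5, 3).reshape(b, c, h2 * 2, w2 * 2)
```

The round trip is lossless, which is what the manually sanity-checked "patchify roundtrip" in the checklist refers to.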

Frontend changes

  • services/api/schema.ts regenerated to expose the new node types.
  • Type unions extended (ImageOutput, LatentToImage, ImageToLatents, DenoiseLatents, MainModelLoaderNodes).
  • ParamsState gains ernieImageScheduler and ernieImageUsePromptEnhancer, with reducers, selectors, and selectIsErnieImage.
  • buildErnieImageGraph (txt2img / img2img / inpaint / outpaint) wired into useEnqueueCanvas.
  • ERNIE entries added to MODEL_BASE_TO_{COLOR,LONG_NAME,SHORT_NAME}; prompt_enhancer added to MODEL_TYPE_TO_LONG_NAME.
  • ParamErnieImageScheduler and ParamErnieImagePromptEnhancer rendered conditionally in GenerationSettingsAccordion.
  • All add{TextTo,ImageTo,Inpaint,Outpaint}Image rectified-flow type guards extended to accept ernie_image_denoise; isMainModelWithoutUnet likewise.

Diffusers 0.38 fixes

  • hotfixes.py: import LoRACompatibleConv directly. The lazy module loader in 0.38 no longer exposes diffusers.models.lora as an attribute, so the legacy patch path crashes on import.
  • pyproject.toml declares prerelease = "allow" under [tool.uv], with a comment explaining that diffusers 0.38.0 itself hard-pins safetensors>=0.8.0-rc.0. We can drop this again once a diffusers patch release ships with a stable safetensors floor.
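The defensive import pattern is roughly this (a sketch, not the exact hotfixes.py code; the fallback behaviour is an assumption):

```python
import importlib


def load_lora_compatible_conv():
    """Import LoRACompatibleConv via its module path directly.

    diffusers 0.38's lazy module __getattr__ no longer exposes `models.lora`
    as an attribute of `diffusers.models`, so the submodule must be imported
    explicitly rather than looked up via attribute access.
    """
    try:
        lora_module = importlib.import_module("diffusers.models.lora")
        return getattr(lora_module, "LoRACompatibleConv", None)
    except ImportError:
        # diffusers not installed, or the module was removed entirely.
        return None
```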

Related Issues / Discussions

QA Instructions

Automated

  • pytest tests/ -m "not slow" — passes (619 / 619, with the same tests/model_identification LFS skip as on main).
  • pnpm lint:tsc, pnpm lint:eslint, pnpm lint:prettier — all clean.
  • pnpm test:no-watch — 563 / 563 frontend tests pass.

Manual smoketest of pre-existing models (regression for the diffusers / transformers bump)

Pick at least two of these and run a single txt2img each. Confirm no crashes and visually-plausible output:

  • SDXL
  • FLUX.1 Dev
  • FLUX.2 Klein
  • SD3
  • Z-Image-Turbo

Plus the relevant items from #8859's test plan: SD 1.5 prompt-weighted generation (compel path), FLUX text-to-image (T5 tokenizer path), HF model install via repo ID, NSFW checker first-time download.

Manual smoketest of ERNIE-Image (requires a GPU — 8B parameters)

  1. Install baidu/ERNIE-Image-Turbo from the new starter bundle. The Model Manager should classify it as BaseModelType.ErnieImage.
  2. Pick the model on the Generate tab. Confirm the scheduler dropdown and Prompt Enhancer toggle appear in the Generation accordion.
  3. txt2img at 1024×1024, 8 steps, CFG 1.0 — should produce an image.
  4. Toggle the prompt enhancer on with a short prompt (e.g. "a fox"); the enhancer log line in the backend should show a rewritten longer prompt before encoding.
  5. Repeat with baidu/ERNIE-Image (50 steps, CFG 4.0).
  6. img2img / inpaint / outpaint — one run each on the canvas tab.

Merge Plan

  • This PR depends on Update to transformers 5.1.0 #8859 being merged first (or the two being merged together, since they share the transformers>=5.1.0 override). I have no preference; happy to rebase whenever #8859 lands.
  • After this PR merges, a follow-up release should call out the diffusers and transformers major bumps in the changelog. The ERNIE-Image starter bundle is gated behind those bumps.
  • The prerelease = "allow" line in pyproject.toml is a temporary measure tied to diffusers 0.38.0's safetensors>=0.8.0-rc.0 upstream pin. Worth revisiting (and removing) once a diffusers patch release relaxes that requirement.
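For reference, the temporary override in pyproject.toml looks roughly like this (a sketch; the actual comment wording in the PR may differ):

```toml
[tool.uv]
# Temporary: diffusers 0.38.0 hard-pins safetensors>=0.8.0-rc.0, which is a
# prerelease. Drop this once a diffusers patch release ships a stable floor.
prerelease = "allow"
```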

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable) — backend regression suite covers the new code paths via existing model-config and loader-registry tests; no new dedicated unit tests for the ERNIE sampling helpers (the patchify roundtrip was sanity-checked manually).
  • ❗Changes to a redux slice have a corresponding migration — N/A: only additive fields with defaults in paramsSlice.
  • Documentation added / updated (if applicable) — none yet; the docs-old/contributing/NEW_MODEL_INTEGRATION.md checklist describes the integration shape this PR follows.
  • Updated What's New copy (if doing a release after this PR)

Out of scope (planned follow-ups)

  • ControlNet, IP-Adapter, and LoRA support for ERNIE-Image.
  • Single-file checkpoint loading (the Main_Checkpoint_ErnieImage_Config is in place as defensive scaffolding, but the loader currently raises NotImplementedError for that format).
  • Metadata recall handlers in the gallery side panel for ERNIE-specific parameters.

Your Name and others added 4 commits February 6, 2026 19:58
Adds Baidu ERNIE-Image and ERNIE-Image-Turbo as a new BaseModelType,
mirroring the FLUX.2 / Z-Image integration pattern. Both models share
the ErnieImageTransformer2DModel architecture (3072 hidden, 24 layers,
24 heads) and an AutoencoderKLFlux2 VAE; they differ only in default
inference settings (50 steps + CFG 4.0 vs 8 steps + CFG 1.0 for Turbo).

Built on top of PR invoke-ai#8859 (transformers 5.1+) and additionally bumps
diffusers 0.36.0 -> 0.38.0, which is the first release containing the
ErnieImagePipeline and ErnieImageTransformer2DModel.

Backend
- BaseModelType.ErnieImage, ModelType.PromptEnhancer, two new
  SubModelTypes (pe, pe_tokenizer) for the bundled prompt enhancer
- Main_Diffusers_ErnieImage_Config + Main_Checkpoint_ErnieImage_Config
  with state-dict-based detection (x_embedder + text_proj + adaLN_modulation)
- Diffusers loader registered for ERNIE-Image; uses upstream subdir
  conventions, loads transformer / vae / text_encoder / tokenizer plus
  optional pe / pe_tokenizer
- New invokeai/backend/ernie_image/ with sampling utilities (2x2 patchify,
  BN normalize/denormalize, sigma schedule, padded text packing) and a
  rectified-flow denoise loop supporting Euler/Heun/LCM
- Five invocations: model_loader, text_encoder (with prompt-enhancer
  toggle), denoise, vae_encode, vae_decode
- ErnieImageConditioningInfo + ConditioningField/Output + pickle allowlist
- ERNIE_IMAGE_SCHEDULER_MAP reusing the FlowMatch* scheduler classes
- New generation modes ernie_image_{txt2img,img2img,inpaint,outpaint}
- Starter models for baidu/ERNIE-Image and baidu/ERNIE-Image-Turbo
  + STARTER_BUNDLES entry

Frontend
- Regenerated services/api/schema.ts to expose the new node types
- Type unions extended (ImageOutput / LatentToImage / ImageToLatents /
  DenoiseLatents / MainModelLoaderNodes)
- ParamsState gains ernieImageScheduler + ernieImageUsePromptEnhancer,
  with reducers, selectors, and selectIsErnieImage
- buildErnieImageGraph (txt2img/img2img/inpaint/outpaint) wired into
  useEnqueueCanvas
- ERNIE entries added to MODEL_BASE_TO_{COLOR,LONG_NAME,SHORT_NAME}
  and prompt_enhancer to MODEL_TYPE_TO_LONG_NAME
- ParamErnieImageScheduler and ParamErnieImagePromptEnhancer rendered
  conditionally in GenerationSettingsAccordion
- All add{TextTo,ImageTo,Inpaint,Outpaint}Image type guards extended
  to accept ernie_image_denoise; isMainModelWithoutUnet ditto

Diffusers 0.38 fixes
- hotfixes.py: import LoRACompatibleConv directly; the lazy-module
  __getattr__ no longer exposes diffusers.models.lora as an attribute

Verification
- pytest tests/ -m "not slow": 619 passed, 0 failed
- pnpm lint:tsc / lint:eslint / lint:prettier: clean
- pnpm test:no-watch: 563 passed, 0 failed
- Manual smoketest pending: requires baidu/ERNIE-Image weights and a
  GPU (8B parameters; CPU not practical)

Out of scope (follow-up phases)
- ControlNet, IP-Adapter, LoRA support for ERNIE-Image
- Single-file checkpoint loading (defensive scaffolding only)
- Metadata recall handlers in the gallery side panel
@github-actions github-actions Bot added api python PRs that change python files Root invocations PRs that change invocations backend PRs that change backend files services PRs that change app services frontend PRs that change frontend files python-deps PRs that change python dependencies labels May 3, 2026
…nd UI cleanup

- Pass timesteps in [0, num_train_timesteps] to the transformer instead of
  [0, 1]; the diffusers Timesteps embedding expects the unnormalised range,
  which produced mosaic-pattern garbage instead of an image.
- Unpatchify predicted-x0 before the denoise step callback and route through
  sd_step_callback so the canvas shows a live preview during sampling
  (uses FLUX.2's RGB factors -- same AutoencoderKLFlux2 / 32 latent channels).
- Add the ernie-image case to useEnqueueGenerate (Generate tab); was only
  wired up in useEnqueueCanvas, so plain text-to-image failed with
  "No graph builders for base ernie-image".
- Move the Prompt Enhancer toggle from the Generation accordion into a
  dedicated ERNIE-Image block in the Advanced accordion; hide the rest of
  the SD-style advanced controls (CLIP skip, CFG rescale, seamless, color
  comp., separate VAE) since none apply to ERNIE-Image.
- Detect ERNIE-Image-Turbo by name in MainModelDefaultSettings.from_base
  so installs (starter or manual) get steps=8, cfg_scale=1.0 instead of
  the standard 50/4.0.
- Pin compel to Cstannahill/compel5@chore/transformers5-diffusers-smoke
  for transformers>=5 compatibility (PR damian0815/compel#129).
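The timestep-range fix in the first bullet is essentially this rescaling (a sketch; 1000 is the common diffusers num_train_timesteps default, assumed here rather than confirmed for ERNIE-Image):

```python
def to_transformer_timesteps(sigmas: list[float], num_train_timesteps: int = 1000) -> list[float]:
    """Rescale rectified-flow sigmas from [0, 1] to the unnormalised range
    the diffusers Timesteps embedding expects. Feeding raw [0, 1] values
    collapses the sinusoidal embedding's frequency range, which is what
    produced the mosaic-pattern garbage instead of an image."""
    return [s * num_train_timesteps for s in sigmas]
```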