feat: ernie image/turbo #9115
Draft
Pfannkuchensack wants to merge 9 commits into invoke-ai:main from
Conversation
Adds Baidu ERNIE-Image and ERNIE-Image-Turbo as a new `BaseModelType`, mirroring the FLUX.2 / Z-Image integration pattern. Both models share the `ErnieImageTransformer2DModel` architecture (3072 hidden, 24 layers, 24 heads) and an `AutoencoderKLFlux2` VAE; they differ only in default inference settings (50 steps + CFG 4.0 vs. 8 steps + CFG 1.0 for Turbo). Built on top of PR invoke-ai#8859 (transformers 5.1+) and additionally bumps diffusers 0.36.0 → 0.38.0, which is the first release containing `ErnieImagePipeline` and `ErnieImageTransformer2DModel`.

Backend

- `BaseModelType.ErnieImage`, `ModelType.PromptEnhancer`, two new `SubModelType`s (`pe`, `pe_tokenizer`) for the bundled prompt enhancer
- `Main_Diffusers_ErnieImage_Config` + `Main_Checkpoint_ErnieImage_Config` with state-dict-based detection (`x_embedder` + `text_proj` + `adaLN_modulation`)
- Diffusers loader registered for ERNIE-Image; uses upstream subdir conventions, loads `transformer` / `vae` / `text_encoder` / `tokenizer` plus optional `pe` / `pe_tokenizer`
- New `invokeai/backend/ernie_image/` with sampling utilities (2×2 patchify, BN normalize/denormalize, sigma schedule, padded text packing) and a rectified-flow denoise loop supporting Euler/Heun/LCM
- Five invocations: model_loader, text_encoder (with prompt-enhancer toggle), denoise, vae_encode, vae_decode
- `ErnieImageConditioningInfo` + ConditioningField/Output + pickle allowlist
- `ERNIE_IMAGE_SCHEDULER_MAP` reusing the FlowMatch* scheduler classes
- New generation modes `ernie_image_{txt2img,img2img,inpaint,outpaint}`
- Starter models for baidu/ERNIE-Image and baidu/ERNIE-Image-Turbo + `STARTER_BUNDLES` entry

Frontend

- Regenerated `services/api/schema.ts` to expose the new node types
- Type unions extended (`ImageOutput` / `LatentToImage` / `ImageToLatents` / `DenoiseLatents` / `MainModelLoaderNodes`)
- `ParamsState` gains `ernieImageScheduler` + `ernieImageUsePromptEnhancer`, with reducers, selectors, and `selectIsErnieImage`
- `buildErnieImageGraph` (txt2img/img2img/inpaint/outpaint) wired into `useEnqueueCanvas`
- ERNIE entries added to `MODEL_BASE_TO_{COLOR,LONG_NAME,SHORT_NAME}` and `prompt_enhancer` to `MODEL_TYPE_TO_LONG_NAME`
- `ParamErnieImageScheduler` and `ParamErnieImagePromptEnhancer` rendered conditionally in `GenerationSettingsAccordion`
- All `add{TextTo,ImageTo,Inpaint,Outpaint}Image` type guards extended to accept `ernie_image_denoise`; `isMainModelWithoutUnet` ditto

Diffusers 0.38 fix-out

- `hotfixes.py`: import `LoRACompatibleConv` directly; the lazy-module `__getattr__` no longer exposes `diffusers.models.lora` as an attribute

Verification

- `pytest tests/ -m "not slow"`: 619 passed, 0 failed
- `pnpm lint:tsc` / `lint:eslint` / `lint:prettier`: clean
- `pnpm test:no-watch`: 563 passed, 0 failed
- Manual smoketest pending: requires baidu/ERNIE-Image weights and a GPU (8B parameters; CPU not practical)

Out of scope (follow-up phases)

- ControlNet, IP-Adapter, LoRA support for ERNIE-Image
- Single-file checkpoint loading (defensive scaffolding only)
- Metadata recall handlers in the gallery side panel
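As background for the rectified-flow denoise loop listed in the backend changes, the Euler variant can be illustrated with a minimal sketch. The function name and signature here are hypothetical illustrations, not InvokeAI's actual API; the loop only shows the core update rule that a velocity-predicting model implies.

```python
def euler_denoise(predict_velocity, latent, sigmas, cfg_scale=1.0,
                  pos_cond=None, neg_cond=None):
    """Minimal rectified-flow Euler loop (illustrative sketch).

    The model predicts a velocity v at each sigma, and each Euler step
    moves the latent along v by (sigma_next - sigma). `predict_velocity`
    is a stand-in for the transformer forward pass.
    """
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = predict_velocity(latent, sigma, pos_cond)
        if cfg_scale != 1.0:
            # Classifier-free guidance: blend conditional and
            # unconditional velocity predictions.
            v_neg = predict_velocity(latent, sigma, neg_cond)
            v = v_neg + cfg_scale * (v - v_neg)
        latent = latent + (sigma_next - sigma) * v
    return latent
```

Note that with the Turbo default of CFG 1.0 the unconditional pass is skipped entirely, so each of its 8 steps costs a single transformer forward.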
…nd UI cleanup

- Pass timesteps in [0, num_train_timesteps] to the transformer instead of [0, 1]; the diffusers `Timesteps` embedding expects the unnormalised range, which produced mosaic-pattern garbage instead of an image.
- Unpatchify predicted-x0 before the denoise step callback and route through `sd_step_callback` so the canvas shows a live preview during sampling (uses FLUX.2's RGB factors, since it shares the same `AutoencoderKLFlux2` / 32 latent channels).
- Add the ernie-image case to `useEnqueueGenerate` (Generate tab); it was only wired up in `useEnqueueCanvas`, so plain text-to-image failed with "No graph builders for base ernie-image".
- Move the Prompt Enhancer toggle from the Generation accordion into a dedicated ERNIE-Image block in the Advanced accordion; hide the rest of the SD-style advanced controls (CLIP skip, CFG rescale, seamless, color comp., separate VAE) since none apply to ERNIE-Image.
- Detect ERNIE-Image-Turbo by name in `MainModelDefaultSettings.from_base` so installs (starter or manual) get steps=8, cfg_scale=1.0 instead of the standard 50/4.0.
- Pin compel to `Cstannahill/compel5@chore/transformers5-diffusers-smoke` for transformers>=5 compatibility (PR damian0815/compel#129).
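The first bullet (the timestep-range fix) can be sketched in a few lines. The helper name and the schedule length of 1000 are assumptions for illustration; the real value would come from the model's scheduler config.

```python
NUM_TRAIN_TIMESTEPS = 1000  # assumed schedule length for illustration

def sigmas_to_timesteps(sigmas):
    """Scale [0, 1] sigmas into the [0, NUM_TRAIN_TIMESTEPS] range.

    Sinusoidal timestep embeddings are tuned for the unnormalised range;
    feeding raw [0, 1] values keeps the embedding in a tiny, nearly
    constant region of its input space, which is consistent with the
    garbage output described above.
    """
    return [s * NUM_TRAIN_TIMESTEPS for s in sigmas]
```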
Summary
Adds Baidu ERNIE-Image and ERNIE-Image-Turbo (HF, HF Turbo) as a new `BaseModelType`, mirroring the FLUX.2 Klein and Z-Image integration patterns. The two checkpoints share the same `ErnieImageTransformer2DModel` architecture (3072 hidden, 24 layers, 24 heads, Mistral3 text encoder, `AutoencoderKLFlux2` VAE) and only differ in inference defaults (50 steps + CFG 4.0 vs. 8 steps + CFG 1.0 for Turbo), so they live under one `BaseModelType.ErnieImage` without a variant enum. The optional 3B Mistral3-based prompt enhancer that ships with the pipeline is wired through with a UI toggle.

This PR builds on top of #8859 (transformers 5.1+) and additionally bumps diffusers 0.36.0 → 0.38.0, which is the first release containing `ErnieImagePipeline` and `ErnieImageTransformer2DModel`.
Backend changes

- `BaseModelType.ErnieImage`, `ModelType.PromptEnhancer`, plus two new `SubModelType`s (`pe`, `pe_tokenizer`) for the bundled prompt enhancer.
- `Main_Diffusers_ErnieImage_Config` and `Main_Checkpoint_ErnieImage_Config` with state-dict-based detection (`x_embedder` + `text_proj` + `adaLN_modulation`).
- Diffusers loader registered for ERNIE-Image; loads `transformer` / `vae` / `text_encoder` / `tokenizer` plus optional `pe` / `pe_tokenizer` from a single pipeline directory.
- New `invokeai/backend/ernie_image/` with sampling utilities (2×2 patchify, BN normalize/denormalize, sigma schedule, padded text packing) and a rectified-flow denoise loop supporting Euler / Heun / LCM (reusing the FlowMatch* schedulers from FLUX).
- Five invocations: `ernie_image_model_loader`, `ernie_image_text_encoder` (with prompt-enhancer toggle), `ernie_image_denoise`, `ernie_image_vae_encode`, `ernie_image_vae_decode`.
- `ErnieImageConditioningInfo` + conditioning field/output, including pickle allowlist for the disk serializer.
- New generation modes `ernie_image_{txt2img,img2img,inpaint,outpaint}`.
- Starter models for baidu/ERNIE-Image and baidu/ERNIE-Image-Turbo, plus a `STARTER_BUNDLES` entry.
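The state-dict-based detection in the second bullet keys off architecture-specific tensor names. A minimal sketch of what such a check might look like; the helper name and matching logic here are hypothetical, not the actual config-detection code.

```python
def looks_like_ernie_image(state_dict):
    """Hypothetical sketch of key-based checkpoint detection.

    Classify a state dict as ERNIE-Image when it contains all three
    architecture-specific key groups named in this PR. Keys may be
    top-level (e.g. "x_embedder.weight") or nested inside blocks
    (e.g. "blocks.0.adaLN_modulation.1.weight").
    """
    required = ("x_embedder", "text_proj", "adaLN_modulation")
    keys = list(state_dict)
    return all(
        any(k.startswith(name) or ("." + name) in k for k in keys)
        for name in required
    )
```

Detection by key presence (rather than loading weights) keeps classification cheap during model install, which is the usual reason Model Manager probes work this way.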
Frontend changes

- `services/api/schema.ts` regenerated to expose the new node types.
- Type unions extended (`ImageOutput`, `LatentToImage`, `ImageToLatents`, `DenoiseLatents`, `MainModelLoaderNodes`).
- `ParamsState` gains `ernieImageScheduler` and `ernieImageUsePromptEnhancer`, with reducers, selectors, and `selectIsErnieImage`.
- `buildErnieImageGraph` (txt2img / img2img / inpaint / outpaint) wired into `useEnqueueCanvas`.
- ERNIE entries added to `MODEL_BASE_TO_{COLOR,LONG_NAME,SHORT_NAME}`; `prompt_enhancer` added to `MODEL_TYPE_TO_LONG_NAME`.
- `ParamErnieImageScheduler` and `ParamErnieImagePromptEnhancer` rendered conditionally in `GenerationSettingsAccordion`.
- `add{TextTo,ImageTo,Inpaint,Outpaint}Image` type guards extended to accept `ernie_image_denoise`; `isMainModelWithoutUnet` likewise.
Diffusers 0.38 fix-out

- `hotfixes.py`: import `LoRACompatibleConv` directly. The lazy module loader in 0.38 no longer exposes `diffusers.models.lora` as an attribute, so the legacy patch path crashes on import.
- `pyproject.toml` declares `prerelease = "allow"` under `[tool.uv]`, with a comment explaining that diffusers 0.38.0 itself hard-pins `safetensors>=0.8.0-rc.0`. We can drop this again once a diffusers patch release ships with a stable safetensors floor.
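The `hotfixes.py` change above works because Python's import machinery resolves real submodules directly, regardless of whether the parent package's lazy `__getattr__` exposes them as attributes. A generic sketch of the pattern; the helper name is made up for illustration.

```python
import importlib

def import_from_submodule(module_path, name):
    """Resolve `name` via an explicit submodule import.

    Chained attribute access (pkg.sub.name) depends on the parent
    package exposing `sub` as an attribute, which a lazy __getattr__
    may not do; importlib.import_module bypasses that entirely.
    """
    return getattr(importlib.import_module(module_path), name)

# e.g., assuming diffusers is installed:
# LoRACompatibleConv = import_from_submodule("diffusers.models.lora",
#                                            "LoRACompatibleConv")
```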
QA Instructions
Automated
- `pytest tests/ -m "not slow"` — passes (619 / 619, the same `tests/model_identification` LFS-skip as on `main`).
- `pnpm lint:tsc`, `pnpm lint:eslint`, `pnpm lint:prettier` — all clean.
- `pnpm test:no-watch` — 563 / 563 frontend tests pass.

Manual smoketest of pre-existing models (regression for the diffusers / transformers bump)
Pick at least two of these and run a single txt2img each. Confirm no crashes and visually-plausible output:
Plus the relevant items from #8859's test plan: SD 1.5 prompt-weighted generation (compel path), FLUX text-to-image (T5 tokenizer path), HF model install via repo ID, NSFW checker first-time download.
Manual smoketest of ERNIE-Image (requires a GPU — 8B parameters)
- Install baidu/ERNIE-Image-Turbo from the new starter bundle. The Model Manager should classify it as `BaseModelType.ErnieImage`.
- Generate with the prompt enhancer enabled and a short prompt (e.g. "a fox"); the enhancer log line in the backend should show a rewritten, longer prompt before encoding.
- Repeat with baidu/ERNIE-Image (50 steps, CFG 4.0).

Merge Plan
- This PR builds on top of #8859 (the `transformers>=5.1.0` override). I have no preference on merge order; happy to rebase whenever Update to transformers 5.1.0 #8859 lands.
- The `prerelease = "allow"` line in `pyproject.toml` is a temporary measure tied to diffusers 0.38.0's `safetensors>=0.8.0-rc.0` upstream pin. Worth revisiting (and removing) once a diffusers patch release relaxes that requirement.

Checklist
- `paramsSlice`.
- The `docs-old/contributing/NEW_MODEL_INTEGRATION.md` checklist describes the integration shape this PR follows.
- What's New copy (if doing a release after this PR)

Out of scope (planned follow-ups)
- ControlNet, IP-Adapter, and LoRA support for ERNIE-Image.
- Single-file checkpoint loading (`Main_Checkpoint_ErnieImage_Config` is in place as defensive scaffolding, but the loader currently raises `NotImplementedError` for that format).
- Metadata recall handlers in the gallery side panel.