Feat:(model) qwen image vae checkpoint #9108
Open
Pfannkuchensack wants to merge 6 commits into invoke-ai:main
Conversation
…pport Add standalone model types so Qwen Image can be run without downloading the full ~40 GB Diffusers pipeline. The VAE and Qwen2.5-VL encoder can now each come from their own model, with the Component Source (Diffusers) acting as a fallback for any submodel not provided separately.
Add a checkpoint loader for ComfyUI-style consolidated Qwen2.5-VL encoder files (e.g. qwen_2.5_vl_7b_fp8_scaled.safetensors), which bundle the language model and visual tower into one safetensors file with FP8 + per-tensor weight_scale quantization. This drops the standalone encoder footprint from ~16 GB (Diffusers folder, FP16) to ~7 GB.
Add three new starter models so users can install a complete GGUF Qwen Image setup in one click without ever touching the full ~40 GB Diffusers pipeline:

- "Qwen Image VAE" — single-file VAE checkpoint pulled from the Qwen-Image repo (~250 MB).
- "Qwen2.5-VL Encoder (fp8 scaled)" — ComfyUI single-file FP8 encoder (~7 GB).
- "Qwen2.5-VL Encoder (Diffusers)" — full-precision encoder via multi-folder HF download (text_encoder + tokenizer + processor, ~16 GB).

The 8 GGUF main starters (Q2_K / Q4_K_M / Q6_K / Q8_0 for both Edit and txt2img) now declare the VAE + fp8 encoder as dependencies, so installing any of them automatically pulls in everything needed to generate. The fp8 encoder is preferred as the default dependency since it's smaller and the on-the-fly dequantization is essentially free at runtime. The Qwen Image starter bundle gets the VAE and fp8 encoder prepended so the bundled Lightning LoRA variants also benefit.
Summary
Adds standalone model support for Qwen Image so users no longer need the full ~40 GB Diffusers pipeline. A GGUF transformer can now be combined with a standalone VAE checkpoint, a standalone Qwen2.5-VL encoder (Diffusers folder or ComfyUI single-file fp8), and the Component Source (Diffusers) field becomes a fallback rather than a hard requirement. All standalone components are also exposed as installable starter models, so a fully working GGUF setup can be installed in one click.
Why: The Qwen Image PR (#9000) only allowed loading the VAE and text encoder from the full Diffusers pipeline. That meant ~40 GB on disk just to use a tiny VAE (~250 MB) plus the encoder (~16 GB), and re-downloading both for every model variant. The smallest fully-standalone setup with this PR drops to ~12 GB (GGUF transformer + ~250 MB VAE + ~7 GB ComfyUI fp8 encoder).
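The fallback behaviour for the Component Source field can be sketched as follows. This is a minimal illustration of the resolution order (standalone override → main model, if Diffusers → Component Source); all names here are hypothetical stand-ins, not the PR's actual code, which resolves InvokeAI model config objects rather than strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QwenImageMainModel:
    # Hypothetical stand-in: the real invocation inspects the main model's
    # format (full Diffusers pipeline vs. GGUF single file).
    is_diffusers: bool


def resolve_component(
    override: Optional[str],
    main_model: QwenImageMainModel,
    main_model_id: str,
    component_source: Optional[str],
) -> str:
    """Resolution priority per component (VAE, Qwen2.5-VL encoder):
    standalone override -> main model (if Diffusers) -> Component Source."""
    if override is not None:
        return override
    if main_model.is_diffusers:
        return main_model_id
    if component_source is not None:
        return component_source
    raise ValueError("No source available for this submodel")
```

With a GGUF main model, only an explicit override or the Component Source can supply the submodel, which is why the source becomes a fallback rather than a hard requirement.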
How:
Backend
- `VAE_Checkpoint_QwenImage_Config` detects single-file Qwen Image VAEs via 5D conv weights + `z_dim=16` and loads them via `AutoencoderKLQwenImage` (`init_empty_weights` + `load_state_dict`). The generic VAE checkpoint matcher now explicitly excludes Qwen Image VAEs so they aren't misclassified as FLUX.
- `ModelType.QwenVLEncoder` + `ModelFormat.QwenVLEncoder` with `QwenVLEncoder_Diffusers_Config` recognising directories that contain `text_encoder/` (with `Qwen2_5_VLForConditionalGeneration`/`Qwen2VLForConditionalGeneration`) + `tokenizer/`. The new `QwenVLEncoderLoader` handles `Tokenizer` and `TextEncoder` submodel loading from the folder layout.
- `QwenVLEncoder_Checkpoint_Config` matches consolidated single-file checkpoints (e.g. `qwen_2.5_vl_7b_fp8_scaled.safetensors`) by detecting both LM keys (`model.embed_tokens`/`model.layers.*`) and visual tower keys (`visual.patch_embed.*`/`visual.blocks.*`). The new `QwenVLEncoderCheckpointLoader` loads the safetensors, dequantises ComfyUI fp8 weights via `weight * weight_scale` (with block-wise expansion, mirroring the Z-Image Qwen3 loader), strips `comfy_quant`/`weight_scale`/`scaled_fp8` metadata, fetches the architecture config from `Qwen/Qwen2.5-VL-7B-Instruct` (offline-cache fallback), and instantiates `Qwen2_5_VLForConditionalGeneration` via `init_empty_weights` + assign load. The tokenizer comes from the same HF repo with offline fallback.
- `qwen_image_text_encoder.py` now branches on whether `model_root` is a file. Single-file checkpoints get tokenizer + image processor from HuggingFace (`Qwen/Qwen2.5-VL-7B-Instruct`, ~10 MB, cached); the existing folder layout path is unchanged. BnB-quantised loading falls back to the cached encoder for single-file checkpoints since BnB can't load from a bare safetensors and the file is already FP8.
- `QwenImageModelLoaderInvocation` gains optional `vae_model` and `qwen_vl_encoder_model` fields. Resolution priority for each component: standalone override → main model (if Diffusers) → Component Source.
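The fp8 dequantisation step described above (`weight * weight_scale` with block-wise expansion) can be sketched roughly like this. This is an illustrative reimplementation under the assumption that a block-wise `weight_scale` has one entry per block along each dimension; the actual loader's tensor handling may differ.

```python
import torch


def dequantize_fp8_weight(weight: torch.Tensor, weight_scale: torch.Tensor) -> torch.Tensor:
    """Dequantise a ComfyUI-style quantised tensor: weight * weight_scale.

    weight_scale is either a per-tensor scalar or a block-wise tensor whose
    size along each dimension divides the weight's size; in the block-wise
    case each scale entry is expanded over its block before multiplying.
    (Real checkpoints store the weight in an fp8 dtype; here we just upcast.)
    """
    w = weight.to(torch.float16)
    if weight_scale.numel() == 1:
        return w * weight_scale.to(torch.float16)
    scale = weight_scale.to(torch.float16)
    # Block-wise expansion: repeat each scale entry over its block so the
    # scale tensor matches the weight shape element-wise.
    for dim in range(w.ndim):
        repeats = w.shape[dim] // scale.shape[dim]
        scale = scale.repeat_interleave(repeats, dim=dim)
    return w * scale
```

After dequantisation, the `comfy_quant`/`weight_scale`/`scaled_fp8` bookkeeping entries are no longer needed, which is why the loader strips them before building the model's state dict.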
Bumped to v1.2.0. New starter models: `Qwen Image VAE` (single-file checkpoint, ~250 MB), `Qwen2.5-VL Encoder (fp8 scaled)` (ComfyUI single-file, ~7 GB), and `Qwen2.5-VL Encoder (Diffusers)` (multi-folder HF download, `text_encoder` + `tokenizer` + `processor`, ~16 GB). All 8 GGUF main starters (Q2_K / Q4_K_M / Q6_K / Q8_0 for both Edit and txt2img) declare the VAE + fp8 encoder as `dependencies`, so installing any of them auto-installs a complete generation-ready setup. The Qwen Image starter bundle gets the VAE and fp8 encoder prepended too.

Frontend
- `qwenImageVaeModel` and `qwenImageQwenVLEncoderModel`, plus a migration entry.
- `useQwenImageVAEModels`/`useQwenVLEncoderModels` hooks, `isQwenImageVAEModelConfig`/`isQwenVLEncoderModelConfig` type guards, and Model Manager category + format badge entries.
- `schema.ts` patched manually for the new `ModelType`/`ModelFormat` values, the `QwenVLEncoder_Diffusers_Config` and `QwenVLEncoder_Checkpoint_Config` schemas, the new loader fields, and the `AnyModelConfig` union.

Related Issues / Discussions
Follow-up to #9000 (Qwen Image full pipeline support). Closes the standalone-component gap that was called out for users with limited disk space.
QA Instructions
Quickest verification (recommended):
Install one of the GGUF starter models (e.g. `Qwen Image Edit 2511 (Q4_K_M)`) from the starter list. The VAE and fp8 encoder should be auto-installed as dependencies, and the model should generate without any further configuration.

Setup options for manual testing:
- `Qwen Image VAE` from the starter list (or download `vae/diffusion_pytorch_model.safetensors` from a Qwen Image HF repo manually, ~250 MB). Verify it's identified as a Qwen Image VAE checkpoint.
- `Qwen2.5-VL Encoder (Diffusers)` from the starter list (or download `text_encoder/` + `tokenizer/` (+ optionally `processor/`) from `Qwen/Qwen-Image-Edit-2511` manually). Verify it's identified as `qwen_vl_encoder`/`qwen_vl_encoder`.
- `Qwen2.5-VL Encoder (fp8 scaled)` from the starter list (or `qwen_2.5_vl_7b_fp8_scaled.safetensors` directly, ~7 GB). Verify it's identified as `qwen_vl_encoder`/`checkpoint`. First generation will fetch the tokenizer + processor configs from `Qwen/Qwen2.5-VL-7B-Instruct` (~10 MB) and cache them.

Cases to verify on the Qwen Image generation tab:
- BnB-quantised loading (`int8`/`nf4`) still works against a standalone encoder folder. Single-file encoder + `int8`/`nf4` falls back to the cached non-BnB path (still works, no error).

Starter model checks:
- `Qwen Image VAE`, `Qwen2.5-VL Encoder (fp8 scaled)`, `Qwen2.5-VL Encoder (Diffusers)`.

Automated checks:
- `pytest tests/app/invocations/test_qwen_image_model_loader.py tests/backend/model_manager/configs/` — 16 passed.
- `pytest -k "qwen_image"` (excluding unrelated PIL `get_flattened_data` test) — 53 passed.
- `pnpm lint:tsc` / `pnpm lint:eslint` / `pnpm lint:prettier` / `pnpm lint:knip` all green.

Merge Plan
Standard merge.
Checklist
What's New copy (if doing a release after this PR)