
Feat:(model) qwen image vae checkpoint#9108

Open
Pfannkuchensack wants to merge 6 commits into invoke-ai:main from Pfannkuchensack:feat/qwen-image-vae-checkpoint

Conversation

Collaborator

@Pfannkuchensack Pfannkuchensack commented May 1, 2026

Summary

Adds standalone model support for Qwen Image so users no longer need the full ~40 GB Diffusers pipeline. A GGUF transformer can now be combined with a standalone VAE checkpoint and a standalone Qwen2.5-VL encoder (Diffusers folder or ComfyUI single-file fp8), with the Component Source (Diffusers) field becoming a fallback rather than a hard requirement. All standalone components are also exposed as installable starter models, so a fully working GGUF setup can be installed in one click.

Why: The Qwen Image PR (#9000) only allowed loading the VAE and text encoder from the full Diffusers pipeline. That meant ~40 GB on disk just to use a tiny VAE (~250 MB) plus the encoder (~16 GB), and re-downloading both for every model variant. The smallest fully-standalone setup with this PR drops to ~12 GB (GGUF transformer + ~250 MB VAE + ~7 GB ComfyUI fp8 encoder).

How:

Backend

  • VAE checkpoint: new VAE_Checkpoint_QwenImage_Config detects single-file Qwen Image VAEs via 5D conv weights + z_dim=16 and loads them via AutoencoderKLQwenImage (init_empty_weights + load_state_dict). The generic VAE checkpoint matcher now explicitly excludes Qwen Image VAEs so they aren't misclassified as FLUX.
  • Qwen2.5-VL encoder (Diffusers folder): new ModelType.QwenVLEncoder + ModelFormat.QwenVLEncoder with QwenVLEncoder_Diffusers_Config recognising directories that contain text_encoder/ (with Qwen2_5_VLForConditionalGeneration / Qwen2VLForConditionalGeneration) + tokenizer/. The new QwenVLEncoderLoader handles Tokenizer and TextEncoder submodel loading from the folder layout.
  • Qwen2.5-VL encoder (ComfyUI single-file): new QwenVLEncoder_Checkpoint_Config matches consolidated single-file checkpoints (e.g. qwen_2.5_vl_7b_fp8_scaled.safetensors) by detecting both LM keys (model.embed_tokens / model.layers.*) and visual tower keys (visual.patch_embed.* / visual.blocks.*). The new QwenVLEncoderCheckpointLoader loads the safetensors, dequantises ComfyUI fp8 weights via weight * weight_scale (with block-wise expansion, mirroring the Z-Image Qwen3 loader), strips comfy_quant / weight_scale / scaled_fp8 metadata, fetches the architecture config from Qwen/Qwen2.5-VL-7B-Instruct (offline-cache fallback), and instantiates Qwen2_5_VLForConditionalGeneration via init_empty_weights + assign load. Tokenizer comes from the same HF repo with offline fallback.
  • Text encoder invocation: qwen_image_text_encoder.py now branches on whether model_root is a file. Single-file checkpoints get tokenizer + image processor from HuggingFace (Qwen/Qwen2.5-VL-7B-Instruct, ~10 MB, cached); the existing folder layout path is unchanged. BnB-quantised loading falls back to the cached encoder for single-file checkpoints since BnB can't load from a bare safetensors and the file is already FP8.
  • Loader invocation: QwenImageModelLoaderInvocation gains optional vae_model and qwen_vl_encoder_model fields. Resolution priority for each component: standalone override → main model (if Diffusers) → Component Source. Bumped to v1.2.0.
  • Starter models: three new starter entries — Qwen Image VAE (single-file checkpoint, ~250 MB), Qwen2.5-VL Encoder (fp8 scaled) (ComfyUI single-file, ~7 GB), and Qwen2.5-VL Encoder (Diffusers) (multi-folder HF download text_encoder+tokenizer+processor, ~16 GB). All 8 GGUF main starters (Q2_K / Q4_K_M / Q6_K / Q8_0 for both Edit and txt2img) declare the VAE + fp8 encoder as dependencies, so installing any of them auto-installs a complete generation-ready setup. The Qwen Image starter bundle gets the VAE and fp8 encoder prepended too.
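The VAE-detection heuristic from the first bullet can be sketched as follows. This is an illustrative sketch only, not the PR's actual config code: the state-dict key names (`encoder.conv_in.weight`, `decoder.conv_in.weight`) are assumptions, and numpy stands in for torch tensors.

```python
import numpy as np


def looks_like_qwen_image_vae(state_dict) -> bool:
    """Heuristic: Qwen Image VAEs use 3D (video-style) convolutions, so their
    conv weights are 5-dimensional, and their latent space has z_dim == 16."""
    conv = state_dict.get("encoder.conv_in.weight")
    if conv is None or conv.ndim != 5:
        # FLUX / SD VAEs use 2D convs, so their conv weights are 4D;
        # this is why the generic matcher can safely exclude Qwen Image.
        return False
    dec = state_dict.get("decoder.conv_in.weight")
    # The decoder's input-channel count equals z_dim.
    return dec is not None and dec.shape[1] == 16
```

A 4D conv weight (as in a FLUX VAE checkpoint) fails the first check, so such files keep matching the generic VAE config.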

Frontend

  • New params state, selectors and reducers for qwenImageVaeModel and qwenImageQwenVLEncoderModel, plus a migration entry.
  • Combobox UI added alongside the existing Component Source picker; the single settings component now renders all three pickers: VAE, Qwen2.5-VL Encoder, and Component Source.
  • New useQwenImageVAEModels / useQwenVLEncoderModels hooks, isQwenImageVAEModelConfig / isQwenVLEncoderModelConfig type guards, and Model Manager category + format badge entries.
  • Graph builder passes the new fields to the loader node.
  • Readiness check for GGUF Qwen Image now allows either a standalone source or a Component Source for the VAE and encoder independently.
  • schema.ts patched manually for the new ModelType / ModelFormat values, the QwenVLEncoder_Diffusers_Config and QwenVLEncoder_Checkpoint_Config schemas, the new loader fields, and the AnyModelConfig union.

Related Issues / Discussions

Follow-up to #9000 (Qwen Image full pipeline support). Closes the standalone-component gap that was called out for users with limited disk space.

QA Instructions

Quickest verification (recommended):
Install one of the GGUF starter models (e.g. Qwen Image Edit 2511 (Q4_K_M)) from the starter list. The VAE and fp8 encoder should be auto-installed as dependencies, and the model should generate without any further configuration.

Setup options for manual testing:

  1. VAE only: install Qwen Image VAE from the starter list (or download vae/diffusion_pytorch_model.safetensors from a Qwen Image HF repo manually, ~250 MB). Verify it's identified as a Qwen Image VAE checkpoint.
  2. Encoder folder: install Qwen2.5-VL Encoder (Diffusers) from the starter list (or download text_encoder/ + tokenizer/ (+ optionally processor/) from Qwen/Qwen-Image-Edit-2511 manually). Verify it's identified as qwen_vl_encoder / qwen_vl_encoder.
  3. Encoder single-file: install Qwen2.5-VL Encoder (fp8 scaled) from the starter list (or qwen_2.5_vl_7b_fp8_scaled.safetensors directly, ~7 GB). Verify it's identified as qwen_vl_encoder / checkpoint. First generation will fetch the tokenizer + processor configs from Qwen/Qwen2.5-VL-7B-Instruct (~10 MB) and cache them.
  4. Full standalone setup: GGUF transformer + standalone VAE + standalone Qwen2.5-VL encoder (folder or single-file), with no Component Source set — should generate successfully.

Cases to verify on the Qwen Image generation tab:

  • Diffusers main model, no overrides → all submodels come from main (existing behaviour).
  • Diffusers main model + standalone VAE → VAE override is used, encoder still from main.
  • GGUF main + Component Source only → unchanged behaviour, still works.
  • GGUF main + standalone VAE + Component Source → VAE from standalone, encoder from Component Source.
  • GGUF main + standalone VAE + standalone Encoder (folder), no Component Source → both come from the standalone models, no error.
  • GGUF main + standalone VAE + ComfyUI single-file Encoder, no Component Source → generates after first-time tokenizer/processor download.
  • GGUF main with neither standalone Encoder nor Component Source → readiness check blocks generation with a clear reason.
  • Quantized encoder (int8 / nf4) still works against a standalone encoder folder. Single-file Encoder + int8 / nf4 falls back to the cached non-BnB path (still works, no error).

Starter model checks:

  • Starter list shows three new entries under model components: Qwen Image VAE, Qwen2.5-VL Encoder (fp8 scaled), Qwen2.5-VL Encoder (Diffusers).
  • Installing any GGUF Qwen Image starter (Q2_K / Q4_K_M / Q6_K / Q8_0, Edit or txt2img) also auto-installs the VAE and fp8 encoder.

Automated checks:

  • pytest tests/app/invocations/test_qwen_image_model_loader.py tests/backend/model_manager/configs/ — 16 passed.
  • pytest -k "qwen_image" (excluding unrelated PIL get_flattened_data test) — 53 passed.
  • Frontend: pnpm lint:tsc / pnpm lint:eslint / pnpm lint:prettier / pnpm lint:knip all green.

Merge Plan

Standard merge.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • ❗Changes to a redux slice have a corresponding migration
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

…pport

Add standalone model types so Qwen Image can be run without downloading the
full ~40 GB Diffusers pipeline. The VAE and Qwen2.5-VL encoder can now each
come from their own model, with the Component Source (Diffusers) acting as a
fallback for any submodel not provided separately.
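The per-submodel fallback described above (standalone override → main model if Diffusers → Component Source) can be sketched like this. Names here are illustrative, not the loader's actual identifiers:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResolvedSource:
    component: str  # e.g. "vae" or "text_encoder"
    source: str     # which model the component was resolved from


def resolve_component(
    component: str,
    standalone: Optional[str],
    main_model: Optional[str],
    main_is_diffusers: bool,
    component_source: Optional[str],
) -> ResolvedSource:
    """Resolution priority: standalone override -> main (if Diffusers) -> Component Source."""
    if standalone is not None:
        return ResolvedSource(component, standalone)
    if main_is_diffusers and main_model is not None:
        return ResolvedSource(component, main_model)
    if component_source is not None:
        return ResolvedSource(component, component_source)
    # Mirrors the frontend readiness check: with no source at all, block.
    raise ValueError(f"No source available for {component!r}")
```

Each component resolves independently, which is what allows the mixed setups in the QA matrix (e.g. standalone VAE plus Component Source encoder).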
Add a checkpoint loader for ComfyUI-style consolidated Qwen2.5-VL encoder
files (e.g. qwen_2.5_vl_7b_fp8_scaled.safetensors), which bundle the language
model and visual tower into one safetensors with FP8 + per-tensor weight_scale
quantization. This drops the standalone encoder footprint from ~16 GB
(Diffusers folder, FP16) to ~7 GB.
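The `weight * weight_scale` dequantization with block-wise expansion can be sketched as below. This is a minimal sketch under stated assumptions, not the loader's actual code: numpy stands in for torch, and the block layout (each scale entry covering a contiguous run of output rows) is assumed.

```python
import numpy as np


def dequantize_fp8_scaled(weight, weight_scale):
    """Dequantize an fp8-scaled tensor: expand the scale to the weight's
    leading dimension (block-wise if needed), then multiply."""
    w = weight.astype(np.float32)
    s = np.asarray(weight_scale, dtype=np.float32)
    if s.size == 1:
        # Per-tensor scale.
        return w * s.reshape(())
    # Block-wise scale: each entry covers a contiguous block of output rows,
    # e.g. weight (4096, 4096) with scale (32,) -> blocks of 128 rows.
    rows_per_block = w.shape[0] // s.shape[0]
    expanded = np.repeat(s.reshape(s.shape[0]), rows_per_block)
    return w * expanded.reshape(-1, *([1] * (w.ndim - 1)))
```

Since this is a single elementwise multiply per tensor at load time, the "essentially free" cost claim below for preferring the fp8 starter holds.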
Add three new starter models so users can install a complete GGUF Qwen Image
setup in one click without ever touching the full ~40 GB Diffusers pipeline:

- "Qwen Image VAE" — single-file VAE checkpoint pulled from the Qwen-Image
  repo (~250 MB).
- "Qwen2.5-VL Encoder (fp8 scaled)" — ComfyUI single-file FP8 encoder
  (~7 GB).
- "Qwen2.5-VL Encoder (Diffusers)" — full-precision encoder via multi-folder
  HF download (text_encoder+tokenizer+processor, ~16 GB).

The 8 GGUF main starters (Q2_K / Q4_K_M / Q6_K / Q8_0 for both Edit and
txt2img) now declare the VAE + fp8 encoder as dependencies, so installing
any of them automatically pulls in everything needed to generate. The
fp8 encoder is preferred as the default dependency since it's smaller and
the on-the-fly dequantization is essentially free at runtime.

The Qwen Image starter bundle gets the VAE and fp8 encoder prepended so
the bundled Lightning LoRA variants also benefit.
@github-actions github-actions bot added the python, invocations, backend and frontend labels May 1, 2026
@lstein lstein self-assigned this May 5, 2026
@lstein lstein added the v6.13.x label May 5, 2026
@lstein lstein moved this to 6.13.x Theme: MODELS in Invoke - Community Roadmap May 5, 2026
@Pfannkuchensack Pfannkuchensack marked this pull request as ready for review May 5, 2026 17:11
