feat(vae): support running VAEs on CPU via cpu_only setting #9293

Open
Pfannkuchensack wants to merge 2 commits into invoke-ai:main from Pfannkuchensack:feat/running_vae_on_cpu

@Pfannkuchensack Pfannkuchensack commented Jun 17, 2026

Collaborator

Summary

Extends the cpu_only mechanism from #8777 (text encoders) to VAE decode. Adds a cpu_only field to all standalone VAE configs; the loader already forces standalone configs with cpu_only=True onto the CPU. The latents-to-image decode invocations now move latents to the VAE's effective device instead of always using the globally chosen torch device, and the SD/SDXL path falls back to fp32 on CPU (fp16 conv is unsupported there). Adds a "Run on CPU" toggle to the VAE model settings panel and regenerates the API schema.

Decode-only for now; encode and main-model VAE submodels are unchanged.

Backend

  • cpu_only field added to all standalone VAE configs (SD1/SD2/SDXL/FLUX checkpoint, SD1/SDXL diffusers, FLUX.2/Qwen-Image/Anima, FLUX.2 diffusers).
  • No loader change needed — _get_execution_device already returns cpu for any standalone config with cpu_only=True.
  • All 8 latents-to-image invocations (l2i, FLUX, FLUX.2, SD3, CogView4, Qwen-Image, Z-Image, Anima) now send latents to get_effective_device(vae) rather than TorchDevice.choose_torch_device().
  • SD/SDXL decode is forced to fp32 when the VAE is on CPU (fp16 conv is not implemented on CPU).
  • Cache invalidation on toggle is already covered by the existing _LOAD_AFFECTING_SETTINGS (cpu_only is in it).
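
The device and dtype decisions described in the bullets above can be sketched roughly as follows. This is an illustration only, not the code in this PR: devices and dtypes are modelled as plain strings so the snippet runs without PyTorch, and VAEConfigSketch, resolve_execution_device, choose_decode_dtype and plan_decode are hypothetical names. Only the cpu_only field and the fp32-on-CPU fallback come from the description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class VAEConfigSketch:
    """Hypothetical stand-in for a standalone VAE config.

    Only the cpu_only field is taken from the PR description.
    """

    name: str
    cpu_only: Optional[bool] = None


def resolve_execution_device(config: VAEConfigSketch, default_device: str) -> str:
    """Return "cpu" when the config opts in to CPU execution, else the default."""
    return "cpu" if config.cpu_only else default_device


def choose_decode_dtype(device: str, requested_dtype: str) -> str:
    """Fall back to float32 on CPU, where the PR reports fp16 conv is unsupported."""
    if device == "cpu" and requested_dtype == "float16":
        return "float32"
    return requested_dtype


def plan_decode(
    config: VAEConfigSketch,
    default_device: str = "cuda",
    requested_dtype: str = "float16",
) -> Tuple[str, str]:
    """Pick the device the latents should be moved to and the dtype to decode in."""
    device = resolve_execution_device(config, default_device)
    dtype = choose_decode_dtype(device, requested_dtype)
    return device, dtype
```

In this sketch the latents follow the device the VAE actually runs on, so a CPU-only VAE never receives tensors that live on a different device.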

Frontend

  • New VAEModelSettings panel + useVAEModelSettings hook (mirrors the encoder panel, reuses the shared form-data type).
  • ModelView renders the panel for type === 'vae'.

Related Issues / Discussions

Closes #7276 (VAE part — the CLIP/text-encoder part was delivered in #8777)

QA Instructions

  1. In Model Manager, select a standalone VAE model → the settings panel now shows a Run on CPU toggle. Enable it and save.
  2. Run a generation that decodes with that VAE (e.g. SDXL → Latents to Image). Confirm:
    • The image decodes correctly (no device-mismatch / "Half not implemented on CPU" errors).
    • During decode, VRAM usage stays flat for the VAE (it loads into system RAM, not VRAM) — visible as reduced peak VRAM on low-memory GPUs.
  3. Toggle the setting off, save, regenerate → decode runs on GPU again (no restart needed; the cached entry is evicted on the settings change).
  4. Repeat the smoke test across architectures: SD1/SDXL (l2i), FLUX, FLUX.2, SD3, CogView4, Qwen-Image, Z-Image, Anima.
  5. Expected trade-off: CPU decode is noticeably slower (esp. SDXL at high resolution, now fp32) — this is intentional (VRAM savings vs. speed).
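
The eviction behaviour that step 3 relies on might look something like the sketch below. It assumes a settings update is compared key by key against a set of load-affecting names. ModelCacheSketch, changed_load_affecting and apply_settings_update are hypothetical, and the real _LOAD_AFFECTING_SETTINGS may contain more entries than the single cpu_only name confirmed in this description.

```python
from typing import Any, Dict, Mapping, Set

# Assumption: only cpu_only is confirmed by the PR text; the real set may be larger.
LOAD_AFFECTING_SETTINGS = frozenset({"cpu_only"})


def changed_load_affecting(old: Mapping[str, Any], new: Mapping[str, Any]) -> Set[str]:
    """Names of load-affecting settings whose values differ between old and new."""
    return {name for name in LOAD_AFFECTING_SETTINGS if old.get(name) != new.get(name)}


class ModelCacheSketch:
    """Hypothetical cache mapping a model key to the device it was loaded on."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def put(self, model_key: str, device: str) -> None:
        self._entries[model_key] = device

    def evict(self, model_key: str) -> bool:
        """Drop the entry if present; report whether anything was removed."""
        return self._entries.pop(model_key, None) is not None

    def __contains__(self, model_key: object) -> bool:
        return model_key in self._entries


def apply_settings_update(
    cache: ModelCacheSketch,
    model_key: str,
    old: Mapping[str, Any],
    new: Mapping[str, Any],
) -> bool:
    """Evict the cached model only when a load-affecting setting changed."""
    if changed_load_affecting(old, new):
        return cache.evict(model_key)
    return False
```

Under these assumptions, toggling cpu_only evicts the cached VAE so the next decode reloads it on the new device, while edits to unrelated settings leave the cache untouched.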

Tested on SD1, SDXL, FLUX.1, FLUX.2 klein 9B, Anima, and Z-Image. Still untested: SD3, CogView4, Qwen-Image, and FLUX.2 klein 4B (should work). Anima is really slow on CPU; everything else is acceptable.

Automated:

  • pytest tests/backend/model_manager/load/test_load_default_cpu_only.py tests/app/routers/test_update_model_record_cache_invalidation.py
  • Frontend: pnpm lint + pnpm test:no-watch (all green).

Merge Plan

Standard merge. No DB schema or redux migration.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • ❗Changes to a redux slice have a corresponding migration
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

github-actions bot added the python, invocations, backend, frontend, and python-tests labels Jun 17, 2026
Pfannkuchensack marked this pull request as ready for review June 19, 2026 01:02

Development

Successfully merging this pull request may close these issues.

[enhancement]: Force clip & vae to cpu toggle button