feat(vae): support running VAEs on CPU via cpu_only setting by Pfannkuchensack · Pull Request #9293 · invoke-ai/InvokeAI

Pfannkuchensack · 2026-06-17T21:48:08Z

Summary

Extends the cpu_only mechanism from #8777 (text encoders) to VAE decode. Adds a cpu_only field to all standalone VAE configs; the loader already forces standalone configs with cpu_only=True onto the CPU. The latents-to-image invocations now move latents to the VAE's effective device instead of the default torch device, and the SD/SDXL path falls back to fp32 on CPU (fp16 conv is unsupported there). Adds a "Run on CPU" toggle to the VAE model settings panel and regenerates the API schema.

Decode-only for now; encode and main-model VAE submodels are unchanged.

Backend

cpu_only field added to all standalone VAE configs (SD1/SD2/SDXL/FLUX checkpoint, SD1/SDXL diffusers, FLUX.2/Qwen-Image/Anima, FLUX.2 diffusers).
No loader change needed — _get_execution_device already returns cpu for any standalone config with cpu_only=True.
All latents-to-image invocations (l2i, FLUX, FLUX.2, SD3, CogView4, Qwen-Image, Z-Image, Anima) now send latents to get_effective_device(vae) rather than TorchDevice.choose_torch_device().
SD/SDXL decode is forced to fp32 when the VAE is on CPU (fp16 conv is not implemented on CPU).
Cache invalidation on toggle is already covered by the existing _LOAD_AFFECTING_SETTINGS (cpu_only is in it).
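
For reviewers, the device and dtype decisions listed above can be sketched without torch. This is a minimal illustration, not the code in this PR: StandaloneVAEConfig, pick_decode_device and pick_decode_dtype are hypothetical names, and devices and dtypes are plain strings so the snippet has no dependencies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandaloneVAEConfig:
    """Hypothetical stand-in for a standalone VAE config record."""

    name: str
    cpu_only: bool = False


def pick_decode_device(config: StandaloneVAEConfig, default_device: str = "cuda") -> str:
    """Return the device the latents should be moved to before decoding."""
    # cpu_only wins over whatever device would otherwise be chosen.
    return "cpu" if config.cpu_only else default_device


def pick_decode_dtype(device: str, requested_dtype: str = "float16") -> str:
    """Fall back to float32 when half precision is requested on CPU."""
    if device == "cpu" and requested_dtype == "float16":
        return "float32"
    return requested_dtype


cfg = StandaloneVAEConfig(name="sdxl-vae", cpu_only=True)
device = pick_decode_device(cfg)
dtype = pick_decode_dtype(device)
print(device, dtype)
```

The point of the sketch is the ordering: the device is resolved from the VAE's own settings first, and the dtype is then derived from that device, so a GPU VAE keeps fp16 and only the CPU path pays the fp32 cost.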

Frontend

New VAEModelSettings panel + useVAEModelSettings hook (mirrors the encoder panel, reuses the shared form-data type).
ModelView renders the panel for type === 'vae'.

Related Issues / Discussions

Closes #7276 (VAE part — the CLIP/text-encoder part was delivered in #8777)

QA Instructions

In Model Manager, select a standalone VAE model → the settings panel now shows a Run on CPU toggle. Enable it and save.
Run a generation that decodes with that VAE (e.g. SDXL → Latents to Image). Confirm:
- The image decodes correctly (no device-mismatch / "Half not implemented on CPU" errors).
- During decode, VRAM usage stays flat for the VAE (it loads into system RAM, not VRAM) — visible as reduced peak VRAM on low-memory GPUs.
Toggle the setting off, save, regenerate → decode runs on GPU again (no restart needed; the cached entry is evicted on the settings change).
Repeat the smoke test across architectures: SD1/SDXL (l2i), FLUX, FLUX.2, SD3, CogView4, Qwen-Image, Z-Image, Anima.
Expected trade-off: CPU decode is noticeably slower (esp. SDXL at high resolution, now fp32) — this is intentional (VRAM savings vs. speed).
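
The "no restart needed" step depends on the cached entry being evicted when a load-affecting setting changes. A rough sketch of that idea, assuming a toy cache: ToyModelCache, load_affecting_change and update_settings are hypothetical, and the set contents are assumed (the PR only states that cpu_only is in _LOAD_AFFECTING_SETTINGS).

```python
# Assumed contents: the PR only confirms that cpu_only is a member.
LOAD_AFFECTING_SETTINGS = frozenset({"cpu_only"})


def load_affecting_change(old: dict, new: dict) -> bool:
    """True if any setting that affects how the model is loaded has changed."""
    return any(old.get(key) != new.get(key) for key in LOAD_AFFECTING_SETTINGS)


class ToyModelCache:
    """Hypothetical in-memory cache keyed by model key."""

    def __init__(self) -> None:
        self._entries: dict[str, str] = {}

    def put(self, key: str, loaded_on: str) -> None:
        self._entries[key] = loaded_on

    def evict(self, key: str) -> None:
        self._entries.pop(key, None)

    def __contains__(self, key: str) -> bool:
        return key in self._entries


def update_settings(cache: ToyModelCache, key: str, old: dict, new: dict) -> dict:
    """Store new settings, evicting the cached model if it must be reloaded."""
    if load_affecting_change(old, new):
        cache.evict(key)
    return new


cache = ToyModelCache()
cache.put("sdxl-vae", loaded_on="cpu")
settings = update_settings(cache, "sdxl-vae", {"cpu_only": True}, {"cpu_only": False})
print("sdxl-vae" in cache)  # False: the next decode reloads the VAE on the GPU
```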

Tested on SD1, SDXL, FLUX.1, FLUX.2 klein 9B, Anima, and Z-Image. Still untested: SD3, CogView4, Qwen-Image, and FLUX.2 klein 4B (should work). Anima is very slow on CPU; everything else is acceptable.

Automated:

pytest tests/backend/model_manager/load/test_load_default_cpu_only.py tests/app/routers/test_update_model_record_cache_invalidation.py
Frontend: pnpm lint + pnpm test:no-watch (all green).

Merge Plan

Standard merge. No DB schema or redux migration.

Checklist

The PR has a short but descriptive title, suitable for a changelog
Tests added / updated (if applicable)
❗Changes to a redux slice have a corresponding migration
Documentation added / updated (if applicable)
Updated What's New copy (if doing a release after this PR)

github-actions bot added the python, invocations, backend, frontend, and python-tests labels Jun 17, 2026

Chore Openapi + Fix logging

3a7a575

Pfannkuchensack marked this pull request as ready for review June 19, 2026 01:02

Pfannkuchensack requested review from JPPhoto, blessedcoolant, dunkeroni and lstein as code owners June 19, 2026 01:02

Pfannkuchensack wants to merge 2 commits into invoke-ai:main from Pfannkuchensack:feat/running_vae_on_cpu
