feat(vae): support running VAEs on CPU via cpu_only setting #9293
Open
Pfannkuchensack wants to merge 2 commits into
Summary
Extends the `cpu_only` mechanism from #8777 (text encoders) to VAE decode. Adds a `cpu_only` field to all standalone VAE configs; the loader already forces standalone configs with `cpu_only=True` onto the CPU. The 7 decode invocations now move latents to the VAE's effective device instead of hard-coding CUDA, and the SD/SDXL path falls back to fp32 on CPU (fp16 conv is unsupported there). Adds a "Run on CPU" toggle to the VAE model settings panel and regenerates the API schema.

Decode-only for now; encode and main-model VAE submodels are unchanged.
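The device and dtype selection described above can be sketched as follows. This is a minimal, dependency-free illustration, not the PR's code: `StandaloneVAEConfigSketch`, `pick_execution_device`, `pick_decode_dtype`, and `plan_decode` are hypothetical names, and devices and dtypes are modelled as plain strings rather than `torch` objects. The real helpers (`_get_execution_device`, `get_effective_device`) are only named in this description, so their signatures are not assumed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandaloneVAEConfigSketch:
    """Hypothetical stand-in for a standalone VAE config record."""

    name: str
    cpu_only: bool = False


def pick_execution_device(config, default_device="cuda"):
    # A cpu_only config overrides whatever device would otherwise be chosen.
    return "cpu" if config.cpu_only else default_device


def pick_decode_dtype(device, preferred_dtype="float16"):
    # Mirrors the SD/SDXL fallback in the summary: use fp32 when decoding on CPU.
    return "float32" if device == "cpu" else preferred_dtype


def plan_decode(config, default_device="cuda"):
    # Latents would be moved to this device (and cast to this dtype) before decode.
    device = pick_execution_device(config, default_device)
    return {"device": device, "dtype": pick_decode_dtype(device)}
```

For example, `plan_decode(StandaloneVAEConfigSketch("sdxl-vae", cpu_only=True))` selects the CPU with `float32`, while the same config with `cpu_only=False` keeps the default device and the preferred half-precision dtype.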
Backend
- `cpu_only` field added to all standalone VAE configs (SD1/SD2/SDXL/FLUX checkpoint, SD1/SDXL diffusers, FLUX.2/Qwen-Image/Anima, FLUX.2 diffusers).
- `_get_execution_device` already returns `cpu` for any standalone config with `cpu_only=True`.
- Decode invocations (`l2i`, FLUX, FLUX.2, SD3, CogView4, Qwen-Image, Z-Image, Anima) now send latents to `get_effective_device(vae)` rather than `TorchDevice.choose_torch_device()`.
- `_LOAD_AFFECTING_SETTINGS` (`cpu_only` is in it).

Frontend
- `VAEModelSettings` panel + `useVAEModelSettings` hook (mirrors the encoder panel, reuses the shared form-data type).
- `ModelView` renders the panel for `type === 'vae'`.

Related Issues / Discussions
Closes #7276 (VAE part — the CLIP/text-encoder part was delivered in #8777)
QA Instructions
l2i), FLUX, FLUX.2, SD3, CogView4, Qwen-Image, Z-Image, Anima.

Tested on SD1, SDXL, FLUX.1, FLUX.2 klein 9B, Anima, and Z-Image. Still untested: SD3, CogView4, Qwen-Image, and FLUX.2 klein 4B (should work). Anima is really slow on CPU; everything else is OK.
Automated:
- `pytest tests/backend/model_manager/load/test_load_default_cpu_only.py tests/app/routers/test_update_model_record_cache_invalidation.py`
- `pnpm lint` + `pnpm test:no-watch` (all green).

Merge Plan
Standard merge. No DB schema or redux migration.
Checklist
What's New copy (if doing a release after this PR)