
sd: support for CLIP and VAE on different devices #2184

Open: wbruna wants to merge 5 commits into LostRuins:concedo_experimental from wbruna:kcpp_sd_multi_device_backend

@wbruna
@wbruna wbruna commented May 3, 2026

Support for placing CLIP or VAE on separate devices (e.g. diffusion on Vulkan0, VAE on Vulkan1). It also enables keeping the diffusion model itself on CPU.

The first two commits adapt the C++ code: the interface receives device numbers instead of booleans, with -1 for "main device" and -2 for "CPU", and the backend includes a global config to choose which model gets which device. The last commit changes the sdclipgpu and sdvaecpu boolean parameters to accept "CPU", "main" or a device number.
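The mapping described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in this PR: the helper name `parse_device_spec`, the case-insensitive matching, and the assumption that device numbers are 0-based (as in Vulkan0, Vulkan1) are all hypothetical. Only the sentinel values -1 and -2 come from the description.

```python
# Sentinel values taken from the PR description; everything else is illustrative.
MAIN_DEVICE = -1  # "main device": reuse whatever device the diffusion model runs on
CPU_DEVICE = -2   # place the component on the CPU


def parse_device_spec(value: str) -> int:
    """Map "CPU", "main" or a device number to the integer passed to the backend.

    Hypothetical helper: the name, the case-insensitive matching and the
    error handling are assumptions, not the PR's actual implementation.
    """
    text = value.strip().lower()
    if text == "cpu":
        return CPU_DEVICE
    if text == "main":
        return MAIN_DEVICE
    try:
        index = int(text)
    except ValueError:
        raise ValueError(
            f"expected 'CPU', 'main' or a device number, got {value!r}"
        ) from None
    if index < 0:
        # Negative numbers are reserved for the sentinels above.
        raise ValueError(f"device number must be non-negative, got {index}")
    return index
```

Under these assumptions, `parse_device_spec("main")` gives -1, `parse_device_spec("CPU")` gives -2, and `parse_device_spec("1")` gives 1 (e.g. a VAE placed on the second device). Rejecting negative numbers keeps user input from colliding with the two sentinels.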

Tested on Vulkan with my GPU and iGPU. Seems to work fine with command-line and config settings; however, I wasn't able to fully test the launcher, because there doesn't seem to be a way to select a discrete GPU and an iGPU through it (so I likely got its 1-based indexes wrong).

@wbruna wbruna force-pushed the kcpp_sd_multi_device_backend branch 2 times, most recently from 1517922 to 197cc2f Compare May 4, 2026 10:23
@wbruna wbruna marked this pull request as ready for review May 4, 2026 10:24
@LostRuins
Owner

merged your other PR so now this conflicts

@wbruna wbruna force-pushed the kcpp_sd_multi_device_backend branch from 197cc2f to ed81427 Compare May 7, 2026 15:37
@wbruna
Author

wbruna commented May 7, 2026

Fixed.

@wbruna wbruna force-pushed the kcpp_sd_multi_device_backend branch from ed81427 to bd5330a Compare May 9, 2026 12:08
@LostRuins
Owner

So I looked through this PR and it does seem like quite a lot of changes + complexities for something that doesn't seem too useful in my opinion.

Especially with the ability to already use offload_cpu (runtime load/unload for each component), it doesn't really seem too useful compared to simply using the same GPU for all image gen components. Is there something I'm missing?

Also it does modify a bunch of extra upstream code too.

@wbruna
Author

wbruna commented May 11, 2026

> So I looked through this PR and it does seem like quite a lot of changes + complexities for something that doesn't seem too useful in my opinion.
>
> Especially with the ability to already use offload_cpu (runtime load/unload for each component), it doesn't really seem too useful compared to simply using the same GPU for all image gen components. Is there something I'm missing?

offload_cpu doesn't help in situations that benefit from a second GPU for the same gen: a weaker card with more memory (like an iGPU) could, e.g., run a video VAE which wouldn't fit on the main GPU. Also, its cost isn't trivial: it pins a lot of extra system RAM and introduces extra latency for all generations.

We (and upstream) do get requests for this functionality from time to time.

> Also it does modify a bunch of extra upstream code too.

Kind of? It's mostly the unavoidable device initialization for each component. I expect that code to change upstream when multi-device support gets implemented, but in that case dropping our changes would be simple enough.

Edit: rebased on top of #2204 to avoid conflicts.

@wbruna wbruna force-pushed the kcpp_sd_multi_device_backend branch from bd5330a to 49ca0ac Compare May 12, 2026 22:00
2 participants