Replies: 4 comments 4 replies
-
You can use … Also, loading is technically already sequential, which makes it slower than it could be when the models and text encoders are on different drives.
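A minimal sketch of the parallel-loading idea mentioned above. This is not sd-cli code (sd-cli is C++ and, as noted, loads sequentially); the file paths, and the assumption that they sit on separate drives, are purely illustrative.

```python
# Sketch: overlapping two file loads so reads from different drives can
# stream concurrently instead of one after the other.
from concurrent.futures import ThreadPoolExecutor


def load_file(path):
    # Plain blocking read; with threads, two of these can overlap because
    # the GIL is released during file I/O.
    with open(path, "rb") as f:
        return f.read()


def load_parallel(paths):
    # map() preserves input order, so results line up with the paths given.
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        return list(pool.map(load_file, paths))
```

The win only materializes when the files live on independent devices; on a single drive, concurrent reads mostly just interleave seeks.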
-
Some info from a 9B run on a 16 GB ThinkPad TP495 (Ryzen 3500U, Vega 8 iGPU) (!). In the early stage, sd-cli consumes a bit over 10 GB when it really only needs 6 GB to do the job. That does seem like the wrong approach for Flux 2 Klein. I have to say, it's friggin awesome to get flux-2-klein 4GB gens done in under 5 minutes, and 9B in around 20 minutes. The CPU is idle most of the time, meaning that if I can fix the unneeded memory consumption, I can leave sd-cli running in batch mode all day. That's fantastic! Thanks to leejet and all contributors!
-
This could be improved, but...
... not in this way, because we already do that: sd-cli unloads the conditioner weights before generation. The problem is the other way around: the text encoder runs with the diffusion weights already loaded into VRAM, and that is what typically causes the peak VRAM usage.
-
I super-appreciate all your contributions, wbruna, ty. Maybe we'll be able to defer loading of the diffusion weights until after the text encoder is finished. Would doing it in that order break other models? Right now, with just sd-cli, Xorg, and a browser running, I see: (output not shown). That's during this stage: (output not shown). Then once it gets to: (output not shown), the laptop drops to around 6.5 GB 'used'. Why zram is using 'swap' there, I don't know; things seem to get sticky when allocated into 'swap'. If we could avoid having the Qwen TE and Flux loaded at the same time, I could have a fully usable laptop while it does image generation in the background.
-
For Flux 2 generation on memory-constrained hardware:
Why load the text encoder (~4 GB) and the diffusion model (~4-8 GB) at the same time?
We don't need the TE loaded at all during generation. Just cache the embedding (preferably to disk, for fast re-gens), unload the text encoder, then load the Flux 2 model of your choice. It runs fine in 6 GB of VRAM.
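A minimal sketch of the disk-caching idea, assuming a generic `encode_fn` stands in for whatever loads and runs the text encoder; `cached_embedding` and the cache layout are hypothetical names, not sd-cli features.

```python
# Sketch: cache prompt embeddings on disk, keyed by a hash of the prompt,
# so re-gens of the same prompt never need the text encoder loaded.
import hashlib
import os
import pickle

CACHE_DIR = "embed_cache"  # assumed location, adjust as needed


def cached_embedding(prompt, encode_fn):
    os.makedirs(CACHE_DIR, exist_ok=True)
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    path = os.path.join(CACHE_DIR, key + ".pkl")
    if os.path.exists(path):
        # Cache hit: the text encoder never has to be loaded at all.
        with open(path, "rb") as f:
            return pickle.load(f)
    # Cache miss: run the encoder once, persist the result, and from here
    # on the TE could be unloaded before the diffusion weights come in.
    emb = encode_fn(prompt)
    with open(path, "wb") as f:
        pickle.dump(emb, f)
    return emb
```

The point of keying by prompt hash is exactly the "fast re-gens" case above: repeated runs with the same prompt skip the encoder entirely, so TE and diffusion weights never need to coexist in memory.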