sd: support for CLIP and VAE on different devices #2184

Open
wbruna wants to merge 4 commits into LostRuins:concedo_experimental from wbruna:kcpp_sd_multi_device_backend

Conversation

@wbruna wbruna commented May 3, 2026

Support for placing CLIP or the VAE on a separate device (e.g. diffusion on Vulkan0, VAE on Vulkan1). It also allows keeping the diffusion model itself on the CPU.

The first two commits adapt the C++ code: the interface receives device numbers instead of booleans, with -1 for "main device" and -2 for "CPU", and the backend includes a global config to choose which model gets which device. The last commit changes the sdclipgpu and sdvaecpu boolean parameters to accept "CPU", "main", or a device number.
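As a rough sketch of the device-number convention described above (the names here are hypothetical, not the PR's actual interface):

```cpp
#include <cassert>
#include <string>

// Illustrative mapping of the PR's device codes to backend names:
// -1 follows the main model's device, -2 (and below) forces the CPU,
// and a non-negative code picks an explicit device index.
std::string device_code_to_name(int code, const std::string& main_device) {
    if (code == -1) return main_device;   // "main device"
    if (code <= -2) return "CPU";         // forced CPU placement
    return "GPU" + std::to_string(code);  // explicit device, zero-based
}
```

So a config of `-1` for CLIP and `1` for the VAE would place CLIP wherever the diffusion model lives and the VAE on the second device.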

Tested on Vulkan with my GPU and iGPU. Seems to work fine with command-line and config settings; however, I wasn't able to fully test the launcher, because there doesn't seem to be a way to select a discrete GPU and an iGPU through it (so I likely got its 1-based indexes wrong).

Edit: rebased on top of #2213 for the upstream multi-device support.

@wbruna wbruna force-pushed the kcpp_sd_multi_device_backend branch 2 times, most recently from 1517922 to 197cc2f Compare May 4, 2026 10:23
@wbruna wbruna marked this pull request as ready for review May 4, 2026 10:24
@LostRuins
Owner

Merged your other PR, so now this conflicts.

@wbruna wbruna force-pushed the kcpp_sd_multi_device_backend branch from 197cc2f to ed81427 Compare May 7, 2026 15:37
wbruna commented May 7, 2026

Fixed.

@wbruna wbruna force-pushed the kcpp_sd_multi_device_backend branch from ed81427 to bd5330a Compare May 9, 2026 12:08
@LostRuins
Owner

So I looked through this PR, and it seems like quite a lot of changes and complexity for something that doesn't seem too useful, in my opinion.

Especially with the ability to already use offload_cpu (runtime load/unload for each component), it doesn't really seem too useful compared to simply using the same GPU for all image gen components. Is there something I'm missing?

Also it does modify a bunch of extra upstream code too.

wbruna commented May 11, 2026

So I looked through this PR, and it seems like quite a lot of changes and complexity for something that doesn't seem too useful, in my opinion.

Especially with the ability to already use offload_cpu (runtime load/unload for each component), it doesn't really seem too useful compared to simply using the same GPU for all image gen components. Is there something I'm missing?

offload_cpu doesn't help situations that benefit from a second GPU for the same gen: a weaker card with more memory (like an iGPU) could, e.g., run a video VAE that wouldn't fit on the main GPU. Also, its cost isn't trivial: it pins a lot of extra system RAM and introduces extra latency for every generation.

We (and upstream) do get requests for this functionality from time to time.

Also it does modify a bunch of extra upstream code too.

Kind of? It's mostly the unavoidable device initialization for each component. I expect that code to change upstream when multi-device support gets implemented, but in that case dropping our changes would be simple enough.

Edit: rebased on top of #2204 to avoid conflicts.

@wbruna wbruna force-pushed the kcpp_sd_multi_device_backend branch from bd5330a to 49ca0ac Compare May 12, 2026 22:00
@LostRuins
Owner

Merged #2204.

@wbruna wbruna force-pushed the kcpp_sd_multi_device_backend branch 2 times, most recently from c900e1c to 93caa6d Compare May 15, 2026 21:21
@LostRuins
Owner

This PR is still behaving very weirdly for me. I tried to add a patch to show more info, but you didn't enable push permissions on this PR branch, so here's the patch:

From e4975d84c666d250e755c6a9d42910cf972b1076 Mon Sep 17 00:00:00 2001
From: Concedo <39025047+LostRuins@users.noreply.github.com>
Date: Sat, 16 May 2026 11:07:10 +0800
Subject: [PATCH] add a bit of debug info, this PR is still behaving very
 strangely

---
 otherarch/sdcpp/sdtype_adapter.cpp | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/otherarch/sdcpp/sdtype_adapter.cpp b/otherarch/sdcpp/sdtype_adapter.cpp
index c1d16fc84..aaf37f1fc 100644
--- a/otherarch/sdcpp/sdtype_adapter.cpp
+++ b/otherarch/sdcpp/sdtype_adapter.cpp
@@ -35,6 +35,7 @@ using namespace kcpp_sd;
 #include "stb_image_resize.h"
 
 #include "avi_writer.h"
+#include "util.h"
 
 struct LoraMap {
     std::vector<std::pair<std::string, float>> items;
@@ -426,8 +427,10 @@ bool sdtype_load_model(const sd_load_model_inputs inputs) {
     params.vae_conv_direct = sd_params->vae_conv_direct;
     params.chroma_use_dit_mask = sd_params->chroma_use_dit_mask;
     params.offload_params_to_cpu = inputs.offload_cpu;
-    //params.keep_vae_on_cpu = (inputs.kcpp_vae_device <= -2);
-    //params.keep_clip_on_cpu = (inputs.kcpp_clip_device <= -2);
+    std::string vae_dev = kcpp_sd_get_device_name("vae");
+    std::string clip_dev = kcpp_sd_get_device_name("clip");
+    // params.keep_vae_on_cpu =  (vae_dev=="CPU");
+    // params.keep_clip_on_cpu = (clip_dev=="CPU");
     params.lora_apply_mode = (lora_apply_mode_t)lora_apply_mode;
 
     // also switches flash attn for the vae and conditioner
@@ -440,6 +443,8 @@ bool sdtype_load_model(const sd_load_model_inputs inputs) {
 
     if(inputs.debugmode==1)
     {
+        printf("\nkcpp_vae_device: %d = %s",inputs.kcpp_vae_device, vae_dev.c_str());
+        printf("\nkcpp_clip_device: %d = %s\n",inputs.kcpp_clip_device, clip_dev.c_str());
         char* buf = sd_ctx_params_to_str(&params);
         if(buf)
         {
-- 
2.30.2.windows.1


Anyway, as I mentioned in discord:

right now keep_clip_on_cpu is incorrectly set (and displayed), but it seems to break when I set it correctly (see params.keep_clip_on_cpu = (clip_dev=="CPU");). It's almost as if bool clip_on_cpu = sd_ctx_params->keep_clip_on_cpu; must be false for the PR to work correctly.

Then there's this line in particular, which doesn't feel right:

 auto backend = std::shared_ptr<struct ggml_backend>(init_named_backend(device_name), ggml_backend_free);

Simply querying a backend force-initializes it, so it takes up double the VRAM. But without it, the VAE seems to be forced onto the GPU for some reason, even if I attempt to select CPU explicitly.

From the Python side, the defaults are also wrong. Loading any existing config somehow sets both options to "1" instead:

[screenshot]

The expected behavior would need changes to:

  • match previous defaults (CLIP on CPU, VAE on the main GPU)
  • port the old config if applicable (inside convert_invalid_args: if found, detect and port the old flags to the new ones)

@wbruna wbruna marked this pull request as draft May 16, 2026 10:19
@wbruna wbruna force-pushed the kcpp_sd_multi_device_backend branch from 93caa6d to a8eb544 Compare May 16, 2026 10:19
wbruna commented May 16, 2026

Just rebased for now, to deal with the conflict with the mmap flag.

The params.keep_*_on_cpu flags are basically ignored when setting the backend through kcpp_get_backend. Setting them on sdtype_adapter would simply force the backend to "CPU" regardless of the string; but since both settings come from the same place, that's redundant (the only effect should be the original sd.cpp log line). So I've removed the assignment to make it easier to test, because other backends wouldn't have this special case.

There is a related bug if the boolean flags are set, though 😕 And I agree the display should be adjusted for that; will fix it.

auto backend = std::shared_ptr<struct ggml_backend>(init_named_backend(device_name), ggml_backend_free);

This is not a query: kcpp_get_backend may be badly named, but it's supposed to initialize it: "give me the (possibly already initialized) backend that matches this device name".
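A minimal sketch of that "get or init" contract, with made-up names standing in for the real kcpp backend map:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Stand-in for a real ggml backend handle (illustrative only).
struct FakeBackend { std::string device; };

// Cache of backends keyed by device name, mirroring the idea that
// the first request for a name initializes the backend and later
// requests return the same cached instance.
std::map<std::string, std::shared_ptr<FakeBackend>> g_backends;

std::shared_ptr<FakeBackend> get_backend_sketch(const std::string& device_name) {
    auto it = g_backends.find(device_name);
    if (it != g_backends.end())
        return it->second;  // already initialized: reuse it
    auto backend = std::make_shared<FakeBackend>(FakeBackend{device_name});
    g_backends[device_name] = backend;  // first use: initialize and cache
    return backend;
}
```

Under this reading, the double-allocation above would come from two different keys ending up in the map for what should be one backend.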

@LostRuins
Owner

I see. But yeah, it was glitching out and double-allocating when I set params.keep_clip_on_cpu = (clip_dev=="CPU");

wbruna commented May 16, 2026

I see. But yeah, it was glitching out and double-allocating when I set params.keep_clip_on_cpu = (clip_dev=="CPU");

Indeed: it would pose as a separate module and get its own entry on the backend map. 🤦

And I think I understand why it looks like it is initializing everything prematurely:

Setting clip device to -2
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 7600 XT (RADV NAVI33) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
ggml_vulkan: 1 = AMD Radeon Vega 11 Graphics (RADV RAVEN) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
Setting vae device to 2

kcpp_sd::config_device validates the device number before it gets stored. To do that, it needs the number of available devices, and that path calls the device initialization code, printing the GPU info.

I could either store the config as-is first and validate it later, or make the device scan explicit at a previous point.
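The "store first, validate later" option could look roughly like this (hypothetical names; the real config lives in kcpp_sd):

```cpp
#include <cassert>
#include <stdexcept>

// Sketch: store the raw device number at config time so the
// device-enumeration path (which prints the GPU info) is not
// touched early; validate only when the backend is resolved.
struct SdDeviceConfig {
    int requested = -1;  // as given: -1 main, -2 CPU, 0+ explicit

    // Negative codes (main device / CPU) always pass; a non-negative
    // code must be a valid index into the enumerated devices.
    int resolve(int num_devices) const {
        if (requested >= num_devices)
            throw std::out_of_range("invalid device number");
        return requested;
    }
};
```

Deferring validation this way keeps the "Found N devices" log where it belongs, at the point the backend is actually needed.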

wbruna commented May 16, 2026

Oh, one thing I forgot to ask you about: GPU numbering. Env vars, device names, and (I think) our own command line use zero-based counting, but the launcher shows them as one-based. I guess we avoid zero-based counting on the launcher because it doesn't feel natural for normal people; but maybe showing them as names instead ("CUDA0", "GPU0", etc.) would avoid that issue while still keeping everything consistent?
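The name-based labeling could be sketched like this (illustrative only; the backend prefix and function are assumptions, not the launcher's actual code):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build display labels from the backend name plus the zero-based
// device index, so what the launcher shows matches env vars and
// device names exactly ("Vulkan0", "CUDA1", ...).
std::vector<std::string> device_labels(const std::string& backend, int count) {
    std::vector<std::string> labels;
    for (int i = 0; i < count; ++i)
        labels.push_back(backend + std::to_string(i));  // zero-based, like env vars
    return labels;
}
```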

wbruna commented May 16, 2026

GUI-wise, what do you think about organizing the device selection like this?

[screenshot]

(asking before trying to fight with control offsets 🙂)

wbruna commented May 16, 2026

I believe the config selection is fixed; but since leejet just pushed multi-device support, I'll focus on getting that ready, then adapt this PR for it.

@LostRuins
Owner

GUI-wise, what do you think about organizing the device selection like this?

Main concern is space: the window is very tight, so I try to maximize it as much as possible. How does it look with that?

@wbruna wbruna force-pushed the kcpp_sd_multi_device_backend branch from 0e70800 to c93423a Compare May 16, 2026 21:11
@wbruna wbruna marked this pull request as ready for review May 16, 2026 21:12
wbruna commented May 16, 2026

Adapted to the upstream support.

wbruna commented May 16, 2026

GUI-wise, what do you think about organizing the device selection like this?

Main concern is space: the window is very tight, so I try to maximize it as much as possible. How does it look with that?

Since they wouldn't take much horizontal space, one approach could be shrinking the model filename fields a bit:

[screenshot]

and maybe replace "Browse" with "📂".

Or... since the filename fields refer to the same modules, we could make a full grid of module × (filenames, backend) and drop the backend labels:

[screenshot]

(the mock looks awful, but hopefully the idea is clear 🙂)

LostRuins commented May 17, 2026

Hmm, if they enable TAE SD, then the VAE backend selector will disappear?
Or if they load a model that has a different number of files (e.g. SDXL is all-in-one, Flux needs both clip1 and clip2, but z-image only needs one, etc.)?

That could probably be confusing.

@LostRuins
Owner

Should I still consider merging this, or is it superseded by #2213?

wbruna commented May 17, 2026

Should I still consider merging this, or is it superseded by #2213?

#2213 just syncs with upstream, and by itself it keeps the current "*_on_cpu" support. This one provides the multi-device selection.

wbruna commented May 17, 2026

Hmm, if they enable TAE SD, then the VAE backend selector will disappear?

I think the VAE backend is used for the TAE model, too. TAE is a VAE, just a lightweight one.

Or if they load a model that has a different number of files (e.g. SDXL is all-in-one, Flux needs both clip1 and clip2, but z-image only needs one, etc.)?

If we had a lot of space, or better controls, I'd add a "Text encoder" box grouping the backend selector and all the text models; a VAE one with backend, models, threshold; etc.

@wbruna wbruna force-pushed the kcpp_sd_multi_device_backend branch from c93423a to 8b998f9 Compare May 17, 2026 03:44