
fix: remove kv padding from flash attention wrapper#1453

Open
leejet wants to merge 3 commits into master from remove-kv-pad-for-flash-attn

Conversation

@leejet (Owner) commented Apr 22, 2026

Most backends already handle non-256 KV lengths internally or fall back via backend support checks. Avoid generating synthetic padding masks, which can trigger incorrect Vulkan flash attention output for short prompt lengths.

Fix #1431.
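To make the removed workaround concrete, here is a small numeric illustration (in Python/NumPy, not the project's actual C++/ggml code; all names are hypothetical) of what a synthetic KV padding mask does: K and V are padded up to a multiple of 256 and the padded slots are masked with -inf, so softmax gives them zero weight. When the backend honors the mask the result is bit-for-bit the same as the unpadded computation, which is why the padding is only a workaround and why a backend that mishandles the mask can corrupt the output.

```python
# Illustration only: NOT the sd.cpp/ggml implementation.
import numpy as np

def attention(q, k, v, mask=None):
    # q: (Lq, d), k/v: (Lkv, d); plain scaled dot-product attention
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = scores + mask          # mask entries are 0 or -inf
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
Lq, Lkv, d = 4, 77, 8                   # 77 is not a multiple of 256
q = rng.standard_normal((Lq, d))
k = rng.standard_normal((Lkv, d))
v = rng.standard_normal((Lkv, d))

# Reference: no padding, no mask.
ref = attention(q, k, v)

# Workaround: pad the KV length up to the next multiple of 256 and
# mask out the synthetic rows so they get zero softmax weight.
pad = ((Lkv + 255) // 256) * 256 - Lkv
k_pad = np.vstack([k, np.zeros((pad, d))])
v_pad = np.vstack([v, np.zeros((pad, d))])
mask = np.zeros((Lq, Lkv + pad))
mask[:, Lkv:] = -np.inf

out = attention(q, k_pad, v_pad, mask)
assert np.allclose(ref, out[:, :])      # identical when the mask is honored
```

If a backend already handles non-multiple-of-256 KV lengths internally, the padded path adds no value and only introduces an extra mask code path that can go wrong.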

@daniandtheweb (Contributor) commented Apr 23, 2026

I've just tested the changes and this still doesn't fix the issue. The issue is not only happening on short prompts but on any prompt length when using flash attention on Vulkan with the Ernie and Anima models.

@leejet (Owner, Author) commented Apr 23, 2026

I’ve tried to fix it, and it’s working properly on my device now. @daniandtheweb Could you pull the latest commit and give it another try? Also, don’t forget to sync the ggml submodule.

git submodule sync --recursive
git submodule update --init --recursive --force

@leejet (Owner, Author) commented Apr 23, 2026

.\bin\Release\sd-cli.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\ernie-image-UD-Q4_K_M.gguf --vae ..\..\ComfyUI\models\vae\flux2_ae.safetensors  --llm ..\..\ComfyUI\models\text_encoders\ministral-3-3b.safetensors -p "a lovely cat" --cfg-scale 5.0 -v --offload-to-cpu --diffusion-fa

before ggml update: [output image]

after ggml update: [output image]

@daniandtheweb (Contributor) commented Apr 23, 2026

I've done a clean build using this branch and the issue is still there. My current prompt is taken from Civitai:

./sd-cli -M img_gen -p "year 2023, year 2024, year 2025, highres,masterpiece, best quality, score_7, score_8, score_9, @miclot, safe, a group of five anime girls and a small dog posing for a selfie in a snowy landscape, the girl in the foreground has long pink hair and purple eyes, wearing a teal beanie with white stripes and a white puffer jacket, smiling widely and making a peace sign with her left hand, the girl to the left has short brown hair and glasses, wearing a maroon beanie and a light blue jacket, raising her right hand in a peace sign, the girl in the center has black hair and brown eyes, wearing a black beanie and a dark jacket, looking at the camera, the girl to the right has short purple hair and purple eyes, wearing a striped scarf and a green jacket, looking at the camera, the girl in the back has blonde hair and green eyes, wearing a white beanie and a yellow jacket, making a peace sign with her left hand, the dog has brown and white fur, sticking its tongue out, the background features a clear blue sky and snow-covered mountains, the scene is bright and sunny with natural lighting, the colors are vibrant with a mix of cool and warm tones, the composition is a close-up shot with the characters filling most of the frame, the focus is on the group's happy expressions and the snowy environment. 
<lora:anima-turbo-lora-v0.1:1>" -n "worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, sepia,watermark, mosaic censoring, bar censor," --sampling-method euler --steps 8 -W 1024 -H 1024 -b 1 --cfg-scale 1 -s 1327417454 --clip-skip -1 --embd-dir /home/daniandtheweb/Workspace/sd.cpp-webui/models/embeddings/ --lora-model-dir /home/daniandtheweb/Workspace/sd.cpp-webui/models/loras/ -t 0 --rng cuda --sampler-rng cuda --lora-apply-mode auto -o /home/daniandtheweb/Workspace/sd.cpp-webui/outputs/txt2img/1765151592.png --diffusion-model /home/daniandtheweb/Workspace/sd.cpp-webui/models/unet/anima-preview3-base.safetensors --vae /home/daniandtheweb/Workspace/sd.cpp-webui/models/vae/qwen_image_vae.safetensors --llm /home/daniandtheweb/Workspace/sd.cpp-webui/models/text_encoders/qwen_3_06b_base.safetensors --scheduler simple --vae-tile-overlap 0.5 --vae-tile-size 32x32 --preview proj --preview-path /home/daniandtheweb/Workspace/sd.cpp-webui/outputs/txt2img/1765151592_preview.png --preview-interval 1 --vae-tiling --fa --vae-conv-direct --mmap --color

Here's the progression of the preview, in case it can help solving the issue:

[Preview images at steps 1 through 8]

Without flash attention the resulting image comes out just fine: [output image]

The same issue still remains on Ernie on my end.

This has been tested on Linux on a Radeon RX 7800 XT with both the official Mesa drivers and the git ones. I also tried disabling cooperative matrix and integer dot acceleration for Vulkan, but the result is the same: with flash attention the generation breaks down.

@leejet (Owner, Author) commented Apr 23, 2026

Does the simplest txt2img pipeline—like the one below—also cause issues on your side?

.\bin\Release\sd-cli.exe --diffusion-model  ..\..\ComfyUI\models\diffusion_models\anima-preview.safetensors --vae ..\..\ComfyUI\models\vae\qwen_image_vae.safetensors  --llm ..\..\ComfyUI\models\text_encoders\qwen_3_06b_base.safetensors  -p "a lovely cat holding a sign says 'anima.cpp'" --cfg-scale 6.0 --sampling-method euler -v --offload-to-cpu

Successfully merging this pull request may close these issues:

[Bug] Vulkan Flash Attention not working