vulkan: change graph_compute to be async and enable get_tensor_async #17158

jeffbolznv · 2025-11-10T23:53:57Z

This allows some additional CPU/GPU overlap for large prompt processing (pp) workloads. It also seems to help a bit for token generation, possibly by getting rid of a small bubble between graph_compute and get_tensor.

Async set and copy functions seem to be very rarely used, so I didn't enable them because I didn't have a good way to test them.

The async commands need to be ordered against each other, so they are all submitted on the compute queue. The non-async commands still use the transfer queue.

The fence for graph_compute/get_tensor_async is submitted and waited on in ggml_vk_synchronize.
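To illustrate the ordering argument, here is a minimal CPU-only sketch, not the actual Vulkan backend code. It assumes a single in-order worker thread can stand in for the compute queue, and that waiting for that worker to drain can stand in for the fence wait in ggml_vk_synchronize. `InOrderQueue`, `submit_async`, `synchronize`, and `compute_then_read` are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for one GPU queue: work runs strictly in submission order.
class InOrderQueue {
public:
    InOrderQueue() : worker_([this] { run(); }) {}

    ~InOrderQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_all();
        worker_.join(); // the worker drains any remaining work before exiting
    }

    // Returns immediately; the work executes later on the worker thread.
    void submit_async(std::function<void()> work) {
        assert(work && "submit_async needs a callable");
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push_back(std::move(work));
        }
        cv_.notify_all();
    }

    // Models the fence wait: block until everything submitted so far has finished.
    void synchronize() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return pending_.empty() && !busy_; });
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return stop_ || !pending_.empty(); });
            if (pending_.empty()) {
                return; // stop requested and nothing left to run
            }
            std::function<void()> work = std::move(pending_.front());
            pending_.pop_front();
            busy_ = true;
            lock.unlock();
            work(); // run outside the lock so the caller can keep submitting
            lock.lock();
            busy_ = false;
            cv_.notify_all();
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> pending_;
    bool busy_ = false;
    bool stop_ = false;
    std::thread worker_; // declared last so the other members exist before it starts
};

// Hypothetical "compute, then read back" sequence issued without blocking in between.
std::vector<float> compute_then_read(std::size_t n) {
    std::vector<float> device(n, 0.0f); // stands in for a device tensor
    std::vector<float> host(n, -1.0f);  // stands in for the host destination
    InOrderQueue compute_queue;

    // Plays the role of graph_compute: produce the tensor contents.
    compute_queue.submit_async([&device] {
        for (std::size_t i = 0; i < device.size(); ++i) {
            device[i] = 2.0f * static_cast<float>(i);
        }
    });

    // Plays the role of get_tensor_async: same queue, so it runs after the compute.
    compute_queue.submit_async([&device, &host] { host = device; });

    // The caller is free to do other CPU work here before waiting.
    compute_queue.synchronize();
    return host;
}
```

Because both submissions go through the same in-order queue, the read-back cannot observe the tensor before the compute step has written it, and the caller only blocks once, in `synchronize`. If the read-back were issued on a second, independent queue, some extra cross-queue synchronization would be needed to get the same guarantee.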

See #17033 (comment).


jeffbolznv requested a review from 0cc4m as a code owner November 10, 2025 23:53

DajanaV mentioned this pull request Nov 11, 2025

UPSTREAM PR #17158: vulkan: change graph_compute to be async and enable get_tensor_async auroralabs-loci/llama.cpp#164

Open

github-actions bot added the Vulkan (issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Nov 11, 2025

jeffbolznv added 3 commits November 10, 2025 20:24

fix thread safety errors

60bc85c

teardown context cleanly

924df57

jeffbolznv force-pushed the graph_compute_async branch from a66edc0 to 924df57 Compare November 11, 2025 03:19

Handle async read to non-pinned dst

3343b60
