vulkan: change graph_compute to be async and enable get_tensor_async #17158
+96
−45
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This allows some additional CPU/GPU overlap for large pp workloads. Also seems to help a bit for token gen, maybe getting rid of a small bubble between graph_compute and get_tensor.
Async set and copy functions seem to be very rarely used, so I didn't enable them because I didn't have a good way to test them.
The async commands need to be ordered against each other, so put them all on the compute queue. The non-async commands still use the transfer queue.
The fence for graph_compute/get_tensor_async is submitted and waited on in ggml_vk_synchronize.
See #17033 (comment).