Accelerated computations on Android Adreno 740 #17456
Elettrotecnica asked this question in Q&A (unanswered)
I am trying to run llama.cpp on a Pico 4 Ultra device, which comes with a Snapdragon XR2 Gen 2. Because I am on Android, I am using Termux as my Linux environment, which packages llama.cpp together with both the OpenCL and the Vulkan backends.
I have tried the Vulkan backend, but I believe I am hitting the problem described in #16881, i.e. the output is (very fast) gibberish.
OpenCL, on the other hand, does not even activate, because apparently the device does not support an extension llama.cpp needs.
This is the output of `llama-cli --list-devices --gpus`:
```
ggml_opencl: selected platform: 'clvk'
ggml_opencl: device: 'Turnip Adreno (TM) 740v3 (OpenCL 3.0 CLVK on Vulkan v1.4.328 driver 104869888)'
ggml_opencl: OpenCL driver: 3.0 CLVK on Vulkan v1.4.328 driver 104869888
ggml_opencl: vector subgroup broadcast support: false
ggml_opencl: device FP16 support: true
ggml_opencl: device does not support subgroups (cl_khr_subgroups or cl_intel_subgroups) (note that subgroups is an optional feature in OpenCL 3.0)
ggml_opencl: drop unsupported device.
ggml_opencl: device: 'llvmpipe (LLVM 21.1.5, 128 bits) (OpenCL 3.0 CLVK on Vulkan v1.4.328 driver 104869888)'
Unsupported GPU: llvmpipe (LLVM 21.1.5, 128 bits)
ggml_opencl: drop unsupported device.
load_backend: loaded OpenCL backend from /data/data/com.termux/files/usr/bin/../lib/libggml-opencl.so
load_backend: loaded CPU backend from /data/data/com.termux/files/usr/bin/../lib/libggml-cpu.so
Available devices:
```
And this is the output of `clinfo`:
clinfo.txt
It seems at least some subgroup operations are supported.
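For reference, a minimal probe along these lines should show what the device itself reports (this is just a sketch, not taken from the llama.cpp source; I am assuming the ggml-opencl check essentially looks for cl_khr_subgroups / cl_intel_subgroups in the extensions string, as the log message suggests):
```c
// subgroup_check.c -- standalone probe of what each OpenCL device reports.
// Build (assumed): cc subgroup_check.c -lOpenCL -o subgroup_check
#include <stdio.h>
#include <string.h>

#define CL_TARGET_OPENCL_VERSION 300
#include <CL/cl.h>

int main(void) {
    cl_platform_id platforms[8];
    cl_uint nplat = 0;
    if (clGetPlatformIDs(8, platforms, &nplat) != CL_SUCCESS || nplat == 0) {
        fprintf(stderr, "no OpenCL platforms found\n");
        return 1;
    }

    for (cl_uint p = 0; p < nplat; ++p) {
        cl_device_id devs[8];
        cl_uint ndev = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 8, devs, &ndev) != CL_SUCCESS) {
            continue;
        }
        for (cl_uint d = 0; d < ndev; ++d) {
            char name[256]  = {0};
            char ext[16384] = {0};
            clGetDeviceInfo(devs[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
            clGetDeviceInfo(devs[d], CL_DEVICE_EXTENSIONS, sizeof(ext), ext, NULL);

            // Core OpenCL 3.0 query: 0 means the device exposes no subgroup support at all.
            cl_uint max_subgroups = 0;
            clGetDeviceInfo(devs[d], CL_DEVICE_MAX_NUM_SUB_GROUPS,
                            sizeof(max_subgroups), &max_subgroups, NULL);

            printf("device: %s\n", name);
            printf("  cl_khr_subgroups:             %s\n", strstr(ext, "cl_khr_subgroups")   ? "yes" : "no");
            printf("  cl_intel_subgroups:           %s\n", strstr(ext, "cl_intel_subgroups") ? "yes" : "no");
            printf("  CL_DEVICE_MAX_NUM_SUB_GROUPS: %u\n", max_subgroups);
        }
    }
    return 0;
}
```
A non-zero CL_DEVICE_MAX_NUM_SUB_GROUPS with neither extension listed would at least be consistent with what clinfo seems to show.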
I have tried disabling the subgroup feature check in the code and recompiling, but the check seems to be right and something really is missing: the model fails to load because something is not computed correctly (I am not really an OpenCL expert...). I have also tried the compilation flag GGML_OPENCL_USE_ADRENO_KERNELS=OFF to see whether that would avoid certain kernel operations, but the result is the same; a sketch of that rebuild is below.
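For completeness, the rebuild with that flag looks roughly like this (-DGGML_OPENCL=ON is, as far as I can tell, the documented switch for the OpenCL backend; build directory and parallelism are just placeholders):
```
# Sketch of a from-source rebuild with the Adreno-specific kernels disabled.
cmake -B build \
    -DGGML_OPENCL=ON \
    -DGGML_OPENCL_USE_ADRENO_KERNELS=OFF
cmake --build build --config Release -j 8
```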
My question now is: am I facing the limits of current GPU support in Termux, i.e. are the drivers lacking, or is the chip actually incapable of performing these operations? What else could I try?