Accelerated computations on Android Adreno 740 #17456
Elettrotecnica asked this question in Q&A (unanswered)
I am trying to run llama.cpp on a Pico 4 Ultra device, which comes with a Snapdragon XR2 Gen 2. Because I am on Android, I am using Termux as my Linux environment, which packages llama.cpp together with both the OpenCL and the Vulkan backends.
I have tried the Vulkan backend, but I believe I am hitting the problem described in #16881, i.e. the output is (very fast) gibberish.
OpenCL, on the other hand, does not even activate, because apparently the device does not support an extension llama.cpp needs.
This is the output of `llama-cli --list-devices --gpus`:
```
ggml_opencl: selected platform: 'clvk'
ggml_opencl: device: 'Turnip Adreno (TM) 740v3 (OpenCL 3.0 CLVK on Vulkan v1.4.328 driver 104869888)'
ggml_opencl: OpenCL driver: 3.0 CLVK on Vulkan v1.4.328 driver 104869888
ggml_opencl: vector subgroup broadcast support: false
ggml_opencl: device FP16 support: true
ggml_opencl: device does not support subgroups (cl_khr_subgroups or cl_intel_subgroups) (note that subgroups is an optional feature in OpenCL 3.0)
ggml_opencl: drop unsupported device.
ggml_opencl: device: 'llvmpipe (LLVM 21.1.5, 128 bits) (OpenCL 3.0 CLVK on Vulkan v1.4.328 driver 104869888)'
Unsupported GPU: llvmpipe (LLVM 21.1.5, 128 bits)
ggml_opencl: drop unsupported device.
load_backend: loaded OpenCL backend from /data/data/com.termux/files/usr/bin/../lib/libggml-opencl.so
load_backend: loaded CPU backend from /data/data/com.termux/files/usr/bin/../lib/libggml-cpu.so
Available devices:
```
And this is the output of `clinfo`:
clinfo.txt
It seems at least some subgroup operations are supported.
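For reference, a minimal probe along these lines should show what the device itself reports (this is just a sketch, not taken from the llama.cpp source; I am assuming the ggml-opencl check essentially looks for cl_khr_subgroups / cl_intel_subgroups in the extensions string, as the log message suggests):
```c
// subgroup_check.c -- standalone probe of what each OpenCL device reports.
// Build (assumed): cc subgroup_check.c -lOpenCL -o subgroup_check
#include <stdio.h>
#include <string.h>

#define CL_TARGET_OPENCL_VERSION 300
#include <CL/cl.h>

int main(void) {
    cl_platform_id platforms[8];
    cl_uint nplat = 0;
    if (clGetPlatformIDs(8, platforms, &nplat) != CL_SUCCESS || nplat == 0) {
        fprintf(stderr, "no OpenCL platforms found\n");
        return 1;
    }

    for (cl_uint p = 0; p < nplat; ++p) {
        cl_device_id devs[8];
        cl_uint ndev = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 8, devs, &ndev) != CL_SUCCESS) {
            continue;
        }
        for (cl_uint d = 0; d < ndev; ++d) {
            char name[256]  = {0};
            char ext[16384] = {0};
            clGetDeviceInfo(devs[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
            clGetDeviceInfo(devs[d], CL_DEVICE_EXTENSIONS, sizeof(ext), ext, NULL);

            // Core OpenCL 3.0 query: 0 means the device exposes no subgroup support at all.
            cl_uint max_subgroups = 0;
            clGetDeviceInfo(devs[d], CL_DEVICE_MAX_NUM_SUB_GROUPS,
                            sizeof(max_subgroups), &max_subgroups, NULL);

            printf("device: %s\n", name);
            printf("  cl_khr_subgroups:             %s\n", strstr(ext, "cl_khr_subgroups")   ? "yes" : "no");
            printf("  cl_intel_subgroups:           %s\n", strstr(ext, "cl_intel_subgroups") ? "yes" : "no");
            printf("  CL_DEVICE_MAX_NUM_SUB_GROUPS: %u\n", max_subgroups);
        }
    }
    return 0;
}
```
A non-zero CL_DEVICE_MAX_NUM_SUB_GROUPS with neither extension listed would at least be consistent with what clinfo seems to show.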
I have tried disabling the subgroup feature check in the code and recompiling, but the check seems to be right and something really is missing: the model fails to load because something is not computed correctly (I am not really an OpenCL expert...). I have also tried the compilation flag GGML_OPENCL_USE_ADRENO_KERNELS=OFF to see whether that would avoid certain kernel operations, but the result is the same; a sketch of that rebuild is below.
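For completeness, the rebuild with that flag looks roughly like this (-DGGML_OPENCL=ON is, as far as I can tell, the documented switch for the OpenCL backend; build directory and parallelism are just placeholders):
```
# Sketch of a from-source rebuild with the Adreno-specific kernels disabled.
cmake -B build \
    -DGGML_OPENCL=ON \
    -DGGML_OPENCL_USE_ADRENO_KERNELS=OFF
cmake --build build --config Release -j 8
```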
My question now is: am I facing the limits of current GPU support in Termux, i.e. are the drivers lacking, or is the chip actually incapable of performing these operations? What else could I try?