Description
Environment
- SoC: Qualcomm QCS8550
- Device OS: Android 13
- QNN SDK: 2.42
- Host OS: Ubuntu 22.04
- Source: latest executorch and executorch-examples repositories
Problem Summary
I am using the executorch-examples LlamaDemo app to evaluate LLM performance on the QCS8550 platform. Llama 3.2 1B runs successfully on the QNN backend, but I am hitting two blockers when deploying Gemma 3 4B.
Model Export Issue: When using the provided export tools to convert Gemma 3 4B for the QNN backend, the process fails with TypeError: 'NoneType' object is not callable during the recipe application stage. It appears the QNN-specific export path for Gemma 3 is either not yet fully integrated or requires configuration that is not documented in the examples.
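For context, this failure pattern is consistent with a recipe-registry lookup that silently returns None for an unregistered model and is then called as a function. The sketch below is purely illustrative; RECIPES and apply_recipe are hypothetical names, not executorch APIs.

```python
# Hypothetical sketch of how the reported TypeError can arise: a dict lookup
# for an unregistered model returns None, which is then invoked as a callable.
RECIPES = {
    "llama3_2_1b": lambda cfg: f"lowered for QNN with {cfg}",
    # no "gemma3_4b" entry registered yet
}

def apply_recipe(model_name, cfg):
    recipe = RECIPES.get(model_name)
    if recipe is None:
        # Failing early with a clear message beats the bare
        # "TypeError: 'NoneType' object is not callable".
        raise KeyError(f"no QNN export recipe registered for {model_name!r}")
    return recipe(cfg)
```

If this is indeed the cause, an explicit "recipe not found" error in the export tooling would make the unsupported-model case much easier to diagnose.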
Runtime Crash (Method Mismatch): I then tried a pre-exported Gemma 3 4B .pte model in the Android LlamaDemo app, but the app crashes during model loading with the following error:
ExecuTorch E No method named 'kv_forward' in program
It appears the Llama-based runner hardcodes the kv_forward method name, while Gemma 3 / VLM exports may expose a different entry point (e.g., forward).
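One possible direction for the runner, sketched below with hypothetical names only: rather than hardcoding "kv_forward", probe the set of method names the loaded program actually exposes and fall back in order of preference. The method_names argument stands in for whatever the runtime returns when listing a .pte program's methods; this is not a real executorch API call.

```python
# Hypothetical workaround sketch for the method-name mismatch.
# Entry points to try, in order of preference (assumption: Llama-style
# KV-cache exports use "kv_forward", others plain "forward").
PREFERRED_ENTRY_POINTS = ("kv_forward", "forward")

def pick_entry_point(method_names):
    """Return the first preferred method that the program actually contains."""
    for name in PREFERRED_ENTRY_POINTS:
        if name in method_names:
            return name
    raise RuntimeError(f"no known entry point among {sorted(method_names)}")
```

A fallback like this would let one runner binary load both Llama-style and Gemma-style .pte files without per-model patches.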
Questions
Does the current QNN delegate and the Android LlamaDemo example officially support Gemma 3 4B, or is the QNN support currently limited to the Llama/Qwen families?
For Multimodal models like Gemma 3, is there a recommended way to handle the method name mismatch in the Android runner when using the QNN backend?
Are there specific export flags or recipes required to successfully target the HTP (NPU) on QCS8550 for Gemma 3 models?
I would appreciate any insights, or pointers to commits that stabilize Gemma 3 support on the Qualcomm QNN backend.
cc @cccclai @winskuo-quic @shewu-quic @haowhsu-quic @DannyYuyang-quic @cbilgin