Is there an existing issue for this?
Describe the problem
AppFunctions Studio currently supports only text-based interaction with the agent. For interactive showcases, being able to speak to the agent and hear its responses would be far more compelling and natural, especially when demonstrating how AppFunctions enable hands-free, voice-driven workflows on Android.
Describe the solution
Motivation
AppFunctions are fundamentally about enabling agent-driven automation without opening app UIs. The most natural interface for this is voice: a user says "create a note about tomorrow's meeting" and the agent executes it via AppFunctions. The current testing agent, however, forces users to type, which undermines the demo narrative.
Key use cases:
- Demos: Showing executives how AppFunctions work in a hands-free, conversational way
- Developer testing: Rapidly testing AppFunction flows via voice without typing complex queries
- Accessibility: Voice-first interaction for users who can't easily type on mobile
Technical Context
The agent currently uses the Interactions API with text input only. Google already offers the Gemini Live API, which supports:
- Real-time audio input/output (bidirectional streaming)
- Tool use / function calling, which is critical for AppFunctions execution
- Low-latency audio-to-audio via gemini-3.1-flash-live-preview
- Session management for multi-turn conversations
The Live API already supports tools, meaning the existing AppFunction → Gemini tool schema conversion could be reused. The agent would need:
- Audio capture (microphone) → streaming to Live API
- Live API processes speech, calls AppFunctions as tools (same flow as today)
- Audio response streamed back → speaker output
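To illustrate the "same flow as today" point, here is a minimal sketch, in Java, of a handler that routes server events from an open Live session: transcripts are appended to the chat, tool calls are dispatched to the existing AppFunction execution path and answered with a tool response, and audio chunks are sent to the speaker. Every type here (`LiveEvent`, `TranscriptEvent`, `ToolCallEvent`, `AudioChunkEvent`, `AppFunctionToolExecutor`, `ToolResponseSender`, `AudioSink`, `VoiceTurnHandler`) is a hypothetical stand-in, not part of the Gemini Live API SDK or the AppFunctions library. The real message shapes and client calls will differ, and the sketch assumes the session is configured to return transcriptions of both input and output audio.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical server event types; real Live API messages have different shapes.
abstract class LiveEvent {}

final class TranscriptEvent extends LiveEvent {
    final String role; // "user" (input transcription) or "model" (output transcription)
    final String text;
    TranscriptEvent(String role, String text) { this.role = role; this.text = text; }
}

final class ToolCallEvent extends LiveEvent {
    final String id;                  // call id, echoed back in the tool response
    final String name;                // tool name derived from an AppFunction schema
    final Map<String, Object> args;
    ToolCallEvent(String id, String name, Map<String, Object> args) {
        this.id = id; this.name = name; this.args = args;
    }
}

final class AudioChunkEvent extends LiveEvent {
    final byte[] pcm;                 // raw audio from the model
    AudioChunkEvent(byte[] pcm) { this.pcm = pcm; }
}

// Hypothetical: wraps whatever executes AppFunctions for the text agent today.
interface AppFunctionToolExecutor {
    Map<String, Object> execute(String toolName, Map<String, Object> args);
}

// Hypothetical: sends a function result back over the open Live session.
interface ToolResponseSender {
    void sendToolResponse(String callId, Map<String, Object> result);
}

// Hypothetical: plays model audio (e.g. backed by AudioTrack on device).
interface AudioSink {
    void play(byte[] pcm);
}

final class VoiceTurnHandler {
    private final AppFunctionToolExecutor executor;
    private final ToolResponseSender sender;
    private final AudioSink speaker;
    // Text shown in the chat alongside the audio, for clarity and debugging.
    final List<String> chatTranscript = new ArrayList<>();

    VoiceTurnHandler(AppFunctionToolExecutor executor, ToolResponseSender sender, AudioSink speaker) {
        this.executor = executor; this.sender = sender; this.speaker = speaker;
    }

    void onEvent(LiveEvent event) {
        if (event instanceof TranscriptEvent) {
            TranscriptEvent t = (TranscriptEvent) event;
            chatTranscript.add(t.role + ": " + t.text);
        } else if (event instanceof ToolCallEvent) {
            ToolCallEvent call = (ToolCallEvent) event;
            chatTranscript.add("tool: " + call.name);
            // Same execution path as the text agent; only the transport differs.
            Map<String, Object> result = executor.execute(call.name, call.args);
            sender.sendToolResponse(call.id, result);
        } else if (event instanceof AudioChunkEvent) {
            speaker.play(((AudioChunkEvent) event).pcm);
        }
    }
}
```

The handler is deliberately transport-agnostic: microphone capture and the session wiring would live elsewhere, so the tool dispatch path could be shared with the existing text agent rather than duplicated for voice mode.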
Suggested Implementation
- Add a "Voice mode" toggle or a push-to-talk button in the chat UI
- Show a transcript in the chat alongside audio (for clarity and debugging)
Code of Conduct