
[FR]: Gemini Live API integration for voice-based interaction #5

@tw0b33rs


Is there an existing issue for this?

  • I have searched the existing issues

Describe the problem

AppFunctions Studio currently only supports text-based interaction with the agent. For interactive showcases, being able to speak to the agent and hear responses would be dramatically more compelling and natural — especially when demonstrating how AppFunctions enable hands-free, voice-driven workflows on Android.

Describe the solution

Motivation

AppFunctions are fundamentally about enabling agent-driven automation without opening app UIs. The most natural interface for this is voice — a user says "create a note about tomorrow's meeting" and the agent executes it via AppFunctions. But the current testing agent forces users to type, which undermines the demo narrative.

Key use cases:

  • Demos: Showing executives how AppFunctions work in a hands-free, conversational way
  • Developer testing: Rapidly testing AppFunction flows via voice without typing complex queries
  • Accessibility: Voice-first interaction for users who can't easily type on mobile

Technical Context

The agent currently uses the Interactions API with text input exclusively. Google already offers the Gemini Live API, which supports:

  • Real-time audio input/output (bidirectional streaming)
  • Tool use / function calling — critical for AppFunctions execution
  • Low-latency audio-to-audio via gemini-3.1-flash-live-preview
  • Session management for multi-turn conversations
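Streaming microphone audio means cutting it into small chunks before sending it. The sketch below shows only that framing arithmetic. It assumes raw 16-bit little-endian mono PCM, at 16 kHz for input and 24 kHz for output, which is how the Live API documentation describes its audio at the time of writing; verify against the current docs. The names `pcmChunkSize` and `chunkPcm` and the 100 ms default chunk length are illustrative choices, not part of any SDK.

```kotlin
/**
 * Bytes needed to hold [chunkMs] milliseconds of raw PCM audio.
 * Assumption: 16-bit (2-byte) mono samples by default.
 */
fun pcmChunkSize(
    sampleRateHz: Int,
    channels: Int = 1,
    bytesPerSample: Int = 2,
    chunkMs: Int = 100,
): Int = sampleRateHz * channels * bytesPerSample * chunkMs / 1000

/**
 * Splits a PCM buffer into chunks of at most [chunkBytes] bytes; the last chunk may be shorter.
 * [frameBytes] (channels * bytesPerSample) guards against cutting a sample in half.
 */
fun chunkPcm(pcm: ByteArray, chunkBytes: Int, frameBytes: Int = 2): List<ByteArray> {
    require(chunkBytes > 0 && chunkBytes % frameBytes == 0) {
        "chunkBytes must be a positive multiple of the frame size"
    }
    return (pcm.indices step chunkBytes).map { start ->
        pcm.copyOfRange(start, minOf(start + chunkBytes, pcm.size))
    }
}
```

With these defaults, 100 ms of 16 kHz mono input is 16,000 × 2 × 0.1 = 3,200 bytes. Smaller chunks lower latency at the cost of sending more messages.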

The Live API already supports tools, meaning the existing AppFunction → Gemini tool schema conversion could be reused. The agent would need:

  1. Audio capture (microphone) → streaming to Live API
  2. Live API processes speech, calls AppFunctions as tools (same flow as today)
  3. Audio response streamed back → speaker output
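The three steps above can be sketched as a single loop. This is a hypothetical outline, not the Gemini SDK: `LiveSession`, `ServerEvent`, `AppFunctionInvoker`, and `AudioSink` are invented interfaces standing in for the real Live API client, the existing AppFunction execution path, and the Android audio output. For readability it sends all microphone chunks before reading responses; a real client would send and receive concurrently (for example with coroutines), since the Live API streams in both directions.

```kotlin
/** Events a hypothetical Live API wrapper might surface to the app. */
sealed interface ServerEvent {
    /** A chunk of synthesized speech (raw PCM) to play back. */
    class AudioChunk(val pcm: ByteArray) : ServerEvent
    /** The model asks the app to run a tool, i.e. an AppFunction. */
    data class ToolCall(val id: String, val name: String, val args: Map<String, Any?>) : ServerEvent
    /** Text transcript of either the user's speech or the model's reply. */
    data class Transcript(val text: String, val fromUser: Boolean) : ServerEvent
    /** The model has finished its turn. */
    object TurnComplete : ServerEvent
}

/** Hypothetical wrapper around one Live API session. */
interface LiveSession {
    fun sendAudio(pcm: ByteArray)
    fun sendToolResponse(id: String, result: Map<String, Any?>)
    /** Next server event, or null once the session has closed. */
    fun receive(): ServerEvent?
}

/** Stand-in for the AppFunction execution path the text agent already uses. */
fun interface AppFunctionInvoker {
    fun execute(name: String, args: Map<String, Any?>): Map<String, Any?>
}

/** Stand-in for speaker output (for example an AudioTrack on Android). */
fun interface AudioSink {
    fun play(pcm: ByteArray)
}

fun runVoiceTurn(
    micChunks: Sequence<ByteArray>,
    session: LiveSession,
    invoker: AppFunctionInvoker,
    speaker: AudioSink,
    onTranscript: (text: String, fromUser: Boolean) -> Unit = { _, _ -> },
) {
    // Step 1: stream captured microphone audio to the session.
    micChunks.forEach { session.sendAudio(it) }
    // Steps 2 and 3: handle server events until the turn ends or the session closes.
    while (true) {
        when (val event = session.receive() ?: return) {
            is ServerEvent.AudioChunk -> speaker.play(event.pcm)
            is ServerEvent.ToolCall ->
                session.sendToolResponse(event.id, invoker.execute(event.name, event.args))
            is ServerEvent.Transcript -> onTranscript(event.text, event.fromUser)
            ServerEvent.TurnComplete -> return
        }
    }
}
```

In this shape the tool-call branch is the only place AppFunctions enter the picture, so the dispatch used by today's text agent could plausibly be plugged in as the `AppFunctionInvoker` without changes.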

Suggested Implementation

  • Add a "Voice mode" toggle or a push-to-talk button in the chat UI
  • Show a transcript in the chat alongside audio (for clarity and debugging)
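One way to model the suggested UI is a small state reducer. The sketch is hypothetical (`ChatUiState`, `UiEvent`, and `reduce` are illustrative names, not existing AppFunctions Studio code): push-to-talk is only honored in voice mode, and spoken turns land in the same message list as typed ones, flagged with `viaVoice`, so the transcript doubles as a debugging log.

```kotlin
enum class InputMode { TEXT, VOICE }
enum class MicState { IDLE, LISTENING, RESPONDING }
enum class Author { USER, AGENT }

data class ChatMessage(val author: Author, val text: String, val viaVoice: Boolean)

data class ChatUiState(
    val mode: InputMode = InputMode.TEXT,
    val mic: MicState = MicState.IDLE,
    val messages: List<ChatMessage> = emptyList(),
)

sealed interface UiEvent {
    object ToggleVoiceMode : UiEvent
    object PushToTalkPressed : UiEvent
    object PushToTalkReleased : UiEvent
    object AgentAudioFinished : UiEvent
    data class TextSubmitted(val text: String) : UiEvent
    data class TranscriptReceived(val author: Author, val text: String) : UiEvent
}

fun reduce(state: ChatUiState, event: UiEvent): ChatUiState = when (event) {
    // Switching modes always resets the microphone to idle.
    UiEvent.ToggleVoiceMode -> state.copy(
        mode = if (state.mode == InputMode.TEXT) InputMode.VOICE else InputMode.TEXT,
        mic = MicState.IDLE,
    )
    // Push-to-talk is ignored while in text mode.
    UiEvent.PushToTalkPressed ->
        if (state.mode == InputMode.VOICE) state.copy(mic = MicState.LISTENING) else state
    // Releasing the button ends the user's turn; the agent now responds.
    UiEvent.PushToTalkReleased ->
        if (state.mic == MicState.LISTENING) state.copy(mic = MicState.RESPONDING) else state
    UiEvent.AgentAudioFinished -> state.copy(mic = MicState.IDLE)
    is UiEvent.TextSubmitted ->
        state.copy(messages = state.messages + ChatMessage(Author.USER, event.text, viaVoice = false))
    // Transcripts of spoken turns appear in the chat alongside the audio.
    is UiEvent.TranscriptReceived ->
        state.copy(messages = state.messages + ChatMessage(event.author, event.text, viaVoice = true))
}
```

Keeping typed and spoken messages in one list means the existing chat rendering could show both, with the flag available for a small "voice" badge.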


Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
