Is there an existing issue for this?
Describe the problem
The AppFunctions Testing Agent's Gemini API calls exhibit high latency variance (2–20 seconds per interaction turn). This makes the tool impractical for live demonstrations and interactive showcases. The variance comes from Standard tier scheduling, where requests can be deprioritized during peak load.
Describe the solution
Requested Change
Add a configurable service_tier field to the Gemini API request body, exposed as a user-facing setting in the app's settings screen.
Technical Context
The agent currently omits the service_tier field, defaulting to Standard tier.
Google's Priority Inference documentation specifies that adding "service_tier": "priority" to the request body routes traffic to high-priority compute queues, delivering:
- Consistent "seconds" latency (vs "seconds to minutes" for Standard)
- Non-droppable traffic (no deprioritization under load)
- Graceful degradation to Standard if priority limits are exceeded (no failures)
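As a rough illustration of the requested change, the sketch below builds a request body with and without the tier field. It is written in Python for brevity rather than in the agent's own language. The helper name `build_request_body` is hypothetical, and placing `service_tier` at the top level of the body is an assumption based on the description above; the `contents`/`parts` layout follows the usual Gemini `generateContent` request shape.

```python
import json


def build_request_body(prompt, service_tier=None):
    """Build a generateContent-style request body.

    When service_tier is None the field is omitted entirely, which
    matches the agent's current behaviour (Standard tier by default).
    """
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    if service_tier is not None:
        # Assumption: the field sits at the top level of the JSON body.
        body["service_tier"] = service_tier
    return body


if __name__ == "__main__":
    # Current behaviour: no tier field is sent.
    print(json.dumps(build_request_body("List my notes")))
    # Requested behaviour: the tier is sent explicitly.
    print(json.dumps(build_request_body("List my notes", "priority")))
```

Omitting the field for the default keeps existing requests byte-for-byte unchanged, so the setting only affects users who opt in.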
Suggested Implementation
- Add a dropdown to the Settings screen with three options: Standard (default), Priority, and Flex
- Pass the selected value as "service_tier" in the JSON request body
- Optionally, surface the x-gemini-service-tier response header in the UI or logs so users can verify whether a request was served at the priority tier or downgraded
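The optional verification step could look roughly like the sketch below, again in Python for illustration only. The function names `served_tier` and `was_downgraded` are hypothetical, and the assumption that the header value is a lowercase tier name such as "priority" or "standard" is not confirmed by this issue; only the header name itself comes from the text above.

```python
def served_tier(headers):
    """Return the tier reported by the response, or None if absent.

    HTTP header names are case-insensitive, so the lookup ignores case.
    """
    for name, value in headers.items():
        if name.lower() == "x-gemini-service-tier":
            return value.strip().lower()
    return None


def was_downgraded(requested, headers):
    """True when the response reports a tier other than the one requested.

    Assumption: the header carries a plain tier name. A missing header
    is treated as "unknown", not as a downgrade.
    """
    served = served_tier(headers)
    return served is not None and served != requested.lower()


if __name__ == "__main__":
    response_headers = {"X-Gemini-Service-Tier": "standard"}
    if was_downgraded("priority", response_headers):
        print("Requested priority, served at:", served_tier(response_headers))
```

Logging the served tier alongside each turn's latency would let users tell a downgrade apart from ordinary network delay.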
Why This Matters
The AppFunctions Testing Agent is the primary tool for demonstrating AppFunctions to stakeholders. On Standard tier, a single turn can take up to 20 seconds, which undermines confidence in the technology during live demos. Priority Inference is designed for this use case ("interactive AI applications" per Google's docs) and would make the testing agent viable for real-time showcases.
Additional context
Prerequisites for Users
- Tier 2+ billing account
- Priority tier costs 75–100% more per token than Standard
Code of Conduct