Description:
I am experiencing an issue where the TTS output uses two different voices and repeats words or phrases within the same sentence. It looks as if the model is hallucinating a multi-speaker conversation, or the audio generation itself is glitching.
This issue started occurring after I migrated from the googleapis/python-genai SDK (v1.65.0) to the new google/adk-python (v1.26.0).
Environment:
- Model: gemini-2.5-flash-native-audio-preview-12-2025
- Old SDK: python-genai v1.65.0 (worked fine, single consistent voice)
- Current SDK: adk-python v1.26.0
- Language: Vietnamese
- Use Case: Voice Bot / Virtual Assistant (e.g., Customer Service)
Troubleshooting Steps I've Taken:
- Verified Audio Source: I confirmed this is NOT an audio conversion/decoding error on my end. I intercepted and saved the raw audio bytes directly returned by Google's API, and the anomalies (2 voices, repetitions) are present in the raw file.
- Prompt Engineering: I tried explicitly instructing the model to use only one voice and act as a single agent, but it didn't solve the issue.
- Configuration Check: I adjusted various configurations within the ADK, but the issue persists.
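For reference, the raw-audio check in the first step can be sketched roughly as follows. This is a minimal illustration, not the exact code from my bot: it assumes the API returns 16-bit mono PCM at 24 kHz (adjust the constants if the stream reports a different format), and the helper names `pcm_chunks_to_wav_bytes` and `save_raw_audio` are hypothetical, not part of adk-python. The chunks are only wrapped in a WAV header, with no resampling or re-encoding, so anything audible in the file was already in the bytes returned by the API.

```python
import io
import wave

# Assumed output format (an assumption, not confirmed by the SDK here):
# 16-bit mono PCM at 24 kHz.
SAMPLE_RATE_HZ = 24000
SAMPLE_WIDTH_BYTES = 2
CHANNELS = 1


def pcm_chunks_to_wav_bytes(chunks, sample_rate=SAMPLE_RATE_HZ):
    """Wrap raw PCM chunks in a WAV container without touching the samples."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(sample_rate)
        for chunk in chunks:
            # Chunks are appended in arrival order, byte for byte.
            wav_file.writeframes(chunk)
    return buffer.getvalue()


def save_raw_audio(chunks, path):
    """Write the intercepted audio chunks to a .wav file for inspection."""
    with open(path, "wb") as out_file:
        out_file.write(pcm_chunks_to_wav_bytes(chunks))
```

Calling `save_raw_audio(collected_chunks, "raw_response.wav")` with the audio payloads collected from a single model turn produces a file that can be played back directly.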
Expected Behavior:
For a single generated response, the TTS should output the text using one consistent voice without sudden voice changes or unnatural repetitions.
Actual Behavior:
The audio output shifts between two distinct voices (as if two different people are speaking) and repeats certain parts of the sentence (e.g., saying "Alo" multiple times or repeating the intro). It feels like the model is trying to simulate both sides of a phone call.
Steps to Reproduce:
(Please review the attached code snippet and audio file)
- Initialize the adk-python client (v1.26.0).
- Send a prompt simulating a customer service scenario in Vietnamese (e.g., "Trung tâm hành chính công xin nghe...").
- Listen to the raw audio output returned by the model.
Attachments:
- audio: a51c5d58-3f0d-4035-af9b-c0285682eb99_ai.wav
- config:
