Please read this first
- Have you read the docs? Yes.
- Have you searched for related issues? Yes. I searched existing issues and PRs for "voice text_splitter short chunk" and did not find a matching report or fix.
Describe the bug
`StreamedAudioResult` drops text when a custom TTS `text_splitter` returns a non-empty chunk shorter than 20 characters. This can silently omit short responses such as `"ok"` even though the splitter explicitly marked the text as ready for TTS.
The default sentence splitter already applies its own minimum sentence length before returning a chunk. Once a custom splitter returns non-empty text, `StreamedAudioResult` should treat that as the splitter's decision and send the chunk to the TTS model.
I also checked adjacent voice streaming behavior. The same path needs to emit `turn_ended` when a custom splitter consumes all buffered text before `_turn_done()` is called; otherwise a turn can start without a matching turn-ended lifecycle event.
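For context, here is a minimal sketch of a splitter that applies its own minimum length before flushing, as the default splitter is described as doing. This is illustrative only; `min_length_splitter` and its threshold are hypothetical, not the SDK's actual implementation:

```python
def min_length_splitter(text: str, min_len: int = 20) -> tuple[str, str]:
    """Return (chunk_to_speak, remaining_buffer).

    Hypothetical example: flush only when a sentence terminator exists and
    the completed text has reached min_len; otherwise keep buffering.
    """
    # Find the last sentence terminator in the buffered text.
    last_end = max(text.rfind("."), text.rfind("!"), text.rfind("?"))
    if last_end == -1 or last_end + 1 < min_len:
        # Nothing long enough to speak yet; keep everything buffered.
        return "", text
    return text[: last_end + 1], text[last_end + 1 :]
```

The point is that length filtering is the splitter's responsibility: once a splitter (default or custom) returns non-empty text, the downstream code should not second-guess it with another length check.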
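To make that lifecycle invariant concrete, a small checker (hypothetical, not part of the SDK) can verify that every `turn_started` event is matched by a `turn_ended` before `session_ended`:

```python
def lifecycle_is_balanced(events: list[str]) -> bool:
    """Return True if every turn_started has a matching turn_ended
    by the time session_ended is seen. Other events (e.g. "audio")
    are ignored."""
    open_turns = 0
    for ev in events:
        if ev == "turn_started":
            open_turns += 1
        elif ev == "turn_ended":
            if open_turns == 0:
                return False  # turn_ended without a matching turn_started
            open_turns -= 1
        elif ev == "session_ended":
            return open_turns == 0  # no turn may still be open at session end
    return open_turns == 0
```

The buggy event sequence in the repro below fails this check, while the expected sequence passes it.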
Debug information
- Agents SDK version: 0.17.1, reproduced on `main` at `92e014a4`
- Python version: Python 3.12.1
Repro steps
Run this minimal script:
```python
import asyncio
from collections.abc import AsyncIterator

import numpy as np

from agents.voice import StreamedAudioResult, TTSModel, TTSModelSettings, VoicePipelineConfig


class RecordingTTS(TTSModel):
    def __init__(self):
        self.texts = []

    @property
    def model_name(self) -> str:
        return "recording_tts"

    async def run(self, text: str, settings: TTSModelSettings) -> AsyncIterator[bytes]:
        self.texts.append(text)
        yield np.zeros(2, dtype=np.int16).tobytes()


def split_immediately(text: str) -> tuple[str, str]:
    return text, ""


async def main():
    tts = RecordingTTS()
    result = StreamedAudioResult(
        tts,
        TTSModelSettings(buffer_size=1, text_splitter=split_immediately),
        VoicePipelineConfig(),
    )
    await result._add_text("ok")
    await result._turn_done()
    await result._done()

    events = []
    audio_chunks = 0
    async for event in result.stream():
        if event.type == "voice_stream_event_lifecycle":
            events.append(event.event)
        elif event.type == "voice_stream_event_audio":
            events.append("audio")
            audio_chunks += 1

    print({"tts_texts": tts.texts, "events": events, "audio_chunks": audio_chunks})


asyncio.run(main())
```
Actual result:
```
{'tts_texts': [], 'events': ['turn_started', 'session_ended'], 'audio_chunks': 0}
```
Expected behavior
The non-empty splitter chunk should be sent to TTS even though it is shorter than 20 characters. The run should produce one audio event and a balanced lifecycle sequence, for example:
```
{'tts_texts': ['ok'], 'events': ['turn_started', 'audio', 'turn_ended', 'session_ended'], 'audio_chunks': 1}
```