Audio EOT #4722
@chenghao-mou Excited to see this! A couple of questions:
Thanks for your patience! We don't have an official decision or timeline yet, but hopefully I can get it ready within a month or two.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    # min_endpointing_delay path.
    logger.warning(
        "predict_end_of_turn called with no open user turn, short-circuiting",
    )
    return 1.0
🟡 WARNING-level log fires on every normal turn when audio EOT + STT are combined
When AudioTurnDetector is used with STT, predict_end_of_turn logs at WARNING level every time it is called after the audio EOT model has already committed the turn. The normal flow is:
(1) VAD END_OF_SPEECH triggers _run_eou_detection → bounce task → predict_end_of_turn returns cached prediction → turn committed → flush() sets _user_turn_started=False.
(2) A late STT final transcript arrives → _run_eou_detection triggers again → bounce task → predict_end_of_turn sees _user_turn_started=False → logs WARNING and short-circuits.
This WARNING fires once per committed turn in the most common configuration (audio EOT + Deepgram STT), flooding production logs with a message that describes normal, expected behavior.
Suggested change
-    # min_endpointing_delay path.
-    logger.warning(
-        "predict_end_of_turn called with no open user turn, short-circuiting",
-    )
-    return 1.0
+    logger.debug(
+        "predict_end_of_turn called with no open user turn, short-circuiting",
+    )
+    return 1.0
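The effect described in this comment can be reproduced with a small standalone sketch. It does not use the real plugin code: `TurnStateSketch`, `simulate_turns`, and the logger names are hypothetical stand-ins, and the only things taken from the comment are the `_user_turn_started` flag, the `flush()` reset, and the short-circuit return of `1.0`. The handler only accepts WARNING and above, roughly what a production log sink might do.

```python
import logging


class _ListHandler(logging.Handler):
    """Collects records at WARNING or above, like a typical production sink."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.records = []

    def emit(self, record):
        self.records.append(record)


class TurnStateSketch:
    """Hypothetical stand-in for the detector state discussed in the comment."""

    def __init__(self, logger, short_circuit_level):
        self._logger = logger
        self._level = short_circuit_level
        self._user_turn_started = False

    def start_turn(self):
        self._user_turn_started = True

    def flush(self):
        # Committing a turn closes it (assumption based on the comment).
        self._user_turn_started = False

    def predict_end_of_turn(self, cached=0.9):
        if not self._user_turn_started:
            self._logger.log(
                self._level,
                "predict_end_of_turn called with no open user turn, short-circuiting",
            )
            return 1.0
        return cached


def simulate_turns(n_turns, short_circuit_level):
    """Run n_turns normal turns; return how many WARNING+ records were emitted."""
    logger = logging.getLogger(f"eot_sketch.{short_circuit_level}")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    handler = _ListHandler()
    logger.addHandler(handler)
    detector = TurnStateSketch(logger, short_circuit_level)
    for _ in range(n_turns):
        detector.start_turn()
        detector.predict_end_of_turn()  # step (1): VAD END_OF_SPEECH path
        detector.flush()                # turn committed
        detector.predict_end_of_turn()  # step (2): late STT final transcript
    logger.removeHandler(handler)
    return len(handler.records)
```

Under these assumptions, `simulate_turns(3, logging.WARNING)` records one warning per committed turn, while `simulate_turns(3, logging.DEBUG)` records none, which is the behavior the suggested change is after.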
Requires livekit/protocol#1485
Adds streaming audio end-of-turn detection. Single user-facing `AudioTurnDetector` in `livekit-plugins-turn-detector` selects between two backends:

- `eot-audio-cloud`
- `eot-audio-mini`

On cloud transport error or `predict_end_of_turn` timeout, the session swaps to local for the rest of the stream (sticky per session, one warning per failure mode). Local failures emit the default `1.0` prediction and retry on the next turn. A user-set `unlikely_threshold` is scaled multiplicatively against the cloud default so the operating point survives a fallback.

Wired into `AudioRecognition`: VAD `INFERENCE_DONE` triggers `warmup`, `END_OF_SPEECH` activates the stream, predictions flow back through `_run_eou_detection` and arbitrate against the endpointing delay. A speaking guard cancels an in-flight bounce if VAD `START_OF_SPEECH` fires mid-window.

Structure

- `livekit/agents/voice/turn.py`: abstract `_AudioTurnDetector` / `_AudioTurnDetectorStream` (FSM) live alongside the existing `_TurnDetector` Protocol.
- `livekit/plugins/turn_detector/audio.py`: unified detector + concrete FSM stream that dispatches to the active transport.
- `livekit/plugins/turn_detector/transports.py`: `AudioTurnDetectionTransport` Protocol + `_CloudTransport` (WS + protobuf) + `_LocalTransport` (ctypes). Fallback swaps the transport instance, not the stream.
- `livekit/plugins/turn_detector/languages.py`: `CLOUD_LANGUAGES` (0.4) + `LOCAL_LANGUAGES` (0.3) per-language thresholds.

Test plan

- `tests/test_turn_detection_fsm.py`: 11 FSM cases incl. `WARMING_UP` / `set_active(False)` regression.
- `tests/test_turn_detection_cloud_stream.py`: 4 cloud-transport invariants (retry reset, FIFO send ordering).
- `tests/test_audio_turn_detector_fallback.py`: 15 cases: auto-select, explicit-mode errors, transport-error fallback, timeout fallback, persistence, missing-lib graceful, local-failure retry, warning dedupe, multiplicative threshold scaling.
- `tests/test_audio_recognition_turn_detection.py`: 10 cases: VAD/audio/sentinel forwarding into the stream, prediction-driven EOU + deactivation, speaking-guard race aborts commit.
- `make format` + `make lint` + `make type-check` clean (only pre-existing `agent_activity.py` `InterruptionDetectionError` errors remain).

Depending on livekit/python-sdks#676
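For readers skimming the description, the fallback rules can be sketched in a few lines. This is an illustration only, not the code in this PR: `FallbackSessionSketch`, `TransportError`, and `scale_threshold` are hypothetical names, the 0.4 and 0.3 defaults are taken from the `languages.py` note, and the scaling formula (user threshold times local default over cloud default) is one plausible reading of "scaled multiplicatively against the cloud default".

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set

CLOUD_DEFAULT = 0.4  # cloud default threshold, per the description
LOCAL_DEFAULT = 0.3  # local default threshold, per the description


class TransportError(Exception):
    """Hypothetical error type standing in for a cloud transport failure."""


def scale_threshold(user_threshold: float) -> float:
    # Assumed formula: keep the user's ratio to the cloud default.
    return user_threshold * (LOCAL_DEFAULT / CLOUD_DEFAULT)


@dataclass
class FallbackSessionSketch:
    cloud_predict: Callable[[], float]
    local_predict: Callable[[], float]
    user_threshold: float = CLOUD_DEFAULT
    using_cloud: bool = True
    warnings: List[str] = field(default_factory=list)
    _warned: Set[str] = field(default_factory=set)

    def threshold(self) -> float:
        if self.using_cloud:
            return self.user_threshold
        return scale_threshold(self.user_threshold)

    def _warn_once(self, failure_mode: str) -> None:
        # One warning per failure mode for the whole session.
        if failure_mode not in self._warned:
            self._warned.add(failure_mode)
            self.warnings.append(failure_mode)

    def predict(self) -> float:
        if self.using_cloud:
            try:
                return self.cloud_predict()
            except (TransportError, TimeoutError) as exc:
                self._warn_once(type(exc).__name__)
                self.using_cloud = False  # sticky: never switch back
        try:
            return self.local_predict()
        except Exception:
            # Local failure: emit the default prediction, retry next turn.
            return 1.0
```

With a cloud callable that always raises `TransportError` and `user_threshold=0.5`, the first `predict()` call falls through to the local callable, later calls skip the cloud entirely, and `threshold()` moves from 0.5 to roughly 0.375 under the assumed formula.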