Audio EOT by chenghao-mou · Pull Request #4722 · livekit/agents

chenghao-mou · 2026-02-05T15:09:40Z

Adds streaming audio end-of-turn (EOT) detection. A single user-facing AudioTurnDetector in livekit-plugins-turn-detector selects between two backends:

eot-audio-cloud
eot-audio-mini

On cloud transport error or predict_end_of_turn timeout, the session swaps to local for the rest of the stream (sticky per session, one warning per failure mode). Local failures emit the default 1.0 prediction and retry on the next turn. A user-set unlikely_threshold is scaled multiplicatively against the cloud default so the operating point survives a fallback.
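The multiplicative scaling can be illustrated with a minimal sketch. It assumes the default thresholds are 0.4 for cloud and 0.3 for local, as listed for languages.py in this PR. The function name scale_threshold and the clamp to 1.0 are hypothetical and are not taken from the PR's code.

```python
# Hypothetical illustration only; names and the clamp are assumptions.
CLOUD_DEFAULT = 0.4  # assumed cloud default threshold
LOCAL_DEFAULT = 0.3  # assumed local default threshold


def scale_threshold(
    user_threshold: float,
    cloud_default: float = CLOUD_DEFAULT,
    local_default: float = LOCAL_DEFAULT,
) -> float:
    """Map a threshold tuned against the cloud model onto the local model.

    The user's value is expressed as a ratio of the cloud default, and the
    same ratio is applied to the local default.
    """
    ratio = user_threshold / cloud_default
    # Clamp so the result stays a valid probability threshold.
    return min(1.0, local_default * ratio)


# A user who halves the cloud default also gets half of the local default.
halved = scale_threshold(0.2)
unchanged = scale_threshold(0.4)
```

Under these assumptions, a user threshold equal to the cloud default maps to the local default, so an untuned session keeps each model's own operating point.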

Wired into AudioRecognition: VAD INFERENCE_DONE triggers warmup, END_OF_SPEECH activates the stream, predictions flow back through _run_eou_detection and arbitrate against the endpointing delay. A speaking guard cancels an in-flight bounce if VAD START_OF_SPEECH fires mid-window.
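The speaking guard described above can be sketched with plain asyncio. This is not the AudioRecognition code: the class BounceGuard, its method names, and the delay value are hypothetical and only show the cancel-on-START_OF_SPEECH idea.

```python
import asyncio
from typing import Optional, Tuple


class BounceGuard:
    """Hypothetical sketch: commit a turn after a delay unless speech resumes."""

    def __init__(self, delay: float) -> None:
        self._delay = delay
        self._task: Optional[asyncio.Task] = None
        self.committed = 0  # number of turns committed so far

    def on_end_of_speech(self) -> None:
        # Start (or restart) the bounce window.
        self._cancel()
        self._task = asyncio.get_running_loop().create_task(self._bounce())

    def on_start_of_speech(self) -> None:
        # Speaking guard: the user resumed, so abort the pending commit.
        self._cancel()

    def _cancel(self) -> None:
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None

    async def _bounce(self) -> None:
        await asyncio.sleep(self._delay)
        self.committed += 1


async def demo() -> Tuple[int, int]:
    guard = BounceGuard(delay=0.05)
    guard.on_end_of_speech()
    await asyncio.sleep(0.01)
    guard.on_start_of_speech()  # fires mid-window, cancels the bounce
    await asyncio.sleep(0.1)
    after_interrupt = guard.committed
    guard.on_end_of_speech()  # uninterrupted window, commit goes through
    await asyncio.sleep(0.1)
    return after_interrupt, guard.committed
```

Running `asyncio.run(demo())` exercises both paths: the first window is cancelled by the simulated START_OF_SPEECH, and the second one commits.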

Structure

livekit/agents/voice/turn.py — abstract _AudioTurnDetector / _AudioTurnDetectorStream (FSM) live alongside the existing _TurnDetector Protocol.
livekit/plugins/turn_detector/audio.py — unified detector + concrete FSM stream that dispatches to the active transport.
livekit/plugins/turn_detector/transports.py — AudioTurnDetectionTransport Protocol + _CloudTransport (WS + protobuf) + _LocalTransport (ctypes). Fallback swaps the transport instance, not the stream.
livekit/plugins/turn_detector/languages.py — CLOUD_LANGUAGES (0.4) + LOCAL_LANGUAGES (0.3) per-language thresholds.
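To make the "swap the transport instance, not the stream" idea concrete, here is a rough sketch under stated assumptions. The names Transport, FlakyCloud, Local, and Stream are hypothetical stand-ins, not the PR's AudioTurnDetectionTransport, _CloudTransport, or _LocalTransport, and retrying the same request on the local transport right after a cloud failure is an assumption.

```python
from typing import List, Protocol


class Transport(Protocol):
    """Hypothetical minimal transport interface."""

    def predict(self, audio: bytes) -> float: ...


class FlakyCloud:
    def predict(self, audio: bytes) -> float:
        raise ConnectionError("cloud unavailable")


class Local:
    def predict(self, audio: bytes) -> float:
        return 0.25


class Stream:
    """Keeps one stream object and swaps the transport underneath it."""

    def __init__(self, cloud: Transport, local: Transport) -> None:
        self._transport = cloud
        self._local = local
        self.warnings: List[str] = []  # one entry per failure mode

    def predict(self, audio: bytes) -> float:
        try:
            return self._transport.predict(audio)
        except (ConnectionError, TimeoutError) as exc:
            if self._transport is self._local:
                # Local failure: emit the default prediction, retry next turn.
                return 1.0
            mode = type(exc).__name__
            if mode not in self.warnings:
                self.warnings.append(mode)  # warn once per failure mode
            self._transport = self._local  # sticky for the rest of the stream
            return self.predict(audio)


stream = Stream(FlakyCloud(), Local())
first = stream.predict(b"")   # cloud fails, falls back to local
second = stream.predict(b"")  # stays on local, no second warning
```

Because callers only hold the Stream, the fallback is invisible to them apart from the single warning.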

Test plan

tests/test_turn_detection_fsm.py — 11 FSM cases incl. WARMING_UP / set_active(False) regression.
tests/test_turn_detection_cloud_stream.py — 4 cloud-transport invariants (retry reset, FIFO send ordering).
tests/test_audio_turn_detector_fallback.py — 15 cases: auto-select, explicit-mode errors, transport-error fallback, timeout fallback, persistence, missing-lib graceful, local-failure retry, warning dedupe, multiplicative threshold scaling.
tests/test_audio_recognition_turn_detection.py — 10 cases: VAD/audio/sentinel forwarding into the stream, prediction-driven EOU + deactivation, speaking-guard race aborts commit.
make format + make lint + make type-check clean (only pre-existing agent_activity.py InterruptionDetectionError errors remain).

Depends on livekit/python-sdks#676

hsjun99 · 2026-02-25T01:00:31Z

@chenghao-mou Excited to see this! A couple of questions:

Will the multimodal EOT model be publicly accessible via model weights or agent-gateway.livekit.cloud, or in some other way?
Any rough timeline for when MultiModalTurnDetector gets fully wired up?

chenghao-mou · 2026-02-25T10:07:07Z

Thanks for your patience! We don't have an official decision or timeline yet, but hopefully I can get it ready within a month or two.

devin-ai-integration

Devin Review found 1 new potential issue.

devin-ai-integration · 2026-05-25T14:28:13Z

+            # min_endpointing_delay path.
+            logger.warning(
+                "predict_end_of_turn called with no open user turn, short-circuiting",
+            )
+            return 1.0


🟡 WARNING-level log fires on every normal turn when audio EOT + STT are combined

When AudioTurnDetector is used with STT, predict_end_of_turn logs at WARNING level every time it's called after the audio EOT model already committed the turn. The normal flow is: (1) VAD END_OF_SPEECH triggers _run_eou_detection → bounce task → predict_end_of_turn returns cached prediction → turn committed → flush() sets _user_turn_started=False. (2) A late STT final transcript arrives → _run_eou_detection triggers again → bounce task → predict_end_of_turn sees _user_turn_started=False → logs WARNING and short-circuits. This WARNING fires once per committed turn in the most common configuration (audio EOT + Deepgram STT), flooding logs in production with a message that describes normal, expected behavior.

Suggested change

-            # min_endpointing_delay path.
-            logger.warning(
-                "predict_end_of_turn called with no open user turn, short-circuiting",
-            )
-            return 1.0
+            logger.debug(
+                "predict_end_of_turn called with no open user turn, short-circuiting",
+            )
+            return 1.0

add interface draft

87068d5

chenghao-mou added 25 commits March 6, 2026 10:47

Merge branch 'main' into feat/AGT-2520-multimodal-EOU

e0d5ec1

draft

8eebccc

fix type issues

f92fbc0

refactor stream to support turn detector protocol

d1086ff

minor fixes

0a02bb1

minor fixes

168d0d7

WIP: use only ws stream

277db6e

Merge branch 'main' into feat/AGT-2520-multimodal-EOU

03c0e2e

fix uv.lock bad merge

56b4796

WIP: more refactoring

be9a550

fix mypy

601229c

remove temp url

c4d92f8

disable turn detection when agent is still speaking

e963d85

minor refactoring

c529d79

fix type issues

09baed8

wip

3830638

clean up encoder

f214aa0

wip

c922f44

Merge branch 'main' into feat/AGT-2520-multimodal-EOU

f94a0dd

update protos

604bfdc

minor fixes

f9ec64a

address comments

ddbf594

add text fallback

d465564

add text fallback

6e7d6bf

fix threshold

200d634

chenghao-mou marked this pull request as ready for review April 22, 2026 07:38

chenghao-mou requested a review from a team April 22, 2026 07:38

chenghao-mou added 5 commits May 17, 2026 19:25

clean up

8b150aa

minor refactor and clean up

28af3f5

refactor

75ddae6

Merge branch 'main' into feat/AGT-2520-multimodal-EOU

76cec5d

refactor

2ccf54d

chenghao-mou and others added 14 commits May 19, 2026 13:20

clean up

7fbca08

refactor

82c599a

clean up

4b6fdb5

more refactoring

7500160

fix makefile indentation

efe8d5c

update protocol

3237f9d

add direct commit for late stt transcripts

09cdb0c

Merge branch 'main' into feat/AGT-2520-multimodal-EOU

f02d24e

add local inference dependency

0a6a66d

update deps

80fbc29

use local inference pacakge and deprecate the turn detector package

21310ee

feat(vad): bundle optimized silero vad and deprecate the plugin (#5800)

5408ae1

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Merge branch 'main' into feat/AGT-2520-multimodal-EOU

442d857

fix type issue

9d9cb52

chenghao-mou added 2 commits May 25, 2026 10:22

Merge branch 'main' into feat/AGT-2520-multimodal-EOU

2b4cc7e

refactoring

a42cf7a

chenghao-mou added 2 commits May 25, 2026 15:15

drop duplicate calls and simplify triggers

f592a16

fix vad restore bug

82ad113

devin-ai-integration Bot reviewed May 25, 2026

chenghao-mou added 3 commits May 25, 2026 15:40

adjust thresholds for the cloud model

f93f7ca

update warning message

7ff1eb1

Merge branch 'main' into feat/AGT-2520-multimodal-EOU

e9b8a1e

chenghao-mou wants to merge 78 commits into main from feat/AGT-2520-multimodal-EOU
