feat(voice): add first_transcript_after_eos_delay metric by anshulkulhari7 · Pull Request #6105 · livekit/agents

anshulkulhari7 · 2026-06-15T09:57:38Z

Summary

Closes #4795.

Adds a new end-of-turn latency metric, first_transcript_after_eos_delay, computed as:

first_transcript_after_eos_delay = first_transcript_time - end_of_speech_time

where end_of_speech_time is captured on VAD or STT end-of-speech, and first_transcript_time is the first interim/final transcript event received after that EOS.
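As a worked example of the formula above, here is a minimal sketch with made-up timestamps. The helper name and the numbers are illustrative assumptions, not code from this PR.

```python
def first_transcript_after_eos_delay(
    end_of_speech_time: float, first_transcript_time: float
) -> float:
    """Seconds between end of speech and the first transcript received after it."""
    return first_transcript_time - end_of_speech_time


# Hypothetical timeline, in seconds (e.g. values taken from time.time()).
end_of_speech_time = 100.00     # VAD or STT reports end of speech
first_transcript_time = 100.18  # first interim/final transcript after that EOS

delay = first_transcript_after_eos_delay(end_of_speech_time, first_transcript_time)
print(f"{delay:.2f}s")  # prints "0.18s"
```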

Why

The existing transcription_delay is computed from the last speaking anchor to the final transcript, so it reflects end-to-end turn latency rather than "speech end → first result". As noted in the issue, a "speech end → first transcript" metric is needed for comparing provider latency and aligning with backend SLA definitions. This new metric is distinct from and complementary to transcription_delay.

What changed

AudioRecognition now tracks two new per-turn anchors:

_end_of_speech_time — set on VAD END_OF_SPEECH and STT END_OF_SPEECH (stt turn-detection mode).
_first_transcript_after_eos_time — set to the arrival time of the first transcript event (interim / preflight / final) that occurs after an end-of-speech, and only set once per turn so later transcripts don't move it.
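The anchor behaviour described above can be sketched roughly as follows. This is not the AudioRecognition implementation: the class name TurnAnchors and its methods are hypothetical, and keeping only the first EOS of a turn is an assumption, since the description does not say how repeated EOS events within one turn are handled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TurnAnchors:
    """Hypothetical per-turn anchors mirroring the two fields described above."""

    end_of_speech_time: float | None = None
    first_transcript_after_eos_time: float | None = None

    def on_end_of_speech(self, now: float) -> None:
        # Assumption: keep the first EOS (VAD or STT) seen in this turn.
        if self.end_of_speech_time is None:
            self.end_of_speech_time = now

    def on_transcript(self, now: float) -> None:
        # Ignore transcripts that arrive before (or exactly at) any EOS.
        if self.end_of_speech_time is None or now <= self.end_of_speech_time:
            return
        # Set once per turn so later transcripts do not move the anchor.
        if self.first_transcript_after_eos_time is None:
            self.first_transcript_after_eos_time = now

    def delay(self) -> float | None:
        # None means no transcript arrived strictly after EOS.
        if self.end_of_speech_time is None or self.first_transcript_after_eos_time is None:
            return None
        return self.first_transcript_after_eos_time - self.end_of_speech_time

    def reset(self) -> None:
        # Clear both anchors at the start of a new turn.
        self.end_of_speech_time = None
        self.first_transcript_after_eos_time = None


anchors = TurnAnchors()
anchors.on_transcript(9.8)      # before EOS: ignored
anchors.on_end_of_speech(10.0)  # EOS anchors the turn
anchors.on_transcript(10.25)    # first transcript after EOS
anchors.on_transcript(10.9)     # later transcript: anchor unchanged
print(anchors.delay())          # prints 0.25
```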

The computed delay is plumbed through additively:

_EndOfTurnInfo.first_transcript_after_eos_delay
EOUMetrics.first_transcript_after_eos_delay (defaults to 0.0)
the user MetricsReport
log_metrics output for EOUMetrics
the user_turn telemetry span (lk.first_transcript_after_eos_delay)

The metric is computed independently of the VAD speaking anchors used by transcription_delay / end_of_turn_delay, and falls back to the VAD EOS as its anchor when STT EOS is unavailable, per the issue's note. When no transcript arrives strictly after EOS, the value is left unset (None internally, 0.0 on the public model) rather than reporting a misleading number. The change is additive and backward-compatible: no existing field or default changes.
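To illustrate the "None internally, 0.0 on the public model" convention, here is a small sketch. EOUMetricsSketch and to_public_metrics are hypothetical stand-ins built on a stdlib dataclass; the real EOUMetrics model and its other fields are not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EOUMetricsSketch:
    """Hypothetical stand-in for the public metrics model (one field only)."""

    first_transcript_after_eos_delay: float = 0.0


def to_public_metrics(internal_delay: float | None) -> EOUMetricsSketch:
    # An unset internal value (None) is surfaced as the field's default, 0.0.
    if internal_delay is None:
        return EOUMetricsSketch()
    return EOUMetricsSketch(first_transcript_after_eos_delay=internal_delay)


measured = to_public_metrics(0.25)
unmeasured = to_public_metrics(None)
print(measured.first_transcript_after_eos_delay)    # prints 0.25
print(unmeasured.first_transcript_after_eos_delay)  # prints 0.0
```

Because a transcript only counts when it arrives strictly after EOS, a measured delay should always be positive, so under this convention a consumer could plausibly read 0.0 as "not measured".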

Testing

New unit test tests/test_first_transcript_after_eos.py drives AudioRecognition directly with crafted VAD/STT events (no audio, no network) and asserts:

a VAD END_OF_SPEECH event anchors _end_of_speech_time;
the first transcript after EOS anchors the time and later transcripts don't overwrite it;
a transcript arriving before any EOS does not set the anchor;
the value reported on _EndOfTurnInfo equals first_transcript_time - end_of_speech_time.

$ uv run pytest tests/test_first_transcript_after_eos.py --unit
4 passed

Also verified no regressions in the existing audio-recognition / session suites:

$ uv run pytest tests/test_speech_start_time_persistence.py \
    tests/test_audio_recognition_push_audio.py \
    tests/test_audio_recognition_aclose.py \
    tests/test_agent_session.py tests/test_session_host.py --unit

Lint / format / types on the changed files:

$ uv run ruff check <changed files>          # All checks passed!
$ uv run ruff format --check <changed files>  # already formatted
$ mypy --strict (pydantic plugin) -p livekit.agents  # no errors in changed modules

AI-assisted: this change was prepared with AI assistance and reviewed/verified by me.

Add a new end-of-turn latency metric, `first_transcript_after_eos_delay`, computed as `first_transcript_time - end_of_speech_time`.

The existing `transcription_delay` measures the time from the last speaking anchor to the *final* transcript, which reflects end-to-end latency. The new metric measures the time from end-of-speech (VAD or STT EOS) to the *first* transcript event (interim or final) received after it, which is useful for comparing provider latency and aligning with backend SLA definitions.

`AudioRecognition` now tracks two new per-turn anchors: the end-of-speech time (set on VAD/STT END_OF_SPEECH) and the time of the first transcript event after it. The delay is surfaced on `_EndOfTurnInfo`, `EOUMetrics`, the user `MetricsReport`, the `log_metrics` output and the `user_turn` telemetry span.

The field defaults to 0.0 when no end of speech was detected, keeping the change additive and backward-compatible.

Closes livekit#4795

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.

anshulkulhari7 requested a review from a team as a code owner June 15, 2026 09:57

devin-ai-integration Bot reviewed Jun 15, 2026

anshulkulhari7 wants to merge 1 commit into livekit:main from anshulkulhari7:feat/4795-first-transcript-after-eos-metric
