feat(voice): add first_transcript_after_eos_delay metric #6105
Open
anshulkulhari7 wants to merge 1 commit into
Add a new end-of-turn latency metric, `first_transcript_after_eos_delay`, computed as `first_transcript_time - end_of_speech_time`. The existing `transcription_delay` measures the time from the last speaking anchor to the *final* transcript, which reflects end-to-end latency. The new metric measures the time from end-of-speech (VAD or STT EOS) to the *first* transcript event (interim or final) received after it, which is useful for comparing provider latency and aligning with backend SLA definitions. `AudioRecognition` now tracks two new per-turn anchors: the end-of-speech time (set on VAD/STT END_OF_SPEECH) and the time of the first transcript event after it. The delay is surfaced on `_EndOfTurnInfo`, `EOUMetrics`, the user `MetricsReport`, the `log_metrics` output and the `user_turn` telemetry span. The field defaults to 0.0 when no end of speech was detected, keeping the change additive and backward-compatible. Closes livekit#4795
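To make the difference between the two delays concrete, here is a small worked example. It is only a sketch: the `TurnTimestamps` dataclass, the helper functions, and the timestamps are hypothetical and are not taken from the PR's code. It only mirrors the two formulas described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TurnTimestamps:
    """Hypothetical per-turn timestamps in seconds (not the PR's actual types)."""

    last_speaking_time: float
    end_of_speech_time: Optional[float]
    first_transcript_time: Optional[float]  # first interim/final after EOS
    final_transcript_time: float


def transcription_delay(t: TurnTimestamps) -> float:
    # Existing metric: last speaking anchor -> final transcript.
    return t.final_transcript_time - t.last_speaking_time


def first_transcript_after_eos_delay(t: TurnTimestamps) -> float:
    # New metric: end of speech -> first transcript after it.
    # Assumption: 0.0 is reported when either anchor is missing.
    if t.end_of_speech_time is None or t.first_transcript_time is None:
        return 0.0
    return t.first_transcript_time - t.end_of_speech_time


turn = TurnTimestamps(
    last_speaking_time=10.0,
    end_of_speech_time=10.5,
    first_transcript_time=10.75,
    final_transcript_time=11.25,
)
print(transcription_delay(turn))               # 1.25
print(first_transcript_after_eos_delay(turn))  # 0.25
```

With these made-up numbers the end-to-end figure is 1.25 s, while the provider returned its first result only 0.25 s after end of speech. That gap is what the new metric is meant to expose.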
Summary
Closes #4795.
Adds a new end-of-turn latency metric, `first_transcript_after_eos_delay`, computed as `first_transcript_time - end_of_speech_time`, where `end_of_speech_time` is captured on VAD or STT end-of-speech, and `first_transcript_time` is the first interim/final transcript event received after that EOS.

Why
The existing `transcription_delay` is computed from the last speaking anchor to the final transcript, so it reflects end-to-end turn latency rather than "speech end → first result". As noted in the issue, a "speech end → first transcript" metric is needed for comparing provider latency and aligning with backend SLA definitions. This new metric is distinct from and complementary to `transcription_delay`.

What changed
`AudioRecognition` now tracks two new per-turn anchors:

- `_end_of_speech_time`: set on VAD `END_OF_SPEECH` and STT `END_OF_SPEECH` (`stt` turn-detection mode).
- `_first_transcript_after_eos_time`: set to the arrival time of the first transcript event (interim / preflight / final) that occurs after an end-of-speech, and only set once per turn so later transcripts don't move it.

The computed delay is plumbed through additively:

- `_EndOfTurnInfo.first_transcript_after_eos_delay`
- `EOUMetrics.first_transcript_after_eos_delay` (defaults to `0.0`)
- `MetricsReport`
- `log_metrics` output for `EOUMetrics`
- `user_turn` telemetry span (`lk.first_transcript_after_eos_delay`)

The metric is computed independently of the VAD speaking anchors used by `transcription_delay` / `end_of_turn_delay`, and falls back to `0.0` (VAD EOS) when STT EOS is unavailable, per the issue's note. When no transcript arrives strictly after EOS the value is left unset (`None` internally, `0.0` on the public model), which is honest rather than reporting a misleading value. The change is additive and backward-compatible: no existing field or default changes.

Testing
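As a rough illustration of the anchor behaviour that the test exercises, the per-turn logic can be sketched as a standalone class. `EosAnchorTracker` and its method names are hypothetical and are not the real `AudioRecognition` API. The sketch assumes a single end-of-speech per turn and timestamps passed in explicitly.

```python
from typing import Optional


class EosAnchorTracker:
    """Hypothetical stand-in for the two per-turn anchors described above."""

    def __init__(self) -> None:
        self.end_of_speech_time: Optional[float] = None
        self.first_transcript_after_eos_time: Optional[float] = None

    def on_end_of_speech(self, now: float) -> None:
        # Anchor set on a VAD or STT END_OF_SPEECH event.
        self.end_of_speech_time = now

    def on_transcript(self, now: float) -> None:
        # Record only the first transcript strictly after EOS;
        # later transcripts in the same turn do not move the anchor.
        if self.end_of_speech_time is None:
            return
        if self.first_transcript_after_eos_time is not None:
            return
        if now > self.end_of_speech_time:
            self.first_transcript_after_eos_time = now

    def delay(self) -> Optional[float]:
        # None when no transcript arrived strictly after EOS.
        if self.end_of_speech_time is None:
            return None
        if self.first_transcript_after_eos_time is None:
            return None
        return self.first_transcript_after_eos_time - self.end_of_speech_time

    def public_delay(self) -> float:
        # Assumption: the public model reports 0.0 when the value is unset.
        d = self.delay()
        return 0.0 if d is None else d

    def reset(self) -> None:
        # Clear both anchors at the start of a new turn.
        self.end_of_speech_time = None
        self.first_transcript_after_eos_time = None


tracker = EosAnchorTracker()
tracker.on_transcript(1.0)     # before EOS: ignored
tracker.on_end_of_speech(2.0)
tracker.on_transcript(2.5)     # first transcript after EOS: recorded
tracker.on_transcript(3.0)     # later transcript: anchor unchanged
print(tracker.delay())         # 0.5
```

Passing timestamps in explicitly, rather than reading a clock inside the class, keeps the sketch deterministic. It is in the same spirit as driving the recognizer with crafted events and no audio or network.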
New unit test `tests/test_first_transcript_after_eos.py` drives `AudioRecognition` directly with crafted VAD/STT events (no audio, no network) and asserts:

- an `END_OF_SPEECH` event anchors `_end_of_speech_time`;
- the delay on `_EndOfTurnInfo` equals `first_transcript_time - end_of_speech_time`.

Also verified no regressions in the existing audio-recognition / session suites, and ran lint / format / type checks on the changed files.
AI-assisted: this change was prepared with AI assistance and reviewed/verified by me.