feat(soniox): support stt-rt-v5 with endpoint_sensitivity option #6126

Open
mihafabcic-soniox wants to merge 3 commits into livekit:main from mihafabcic-soniox:feat/soniox-stt-v5
@mihafabcic-soniox

Contributor

Updates the LiveKit Soniox plugin for the v5 model.

  • Add endpoint_sensitivity to STTOptions (float | None, range -1.0 to 1.0). Controls how quickly the model commits endpoints. Higher values finalize sooner. Only supported by v5; earlier models reject the field. Skipped on the wire when None so the server uses its default.
  • Default STT model is now stt-rt-v5.
  • Default max_endpoint_delay_ms raised from 500 (the API minimum) to 2000. The old default was too aggressive on phone-call audio: short pauses between word or digit groups would cause Soniox to finalize a segment too early, before the model had enough context. 2000 matches the Soniox API's own default.
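
A minimal sketch of the "skipped on the wire when None" behavior described above. `STTOptionsSketch` and `build_request_config` are hypothetical stand-ins, not the plugin's actual classes, and the payload keys are assumed to mirror the option names used in this description.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class STTOptionsSketch:
    """Hypothetical stand-in for the plugin's STTOptions (not the real class)."""

    model: str = "stt-rt-v5"
    max_endpoint_delay_ms: int = 2000
    # v5 only; None means "leave it out and let the server use its default".
    endpoint_sensitivity: Optional[float] = None


def build_request_config(opts: STTOptionsSketch) -> Dict[str, Any]:
    """Build a config payload, omitting endpoint_sensitivity when it is None."""
    config: Dict[str, Any] = {
        "model": opts.model,
        "max_endpoint_delay_ms": opts.max_endpoint_delay_ms,
    }
    if opts.endpoint_sensitivity is not None:
        # Range check mirrors the -1.0 to 1.0 bounds stated in the description.
        if not -1.0 <= opts.endpoint_sensitivity <= 1.0:
            raise ValueError("endpoint_sensitivity must be between -1.0 and 1.0")
        config["endpoint_sensitivity"] = opts.endpoint_sensitivity
    return config
```

Omitting the key entirely, rather than sending a null, keeps requests to earlier models unchanged as long as the caller leaves the option unset.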

2000ms is the Soniox API's own default and works well in practice. The
previous value of 500ms, the API minimum, is too aggressive and can
cause word recognition issues when the model finalizes tokens too early.
@mihafabcic-soniox mihafabcic-soniox requested a review from a team as a code owner June 16, 2026 15:00

@devin-ai-integration devin-ai-integration Bot left a comment

Contributor


Devin Review found 1 potential issue.

enable_language_identification: bool = True

- max_endpoint_delay_ms: int = 500
+ max_endpoint_delay_ms: int = 2000


🚩 Breaking default change: max_endpoint_delay_ms 500 → 2000

The default max_endpoint_delay_ms changed from 500 to 2000. This is a 4× increase in the maximum endpoint detection delay, meaning existing users who rely on the default will experience noticeably later speech finalization. While this appears intentional for the v5 model, it is a behavioral breaking change for any caller that constructs STTOptions() without explicitly setting this field. The livekit-plugins-inworld plugin references soniox/stt-rt-v4 in comments (livekit-plugins/livekit-plugins-inworld/livekit/plugins/inworld/stt.py:55); that plugin may also need updating if it depends on these defaults.
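
For callers that depend on the earlier timing, one option is to pin the value explicitly instead of relying on the default. The sketch below uses a hypothetical `STTOptionsSketch` dataclass standing in for the plugin's STTOptions; the real constructor may take other arguments.

```python
from dataclasses import dataclass


@dataclass
class STTOptionsSketch:
    """Hypothetical stand-in for STTOptions; only the field under discussion."""

    max_endpoint_delay_ms: int = 2000  # new default in this PR (was 500)


# Relying on the default picks up the new, later finalization behavior.
default_opts = STTOptionsSketch()

# Pinning the previous value keeps the pre-PR endpoint timing.
pinned_opts = STTOptionsSketch(max_endpoint_delay_ms=500)

print(default_opts.max_endpoint_delay_ms, pinned_opts.max_endpoint_delay_ms)
```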
