feat(plugins-soniox): surface per-run language segments #1602

Open
rosetta-livekit-bot[bot] wants to merge 3 commits into main from manuring-hurling-eloped

rosetta-livekit-bot[bot] commented May 25, 2026

Summary

Fixes #5685 (and the follow-up source-side symptom raised in the comment thread, which @chenghao-mou approved bundling into the same PR).

Both halves are the same plugin bug: _TokenAccumulator._lang_segments is built per-run by the existing coalescing logic but then dropped in send_endpoint_transcript (and the interim path). The fix surfaces it through new SpeechData fields on the target side, and stops dropping it on the source side in non-translation mode.
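For readers unfamiliar with the accumulator, the per-run coalescing can be pictured roughly as follows. This is a minimal sketch that assumes a hypothetical `Token` type with only `text` and `language` fields. It is not the plugin's `_TokenAccumulator` code, and the real Soniox token payload carries more fields.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical stand-in for a Soniox token: just its text and language."""

    text: str
    language: str


def coalesce_lang_segments(tokens: list[Token]) -> list[tuple[str, str]]:
    """Merge consecutive tokens that share a language into (language, text) runs.

    Illustrative only; not the plugin's actual _TokenAccumulator logic.
    """
    segments: list[tuple[str, str]] = []
    for tok in tokens:
        if segments and segments[-1][0] == tok.language:
            # Same language as the current run: extend that run's text.
            lang, text = segments[-1]
            segments[-1] = (lang, text + tok.text)
        else:
            # First token, or the language switched: start a new run.
            segments.append((tok.language, tok.text))
    return segments


tokens = [
    Token("Hello,", "en"),
    Token(" how are you?", "en"),
    Token(" Estoy bien,", "es"),
    Token(" gracias.", "es"),
]
print(coalesce_lang_segments(tokens))
# [('en', 'Hello, how are you?'), ('es', ' Estoy bien, gracias.')]
```

These per-run pairs are the data that the fix stops discarding.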

Changes

  • stt.SpeechData: add target_languages / target_texts (symmetric to existing source_languages / source_texts). Same LanguageCode coercion in __post_init__. Default None, so the addition is strictly additive for every other plugin.
  • Soniox plugin, translation mode: populate target_* from final._lang_segments on FINAL_TRANSCRIPT and INTERIM_TRANSCRIPT / PREFLIGHT_TRANSCRIPT. Consumers now see the per-run target breakdown for code-switched two-way translation, e.g. target_languages=["en", "es"] / target_texts=["Hello, how are you?", " Estoy bien, gracias."] for the translation of "Hello, ¿cómo estás? I'm doing fine, gracias.".
  • Soniox plugin, non-translation mode: populate source_* from the same accumulator (previously None). A code-switched ja + en utterance now surfaces source_languages=["ja", "en"] / source_texts=["こんにちは、私の名はサムです。", " My name is Sam."] -- matches what the SpeechData docstring already promised for "multi-language detection services".
  • Refactor: extract a _lang_segments_to_fields helper to DRY the conversion across both modes and both event paths; the four duplicated inline list comprehensions collapse into one named operation. The predicate that distinguishes source from target is now data-presence-based (whether final_original._lang_segments has any runs) rather than config-based (is_translation_mode is not None). That change is what lets one code path handle both halves cleanly.

SpeechData.text and SpeechData.language are unchanged for back-compat (still the full concatenation and the first translated/detected language, respectively).
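The back-compat contract can be illustrated with a trimmed, hypothetical stand-in for `SpeechData` that carries only the fields discussed in this PR. The real class has more fields, and `LanguageCode` is again a placeholder `str` subclass. The sketch builds the translation-mode final from the summary's example and shows how a consumer might walk the per-run lists.

```python
from __future__ import annotations

from dataclasses import dataclass


class LanguageCode(str):
    """Hypothetical stand-in for the framework's LanguageCode string type."""


def _coerce_languages(values: list[str] | None) -> list[LanguageCode] | None:
    # None stays None; plain strings are wrapped; LanguageCode instances pass through unchanged.
    if values is None:
        return None
    return [v if isinstance(v, LanguageCode) else LanguageCode(v) for v in values]


@dataclass
class SpeechData:
    """Trimmed illustration with only the fields discussed in this PR; not the real class."""

    language: str
    text: str
    source_languages: list[str] | None = None
    source_texts: list[str] | None = None
    target_languages: list[str] | None = None
    target_texts: list[str] | None = None

    def __post_init__(self) -> None:
        # Same coercion on both sides, so the new target_* fields mirror source_*.
        self.source_languages = _coerce_languages(self.source_languages)
        self.target_languages = _coerce_languages(self.target_languages)


# Translation-mode final for the code-switched example in the summary.
target_languages = ["en", "es"]
target_texts = ["Hello, how are you?", " Estoy bien, gracias."]
data = SpeechData(
    language=target_languages[0],  # back-compat: first translated language
    text="".join(target_texts),  # back-compat: full concatenation
    target_languages=target_languages,
    target_texts=target_texts,
)

# A consumer that wants the per-run breakdown iterates the parallel lists together.
for lang, text in zip(data.target_languages, data.target_texts):
    print(f"{lang}: {text!r}")
```

Because `text` is still the plain concatenation, consumers that ignore the new fields see the same output as before.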

Test plan

  • 14 new unit tests in tests/test_plugin_soniox_stt.py covering:
    • SpeechData.__post_init__ target_languages coercion (strings → LanguageCode, None stays None, existing LanguageCode passthrough)
    • _TokenAccumulator._lang_segments per-run coalescing
    • _lang_segments_to_fields helper edge cases (empty → (None, None), non-empty → parallel lists with LanguageCode coercion)
    • Two-way translation, code-switched (the issue's canonical example)
    • One-way translation (single target run)
    • "none" untranslated chunk inside a translated utterance (asymmetric per-run list lengths)
    • Interim path: translation mode merging final + non-final per run on both sides
    • Interim path: non-translation mode populates source_* from final + non-final merged
    • Non-translation single-language: source_* populated, target_* None
    • Non-translation code-switched JA+EN: source_* carries the per-run breakdown
  • Live-verified end-to-end against the real Soniox WebSocket API in console mode:
    • Translation mode, code-switched "Hello, ¿cómo estás? I'm doing fine, gracias." → text="Hello, how are you? Estoy bien, gracias.", target_languages=["en", "es"], target_texts=["Hello, how are you?", " Estoy bien, gracias."], "".join(target_texts) == text. Source side unchanged.
    • Non-translation mode, code-switched " こんにちは、私の名はサムです。 My name is Sam." → text=" こんにちは、私の名はサムです。 My name is Sam.", source_languages=["ja", "en"], source_texts=[" こんにちは、私の名はサムです。", " My name is Sam."], target_* correctly None. Interim events also surface the multi-language source breakdown progressively as the user code-switches.
  • ruff format clean, ruff check clean, no new mypy --strict errors introduced in changed files.
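The following pytest-style sketch is written in the spirit of the interim-path tests listed above. `merge_runs` is a hypothetical helper written only for this illustration. It assumes that an interim event appends the non-final runs after the final runs and joins the boundary run when both sides share a language. The actual tests in tests/test_plugin_soniox_stt.py exercise the plugin itself and may be structured differently.

```python
from __future__ import annotations


def merge_runs(
    final_runs: list[tuple[str, str]],
    nonfinal_runs: list[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Append non-final runs after final runs, joining across the boundary
    when the language does not change. Hypothetical helper, not plugin code."""
    merged = list(final_runs)  # copy so the caller's list is left untouched
    for lang, text in nonfinal_runs:
        if merged and merged[-1][0] == lang:
            merged[-1] = (lang, merged[-1][1] + text)
        else:
            merged.append((lang, text))
    return merged


def test_interim_merges_final_and_nonfinal_per_run() -> None:
    final_runs = [("ja", "こんにちは")]
    nonfinal_runs = [("ja", "、私の名はサムです。"), ("en", " My name")]
    assert merge_runs(final_runs, nonfinal_runs) == [
        ("ja", "こんにちは、私の名はサムです。"),
        ("en", " My name"),
    ]


def test_interim_with_no_final_runs_is_just_nonfinal() -> None:
    assert merge_runs([], [("en", " Hello")]) == [("en", " Hello")]
```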

Follow-ups (intentionally not in this PR)

  • The final / final_original accumulator names are honest about routing today but the new target_* fields make their two-mode roles more glaring (final is "primary user-facing accumulator", final_original is "source-side accumulator that's empty in non-translation mode"). Worth a separate behavior-preserving rename PR to final_primary / final_source.
  • The new target_* fields are wired in Soniox only; other translation-capable plugins (Gladia, Deepgram v2, AWS) can adopt them in follow-up PRs.

changeset-bot[bot] commented May 25, 2026

🦋 Changeset detected

Latest commit: 96ea001

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 34 packages
Name Type
@livekit/agents Patch
@livekit/agents-plugin-soniox Patch
@livekit/agents-plugin-anam Patch
@livekit/agents-plugin-assemblyai Patch
@livekit/agents-plugin-baseten Patch
@livekit/agents-plugin-bey Patch
@livekit/agents-plugin-cartesia Patch
@livekit/agents-plugin-cerebras Patch
@livekit/agents-plugin-deepgram Patch
@livekit/agents-plugin-elevenlabs Patch
@livekit/agents-plugin-fishaudio Patch
@livekit/agents-plugin-google Patch
@livekit/agents-plugin-hedra Patch
@livekit/agents-plugin-hume Patch
@livekit/agents-plugin-inworld Patch
@livekit/agents-plugin-lemonslice Patch
@livekit/agents-plugin-liveavatar Patch
@livekit/agents-plugin-livekit Patch
@livekit/agents-plugin-minimax Patch
@livekit/agents-plugin-mistral Patch
@livekit/agents-plugin-mistralai Patch
@livekit/agents-plugin-neuphonic Patch
@livekit/agents-plugin-openai Patch
@livekit/agents-plugin-perplexity Patch
@livekit/agents-plugin-phonic Patch
@livekit/agents-plugin-resemble Patch
@livekit/agents-plugin-rime Patch
@livekit/agents-plugin-runway Patch
@livekit/agents-plugin-sarvam Patch
@livekit/agents-plugin-silero Patch
@livekit/agents-plugin-tavus Patch
@livekit/agents-plugins-test Patch
@livekit/agents-plugin-trugen Patch
@livekit/agents-plugin-xai Patch

devin-ai-integration[bot]

This comment was marked as resolved.
