feat(aws): add auto language detection and mid-stream language switch… #5435

Open
cldsime wants to merge 1 commit into livekit:main from cldsime:feature/auto-language-detection

cldsime commented Apr 13, 2026

Summary

Adds support for Amazon Transcribe's automatic language identification parameters to the livekit-plugins-aws STT plugin, enabling auto language detection and mid-stream language switching without requiring users to manually specify a language_code.

Changes

Six new parameters are added to STTOptions and STT.__init__():

  • identify_language — detect the dominant language for the stream
  • identify_multiple_languages — detect language switches mid-stream
  • language_options — comma-separated list of expected language codes (2–12 required)
  • preferred_language — bias detection toward a specific language
  • vocabulary_names — custom vocabularies per language
  • vocabulary_filter_names — vocabulary filters per language

All default to disabled (False / NOT_GIVEN), preserving full backward compatibility.
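The parameter rules above can be sketched as a small validation step. This is a minimal illustration only: `LanguageIdOptions` and `validate_language_id` are hypothetical names, not part of the plugin, and rejecting both identification flags at once is an assumption rather than behavior taken from the PR. The 2–12 bound comes from the parameter list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LanguageIdOptions:
    """Hypothetical stand-in for the language-identification fields of STTOptions."""

    identify_language: bool = False
    identify_multiple_languages: bool = False
    language_options: Optional[str] = None  # comma-separated, e.g. "en-US,es-US"


def validate_language_id(opts: LanguageIdOptions) -> list[str]:
    """Return the parsed language codes, or raise ValueError on an invalid combination."""
    # Assumption: the two identification modes are treated as mutually exclusive.
    if opts.identify_language and opts.identify_multiple_languages:
        raise ValueError("enable only one of identify_language / identify_multiple_languages")

    # Split the comma-separated string and drop empty entries.
    codes = [c.strip() for c in (opts.language_options or "").split(",") if c.strip()]

    enabled = opts.identify_language or opts.identify_multiple_languages
    # The PR description states that 2-12 language codes are required.
    if enabled and not 2 <= len(codes) <= 12:
        raise ValueError("language_options must list between 2 and 12 language codes")
    return codes
```

With everything left at its default, the function returns an empty list and raises nothing, which mirrors the backward-compatible defaults.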

Conditional config building in SpeechStream._run():

language_code and identify_language/identify_multiple_languages are mutually exclusive per the AWS API. The config builder now conditionally sends one or the other:

if self._opts.identify_language:
    live_config["identify_language"] = True
    ...
elif self._opts.identify_multiple_languages:
    live_config["identify_multiple_languages"] = True
    ...
else:
    live_config["language_code"] = self._opts.language
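For readers who want to try the branching in isolation, here is a self-contained sketch of the same idea. `build_language_config` is a hypothetical helper, not the plugin's code, and attaching `language_options` in the identification branches is an assumption about what the elided lines might do.

```python
from typing import Any, Optional


def build_language_config(
    language: Optional[str],
    identify_language: bool = False,
    identify_multiple_languages: bool = False,
    language_options: Optional[str] = None,
) -> dict[str, Any]:
    """Return only the language-related keys of a request config."""
    config: dict[str, Any] = {}

    if identify_language:
        config["identify_language"] = True
    elif identify_multiple_languages:
        config["identify_multiple_languages"] = True
    else:
        # Fixed-language mode: send language_code and nothing else.
        config["language_code"] = language
        return config

    # Assumption: identification modes also forward the candidate languages.
    if language_options is not None:
        config["language_options"] = language_options
    return config
```

Because the fixed-language branch returns early, `language_code` can never appear alongside either identification flag.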

Bug fix: filtered_config boolean handling

The original filter {k: v for k, v in live_config.items() if v and is_given(v)} relies on truthiness, so it silently drops any option explicitly set to False. It is replaced with explicit type checking that handles booleans, numbers, and NOT_GIVEN values separately.
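The difference between the two filters can be reproduced outside the plugin. In the sketch below, `NOT_GIVEN` and `is_given` are local stand-ins for the framework's sentinel and helper, and `explicit_filter` is one possible shape for the replacement, not the code in this PR. Dropping empty strings is an assumption made for the example.

```python
from typing import Any


class _NotGiven:
    """Local stand-in for a NOT_GIVEN sentinel; falsy, like None."""

    def __bool__(self) -> bool:
        return False

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


def is_given(value: Any) -> bool:
    return not isinstance(value, _NotGiven)


def truthiness_filter(config: dict[str, Any]) -> dict[str, Any]:
    """The original pattern: any falsy value is dropped, including False and 0."""
    return {k: v for k, v in config.items() if v and is_given(v)}


def explicit_filter(config: dict[str, Any]) -> dict[str, Any]:
    """Keep booleans and numbers even when falsy; drop None, NOT_GIVEN, and empty strings."""
    kept: dict[str, Any] = {}
    for key, value in config.items():
        if value is None or not is_given(value):
            continue
        if isinstance(value, (bool, int, float)):
            kept[key] = value  # False and 0 are meaningful settings
        elif value:
            kept[key] = value  # non-empty strings and collections
    return kept


# Illustrative keys only; they are not claimed to match the plugin's config.
sample = {"language_code": "en-US", "some_flag": False, "session_id": NOT_GIVEN, "vocab": None}
```

Running both filters on `sample` shows that `truthiness_filter` loses `some_flag` while `explicit_filter` keeps it, and both drop the `NOT_GIVEN` and `None` entries.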

Usage

from livekit.plugins.aws import STT

# Existing behavior (unchanged)
stt = STT(language="en-US")

# Single-language auto-detection
stt = STT(identify_language=True, language_options="en-US,es-US,fr-FR")

# Multi-language mid-stream switching
stt = STT(
    identify_multiple_languages=True,
    language_options="en-US,es-US,fr-FR,de-DE,ja-JP,ko-KR,zh-HK,pt-BR,hi-IN,vi-VN,pl-PL,ru-RU",
)

Test Results

Tested with identify_multiple_languages=True and 12 language codes. All 12 configured languages were successfully detected across test sessions with mid-stream switching:

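| Language | Code | Detected |
|---|---|---|
| English | en-US | Yes |
| Spanish | es-US | Yes |
| French | fr-FR | Yes |
| German | de-DE | Yes |
| Japanese | ja-JP | Yes |
| Korean | ko-KR | Yes |
| Cantonese | zh-HK | Yes |
| Portuguese | pt-BR | Yes |
| Hindi | hi-IN | Yes |
| Vietnamese | vi-VN | Yes |
| Polish | pl-PL | Yes |
| Russian | ru-RU | Yes |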

Sample output showing mid-stream language switching in a single session:

[FINAL] [en-US] Good afternoon, everyone. Welcome to day 4.
[FINAL] [zh-HK] 你聽緊嘅係SBS電台廣東話節目
[FINAL] [vi-VN] Xin kính chào quý vị và các bạn
[FINAL] [pt-BR] Nós estamos aqui de azul hoje porque azul é a cor mais celebrando
[FINAL] [pl-PL] Dzień dobry, przy mikrofonie Joanna Borkowska Surucić

Backward Compatibility

All new parameters default to disabled. Existing code continues to work without any changes.

CLAassistant commented Apr 13, 2026

CLA assistant check
All committers have signed the CLA.

devin-ai-integration bot left a comment

Devin Review found 1 potential issue.

View 4 additional findings in Devin Review.

🔴 LanguageCode(None) crash when language identification is enabled and resp.language_code is missing

When identify_language or identify_multiple_languages is True, self._opts.language is set to None (lines 142-144 in the constructor). In _streaming_recognize_response_to_speech_data, line 431 calls LanguageCode(resp.language_code or self._opts.language). If resp.language_code is also None or empty (which can happen for partial results), the or expression evaluates to None and LanguageCode(None) is called. This crashes in _normalize_language (language.py:37) with AttributeError: 'NoneType' object has no attribute 'strip'.

(Refers to line 431)
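One way to avoid the crash described above is to resolve the language before constructing a LanguageCode, and to skip construction when nothing usable is available. The sketch below is a hypothetical guard, not a fix proposed in this PR; `resolve_language` is an illustrative name and operates on plain strings so that it runs without the plugin installed.

```python
from typing import Optional


def resolve_language(
    response_language: Optional[str],
    configured_language: Optional[str],
) -> Optional[str]:
    """Return the first non-empty language code, or None if neither is usable."""
    for candidate in (response_language, configured_language):
        # Treat None, "", and whitespace-only strings as missing.
        if candidate and candidate.strip():
            return candidate.strip()
    return None
```

A caller would then wrap the result in LanguageCode only when it is not None, and decide separately how to represent a partial result whose language is still unknown.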
