
fix(cartesia): expand TTS language support and add WebSocket timestamp params#5361

Open
Namit1867 wants to merge 1 commit into livekit:main from Namit1867:fix/cartesia-expand-language-and-ws-timestamp-params

Namit1867 commented Apr 7, 2026

Summary

This PR brings the livekit-plugins-cartesia TTS plugin in line with the latest Cartesia API by adding the language codes and parameters it was missing.

Changes

Language support (models.py)

  • Expanded TTSLanguages from 7 languages (en, es, fr, de, pt, zh, ja) to the full 42 languages now supported by the Cartesia TTS API
  • The added languages include: hi, ko, it, nl, pl, ru, sv, tr, tl, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, hu, no, vi, bn, th, he, ka, id, te, gu, kn, ml, mr, or, pa
  • Ref: https://docs.cartesia.ai/api-reference/tts/bytes#body-language
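
As a rough illustration of what the expanded alias could look like, the sketch below builds a `Literal` from the seven original codes plus the codes listed above and adds a small membership check. The alias body and the `is_supported_language` helper are assumptions for illustration, not the plugin's actual `models.py`. Note that the seven original codes plus the 36 codes listed above give 43 entries here, so the exact set should be confirmed against the Cartesia reference.

```python
from typing import Literal, get_args

# Hypothetical stand-in for the plugin's TTSLanguages alias.
# Codes are copied from the PR description; the real definition may differ.
TTSLanguages = Literal[
    # original seven
    "en", "es", "fr", "de", "pt", "zh", "ja",
    # codes listed as added in this PR
    "hi", "ko", "it", "nl", "pl", "ru", "sv", "tr", "tl", "bg", "ro", "ar",
    "cs", "el", "fi", "hr", "ms", "sk", "da", "ta", "uk", "hu", "no", "vi",
    "bn", "th", "he", "ka", "id", "te", "gu", "kn", "ml", "mr", "or", "pa",
]


def is_supported_language(code: str) -> bool:
    """Return True if `code` is one of the literal values in TTSLanguages."""
    return code in get_args(TTSLanguages)
```

A `Literal` alias only helps static type checkers; a runtime check like the helper above is needed if unknown codes should be rejected before a request is sent.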

WebSocket-only params (tts.py)

  • Added add_phoneme_timestamps: bool = False — requests phoneme-level timing data in streamed responses. Sent as add_phoneme_timestamps: true in the WebSocket payload when enabled.
  • Added use_normalized_timestamps: bool = False — toggles timestamp normalization in streamed responses.
  • Both params are exposed on TTS.__init__() and TTS.update_options().
  • Ref: https://docs.cartesia.ai/api-reference/tts/websocket
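
A minimal sketch of how the two flags might be folded into the WebSocket payload, assuming each key is only sent when its flag is enabled. The `SketchOptions` dataclass, the `build_ws_payload` helper, the placeholder model name, and the `model_id`, `transcript`, `language`, and `context_id` field names are assumptions for illustration; only `add_phoneme_timestamps` and `use_normalized_timestamps` come from this PR.

```python
import json
from dataclasses import dataclass


@dataclass
class SketchOptions:
    """Hypothetical options container; not the plugin's real options class."""
    model: str = "example-model"  # placeholder, not a real model name
    language: str = "en"
    add_phoneme_timestamps: bool = False
    use_normalized_timestamps: bool = False


def build_ws_payload(opts: SketchOptions, transcript: str, context_id: str) -> dict:
    """Build a request dict, adding the timestamp flags only when enabled."""
    payload = {
        "model_id": opts.model,
        "transcript": transcript,
        "language": opts.language,
        "context_id": context_id,
    }
    # Assumption: the keys are omitted entirely when the flags are False.
    if opts.add_phoneme_timestamps:
        payload["add_phoneme_timestamps"] = True
    if opts.use_normalized_timestamps:
        payload["use_normalized_timestamps"] = True
    return payload


example = build_ws_payload(
    SketchOptions(add_phoneme_timestamps=True), "Hello there", "ctx-1"
)
encoded = json.dumps(example)  # serialized form that would go over the socket
```

Omitting disabled flags keeps the default payload identical to what was sent before the change, which is one way to avoid affecting existing users.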

WebSocket response handling (tts.py)

  • Added explicit handling for the phoneme_timestamps key in _recv_task to avoid spurious "unexpected message" warnings when add_phoneme_timestamps is enabled. Phoneme timestamps are received but not currently surfaced through the output emitter (no phoneme-level transcript API exists in the base tts module).
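
The receive-side behavior described above can be sketched as a small dispatch function. This is an illustration only: `handle_message` and its return labels are hypothetical, and the `data` and `done` keys are assumed message shapes; only the `phoneme_timestamps` key is taken from this PR.

```python
import logging

logger = logging.getLogger("cartesia-sketch")


def handle_message(data: dict) -> str:
    """Classify one decoded WebSocket message and return a label for it."""
    if data.get("data"):
        # Assumed shape: audio chunks arrive under a "data" key.
        return "audio"
    if data.get("phoneme_timestamps"):
        # Recognized but deliberately dropped: nothing downstream
        # consumes phoneme-level timing yet.
        return "ignored"
    if data.get("done"):
        # Assumed shape: end-of-stream marker.
        return "done"
    # Anything else is still reported, as it was before the change.
    logger.warning("unexpected message: %s", data)
    return "unexpected"
```

Matching the key explicitly, rather than silencing the warning branch, keeps genuinely unknown messages visible in the logs.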

References

…p params

- Expand TTSLanguages from 7 to 42 languages to match Cartesia TTS API
  Ref: https://docs.cartesia.ai/api-reference/tts/bytes#body-language

- Add `add_phoneme_timestamps` param (WebSocket only) to request phoneme-level
  timing data from Cartesia
  Ref: https://docs.cartesia.ai/api-reference/tts/websocket

- Add `use_normalized_timestamps` param (WebSocket only) to toggle timestamp
  normalization in streamed responses
  Ref: https://docs.cartesia.ai/api-reference/tts/websocket

- Handle `phoneme_timestamps` key in WebSocket receive loop to avoid
  spurious "unexpected message" warnings when the feature is enabled

Both new params are exposed on TTS.__init__ and update_options().

devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.


Namit1867 (author) commented

@davidzhao Could you please review my PR when you get a chance? Thanks!
