
feat(STT) : Fallback Adapter for STT #1083

Open
gokuljs wants to merge 16 commits into livekit:main from gokuljs:fallback-adapter-for-sst

Conversation

@gokuljs
Contributor

@gokuljs gokuljs commented Mar 1, 2026

No description provided.

@changeset-bot

changeset-bot bot commented Mar 1, 2026

⚠️ No Changeset found

Latest commit: 13df187

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@gokuljs gokuljs marked this pull request as draft March 1, 2026 23:50
@gokuljs gokuljs changed the title from "Fallback adapter for sst" to "feat(SST) : Fallback Adapter for SST" Mar 1, 2026
devin-ai-integration[bot]

This comment was marked as resolved.

@gokuljs gokuljs changed the title from "feat(SST) : Fallback Adapter for SST" to "feat(STT) : Fallback Adapter for STT" Mar 1, 2026
@gokuljs
Contributor Author

gokuljs commented Mar 1, 2026

Please do not review it yet. It is still a work in progress.

gokuljs added 3 commits March 2, 2026 10:29
…to 'stt.FallbackAdapter' for consistency in STT module.
…gnition logic, including renaming recovery tasks and making timeout options optional.
@gokuljs gokuljs marked this pull request as ready for review March 8, 2026 20:20
devin-ai-integration[bot]

This comment was marked as resolved.

…ging for input forwarding errors and closing streams on removal to enhance resource management.
devin-ai-integration[bot]

This comment was marked as resolved.

…handling for recognition and streaming tasks, ensuring proper resource management and error logging during recovery attempts.
@gokuljs gokuljs marked this pull request as draft March 9, 2026 02:15
Contributor

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 2 new potential issues.

View 11 additional findings in Devin Review.


Comment on lines +40 to +41
export class FallbackAdapter extends STT {
label = 'stt.FallbackAdapter';
Contributor

@devin-ai-integration devin-ai-integration bot Mar 9, 2026


🟡 Missing TypeDoc documentation on exported classes/interfaces violates CONTRIBUTING.md

CONTRIBUTING.md requires: "If writing new methods/interfaces/enums/classes, document them. This project uses TypeDoc for automatic API documentation generation, and every new addition has to be properly documented." The FallbackAdapter class, FallbackAdapterOptions interface, AvailabilityChangedEvent interface, and public methods (stream, close, emitAvailabilityChanged, status getter) all lack JSDoc documentation. The TTS equivalent at agents/src/tts/fallback_adapter.ts:52-80 has extensive documentation with class description, features list, and usage example.


…empts, enhancing error handling and introducing delay between retries. Update FallbackSpeechStream to improve resource management by ensuring proper closure of output streams and adjusting logging levels for input forwarding errors.
@gokuljs gokuljs marked this pull request as ready for review March 10, 2026 03:43
devin-ai-integration[bot]

This comment was marked as resolved.

…improving error handling during recognition attempts.
@gokuljs gokuljs marked this pull request as draft March 10, 2026 13:39
@gokuljs gokuljs marked this pull request as ready for review March 10, 2026 13:39
devin-ai-integration[bot]

This comment was marked as resolved.

…ream errors more effectively, ensuring proper resource management and improved error reporting during STT transitions.
gokuljs added 2 commits March 10, 2026 23:08
…and retry intervals, enhancing responsiveness during STT operations.
…dation for configuration options and improved error logging during stream management, ensuring robustness in STT operations.
Contributor

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 2 potential issues.

View 6 additional findings in Devin Review.


available: boolean;
}

export class FallbackAdapter extends STT {
Contributor


🟡 Missing TypeDoc documentation on new exported classes and interfaces violates CONTRIBUTING.md

CONTRIBUTING.md mandates: "If writing new methods/interfaces/enums/classes, document them. This project uses TypeDoc for automatic API documentation generation, and every new addition has to be properly documented." The exported FallbackAdapter class, AvailabilityChangedEvent interface, and all public members (sttInstances, attemptTimeoutMs, maxRetryPerSTT, retryIntervalMs, status, emitAvailabilityChanged, stream, close) lack any JSDoc documentation. The TTS equivalent at agents/src/tts/fallback_adapter.ts properly documents every exported class, interface, and public member with JSDoc comments (e.g., lines 13-32, 34-41, 52-79, 82-87, 133-138, 149-164, 232-246, 248-258, 260-268).

Prompt for agents
Add JSDoc documentation to all exported and public members in agents/src/stt/fallback_adapter.ts, following the pattern established in agents/src/tts/fallback_adapter.ts. Specifically:

1. Add a JSDoc block above the FallbackAdapter class (line 40) describing its purpose, features, and an @example block.
2. Add JSDoc to each field of the AvailabilityChangedEvent interface (lines 35-38).
3. Add JSDoc to each public property of FallbackAdapter: sttInstances (line 43), attemptTimeoutMs (line 44), maxRetryPerSTT (line 45), retryIntervalMs (line 46).
4. Add JSDoc to public methods: status getter (line 99), emitAvailabilityChanged (line 110), stream (line 258), close (line 265).
5. Add JSDoc to the FallbackAdapterOptions interface (line 27) and its fields (lines 28-32).

Refer to agents/src/tts/fallback_adapter.ts lines 13-87 for the documentation style to follow.
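
As a rough illustration of the documentation style being requested, the sketch below shows TypeDoc-style comments on a cut-down adapter. It is not the PR's code: the STT base class is a minimal stand-in, the option fields are assumed to mirror the public properties named above, and the default for maxRetryPerSTT is a placeholder.

```typescript
// Sketch only: STT is a minimal stand-in for the real base class (assumption).
abstract class STT {
  abstract label: string;
}

/** Payload emitted when an STT instance changes availability. */
export interface AvailabilityChangedEvent {
  /** Whether the STT instance is currently usable. */
  available: boolean;
}

/** Options for {@link FallbackAdapter}. The field set here is assumed. */
export interface FallbackAdapterOptions {
  /** STT instances to try, in priority order. */
  sttInstances: STT[];
  /** Retries per STT before falling back (assumed optional). */
  maxRetryPerSTT?: number;
}

/**
 * Wraps several STT instances and falls back to the next one when the
 * current instance fails.
 *
 * @example
 * const adapter = new FallbackAdapter({ sttInstances: [primary, backup] });
 */
export class FallbackAdapter extends STT {
  label = 'stt.FallbackAdapter';

  /** STT instances in priority order. */
  readonly sttInstances: STT[];

  /** Retries per STT before falling back. */
  readonly maxRetryPerSTT: number;

  /**
   * @param opts - Adapter configuration.
   * @throws Error if no STT instance is provided.
   */
  constructor(opts: FallbackAdapterOptions) {
    super();
    if (opts.sttInstances.length === 0) {
      throw new Error('at least one STT instance is required');
    }
    this.sttInstances = opts.sttInstances;
    // Placeholder default for this sketch, not taken from the PR.
    this.maxRetryPerSTT = opts.maxRetryPerSTT ?? 1;
  }
}
```

The same pattern (one summary line per field, plus @param, @throws and @example where relevant) would extend to the status getter, emitAvailabilityChanged, stream and close.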

recoveringStreamTask: Task<void> | null;
}

interface FallbackAdapterOptions {
Contributor


🟡 FallbackAdapterOptions interface is not exported, unlike the TTS equivalent

The FallbackAdapterOptions interface at line 27 is not exported, meaning users cannot import this type to construct options objects with type safety. The TTS equivalent at agents/src/tts/fallback_adapter.ts:25 exports its FallbackAdapterOptions interface, and the TTS index at agents/src/tts/index.ts re-exports it. The STT index at agents/src/stt/index.ts:16 also doesn't re-export it. This is an inconsistency with the established pattern.

Suggested change
- interface FallbackAdapterOptions {
+ export interface FallbackAdapterOptions {

@gokuljs
Contributor Author

gokuljs commented Mar 10, 2026

@toubatbrian, could you please review this code and test it once?

I am also stuck on an issue that shows up during testing. With FallbackAdapter configured with maxRetryPerSTT: 1, failover does not behave as expected. According to the logs, Sarvam STT keeps retrying (3/32, 4/32, and so on) and continues all the way up to 32 retries before handing off to the next provider. Ideally, it should fail over after a single attempt. The behaviour is the same for other WebSocket providers such as Deepgram.

logs:

[01:07:23.915] DEBUG (9062): setting participant
    participantIdentity: "futuristic-bit"
[01:07:23.915] DEBUG (9062): setting participant audio input
    participant: "futuristic-bit"
[01:07:23.915] INFO (9062): participantValue.trackPublications
    participantValue: "futuristic-bit"
    trackPublications: [
      {
        "info": {
          "sid": "TR_AMSbfpHhzLAyeN",
          "name": "",
          "kind": "KIND_AUDIO",
          "source": "SOURCE_MICROPHONE",
          "simulcasted": false,
          "width": 0,
          "height": 0,
          "mimeType": "audio/red",
          "muted": false,
          "remote": true,
          "encryptionType": "NONE",
          "audioFeatures": [
            "TF_AUTO_GAIN_CONTROL",
            "TF_ECHO_CANCELLATION",
            "TF_NOISE_SUPPRESSION"
          ]
        },
        "ffiHandle": {},
        "subscribed": false
      }
    ]
    lengthOfTrackPublications: 1
[01:07:23.936] DEBUG (9062): onTrackSubscribed in _input
    participant: "futuristic-bit"
[01:07:25.093] DEBUG (9062): Task.runTask: task performToolExecutions done
[01:07:25.093] DEBUG (9062): Task.runTask: task performLLMInference done
[01:07:25.094] DEBUG (9062): Task.runTask: task performTextForwarding done
[01:07:28.085] DEBUG (9062): Task.runTask: task performTTSInference done
[01:07:28.790] INFO (9062): Sarvam STT connecting to: wss://api.sarvam.ai/speech-to-text/ws?model=saaras%3Av3&vad_signals=true&sample_rate=16000&input_audio_codec=pcm_s16le&language-code=en-IN&mode=transcribe
[01:07:28.957] WARN (9062): Failed to connect to Sarvam STT, retrying in 10s: Error: Unexpected server response: 403 (3/32)
[01:07:28.978] DEBUG (9062): Task.runTask: task performAudioForwarding done
[01:07:29.986] INFO (9062): playout completed without interruption
    speech_id: "speech_59e239a0-889"
    message: "Hello! How can I assist you today?"
[01:07:29.988] DEBUG (9062): Task.runTask: task AgentActivity.pipelineReply done
[01:07:30.020] DEBUG (9062): VAD task: START_OF_SPEECH
[01:07:31.709] DEBUG (9062): VAD task: END_OF_SPEECH
[01:07:31.710] DEBUG (9062): running EOU detection
    audioTranscript: ""
[01:07:31.710] DEBUG (9062): skipping EOU detection
[01:07:33.641] INFO (8702): Sarvam STT connecting to: wss://api.sarvam.ai/speech-to-text/ws?model=saaras%3Av3&vad_signals=true&sample_rate=16000&input_audio_codec=pcm_s16le&language-code=en-IN&mode=transcribe
[01:07:33.820] WARN (8702): Failed to connect to Sarvam STT, retrying in 10s: Error: Unexpected server response: 403 (5/32)
[01:07:38.321] DEBUG (8702): shutting down
    jobID: "AJ_ReguMk5nm3zw"
[01:07:38.322] DEBUG (5230): job exiting
[01:07:38.322] DEBUG (8702): Aborting all pipeline reply tasks due to interruption
    speech_id: "speech_80bd95db-602"
[01:07:38.324] DEBUG (8702): connection state changed
    state: 0
[01:07:38.850] DEBUG (9062): VAD task: START_OF_SPEECH
[01:07:38.958] INFO (9062): Sarvam STT connecting to: wss://api.sarvam.ai/speech-to-text/ws?model=saaras%3Av3&vad_signals=true&sample_rate=16000&input_audio_codec=pcm_s16le&language-code=en-IN&mode=transcribe
[01:07:39.131] WARN (9062): Failed to connect to Sarvam STT, retrying in 10s: Error: Unexpected server response: 403 (4/32)
[01:07:39.779] DEBUG (9062): VAD task: END_OF_SPEECH
[01:07:39.780] DEBUG (9062): running EOU detection
    audioTranscript: ""
[01:07:39.780] DEBUG (9062): skipping EOU detection
[01:07:43.212] INFO (9062): closing agent session due to participant disconnect (disable via `RoomInputOptions.closeOnDisconnect=False`)
    participant: "futuristic-bit"
    reason: "CLIENT_INITIATED"
[01:07:43.214] DEBUG (9062): Task.runTask: task AgentActivity_onExit started
[01:07:43.215] DEBUG (9062): Task.runTask: task AgentActivity_onExit done
[01:07:43.215] INFO (9062): mainTask: scheduling paused and no more speech tasks to wait
[01:07:43.215] INFO (9062): AgentActivity mainTask: exiting
[01:07:43.217] DEBUG (9062): User turn commit task cancelled
[01:07:43.218] DEBUG (9062): VAD task closed
[01:07:43.218] INFO (9062): AgentSession closed
    reason: "participant_disconnected"
    error: null
[01:07:43.823] INFO (8702): Sarvam STT connecting to: wss://api.sarvam.ai/speech-to-text/ws?model=saaras%3Av3&vad_signals=true&sample_rate=16000&input_audio_codec=pcm_s16le&language-code=en-IN&mode=transcribe
[01:07:44.045] WARN (8702): Failed to connect to Sarvam STT, retrying in 10s: Error: Unexpected server response: 403 (6/32)
[01:07:48.775] DEBUG (8915): shutting down
    jobID: "AJ_aiECbJwNnoZK"
[01:07:48.777] DEBUG (5230): job exiting
[01:07:48.777] DEBUG (8915): Session ended, report generated
    jobId: "AJ_aiECbJwNnoZK"
    roomName: "sbx-2p36os-24hGGH8bL2Pyk2eWFWtVdH"
    roomId: "RM_g4S7opoMJmuG"
    eventsCount: 5
[01:07:48.780] DEBUG (8915): disconnected from room
    jobID: "AJ_aiECbJwNnoZK"
[01:07:48.780] DEBUG (8915): native resources disposed
    jobID: "AJ_aiECbJwNnoZK"
[01:07:48.780] DEBUG (8915): Job process shutdown
    jobID: "AJ_aiECbJwNnoZK"
libc++abi: terminating due to uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument
[01:07:54.048] INFO (8702): Sarvam STT connecting to: wss://api.sarvam.ai/speech-to-text/ws?model=saaras%3Av3&vad_signals=true&sample_rate=16000&input_audio_codec=pcm_s16le&language-code=en-IN&mode=transcribe
[01:07:54.236] WARN (8702): Failed to connect to Sarvam STT, retrying in 10s: Error: Unexpected server response: 403 (7/32)
[01:08:04.238] INFO (8702): Sarvam STT connecting to: wss://api.sarvam.ai/speech-to-text/ws?model=saaras%3Av3&vad_signals=true&sample_rate=16000&input_audio_codec=pcm_s16le&language-code=en-IN&mode=transcribe
[01:08:04.552] WARN (8702): Failed to connect to Sarvam STT, retrying in 10s: Error: Unexpected server response: 403 (8/32)
[01:08:06.460] DEBUG (9062): shutting down
    jobID: "AJ_Ffb4pJ8SGDkt"
[01:08:06.463] DEBUG (5230): job exiting
[01:08:06.465] DEBUG (9062): Session ended, report generated
    jobId: "AJ_Ffb4pJ8SGDkt"
    roomName: "sbx-2p36os-mePUQ64qMaVC6MUBdBF279"
    roomId: "RM_ahxdmZYhxQNE"
    eventsCount: 14
[01:08:06.467] DEBUG (9062): disconnected from room
    jobID: "AJ_Ffb4pJ8SGDkt"
[01:08:06.468] DEBUG (9062): native resources disposed
    jobID: "AJ_Ffb4pJ8SGDkt"
[01:08:06.468] DEBUG (9062): Job process shutdown
    jobID: "AJ_Ffb4pJ8SGDkt"
libc++abi: terminating due to uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument
[01:08:14.555] INFO (8702): Sarvam STT connecting to: wss://api.sarvam.ai/speech-to-text/ws?model=saaras%3Av3&vad_signals=true&sample_rate=16000&input_audio_codec=pcm_s16le&language-code=en-IN&mode=transcribe
[01:08:14.725] WARN (8702): Failed to connect to Sarvam STT, retrying in 10s: Error: Unexpected server response: 403 (9/32)
[01:08:24.730] INFO (8702): Sarvam STT connecting to: wss://api.sarvam.ai/speech-to-text/ws?model=saaras%3Av3&vad_signals=true&sample_rate=16000&input_audio_codec=pcm_s16le&language-code=en-IN&mode=transcribe
[01:08:25.043] WARN (8702): Failed to connect to Sarvam STT, retrying in 10s: Error: Unexpected server response: 403 (10/32)
[01:08:35.045] INFO (8702): Sarvam STT connecting to: wss://api.sarvam.ai/speech-to-text/ws?model=saaras%3Av3&vad_signals=true&sample_rate=16000&input_audio_codec=pcm_s16le&language-code=en-IN&mode=transcribe
[01:08:35.221] WARN (8702): Failed to connect to Sarvam STT, retrying in 10s: Error: Unexpected server response: 403 (11/32)
[01:08:37.789] WARN (5230): job is unresponsive
[01:08:37.792] DEBUG (8702): SIGTERM received in job proc
    jobID: "AJ_ReguMk5nm3zw"
[01:08:45.223] INFO (8702): Sarvam STT connecting to: wss://api.sarvam.ai/speech-to-text/ws?model=saaras%3Av3&vad_signals=true&sample_rate=16000&input_audio_codec=pcm_s16le&language-code=en-IN&mode=transcribe
[01:08:45.403] WARN (8702): Failed to connect to Sarvam STT, retrying in 10s: Error: Unexpected server response: 403 (12/32)
[01:08:55.404] INFO (8702): Sarvam STT connecting to: wss://api.sarvam.ai/speech-to-text/ws?model=saaras%3Av3&vad_signals=true&sample_rate=16000&input_audio_codec=pcm_s16le&language-code=en-IN&mode=transcribe
[01:08:55.585] WARN (8702): Failed to connect to Sarvam STT, retrying in 10s: Error: Unexpected server response: 403 (13/32)
[01:09:05.587] INFO (8702): Sarvam STT connecting to: wss://api.sarvam.ai/speech-to-text/ws?model=saaras%3Av3&vad_signals=true&sample_rate=16000&input_audio_codec=pcm_s16le&language-code=en-IN&mode=transcribe
[01:09:05.761] WARN (8702): Failed to connect to Sarvam STT, retrying in 10s: Error: Unexpected server response: 403 (14/32)
[01:09:15.763] INFO (8702): Sarvam STT connecting to: wss://api.sarvam.ai/speech-to-text/ws?model=saaras%3Av3&vad_signals=true&sample_rate=16000&input_audio_codec=pcm_s16le&language-code=en-IN&mode=transcribe
[01:09:15.943] WARN (8702): Failed to connect to Sarvam STT, retrying in 10s: Error: Unexpected server response: 403 (15/32)
[01:09:25.945] INFO (8702): Sarvam STT connecting to: wss://api.sarvam.ai/speech-to-text/ws?model=saaras%3Av3&vad_signals=true&sample_rate=16000&input_audio_codec=pcm_s16le&language-code=en-IN&mode=transcribe
[01:09:26.126] WARN (8702): Failed to connect to Sarvam STT, retrying in 10s: Error: Unexpected server response: 403 (16/32)
[01:09:36.127] INFO (8702): Sarvam STT connecting to: wss://api.sarvam.ai/speech-to-text/ws?model=saaras%3Av3&vad_signals=true&sample_rate=16000&input_audio_codec=pcm_s16le&language-code=en-IN&mode=transcribe
[01:09:36.309] WARN (8702): Failed to connect to Sarvam STT, retrying in 10s: Error: Unexpected server response: 403 (17/32)
[01:09:46.313] INFO (8702): Sarvam STT connecting to: wss://api.sarvam.ai/speech-to-text/ws?model=saaras%3Av3&vad_signals=true&sample_rate=16000&input_audio_codec=pcm_s16le&language-code=en-IN&mode=transcribe
[01:09:46.585] WARN (8702): Failed to connect to Sarvam STT, retrying in 10s: Error: Unexpected server response: 403 (18/32)
[01:09:56.589] INFO (8702): Sarvam STT connecting to: wss://api.sarvam.ai/speech-to-text/ws?model=saaras%3Av3&vad_signals=true&sample_rate=16000&input_audio_codec=pcm_s16le&language-code=en-IN&mode=transcribe
[01:09:56.766] WARN (8702): Failed to connect to Sarvam STT, retrying in 10s: Error: Unexpected server response: 403 (19/32)
[01:10:06.768] INFO (8702): Sarvam STT connecting to: wss://api.sarvam.ai/speech-to-text/ws?model=saaras%3Av3&vad_signals=true&sample_rate=16000&input_audio_codec=pcm_s16le&language-code=en-IN&mode=transcribe
[01:10:06.954] WARN (8702): Failed to connect to Sarvam STT, retrying in 10s: Error: Unexpected server response: 403 (20/32)
[01:10:16.957] INFO (8702): Sarvam STT connecting to: wss://api.sarvam.ai/speech-to-text/ws?model=saaras%3Av3&vad_signals=true&sample_rate=16000&input_audio_codec=pcm_s16le&language-code=en-IN&mode=transcribe
[01:10:17.157] WARN (8702): Failed to connect to Sarvam STT, retrying in 10s: Error: Unexpected server response: 403 (21/32)
[01:10:27.163] INFO (8702): Sarvam STT connecting to: wss://api.sarvam.ai/speech-to-text/ws?model=saaras%3Av3&vad_signals=true&sample_rate=16000&input_audio_codec=pcm_s16le&language-code=en-IN&mode=transcribe
[01:10:27.353] WARN (8702): Failed to connect to Sarvam STT, retrying in 10s: Error: Unexpected server response: 403 (22/32)

From what I can tell, the root cause is that each plugin's SpeechStream.run() contains its own hardcoded maxRetry = 32 loop inside mainTask(). Because of this inner retry loop, the retry logic defined by FallbackAdapter through connOptions.maxRetry is effectively bypassed. In other words, connOptions.maxRetry is never respected by the plugin's internal retry logic.
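
To make the interaction concrete, here is a toy model of the nesting described above. It is a sketch, not the plugin code: run, mainTask and countAttempts are hypothetical stand-ins, and the retry accounting (a limit of N meaning N retries after the first attempt) is an assumption.

```typescript
interface ConnOptions {
  maxRetry: number;
}

// Toy model of a plugin's run(): it receives connOptions but ignores it
// and retries against a hardcoded bound instead.
async function run(_connOptions: ConnOptions, connect: () => Promise<void>): Promise<void> {
  const maxRetry = 32; // hardcoded, as described in the comment above
  let retries = 0;
  while (true) {
    try {
      await connect();
      return;
    } catch (err) {
      if (retries >= maxRetry) throw err;
      retries++;
    }
  }
}

// Toy model of the outer loop that is supposed to honour connOptions.maxRetry.
async function mainTask(connOptions: ConnOptions, connect: () => Promise<void>): Promise<void> {
  for (let i = 0; i <= connOptions.maxRetry; i++) {
    try {
      await run(connOptions, connect);
      return;
    } catch (err) {
      if (i === connOptions.maxRetry) throw err;
    }
  }
}

// Counts how many connection attempts happen when connect() always fails.
async function countAttempts(maxRetry: number): Promise<number> {
  let attempts = 0;
  const connect = async (): Promise<void> => {
    attempts++;
    throw new Error('Unexpected server response: 403');
  };
  await mainTask({ maxRetry }, connect).catch(() => undefined);
  return attempts;
}
```

In this model the outer limit only multiplies the inner one: every outer attempt still runs the full inner loop before the error surfaces, so a small connOptions.maxRetry cannot shorten the time to failover.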

I am unsure about the fix here. One option might be to remove the inner retry loop entirely and let mainTask() handle retries based on connOptions.maxRetry. But before making that change, I wanted to get your thoughts on the correct direction.
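
A minimal sketch of that option, under stated assumptions: run() makes exactly one connection attempt and the outer loop owns all retry accounting. The names, the retryIntervalMs field and the reading of maxRetry as "retries after the first attempt" are illustrative, not the project's actual API.

```typescript
interface ConnOptions {
  maxRetry: number; // retries after the first attempt (assumed meaning)
  retryIntervalMs: number; // delay between attempts (assumed field)
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// run(): a single connection attempt, with no retry loop of its own.
async function run(connect: () => Promise<void>): Promise<void> {
  await connect();
}

// mainTask(): the only place where retries are counted.
async function mainTask(connOptions: ConnOptions, connect: () => Promise<void>): Promise<void> {
  for (let attempt = 0; attempt <= connOptions.maxRetry; attempt++) {
    try {
      await run(connect);
      return;
    } catch (err) {
      // Out of retries: surface the error so a fallback layer can move on.
      if (attempt === connOptions.maxRetry) throw err;
      await sleep(connOptions.retryIntervalMs);
    }
  }
}
```

With this shape, a connect() that always fails is attempted maxRetry + 1 times and then rejects. If maxRetry is instead meant as the total number of attempts, the loop bound would change to attempt < connOptions.maxRetry.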

Would appreciate your guidance on how this should ideally be structured. I would also suggest running it once locally, as the issue becomes very clear when reproduced.

@gokuljs
Contributor Author

gokuljs commented Mar 10, 2026

For testing this, I modified the BasicAgent example to use both Sarvam and OpenAI STT. I intentionally did not provide the API key for Sarvam to trigger the failure and test the fallback behavior. If needed, I can also share the test code.
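
The same scenario could probably also be reproduced without network access or a missing API key by using fake providers. The sketch below is hypothetical throughout: FakeSTT, failingSTT, workingSTT and recognizeWithFallback are illustration-only names and do not exist in the codebase.

```typescript
interface FakeSTT {
  label: string;
  recognize(): Promise<string>;
}

// Always rejects, mimicking the 403 seen when the API key is missing.
const failingSTT = (label: string): FakeSTT => ({
  label,
  recognize: async () => {
    throw new Error('Unexpected server response: 403');
  },
});

// Always resolves with a fixed transcript.
const workingSTT = (label: string, text: string): FakeSTT => ({
  label,
  recognize: async () => text,
});

interface FallbackResult {
  label: string;
  text: string;
  attempts: Map<string, number>;
}

// Tries each provider in order, allowing maxRetryPerSTT retries after the
// first attempt (assumed meaning) before moving to the next provider.
async function recognizeWithFallback(
  providers: FakeSTT[],
  maxRetryPerSTT: number,
): Promise<FallbackResult> {
  const attempts = new Map<string, number>();
  for (const provider of providers) {
    for (let i = 0; i <= maxRetryPerSTT; i++) {
      attempts.set(provider.label, i + 1);
      try {
        const text = await provider.recognize();
        return { label: provider.label, text, attempts };
      } catch {
        // Retry this provider, or fall through to the next one.
      }
    }
  }
  throw new Error('all STT providers failed');
}
```

Asserting on the per-provider attempt counts would make a regression such as the 32-retry loop visible in a unit test rather than only in live logs.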
