fix(sarvam): prevent transcript loss after long agent responses#4798

Open
Nikhils-G wants to merge 4 commits into livekit:main from Nikhils-G:fix/sarvam-stt-race-condition

Conversation

@Nikhils-G

When the agent produces a long TTS response (10+ seconds), audio chunks pile up faster than they're sent. Once the audio task drains the buffer and finishes, the asyncio.wait call with FIRST_COMPLETED returns and the message task, which is still pending, is cancelled immediately. At that point Sarvam is still processing all that buffered audio and hasn't returned the transcript yet.

The result: the user speaks, Sarvam hears it, but the transcript never makes it back. The agent goes silent and can't respond.

This fix gives the message task up to 30 seconds to receive the transcript before giving up, which matches the worst-case processing time observed in production with long audio buffers.
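
The race and the grace period can be reproduced in isolation. The following is a minimal sketch, not the plugin's actual code: `send_audio`, `receive_transcript`, and `run_connection` are hypothetical stand-ins, and the grace period is shortened from the PR's 30 seconds so the demo finishes quickly.

```python
import asyncio

GRACE_PERIOD_S = 0.5  # the PR uses 30 seconds; shortened for this demo


async def send_audio() -> None:
    # Hypothetical audio task: drains its buffer and returns quickly.
    await asyncio.sleep(0.01)


async def receive_transcript(results: list) -> None:
    # Hypothetical message task: the transcript arrives after audio is sent.
    await asyncio.sleep(0.1)
    results.append("hello")


async def run_connection(grace: float) -> list:
    results: list = []
    audio_task = asyncio.create_task(send_audio())
    message_task = asyncio.create_task(receive_transcript(results))

    done, _pending = await asyncio.wait(
        {audio_task, message_task}, return_when=asyncio.FIRST_COMPLETED
    )

    # If only the audio task finished, give the message task a bounded
    # window to deliver the transcript instead of cancelling it right away.
    if audio_task in done and message_task not in done and grace > 0:
        await asyncio.wait({message_task}, timeout=grace)

    # Cancel only what is still running after the grace period.
    for task in (audio_task, message_task):
        if not task.done():
            task.cancel()
    await asyncio.gather(audio_task, message_task, return_exceptions=True)
    return results


without_grace = asyncio.run(run_connection(grace=0))
with_grace = asyncio.run(run_connection(grace=GRACE_PERIOD_S))
```

With `grace=0` the transcript is lost because the message task is cancelled before it finishes. With a non-zero grace period the transcript is delivered.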

Copilot AI review requested due to automatic review settings February 12, 2026 12:46
@CLAassistant
CLAassistant commented Feb 12, 2026

CLA assistant check
All committers have signed the CLA.

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 1 potential issue.

View 4 additional findings in Devin Review.

Copilot AI left a comment

Pull request overview

This PR addresses a concurrency race in the Sarvam streaming STT connection loop where long agent TTS responses can cause buffered audio to be processed late, and the transcript message task gets cancelled prematurely (leading to dropped transcripts and the agent going silent).

Changes:

  • Adds a 30s grace period for _message_task to receive the final transcript when _audio_task completes first.
  • Refines task cancellation to only cancel tasks still not done after the grace-period logic.

self._logger.info(
    "Transcript received from Sarvam",
    extra=self._build_log_context(),
)

Copilot AI Feb 12, 2026

If _message_task completes during the extra 30s wait, it won’t be in the original done set, so any exception from _process_messages can be missed. After the wait, explicitly check/propagate self._message_task.exception() (or update the done set) so failures still surface like they do for tasks in done.

Suggested change
- )
+ )
+ # Ensure any exception from the message task is propagated
+ # by including it in the completed tasks set.
+ done = done | {self._message_task}
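
The missed-exception problem described in this comment can be illustrated with a small standalone sketch. It is a variation on the suggestion rather than the PR's code: `send_audio`, `process_messages`, and `run_connection` are hypothetical, and tasks that finish during the grace wait are merged into `done` through the second `asyncio.wait` result instead of being added unconditionally.

```python
import asyncio


async def send_audio() -> None:
    await asyncio.sleep(0.01)


async def process_messages() -> None:
    # Hypothetical failure that occurs after the audio task has finished.
    await asyncio.sleep(0.05)
    raise RuntimeError("API error")


async def run_connection() -> None:
    audio_task = asyncio.create_task(send_audio())
    message_task = asyncio.create_task(process_messages())

    done, _ = await asyncio.wait(
        {audio_task, message_task}, return_when=asyncio.FIRST_COMPLETED
    )

    if message_task not in done:
        # asyncio.wait never raises a task's exception itself, so tasks
        # that finish during the grace wait must be folded into `done`.
        extra_done, _ = await asyncio.wait({message_task}, timeout=1.0)
        done = done | extra_done

    for task in (audio_task, message_task):
        if not task.done():
            task.cancel()
    await asyncio.gather(audio_task, message_task, return_exceptions=True)

    # Surface failures from every completed task, including late ones.
    for task in done:
        if not task.cancelled() and task.exception() is not None:
            raise task.exception()


try:
    asyncio.run(run_connection())
    error = None
except RuntimeError as exc:
    error = str(exc)
```

Without the `done = done | extra_done` line, the `RuntimeError` would stay attached to the finished task and never reach the caller.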

Comment on lines 968 to 969
except Exception:
    pass

Copilot AI Feb 12, 2026

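The except Exception: pass around the wait silently suppresses any unexpected error raised by the wait call. On Python versions before 3.8, where asyncio.CancelledError inherits from Exception, it also swallows cancellation and prevents proper cancellation/shutdown of _run_connection (from 3.8 onward CancelledError derives from BaseException and is not caught by this clause). Handle CancelledError explicitly (re-raise), and avoid suppressing unexpected exceptions from the wait call (at least log them).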

Suggested change
- except Exception:
-     pass
+ except asyncio.CancelledError:
+     # Propagate cancellation so the surrounding coroutine can shut down properly
+     raise
+ except Exception:
+     # Log unexpected errors instead of silently swallowing them
+     self._logger.exception(
+         "Error while waiting for transcript task",
+         extra=self._build_log_context(),
+     )
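
The handler pattern in this suggestion can be exercised on its own. This is a rough sketch under assumptions: `wait_for_transcript` and the logger name are hypothetical, and the grace period is arbitrary. The explicit re-raise keeps cancellation visible to the caller regardless of how `CancelledError` is classified on the running Python version.

```python
import asyncio
import logging

logger = logging.getLogger("grace_wait_demo")  # hypothetical logger name


async def wait_for_transcript(message_task: asyncio.Task, grace: float) -> None:
    try:
        await asyncio.wait({message_task}, timeout=grace)
    except asyncio.CancelledError:
        # Propagate cancellation so the surrounding coroutine can shut down.
        raise
    except Exception:
        # Log unexpected errors instead of silently swallowing them.
        logger.exception("Error while waiting for transcript task")


async def main() -> bool:
    message_task = asyncio.create_task(asyncio.sleep(10))
    waiter = asyncio.create_task(wait_for_transcript(message_task, grace=5.0))
    await asyncio.sleep(0.01)  # let the waiter reach asyncio.wait
    waiter.cancel()
    try:
        await waiter
    except asyncio.CancelledError:
        pass  # expected: the waiter re-raised the cancellation
    message_task.cancel()
    await asyncio.gather(message_task, return_exceptions=True)
    return waiter.cancelled()


waiter_was_cancelled = asyncio.run(main())
```

Cancelling the waiter during the grace period ends it promptly, rather than leaving it blocked until the timeout expires.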

If the message task fails with an API error while we're waiting for
the transcript, that exception was getting silently swallowed since
it wasn't in the original done set. Now we check for and re-raise
any exception after the 30s wait completes.

Don't swallow asyncio.CancelledError during the transcript grace
period; re-raise it so shutdown works correctly on Python 3.10+.
Also log unexpected exceptions instead of silently dropping them.
Fixed ruff formatting to pass CI.

@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 1 new potential issue.

View 6 additional findings in Devin Review.

No point waiting for a transcript if the audio pipeline broke — the
server won't have anything to transcribe. Check the audio task for
exceptions first and only enter the grace period on clean completion.
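
The ordering described in this commit message (check the audio task for failure first, then wait only after a clean completion) might look roughly like the sketch below. All names are hypothetical and the timings are chosen only so the demo runs quickly; it is not the plugin's actual code.

```python
import asyncio


async def failing_audio() -> None:
    # Hypothetical audio task that breaks shortly after starting.
    await asyncio.sleep(0.01)
    raise ConnectionError("audio send failed")


async def slow_messages() -> None:
    # Hypothetical message task that would keep waiting for a transcript.
    await asyncio.sleep(10)


async def run_connection(grace: float) -> None:
    audio_task = asyncio.create_task(failing_audio())
    message_task = asyncio.create_task(slow_messages())
    try:
        done, _ = await asyncio.wait(
            {audio_task, message_task}, return_when=asyncio.FIRST_COMPLETED
        )
        if audio_task in done:
            # Fail fast: skip the grace period if the audio pipeline broke.
            exc = audio_task.exception()
            if exc is not None:
                raise exc
            # Clean completion: allow a bounded wait for the transcript.
            if not message_task.done():
                await asyncio.wait({message_task}, timeout=grace)
    finally:
        for task in (audio_task, message_task):
            if not task.done():
                task.cancel()
        await asyncio.gather(audio_task, message_task, return_exceptions=True)


async def main():
    loop = asyncio.get_running_loop()
    start = loop.time()
    try:
        await run_connection(grace=5.0)
        error = None
    except ConnectionError as exc:
        error = str(exc)
    return error, loop.time() - start


error, elapsed = asyncio.run(main())
```

Because the audio failure is raised before the grace wait, the error surfaces almost immediately instead of after the full grace period.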