
echo guard: disable interruptions while AEC warms up #4813

Open

longcw wants to merge 2 commits into main from longc/echo-guard

Conversation

Contributor

@longcw longcw commented Feb 13, 2026

When echo_guard_duration is set, it blocks interruptions for a few seconds after the agent starts speaking, to allow the client to calibrate its AEC.

This only blocks the audio input while the agent is speaking, and it disables interruption from audio input only; session.interrupt still works.

@chenghao-mou chenghao-mou requested a review from a team February 13, 2026 10:06
Contributor

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 1 potential issue.


Comment on lines +1229 to +1231
if self._session._echo_guard_remaining_duration > 0:
# disable interruption from audio activity while echo guard is active
return
Contributor


🟡 Echo guard blocks interruptions even when agent is not speaking (between turns)

The _interrupt_by_audio_activity check at agent_activity.py:1229 only checks _echo_guard_remaining_duration > 0 without verifying the agent is currently speaking. When the agent transitions from "speaking" to "thinking" (e.g., during a tool call or between LLM response and TTS), _cancel_echo_guard_timer preserves the remaining echo guard duration. This causes _interrupt_by_audio_activity to block genuine user interruptions during non-speaking phases where there is no audio output and thus no echo to guard against.

Root Cause and Impact

The push_audio method at agent_activity.py:788-791 correctly gates audio discarding on agent_state == "speaking", but _interrupt_by_audio_activity at agent_activity.py:1229 does not include this check:

# push_audio - correctly checks agent_state
should_discard = ... or (
    self._session.agent_state == "speaking"
    and self._session._echo_guard_remaining_duration > 0
)

# _interrupt_by_audio_activity - missing agent_state check
def _interrupt_by_audio_activity(self) -> None:
    if self._session._echo_guard_remaining_duration > 0:
        return  # blocks even when agent is in "thinking" or "listening" state

Scenario: With echo_guard_duration=3.0, the agent speaks for 1 second then transitions to "thinking". _cancel_echo_guard_timer (agent_session.py:1217-1227) saves 2.0s of remaining duration. During the thinking phase, the user speaks but _interrupt_by_audio_activity returns early because _echo_guard_remaining_duration is 2.0 > 0, even though there's no audio output to cause echo. This blocks the user from interrupting the agent during non-speaking phases.

Impact: Users cannot interrupt the agent during "thinking" or "listening" states while the echo guard has remaining duration, even though echo is only possible during "speaking" state.
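
The save-on-cancel behavior driving this scenario can be sketched in isolation (a minimal illustration, not the actual livekit-agents code; EchoGuardTimer is a hypothetical stand-in for the logic in _cancel_echo_guard_timer):

```python
class EchoGuardTimer:
    """Hypothetical sketch of the save-on-cancel behavior described above."""

    def __init__(self, duration: float) -> None:
        self._remaining = duration  # guard time left, in seconds
        self._started_at = None     # timestamp when guarding began, or None

    def start(self, now: float) -> None:
        self._started_at = now

    def cancel(self, now: float) -> None:
        # On "speaking" -> "thinking", preserve the unelapsed guard time
        # instead of resetting it, mirroring the behavior described above.
        if self._started_at is not None:
            self._remaining = max(0.0, self._remaining - (now - self._started_at))
            self._started_at = None

    @property
    def remaining(self) -> float:
        return self._remaining


# The scenario above: echo_guard_duration=3.0, agent speaks for 1 s, then thinks.
guard = EchoGuardTimer(3.0)
guard.start(now=0.0)
guard.cancel(now=1.0)
assert guard.remaining == 2.0  # still > 0, so the unqualified check keeps blocking
```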

Suggested change

- if self._session._echo_guard_remaining_duration > 0:
+ if self._session.agent_state == "speaking" and self._session._echo_guard_remaining_duration > 0:
      # disable interruption from audio activity while echo guard is active
      return
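
The suggested gating can be exercised on its own with a small sketch (SessionStub and should_block_interruption are hypothetical stand-ins, not the livekit-agents API):

```python
class SessionStub:
    """Hypothetical stand-in for the session fields used by the check."""

    def __init__(self, agent_state: str, echo_guard_remaining: float) -> None:
        self.agent_state = agent_state
        self._echo_guard_remaining_duration = echo_guard_remaining


def should_block_interruption(session: SessionStub) -> bool:
    # Suggested form: block only while the agent is actually speaking AND
    # the echo-guard window still has time left.
    return (
        session.agent_state == "speaking"
        and session._echo_guard_remaining_duration > 0
    )


# Guard active during playback: interruption is blocked.
assert should_block_interruption(SessionStub("speaking", 2.0))
# Between turns ("thinking"), the guard must not block, even with time left.
assert not should_block_interruption(SessionStub("thinking", 2.0))
```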


but may incur extra compute if the user interrupts or revises mid-utterance.
Defaults to ``False``.
echo_guard_duration (float, optional): The duration in seconds that the agent
will ignore the user's audio interruptions after the agent starts speaking.
Member


Q: Does this apply to both cases when the session starts:

  1. Agent speaks first
  2. Agent's first response (the user might speak first)

Contributor Author


Yes, it only considers the agent speaking, no matter who speaks first.

use_tts_aligned_transcript: NotGivenOr[bool] = NOT_GIVEN,
tts_text_transforms: NotGivenOr[Sequence[TextTransforms] | None] = NOT_GIVEN,
preemptive_generation: bool = False,
echo_guard_duration: float | None = None,
Member


I think we should do it by default.
Maybe we shouldn't even have an option for it; it seems like an issue everybody has.

Contributor Author


how long does it usually need to warm up the AEC?
