
fix(voice): add PreemptiveGenerationOptions for fine-grained control#5428

Open
longcw wants to merge 1 commit into main from longc/preemptive-generation-options

Conversation


@longcw longcw commented Apr 13, 2026

Summary

  • Add a PreemptiveGenerationOptions TypedDict with two configurable limits to reduce wasted LLM requests during long user utterances:
    • max_speech_duration (default 10s): skip preemptive generation when the user has been speaking longer than this threshold
    • max_retries (default 3): cap speculative LLM requests per user turn; the counter resets when the turn completes
  • The preemptive_generation parameter now accepts bool | PreemptiveGenerationOptions, keeping full backward compatibility (True/False still work)

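The options described above can be sketched as a TypedDict plus a small normalization helper. This is a minimal illustration, not the PR's actual implementation; the key names (max_speech_duration, max_retries) and defaults come from the summary, while the helper name resolve_preemptive is an assumption:

```python
import copy
from typing import Optional, TypedDict, Union


class PreemptiveGenerationOptions(TypedDict, total=False):
    """Sketch of the options from the PR summary; both keys are optional."""

    max_speech_duration: float  # seconds; skip preemptive generation past this
    max_retries: int  # speculative LLM requests allowed per user turn


# Defaults stated in the PR description: 10s and 3 retries.
_DEFAULTS: PreemptiveGenerationOptions = {"max_speech_duration": 10.0, "max_retries": 3}


def resolve_preemptive(
    preemptive_generation: Union[bool, PreemptiveGenerationOptions],
) -> Optional[PreemptiveGenerationOptions]:
    """Normalize bool | options into a full options dict; None means disabled.

    True/False keep working, which is how the bool form stays backward compatible.
    """
    if preemptive_generation is False:
        return None
    if preemptive_generation is True:
        return copy.copy(_DEFAULTS)
    # Partial dicts are filled in with the documented defaults.
    return {**_DEFAULTS, **preemptive_generation}
```

A caller passing `True` gets the documented defaults, while `{"max_retries": 5}` overrides only that one limit.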
@chenghao-mou chenghao-mou requested a review from a team April 13, 2026 06:16

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 1 potential issue.

View 3 additional findings in Devin Review.


Comment on lines 1783 to 1794:

    self._cancel_preemptive_generation()

    if (
        info.started_speaking_at is not None
        and time.time() - info.started_speaking_at > preemptive_opts["max_speech_duration"]
    ):
        return

    if self._preemptive_generation_count >= preemptive_opts["max_retries"]:
        return

    self._preemptive_generation_count += 1

🟡 Existing preemptive generation is cancelled before max_retries check, discarding valid work

In on_preemptive_generation, _cancel_preemptive_generation() is called unconditionally on line 1783 before the max_retries check on line 1791. When _preemptive_generation_count >= max_retries, the method returns early without starting a new generation — but the previous (most recent) preemptive generation has already been cancelled and set to None. This means the last successful preemptive generation is destroyed without replacement. Later, in _user_turn_completed_task at line 1995, self._preemptive_generation is None, so the preemptive result can never be used and a fresh (non-preemptive) LLM call is always made instead. This defeats the purpose of the max_retries limit, which should keep the last generation alive when retries are exhausted.

The fix is to move _cancel_preemptive_generation() after the early-return checks (or at least after the max_retries check), so the existing generation is only cancelled when it will actually be replaced by a new one.

Suggested change

Before:

    self._cancel_preemptive_generation()
    if (
        info.started_speaking_at is not None
        and time.time() - info.started_speaking_at > preemptive_opts["max_speech_duration"]
    ):
        return
    if self._preemptive_generation_count >= preemptive_opts["max_retries"]:
        return
    self._preemptive_generation_count += 1

After:

    if (
        info.started_speaking_at is not None
        and time.time() - info.started_speaking_at > preemptive_opts["max_speech_duration"]
    ):
        self._cancel_preemptive_generation()
        return
    if self._preemptive_generation_count >= preemptive_opts["max_retries"]:
        return
    self._cancel_preemptive_generation()
    self._preemptive_generation_count += 1

@longcw (author) replied:

This is expected: on_preemptive_generation is called whenever the user transcript changes, so the previous preemptive generation is already invalid at that point and should be cancelled as soon as possible.
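The author's reasoning can be illustrated with a toy model of the gating logic: stale speculative work is cancelled on every transcript change, but a new generation only starts when both limits allow it. All names here (PreemptiveGate, on_transcript_changed, and so on) are assumptions for illustration; the real code tracks this state on the voice session:

```python
import time
from typing import Optional


class PreemptiveGate:
    """Toy model of the PR's gating: cancel stale speculative work on every
    transcript change, but only start a new generation within the limits."""

    def __init__(self, max_speech_duration: float = 10.0, max_retries: int = 3) -> None:
        self.max_speech_duration = max_speech_duration
        self.max_retries = max_retries
        self.count = 0  # speculative generations started this user turn
        self.active: Optional[str] = None  # transcript the in-flight generation is based on

    def on_transcript_changed(self, transcript: str, started_speaking_at: float) -> bool:
        """Return True if a new speculative generation was started."""
        # The previous generation was produced from an older transcript, so it
        # is invalid regardless of whether a replacement starts (author's point).
        self.active = None
        if time.time() - started_speaking_at > self.max_speech_duration:
            return False  # user has been speaking too long; don't speculate
        if self.count >= self.max_retries:
            return False  # retry budget for this turn is exhausted
        self.count += 1
        self.active = transcript
        return True

    def on_turn_completed(self) -> None:
        self.count = 0  # retry budget resets each user turn
```

Under this model, once max_retries is hit the gate still clears `active`, matching the author's position that a generation built on a stale transcript must not be kept.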
