feat(realtime): add input guardrails for RealtimeAgent and RealtimeRunConfig #3721
Skyline-9 wants to merge 3 commits into
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1b2c6fc6b1
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
```python
for guardrail in input_guardrails:
    try:
        result = await guardrail.run(
            # TODO (rm) Remove this cast, it's wrong
            cast(Agent[Any], self._current_agent),
            text,
            self._context_wrapper,
        )
        if result.output.tripwire_triggered:
            triggered_results.append(result)
```
Run realtime input guardrails concurrently
When more than one input guardrail is configured, this loop awaits them serially and only cancels after all earlier guardrails have completed. If a slow or model-backed guardrail comes before one that would trip, the response to the unsafe user transcript can keep generating for that guardrail's entire latency, which largely defeats the forced response cancellation. Please run the input guardrails concurrently, or interrupt as soon as the first tripwire result is available.
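To make the latency point concrete, here is a minimal, self-contained sketch. It uses a hypothetical fake_guardrail coroutine and made-up delays rather than the SDK's InputGuardrail. Awaiting guardrails one after another makes trip detection take roughly the sum of their latencies. asyncio.gather brings that down to roughly the slowest one.

```python
from __future__ import annotations

import asyncio


async def fake_guardrail(delay: float, trips: bool) -> bool:
    # Hypothetical stand-in for a guardrail: sleeps `delay` seconds, then reports
    # whether its tripwire fired.
    await asyncio.sleep(delay)
    return trips


async def detect_serially(specs: list[tuple[float, bool]]) -> tuple[bool, float]:
    loop = asyncio.get_running_loop()
    start = loop.time()
    tripped = False
    for delay, trips in specs:
        # Each guardrail starts only after the previous one has finished.
        if await fake_guardrail(delay, trips):
            tripped = True
    return tripped, loop.time() - start


async def detect_concurrently(specs: list[tuple[float, bool]]) -> tuple[bool, float]:
    loop = asyncio.get_running_loop()
    start = loop.time()
    # All guardrails start at once; gather returns when the slowest one finishes.
    results = await asyncio.gather(*(fake_guardrail(d, t) for d, t in specs))
    return any(results), loop.time() - start


# A slow, non-tripping check ordered before a faster one that trips.
specs = [(0.3, False), (0.2, True)]
serial_tripped, serial_elapsed = asyncio.run(detect_serially(specs))
concurrent_tripped, concurrent_elapsed = asyncio.run(detect_concurrently(specs))
print(f"serial: {serial_elapsed:.2f}s, concurrent: {concurrent_elapsed:.2f}s")
```

Even the concurrent version waits for the slowest guardrail before reporting the trip, although the faster guardrail has already tripped.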
```python
combined_guardrails = self._current_agent.input_guardrails + self._run_config.get(
    "input_guardrails", []
)
```
Snapshot the agent for queued input guardrails
Because this background task re-reads self._current_agent when it eventually runs, a session that calls update_agent() or completes a handoff before the task gets CPU can check agent A's transcript using agent B's input guardrails, or no agent-level guardrails at all. That silently bypasses the guardrails configured on the agent that received the transcribed input; capture the agent/guardrail list when handling the transcription event and pass that snapshot into the task.
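A minimal sketch of the suggested fix follows. It uses hypothetical, simplified Guardrail, Agent and Session classes, not the SDK's RealtimeAgent or RealtimeSession. The combined list of agent and run-config guardrails is de-duplicated by id() and captured while the transcription event is handled. Keeping the first-seen order is an assumption of this sketch. Because the list is captured up front, a handoff that happens before the task runs cannot swap in another agent's guardrails.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Guardrail:
    # Hypothetical stand-in for an input guardrail.
    name: str


@dataclass
class Agent:
    # Hypothetical stand-in for RealtimeAgent.
    name: str
    input_guardrails: list[Guardrail] = field(default_factory=list)


def snapshot_input_guardrails(agent: Agent, run_config: dict[str, Any]) -> list[Guardrail]:
    """Combine agent and run-config guardrails, de-duplicated by id(), in first-seen order."""
    seen: set[int] = set()
    combined: list[Guardrail] = []
    for guardrail in [*agent.input_guardrails, *run_config.get("input_guardrails", [])]:
        if id(guardrail) not in seen:
            seen.add(id(guardrail))
            combined.append(guardrail)
    return combined


class Session:
    # Hypothetical, stripped-down session that only tracks the current agent.
    def __init__(self, agent: Agent, run_config: dict[str, Any]) -> None:
        self.current_agent = agent
        self.run_config = run_config
        self.checked: list[tuple[str, str, list[str]]] = []

    def on_transcription_completed(self, text: str) -> asyncio.Task[None]:
        # Snapshot the agent and its guardrails while handling the event, instead of
        # re-reading self.current_agent whenever the background task gets scheduled.
        agent = self.current_agent
        guardrails = snapshot_input_guardrails(agent, self.run_config)
        return asyncio.create_task(self._run_input_guardrails(agent, guardrails, text))

    async def _run_input_guardrails(
        self, agent: Agent, guardrails: list[Guardrail], text: str
    ) -> None:
        # A real session would run each guardrail here; this only records what is checked.
        self.checked.append((agent.name, text, [g.name for g in guardrails]))


async def main() -> list[tuple[str, str, list[str]]]:
    shared = Guardrail("shared")
    agent_a = Agent("A", [Guardrail("a_only"), shared])
    agent_b = Agent("B")
    session = Session(agent_a, {"input_guardrails": [shared, Guardrail("global")]})
    task = session.on_transcription_completed("transcript for agent A")
    session.current_agent = agent_b  # simulated handoff before the task has run
    await task
    return session.checked


checked = asyncio.run(main())
print(checked)  # [('A', 'transcript for agent A', ['a_only', 'shared', 'global'])]
```

De-duplicating by id() rather than equality means the same guardrail object, passed on both the agent and the run config, runs once. Two distinct but equal-looking guardrails still both run.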
Snapshot the active agent and its input guardrails when the transcription event is handled so a concurrent update_agent()/handoff cannot run a different agent's guardrails, and run the input guardrails concurrently so a slow guardrail cannot delay the forced response cancel.
@codex review
1b2c6fc to bedd7a9
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bedd7a9c37
```python
# Run the guardrails concurrently so a slow guardrail cannot delay the forced cancel behind
# unrelated guardrails, which would let the unsafe turn keep generating.
results = await asyncio.gather(*(_run_one(guardrail) for guardrail in input_guardrails))
```
Cancel realtime input on the first tripped guardrail
Fresh evidence: this version now uses asyncio.gather, but gather still waits for every input guardrail to finish before inspecting triggered_results. When one guardrail trips quickly but another model-backed guardrail is slow, the unsafe audio turn keeps generating until the slowest guardrail returns, delaying the forced response.cancel that this feature relies on. Use asyncio.as_completed/task cancellation like the streamed input guardrail path so the session interrupts as soon as the first tripwire result is available.
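One way to follow this suggestion is sketched below with hypothetical stand-ins: GuardrailResult, fake_guardrail and first_tripwire. They are not the SDK's InputGuardrailResult or its streamed input-guardrail code. The sketch wraps each guardrail in a task and consumes the tasks with asyncio.as_completed. It returns on the first tripped result and cancels whatever is still running.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Iterable
from dataclasses import dataclass


@dataclass
class GuardrailResult:
    # Hypothetical stand-in for InputGuardrailResult.
    name: str
    tripwire_triggered: bool


async def fake_guardrail(name: str, delay: float, trips: bool) -> GuardrailResult:
    # Simulates a guardrail that needs `delay` seconds (e.g. a model call).
    await asyncio.sleep(delay)
    return GuardrailResult(name, trips)


async def first_tripwire(
    guardrails: Iterable[Awaitable[GuardrailResult]],
) -> GuardrailResult | None:
    """Return the first tripped result as soon as it exists, cancelling the rest."""
    tasks = [asyncio.ensure_future(g) for g in guardrails]
    try:
        for next_done in asyncio.as_completed(tasks):
            result = await next_done
            if result.tripwire_triggered:
                return result
        return None
    finally:
        # Cancel guardrails that are still running (no-op for finished tasks) and
        # wait for them so no task is left pending.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


async def main() -> tuple[GuardrailResult | None, float]:
    loop = asyncio.get_running_loop()
    start = loop.time()
    tripped = await first_tripwire(
        [
            fake_guardrail("slow_model_check", delay=1.0, trips=False),
            fake_guardrail("fast_keyword_check", delay=0.05, trips=True),
        ]
    )
    elapsed = loop.time() - start
    if tripped is not None:
        # A session would emit input_guardrail_tripped and send response.cancel here.
        print(f"{tripped.name} tripped after {elapsed:.2f}s")
    return tripped, elapsed


tripped, elapsed = asyncio.run(main())
```

If a guardrail raises an exception, first_tripwire cancels the remaining tasks and then re-raises that exception. This sketch does not decide whether such a failure should count as a trip.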
Summary
Adds input guardrails to the realtime API, bringing it closer to parity with the non-realtime Agent/Runner, which already supports input_guardrails. Realtime today only supports output guardrails (RealtimeAgent.output_guardrails / RealtimeRunConfig["output_guardrails"]); there is no first-class way to screen the user's transcribed input.
What changed:
- RealtimeAgent.input_guardrails (appended at the end of the dataclass, default_factory=list) and RealtimeRunConfig["input_guardrails"] (a NotRequired TypedDict key).
- A new RealtimeInputGuardrailTripped session event (appended at the end of the RealtimeSessionEvent union), mirroring RealtimeGuardrailTripped field-for-field but typed to InputGuardrailResult.
- RealtimeSession runs the combined agent + run-config input guardrails on the completed user transcript (input_audio_transcription_completed), de-duplicated by id(). It reuses the existing output-guardrail machinery (the shared _guardrail_tasks set, _on_guardrail_task_done, _cleanup_guardrail_tasks), so close() cancels in-flight tasks. On a trip, it emits input_guardrail_tripped, forces response.cancel, and sends a follow-up user message naming the guardrail.
- The new names are exported from agents.realtime.__init__ (__all__), with an import regression test.
- docs/ref/realtime/events.md renders the new event; docs/realtime/guide.md documents the feature and disambiguates it from the existing tool-level "input guardrails on function-tool calls".
The design deliberately mirrors _run_output_guardrails (argument order verified against InputGuardrail.run(self, agent, input, context)), so the behavior and lifecycle are consistent with what maintainers already review.
Known limitation (documented, not hidden)
The forced cancel reliably interrupts a response that is already in flight. If a guardrail resolves in the narrow window before any response has been created for the tripped turn, the cancel is a no-op and that response may proceed. Eliminating this window cleanly requires response<->user-item correlation at the model layer (for example, a response_id on turn-started / response-created) so the session can cancel only the tripped turn's response without also cancelling the intentional guardrail-notification response. This limitation is documented in the RealtimeInputGuardrailTripped docstring, RealtimeAgent.input_guardrails, and the guide rather than papered over with a heuristic that would cancel the wrong response. Scope is also documented: input guardrails run on transcribed audio only; text sent via send_message is not screened. Happy to pursue the model-layer correlation as a follow-up if maintainers prefer.
Test plan
- tests/realtime/test_session.py::TestInputGuardrailFunctionality, including edge cases
- make format, make lint, make typecheck: pass
- make tests (full): pass (4797 passed, 2 skipped; serial 27 passed, 5 skipped)
- make build-docs: pass (new RealtimeInputGuardrailTripped reference resolves cleanly)
Issue number
Realtime parity with the non-realtime input-guardrail support. Happy to link the relevant tracking issue.
Checks
- .agents/skills/code-change-verification/scripts/run.sh (make format, make lint, make typecheck, make tests, and make build-docs)
- /review before submitting this PR
Compatibility notes
Additive. The new field is appended at the end of RealtimeAgent (preserving positional compatibility), the new run-config entry is a NotRequired key, and the new event is appended at the end of the RealtimeSessionEvent union. Sessions with no input guardrails configured create no extra tasks per utterance.
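The positional-compatibility argument can be illustrated with simplified, hypothetical dataclasses. Their fields are not the SDK's actual RealtimeAgent fields. Appending a defaulted field at the end keeps existing positional arguments bound to the same fields. Reading the optional run-config key with a default keeps configs written before the change valid.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class AgentBefore:
    # Hypothetical, simplified agent before the change.
    name: str
    instructions: str = ""
    output_guardrails: list[Any] = field(default_factory=list)


@dataclass
class AgentAfter:
    # The same agent with the new field appended last.
    name: str
    instructions: str = ""
    output_guardrails: list[Any] = field(default_factory=list)
    input_guardrails: list[Any] = field(default_factory=list)


# Existing positional construction binds to the same fields before and after.
before = AgentBefore("support", "Be helpful.", ["og"])
after = AgentAfter("support", "Be helpful.", ["og"])
assert after.output_guardrails == before.output_guardrails == ["og"]

# default_factory gives each instance its own empty list.
assert after.input_guardrails == []
assert AgentAfter("a").input_guardrails is not AgentAfter("b").input_guardrails

# A config written before the change has no "input_guardrails" key; reading it
# with a default treats that as "no input guardrails configured".
old_config: dict[str, Any] = {"output_guardrails": []}
input_guardrails = old_config.get("input_guardrails", [])
print(input_guardrails)  # []
```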