feat(realtime): add input guardrails for RealtimeAgent and RealtimeRunConfig#3721

Open
Skyline-9 wants to merge 3 commits into openai:main from Skyline-9:feat/realtime-input-guardrails

Conversation

@Skyline-9

Summary

Adds input guardrails to the realtime API, bringing it closer to parity with the non-realtime Agent/Runner, which already supports input_guardrails. Realtime today only supports output guardrails (RealtimeAgent.output_guardrails / RealtimeRunConfig["output_guardrails"]); there is no first-class way to screen the user's transcribed input.

What changed:

  • RealtimeAgent.input_guardrails (appended at the end of the dataclass, default_factory=list) and RealtimeRunConfig["input_guardrails"] (NotRequired TypedDict key).
  • New RealtimeInputGuardrailTripped session event (appended at the end of the RealtimeSessionEvent union), mirroring RealtimeGuardrailTripped field-for-field but typed to InputGuardrailResult.
  • RealtimeSession runs the combined agent + run-config input guardrails on the completed user transcript (input_audio_transcription_completed), de-duped by id(). It reuses the existing output-guardrail machinery (shared _guardrail_tasks set, _on_guardrail_task_done, _cleanup_guardrail_tasks), so close() cancels in-flight tasks. On a trip it emits input_guardrail_tripped, forces response.cancel, and sends a follow-up user message naming the guardrail.
  • Exported from agents.realtime.__init__ (__all__) with an import regression test.
  • Docs: docs/ref/realtime/events.md renders the new event; docs/realtime/guide.md documents the feature and disambiguates it from the existing tool-level "input guardrails on function-tool calls".
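As a rough illustration of the "combined agent + run-config input guardrails ... de-duped by id()" step above, the sketch below merges two lists and drops objects that appear in both, comparing by identity rather than equality. The helper name `dedupe_by_identity` is hypothetical and not part of the SDK; the actual session code may differ.

```python
from typing import Any, Iterable


def dedupe_by_identity(*groups: Iterable[Any]) -> list[Any]:
    """Merge groups in order, keeping the first occurrence of each object.

    Hypothetical helper: objects are compared with id(), so two distinct
    guardrail instances are both kept even if they compare equal.
    """
    seen: set[int] = set()
    unique: list[Any] = []
    for group in groups:
        for item in group:
            if id(item) not in seen:
                seen.add(id(item))
                unique.append(item)
    return unique


# Stand-ins for guardrail objects; `shared` is configured in both places.
shared = object()
agent_only = object()
combined = dedupe_by_identity([shared, agent_only], [shared])
```

Identity-based de-duplication avoids requiring guardrail objects to be hashable or to define equality, at the cost of treating two separately constructed but equivalent guardrails as distinct.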

The design deliberately mirrors _run_output_guardrails (argument order verified against InputGuardrail.run(self, agent, input, context)) so the behavior and lifecycle are consistent with what maintainers already review.

Known limitation (documented, not hidden)

The forced cancel reliably interrupts a response that is already in flight. If a guardrail resolves in the narrow window before any response has been created for the tripped turn, the cancel is a no-op and that response may proceed. Eliminating this window cleanly requires response<->user-item correlation at the model layer (for example a response_id on turn-started / response-created) so the session can cancel only the tripped turn's response without also cancelling the intentional guardrail-notification response. This limitation is documented in the RealtimeInputGuardrailTripped docstring, RealtimeAgent.input_guardrails, and the guide rather than papered over with a heuristic that would cancel the wrong response. Scope is also documented: input guardrails run on transcribed audio only; text sent via send_message is not screened. Happy to pursue the model-layer correlation as a follow-up if maintainers prefer.

Test plan

  • Added tests/realtime/test_session.py::TestInputGuardrailFunctionality, including edge cases:
    • a raising guardrail is skipped and does not crash the shared guardrail task,
    • raising + tripping guardrails together still produce exactly one interrupt with the tripping guardrail named,
    • a second transcription for an already-tripped item is de-duplicated,
    • no guardrail task is created when none are configured.
  • Ran the standard verification stack from the repo root:
    • make format, make lint, make typecheck — pass
    • make tests (full) — pass (4797 passed, 2 skipped; serial 27 passed, 5 skipped)
    • make build-docs — pass (new RealtimeInputGuardrailTripped reference resolves clean)

Issue number

Realtime parity with the non-realtime input-guardrail support. Happy to link the relevant tracking issue.

Checks

  • I've added new tests, if relevant
  • I've run .agents/skills/code-change-verification/scripts/run.sh
  • I've confirmed all verification steps pass (ran make format, make lint, make typecheck, make tests, and make build-docs)
  • If using Codex, I've run /review before submitting this PR

Compatibility notes

Additive. The new RealtimeAgent field is appended at the end of the dataclass (preserving positional compatibility), the run-config entry is a NotRequired key, and the new event is appended at the end of the RealtimeSessionEvent union. Sessions with no input guardrails configured create no extra tasks per utterance.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1b2c6fc6b1

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/agents/realtime/session.py Outdated
Comment on lines +1294 to +1303
for guardrail in input_guardrails:
    try:
        result = await guardrail.run(
            # TODO (rm) Remove this cast, it's wrong
            cast(Agent[Any], self._current_agent),
            text,
            self._context_wrapper,
        )
        if result.output.tripwire_triggered:
            triggered_results.append(result)

P1 Badge Run realtime input guardrails concurrently

When more than one input guardrail is configured, this loop awaits them serially and cancels only after all earlier guardrails have completed. If a slow or model-backed guardrail comes before one that would trip, the response to the unsafe user transcript can keep generating for that guardrail's entire latency, which largely defeats the forced response cancellation. Please run the input guardrails concurrently, or interrupt as soon as the first tripwire result is available.
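One possible shape for the concurrent variant is sketched below. It is a self-contained illustration, not the PR's code: guardrails are modelled as plain async callables, and `Result`, `run_all`, `slow_ok`, `fast_trip`, and `broken` are hypothetical names. A guardrail that raises is skipped so it cannot take down the shared task.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Result:  # hypothetical stand-in for a guardrail result
    name: str
    tripped: bool


async def run_all(guardrails, text):
    """Run every guardrail concurrently and return the tripped results."""

    async def run_one(guardrail):
        try:
            return await guardrail(text)
        except Exception:
            return None  # a raising guardrail is skipped, not fatal

    # Total latency is roughly the slowest guardrail, not the sum of all.
    results = await asyncio.gather(*(run_one(g) for g in guardrails))
    return [r for r in results if r is not None and r.tripped]


async def slow_ok(text):
    await asyncio.sleep(0.05)
    return Result("slow_ok", False)


async def fast_trip(text):
    return Result("fast_trip", "blocked" in text)


async def broken(text):
    raise RuntimeError("guardrail failed")


tripped = asyncio.run(run_all([slow_ok, broken, fast_trip], "blocked phrase"))
```

Note that `asyncio.gather` still waits for the slowest guardrail before any result is inspected, so this only removes the serial ordering cost.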

Comment thread src/agents/realtime/session.py Outdated
Comment on lines +1277 to +1279
combined_guardrails = self._current_agent.input_guardrails + self._run_config.get(
    "input_guardrails", []
)

P2 Badge Snapshot the agent for queued input guardrails

Because this background task re-reads self._current_agent when it eventually runs, a session that calls update_agent() or completes a handoff before the task gets CPU can check agent A's transcript using agent B's input guardrails, or no agent-level guardrails at all. That silently bypasses the guardrails configured on the agent that received the transcribed input; capture the agent/guardrail list when handling the transcription event and pass that snapshot into the task.
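The snapshot idea can be sketched in isolation as follows. `FakeAgent` and `FakeSession` are hypothetical stand-ins that model only the pattern (capture the agent and guardrail list when the event is handled, then pass them into the task); they are not the SDK's classes.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeAgent:  # hypothetical stand-in for RealtimeAgent
    name: str
    input_guardrails: list = field(default_factory=list)


class FakeSession:  # hypothetical; illustrates only the snapshot pattern
    def __init__(self, agent, run_config=None):
        self.current_agent = agent
        self.run_config = run_config or {}
        self.checked_with = []

    def on_transcript(self, text):
        # Capture the agent and its guardrails now, not when the task runs.
        agent = self.current_agent
        guardrails = list(agent.input_guardrails) + list(
            self.run_config.get("input_guardrails", [])
        )
        return asyncio.create_task(self._run(agent, guardrails, text))

    async def _run(self, agent, guardrails, text):
        await asyncio.sleep(0)  # yield, as a real background task would
        self.checked_with.append((agent.name, guardrails))


async def demo():
    session = FakeSession(FakeAgent("A", ["guard_a"]))
    task = session.on_transcript("hello")
    # Simulate a handoff that lands before the background task gets CPU.
    session.current_agent = FakeAgent("B", ["guard_b"])
    await task
    return session.checked_with


checked = asyncio.run(demo())
```

Because the task receives the snapshot as arguments, the later agent swap does not change which guardrails screen the transcript.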

Skyline-9 added 3 commits July 1, 2026 16:31
Snapshot the active agent and its input guardrails when the transcription event is handled so a concurrent update_agent()/handoff cannot run a different agent's guardrails, and run the input guardrails concurrently so a slow guardrail cannot delay the forced response cancel.
@Skyline-9

@codex review

@Skyline-9 Skyline-9 force-pushed the feat/realtime-input-guardrails branch from 1b2c6fc to bedd7a9 Compare July 1, 2026 23:32

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bedd7a9c37

# Run the guardrails concurrently so a slow guardrail cannot delay the forced cancel behind
# unrelated guardrails, which would let the unsafe turn keep generating.
results = await asyncio.gather(*(_run_one(guardrail) for guardrail in input_guardrails))

P1 Badge Cancel realtime input on the first tripped guardrail

Fresh evidence: this version now uses asyncio.gather, but gather still waits for every input guardrail to finish before inspecting triggered_results. When one guardrail trips quickly but another model-backed guardrail is slow, the unsafe audio turn keeps generating until the slowest guardrail returns, delaying the forced response.cancel that this feature relies on. Use asyncio.as_completed/task cancellation like the streamed input guardrail path so the session interrupts as soon as the first tripwire result is available.
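A minimal sketch of the first-trip pattern, assuming guardrails are plain async callables that return a `(name, tripped)` tuple; `first_tripped`, `slow_ok`, and `fast_trip` are hypothetical names, and the SDK's streamed input guardrail path may be structured differently.

```python
import asyncio


async def first_tripped(guardrails, text):
    """Return the first tripping guardrail's name, cancelling the rest."""
    tasks = [asyncio.ensure_future(g(text)) for g in guardrails]
    try:
        # as_completed yields awaitables in completion order, fastest first.
        for finished in asyncio.as_completed(tasks):
            try:
                name, tripped = await finished
            except Exception:
                continue  # a raising guardrail is skipped
            if tripped:
                return name  # the caller can force the cancel right away
        return None
    finally:
        for task in tasks:
            task.cancel()  # no-op for tasks that already finished
        # Wait for cancellations to settle so no task is left pending.
        await asyncio.gather(*tasks, return_exceptions=True)


async def slow_ok(text):
    await asyncio.sleep(1.0)  # stands in for a slow model-backed guardrail
    return ("slow_ok", False)


async def fast_trip(text):
    await asyncio.sleep(0.01)
    return ("fast_trip", True)


winner = asyncio.run(first_tripped([slow_ok, fast_trip], "unsafe input"))
```

Here the function returns as soon as `fast_trip` resolves, and `slow_ok` is cancelled instead of being awaited to completion.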
