[codex] bound Rendezvous WebSocket liveness#30643

Open
richardopenai wants to merge 3 commits into main from codex/rendezvous-pong-timeout

@richardopenai

Contributor

Summary

  • require a Pong within 60 seconds for established Noise Rendezvous WebSockets on both the harness and executor
  • bound steady-state WebSocket writes and harness event delivery so backpressure cannot mask the deadline
  • classify executor disconnects with bounded reasons and feed them into the existing reconnect metric and structured log
  • cover silent peers, responsive peers, continuous non-Pong traffic, and local application backpressure
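The third bullet's "bounded reasons" can be read as a closed set of disconnect causes, so the reconnect metric never gains unbounded label values. The following is a minimal sketch of that idea and not the PR's actual code: the `DisconnectReason` enum, its variants, the label strings, and `from_io_error` are all hypothetical names chosen for illustration.

```rust
use std::io;

/// Hypothetical closed set of disconnect reasons (assumption: the PR's real
/// variants are not shown in the description). A fixed enum keeps metric
/// label cardinality bounded.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DisconnectReason {
    PongTimeout,
    WriteTimeout,
    PeerClosed,
    TransportError,
}

impl DisconnectReason {
    /// Every reason, useful for pre-registering metric series.
    pub const ALL: [DisconnectReason; 4] = [
        DisconnectReason::PongTimeout,
        DisconnectReason::WriteTimeout,
        DisconnectReason::PeerClosed,
        DisconnectReason::TransportError,
    ];

    /// Static label for metrics and structured logs (label text is illustrative).
    pub fn as_label(self) -> &'static str {
        match self {
            DisconnectReason::PongTimeout => "pong_timeout",
            DisconnectReason::WriteTimeout => "write_timeout",
            DisconnectReason::PeerClosed => "peer_closed",
            DisconnectReason::TransportError => "transport_error",
        }
    }

    /// Collapse an arbitrary I/O error into one of the bounded reasons.
    /// Anything unrecognized falls into the catch-all bucket instead of
    /// leaking free-form error text into a label.
    pub fn from_io_error(err: &io::Error) -> Self {
        match err.kind() {
            io::ErrorKind::TimedOut => DisconnectReason::WriteTimeout,
            io::ErrorKind::UnexpectedEof
            | io::ErrorKind::ConnectionReset
            | io::ErrorKind::ConnectionAborted
            | io::ErrorKind::BrokenPipe => DisconnectReason::PeerClosed,
            _ => DisconnectReason::TransportError,
        }
    }
}
```

With this shape, the reconnect path would record `reason.as_label()` rather than the raw error string, and the raw error can still go into the structured log as a separate field.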

Why

The existing periodic Pings did not track Pongs, so a half-open or blackholed connection could remain stuck until the operating system's TCP timeout. This adds the smallest explicit liveness contract without new spans, RTT histograms, feature flags, or TCP diagnostics.
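One way to picture the liveness contract described above is a small tracker whose deadline moves only when a Pong arrives. This is a sketch under stated assumptions rather than the PR's implementation: `PongLiveness`, `PONG_TIMEOUT`, and the method names are hypothetical, and measuring the 60 seconds from the last Pong (or from connection establishment) is one interpretation of "require a Pong within 60 seconds".

```rust
use std::time::{Duration, Instant};

/// Assumed value taken from the PR summary (60 seconds).
pub const PONG_TIMEOUT: Duration = Duration::from_secs(60);

/// Hypothetical tracker: the connection is considered dead once no Pong
/// has been seen for `timeout`.
#[derive(Debug)]
pub struct PongLiveness {
    timeout: Duration,
    last_pong: Instant,
}

impl PongLiveness {
    /// Start the clock when the WebSocket is established.
    pub fn new(established_at: Instant, timeout: Duration) -> Self {
        Self { timeout, last_pong: established_at }
    }

    /// Call only for Pong frames. Other traffic must not call this,
    /// so continuous non-Pong frames cannot keep the connection alive.
    pub fn on_pong(&mut self, now: Instant) {
        // Guard against out-of-order timestamps moving the deadline backwards.
        self.last_pong = self.last_pong.max(now);
    }

    /// The instant at which the connection should be torn down.
    pub fn deadline(&self) -> Instant {
        self.last_pong + self.timeout
    }

    /// True once the deadline has been reached without a fresh Pong.
    pub fn is_expired(&self, now: Instant) -> bool {
        now >= self.deadline()
    }
}
```

In an event loop, `deadline()` would feed the timer arm of a select, and `is_expired` would trigger the disconnect. Passing `now` explicitly keeps the logic testable without sleeping.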

Testing

  • just test -p codex-exec-server on devbox richard-6 — 300 passed, 2 skipped
  • just fix -p codex-exec-server
  • just fmt
  • independent correctness, performance/security, and YAGNI reviews — no findings

2 participants