fix(chat): queue WS sends, replay buffer on initial bind #71
Merged
pufit merged 1 commit on May 13, 2026
…lickHouse#67

User-reported glitch: "I ask for something and there is no response, then I ask again and it answers to the previous question." Five reliability PRs (ClickHouse#63 shorthand-schema, ClickHouse#64 synthetic done, ClickHouse#65 stale sdk_session_id, ClickHouse#66 idle timeout, ClickHouse#67 sticky session) each close one underlying cause. Two gaps remain that none of those PRs cover.

Gap 1: client-side send silently drops payloads.

web/src/api/websocket.ts checked readyState === OPEN and no-op'd otherwise. The 3s reconnect window leaves a hole: send() returns to the caller and chatStore.sendMessage has already optimistically appended the user message and set isStreaming=true. The user thinks the agent is thinking but the message never reached the server, so the next reply lands against a stale prompt.

Track readyState explicitly. CONNECTING or reconnect-scheduled now queues the payload (bounded to 5 entries; oldest evicted) and flushes from onopen. CLOSED-without-reconnect and CLOSING return 'dropped' so the caller can revert. chatStore.sendMessage pops the optimistic user message on 'dropped' and surfaces an inline assistant error so the user can retry.

Gap 2: gateway initial-bind never replayed the broadcaster buffer.

The switch_session handler already shipped session_status with buffered_events on session switch, but the initial-connect handshake at server.py:286-311 didn't. Reload mid-turn (or a transient 3s WS drop) and the in-flight stream was lost from the client's view even though the events sat in broadcaster._session_buffers waiting to be replayed.

Lift the duplicated send-status construction into _send_session_status and call it from both branches. Initial-bind gates on broadcaster.is_buffering so idle sessions stay silent; switch_session calls unconditionally so the client refreshes is_running/status on every selection. The frontend handleSessionStatus already restores streamingBlocks, panels, todos, and interaction state from the buffer (handled by ClickHouse#69), so this is purely additive at the gateway.

Tests:
- 9 new asserts in tests/test_gateway_ws.py covering the helper output, the initial-bind gate, the switch_session regression path, and a load-fidelity check for buffer ordering.
- Full pytest run: 444 pass, 2 skip, 2 pre-existing failures unrelated (test_bootstrap docker-env detection and test_cli_upgrade docker mode, both noted in notes/repo-conventions/nerve.md).
- web/ tsc --noEmit clean, npm run build clean.

Out of scope (followups, not blocking):
- Stale-listener cleanup on swallowed send_json exceptions (server.py:298-301).
- Application-level message_received ack from engine after sessions.add_message.
- _session_locks TTL on session archive.
pufit approved these changes on May 13, 2026
Summary
Two small fixes for the user-visible chat glitch where a question gets no
response and the next reply ends up answering the previous prompt. Five
reliability PRs (#63, #64, #65, #66, #67) each close one underlying
cause; the two remaining gaps live at the client send path and the
gateway initial-bind handshake.
Gap 1: client send silently drops payloads
`web/src/api/websocket.ts` checked `readyState === WebSocket.OPEN` and no-op'd otherwise. The 3-second reconnect window leaves a hole: a disconnected `send()` returns nothing to the caller and `chatStore.sendMessage` has already optimistically appended the user message and flipped `isStreaming: true`. The UI shows "thinking" while the message never reached the server, the next message succeeds, and the user reads the second response as if it answered the first prompt.
The fix:
- `send()` returns `'sent' | 'queued' | 'dropped'`.
- `CONNECTING` and "reconnect scheduled" go to a bounded `_pending` queue (5 entries, oldest evicted). `onopen` drains the queue in arrival order.
- `chatStore.sendMessage` captures the return value. On `'dropped'` the optimistic user message is popped, `isStreaming` is cleared, and an inline assistant error explains the failure so the user can retry.
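The send-path decision table can be modeled in a few lines. The real client is TypeScript (`web/src/api/websocket.ts`); the Python below is only a hypothetical model of the behavior described above, written in the same language as the gateway tests. `QueuedSender`, `ReadyState`, `transport`, and `reconnect_scheduled` are illustrative names, not identifiers from this PR, and the rule that `CLOSING` always drops follows the commit message.

```python
from collections import deque
from enum import Enum


class ReadyState(Enum):
    CONNECTING = 0
    OPEN = 1
    CLOSING = 2
    CLOSED = 3


class QueuedSender:
    """Hypothetical model of the client send path (not the real websocket.ts)."""

    MAX_PENDING = 5  # queue bound stated in this PR

    def __init__(self, transport):
        self.transport = transport  # callable that puts one payload on the wire
        self.state = ReadyState.CLOSED
        self.reconnect_scheduled = False
        # deque(maxlen=N) discards the oldest entry when a new one overflows it.
        self._pending = deque(maxlen=self.MAX_PENDING)

    def send(self, payload) -> str:
        if self.state is ReadyState.OPEN:
            self.transport(payload)
            return "sent"
        if self.state is ReadyState.CLOSING:
            return "dropped"
        if self.state is ReadyState.CONNECTING or self.reconnect_scheduled:
            self._pending.append(payload)
            return "queued"
        # CLOSED with no reconnect pending: the caller should revert.
        return "dropped"

    def on_open(self) -> None:
        """Mark the socket open and drain queued payloads in arrival order."""
        self.state = ReadyState.OPEN
        self.reconnect_scheduled = False
        while self._pending:
            self.transport(self._pending.popleft())
```

In this model a caller such as `chatStore.sendMessage` would branch on the returned string and undo its optimistic update only for `"dropped"`; `"queued"` keeps the optimistic state because the payload is expected to flush on the next open.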
Gap 2: gateway initial bind never replayed the broadcaster buffer
`nerve/gateway/server.py:286-311` registers the new listener but does not ship `session_status` with `buffered_events`. The `switch_session` handler at 363-379 already does. Reload mid-turn (or the 3s WS reconnect after a network blip) and the in-flight stream is lost from the client's view even though `broadcaster._session_buffers[session_id]` has every event.
The fix:
- Lift the duplicated `session_status` construction into `_send_session_status(websocket, session_id, is_running, session_record)`.
- Initial bind calls it only when `broadcaster.is_buffering(active_session)` is true. Idle sessions stay silent.
- `switch_session` calls it unconditionally so the client refreshes `is_running`/`status` on every selection.

The frontend `handleSessionStatus` already rebuilds `streamingBlocks`, panels, todos, and interaction state from `msg.buffered_events` (`web/src/stores/handlers/sessionHandlers.ts:22-114`, last updated by #69), so this is purely additive at the gateway.
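A minimal sketch of the shared helper and its two call sites, under stated assumptions. `FakeBroadcaster`, `FakeWebSocket`, `on_initial_bind`, and `on_switch_session` are hypothetical stand-ins, the extra `broadcaster` argument exists only to keep the example self-contained, and the `type` and `session_id` message keys are guesses. `is_buffering` is assumed here to mean "has buffered events"; the real predicate may differ.

```python
import asyncio


class FakeBroadcaster:
    """Stand-in exposing only the two members this sketch needs."""

    def __init__(self):
        self._session_buffers = {}

    def is_buffering(self, session_id):
        # Assumption: a session counts as buffering when its buffer is non-empty.
        return bool(self._session_buffers.get(session_id))


class FakeWebSocket:
    """Records outgoing messages instead of sending them."""

    def __init__(self):
        self.sent = []

    async def send_json(self, message):
        self.sent.append(message)


async def _send_session_status(websocket, session_id, is_running,
                               session_record, broadcaster):
    # Single place that builds session_status, including the replay buffer.
    await websocket.send_json({
        "type": "session_status",
        "session_id": session_id,
        "is_running": is_running,
        "status": session_record.get("status"),
        "buffered_events": list(broadcaster._session_buffers.get(session_id, [])),
    })


async def on_initial_bind(websocket, session_id, is_running,
                          session_record, broadcaster):
    # Gate: only replay when something is buffered, so idle sessions stay silent.
    if broadcaster.is_buffering(session_id):
        await _send_session_status(websocket, session_id, is_running,
                                   session_record, broadcaster)


async def on_switch_session(websocket, session_id, is_running,
                            session_record, broadcaster):
    # Unconditional: the client refreshes is_running/status on every selection.
    await _send_session_status(websocket, session_id, is_running,
                               session_record, broadcaster)
```

With this shape, the initial-bind gate and the `switch_session` regression path can each be exercised against a fake socket by asserting on the recorded `sent` list.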
Test plan
- `python -m pytest tests/test_gateway_ws.py -v` (9 new tests covering the helper output, the initial-bind gate, the `switch_session` regression path, and a buffer-fidelity check).
- `python -m pytest tests/` (444 pass, 2 skip, 2 pre-existing failures unrelated: `test_bootstrap` docker-env detection and `test_cli_upgrade` docker mode).
- `npx tsc --noEmit` clean in `web/`.
- `npm run build` clean in `web/`.
- … mid-stream. The in-flight stream should restore (blocks rebuild via the buffer, todos via #69 "fix(web): refresh todos panel from buffered events on reconnect", `isStreaming: true` reflects live state).
- … reconnect, the buffer replays.
- … the message flushes (`'queued'` path) and the agent runs.
- … submit. UI shows the dropped-message error (`'dropped'` path) and the optimistic state is reverted.
Out of scope (followups, not blocking)
- Stale-listener cleanup on swallowed `send_json` exceptions (`server.py:298-301`).
- Application-level `message_received` ack broadcast from the engine after `sessions.add_message`.
- `_session_locks` TTL / reclaim on session archive.
Notes
[no-changeset: allow] (no changeset infrastructure in this repo).