Streamable HTTP: Multiple SSE streams cause infinite reconnect loop #659

@its-mash

Description

Summary

When multiple SSE streams exist for the same session (e.g. from POST response reconnections), LocalSessionWorker::resume() unconditionally replaces self.common.tx, killing the other stream's receiver. Both EventSource connections then reconnect every sse_retry seconds, leapfrogging each other in an infinite loop that floods the server with GET requests.

Affected versions: rmcp 0.14.0, 0.15.0
Severity: Critical — causes infinite reconnect loops with clients like Cursor, and breaks server-to-client notifications over Streamable HTTP

Root Cause

The MCP Streamable HTTP transport sends POST SSE responses with a priming event containing retry: 3000. When the POST stream ends (after delivering the response), the client's EventSource implementation automatically reconnects via GET. This creates multiple competing EventSource connections:

  1. The initial standalone GET stream (primary notification channel)
  2. Reconnecting GETs from completed POST responses (initialize, tools/list, etc.)
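
For reference, a priming event is an ordinary SSE frame: the retry field sets the EventSource reconnect interval, and a blank line terminates the frame. The payload below is illustrative (the id format mirrors the Last-Event-ID values in the logs further down):

retry: 3000
id: 0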

Each reconnecting GET calls resume() which unconditionally replaces self.common.tx:

// Before fix — local.rs resume()
None => {
    let (tx, rx) = tokio::sync::mpsc::channel(self.session_config.channel_capacity);
    self.common.tx = tx;  // ← Unconditionally replaces sender, kills other stream
    // ...
}

Dropping the old sender closes the old receiver, terminating the OTHER EventSource's stream. That stream then reconnects, replacing the sender again. Both leapfrog every sse_retry (3s) indefinitely.
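
This is plain tokio behavior, independent of rmcp. A minimal, self-contained sketch (all names here are illustrative) of how replacing the last Sender closes the Receiver:

use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    // Stream 1's channel: stands in for the standalone GET (tx1 = common.tx).
    let (tx1, mut rx1) = mpsc::channel::<&str>(8);

    // A reconnecting GET calls resume(), and `self.common.tx = tx2` drops tx1.
    let (tx2, _rx2) = mpsc::channel::<&str>(8);
    drop(tx1); // the implicit effect of the assignment
    let _common_tx = tx2;

    // With its only sender gone, rx1 yields None: stream 1's SSE loop ends,
    // and its EventSource schedules a reconnect after `retry` milliseconds.
    assert!(rx1.recv().await.is_none());
}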

The Leapfrog Loop

1. Client POST initialize → SSE response with priming (retry: 3000) → stream ends
2. Client GET (standalone) → becomes primary common channel (tx1/rx1)
3. POST EventSource reconnects via GET (3s later) → replaces common.tx → kills rx1
4. GET from step 2 reconnects → replaces common.tx → kills stream from step 3
5. Repeat every 3 seconds indefinitely
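
The loop can be reproduced with nothing but tokio channels. In this self-contained simulation (all names assumed; the 30 ms sleep stands in for the 3000 ms sse_retry, and the timeout only bounds the final round), each "stream" replaces the shared sender on reconnect, which closes the other stream's receiver and forces it to reconnect in turn:

use std::sync::{Arc, Mutex};
use tokio::sync::mpsc::{self, Sender};
use tokio::time::{sleep, timeout, Duration};

async fn sse_stream(name: &'static str, common: Arc<Mutex<Sender<String>>>, reconnects: usize) {
    for _ in 0..reconnects {
        // "GET /sse" -> resume(): unconditionally install a fresh channel.
        // The assignment drops the previous sender, closing the peer's receiver.
        let (tx, mut rx) = mpsc::channel::<String>(8);
        *common.lock().unwrap() = tx;
        // Serve until the other stream's reconnect closes our receiver.
        while let Ok(Some(_)) = timeout(Duration::from_millis(200), rx.recv()).await {}
        println!("{name}: stream closed, reconnecting");
        sleep(Duration::from_millis(30)).await; // stands in for sse_retry (3000 ms)
    }
}

#[tokio::main]
async fn main() {
    let common = Arc::new(Mutex::new(mpsc::channel::<String>(8).0));
    let a = tokio::spawn(sse_stream("standalone GET", common.clone(), 3));
    let b = tokio::spawn(sse_stream("POST reconnect", common.clone(), 3));
    let _ = tokio::join!(a, b);
}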

Server logs confirm the pattern — alternating GET requests every 3 seconds with different Last-Event-ID values:

13:33:51.670  GET Last-Event-ID: 0/2   ← from completed POST response
13:33:54.668  GET Last-Event-ID: 0     ← from killed standalone stream
13:33:57.679  GET Last-Event-ID: 0/2   ← leapfrog
13:34:00.674  GET Last-Event-ID: 0     ← leapfrog
...

Additional Issue: Cache Replay Loop

Even without the leapfrog, resume() called sync() on the common channel to replay cached events. Replaying server-initiated list_changed notifications caused clients to re-process old signals, triggering unnecessary re-fetches every reconnection cycle.

What Happens in Practice

Cursor (infinite loop)

  • Connects via POST initialize + GET standalone
  • POST SSE stream ends → EventSource reconnects via GET
  • Two competing streams leapfrog every 3 seconds
  • Server flooded with GET requests indefinitely
  • Notifications intermittently lost as channels are swapped

VS Code (silent notification loss)

  • Reconnects SSE every ~5 minutes with same session ID
  • Each reconnection replaces the channel sender
  • Previous stream's receiver is orphaned
  • notify_tool_list_changed().await returns Ok(()), a silent failure

Fix: Shadow Channels

PR: #660

Instead of unconditionally replacing the common channel, check if the primary is still active:

  • Primary dead (tx.is_closed()) → Replace it. New stream becomes primary.
  • Primary alive → Create a shadow stream — an idle SSE connection kept alive by SSE keep-alive pings that does NOT receive notifications and does NOT replace the primary channel.

fn resume_or_shadow_common(&mut self) -> Result<StreamableHttpMessageReceiver, SessionError> {
    let (tx, rx) = tokio::sync::mpsc::channel(self.session_config.channel_capacity);
    if self.common.tx.is_closed() {
        // Primary is dead — replace it
        self.common.tx = tx;
    } else {
        // Primary is alive — create shadow (idle, keep-alive only)
        self.shadow_txs.push(tx);
    }
    Ok(StreamableHttpMessageReceiver { http_request_id: None, inner: rx })
}
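
Because shadow streams never receive notifications, the only bookkeeping they add is cleanup: per the Files Changed list below, close_sse_stream() clears the shadow senders. A minimal sketch of one way to prune dead ones (this retain-based body is an illustration, not the PR's exact code):

fn prune_dead_shadows(&mut self) {
    // Drop shadow senders whose receiving SSE streams have gone away.
    self.shadow_txs.retain(|tx| !tx.is_closed());
}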

Why Not 409 Conflict?

The initial approach (matching the TypeScript SDK) was to return 409 Conflict on duplicate standalone streams. However:

  1. The MCP spec states: "The client MAY remain connected to multiple SSE streams simultaneously" — 409 is not spec-compliant
  2. 409 causes Cursor to fail entirely on reconnection (500 errors from unhandled Conflict)
  3. The reconnecting EventSources are legitimate HTTP requests — they need a valid stream back

Shadow channels are the correct approach: keep all connections alive without interference.

Why No Cache Replay on Common Channel?

Common channel notifications (tools/list_changed, resources/list_changed) are idempotent signals. Replaying cached ones causes clients to re-process old events, triggering unnecessary re-fetches or infinite notification loops. Missing one is harmless — the next real event arrives naturally. Request-wise channels still use sync() for proper response replay.
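
As a toy sketch, the replay decision reduces to whether the stream is request-wise (simplified, assumed types; not the PR's code):

type RequestId = u64;

// Request-wise streams replay cached events so a reconnecting client can
// still receive its response; the common channel never replays, because
// list_changed signals are idempotent and stale copies trigger re-fetch loops.
fn should_replay_cache(http_request_id: Option<RequestId>) -> bool {
    http_request_id.is_some()
}

fn main() {
    assert!(should_replay_cache(Some(2)));  // POST response stream: sync()
    assert!(!should_replay_cache(None));    // standalone common stream: no sync()
}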

Changes (5 commits)

Commit    Description
8bd424e   Initial 409 Conflict approach (returned error on duplicate standalone stream)
0d03eb5   Handle resume with completed request-wise channels (fall through to common)
a7bb822   Remove 409 Conflict — allow channel replacement per MCP spec
7cf5406   Skip cache replay (sync) when replacing active streams
a7df58c   Shadow channels — the final fix that prevents the leapfrog loop

Files Changed

  • crates/rmcp/src/transport/streamable_http_server/session/local.rs
    • Added shadow_txs: Vec<Sender<ServerSseMessage>> to LocalSessionWorker
    • New method resume_or_shadow_common() with primary-alive check
    • Updated resume() to use shadow logic for both direct common and request-wise fallback paths
    • Removed sync() calls on common channel resume
    • Updated close_sse_stream() to clear shadow senders
    • Updated create_local_session() to initialize shadow_txs

Test Results

  • Cursor connects and initializes successfully (no 409/500 errors)
  • Cursor does NOT enter infinite GET reconnect loop after connection
  • Feature changes trigger exactly one batch of list_changed notifications
  • Cursor receives and processes notifications correctly (re-fetches tools/resources)
  • No notification replay loop (no repeated ResourceListChanged every 3s)
  • VS Code connects and works correctly (unaffected by changes)
  • cargo check --workspace passes

Environment

  • rmcp 0.15.0 (also affects 0.14.0)
  • StreamableHttpService with stateful_mode: true
  • LocalSessionManager (default session manager)
  • Clients tested: Cursor 2.4.37, VS Code MCP Extension
