fix(mcp): harden notification system against race conditions #3168
waleedlatif1 wants to merge 1 commit into staging
- Guard concurrent connect() calls in connection manager with connectingServers Set
- Suppress post-disconnect notification handler firing in MCP client
- Clean up Redis event listeners in pub/sub dispose()
- Add tests for all three hardening fixes (11 new tests)
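The first fix can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `SketchConnectionManager`, the `createdClients` counter, and the `Promise.resolve()` stand-in for the handshake are all assumptions; only the `connectingServers` Set name comes from the description above.

```typescript
// Hypothetical, simplified manager; not the real McpConnectionManager.
class SketchConnectionManager {
  private connections = new Map<string, string>()
  // Server IDs with a connect() currently in flight.
  private connectingServers = new Set<string>()
  // Counts how many clients were actually created (for illustration only).
  createdClients = 0

  /** Resolves to true if this call created the connection, false if it was skipped. */
  async connect(serverId: string): Promise<boolean> {
    // Skip if already connected or if another connect() is still in progress.
    if (this.connections.has(serverId) || this.connectingServers.has(serverId)) {
      return false
    }
    // Mark as in flight synchronously, before the first await.
    this.connectingServers.add(serverId)
    try {
      await Promise.resolve() // stand-in for the real async handshake
      this.createdClients++
      this.connections.set(serverId, 'connected')
      return true
    } finally {
      // Always release the guard, even if the handshake throws.
      this.connectingServers.delete(serverId)
    }
  }
}
```

The guard only works because the Set is updated before the first `await`; a second caller that arrives while the handshake is pending sees the entry and returns early instead of creating a duplicate client.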
Greptile Overview
Greptile Summary
This PR hardens the MCP notification pipeline by (1) extending
Key integration points are:
Confidence Score: 2/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant MCP as External MCP Server
participant CM as McpConnectionManager
participant C as McpClient
participant PS as mcpPubSub (Redis/Local)
participant SSE as SSE endpoint (in-process)
participant UI as Browser/Frontend
CM->>C: new McpClient({config, onToolsChanged})
CM->>C: connect()
C-->>CM: connected
CM->>C: hasListChangedCapability()
alt listChanged supported
CM->>C: onClose(cb)
MCP-->>C: notifications/tools/list_changed
C->>CM: onToolsChanged(serverId)
CM->>PS: publishToolsChanged(event)
PS-->>CM: onToolsChanged(event) (all processes)
CM-->>SSE: notifyLocalListeners(event)
SSE-->>UI: push tools-changed event
else not supported
CM->>C: disconnect()
end
opt transport closes
C-->>CM: onClose callback
CM->>CM: scheduleReconnect(backoff)
CM->>C: connect() (retry)
end
opt dispose
CM->>C: disconnect() (all)
CM->>PS: unsubscribe
end
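The notification path in the diagram (server to client to `onToolsChanged`) is where the PR's post-disconnect guard applies. A minimal sketch of that guard, assuming a simplified client: `SketchClient` and `handleListChanged` are hypothetical names, while `isConnected` and `onToolsChanged` are taken from the PR text.

```typescript
type ToolsChangedCallback = (serverId: string) => void

// Hypothetical, simplified client; not the real McpClient.
class SketchClient {
  private isConnected = false
  private readonly serverId: string
  private readonly onToolsChanged: ToolsChangedCallback | undefined

  constructor(serverId: string, onToolsChanged?: ToolsChangedCallback) {
    this.serverId = serverId
    this.onToolsChanged = onToolsChanged
  }

  connect(): void {
    this.isConnected = true
  }

  disconnect(): void {
    this.isConnected = false
  }

  // Stand-in for the transport delivering notifications/tools/list_changed.
  handleListChanged(): void {
    // Drop notifications that arrive after disconnect().
    if (!this.isConnected) return
    this.onToolsChanged?.(this.serverId)
  }
}

const seen: string[] = []
const client = new SketchClient('server-1', (id) => {
  seen.push(id)
})
client.connect()
client.handleListChanged() // delivered to the callback
client.disconnect()
client.handleListChanged() // suppressed by the guard
```

Without the guard, a notification already queued by the transport could still reach the manager after the client has been torn down.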
constructor(options: McpClientOptions)
constructor(
  configOrOptions: McpServerConfig | McpClientOptions,
  securityPolicy?: McpSecurityPolicy
) {
  if ('config' in configOrOptions) {
    this.config = configOrOptions.config
    this.securityPolicy = configOrOptions.securityPolicy ?? {
      requireConsent: true,
      auditLevel: 'basic',
      maxToolExecutionsPerHour: 1000,
    }
    this.onToolsChanged = configOrOptions.onToolsChanged
  } else {
    this.config = configOrOptions
    this.securityPolicy = securityPolicy ?? {
      requireConsent: true,
Constructor overload mis-detect
The overload discriminator looks inverted: if ('config' in configOrOptions) will be true for the options object, but the body treats that branch as if it were the legacy (config, securityPolicy?) case (it assigns this.config = configOrOptions.config). In the else branch it assigns this.config = configOrOptions, which will be an options object in the legacy call site and will break (e.g., this.config.url becomes undefined and the URL check throws). This needs to be flipped (treat 'config' in ... as the options path, else as the legacy config path) so existing call sites keep working.
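For reference, the discrimination this comment is asking for can be sketched in isolation. The `Sketch*` types, `DEFAULT_POLICY`, and `normalizeArgs` below are hypothetical stand-ins (the real `McpServerConfig` and `McpClientOptions` have more fields); the sketch only shows how an `in` check separates the two call shapes.

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
interface SketchServerConfig {
  url: string
}
interface SketchSecurityPolicy {
  requireConsent: boolean
}
type ToolsChangedCallback = (serverId: string) => void
interface SketchClientOptions {
  config: SketchServerConfig
  securityPolicy?: SketchSecurityPolicy
  onToolsChanged?: ToolsChangedCallback
}
interface NormalizedArgs {
  config: SketchServerConfig
  securityPolicy: SketchSecurityPolicy
  onToolsChanged: ToolsChangedCallback | undefined
}

const DEFAULT_POLICY: SketchSecurityPolicy = { requireConsent: true }

function normalizeArgs(
  configOrOptions: SketchServerConfig | SketchClientOptions,
  securityPolicy?: SketchSecurityPolicy
): NormalizedArgs {
  if ('config' in configOrOptions) {
    // Options-object form: { config, securityPolicy?, onToolsChanged? }
    return {
      config: configOrOptions.config,
      securityPolicy: configOrOptions.securityPolicy ?? DEFAULT_POLICY,
      onToolsChanged: configOrOptions.onToolsChanged,
    }
  }
  // Legacy positional form: (config, securityPolicy?)
  return {
    config: configOrOptions,
    securityPolicy: securityPolicy ?? DEFAULT_POLICY,
    onToolsChanged: undefined,
  }
}
```

One caveat with this approach: a legacy config object that itself carried a property named `config` would be misclassified as the options form, so the check depends on the config type never gaining such a field.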
    Promise.allSettled(disconnects).then(() => {
      logger.info('Connection manager disposed')
    })

    this.connections.clear()
    this.states.clear()
    this.listeners.clear()
    this.connectingServers.clear()
  }

  /**
   * Notify only process-local listeners.
   * Called by the pub/sub subscription (receives events from all processes).
   */
  private notifyLocalListeners(event: ToolsChangedEvent): void {
    for (const listener of this.listeners) {
      try {
        listener(event)
      } catch (error) {
        logger.error('Error in tools-changed listener:', error)
      }
    }
  }

  /**
   * Handle a tools/list_changed notification from an external MCP server.
   * Publishes to pub/sub so all processes are notified.
   */
  private handleToolsChanged(serverId: string): void {
    const state = this.states.get(serverId)
    if (!state) return

    state.lastActivity = Date.now()

    const event: ToolsChangedEvent = {
      serverId,
      serverName: state.serverName,
Async dispose not awaited
dispose() starts client.disconnect() calls and logs inside Promise.allSettled(...), but it returns void immediately and clears connections/states/listeners right away. If any caller expects dispose to complete teardown before proceeding (e.g., tests, process shutdown hooks), this can leave in-flight disconnects running against a manager that has already been cleared, and can also keep Node alive due to open sockets. Consider making dispose() async and await the disconnects (or at least return the Promise.allSettled(...)) so callers can reliably wait for cleanup.
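One way the suggestion could look, as a sketch only: `SketchManager` and `Disconnectable` are hypothetical, and returning the failure count is an illustrative choice rather than anything the PR specifies.

```typescript
// Hypothetical minimal client surface for illustration.
interface Disconnectable {
  disconnect(): Promise<void>
}

// Hypothetical, simplified manager; not the real McpConnectionManager.
class SketchManager {
  private connections = new Map<string, Disconnectable>()

  add(serverId: string, client: Disconnectable): void {
    this.connections.set(serverId, client)
  }

  get size(): number {
    return this.connections.size
  }

  /** Waits for every disconnect to settle, then clears state. Resolves to the number of failures. */
  async dispose(): Promise<number> {
    const disconnects = Array.from(this.connections.values(), (client) => client.disconnect())
    // allSettled never rejects, so one failing disconnect cannot abort teardown.
    const results = await Promise.allSettled(disconnects)
    // Clear only after all in-flight disconnects have finished.
    this.connections.clear()
    return results.filter((result) => result.status === 'rejected').length
  }
}
```

A caller such as a test teardown or shutdown hook can then `await manager.dispose()` and know that no disconnect is still running when it returns.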
Additional Comments (1)
Deploy errors are now rendered in a |
Summary
- Guard concurrent connect() calls in connection manager with a connectingServers Set to prevent duplicate client creation
- Add if (!this.isConnected) return guard in MCP client notification handler to suppress post-disconnect callbacks
- Call removeAllListeners() on Redis pub/sub clients before quit() in dispose() to prevent listener accumulation
- Clear connectingServers in dispose() for consistency with other collection cleanup
Type of Change
Testing
tsc --noEmit clean
Checklist