Skip to content

fix(mcp): harden notification system against race conditions#3168

Open
waleedlatif1 wants to merge 1 commit intostagingfrom
sim-609
Open

fix(mcp): harden notification system against race conditions#3168
waleedlatif1 wants to merge 1 commit intostagingfrom
sim-609

Conversation

@waleedlatif1
Copy link
Collaborator

Summary

  • Guard concurrent connect() calls in connection manager with a connectingServers Set to prevent duplicate client creation
  • Add if (!this.isConnected) return guard in MCP client notification handler to suppress post-disconnect callbacks
  • Call removeAllListeners() on Redis pub/sub clients before quit() in dispose() to prevent listener accumulation
  • Clear connectingServers in dispose() for consistency with other collection cleanup
  • Add 11 tests covering all three fixes + dispose behavior

Type of Change

  • Bug fix

Testing

  • 11 new tests across 3 test files, all passing
  • Full MCP test suite (165 tests) green
  • tsc --noEmit clean

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel
Copy link

vercel bot commented Feb 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Feb 8, 2026 11:14pm

Request Review

- Guard concurrent connect() calls in connection manager with connectingServers Set
- Suppress post-disconnect notification handler firing in MCP client
- Clean up Redis event listeners in pub/sub dispose()
- Add tests for all three hardening fixes (11 new tests)
@waleedlatif1 waleedlatif1 changed the base branch from main to staging February 8, 2026 23:14
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 8, 2026

Greptile Overview

Greptile Summary

This PR hardens the MCP notification pipeline by (1) extending McpClient to optionally register a notifications/tools/list_changed handler, (2) adding a new McpConnectionManager to maintain persistent connections for servers that declare capabilities.tools.listChanged, with reconnect/backoff + idle cleanup, and (3) adding a Redis-backed mcpPubSub adapter (with a local EventEmitter fallback) to broadcast tools-changed events across processes. It also tweaks the Deploy modal error/warning UI and adds unit tests for the new behavior.

Key integration points are:

  • McpClient.connect() registering the SDK notification handler and forwarding to the connection manager.
  • McpConnectionManager publishing ToolsChangedEvent via mcpPubSub, and also subscribing to pub/sub to fan-in events to process-local listeners (e.g., an SSE endpoint).
  • RedisMcpPubSub.dispose() removing listeners prior to quitting Redis clients to avoid leaks.

Confidence Score: 2/5

  • This PR has a high likelihood of breaking existing MCP client construction and has cleanup semantics that may cause teardown races.
  • The McpClient constructor overload discriminator appears inverted, which can break legacy call sites at runtime (URL required / wrong config shape). Additionally, McpConnectionManager.dispose() does not provide a way for callers to await disconnect completion, which can lead to teardown races and lingering connections in shutdown/test scenarios. Other changes look reasonable and are covered by new tests, but these two issues should be addressed before merging.
  • apps/sim/lib/mcp/client.ts, apps/sim/lib/mcp/connection-manager.ts

Important Files Changed

Filename Overview
apps/sim/app/workspace/[workspaceId]/w/[workflowId]/components/panel/components/deploy/components/deploy-modal/components/api/api.tsx Removes apiDeployError prop and in-component error banner for API deploy panel; errors now handled by parent DeployModal.
apps/sim/app/workspace/[workspaceId]/w/[workflowId]/components/panel/components/deploy/components/deploy-modal/deploy-modal.tsx Renames apiDeployError/apiDeployWarnings to deployError/deployWarnings and renders them as truncating Badges; passes no error prop to ApiDeploy.
apps/sim/app/workspace/[workspaceId]/w/[workflowId]/components/terminal/terminal.tsx Moves top border styling from inner container to fixed aside, keeping layout while adding terminal top border.
apps/sim/lib/mcp/client.test.ts Adds tests capturing SDK notification handler to verify onToolsChanged fires while connected and is suppressed after disconnect.
apps/sim/lib/mcp/client.ts Extends McpClient ctor to accept options with onToolsChanged; registers ToolListChangedNotificationSchema handler after connect and adds hasListChangedCapability/onClose helpers.
apps/sim/lib/mcp/connection-manager.test.ts Adds unit tests ensuring concurrent connect guard prevents duplicate clients and dispose blocks new connections.
apps/sim/lib/mcp/connection-manager.ts Introduces McpConnectionManager maintaining persistent connections, reconnect/backoff, idle cleanup, and forwards tools-changed events through Redis pub/sub.
apps/sim/lib/mcp/pubsub.test.ts Adds tests for RedisMcpPubSub initialization and dispose (removeAllListeners before quit, drop publishes after dispose).
apps/sim/lib/mcp/pubsub.ts Adds Redis-backed MCP pub/sub adapter with local EventEmitter fallback; dispose clears handlers and removes Redis listeners before quit.
apps/sim/lib/mcp/types.ts Adds types for tools-changed callback, connection manager event/state, and McpClientOptions.

Sequence Diagram

sequenceDiagram
  participant MCP as External MCP Server
  participant CM as McpConnectionManager
  participant C as McpClient
  participant PS as mcpPubSub (Redis/Local)
  participant SSE as SSE endpoint (in-process)
  participant UI as Browser/Frontend

  CM->>C: new McpClient({config, onToolsChanged})
  CM->>C: connect()
  C-->>CM: connected
  CM->>C: hasListChangedCapability()
  alt listChanged supported
    CM->>C: onClose(cb)
    MCP-->>C: notifications/tools/list_changed
    C->>CM: onToolsChanged(serverId)
    CM->>PS: publishToolsChanged(event)
    PS-->>CM: onToolsChanged(event) (all processes)
    CM-->>SSE: notifyLocalListeners(event)
    SSE-->>UI: push tools-changed event
  else not supported
    CM->>C: disconnect()
  end

  opt transport closes
    C-->>CM: onClose callback
    CM->>CM: scheduleReconnect(backoff)
    CM->>C: connect() (retry)
  end

  opt dispose
    CM->>C: disconnect() (all)
    CM->>PS: unsubscribe
  end
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 files reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +60 to +76
constructor(options: McpClientOptions)
constructor(
configOrOptions: McpServerConfig | McpClientOptions,
securityPolicy?: McpSecurityPolicy
) {
if ('config' in configOrOptions) {
this.config = configOrOptions.config
this.securityPolicy = configOrOptions.securityPolicy ?? {
requireConsent: true,
auditLevel: 'basic',
maxToolExecutionsPerHour: 1000,
}
this.onToolsChanged = configOrOptions.onToolsChanged
} else {
this.config = configOrOptions
this.securityPolicy = securityPolicy ?? {
requireConsent: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Constructor overload mis-detect

The overload discriminator looks inverted: if ('config' in configOrOptions) will be true for the options object, but the body treats that branch as if it were the legacy (config, securityPolicy?) case (it assigns this.config = configOrOptions.config). In the else branch it assigns this.config = configOrOptions, which will be an options object in the legacy call site and will break (e.g., this.config.url becomes undefined and the URL check throws). This needs to be flipped (treat 'config' in ... as the options path, else as the legacy config path) so existing call sites keep working.

Comment on lines +219 to +256

Promise.allSettled(disconnects).then(() => {
logger.info('Connection manager disposed')
})

this.connections.clear()
this.states.clear()
this.listeners.clear()
this.connectingServers.clear()
}

/**
* Notify only process-local listeners.
* Called by the pub/sub subscription (receives events from all processes).
*/
private notifyLocalListeners(event: ToolsChangedEvent): void {
for (const listener of this.listeners) {
try {
listener(event)
} catch (error) {
logger.error('Error in tools-changed listener:', error)
}
}
}

/**
* Handle a tools/list_changed notification from an external MCP server.
* Publishes to pub/sub so all processes are notified.
*/
private handleToolsChanged(serverId: string): void {
const state = this.states.get(serverId)
if (!state) return

state.lastActivity = Date.now()

const event: ToolsChangedEvent = {
serverId,
serverName: state.serverName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Async dispose not awaited

dispose() starts client.disconnect() calls and logs inside Promise.allSettled(...), but it returns void immediately and clears connections/states/listeners right away. If any caller expects dispose to complete teardown before proceeding (e.g., tests, process shutdown hooks), this can leave in-flight disconnects running against a manager that has already been cleared, and can also keep Node alive due to open sockets. Consider making dispose() async and await the disconnects (or at least return the Promise.allSettled(...)) so callers can reliably wait for cleanup.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 8, 2026

Additional Comments (1)

apps/sim/app/workspace/[workspaceId]/w/[workflowId]/components/panel/components/deploy/components/deploy-modal/deploy-modal.tsx
Error message gets truncated

Deploy errors are now rendered in a Badge with truncate, so longer error messages will be cut off with no way to view the full text. This will make debugging failed deployments harder compared to the previous multi-line error box. If truncation is desired for layout, consider adding title={deployError} (and similarly for warnings) or rendering the full message in a non-truncated area.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant