[BUG] KeepAliveScheduler does not evict sessions after ping failure, causing unbounded CPU growth #1022

@lxq19991111

Description

When keep-alive-interval is configured on a Streamable HTTP MCP Server, KeepAliveScheduler periodically sends ping messages to all registered McpStreamableServerSession instances. When a session's stream is unavailable (client already disconnected), the ping fails and a WARN is logged, but the session is never removed from the session map. Over time, dead sessions accumulate indefinitely, causing increasing CPU usage as the scheduler iterates over a growing list of unreachable sessions every interval.

Environment

  • Spring AI: 1.1.4
  • MCP Java SDK: (bundled with Spring AI 1.1.4)
  • Java: JDK 25
  • Server: Tomcat (embedded via Spring Boot 3.4.1)
  • Transport: Streamable HTTP (spring.ai.mcp.server.protocol=STREAMABLE)

Configuration

spring:
  ai:
    mcp:
      server:
        type: SYNC
        protocol: STREAMABLE
        streamable-http:
          mcp-endpoint: /mcp
          keep-alive-interval: 30s

Steps to Reproduce

  1. Deploy an MCP Server with Streamable HTTP transport and keep-alive-interval: 30s
  2. Have multiple MCP clients connect, call tools, then disconnect (close TCP)
  3. Wait a few minutes and observe logs and CPU usage

Observed Behavior

Logs fill with repeated warnings every 30 seconds, one per dead session:

WARN [io.modelcontextprotocol.util.KeepAliveScheduler]
Failed to send keep-alive ping to session McpStreamableServerSession@3656e950: Stream unavailable for session d0a2a5b9-d427-49f6-8f0b-6c37293e2e35

WARN [io.modelcontextprotocol.util.KeepAliveScheduler]
Failed to send keep-alive ping to session McpStreamableServerSession@917a226: Stream unavailable for session 5e87f163-f127-4793-9fec-cd35b6a59327

WARN [io.modelcontextprotocol.util.KeepAliveScheduler]
Failed to send keep-alive ping to session McpStreamableServerSession@5795b3a2: Stream unavailable for session ...

As a result:

  • Dead sessions are never removed from the internal session map
  • CPU usage grows linearly with the number of accumulated dead sessions
  • The boundedElastic thread pool is consumed by failed ping attempts
  • Eventually this triggers CPU alerts and degrades the server's ability to serve new requests

In our production environment, CPU went from normal to sustained 135%+ within hours of deployment, triggered by ~50 disconnected clients accumulating dead sessions.

Expected Behavior

When a ping fails (stream unavailable), KeepAliveScheduler should:

  1. Increment a failure counter for that session
  2. After N consecutive failures (e.g., 2-3), call session.close() and remove it from the session map
  3. Log the eviction at INFO level

Impact

  • Severity: High — causes progressive CPU degradation in production
  • Combined with the CLOSE-WAIT socket leak issue, can render a server completely unusable
  • No recovery without restart — dead sessions accumulate forever

Workaround

Disable keep-alive entirely:

spring:
  ai:
    mcp:
      server:
        streamable-http:
          keep-alive-interval: 0s

This stops the CPU waste but also removes the ability to detect stale sessions proactively.

Suggested Fix

In KeepAliveScheduler, count consecutive ping failures per session and evict the session once the count reaches a threshold:

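// Pseudocode: sendPing() stands in for the scheduler's actual ping call
// and is assumed to throw when the stream is unavailable.
void sendKeepAlive() {
    Iterator<Map.Entry<String, McpStreamableServerSession>> it = sessions.entrySet().iterator();
    while (it.hasNext()) {
        Map.Entry<String, McpStreamableServerSession> entry = it.next();
        // Capture key and value up front: a Map.Entry should not be read
        // after it.remove() has modified the backing map.
        String sessionId = entry.getKey();
        McpStreamableServerSession session = entry.getValue();
        try {
            session.sendPing();
            // A successful ping resets the consecutive-failure count.
            failureCounters.remove(sessionId);
        } catch (Exception e) {
            int failures = failureCounters.merge(sessionId, 1, Integer::sum);
            if (failures >= MAX_FAILURES) {
                session.close();
                it.remove();
                failureCounters.remove(sessionId);
                log.info("Evicted dead session: {}", sessionId);
            }
        }
    }
}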
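
To make the proposed eviction logic easier to check in isolation, here is a minimal, self-contained sketch that runs without the SDK. Everything in it is hypothetical: PingableSession, FakeSession and EvictingKeepAlive are stand-ins invented for illustration, not MCP Java SDK types, and the ping is assumed to be a synchronous call that throws on failure, which may not match how the real scheduler issues pings.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class KeepAliveEvictionSketch {

    /** Hypothetical stand-in for a server session; NOT the SDK's McpStreamableServerSession. */
    public interface PingableSession {
        void sendPing() throws Exception; // assumed synchronous, throws when the stream is gone
        void close();
    }

    /** Hypothetical scheduler core: evicts a session after maxFailures consecutive failed pings. */
    public static final class EvictingKeepAlive {
        private final int maxFailures;
        private final Map<String, PingableSession> sessions = new ConcurrentHashMap<>();
        private final Map<String, Integer> failureCounters = new ConcurrentHashMap<>();

        public EvictingKeepAlive(int maxFailures) {
            this.maxFailures = maxFailures;
        }

        public void register(String id, PingableSession session) {
            sessions.put(id, session);
        }

        public int sessionCount() {
            return sessions.size();
        }

        /** One keep-alive tick over all registered sessions. */
        public void sendKeepAlive() {
            Iterator<Map.Entry<String, PingableSession>> it = sessions.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, PingableSession> entry = it.next();
                String id = entry.getKey();
                PingableSession session = entry.getValue();
                try {
                    session.sendPing();
                    failureCounters.remove(id); // success resets the consecutive-failure count
                } catch (Exception e) {
                    int failures = failureCounters.merge(id, 1, Integer::sum);
                    if (failures >= maxFailures) {
                        it.remove(); // drop the session first so a failing close() cannot keep it alive
                        failureCounters.remove(id);
                        try {
                            session.close();
                        } catch (RuntimeException ignored) {
                            // the session is already unreachable; nothing more to do
                        }
                        System.out.println("Evicted dead session: " + id);
                    }
                }
            }
        }
    }

    /** Fake session used only to exercise the sketch. */
    public static final class FakeSession implements PingableSession {
        private final boolean alive;
        private boolean closed;

        public FakeSession(boolean alive) {
            this.alive = alive;
        }

        @Override
        public void sendPing() throws Exception {
            if (!alive) {
                throw new IllegalStateException("Stream unavailable");
            }
        }

        @Override
        public void close() {
            closed = true;
        }

        public boolean isClosed() {
            return closed;
        }
    }

    public static void main(String[] args) {
        EvictingKeepAlive keepAlive = new EvictingKeepAlive(2);
        keepAlive.register("live", new FakeSession(true));
        keepAlive.register("dead", new FakeSession(false));
        keepAlive.sendKeepAlive(); // first failure for "dead": still registered
        keepAlive.sendKeepAlive(); // second consecutive failure: "dead" is evicted
        System.out.println("Sessions remaining: " + keepAlive.sessionCount());
    }
}
```

With a threshold of 2, the dead session survives the first tick and is removed on the second, so the per-tick work stays proportional to the number of live sessions. Unlike the pseudocode above, the sketch removes the session from the map before closing it; that ordering is a design choice here, not something the SDK is known to do.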
