Description
When keep-alive-interval is configured on a Streamable HTTP MCP Server, KeepAliveScheduler periodically sends ping messages to all registered McpStreamableServerSession instances. When a session's stream is unavailable (client already disconnected), the ping fails and a WARN is logged, but the session is never removed from the session map. Over time, dead sessions accumulate indefinitely, causing increasing CPU usage as the scheduler iterates over a growing list of unreachable sessions every interval.
Environment
- Spring AI: 1.1.4
- MCP Java SDK: (bundled with Spring AI 1.1.4)
- Java: JDK 25
- Server: Tomcat (embedded via Spring Boot 3.4.1)
- Transport: Streamable HTTP (spring.ai.mcp.server.protocol=STREAMABLE)
Configuration
spring:
  ai:
    mcp:
      server:
        type: SYNC
        protocol: STREAMABLE
        streamable-http:
          mcp-endpoint: /mcp
          keep-alive-interval: 30s
Steps to Reproduce
- Deploy an MCP Server with Streamable HTTP transport and keep-alive-interval: 30s
- Have multiple MCP clients connect, call tools, then disconnect (close TCP)
- Wait a few minutes and observe logs and CPU usage
Observed Behavior
Logs fill with repeated warnings every 30 seconds, one per dead session:
WARN [io.modelcontextprotocol.util.KeepAliveScheduler]
Failed to send keep-alive ping to session McpStreamableServerSession@3656e950: Stream unavailable for session d0a2a5b9-d427-49f6-8f0b-6c37293e2e35
WARN [io.modelcontextprotocol.util.KeepAliveScheduler]
Failed to send keep-alive ping to session McpStreamableServerSession@917a226: Stream unavailable for session 5e87f163-f127-4793-9fec-cd35b6a59327
WARN [io.modelcontextprotocol.util.KeepAliveScheduler]
Failed to send keep-alive ping to session McpStreamableServerSession@5795b3a2: Stream unavailable for session ...
- Dead sessions are never removed from the internal map
- CPU usage grows linearly with the number of accumulated dead sessions
- The boundedElastic thread pool is consumed by failed ping attempts
- Eventually causes CPU alerts and degrades ability to serve new requests
In our production environment, CPU went from normal to sustained 135%+ within hours of deployment, triggered by ~50 disconnected clients accumulating dead sessions.
Expected Behavior
When a ping fails (stream unavailable), KeepAliveScheduler should:
- Increment a failure counter for that session
- After N consecutive failures (e.g., 2-3), call session.close() and remove it from the session map
- Log the eviction at INFO level
Impact
- Severity: High — causes progressive CPU degradation in production
- Combined with the CLOSE-WAIT socket leak issue, can render a server completely unusable
- No recovery without restart — dead sessions accumulate forever
Workaround
Disable keep-alive entirely:
spring:
  ai:
    mcp:
      server:
        streamable-http:
          keep-alive-interval: 0s
This stops the CPU waste but also removes the ability to detect stale sessions proactively.
Suggested Fix
In KeepAliveScheduler, after a failed ping:
// Pseudocode: sendPing(), failureCounters and MAX_FAILURES are illustrative names
void sendKeepAlive() {
    Iterator<Map.Entry<String, McpStreamableServerSession>> it = sessions.entrySet().iterator();
    while (it.hasNext()) {
        Map.Entry<String, McpStreamableServerSession> entry = it.next();
        // Capture both before it.remove(); the entry should not be read afterwards
        String sessionId = entry.getKey();
        McpStreamableServerSession session = entry.getValue();
        try {
            session.sendPing();
            // A successful ping resets the consecutive-failure count
            failureCounters.remove(sessionId);
        } catch (Exception e) {
            int failures = failureCounters.merge(sessionId, 1, Integer::sum);
            if (failures >= MAX_FAILURES) {
                session.close();
                it.remove();
                failureCounters.remove(sessionId);
                log.info("Evicted dead session: {}", sessionId);
            }
        }
    }
}
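For anyone who wants to try the eviction logic in isolation, here is a minimal runnable sketch of the same idea. It does not use the MCP Java SDK: PingableSession is a hypothetical stand-in for McpStreamableServerSession, EvictingKeepAlive is a hypothetical stand-in for KeepAliveScheduler, and the ping is assumed to be a synchronous call that throws when the stream is unavailable. The real scheduler may send pings asynchronously, so an actual fix would need to count failures in the corresponding error callback instead of a catch block.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for McpStreamableServerSession; not the SDK type. */
interface PingableSession {
    void sendPing() throws Exception; // assumed to throw when the stream is unavailable
    void close();
}

/** Sketch of a keep-alive pass that evicts sessions after repeated ping failures. */
class EvictingKeepAlive {
    private final Map<String, PingableSession> sessions = new ConcurrentHashMap<>();
    private final Map<String, Integer> failureCounters = new ConcurrentHashMap<>();
    private final int maxFailures;

    EvictingKeepAlive(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    void register(String id, PingableSession session) {
        sessions.put(id, session);
    }

    int sessionCount() {
        return sessions.size();
    }

    /** One scheduler tick: ping every session and evict those that keep failing. */
    void sendKeepAlive() {
        Iterator<Map.Entry<String, PingableSession>> it = sessions.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, PingableSession> entry = it.next();
            String id = entry.getKey();
            PingableSession session = entry.getValue();
            try {
                session.sendPing();
                failureCounters.remove(id); // success resets the failure streak
            } catch (Exception e) {
                int failures = failureCounters.merge(id, 1, Integer::sum);
                if (failures >= maxFailures) {
                    it.remove(); // drop the session so later ticks skip it
                    failureCounters.remove(id);
                    try {
                        session.close();
                    } catch (RuntimeException ignored) {
                        // closing an already-dead session must not abort the pass
                    }
                    System.out.println("Evicted dead session: " + id);
                }
            }
        }
    }
}
```

With maxFailures set to 2, a session whose ping always throws survives the first tick and is closed and removed on the second, while a healthy session stays registered. The per-tick work is then bounded by the number of live sessions plus those still inside their failure window.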