[BUG] KeepAliveScheduler does not evict sessions after ping failure, causing unbounded CPU growth

## Description

When `keep-alive-interval` is configured on a Streamable HTTP MCP server, `KeepAliveScheduler` periodically sends ping messages to all registered `McpStreamableServerSession` instances. When a session's stream is unavailable because the client has already disconnected, the ping fails and a WARN is logged, but **the session is never removed from the session map**. Dead sessions therefore accumulate indefinitely, and CPU usage keeps rising as the scheduler pings a growing set of unreachable sessions every interval.

## Environment

- Spring AI: 1.1.4
- MCP Java SDK: the version bundled with Spring AI 1.1.4
- Java: JDK 25
- Server: Tomcat (embedded via Spring Boot 3.4.1)
- Transport: Streamable HTTP (`spring.ai.mcp.server.protocol=STREAMABLE`)

## Configuration

```yaml
spring:
  ai:
    mcp:
      server:
        type: SYNC
        protocol: STREAMABLE
        streamable-http:
          mcp-endpoint: /mcp
          keep-alive-interval: 30s
```

## Steps to Reproduce

1. Deploy an MCP Server with Streamable HTTP transport and `keep-alive-interval: 30s`
2. Have multiple MCP clients connect, call tools, then disconnect by closing the TCP connection
3. Wait a few minutes and observe logs and CPU usage
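
To generate abandoned sessions without a full MCP client, a raw-HTTP helper along these lines may be useful. It is a sketch under several assumptions, not part of the original reproduction: it sends only the JSON-RPC `initialize` request (a real client would also send `notifications/initialized`, call tools, and possibly open a GET stream), it assumes the server listens at `http://localhost:8080/mcp` and returns an `Mcp-Session-Id` header, and it assumes that a session abandoned without an HTTP `DELETE` stays registered on the server. `AbandonedSessionRepro`, `initializeRequest`, and the `clientInfo` values are hypothetical names. Closing `HttpClient` with try-with-resources requires JDK 21+.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class AbandonedSessionRepro {

    // Minimal JSON-RPC initialize request; protocolVersion and clientInfo are illustrative values
    static final String INITIALIZE_BODY =
            "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"initialize\",\"params\":{"
            + "\"protocolVersion\":\"2025-03-26\",\"capabilities\":{},"
            + "\"clientInfo\":{\"name\":\"keepalive-repro\",\"version\":\"0.1\"}}}";

    static HttpRequest initializeRequest(URI endpoint) {
        return HttpRequest.newBuilder(endpoint)
                .timeout(Duration.ofSeconds(10))
                .header("Content-Type", "application/json")
                // Streamable HTTP clients advertise that they accept both JSON and SSE responses
                .header("Accept", "application/json, text/event-stream")
                .POST(HttpRequest.BodyPublishers.ofString(INITIALIZE_BODY))
                .build();
    }

    public static void main(String[] args) throws Exception {
        URI endpoint = URI.create(args.length > 0 ? args[0] : "http://localhost:8080/mcp");
        int count = args.length > 1 ? Integer.parseInt(args[1]) : 50;
        for (int i = 0; i < count; i++) {
            // One client per session; closing it releases its connections and no DELETE is ever sent
            try (HttpClient client = HttpClient.newBuilder()
                    .version(HttpClient.Version.HTTP_1_1)
                    .build()) {
                HttpResponse<String> response =
                        client.send(initializeRequest(endpoint), HttpResponse.BodyHandlers.ofString());
                String sessionId = response.headers().firstValue("Mcp-Session-Id").orElse("<none>");
                System.out.println(response.statusCode() + " session=" + sessionId);
            }
        }
    }
}
```

Run it as `java AbandonedSessionRepro.java http://localhost:8080/mcp 50`, then watch the server log for one keep-alive WARN per abandoned session each interval.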

## Observed Behavior

Logs fill with repeated warnings every 30 seconds, one per dead session:

```
WARN [io.modelcontextprotocol.util.KeepAliveScheduler]
Failed to send keep-alive ping to session McpStreamableServerSession@3656e950: Stream unavailable for session d0a2a5b9-d427-49f6-8f0b-6c37293e2e35

WARN [io.modelcontextprotocol.util.KeepAliveScheduler]
Failed to send keep-alive ping to session McpStreamableServerSession@917a226: Stream unavailable for session 5e87f163-f127-4793-9fec-cd35b6a59327

WARN [io.modelcontextprotocol.util.KeepAliveScheduler]
Failed to send keep-alive ping to session McpStreamableServerSession@5795b3a2: Stream unavailable for session ...
```

- Dead sessions are **never** removed from the internal map
- CPU usage grows linearly with the number of accumulated dead sessions
- The `boundedElastic` thread pool is consumed by failed ping attempts
- Eventually triggers CPU alerts and degrades the server's ability to serve new requests

In our production environment, CPU usage rose from normal levels to a sustained **135%+** within hours of deployment, driven by dead sessions accumulating from ~50 disconnected clients.

## Expected Behavior

When a ping fails (stream unavailable), `KeepAliveScheduler` should:

1. Increment a failure counter for that session
2. After N consecutive failures (e.g., 2-3), call `session.close()` and remove it from the session map
3. Log the eviction at INFO level

## Impact

- **Severity: High** — causes progressive CPU degradation in production
- Combined with the CLOSE-WAIT socket leak issue, this can render a server completely unusable
- No recovery without restart — dead sessions accumulate forever

## Workaround

Disable keep-alive entirely:

```yaml
spring:
  ai:
    mcp:
      server:
        streamable-http:
          keep-alive-interval: 0s
```

This stops the CPU waste but also removes the ability to detect stale sessions proactively.

## Suggested Fix

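In `KeepAliveScheduler`, count consecutive ping failures per session and evict the session once a threshold is reached:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Pseudocode: assumes a blocking sendPing() that throws when the session's stream is unavailable
private static final int MAX_FAILURES = 3;
private final Map<String, McpStreamableServerSession> sessions = new ConcurrentHashMap<>();
private final Map<String, Integer> failureCounters = new ConcurrentHashMap<>();

void sendKeepAlive() {
    Iterator<Map.Entry<String, McpStreamableServerSession>> it = sessions.entrySet().iterator();
    while (it.hasNext()) {
        Map.Entry<String, McpStreamableServerSession> entry = it.next();
        String sessionId = entry.getKey();
        McpStreamableServerSession session = entry.getValue();
        try {
            session.sendPing();
            // A successful ping resets the consecutive-failure count
            failureCounters.remove(sessionId);
        } catch (Exception e) {
            int failures = failureCounters.merge(sessionId, 1, Integer::sum);
            if (failures >= MAX_FAILURES) {
                // Remove before closing so the session is evicted even if close() throws
                it.remove();
                failureCounters.remove(sessionId);
                try {
                    session.close();
                } catch (Exception closeError) {
                    log.debug("Error while closing evicted session {}", sessionId, closeError);
                }
                log.info("Evicted dead session {} after {} consecutive ping failures", sessionId, failures);
            }
        }
    }
}
```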
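
The pseudocode assumes a blocking ping, but the warnings in the log come from asynchronous pings scheduled on Reactor's `boundedElastic` pool. The sketch below shows the same consecutive-failure eviction with asynchronous pings, using the JDK's `CompletableFuture` as a stand-in for the SDK's reactive types. `DeadSessionEvictor`, `PingableSession`, `ping()`, and `maxFailures` are hypothetical names for illustration, not SDK API; wiring this into `KeepAliveScheduler` would depend on how the SDK exposes its session map.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.logging.Logger;

public class DeadSessionEvictor {

    /** Hypothetical stand-in for an MCP server session; not the SDK's interface. */
    public interface PingableSession {
        CompletableFuture<Void> ping();
        void close();
    }

    private static final Logger LOG = Logger.getLogger(DeadSessionEvictor.class.getName());

    // A concurrent map, because ping callbacks may remove entries while the map is being iterated
    private final ConcurrentMap<String, PingableSession> sessions;
    private final ConcurrentMap<String, Integer> consecutiveFailures = new ConcurrentHashMap<>();
    private final int maxFailures;

    public DeadSessionEvictor(ConcurrentMap<String, PingableSession> sessions, int maxFailures) {
        this.sessions = sessions;
        this.maxFailures = maxFailures;
    }

    /** Pings every session once; the returned future completes after all results are handled. */
    public CompletableFuture<Void> sendKeepAlive() {
        List<CompletableFuture<Void>> pings = sessions.entrySet().stream()
                .map(e -> e.getValue().ping().handle((ok, error) -> {
                    onPingResult(e.getKey(), e.getValue(), error);
                    return (Void) null;
                }))
                .toList();
        return CompletableFuture.allOf(pings.toArray(new CompletableFuture[0]));
    }

    private void onPingResult(String sessionId, PingableSession session, Throwable error) {
        if (error == null) {
            // Any successful ping resets the failure streak
            consecutiveFailures.remove(sessionId);
            return;
        }
        int failures = consecutiveFailures.merge(sessionId, 1, Integer::sum);
        // remove(key, value) only evicts if the map still holds this exact session instance
        if (failures >= maxFailures && sessions.remove(sessionId, session)) {
            consecutiveFailures.remove(sessionId);
            try {
                session.close();
            } catch (RuntimeException closeError) {
                LOG.fine(() -> "Error closing evicted session " + sessionId + ": " + closeError);
            }
            LOG.info(() -> "Evicted session " + sessionId + " after " + failures + " consecutive ping failures");
        }
    }
}
```

Using `remove(key, value)` rather than `remove(key)` avoids evicting a newer session that may have been registered under the same ID between the ping and its failure callback.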