fix(mcp): prevent leaked semaphore warnings during MCPServerStdio cleanup#2802
fix(mcp): prevent leaked semaphore warnings during MCPServerStdio cleanup#28024444J99 wants to merge 1 commit intoopenai:mainfrom
Conversation
…ent semaphore leaks When MCPServerStdio.cleanup() tears down via exit_stack.aclose(), the ClientSession exits first (cancelling its task group), then the transport exits. The ClientSession cancellation kills the reader/writer tasks inside the transport before the transport's finally block can close stdin and wait for the subprocess to exit gracefully. This race prevents the subprocess from flushing multiprocessing resources, causing "leaked semaphore" warnings from Python's resource_tracker. Fix: explicitly close the session's write stream before the exit_stack unwind. This delivers EOF to the subprocess's stdin, giving it a window to release its resources and exit cleanly before task groups are cancelled. Also reinitialise the exit_stack and clear server_initialize_result during cleanup so the same server instance can reconnect after cleanup. Closes openai#618
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4468bf6f4d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # Reinitialize the exit stack so the same server instance can | ||
| # reconnect cleanly after cleanup. | ||
| self.exit_stack = AsyncExitStack() |
There was a problem hiding this comment.
Preserve original exit stack when cleanup is interrupted
Reinitializing self.exit_stack in finally discards pending cleanup callbacks even when await self.exit_stack.aclose() is interrupted (for example by CancelledError, which this method explicitly re-raises). In that case the old stack may still hold open transport/process contexts, but after reassignment there is no way to retry closing them, so subprocesses can be orphaned and the semaphore leak warning can persist during shutdown. Only replacing the stack after a successful aclose() avoids this resource-leak path.
Useful? React with 👍 / 👎.
|
Thanks for sharing this. I also explored a viable solution for this, but I found that the sufficient fix requires substantial changes across this SDK. So, I don't think the suggested changes here is enough for the issue. See also: #2573 (comment) |
|
Understood — happy to contribute to the broader fix in #2573 if that's useful. |
Summary
Fixes the "leaked semaphore objects to clean up at shutdown" warning that occurs when using
MCPServerStdiowith MCP servers that use Python multiprocessing internally.Root cause: During
cleanup(), theexit_stack.aclose()unwinds contexts in LIFO order —ClientSession.__aexit__runs first (cancelling its internal task group), then the transport's__aexit__runs. Because theClientSessioncancellation kills thestdout_reader/stdin_writertasks inside the transport, the transport'sfinallyblock — which closes stdin and waits for the subprocess — may never execute cleanly. The subprocess is killed without a chance to flush its multiprocessing resources (semaphores, shared memory, etc.), triggering Python'sresource_trackerwarning.Fix: Before calling
exit_stack.aclose(), explicitly close the session's write stream. This delivers EOF to the subprocess's stdin, giving it a window to release its resources and exit cleanly before the task groups are cancelled. Also reinitialiseexit_stackand clearserver_initialize_resultduring cleanup so the same server instance can reconnect.Test plan
test_cleanup_resets_state_for_reconnection— verifies all session state is cleared and exit stack is replaced after cleanuptest_reconnect_after_cleanup— verifies a server instance can connect, use tools, cleanup, then reconnect and use tools againtest_cleanup_closes_write_stream_before_exit_stack— verifies the write stream close happens before exit stack unwind (the critical ordering invariant)ruff checkandruff formatpassIssue number
Closes #618
Checks
make lintandmake format