Skip to content

fix: clear leaked running loop in MCP client background thread#2111

Open
mkmeral wants to merge 1 commit intostrands-agents:mainfrom
mkmeral:fix/otel-threading-mcp-client
Open

fix: clear leaked running loop in MCP client background thread#2111
mkmeral wants to merge 1 commit intostrands-agents:mainfrom
mkmeral:fix/otel-threading-mcp-client

Conversation

@mkmeral
Copy link
Copy Markdown
Contributor

@mkmeral mkmeral commented Apr 10, 2026

Description

When opentelemetry-instrumentation-threading is active — the documented pattern for AgentCore observability — the ThreadingInstrumentor wraps Thread.run() to propagate trace context. Combined with contextvars.copy_context() in MCPClient.start(), this leaks the parent thread's running event loop reference into the background thread. When _background_task then calls run_until_complete(), asyncio sees a "running" loop already set and raises RuntimeError: Cannot run the event loop while another loop is running.

This adds asyncio._set_running_loop(None) at the top of _background_task() to clear leaked loop state before creating the new event loop. This is the same pattern the bedrock-agentcore team shipped in aws/bedrock-agentcore-sdk-python#405.

Confirmed on SDK versions 1.28.0 and 1.35.0.

Related Issues

N/A

Documentation PR

N/A — no documentation changes needed.

Type of Change

Bug fix

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare

Added a unit test that simulates the OTEL leak by setting a running loop before _background_task executes, verifying the method clears it and completes successfully.

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

When OpenTelemetry's ThreadingInstrumentor is active, Thread.run() is
wrapped to propagate trace context, which leaks the parent thread's
running event loop reference into child threads. This causes
'RuntimeError: Cannot run the event loop while another loop is running'
when _background_task calls run_until_complete().

Add asyncio._set_running_loop(None) at the top of _background_task to
clear any leaked loop state before creating the new event loop. This is
the same pattern used by bedrock-agentcore-sdk-python (PR strands-agents#405).
@mkmeral mkmeral deployed to manual-approval April 10, 2026 23:48 — with GitHub Actions Active
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@mkmeral mkmeral enabled auto-merge (squash) April 11, 2026 00:02
This test simulates that scenario by setting a running loop before _background_task runs
and verifying it gets cleared.
"""
import asyncio
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: import asyncio is inside the function body, but the repository coding standards (AGENTS.md) require imports at the top of the file.

Suggestion: Move import asyncio to the top-level imports alongside contextvars and threading.

self._log_debug_with_thread("setting up background task event loop")
# Clear any running-loop state leaked by OpenTelemetry's ThreadingInstrumentor, which wraps Thread.run()
# and can propagate the parent thread's event loop reference, causing run_until_complete() to fail.
asyncio._set_running_loop(None)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important: asyncio._set_running_loop is a private/internal CPython API (prefixed with _). While it's been stable across Python 3.10–3.13 and is the correct low-level mechanism for this, it's worth noting the risk: this could change in a future CPython release without deprecation notice.

Suggestion: Consider adding a defensive wrapper or at minimum a brief inline comment noting the Python version range where this is known to work, so future maintainers know to verify if the minimum supported Python version changes. For example:

# asyncio._set_running_loop is a CPython internal (stable since 3.6, used by event loops themselves).
# Required to clear loop state leaked by OpenTelemetry's ThreadingInstrumentor.
asyncio._set_running_loop(None)

This is non-blocking — the existing comment is already good context about why, just suggesting adding a note about the private API nature.

@github-actions
Copy link
Copy Markdown

Assessment: Comment

Well-scoped bug fix that addresses a real issue with OpenTelemetry's ThreadingInstrumentor leaking running event loop state into MCP client background threads. The fix is minimal, the approach is validated by the bedrock-agentcore team, and the test effectively simulates the failure scenario.

Review Details
  • Private API usage: asyncio._set_running_loop(None) is a CPython internal — stable and correct for this use case, but worth a brief note for future maintainability.
  • Test import style: import asyncio should be at the module top-level per repository coding standards.

Clean fix with good context in both the PR description and code comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants