fix: resolve memory leak issues across multiple subsystems #14650
kryptobaseddev wants to merge 2 commits into anomalyco:dev
Addresses 6 of the 10 open issues from the memory-leak work plan
(L-01, L-02, L-03, L-04, I-7046-B, I-7046-D).
L-01 (CRITICAL) — event-reducer.ts: cap accumulated delta string at 1 MB
per part field; drops oldest bytes on overflow rather than growing the
V8 heap without bound. Primary driver of observed 25 GB RSS growth.
L-02 (CRITICAL) — serverSentEvents.gen.ts (v1 + v2): guard the SSE
read loop with a 10 MB MAX_BUFFER_SIZE; throws BufferOverflowError
before the string can grow unbounded during stalled or large streams.
L-03 (HIGH) — mcp/index.ts: replace bare pendingOAuthTransports.set/
delete with helper functions that attach a 5-minute TTL timer per
entry. Stale transports from abandoned or failed OAuth flows are now
automatically evicted. Timer state is cleared on state teardown.
L-04 (LOW) — sdk/js server.ts (v1 + v2): pass { once: true } to the
AbortSignal "abort" listener added during server startup. The listener
is now auto-removed after firing, eliminating long-lived accumulation
on signals that outlive the startup promise.
I-7046-B (HIGH) — lsp/client.ts: enforce a 2 000-file cap on the
diagnostics Map and a 1 000-file cap on the open-files tracking
object. Oldest entries are evicted (FIFO) on overflow; evicted files
re-open cleanly on next access.
I-7046-D (MEDIUM) — permission/next.ts + session/index.ts: add
PermissionNext.clearSession(sessionID) which rejects and removes all
pending permission requests keyed to a session. Session.remove() now
calls clearSession() before the DB delete, so short-lived subagent
sessions no longer leave dangling permission entries.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
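The FIFO eviction described for I-7046-B can be illustrated with a small sketch. This is not the code from lsp/client.ts: the helper name `setWithCap` and the cap of 3 are hypothetical, chosen only to show how a `Map`'s insertion order makes the oldest entry easy to find.

```typescript
// Hypothetical helper (not from the PR): insert into a Map while enforcing
// a maximum number of entries. A Map iterates in insertion order, so the
// first key returned by keys() is always the oldest entry.
function setWithCap<K, V>(map: Map<K, V>, key: K, value: V, maxEntries: number): K[] {
  const evicted: K[] = [];
  // Updating an existing key does not grow the map, so nothing is evicted.
  if (!map.has(key)) {
    while (map.size >= maxEntries) {
      const oldest = map.keys().next();
      if (oldest.done) break; // map is already empty
      map.delete(oldest.value);
      evicted.push(oldest.value);
    }
  }
  map.set(key, value);
  return evicted; // the caller can close or release whatever was evicted
}

// Usage: a diagnostics-like map capped at 3 files (the PR uses 2,000).
const diagnostics = new Map<string, string[]>();
setWithCap(diagnostics, "a.ts", [], 3);
setWithCap(diagnostics, "b.ts", [], 3);
setWithCap(diagnostics, "c.ts", [], 3);
const dropped = setWithCap(diagnostics, "d.ts", [], 3); // evicts "a.ts"
```

Note that this is FIFO rather than LRU: re-setting an existing key does not move it to the back of the queue.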
… leaks

Addresses the 4 remaining high-priority memory leak issues from the work
plan (I-9385-A, I-7046-A, PR-14635, I-7046-C partial).
I-9385-A (CRITICAL, Priority #1), tool/task.ts: call Session.remove()
after extracting subagent task output. This fires the session.deleted
event, which triggers cleanupSessionCaches() in the event-reducer,
freeing all in-memory messages, parts, diffs, permissions, and status
for the subagent session. The task_id in the output becomes a dead
reference; if the LLM tries to resume, Session.get() fails gracefully
and a fresh session is created. Validated: the cleanup infrastructure
already existed but was never invoked for subagent sessions.
I-7046-A (CRITICAL, Priority #3), session/compaction.ts: clear
part.state.output and part.state.attachments when pruning compacted
tool parts. Previously, prune() set time.compacted but left the full
output string in both the DB row and the in-memory store.
toModelMessages already substituted "[Old tool result content cleared]"
for compacted parts; this change aligns stored data with that behavior,
freeing the large strings from memory and disk.
PR-14635 (HIGH, Priority #4), TUI event listener cleanup:
- app.tsx: save the unsubscribe functions returned by all 6
  sdk.event.on() calls; call them in a single onCleanup() handler.
  Previously, onCleanup was not even imported.
- routes/session/index.tsx: save and clean up the message.part.updated
  listener. This component mounts/unmounts during session navigation,
  so each navigation previously added a duplicate listener.
- component/prompt/index.tsx: save and clean up the PromptAppend
  listener. Same mount/unmount pattern as the session component.
I-7046-C (partial): the TUI event listener fixes above cover the most
impactful instances of the missing-dispose pattern. A full audit of all
subscribe() call sites remains as follow-up work.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
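The duplicate-listener problem behind PR-14635 can be reproduced in isolation. The sketch below is only an approximation: `TinyEmitter` and `mountSession` are hypothetical stand-ins for `sdk.event` and a TUI component, and the returned dispose function plays the role that `onCleanup()` plays in the real components.

```typescript
type Handler<T> = (payload: T) => void;

// Hypothetical stand-in for sdk.event: on() returns an unsubscribe function.
class TinyEmitter<T> {
  private handlers = new Map<string, Set<Handler<T>>>();

  on(type: string, handler: Handler<T>): () => void {
    const set = this.handlers.get(type) ?? new Set<Handler<T>>();
    set.add(handler);
    this.handlers.set(type, set);
    return () => {
      set.delete(handler);
    };
  }

  emit(type: string, payload: T): void {
    this.handlers.get(type)?.forEach((handler) => handler(payload));
  }

  listenerCount(type: string): number {
    return this.handlers.get(type)?.size ?? 0;
  }
}

// Simulates one component mount. The returned function is what the
// component would run on unmount (inside onCleanup() in the TUI).
function mountSession(bus: TinyEmitter<string>, log: string[]): () => void {
  const unsubscribers = [
    bus.on("message.part.updated", (part) => {
      log.push(part);
    }),
  ];
  return () => unsubscribers.forEach((off) => off());
}

const bus = new TinyEmitter<string>();
const received: string[] = [];

// Three navigations: each mount is paired with its dispose call.
for (let i = 0; i < 3; i++) {
  const dispose = mountSession(bus, received);
  dispose();
}
const leftover = bus.listenerCount("message.part.updated");

// One live mount receives each event exactly once.
const disposeLast = mountSession(bus, received);
bus.emit("message.part.updated", "delta");
```

Without the dispose calls in the loop, the final emit would be delivered four times, once per accumulated listener.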
The following comment was made by an LLM, it may be inaccurate: Based on my search, I found several related PRs that address memory leak issues. Here are the potential duplicates or related work: Related/Potentially Duplicate PRs:
Recommendation: Check PR #13514 and #13594 first, as they address very similar unbounded growth and long-running session memory leak issues. They may be earlier attempts or competing solutions for the same problems being fixed in PR #14650.
Fixes subagent session deallocation, delta string cap, tool output on compact, SSE buffer guard, MCP OAuth transport TTL, pending permissions cleanup, TUI event listener cleanup, LSP diagnostics cap, and AbortSignal listener.
Original work from PR anomalyco#14650 by @kryptobaseddev
Co-authored-by: Keaton Hoskins <kryptobaseddev@users.noreply.github.com>
Hi @kryptobaseddev, I've been experiencing the memory leak issues and found your excellent fix here. Since this PR has been open for a while, I created a consolidated PR (#15435) that includes your work along with @cgwalters's bash streaming fix from #8953. I've properly attributed all your work with Co-authored-by tags and explicit credit in the PR description. Hope this helps get these important fixes merged faster. Thank you for the thorough investigation and fix!
Issue for this PR
Closes #9385
Relates to #7046, #9151
Type of change
What does this PR do?
Fixes 10 of the 15 memory leak sources tracked across issues #9385, #7046, #9151, and PR #14635. The two dominant drivers of unbounded RSS growth were subagent sessions never being deallocated (I-9385-A) and message delta strings accumulating without any cap (event-reducer). Together these caused observed GB-scale growth in long-running sessions.
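As a rough illustration of capping an accumulated delta string, the sketch below keeps only the newest characters once a limit is reached. It is not the event-reducer code: `appendCapped` and `MAX_DELTA_CHARS` are hypothetical names, and the limit is counted in UTF-16 code units (what `String.prototype.length` reports), which is an assumption about how "1 MB" is measured.

```typescript
// Assumption: the 1 MB cap is measured in UTF-16 code units, not bytes.
const MAX_DELTA_CHARS = 1024 * 1024;

// Hypothetical helper: append a delta and keep only the newest maxChars
// characters, dropping the oldest ones from the front on overflow.
function appendCapped(current: string, delta: string, maxChars: number = MAX_DELTA_CHARS): string {
  const next = current + delta;
  return next.length > maxChars ? next.slice(next.length - maxChars) : next;
}

// Usage with a tiny cap so the effect is visible.
const small = appendCapped("abcdef", "ghij", 8); // "cdefghij"
```

One trade-off of trimming from the front is that the cut can land in the middle of a surrogate pair or a structured token, so consumers of the field should tolerate a truncated prefix.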
Changes by file:
- Subagent deallocation (tool/task.ts): Call Session.remove() after extracting the subagent task output. This fires session.deleted, which triggers the existing cleanupSessionCaches() in the event-reducer, freeing all in-memory messages, parts, and status entries for completed subagent sessions. Resume via task_id degrades gracefully (creates a fresh session).
- Delta string cap (event-reducer.ts): Cap accumulated part field strings at 1 MB. On overflow the oldest bytes are dropped via slice(). This was the single largest contributor to absolute heap size.
- Tool output on compact (compaction.ts): When prune() compacts tool parts, clear state.output and state.attachments from both the DB row and in-memory store. toModelMessages already substituted placeholder text for compacted parts; this aligns stored data with that behavior.
- SSE buffer guard (serverSentEvents.gen.ts, v1 + v2): Add a 10 MB MAX_BUFFER_SIZE check before each buffer += value. Throws on overflow instead of growing unbounded during stalled or malformed streams.
- MCP OAuth transport TTL (mcp/index.ts): Replace bare Map.set/delete with helpers that attach a 5-minute auto-eviction timer per pending OAuth transport entry. Stale entries from failed or abandoned auth flows are cleaned up automatically.
- Pending permissions cleanup (permission/next.ts, session/index.ts): Add PermissionNext.clearSession() which rejects and removes all pending permission requests for a given session. Called from Session.remove() before the DB delete, so subagent sessions no longer leave dangling permission entries.
- TUI event listener cleanup (app.tsx, routes/session/index.tsx, component/prompt/index.tsx): Save the unsubscribe functions returned by all 8 sdk.event.on() calls across these three components and call them in onCleanup(). Previously onCleanup was not even imported in app.tsx. The session and prompt components mount/unmount during navigation, so each navigation was adding duplicate listeners.
- LSP diagnostics cap (lsp/client.ts): Enforce a 2,000-file cap on the diagnostics Map and a 1,000-file cap on the open-files tracking object. FIFO eviction on overflow.
- AbortSignal listener (sdk/js server.ts, v1 + v2): Pass { once: true } to the abort listener added during startup so it self-removes after firing.
How did you verify your code works?
Built a standalone binary (./packages/opencode/script/build.ts --single) and ran it against a real project for 2.5 hours with continuous subagent tasks, multi-step tool use, and session switching. Monitored RSS every 10 seconds (875 samples).
Results:
The previous behavior (per issue reports) showed continuous unbounded growth toward 25+ GB RSS under similar workloads.
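For anyone wanting to repeat a similar measurement, RSS can be sampled on a timer and summarized afterwards. The sketch below is not the script used for this PR: `startRssSampler`, `summarizeRss`, and the `RssSummary` shape are hypothetical, and the byte values in the usage lines are made-up inputs for illustration, not measurements.

```typescript
interface RssSummary {
  samples: number;
  firstMb: number;
  lastMb: number;
  peakMb: number;
}

// Hypothetical sampler: calls readRss every intervalMs and appends the
// result (in bytes) to samples. Returns a function that stops sampling.
// In Node.js, readRss could be: () => process.memoryUsage().rss
function startRssSampler(readRss: () => number, intervalMs: number, samples: number[]): () => void {
  const timer = setInterval(() => {
    samples.push(readRss());
  }, intervalMs);
  return () => clearInterval(timer);
}

// Reduce the raw byte samples to first, last, and peak values in MiB.
function summarizeRss(samplesBytes: number[]): RssSummary {
  if (samplesBytes.length === 0) throw new Error("no samples collected");
  const toMb = (bytes: number): number => Math.round(bytes / (1024 * 1024));
  let peak = 0;
  for (const value of samplesBytes) {
    if (value > peak) peak = value;
  }
  return {
    samples: samplesBytes.length,
    firstMb: toMb(samplesBytes[0]),
    lastMb: toMb(samplesBytes[samplesBytes.length - 1]),
    peakMb: toMb(peak),
  };
}

// Illustrative input only: three invented samples of 100, 300, and 200 MiB.
const MIB = 1024 * 1024;
const demo = summarizeRss([100 * MIB, 300 * MIB, 200 * MIB]);
```

Comparing `firstMb`, `lastMb`, and `peakMb` over a long run gives a quick signal of whether memory plateaus or keeps climbing.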
Checklist