fix(openclaw): prevent duplicate viewer instances on gateway restart#1477
fix(openclaw): prevent duplicate viewer instances on gateway restart#1477fancyboi999 wants to merge 2 commits intoMemTensor:mainfrom
Conversation
When OpenClaw calls register() again (deferred reload / gateway restart), the old viewer HTTP server was not stopped. Each re-registration leaked a new server on an incrementing port (18799 → 18800 → 18801…). Root cause: serviceStarted and viewer were closure-scoped inside register(), so a second call created a fresh closure with no knowledge of the previous one. Fix: track active instances at module level, keyed by stateDir. On re-registration with the same stateDir, startServiceCore() now awaits the previous viewer's port release before binding the new one. Also makes ViewerServer.stop() return a Promise so the caller can properly wait for the HTTP server to close. Closes MemTensor#1471
There was a problem hiding this comment.
Pull request overview
This PR fixes a resource leak where repeated register() calls (e.g., gateway restart / deferred reload) would leave the previous Memory Viewer HTTP server running, causing port increments and multiple viewer instances.
Changes:
- Add a module-level
activeInstancesmap keyed bystateDirto detect and replace existing instances on re-registration. - Make
ViewerServer.stop()async so callers can awaitserver.close()and avoid rebind races. - Add an end-to-end test to verify re-registration stops the previous viewer and reuses the same port.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| apps/memos-local-openclaw/index.ts | Tracks active instances per stateDir, stops previous instance during startup, awaits viewer shutdown on service stop. |
| apps/memos-local-openclaw/src/viewer/server.ts | Changes stop() to return a Promise and resolves when the HTTP server is fully closed. |
| apps/memos-local-openclaw/tests/shutdown-lifecycle.test.ts | Updates mock viewer to match new async stop() signature. |
| apps/memos-local-openclaw/tests/e2e-duplicate-instance.test.ts | New integration test ensuring duplicate viewer instances aren’t leaked across re-registration. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Add stale-instance guard in self-start setTimeout: skip if this instance was already replaced by a newer register() call. - ViewerServer.stop(): add 3s timeout fallback and Node 18.0.x compat (closeIdleConnections + unref when closeAllConnections is missing). - Use explicit error formatting in cleanup catch block. - Bind error handler before listen() in test helper.
|
Hi @fancyboi999 , thanks for the detailed write-up and the well-structured fix! We've reviewed this against our latest branch (openclaw-local-plugin-integration) and the core issue — duplicate viewer instances leaking on re-registration — is already addressed there via a module-level globalRef.__memosLocalPluginActiveService guard, and ViewerServer.stop() has also been made async. After tracing the OpenClaw plugin lifecycle (including the two-phase deferred channel load path), we've confirmed the current implementation handles all real-world restart scenarios correctly. That said, we really appreciate your contribution and the e2e test in particular (e2e-duplicate-instance.test.ts) is a nice addition to our test coverage. We'll look into cherry-picking it. Thanks again, and we'd love to see more contributions from you — feel free to pick up any open issues! |
Description
OpenClaw 在 gateway restart 或 deferred reload 时会重新调用插件的
register(),但旧的 viewer HTTP server 没有关闭。每次重注册都会泄漏一个新的 server 到下一个端口(18799 → 18800 → 18801……),#1471 里的日志就是这样。问题出在
serviceStarted和viewer都是register()闭包里的局部变量。第二次调register()会创建新闭包,旧闭包里的 viewer 就没人管了。改法:在模块层面用 Map 追踪活跃实例,以 stateDir 为 key。
register()进来先查有没有同 stateDir 的旧实例,有的话在startServiceCore()里先 await 把旧 viewer 关掉(等端口真正释放),再启动新的。ViewerServer.stop()改成返回 Promise,这样调用方能等server.close()回调完成,不会出现端口还没释放就尝试 rebind 的竞态。用 stateDir 做 key 而不是全局单例,是因为集成测试里会用不同 stateDir 同时跑多个
register(),全局单例会误杀别人的实例。Related Issue: Fixes #1471
Type of change
How Has This Been Tested?
跑了三层验证:
npm run build— TypeScript 编译通过npx vitest run— 182 个既有测试全绿(排除 main 上已有的 2 个 flaky test)e2e-duplicate-instance.test.ts— 用真实 HTTP server 验证:找一个空闲端口,对同一 stateDir 调两次register()+start(),检查:Checklist