Fix #1897: bug(memos-local-plugin): Hermes background model status checks can trigger paid #1899
Fix #1897: bug(memos-local-plugin): Hermes background model status checks can trigger paid #1899Memtensor-AI wants to merge 2 commits into
Conversation
…nal provider errors Issue #1897 reported ~12,900 paid LLM requests in 24 h on Hermes against a DeepSeek key with insufficient balance. The local `system_model_status` row count (12,900) closely tracked the provider-side `request_count` (11,344) for the same billing window. The naming is misleading: `system_model_status` is not a health probe; it is the audit row written once per LLM call (ok / fallback / error) inside `core/llm/client.ts`. With no circuit breaker, every pipeline subscriber (capture / session-relation / reward / L2 / L3 / skill / retrieval LLM filter / world-model) kept firing on every turn / closed episode / induction, generating one paid request each. Add a per-`LlmClient` circuit breaker: - Trips on terminal errors: HTTP 401/402/403 or messages containing `insufficient balance` / `invalid api key` / `unauthorized` / `account suspended` / `billing`. - Open: short-circuits subsequent calls inside the facade without contacting the provider. Throws `MemosError(LLM_UNAVAILABLE)` with `details.circuitOpen=true` so existing catch blocks still work. - Half-open after cool-down (default 5 min, configurable, min 30 s): next call probes the provider; success closes the breaker, terminal failure re-opens it for another cool-down. - Host fallback rescues a call without tripping the breaker — fallback exists precisely to keep going when the primary is down. - Coalesces `system_model_status="circuit_open"` audit rows to at most one per ~25 s while the breaker stays open, so we don't replace paid spam with audit-row spam. - Exposes `circuitOpen` / `circuitOpenUntil` / `circuitOpenedReason` via `LlmClientStats` for the Overview viewer card. - Enabled by default; legacy behaviour available via `circuitBreaker.enabled = false`. Tests: 9 new vitest cases under `tests/unit/llm/client.test.ts` covering trip on 402, trip on "insufficient balance" message, no trip on generic transient, coalescing, half-open close on success, host-fallback rescues without trip, disabled mode, stats fields, and re-open on terminal probe failure. All 59 LLM and 28 pipeline tests pass; `tsc --noEmit` clean. Out of scope (tracked separately): 429 `Retry-After` handling (issue #1620), per-tool rate limits, daily budget caps.
✅ Automated Test Results: PASSEDAll tests passed (35/35 executed, 36 skipped). memos_local_plugin/smoke: 0 passed, 1 skipped, memos_local_plugin/contract: 35 passed, 35 skipped. Duration: 5s Branch: |
|
Requesting changes. The overall direction looks right, but the circuit breaker does not cover the default host-fallback path. Please change the behavior so a terminal primary error trips the breaker even if host fallback succeeds. When the breaker is open, calls may go directly to host fallback, but should not touch the primary provider. Please also add a test for 需要修改。 整体方向是对的,但当前断路器没有覆盖默认的 host fallback 路径。 请改成:只要 primary provider 出现终态错误,即使 host fallback 成功,也应该打开断路器。断路器打开后可以直接走 host fallback,但不应再触达 primary provider。同时请补一个测试: |
Description
Fixes #1897: paid LLM request storm on Hermes when the configured provider returns terminal errors (402 insufficient balance / 401 invalid key / 403). Issue reporter saw ~12,900 paid DeepSeek requests in 24 h tracking exactly the local
system_model_statusaudit-row count.Root cause: the
system_model_statusrow name is misleading. It is not a health probe; it is the audit row written once per LLM call insideapps/memos-local-plugin/core/llm/client.ts::callWithFallback. With no circuit breaker, every pipeline subscriber (capture / session-relation / reward / L2 / L3 / skill / retrieval LLM filter / world-model) kept firing on every turn, induction, and closed episode, generating one paid request each even though the provider was permanently broken until the operator fixed billing.Fix: per-
LlmClientcircuit breaker that trips on terminal provider errors (HTTP 401/402/403 or messages containing "insufficient balance" / "invalid api key" / "unauthorized" / "account suspended" / "billing"). When open, calls short-circuit inside the facade — zero paid requests, no furthersystem_model_status="error"rows. Half-open after a 5-minute cool-down (configurable, min 30 s); the next call probes the provider and either closes the breaker on success or re-opens it on terminal failure. Host fallback rescues a call without tripping the breaker. A coalescedsystem_model_status="circuit_open"row is emitted at most once per ~25 s so the Logs viewer still surfaces the suppression event without replacing paid spam with audit-row spam. The breaker is enabled by default; legacy behaviour is available viacircuitBreaker.enabled = false.Tests: 9 new vitest cases under
apps/memos-local-plugin/tests/unit/llm/client.test.tscovering trip on 402, trip on "insufficient balance" message, no trip on generic transient, coalescing under load, half-open close on success, host-fallback rescues without trip, legacy disabled mode, stats exposure, and re-open on terminal probe failure. All 59 LLM and 28 pipeline tests pass;tsc --noEmitclean. Three full-suite failures (storage migrator / traces-count / e2e episode merge) are pre-existing on the base branch dev-20260604-v2.0.19 and unrelated to this change.Out of scope (tracked separately): 429
Retry-Afterhandling (issue #1620), per-tool rate limits, daily budget caps.Related Issue (Required): Fixes #1897
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Automated tests are pending.
Checklist
@MatthewZhuang, @CarltonXiang, @syzsunshine219 please review this PR.
Reviewer Checklist