Skip to content

Fix #1897: bug(memos-local-plugin): Hermes background model status checks can trigger paid #1899

Open
Memtensor-AI wants to merge 2 commits into
dev-20260604-v2.0.19from
bugfix/autodev-1897
Open

Fix #1897: bug(memos-local-plugin): Hermes background model status checks can trigger paid #1899
Memtensor-AI wants to merge 2 commits into
dev-20260604-v2.0.19from
bugfix/autodev-1897

Conversation

@Memtensor-AI

Copy link
Copy Markdown
Collaborator

Description

Fixes #1897: paid LLM request storm on Hermes when the configured provider returns terminal errors (402 insufficient balance / 401 invalid key / 403). Issue reporter saw ~12,900 paid DeepSeek requests in 24 h tracking exactly the local system_model_status audit-row count.

Root cause: the system_model_status row name is misleading. It is not a health probe; it is the audit row written once per LLM call inside apps/memos-local-plugin/core/llm/client.ts::callWithFallback. With no circuit breaker, every pipeline subscriber (capture / session-relation / reward / L2 / L3 / skill / retrieval LLM filter / world-model) kept firing on every turn, induction, and closed episode, generating one paid request each even though the provider was permanently broken until the operator fixed billing.

Fix: per-LlmClient circuit breaker that trips on terminal provider errors (HTTP 401/402/403 or messages containing "insufficient balance" / "invalid api key" / "unauthorized" / "account suspended" / "billing"). When open, calls short-circuit inside the facade — zero paid requests, no further system_model_status="error" rows. Half-open after a 5-minute cool-down (configurable, min 30 s); the next call probes the provider and either closes the breaker on success or re-opens it on terminal failure. Host fallback rescues a call without tripping the breaker. A coalesced system_model_status="circuit_open" row is emitted at most once per ~25 s so the Logs viewer still surfaces the suppression event without replacing paid spam with audit-row spam. The breaker is enabled by default; legacy behaviour is available via circuitBreaker.enabled = false.

Tests: 9 new vitest cases under apps/memos-local-plugin/tests/unit/llm/client.test.ts covering trip on 402, trip on "insufficient balance" message, no trip on generic transient, coalescing under load, half-open close on success, host-fallback rescues without trip, legacy disabled mode, stats exposure, and re-open on terminal probe failure. All 59 LLM and 28 pipeline tests pass; tsc --noEmit clean. Three full-suite failures (storage migrator / traces-count / e2e episode merge) are pre-existing on the base branch dev-20260604-v2.0.19 and unrelated to this change.

Out of scope (tracked separately): 429 Retry-After handling (issue #1620), per-tool rate limits, daily budget caps.

Related Issue (Required): Fixes #1897

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (does not change functionality, e.g. code style improvements, linting)
  • Documentation update

How Has This Been Tested?

Automated tests are pending.

  • Unit Test
  • Test Script Or Test Steps (please provide)
  • Pipeline Automated API Test (please provide)

Checklist

  • I have performed a self-review of my own code
  • I have commented my code in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • I have created related documentation issue/PR in MemOS-Docs (if applicable)
  • I have linked the issue to this PR (if applicable)
  • I have mentioned the person who will review this PR

@MatthewZhuang, @CarltonXiang, @syzsunshine219 please review this PR.

Reviewer Checklist

…nal provider errors

Issue #1897 reported ~12,900 paid LLM requests in 24 h on Hermes
against a DeepSeek key with insufficient balance. The local
`system_model_status` row count (12,900) closely tracked the
provider-side `request_count` (11,344) for the same billing window.
The naming is misleading: `system_model_status` is not a health probe;
it is the audit row written once per LLM call (ok / fallback / error)
inside `core/llm/client.ts`. With no circuit breaker, every pipeline
subscriber (capture / session-relation / reward / L2 / L3 / skill /
retrieval LLM filter / world-model) kept firing on every turn / closed
episode / induction, generating one paid request each.

Add a per-`LlmClient` circuit breaker:

- Trips on terminal errors: HTTP 401/402/403 or messages containing
  `insufficient balance` / `invalid api key` / `unauthorized` /
  `account suspended` / `billing`.
- Open: short-circuits subsequent calls inside the facade without
  contacting the provider. Throws `MemosError(LLM_UNAVAILABLE)` with
  `details.circuitOpen=true` so existing catch blocks still work.
- Half-open after cool-down (default 5 min, configurable, min 30 s):
  next call probes the provider; success closes the breaker, terminal
  failure re-opens it for another cool-down.
- Host fallback rescues a call without tripping the breaker —
  fallback exists precisely to keep going when the primary is down.
- Coalesces `system_model_status="circuit_open"` audit rows to at
  most one per ~25 s while the breaker stays open, so we don't
  replace paid spam with audit-row spam.
- Exposes `circuitOpen` / `circuitOpenUntil` / `circuitOpenedReason`
  via `LlmClientStats` for the Overview viewer card.
- Enabled by default; legacy behaviour available via
  `circuitBreaker.enabled = false`.

Tests: 9 new vitest cases under `tests/unit/llm/client.test.ts`
covering trip on 402, trip on "insufficient balance" message, no
trip on generic transient, coalescing, half-open close on success,
host-fallback rescues without trip, disabled mode, stats fields,
and re-open on terminal probe failure. All 59 LLM and 28 pipeline
tests pass; `tsc --noEmit` clean.

Out of scope (tracked separately): 429 `Retry-After` handling
(issue #1620), per-tool rate limits, daily budget caps.
@Memtensor-AI

Copy link
Copy Markdown
Collaborator Author

✅ Automated Test Results: PASSED

All tests passed (35/35 executed, 36 skipped). memos_local_plugin/smoke: 0 passed, 1 skipped, memos_local_plugin/contract: 35 passed, 35 skipped. Duration: 5s

Branch: bugfix/autodev-1897

@hijzy

hijzy commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Requesting changes.

The overall direction looks right, but the circuit breaker does not cover the default host-fallback path. fallbackToHost is enabled by default, and when a terminal primary error such as 401/402/403 is rescued by host fallback, the breaker currently stays closed. This means every later pipeline call can still hit the broken paid provider once before falling back, so the request storm can still happen.

Please change the behavior so a terminal primary error trips the breaker even if host fallback succeeds. When the breaker is open, calls may go directly to host fallback, but should not touch the primary provider. Please also add a test for fallbackToHost=true + terminal primary error + host success, verifying that the second call does not invoke the primary provider.


需要修改。

整体方向是对的,但当前断路器没有覆盖默认的 host fallback 路径。fallbackToHost 默认开启,当 primary provider 返回 401/402/403 这类终态错误、但 host fallback 成功时,断路器目前仍保持关闭。这样后续每次 pipeline 调用仍然会先打一次已经异常的付费 provider,再 fallback,仍可能产生请求风暴。

请改成:只要 primary provider 出现终态错误,即使 host fallback 成功,也应该打开断路器。断路器打开后可以直接走 host fallback,但不应再触达 primary provider。同时请补一个测试:fallbackToHost=true + terminal primary error + host success,验证第二次调用不会再调用 primary provider。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-generated bug Something isn't working | 功能异常

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants