
bug(memos-local-plugin): Hermes background model status checks can trigger a paid LLM request storm #1897

@TianchunGu

Description


Summary

I observed a high-volume spike in paid LLM requests that appears to correlate with the MemOS Local Plugin running under Hermes. The strongest signal is that the local api_logs contain a very large number of system_model_status records, close to the paid provider's request count for an overlapping billing window.

This suggests that a background model-status / health-check / retry loop may be sending real, paid LLM requests instead of performing a non-billable local or provider availability check.

No API keys, account identifiers, full prompts, or private logs are included here.

Environment

  • Project: memos-local-plugin
  • Host: Hermes
  • Runtime data path: ~/.hermes/memos-plugin/
  • Provider: DeepSeek
  • Model: deepseek-v4-flash
  • Date observed: 2026-06-08

Evidence

Provider billing export for the dedicated MemOS key:

2026-06-08 UTC
model: deepseek-v4-flash
request_count: 11,344
cost: approx 19.89 CNY

Local MemOS database/API log summary:

api_logs total: 14,632
system_model_status: 12,900
system_error: 811
skill_generate: 80
world_model_generate: 40
policy_evolve: 85
memory_add: 231
local log coverage: roughly 2026-06-07 12:00 to 2026-06-08 11:41

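The suspicious part is that system_model_status = 12,900 is of the same order as the provider-side request_count = 11,344 for deepseek-v4-flash (about 14% higher, over a local log window that overlaps the billing day but does not match it exactly).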
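As a rough cross-check, the arithmetic below uses only the figures quoted in this report to compare the counts and estimate the average spacing between status checks. It assumes the local log window runs from 2026-06-07 12:00 to 2026-06-08 11:41 in a single time zone. Because the calls were probably concentrated in the abnormal window, the real peak rate was likely higher than this average.

```python
from datetime import datetime

# Figures copied from the billing export and the local api_logs summary above.
status_checks = 12_900
api_logs_total = 14_632
billed_requests = 11_344
billed_cost_cny = 19.89

# Approximate local log coverage (assumed to be in one time zone).
window = datetime(2026, 6, 8, 11, 41) - datetime(2026, 6, 7, 12, 0)
window_s = window.total_seconds()

share_of_logs = status_checks / api_logs_total       # fraction of all api_logs
status_vs_billed = status_checks / billed_requests   # local vs provider count
avg_interval_s = window_s / status_checks            # if spread evenly
cost_per_request = billed_cost_cny / billed_requests

print(f"status checks as share of api_logs: {share_of_logs:.1%}")
print(f"status checks / billed requests: {status_vs_billed:.2f}")
print(f"average interval between status checks: {avg_interval_s:.1f} s")
print(f"average cost per billed request: {cost_per_request:.5f} CNY")
```

With these inputs, status checks make up about 88.2% of api_logs, outnumber billed requests by a ratio of about 1.14, and arrive on average about every 6.6 s. That is consistent with the "every few seconds" pattern seen in system_error.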

During the abnormal window, system_error also increased rapidly, with LLM-role errors recurring every few seconds. This suggests the plugin may keep retrying after provider errors such as insufficient balance, rate limiting, or transient provider failures.

Why this seems problematic

A system/model status check should not repeatedly call a paid chat/completions endpoint by default. If system_model_status is implemented as a real LLM generation request, it can silently consume paid API quota in the background.

This issue may be related to #1620, where LLM/embedding fetchers retry 429 responses without respecting Retry-After, potentially increasing failed requests and wasting paid API calls. Here the symptom is even more severe, because the high-volume caller is system_model_status, a background status check rather than a user-facing operation.
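A minimal sketch of honoring Retry-After, as #1620 asks for, follows. The function name, the 60 s fallback and the 1 h cap are hypothetical choices, not the plugin's code. The sketch handles both forms HTTP allows for this header: a delay in seconds or an HTTP-date.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(header_value: Optional[str],
                        now: Optional[datetime] = None,
                        default: float = 60.0,
                        cap: float = 3600.0) -> float:
    """Seconds to wait before retrying, based on an HTTP Retry-After header.

    Accepts delay-seconds ("120") or an HTTP-date
    ("Wed, 21 Oct 2015 07:28:00 GMT"). Missing or unparseable values fall
    back to `default`. The result is clamped to [0, cap].
    """
    if not header_value:
        return default
    value = header_value.strip()
    if value.isdigit():
        delay = float(value)
    else:
        try:
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):  # older Pythons raise TypeError
            return default
        if when.tzinfo is None:          # "-0000" dates come back naive
            when = when.replace(tzinfo=timezone.utc)
        now = now or datetime.now(timezone.utc)
        delay = (when - now).total_seconds()
    return max(0.0, min(delay, cap))
```

Clamping to a cap prevents a misbehaving header from stalling the plugin indefinitely. The fallback keeps a missing header from turning into an immediate retry.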

Expected behavior

  • system_model_status should not perform paid LLM generation by default.
  • Health checks should use non-generative checks where possible, such as config validation, lightweight endpoint reachability, or model-list/balance-style provider APIs if available.
  • Paid LLM calls should have per-tool rate limits.
  • Background jobs should have daily/hourly budget caps.
  • 402 / insufficient balance / unauthorized / invalid key should immediately open a circuit breaker instead of retrying.
  • 429 should respect Retry-After.
  • Repeated provider failures should exponentially back off and eventually disable the provider until user action.
  • The viewer/logs should clearly distinguish non-billable status checks from billable LLM calls.
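The error-handling expectations above could look roughly like the per-provider circuit breaker below. Everything in it (class names, the 5-failure threshold, the 2 s base delay) is a hypothetical sketch, not MemOS code. It assumes the provider reports insufficient balance as HTTP 402 and an invalid key as HTTP 401. A real provider may signal these with other codes or error bodies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    DISABLE = "disable"   # 401 / 402: needs user action, never retry
    RATE_LIMITED = "429"  # honour Retry-After, then back off
    TRANSIENT = "retry"   # 5xx / network error: bounded retries
    NO_RETRY = "fail"     # other 4xx: the request itself is wrong


def classify(status: Optional[int]) -> Action:
    """Map an HTTP status (None = network error) to a retry policy."""
    if status in (401, 402):
        return Action.DISABLE
    if status == 429:
        return Action.RATE_LIMITED
    if status is None or status >= 500:
        return Action.TRANSIENT
    return Action.NO_RETRY


@dataclass
class ProviderBreaker:
    """Per-provider circuit breaker; "open" means disabled until reset()."""
    max_failures: int = 5      # consecutive retryable failures before opening
    base_delay_s: float = 2.0
    max_delay_s: float = 600.0
    failures: int = 0
    state: str = "closed"

    def allow_request(self) -> bool:
        return self.state == "closed"

    def record_success(self) -> None:
        self.failures = 0

    def record_failure(self, status: Optional[int],
                       retry_after_s: Optional[float] = None) -> Optional[float]:
        """Return seconds to wait before retrying, or None if no retry is allowed."""
        action = classify(status)
        if action is Action.NO_RETRY:
            return None
        if action is Action.DISABLE:
            self.state = "open"
            return None
        self.failures += 1
        if self.failures >= self.max_failures:
            self.state = "open"  # give up until the user intervenes
            return None
        # Exponential backoff: base, 2*base, 4*base, ... capped at max_delay_s.
        delay = min(self.max_delay_s, self.base_delay_s * 2 ** (self.failures - 1))
        if action is Action.RATE_LIMITED and retry_after_s is not None:
            delay = max(delay, retry_after_s)
        return delay

    def reset(self) -> None:
        """Called on explicit user action (e.g. a new key or a top-up)."""
        self.failures = 0
        self.state = "closed"
```

Transient failures and 429s share one consecutive-failure counter. As a result, a provider that keeps failing is eventually disabled until reset() is called. Production code would usually add random jitter to the delay; it is omitted here to keep the behavior deterministic.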

Suggested safeguards

  1. Add a circuit breaker for provider errors:
       • 402 / insufficient balance: disable the provider immediately
       • 401 / invalid key: disable the provider immediately
       • 429: respect Retry-After, then back off
       • 5xx / network errors: bounded retries only
  2. Add a hard throttle for system_model_status, for example:
       • at most 1 status check per 10-30 minutes
       • never run status checks in a tight loop
       • never retry status checks every few seconds
  3. Add per-tool billing guardrails:
       • system_model_status: disabled or non-billable by default
       • memory_add / policy_evolve / world_model_generate / skill_generate: bounded per hour/day
  4. Make paid background calls visible in the UI/logs, including:
       • tool_name, provider, model, billable=true/false, retry_count, backoff_ms, circuit_breaker_state
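The throttle and per-tool budget safeguards above could be sketched in two parts: a throttle that caches the last status result, and a sliding-window budget per tool. The class names, the 15-minute interval and the per-tool limits are hypothetical. The probe passed to the throttle is assumed to be a non-billable check (config validation or endpoint reachability), never a chat completion.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, Dict, Optional


class StatusCheckThrottle:
    """Run the (non-billable) status probe at most once per min_interval_s.

    Calls inside the interval return the cached result instead of probing again.
    """

    def __init__(self, min_interval_s: float = 15 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval_s = min_interval_s
        self.clock = clock
        self._last_run: Optional[float] = None
        self._cached: Any = None

    def check(self, probe: Callable[[], Any]) -> Any:
        now = self.clock()
        if self._last_run is not None and now - self._last_run < self.min_interval_s:
            return self._cached
        # Record the attempt before probing, so a probe that raises is not
        # retried until the interval has elapsed (no tight retry loop).
        self._last_run = now
        self._cached = probe()
        return self._cached


class HourlyBudget:
    """Cap billable calls per tool within a sliding window (default: one hour)."""

    def __init__(self, limits: Dict[str, int],
                 clock: Callable[[], float] = time.monotonic,
                 window_s: float = 3600.0) -> None:
        self.limits = limits
        self.clock = clock
        self.window_s = window_s
        self._calls: Dict[str, Deque[float]] = {}

    def try_acquire(self, tool: str) -> bool:
        """Return True and record the call if `tool` is still under its limit."""
        limit = self.limits.get(tool, 0)  # tools without a limit get no paid budget
        now = self.clock()
        calls = self._calls.setdefault(tool, deque())
        # Drop timestamps that have left the window.
        while calls and now - calls[0] >= self.window_s:
            calls.popleft()
        if len(calls) >= limit:
            return False
        calls.append(now)
        return True


# Hypothetical limits: status checks get no paid budget at all.
budget = HourlyBudget({"system_model_status": 0, "memory_add": 60,
                       "policy_evolve": 10, "world_model_generate": 10,
                       "skill_generate": 10})
```

Unknown tools default to a limit of zero, so a newly added background job cannot spend money until someone grants it a budget. An injectable clock keeps both classes testable without sleeping.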

Current workaround

I disabled the MemOS plugin / removed the dedicated DeepSeek key to stop further spend. I can provide redacted query outputs or sampled system_model_status rows if needed, but I do not want to post complete private prompts or credentials publicly.

Metadata


Labels

  • ai-pr-ready: AutoDev tests passed and PR is ready for human review/merge.
  • bug: Something isn't working.
  • help wanted: Extra attention is needed.
  • plugin: Plugin/adapter/bridge layer (apps/ directory).
