Summary
I observed a high-volume paid LLM request spike that appears to correlate with MemOS Local Plugin running under Hermes. The strongest signal is that local api_logs contain a very large number of system_model_status records, close to the paid provider's request count for the same billing window.
It looks as though a background model-status / health-check / retry loop is sending real paid LLM requests instead of performing a non-billable local or provider availability check.
No API keys, account identifiers, full prompts, or private logs are included here.
Environment
- Project: memos-local-plugin
- Host: Hermes
- Runtime data path: ~/.hermes/memos-plugin/
- Provider: DeepSeek
- Model: deepseek-v4-flash
- Date observed: 2026-06-08
Evidence
Provider billing export for the dedicated MemOS key:
    2026-06-08 UTC
    model: deepseek-v4-flash
    request_count: 11,344
    cost: approx 19.89 CNY
Local MemOS database/API log summary:
    api_logs total: 14,632
    system_model_status: 12,900
    system_error: 811
    skill_generate: 80
    world_model_generate: 40
    policy_evolve: 85
    memory_add: 231
    local log coverage: roughly 2026-06-07 12:00 to 2026-06-08 11:41
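The suspicious part is system_model_status = 12,900, which is close to the provider-side request_count = 11,344 for deepseek-v4-flash. The two windows overlap only partially (the local log also covers part of 2026-06-07, while the billing export covers 2026-06-08 UTC), so an exact match is not expected.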
During the abnormal window, system_error also increased rapidly, with LLM-role errors recurring every few seconds. This suggests the plugin may continue retrying after provider errors such as insufficient balance, rate limit, or transient provider failures.
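For anyone who wants to reproduce a per-tool tally like the one above, a query along these lines should work. This is a minimal sketch, not the plugin's actual schema: the api_logs table name comes from the summary above, but the tool_name column and the helper name tally_by_tool are assumptions and may need adjusting to the real database layout.

```python
import sqlite3


def tally_by_tool(conn: sqlite3.Connection) -> dict:
    """Count api_logs rows per tool, most frequent first.

    Assumes a table `api_logs` with a text column `tool_name`
    (hypothetical column name; adjust to the real schema).
    """
    rows = conn.execute(
        "SELECT tool_name, COUNT(*) AS n "
        "FROM api_logs "
        "GROUP BY tool_name "
        "ORDER BY n DESC"
    ).fetchall()
    # Each row is a (tool_name, count) tuple, so dict() keeps the sort order.
    return dict(rows)
```

Running it only needs a read-only connection to the local database, so it can be used without exposing prompts or credentials.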
Why this seems problematic
A system/model status check should not repeatedly call a paid chat/completions endpoint by default. If system_model_status is implemented as a real LLM generation request, it can silently consume paid API quota in the background.
This issue may be related to #1620, where LLM/embedding fetchers retry 429 responses without respecting Retry-After, potentially increasing failed requests and wasting paid API calls. In this case, the visible symptom is even more severe because the high-volume tool is system_model_status.
Expected behavior
- system_model_status should not perform paid LLM generation by default.
- Health checks should use non-generative checks where possible, such as config validation, lightweight endpoint reachability, or model-list/balance-style provider APIs if available.
- Paid LLM calls should have per-tool rate limits.
- Background jobs should have daily/hourly budget caps.
- 402 / insufficient balance / unauthorized / invalid key should immediately open a circuit breaker instead of retrying.
- 429 should respect Retry-After.
- Repeated provider failures should exponentially back off and eventually disable the provider until user action.
- The viewer/logs should clearly distinguish non-billable status checks from billable LLM calls.
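The retry-related expectations above could be expressed as a single decision function. This is only a sketch under stated assumptions: the function name next_delay, the attempt limit, and the base and cap values are all hypothetical, and it handles only the delay-in-seconds form of Retry-After (the HTTP-date form falls through to exponential backoff).

```python
from typing import Optional

# Statuses where retrying cannot help until the user acts (assumed set).
FATAL_STATUSES = {401, 402, 403}
MAX_ATTEMPTS = 5  # hypothetical bound on retries


def next_delay(
    status: int,
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,
    cap: float = 300.0,
) -> Optional[float]:
    """Return seconds to wait before the next retry, or None to stop.

    `attempt` is the zero-based number of retries already made.
    """
    if status in FATAL_STATUSES or attempt >= MAX_ATTEMPTS:
        return None
    if status == 429 and retry_after is not None:
        try:
            # Honor the server's requested delay as given.
            return max(float(retry_after), 0.0)
        except ValueError:
            pass  # HTTP-date form is not parsed in this sketch
    # Exponential backoff: base, 2*base, 4*base, ... up to cap.
    return min(cap, base * (2 ** attempt))
```

A caller would sleep for the returned delay and stop retrying as soon as the function returns None, which bounds the number of paid requests a single failure can trigger.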
Suggested safeguards
- Add a circuit breaker for provider errors:
    402 / insufficient balance: disable provider immediately
    401 / invalid key: disable provider immediately
    429: respect Retry-After, then back off
    5xx/network errors: bounded retries only
- Add a hard throttle for system_model_status, for example:
    max 1 status check per 10-30 minutes
    never run status checks in a tight loop
    never retry status checks every few seconds
- Add per-tool billing guardrails:
    system_model_status: disabled or non-billable by default
    memory_add / policy_evolve / world_model_generate / skill_generate: bounded per hour/day
- Make paid background calls visible in the UI/logs:
    tool_name, provider, model, billable=true/false, retry_count, backoff_ms, circuit_breaker_state
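To make the circuit breaker and status-check throttle concrete, here is a rough sketch of how the two could be combined in one guard object. Everything in it is hypothetical (the ProviderGuard name, its fields, the 10-minute interval, the failure threshold); it illustrates the suggested behavior and says nothing about how the plugin is currently structured.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Errors that should disable the provider immediately (from the list above).
FATAL_STATUSES = {401, 402}


@dataclass
class ProviderGuard:
    min_status_interval_s: float = 600.0  # at most one status check per 10 min
    max_failures: int = 3                 # bounded retries for 429/5xx
    clock: Callable[[], float] = time.monotonic
    state: str = "closed"                 # "closed" = allowed, "open" = disabled
    failures: int = 0
    last_status_check: Optional[float] = None

    def allow_status_check(self) -> bool:
        """Return True only if the breaker is closed and the throttle permits."""
        if self.state == "open":
            return False
        now = self.clock()
        if (
            self.last_status_check is not None
            and now - self.last_status_check < self.min_status_interval_s
        ):
            return False
        self.last_status_check = now
        return True

    def record_response(self, status: int) -> None:
        """Update breaker state from an HTTP status returned by the provider."""
        if status in FATAL_STATUSES:
            self.state = "open"
        elif status == 429 or status >= 500:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.state = "open"
        else:
            self.failures = 0

    def reset(self) -> None:
        """Re-enable the provider after explicit user action."""
        self.state = "closed"
        self.failures = 0
```

The injectable clock is there so the throttle can be tested without waiting. Once the breaker opens it stays open until reset() is called, matching the "disable the provider until user action" expectation.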
Current workaround
I disabled the MemOS plugin / removed the dedicated DeepSeek key to stop further spend. I can provide redacted query outputs or sampled system_model_status rows if needed, but I do not want to post complete private prompts or credentials publicly.