bug(memos-local-plugin): Hermes background model status checks can trigger paid LLM request storm

## Summary

I observed a high-volume paid LLM request spike that appears to correlate with MemOS Local Plugin running under Hermes. The strongest signal is that local `api_logs` contain a very large number of `system_model_status` records, close to the paid provider's request count for the same billing window.

It looks as though a background model-status / health-check / retry loop is sending real paid LLM requests instead of performing a non-billable local/provider availability check.

No API keys, account identifiers, full prompts, or private logs are included here.

## Environment

- Project: `memos-local-plugin`
- Host: Hermes
- Runtime data path: `~/.hermes/memos-plugin/`
- Provider: DeepSeek
- Model: `deepseek-v4-flash`
- Date observed: 2026-06-08

## Evidence

Provider billing export for the dedicated MemOS key:

```text
2026-06-08 UTC
model: deepseek-v4-flash
request_count: 11,344
cost: approx 19.89 CNY
```

Local MemOS database/API log summary:

```text
api_logs total: 14,632
system_model_status: 12,900
system_error: 811
skill_generate: 80
world_model_generate: 40
policy_evolve: 85
memory_add: 231
local log coverage: roughly 2026-06-07 12:00 to 2026-06-08 11:41
```

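The suspicious part is `system_model_status = 12,900`, which is close to (about 14% above) the provider-side `request_count = 11,344` for `deepseek-v4-flash`. The two windows do not align exactly: the local log covers roughly 2026-06-07 12:00 to 2026-06-08 11:41, while the billing export covers 2026-06-08 UTC, so a one-to-one match is not expected.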

During the abnormal window, `system_error` also increased rapidly, with LLM-role errors recurring every few seconds. This suggests the plugin may continue retrying after provider errors such as insufficient balance, rate limit, or transient provider failures.

## Why this seems problematic

A system/model status check should not repeatedly call a paid chat/completions endpoint by default. If `system_model_status` is implemented as a real LLM generation request, it can silently consume paid API quota in the background.

This issue may be related to #1620, where LLM/embedding fetchers retry 429 responses without respecting `Retry-After`, potentially increasing failed requests and wasting paid API calls. The symptom here is more severe because the high-volume tool is `system_model_status`, a background check rather than user-initiated work.

## Expected behavior

- `system_model_status` should not perform paid LLM generation by default.
- Health checks should use non-generative checks where possible, such as config validation, lightweight endpoint reachability, or model-list/balance-style provider APIs if available.
- Paid LLM calls should have per-tool rate limits.
- Background jobs should have daily/hourly budget caps.
- 402 / insufficient balance / unauthorized / invalid key should immediately open a circuit breaker instead of retrying.
- 429 should respect `Retry-After`.
- Repeated provider failures should exponentially back off and eventually disable the provider until user action.
- The viewer/logs should clearly distinguish non-billable status checks from billable LLM calls.
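To make the `Retry-After` and exponential back-off expectations concrete, here is a minimal Python sketch. It is not taken from the plugin: the function names, the 1-second base delay, and the 30-minute ceiling are all assumptions chosen for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

BASE_DELAY_S = 1.0     # assumed starting delay, not a plugin setting
MAX_DELAY_S = 1800.0   # assumed ceiling (30 minutes), not a plugin setting


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Return the Retry-After delay in seconds, or None if absent or unparseable.

    The header may carry either a number of seconds or an HTTP date.
    """
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def backoff_delay(attempt: int, retry_after: Optional[str] = None) -> float:
    """Exponential back-off that is never shorter than the provider's Retry-After."""
    exponential = min(MAX_DELAY_S, BASE_DELAY_S * (2 ** attempt))
    hinted = parse_retry_after(retry_after)
    return max(exponential, hinted) if hinted is not None else exponential
```

With these assumed constants, `backoff_delay(3)` waits 8 seconds, while `backoff_delay(0, "120")` waits the full 120 seconds requested by the provider.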

## Suggested safeguards

1. Add a circuit breaker for provider errors:

```text
402 / insufficient balance: disable provider immediately
401 / invalid key: disable provider immediately
429: respect Retry-After, then back off
5xx/network errors: bounded retries only
```
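The policy table above could be sketched roughly as follows. This is a hypothetical illustration, not the plugin's code: `Action`, `classify`, `CircuitBreaker`, and the default of 3 retries are invented names and values, and treating other 4xx responses as non-retryable is an added assumption.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    OK = "ok"
    DISABLE = "disable"            # open the breaker until the user intervenes
    WAIT_THEN_RETRY = "wait"       # honor Retry-After, then back off
    BOUNDED_RETRY = "retry"        # retry, but only a limited number of times


def classify(status: Optional[int]) -> Action:
    """Map an HTTP status (None means a network error) to a retry policy."""
    if status is None or 500 <= status < 600:
        return Action.BOUNDED_RETRY
    if 200 <= status < 300:
        return Action.OK
    if status == 429:
        return Action.WAIT_THEN_RETRY
    # 401 / 402 come from the table above; other client errors are treated
    # as non-retryable here, which is an assumption of this sketch.
    return Action.DISABLE


class CircuitBreaker:
    """Opens on fatal errors or after too many consecutive failures."""

    def __init__(self, max_retries: int = 3) -> None:
        self.max_retries = max_retries
        self.failures = 0
        self.is_open = False

    def allow_request(self) -> bool:
        return not self.is_open

    def record(self, status: Optional[int]) -> Action:
        action = classify(status)
        if action is Action.OK:
            self.failures = 0
            return action
        self.failures += 1
        if action is Action.DISABLE or self.failures > self.max_retries:
            self.is_open = True
            return Action.DISABLE
        return action
```

A caller would check `allow_request()` before every paid call and stop issuing requests once it returns `False`, leaving re-enabling to an explicit user action.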

2. Add a hard throttle for `system_model_status`, for example:

```text
max 1 status check per 10-30 minutes
never run status checks in a tight loop
never retry status checks every few seconds
```
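One possible shape for such a throttle is sketched below. `StatusCheckThrottle` and its parameters are hypothetical, and the 600-second default simply mirrors the lower end of the 10-30 minute range suggested above. The result of the last check is cached, and a failed check is not retried until the interval has elapsed.

```python
import time
from typing import Callable, Optional


class StatusCheckThrottle:
    """Runs a status check at most once per interval and caches the result."""

    def __init__(
        self,
        min_interval_s: float = 600.0,  # assumed default: 10 minutes
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last_run: Optional[float] = None
        self._cached: str = "unknown"

    def run(self, check: Callable[[], str]) -> str:
        now = self._clock()
        if self._last_run is not None and now - self._last_run < self._min_interval_s:
            return self._cached  # too soon: reuse the previous result
        # Stamp the time before calling so a failing check is not retried
        # again until the interval has passed.
        self._last_run = now
        try:
            self._cached = check()
        except Exception:
            self._cached = "error"
        return self._cached
```

Injecting the clock keeps the throttle testable without sleeping; production code would use the default monotonic clock.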

3. Add per-tool billing guardrails:

```text
system_model_status: disabled or non-billable by default
memory_add / policy_evolve / world_model_generate / skill_generate: bounded per hour/day
```
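A per-tool guardrail along these lines might look like the sliding-window counter below. The class name `ToolBudget` and any limits passed to it are illustrative assumptions; the tool names come from the log summary above. Tools without a configured limit are denied, so a status check stays non-billable unless explicitly enabled.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class ToolBudget:
    """Sliding-window cap on billable calls per tool."""

    def __init__(
        self,
        limits_per_window: Dict[str, int],
        window_s: float = 3600.0,  # assumed window: one hour
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limits = dict(limits_per_window)
        self._window_s = window_s
        self._clock = clock
        self._calls: Dict[str, Deque[float]] = defaultdict(deque)

    def try_acquire(self, tool: str) -> bool:
        """Return True and record the call if the tool is still within budget."""
        limit = self._limits.get(tool, 0)  # unknown tools get no budget
        now = self._clock()
        recent = self._calls[tool]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self._window_s:
            recent.popleft()
        if len(recent) >= limit:
            return False
        recent.append(now)
        return True
```

For example, `ToolBudget({"system_model_status": 0, "memory_add": 60})` would reject every paid status check while allowing up to 60 `memory_add` calls per hour; the number 60 is only a placeholder.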

4. Make paid background calls visible in the UI/logs:

```text
tool_name, provider, model, billable=true/false, retry_count, backoff_ms, circuit_breaker_state
```
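As a rough illustration, the fields listed above could be carried in a small structured record like the one below. The `ToolCallLog` class, the default values, and the JSON output format are assumptions for this sketch and do not describe the plugin's existing `api_logs` schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ToolCallLog:
    """One log row per outbound provider call (hypothetical schema)."""

    tool_name: str
    provider: str
    model: str
    billable: bool
    retry_count: int = 0
    backoff_ms: int = 0
    circuit_breaker_state: str = "closed"

    def to_json(self) -> str:
        # Sorted keys keep the output stable and easy to diff or grep.
        return json.dumps(asdict(self), sort_keys=True)


entry = ToolCallLog(
    tool_name="system_model_status",
    provider="DeepSeek",
    model="deepseek-v4-flash",
    billable=False,
)
print(entry.to_json())
```

An explicit `billable` flag on every row would let the viewer total paid calls per tool without guessing from the tool name.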

## Current workaround

I disabled the MemOS plugin / removed the dedicated DeepSeek key to stop further spend. I can provide redacted query outputs or sampled `system_model_status` rows if needed, but I do not want to post complete private prompts or credentials publicly.

