feat(admin): decouple metric retention window from scrape interval #93
arnaugiralt wants to merge 3 commits into master from
Conversation
The per-instance ring buffer was sized at a fixed 360 samples, so lowering `scraper.interval` silently shrank the visible history (a 3s interval gave ~18 min instead of the 1h target). Add `scraper.retention_window` (default 1h, env `CHAPERONE_ADMIN_SCRAPER_RETENTION_WINDOW`) and compute ring capacity at startup so it actually spans the requested window: ceil(window/interval) + 1, since N snapshots span (N-1)*interval. Floor at 2 so degenerate inputs always leave a usable rate pair. Validation rejects retention_window < interval so the mismatch surfaces at load.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
qarlosh
left a comment
Two findings — see inline.
```go
// covers at least `window`. Always returns at least 2 — the minimum needed
// for a single rate pair — so degenerate inputs never produce an unusable
// collector.
func CapacityFor(window, interval time.Duration) int {
```
Out-of-diff observation: `historicalPair` in admin/metrics/collector.go:265,269 still hardcodes the trend window (`50*time.Minute`, `-1*time.Hour`). This is a side effect of the new flag: with `retention_window` < 1h (now valid) trends silently never render; with > 1h they still compare against ~1h ago.
Treating the trend window as an independent product decision seems reasonable to me — leaving this note for the record in case it's worth revisiting later (couple trends to retention, or tighten validation). Non-blocking.
Good catch. Tricky thing: I ended up coupling the trend window to the retention window to keep configuration simple.
```go
parseDuration(&cfg.Scraper.Interval, "SCRAPER_INTERVAL", &errs)
parseDuration(&cfg.Scraper.Timeout, "SCRAPER_TIMEOUT", &errs)
parseDuration(&cfg.Scraper.RetentionWindow, "SCRAPER_RETENTION_WINDOW", &errs)
```
Docs in docs/guides/admin-portal.md are now out of sync:
- L52–54: YAML example missing `retention_window`.
- L81–87: env var table missing `CHAPERONE_ADMIN_SCRAPER_RETENTION_WINDOW`.
- L175: states "The portal retains 360 scrape snapshots per instance ... at 10s intervals is exactly 1 hour of history" — exactly what this PR changes; now misleading.
Add retention_window to the YAML example and env var table, and rewrite the history-retention bullet to reflect that capacity is now computed from scraper.retention_window and scraper.interval at startup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
historicalPair previously hardcoded a 50-minute readiness threshold and a 1-hour lookback. With retention_window now configurable, this caused trends to silently never render for windows under 1h and to ignore the configured window above 1h. Plumb the trend window through Collector and derive both the threshold and the lookback from it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps the admin metric ring buffer to actually span the configured retention window. Adds a `scraper.retention_window` config (defaults to 1h) and sizes the ring from window/interval at startup so changing the scrape interval no longer silently shrinks the chart history.