
feat(observability): resource-count cap metric — alert + rule + tile + catalog (Task #55) #56

Merged: mastermanas805 merged 1 commit into master from feat/resource-count-caps-observability on Jun 5, 2026

@mastermanas805

Member

What

Wires rule-25 monitoring for instant_resource_count_limit_blocked_total{service,team_tier} (api), emitted when the per-service resource-COUNT cap rejects a provision with 402 (Task #55, api PR #263 / common #47).

  • newrelic/alerts/resource-count-limit-blocked.json: P2 (abuse/observability), WARN on > 20 blocks/h per service+tier.
  • k8s/prometheus-rules.yaml: ResourceCountCapBlocked rule (instant-api group).
  • newrelic/dashboards/instanode-reliability.json: stacked-bar tile by service+tier.
  • observability/METRICS-CATALOG.md: catalog row (lazy CounterVec).
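The WARN condition amounts to a per-(service, team_tier) increase of the counter over a one-hour window compared against 20. The following is a minimal Python sketch of that evaluation, not the New Relic or Prometheus implementation. The sample tuple format and the `increase` / `blocked_over_threshold` helpers are hypothetical. A drop in the counter is treated as a process restart (counter reset), and unlike Prometheus's `increase()`, the sketch does not extrapolate to the window edges.

```python
from collections import defaultdict

# Only the 20-blocks/hour threshold comes from this PR; the rest is illustrative.
THRESHOLD_PER_HOUR = 20
WINDOW_SECONDS = 3600


def increase(values):
    """Sum of positive deltas; a drop is treated as a counter reset (restart from 0)."""
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def blocked_over_threshold(samples, now):
    """samples: hypothetical (unix_ts, service, team_tier, counter_value) tuples.

    Returns {(service, team_tier): increase} for label pairs whose increase
    inside the last hour exceeds the threshold.
    """
    series = defaultdict(list)
    for ts, service, tier, value in sorted(samples):
        if now - WINDOW_SECONDS <= ts <= now:
            series[(service, tier)].append(value)

    result = {}
    for key, values in series.items():
        inc = increase(values)
        if inc > THRESHOLD_PER_HOUR:
            result[key] = inc
    return result
```

With hypothetical label values, a `postgres`/`hobby` series going 0 → 15 → 30 within the hour is flagged (increase 30), while a `redis`/`pro` series going 0 → 5 is not. An empty sample set yields no alerts, which matches the "zero series" behaviour while the flag is off.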

All artifacts are inert until the operator enables RESOURCE_COUNT_CAPS_ENABLED on the api — the counter has zero series until the first over-cap rejection.
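"Zero series" follows from the counter being lazy: a labelled child series only exists once it has been incremented for the first time. The toy `LazyCounterVec` class below is hypothetical. It stands in for a labelled counter to illustrate the behaviour and is not client_golang's `CounterVec` API. It renders Prometheus-style text lines only for series that have been touched.

```python
from collections import Counter


class LazyCounterVec:
    """Hypothetical toy labelled counter (not client_golang's CounterVec).

    A (service, team_tier) child appears only after its first inc(), so a
    process that never rejects a provision exposes no samples at all.
    """

    def __init__(self, name):
        self.name = name
        self._children = Counter()

    def inc(self, service, team_tier):
        self._children[(service, team_tier)] += 1

    def exposition(self):
        """Prometheus-style text lines, one per materialised series."""
        return [
            f'{self.name}{{service="{s}",team_tier="{t}"}} {v}'
            for (s, t), v in sorted(self._children.items())
        ]


blocked = LazyCounterVec("instant_resource_count_limit_blocked_total")
print(blocked.exposition())  # []: no over-cap rejection yet
blocked.inc("postgres", "hobby")  # hypothetical label values
print(blocked.exposition())
```

Because nothing is exported until the first 402, the alert, rule, and tile all see "no data" rather than zero. Pre-initialising known label pairs is a common alternative, at the cost of series that stay flat while the flag is off.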

🤖 Generated with Claude Code

…tile + catalog (Task #55)

Wires monitoring for instant_resource_count_limit_blocked_total{service,team_tier}
(api), the metric emitted when the per-service resource-COUNT cap rejects a
provision with 402. Closes the rule-25 gap for Task #55's metric:

- newrelic/alerts/resource-count-limit-blocked.json — P2 (abuse/observability),
  WARN on > 20 blocks/h per service+tier (derivative over 1h).
- k8s/prometheus-rules.yaml — ResourceCountCapBlocked rule (instant-api group).
- newrelic/dashboards/instanode-reliability.json — stacked-bar tile by
  service+tier.
- observability/METRICS-CATALOG.md — catalog row (lazy CounterVec; INERT until
  RESOURCE_COUNT_CAPS_ENABLED).

All artifacts are inert until the operator enables the api flag — the counter
has zero series until the first over-cap rejection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mastermanas805 merged commit 97bf11f into master on Jun 5, 2026
3 checks passed
mastermanas805 added a commit that referenced this pull request Jun 10, 2026
… postgres-customers (truehomie DDL trap, task D3) (#70)

- alerts/customer-db-destructive-ddl.json: CRITICAL FROM Log (metric-based
  alerting has no live Prometheus pipeline in prod). Balances DROP
  DATABASE/ROLE/USER/OWNED lines from the postgres-customers pod
  (log_statement='ddl' trap set 2026-06-03, persists on the PVC) against the
  provisioner's sanctioned-drop ledger (event=provisioner.drop from
  server.guardedDrop + pool_reaper, provisioner PR #56), with a 4x DDL-budget
  per sanctioned shared-pg drop. The truehomie signature (drops with ZERO
  provisioner.drop events) always pages. Triage runbook in the description.
- dashboards/admin-defense.json: new "customer-db DDL trap" page (delta
  billboard, DDL-vs-sanctioned timeseries, raw trap lines, sanctioned +
  dropguard-refusal table). Purely additive.
- newrelic/CHANGES.md entry (upstream dependency: provisioner PR #56; merge it
  first, otherwise pool_reaper drops page as false positives).
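The balance in the first bullet reduces to a single inequality: page when the count of destructive DDL lines exceeds four times the count of sanctioned drops. With zero sanctioned drops the budget is zero, so any DROP pages, which is the truehomie signature. The real alert is an NRQL query `FROM Log`. This Python version is only a minimal sketch. The regex, the log-line shape, the ledger-event dicts, and `should_page` are all hypothetical, and the sketch assumes each `provisioner.drop` event corresponds to one sanctioned shared-pg drop.

```python
import re

# Hypothetical matcher for the trapped statements named in the commit message.
DROP_RE = re.compile(r"\bDROP\s+(DATABASE|ROLE|USER|OWNED)\b", re.IGNORECASE)
# The "4x DDL budget" per sanctioned shared-pg drop.
DDL_BUDGET_PER_SANCTIONED_DROP = 4


def should_page(trap_lines, ledger_events):
    """trap_lines: postgres-customers log lines; ledger_events: hypothetical dicts."""
    ddl = sum(1 for line in trap_lines if DROP_RE.search(line))
    sanctioned = sum(1 for e in ledger_events if e.get("event") == "provisioner.drop")
    # Zero sanctioned drops => zero budget => any destructive DDL pages.
    return ddl > DDL_BUDGET_PER_SANCTIONED_DROP * sanctioned
```

For example, two trapped lines such as `LOG:  statement: DROP DATABASE db_abc` and `DROP ROLE usr_abc` page when the ledger is empty, but stay within budget when the ledger holds one `provisioner.drop` event.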

Operator apply required (no auto-apply in this repo): newrelic/apply.sh.

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>