obs(newrelic): P0 log-based alert for unsanctioned destructive DDL on postgres-customers (truehomie DDL trap, D3) #70
Merged
… postgres-customers (truehomie DDL trap, task D3)

- alerts/customer-db-destructive-ddl.json: CRITICAL FROM Log (metric-based alerting has no live Prometheus pipeline in prod). Balances DROP DATABASE/ROLE/USER/OWNED lines from the postgres-customers pod (log_statement='ddl' trap set 2026-06-03, persists on the PVC) against the provisioner's sanctioned-drop ledger (event=provisioner.drop from server.guardedDrop + pool_reaper, provisioner PR #56), with a 4x DDL budget per sanctioned shared-pg drop. The truehomie signature (drops with ZERO provisioner.drop events) always pages. Triage runbook in the description.
- dashboards/admin-defense.json: new "customer-db DDL trap" page (delta billboard, DDL-vs-sanctioned timeseries, raw trap lines, sanctioned + dropguard-refusal table). Purely additive.
- newrelic/CHANGES.md entry (upstream dependency: provisioner PR #56; merge it first or pool reaps will false-positive).

Operator apply required (no auto-apply in this repo): newrelic/apply.sh.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
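The following is a minimal Python sketch of the balancing rule described above. It is not the NRQL in `alerts/customer-db-destructive-ddl.json`. It only illustrates why the 4x budget makes the zero-sanctioned case always page. The sketch relies on these assumptions:

- `DESTRUCTIVE_DDL`, `WindowVerdict` and `evaluate_window` are hypothetical names.
- Provisioner events are modelled as plain dicts with an `event` key.
- Every `provisioner.drop` event is treated as one sanctioned shared-pg drop.
- The pattern matches the `DROP` fragment rather than the log prefix, so simple-protocol and extended-protocol trap lines both count.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: matches the DROP fragment anywhere in the line, so both
# simple-protocol ("LOG: statement: DROP ...") and extended-protocol
# ("LOG: execute <name>: DROP ...") trap lines are counted. Case-insensitive
# matching is an assumption (clients may send SQL keywords in lowercase).
DESTRUCTIVE_DDL = re.compile(r"\bDROP\s+(?:DATABASE|ROLE|USER|OWNED)\b", re.IGNORECASE)

# Budget per sanctioned shared-pg drop, per the PR description:
# up to 3 retried DROP DATABASE attempts + 1 DROP USER.
DDL_BUDGET_PER_SANCTIONED_DROP = 4


@dataclass
class WindowVerdict:
    ddl_lines: int           # destructive DDL lines seen in the window
    sanctioned_drops: int    # provisioner.drop events in the same window
    unsanctioned_delta: int  # DDL lines left over after the budget
    page: bool               # True when the P0 should fire


def evaluate_window(pod_log_lines, provisioner_events):
    """Evaluate one 15-minute window (hypothetical helper; the real rule is NRQL)."""
    ddl_lines = sum(1 for line in pod_log_lines if DESTRUCTIVE_DDL.search(line))
    # Assumption: every provisioner.drop event counts as one shared-pg drop.
    sanctioned = sum(1 for ev in provisioner_events if ev.get("event") == "provisioner.drop")
    delta = ddl_lines - DDL_BUDGET_PER_SANCTIONED_DROP * sanctioned
    # With zero sanctioned drops the budget is zero, so any destructive DDL
    # pages: this is the truehomie signature.
    return WindowVerdict(ddl_lines, sanctioned, delta, delta > 0)


if __name__ == "__main__":
    window = [
        '2026-06-10 18:55:14.433 UTC [1704166] LOG: statement: DROP DATABASE "db_96edf9eed8ed42929036b63298ec5b2b"',
        'LOG: execute S_1: DROP USER "u_example"',
        "LOG: connection authorized: user=app",
    ]
    print(evaluate_window(window, []))
    print(evaluate_window(window, [{"event": "provisioner.drop"}]))
```

For example, five DROP lines against one sanctioned drop leave a delta of 1 (5 - 4) and page. The same five lines against two sanctioned drops stay within budget and do not page.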
What
Task D3 deliverable 4 — wires the postgres-customers DDL-logging trap (set during the 2026-06-03 truehomie-db incident) to a P0 NR alert, plus a dashboard page.
- `newrelic/alerts/customer-db-destructive-ddl.json`: CRITICAL, `FROM Log` (metric-based NR alerting is not live in prod; there is no Prometheus pipeline). Fires when the postgres-customers pod (`k8s_namespace_name='instant-data'`, `k8s_label_app='postgres-customers'`) logs a `DROP DATABASE/ROLE/USER/OWNED` statement not accounted for by a sanctioned provisioner drop in the same 15-minute window. Sanctioned drops are `event=provisioner.drop` entries: `server.guardedDrop` for RPC drops, and `pool.deprovisionBacking caller='pool_reaper'` for hot-pool reaps. Each sanctioned shared-pg drop carries a 4× DDL budget (up to 3 retried `DROP DATABASE` attempts + 1 `DROP USER`). The truehomie signature (drops with ZERO `provisioner.drop` events) always pages. Full triage runbook in the condition description.
- `newrelic/dashboards/admin-defense.json`: adds a third page, "customer-db DDL trap", with an unsanctioned-delta billboard, a DDL-vs-sanctioned timeseries, a raw trap lines table, and a sanctioned-drops + dropguard-refusals table. Purely additive (106 insertions, 0 deletions).
- `newrelic/CHANGES.md`: entry with the upstream dependency.

Discovered trap log shape (live, read-only kubectl on do-nyc3-instant-prod)
- `ALTER SYSTEM SET log_statement='ddl'` + `log_connections=on` (2026-06-03 incident response; persists in `postgresql.auto.conf` on the PVC). The `connection received`/`authorized` lines visible in the current pod log prove the settings survived the 2026-06-06 Recreate.
- Trap line shape: `2026-06-10 18:55:14.433 UTC [1704166] LOG: statement: DROP DATABASE "db_96edf9eed8ed42929036b63298ec5b2b"`. Extended-protocol clients log `LOG: execute <name>: DROP ...` instead, so the NRQL matches the `DROP` fragment, not the prefix.
- 26,777 pod log lines in the last 96h contain ZERO `statement:` lines, i.e. no DDL ran in that window; the trap is armed and quiet.

Ordering dependency
Provisioner PR #56 (InstaNode-dev/provisioner) adds the `pool_reaper` ledger entry + dropguard. Merge #56 before applying this alert, or hot-pool reaps of failed postgres items will false-positive.

Rule-17 coverage block
Pre-existing finding (not fixed here)
`newrelic/tests/apply.test.sh` on master contains unresolved merge-conflict markers (line 174, from the PR #14 squash) and a stale expected-count baseline (33 vs 98 JSON files), so the NR test suite has been un-runnable since that merge. Recommend a follow-up PR.

🤖 Generated with Claude Code