Add dora_monitor: Slack alerting tool for ethrex devnet by edg-l · Pull Request #1 · lambdaclass/ethrex-tooling

edg-l · 2026-05-20T09:07:23Z

Adds dora_monitor/, a Python 3.10+ tool that polls a Dora explorer API and posts Slack alerts when the tracked client (default: ethrex) has issues.

Summary

Detects missed slot proposals, orphaned blocks, non-canonical heads (forks), sync lag past a configurable threshold, beacon status != online, and EL version drift (deploy/rollback detection).
Posts a periodic heartbeat digest (default every 6h) with canonical head, all-client status counts, and per-client detail.
State-change-only alerting with dedup state persisted to JSON; sends a recovery alert when a condition clears.
EL version detection scrapes the /clients/execution HTML page because Dora's /v1/clients/execution JSON endpoint reflects devp2p-crawler connectivity (connected/disconnected), not the UI's Ready/Synchronizing/Offline status. This is documented in the README.
Offline/fork/sync-lag detection uses /api/v1/network/client_head_forks (CL view); an EL-only crash is detected indirectly via the paired beacon's head_slot stalling.
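The "state-change-only alerting with dedup state persisted to JSON" bullet can be illustrated with a minimal sketch. It assumes each active condition is a string key such as "ethrex-1:sync_lag" and that the JSON file looks like {"active": [...]}. Both are hypothetical and are not the tool's actual schema. Each tick diffs the current conditions against the persisted set: new keys produce an alert, vanished keys produce a recovery, and unchanged keys produce nothing.

```python
import json
import os
import tempfile
from pathlib import Path


def diff_alerts(active: set[str], previous: set[str]) -> tuple[list[str], list[str]]:
    """Return (newly firing, recovered) condition keys, sorted for stable output."""
    firing = sorted(active - previous)
    recovered = sorted(previous - active)
    return firing, recovered


def load_state(path: Path) -> set[str]:
    """Load the set of currently-alerting keys; a missing file means nothing is active."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()).get("active", []))


def save_state(path: Path, active: set[str]) -> None:
    """Write via temp file + rename so a crash never leaves half-written JSON."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"active": sorted(active)}, f)
    os.replace(tmp, path)


def tick(path: Path, active: set[str]) -> list[str]:
    """One poll cycle: emit messages only for state changes, then persist the new state."""
    previous = load_state(path)
    firing, recovered = diff_alerts(active, previous)
    messages = [f"ALERT {key}" for key in firing]
    messages += [f"RESOLVED {key}" for key in recovered]
    save_state(path, active)
    return messages
```

Because the dedup set lives on disk, a restart does not re-fire alerts that are already active. Deleting the file (roughly what --reset-state is described as doing) makes every active condition alert again on the next tick.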

Test plan

Copy config.example.yaml, fill in dora_url and slack_webhook_url, run make dry-run and verify alerts print to stdout without hitting Slack.
Run make dry-run-once against a live Dora instance and check parsed client data looks correct.
Run with a real webhook and confirm that a heartbeat digest posts to Slack when started with --force-heartbeat.
Set a low sync_lag_slots threshold; confirm that a sync-lag alert fires, then a recovery alert once the node catches up.
Run make run for a full poll cycle; verify state JSON is written and dedup prevents duplicate alerts on subsequent runs.
Run with --reset-state and confirm that alerts re-fire on the next tick.

- guard the slot-set trim against last_known_head=0 (previously the cutoff could go negative and silently never trim)
- pick the canonical fork by client majority instead of highest head_slot (a minority fork can briefly be ahead during a split)
- offline alert only on status=offline; synchronizing/optimistic are normal transient states and were over-paging
- split Slack messages on line boundaries when they exceed 3800 chars instead of letting Slack silently truncate
- distinguish Slack 429 in the error log
- cap /clients/execution HTML read at 512KB to bound regex work
- clearer error on unknown YAML keys (top-level and under checks:)
- minor: docstring noting heartbeat snapshots aren't atomic, simpler dry-run prefix closure, cleaner status check in DoraClient._get
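The message-splitting fix can be sketched as follows. The 3800-character limit comes from the commit above. The function name and the hard-cut fallback for a single over-long line are assumptions for illustration, not the tool's code. Lines are packed greedily into chunks, and a chunk is closed only at a newline, so every chunk stays within the limit without cutting a line in half.

```python
def split_message(text: str, limit: int = 3800) -> list[str]:
    """Split text into chunks of at most `limit` chars, breaking at newlines.

    A single line longer than `limit` is hard-cut as a fallback (an assumption;
    the source only says splitting happens on line boundaries).
    """
    chunks: list[str] = []
    current = ""
    for line in text.split("\n"):
        # Hard-cut any line that cannot fit in a chunk on its own.
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        candidate = line if not current else current + "\n" + line
        if len(candidate) <= limit:
            current = candidate  # line fits: keep packing
        else:
            chunks.append(current)  # close the chunk at a line boundary
            current = line
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be posted as its own Slack message. When no line exceeds the limit, joining the chunks with "\n" reproduces the original text.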

- post heartbeats via Block Kit (header / section / divider / context) instead of one mrkdwn blob; action alerts stay as plain text posts
- new send_blocks() on SlackNotifier with text fallback for notifications
- collapse online + canonical + distance=0 clients into one bucket; surface outliers (offline, synchronizing, non-canonical, lagging) above the healthy bucket with status emoji per row
- status emojis: green/yellow/orange/red circles for online/sync/opt/off
- dry-run patches both send and send_blocks; --debug dumps blocks JSON so it can be previewed in Slack's Block Kit Builder
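A heartbeat payload with the layout described above (header / section / divider / context, outliers above the healthy bucket, plus a top-level text fallback) might look like the sketch below. The ClientRow fields, the exact wording, and the emoji shortcodes are assumptions. The block structure follows Slack's documented Block Kit shapes for incoming-webhook payloads.

```python
from dataclasses import dataclass

# Assumed Slack shortcodes for the green/yellow/orange/red circles.
STATUS_EMOJI = {
    "online": ":large_green_circle:",
    "synchronizing": ":large_yellow_circle:",
    "optimistic": ":large_orange_circle:",
    "offline": ":red_circle:",
}


@dataclass
class ClientRow:
    name: str
    status: str
    canonical: bool
    distance: int  # hypothetical: slots behind the canonical head


def is_healthy(c: ClientRow) -> bool:
    """The collapsed bucket: online, on the canonical fork, and not lagging."""
    return c.status == "online" and c.canonical and c.distance == 0


def heartbeat_payload(head_slot: int, clients: list[ClientRow]) -> dict:
    outliers = [c for c in clients if not is_healthy(c)]
    healthy = [c.name for c in clients if is_healthy(c)]
    rows = []
    for c in outliers:  # one row per outlier, status emoji first
        emoji = STATUS_EMOJI.get(c.status, ":white_circle:")
        fork = "" if c.canonical else ", non-canonical"
        rows.append(f"{emoji} *{c.name}*: {c.status}{fork}, {c.distance} slots behind")
    blocks = [
        {"type": "header",
         "text": {"type": "plain_text", "text": f"Devnet heartbeat (head slot {head_slot})"}},
        {"type": "section",
         "text": {"type": "mrkdwn", "text": "\n".join(rows) or "No outliers"}},
        {"type": "divider"},
        {"type": "context",
         "elements": [{"type": "mrkdwn",
                       "text": f"{STATUS_EMOJI['online']} {len(healthy)} healthy: "
                               f"{', '.join(healthy) or 'none'}"}]},
    ]
    # "text" is what Slack shows in notifications when blocks are present.
    fallback = f"Heartbeat: head slot {head_slot}, {len(outliers)} outlier(s), {len(healthy)} healthy"
    return {"text": fallback, "blocks": blocks}
```

Serializing payload["blocks"] with json.dumps(..., indent=2) gives JSON that can be pasted into Block Kit Builder, which is what the --debug dump is described as enabling.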

Propagation timing routinely produces transient 1-2 slot leads or lags, which the previous code surfaced as fork alerts (followed by a resolved alert one tick later). A fork is now alerted only after it persists for fork_confirm_ticks consecutive polls (default 3, about 90s at the default 30s poll interval). The counter is persisted per client in the dedup state, so it survives restarts.
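The sketch below combines the majority-based canonical-fork choice from the earlier commit with the fork_confirm_ticks debounce. It assumes the fork view can be reduced to a mapping of client name to head root. That is a simplification of whatever /api/v1/network/client_head_forks returns, and the function names are hypothetical. A client must stay off the majority head for confirm_ticks consecutive polls before it counts as forked.

```python
from collections import Counter


def canonical_head(heads: dict[str, str]) -> str:
    """Pick the head followed by the most clients (majority, not highest slot).

    Counter.most_common breaks ties by first-seen order.
    """
    return Counter(heads.values()).most_common(1)[0][0]


def update_fork_state(
    heads: dict[str, str],
    streaks: dict[str, int],
    confirm_ticks: int = 3,
) -> set[str]:
    """Advance per-client off-canonical streaks; return clients with a confirmed fork.

    `streaks` is mutated in place so the caller can persist it with the dedup state.
    """
    canonical = canonical_head(heads)
    # Forget clients that no longer appear in the fork view.
    for gone in set(streaks) - set(heads):
        del streaks[gone]
    confirmed = set()
    for client, root in heads.items():
        if root == canonical:
            streaks.pop(client, None)  # back on canonical: reset the counter
        else:
            streaks[client] = streaks.get(client, 0) + 1
            if streaks[client] >= confirm_ticks:
                confirmed.add(client)
    return confirmed
```

A confirmed client keeps appearing in the result on every later tick while it stays forked. The state-change dedup then turns that into a single alert, and a single recovery once the client rejoins the majority head.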

Add dora_monitor tool for ethrex devnet alerts

a76aa87

edg-l marked this pull request as ready for review May 20, 2026 09:08

edg-l added 5 commits May 20, 2026 11:13

Default heartbeat interval to 3h

d3d4fb6

Default heartbeat interval back to 6h

d6ee544

edg-l wants to merge 6 commits into main from dora-monitor

edg-l commented May 20, 2026
