Conversation
Implement pattern-based and instance-based alert mute rules that suppress notifications while continuing to log alerts for auditability. Core: - MuteRule model with AND-logic matching across server, metric, database, query text, wait type, and job name fields - MuteRuleService with JSON persistence (Dashboard) and DuckDB persistence (Lite) - Optional expiration (1h/24h/7d/permanent) with automatic skip on expired rules Alert Pipeline: - Mute checks wired into all 7 alert metrics in both editions - Muted alerts logged with muted=true flag, notifications suppressed - DuckDB schema v20: muted column on config_alert_log, config_mute_rules table UI: - MuteRuleDialog for creating/editing rules with match criteria and expiration - ManageMuteRulesWindow with DataGrid listing, enable/disable/delete/purge - Settings button in both editions to access mute rule management - Right-click context menus in alert history: Mute This Alert / Mute Similar - Muted alerts shown with reduced opacity and italic text in alert history MCP: - get_mute_rules tool added to both editions' MCP servers - MuteRuleService registered in DI for MCP tool resolution Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Right-clicking an alert now shows 'View Details...' at the top of the context menu, opening a read-only popup with all alert fields (time, server, metric, values, notification type, status) and a muted indicator banner. Implemented in both Dashboard and Lite editions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Store metric-specific detail text (query text, job names, wait types, space breakdowns, etc.) alongside each alert log entry. The Alert Detail Window now displays a Details section with rich contextual information that varies by metric type: - Long-Running Query: session ID, database, query text, CPU/reads/writes - Long-Running Job: job name, duration, average/P95, % of average - Blocking: blocked session count, longest wait duration - Deadlock: new deadlock count - Poison Wait: per-wait type breakdown (avg ms, delta, tasks) - TempDB Space: usage breakdown by category Both Dashboard and Lite editions updated with matching changes. DuckDB schema bumped to v21 for the new detail_text column. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Clarify the difference between 'Mute This Alert' (server + metric) and 'Mute Similar Alerts' (metric only, all servers). Add documentation for the context-sensitive View Details popup window. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Double-clicking a row in the Alert History DataGrid now opens the Alert Detail Window, matching the existing right-click context menu behavior. Both Dashboard and Lite editions updated. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Enabled checkbox in the Manage Mute Rules DataGrid was non-clickable because the grid had IsReadOnly=True. Changed to IsReadOnly=False on the grid with IsReadOnly=True on individual text columns. Added CellEditEnding handler to persist the toggle via the MuteRuleService. Both editions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CellEditEnding fires before the binding commits, so rule.Enabled still held the old value when passed to SetRuleEnabled. Fixed by reading the current state from the service and toggling the inverse instead. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaced DataGridCheckBoxColumn with a DataGridTemplateColumn containing a regular CheckBox. The WPF DataGrid requires the first click to enter edit mode on DataGridCheckBoxColumn, making it feel unresponsive. A templated CheckBox with a Click handler responds immediately and persists the change directly via the service. Reverted grid to IsReadOnly=True since the checkbox is now independent of the edit pipeline. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move BuildBlockingContextAsync and BuildDeadlockContextAsync calls before RecordAlert in Dashboard so the rich context (session IDs, databases, queries, wait times, lock modes) is stored as detail text instead of just summary counts. Falls back to simple text if the context builder returns null. Lite already had this behavior. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
MuteRuleDialog now shows only relevant pattern fields based on the selected metric type: - Blocking: Database, Query Text - Deadlocks: Database - High CPU / TempDB Space: no pattern fields - Poison Wait: Wait Type - Long-Running Query: Database, Query Text, Wait Type - Long-Running Job: Job Name - (any): all fields shown Hidden fields are cleared on save to prevent stale matches. ManageMuteRulesWindow Toggle and Delete buttons now return focus to the rules grid and preserve selection index, enabling repeated operations without re-clicking the list. Both Dashboard and Lite editions updated. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Added missing metrics to the alert thresholds table (poison waits, long-running queries, TempDB space, long-running agent jobs) and documented the three poison wait types: THREADPOOL, RESOURCE_SEMAPHORE, and RESOURCE_SEMAPHORE_QUERY_COMPILE. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Root cause: DataGrid SelectionChanged event bubbled up to the TabControl's SelectionChanged handler, which called RefreshAlerts() on every row click, replacing ItemsSource and clearing selection before the UI could render it. Fixes: - Guard TabControl handler with OriginalSource check to ignore bubbled events from child controls - Use visual tree hit-testing for double-click handlers instead of relying on SelectedItem (more robust in both editions) - Defensive Owner handling for AlertDetailWindow Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1. Dashboard: Add OriginalSource guard to ServerTabControl_SelectionChanged to prevent child control SelectionChanged events from triggering the handler (same bubbling bug pattern fixed earlier in Lite). 2. Lite: Add re-entrancy guard to ServerTab refresh timer to prevent overlapping RefreshAllDataAsync calls if a refresh takes longer than the 60-second timer interval. Matches Dashboard's _isRefreshing pattern. 3. Dashboard: Add missing DefaultRowStyle to WaitDrillDownWindow resources so the DataGrid's RowStyle reference resolves and rows get the context menu. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
erikdarlingdata
left a comment
There was a problem hiding this comment.
Code Review: Alert Muting
Overall this is well-structured with full parity across both apps. Nice work on the context-sensitive dialog, the auditability (muted alerts still logged), and the MCP integration. A few things to address:
Bugs
1. [BLOCKER] DuckDB migration NOT NULL on ALTER TABLE
Lite/Database/DuckDbInitializer.cs:563 — The v20 migration uses BOOLEAN NOT NULL DEFAULT false, but DuckDB doesn't support NOT NULL on ALTER TABLE ADD COLUMN. There's even a comment in the v9 migration (line 424) warning about this exact issue. The catch block swallows the error so the column never gets added, and subsequent INSERTs that reference muted will fail.
Fix: Change to BOOLEAN DEFAULT false (drop the NOT NULL). The NOT NULL in Schema.CreateAlertLogTable for fresh installs is fine — this only affects the ALTER TABLE migration path.
2. "Mute This Alert" dialog title says "Edit Mute Rule"
Both Dashboard/Controls/AlertsHistoryContent.xaml.cs and Lite/Controls/AlertsHistoryTab.xaml.cs pass a pre-filled MuteRule to the MuteRuleDialog(MuteRule existingRule) constructor. Since existingRule != null, the dialog sets its title to "Edit Mute Rule" instead of "Create Mute Rule".
Fix: Either use the no-arg constructor and set fields after, add a bool isNew parameter, or use the AlertMuteContext constructor path.
3. "Mute This Alert" doesn't pre-fill context fields
Right-clicking an alert only pre-fills server name + metric name, not database/query text/wait type/job name. The AlertMuteContext constructor exists for rich pre-fill but is never called from the UI — the structured fields aren't on the display model. Users have to manually type pattern fields even though the detail info is visible in the alert.
This could be deferred to a follow-up, but worth noting the AlertMuteContext constructor is currently dead code.
Medium
4. Lite MuteRuleService skips the LockedConnection pattern
All DuckDB operations in Lite/Services/MuteRuleService.cs call _dbInitializer.CreateConnection() directly without AcquireReadLock()/AcquireWriteLock(). The established pattern in LocalDataService uses locks to coordinate with CHECKPOINT and archive operations. A concurrent CHECKPOINT during a mute rule write could cause issues.
5. Cache-first, persist-second risks silent data loss
Lite's MuteRuleService mutates the in-memory cache under lock, then does a best-effort async DuckDB write. If the write fails silently, the user's changes are lost on next app restart since LoadAsync() reloads from DuckDB.
Consider: attempt the DB write first, update the cache only on success. Or at minimum log a visible warning when persistence fails.
6. Poison wait mute only checks worst wait type
Lite/MainWindow.xaml.cs:1195 — The mute context only includes worst.WaitType. If a user mutes RESOURCE_SEMAPHORE but THREADPOOL is the worst wait, the mute check evaluates THREADPOOL and the alert fires. If the muted wait happens to be worst, other unmuted waits get suppressed too. This is a design choice but may surprise users.
7. MCP tools gap
get_alert_historyin both apps doesn't include the newmutedordetail_textcolumnsget_mute_rulesis missing from the README tools table (line 411)- README tool counts (lines 49, 278) need incrementing
Low / Cosmetic
8. Hardcoded color #33D97706 on MutedBanner in AlertDetailWindow — verify it looks right on Light/Cool Breeze themes.
9. Dashboard AlertDetailWindow missing Icon attribute (Lite sets /EDD.ico).
10. Dashboard MuteRuleService.Save() swallows all exceptions silently — should at least Logger.Error() them.
11. No automatic purge of expired rules — they accumulate until manual purge. Consider purging on startup in LoadAsync().
What's Done Well
- All 7 alert metrics covered with consistent mute check pattern
- Muted alerts still logged for full auditability
- Thread safety via
lock(_lock)with defensiveToList()copies - All DuckDB queries properly parameterized (no SQL injection)
- Schema migrations idempotent with
IF NOT EXISTS config_mute_rulescorrectly excluded from archivable tables- Context-sensitive dialog hides irrelevant fields per metric type
- Clean settings integration entry point
- Schema test updated correctly
Bug #1 is a blocker for existing Lite users upgrading. The rest can be follow-up items if needed.
Bugs: - Fix DuckDB migration v20 NOT NULL on ALTER TABLE (blocker for upgrades) - Fix Mute This Alert dialog showing 'Edit' title instead of 'Create' - Use AlertMuteContext constructor for proper field pre-fill path Medium: - Lite MuteRuleService now uses AcquireReadLock/AcquireWriteLock for CHECKPOINT coordination (matches LocalDataService pattern) - Restructure to persist-first: attempt DB write before updating cache, log warnings on failure instead of silently swallowing - Add poison wait mute limitation comment in both editions - Add muted and detail_text columns to get_alert_history MCP tool - Add get_mute_rules to README tools table, update tool counts Low/Cosmetic: - MutedBanner uses DynamicResource WarningColor instead of hardcoded color - Add missing Icon attribute to Dashboard AlertDetailWindow - Add Logger.Error() to Dashboard MuteRuleService.Save() catch block - Auto-purge expired mute rules on startup in both editions Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR Review Feedback — Changes Made (commit
|
| File | Changes |
|---|---|
Dashboard/AlertDetailWindow.xaml |
Icon, theme-aware MutedBanner |
Dashboard/Controls/AlertsHistoryContent.xaml.cs |
Use AlertMuteContext constructor |
Dashboard/MainWindow.xaml.cs |
Poison wait limitation comment |
Dashboard/Mcp/McpAlertTools.cs |
Add muted + detail_text columns |
Dashboard/Services/MuteRuleService.cs |
Error logging in Save(), auto-purge |
Lite/Controls/AlertsHistoryTab.xaml.cs |
Use AlertMuteContext constructor |
Lite/Database/DuckDbInitializer.cs |
Fix NOT NULL on v20 migration |
Lite/MainWindow.xaml.cs |
Poison wait limitation comment |
Lite/Mcp/McpAlertTools.cs |
Add muted + detail_text columns |
Lite/Services/MuteRuleService.cs |
Locked connections, persist-first, logging, auto-purge |
Lite/Windows/AlertDetailWindow.xaml |
Theme-aware MutedBanner |
README.md |
Tool counts, get_mute_rules in tools table |
Build: 0 errors, 0 new warnings (all warnings are pre-existing). All 6 unit tests pass.
Resolved conflict in Lite/Controls/ServerTab.xaml.cs: kept our re-entrancy guard (_isRefreshing) and adopted upstream's fullRefresh: false parameter for timer-based refreshes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
What does this PR do?
Adds configurable muting of Alerts. For full description, plan, and concise change-log see #493
Which component(s) does this affect?
How was this tested?
Manually tested both Dashboard and Lite per screenshots shown in #493
Checklist
dotnet build -c Debug)