Mimir integration by gsanchietti · Pull Request #42 · NethServer/my

gsanchietti · 2026-02-20T10:53:30Z

📋 Description

This pull request adds Alertmanager integration based on Grafana Mimir, backend APIs for alert configuration and inspection, resolved-alert history persistence, automatic HostDown monitoring, and a system-level silence action for active alerts.

Backend API (`/api/alerts`)

GET /api/alerts/config — retrieve the current alerting configuration from Mimir as structured JSON or redacted YAML
POST /api/alerts/config — apply a new alerting configuration
DELETE /api/alerts/config — replace the tenant configuration with a blackhole-only config while keeping the built-in history webhook active
GET /api/alerts — list active alerts with optional filters (state, severity, system_key)
GET /api/alerts/totals — return active alert counters plus resolved-history totals
GET /api/alerts/trend — return resolved-alert trend data for the selected period
GET /api/systems/:id/alerts — list active alerts for a single system
POST /api/systems/:id/alerts/silences — create a silence for a single active system alert
GET /api/systems/:id/alerts/history — return paginated resolved-alert history for a single system
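
The optional filters on GET /api/alerts (state, severity, system_key) can be illustrated with a minimal sketch. It assumes alerts arrive in the Alertmanager v2 JSON shape, with a `labels` map and a `status.state` field; the function name `filter_alerts` is hypothetical and is not the backend's actual Go implementation.

```python
from typing import Any, Dict, List, Optional

Alert = Dict[str, Any]


def filter_alerts(
    alerts: List[Alert],
    state: Optional[str] = None,
    severity: Optional[str] = None,
    system_key: Optional[str] = None,
) -> List[Alert]:
    """Return the alerts matching every filter that was provided.

    A filter left as None is ignored. Alerts with missing labels or a
    missing status never match a filter that targets that field.
    """
    matched = []
    for alert in alerts:
        labels = alert.get("labels") or {}
        status = alert.get("status") or {}
        if state is not None and status.get("state") != state:
            continue
        if severity is not None and labels.get("severity") != severity:
            continue
        if system_key is not None and labels.get("system_key") != system_key:
            continue
        matched.append(alert)
    return matched
```

Filters combine with a logical AND in this sketch, so passing both `severity` and `system_key` narrows the result to alerts that satisfy both.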

Alerting configuration

AlertingConfig supports global settings, per-severity overrides, and per-system overrides
SMTP settings are injected server-side
The built-in history webhook is always included in the generated Alertmanager config
Email templates are available in English and Italian
Backend access to alerting configuration and active-alert APIs is scoped through the authenticated user plus the organization_id query parameter where required by the current handlers

Collect service

POST /api/alert_history receives Alertmanager webhooks and stores resolved alerts in PostgreSQL
Bearer-token authentication is enforced through ALERTING_HISTORY_WEBHOOK_SECRET
POST /api/services/mimir/alertmanager/api/v2/alerts proxies authenticated systems to Alertmanager with X-Scope-OrgID derived server-side
When a system posts alerts through the collect proxy, labels.system_key is always overwritten with the authenticated system value
Additional system and organization context labels are injected when missing
POST /api/services/mimir/alertmanager/api/v2/silences proxies authenticated systems to Alertmanager with tenant scoping enforced by the server
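
Two of the collect-side rules above lend themselves to a short sketch: the bearer-token check on the history webhook and the label handling in the alerts proxy. This is an illustration under assumptions, not the service's Go code. The function names are hypothetical, an unconfigured secret is assumed to reject every request, and a label counts as "missing" here when it is absent or empty.

```python
import hmac
from typing import Dict, Mapping, Optional


def bearer_token_matches(authorization: Optional[str], secret: str) -> bool:
    """Check an Authorization header against the webhook secret.

    hmac.compare_digest compares in constant time, so response timing
    does not reveal how many leading characters of the token matched.
    """
    if not secret:
        # Assumption: with no secret configured, nothing is accepted.
        return False
    prefix = "Bearer "
    if not authorization or not authorization.startswith(prefix):
        return False
    token = authorization[len(prefix):]
    return hmac.compare_digest(token.encode("utf-8"), secret.encode("utf-8"))


def inject_labels(
    labels: Mapping[str, str],
    system_key: str,
    context: Mapping[str, str],
) -> Dict[str, str]:
    """Return a copy of the alert labels with server-side values applied.

    Context labels are added only when the client did not send them.
    system_key is always overwritten with the authenticated value, so a
    client cannot post alerts on behalf of another system.
    """
    result = dict(labels)
    for name, value in context.items():
        if value and not result.get(name):
            result[name] = value
    result["system_key"] = system_key
    return result
```

Applying `system_key` last keeps the overwrite unconditional even if a caller mistakenly includes that name in the context map.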

Frontend

The system detail active-alerts card exposes a silence action for users with manage:systems
The silence flow uses a small confirmation modal with an optional comment and refreshes the active-alerts card after success

HostDown monitoring

The heartbeat monitor checks every 60 seconds
Systems move to inactive after exceeding HEARTBEAT_TIMEOUT_MINUTES
A HostDown alert is posted when inactivity persists beyond the timeout and one additional monitor interval
The alert is resolved automatically when the system becomes active again
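
The timing rules above can be sketched as two predicates. This is a simplified reading of the description, not the monitor's actual code: the function names are hypothetical, and "exceeding" is interpreted as a strict comparison.

```python
from datetime import datetime, timedelta

# The heartbeat monitor runs every 60 seconds.
MONITOR_INTERVAL = timedelta(seconds=60)


def is_inactive(last_heartbeat: datetime, now: datetime, timeout_minutes: int) -> bool:
    """A system is inactive once its last heartbeat is older than the timeout."""
    return now - last_heartbeat > timedelta(minutes=timeout_minutes)


def should_fire_host_down(
    last_heartbeat: datetime, now: datetime, timeout_minutes: int
) -> bool:
    """Fire HostDown only after the timeout plus one extra monitor interval.

    The extra interval gives a late heartbeat one more check to arrive,
    which avoids firing and resolving in quick succession.
    """
    grace = timedelta(minutes=timeout_minutes) + MONITOR_INTERVAL
    return now - last_heartbeat > grace
```

With a hypothetical `timeout_minutes` of 10, a system silent for 10 minutes 30 seconds is inactive but does not yet raise HostDown; once the silence passes 11 minutes the alert fires.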

Tooling and docs

services/mimir/scripts/alerting_config.py manages alerting config and alert queries through the MY API
services/mimir/scripts/alert.py fires, resolves, silences, and lists alerts through the collect proxy
OpenAPI, database schema, migrations, tests, and docs cover the new alerting surface

🧪 Validation

cd backend && make pre-commit
cd collect && make pre-commit
cd frontend && npm run pre-commit

Related issue

Implements requirements from #72 (Alarm Management - Alertmanager Integration)

github-actions · 2026-02-20T10:53:44Z

🔗 Redirect URIs Added to Logto

The following redirect URIs have been automatically added to the Logto application configuration:

Redirect URIs:

https://my-frontend-qa-pr-42.onrender.com/login-redirect
https://my-proxy-qa-pr-42.onrender.com/login-redirect

Post-logout redirect URIs:

https://my-frontend-qa-pr-42.onrender.com/login
https://my-proxy-qa-pr-42.onrender.com/login

These will be automatically removed when the PR is closed or merged.

github-actions · 2026-02-20T10:53:52Z

🤖 My API structural change detected

Preview documentation

Structural change details

Added (17)

DELETE /alerts/config
DELETE /services/mimir/alertmanager/api/v2/silences/{silence_id}
DELETE /systems/{id}/alerts/silences/{silence_id}
GET /alerts
GET /alerts/config
GET /alerts/totals
GET /alerts/trend
GET /services/mimir/alertmanager/api/v2/alerts
GET /services/mimir/alertmanager/api/v2/silences
GET /services/mimir/alertmanager/api/v2/silences/{silence_id}
GET /systems/{id}/alerts
GET /systems/{id}/alerts/history
POST /alert_history
POST /alerts/config
POST /services/mimir/alertmanager/api/v2/alerts
POST /services/mimir/alertmanager/api/v2/silences
POST /systems/{id}/alerts/silences

Powered by Bump.sh

gsanchietti · 2026-02-25T07:12:08Z

update deploy

github-actions · 2026-02-25T07:12:20Z

🚀 Build triggers updated!

All .render-build-trigger files have been automatically updated to ensure fresh deployments of all services in the PR preview environment.

…end APIs

- Rename API routes from /alerting to /alerts for RESTful consistency
- Add GET /api/systems/:id/alerts for per-system active alerts
- Add GET /api/alerts/totals and GET /api/alerts/trend endpoints
- Use RequireResourcePermission on alerts group (read:systems for GET, manage:systems for POST/DELETE)
- Fix OpenAPI paths (remove duplicate /api/ prefix), tags, and security scheme names
- Add composite index (system_key, created_at) and unique constraint (fingerprint, system_key)
- Remove dead code (DeleteConfig), rename alertmanager_history.go to alerting_history.go
- Fix collect: http client timeout, endsAt zero-time handling, timing-safe token comparison
- Fix collect Redis config: only override ParseURL values when env vars are explicitly set
- Add missing env vars to collect .env.example and render.yaml
- Add alert history webhook endpoint to OpenAPI spec
- Move scripts to services/mimir/scripts, remove hardcoded QA credentials
- Add local dev setup: docker-compose.local.yml + my-local.yaml (filesystem storage)
- Fix Mimir config: reference runtime_config.yaml, remove emoji from docker-compose
- Update copyrights to 2026

- collect/middleware: WebhookAuthMiddleware tests (valid/invalid/missing token, unconfigured, timing-safe)
- collect/methods: ReceiveAlertHistory tests (resolved, firing skipped, missing system_key, invalid body, DB error, zero-time endsAt, nullableString)
- backend/methods: filterAlerts tests (all filter combinations, missing labels, empty input)
- backend/entities: alert history repository tests with sqlmock (query, sort validation, totals owner/non-owner, trend up/down/stable)

…_id injection

- Collect proxy injects system_id (DB UUID) label in addition to system_key
- Backend BuildTemplateFiles substitutes ${APP_URL} placeholder in templates
- Templates use localized annotations: summary_en/it and description_en/it with fallback
- Add "service" label display in all 4 HTML/TXT templates
- Add "View system" / "Visualizza sistema" CTA button linking to app_url/systems/:id
- Rewrite TXT templates with welcome-style separators and footer (info@nethesis.it)
- Align label columns in TXT templates (rename FIRING SINCE→SINCE, STARTED AT→STARTED, etc.)
- Align headers/footers with welcome email style (MSO conditionals, backgroundTable)
- Change alert_history unique constraint to (fingerprint, system_key, starts_at)
- Use ON CONFLICT DO NOTHING to avoid overwriting distinct occurrences of same alert
- Add tests for injectSystemLabels helper

- Merge full alerting integration guide into services/mimir/README.md
- Remove separate language files (docs/en/08-alerting.md, docs/it/08-alerting.md)
- Document system_id/system_key auto-injection and summary_en/summary_it/description_en/description_it conventions
- Update alert catalog examples with localized annotations
- Add user-facing alerting guide in docs/docs/features/alerting.md (EN + IT)
- Add "Alerting System" link in Docusaurus Developer Docs dropdown and footer pointing to mimir README

The unique index (fingerprint, system_key, starts_at) was only used by the ON CONFLICT clause and never helped any SELECT query. Removing both simplifies the schema and saves index space. If Alertmanager retries a webhook after an error, a duplicate row may occasionally be inserted — acceptable trade-off for a rare edge case.

…th system context

Organization lifecycle:
- Auto-provision default alerting config on customer/distributor/reseller creation
- Use org email from custom_data as default notification recipient
- Use org language (en/it) from custom_data for email_template_lang
- Retry config push to Mimir with backoff (1s/3s/5s) to tolerate transient errors
- Built-in history webhook is always active so alert_history works from day one

Collect Mimir proxy:
- Inject organization context labels (name, vat, type) in addition to system_id/key
- Inject system_name, system_fqdn, system_ipv4 from the systems table
- Replace injectSystemLabels with generic injectLabels helper
- Join distributors/resellers/customers in the org lookup query

Email templates (HTML + TXT, EN + IT):
- Two-card layout: alert card (colored) + system info card (neutral) with CTA
- Dynamic organization label based on organization_type
  - IT: CLIENTE/RIVENDITORE/DISTRIBUTORE/ORGANIZZAZIONE
  - EN: CUSTOMER/RESELLER/DISTRIBUTOR/ORGANIZATION
- Dynamic FQDN/IP label (shows whichever is available)
- Subject format: [FIRING][AlertName] - SystemKey
- Plain-text templates abbreviate long labels (RIVEND./DISTRIB./ORG.) for column alignment
- CTA "View system" button linked to APP_URL/systems/<system_id>
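
The retry with 1s/3s/5s backoff mentioned for the config push might look roughly like the sketch below. The commit message does not say whether the delays yield three or four attempts in total; this version assumes one initial attempt plus one retry after each delay. `push_with_retry` and its parameters are hypothetical names, and every exception is treated as transient for simplicity.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def push_with_retry(
    push: Callable[[], T],
    delays: Sequence[float] = (1.0, 3.0, 5.0),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call push(), retrying after each delay when it raises.

    The final attempt is made outside the loop so that its exception,
    if any, propagates to the caller unchanged.
    """
    for delay in delays:
        try:
            return push()
        except Exception:
            # Assumption: any failure is worth retrying. A real caller
            # would likely retry only on network or 5xx errors.
            sleep(delay)
    return push()
```

The `sleep` parameter exists so a test can pass a recording function instead of waiting for real time to elapse.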

- alerting.GetConfig returns (nil, nil) when Mimir responds 404 (no config has ever been pushed for this tenant)
- GetAlertingConfig handler returns HTTP 200 with "config": null when the body is empty, so the frontend shows the "no configuration found" empty state instead of a 500 error
- Previously the API returned 500 "mimir returned 404: alertmanager storage object not found" for any org without a pushed config, which broke the UI for newly created orgs where auto-provisioning failed
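
The 404 handling described above can be summarized in a short sketch. It mirrors the behavior in Python rather than reproducing the Go handlers. The function names are hypothetical, the config is kept as an opaque string, and the response envelope is reduced to the single `config` key mentioned in the commit message.

```python
import json
from typing import Optional, Tuple


def get_config(status_code: int, body: str) -> Optional[str]:
    """Translate the upstream Mimir response into a config or None.

    A 404 means no config was ever pushed for the tenant, which is a
    valid state and not an error.
    """
    if status_code == 404:
        return None
    if status_code != 200:
        raise RuntimeError(f"mimir returned {status_code}: {body}")
    return body


def config_response(raw_config: Optional[str]) -> Tuple[int, str]:
    """Build the API reply: always HTTP 200, with null for 'no config'."""
    payload = {"config": raw_config if raw_config else None}
    return 200, json.dumps(payload)
```

Keeping "not found" as a `None` return instead of an exception lets the handler distinguish an empty tenant from a genuine upstream failure.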

- Update all API calls in lib/alerting.ts to use /alerts instead of /alerting:
  - GET /alerts/config, POST /alerts/config, DELETE /alerts/config
  - GET /alerts (list active alerts)
  - GET /systems/:id/alerts/history
- Replace getSystemActiveAlerts helper to use the dedicated GET /systems/:id/alerts endpoint instead of filtering the global alerts list by system_key client-side
- SystemActiveAlertsCard: switch from (organizationId, systemKey) to (systemId) so it no longer relies on the sanitized system_key field for unregistered systems

Provides make targets to manage a local Mimir instance with filesystem storage (no S3 required), wrapping docker-compose.local.yml:
- dev-setup: inject MIMIR_URL and alerting webhook env vars into backend/.env and collect/.env (idempotent)
- dev-up: start Mimir container and wait for readiness
- dev-down: stop container
- dev-restart: restart container
- dev-logs: follow container logs
- dev-status: show container status and Mimir readiness
- dev-ready: check readiness endpoint

Update README with the local development workflow.

- Update all API paths in alerting_config.py from /alerting/... to /alerts/... to match the backend API rename:
  - GET/POST/DELETE /alerts/config
  - GET /alerts (list active alerts)
  - GET /systems/:id/alerts/history
- Document the LOGTO_ENDPOINT, LOGTO_APP_ID and AUTH_BASE_URL environment variables in scripts/README.md, which replaced the hardcoded QA values removed in a previous commit

Short, tool-agnostic reference for AI coding agents working in this monorepo. Covers components actually on the current branch (backend, collect, sync, frontend, proxy, services/mimir) and explicitly marks services/support and services/ssh-gateway as stubs here. API reference defers to openapi.yaml as source of truth. Includes coding patterns, RBAC model, alerting invariants, and a short pitfalls list. Claude Code auto-loads CLAUDE.md; developers who use Claude Code can create a local CLAUDE.md shim that points to this file.

The script was failing with 404 errors because it was using the hardcoded default Logto endpoint 'https://your-tenant.logto.app', which doesn't exist.

Changes:
- Add required CLI arguments: --tenant-id and --app-id
- Derive Logto endpoint dynamically from tenant ID
- Use the proxy URL as redirect_uri base instead of hardcoded _AUTH_BASE_URL
- Update all examples in docstring to include new arguments
- Pass tenant_id and app_id to all command functions

This allows the script to work with any MY proxy deployment by providing the Logto tenant configuration at runtime.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ments

- Add new required arguments to all command examples
- Update full example workflow to include Logto configuration
- Document the new CLI arguments in the Common arguments table

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Remove required constraint from --app-id argument
- Use environment variable LOGTO_APP_ID as default if set
- Fall back to standard app ID 'my_frontend_app' if not set
- Update all documentation and examples to show --app-id is now optional
- Update README table to show required/optional arguments clearly

This simplifies the CLI usage for most deployments that use the standard frontend app ID.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Make --tenant-id optional with TENANT_ID environment variable fallback
- Add validation to ensure tenant_id is provided (via CLI or env var)
- Add detailed debugging in _logto_login() to identify which step fails
- Improve error messages to help user troubleshoot authentication issues
- Show which endpoint failed and provide guidance for common issues
- Display Logto endpoint, tenant ID, and app ID in error output

This helps users quickly identify if the issue is:
1. Invalid/missing tenant ID
2. Incorrect app ID
3. Unregistered redirect URI
4. Logto service unavailable

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
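
The argument handling described in the last few commit messages (--tenant-id falling back to TENANT_ID, --app-id falling back to LOGTO_APP_ID and then to 'my_frontend_app') could be wired up with argparse roughly as follows. This is a sketch under assumptions and not the script's actual code: `parse_args` and its `env` parameter are hypothetical, and the real script accepts more arguments than shown.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence

DEFAULT_APP_ID = "my_frontend_app"


def parse_args(
    argv: Sequence[str], env: Optional[Mapping[str, str]] = None
) -> argparse.Namespace:
    """Parse Logto-related options with environment variable fallbacks."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="alerting config helper (sketch)")
    parser.add_argument(
        "--tenant-id",
        default=env.get("TENANT_ID"),
        help="Logto tenant ID (or set TENANT_ID)",
    )
    parser.add_argument(
        "--app-id",
        default=env.get("LOGTO_APP_ID") or DEFAULT_APP_ID,
        help="Logto app ID (or set LOGTO_APP_ID)",
    )
    args = parser.parse_args(argv)
    if not args.tenant_id:
        # parser.error prints usage to stderr and exits with status 2.
        parser.error("tenant ID is required: pass --tenant-id or set TENANT_ID")
    return args
```

An explicit command-line value takes precedence over the environment because argparse only uses `default` when the flag is absent.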

edospadoni had a problem deploying to mimir-integration - my-mimir-qa PR #42 February 20, 2026 10:53 — with Render Failure

gsanchietti self-assigned this Feb 20, 2026

edospadoni deployed to mimir-integration - my-collect-qa PR #42 February 20, 2026 10:53 — with Render View deployment

edospadoni deployed to mimir-integration - my-backend-qa PR #42 February 20, 2026 10:53 — with Render View deployment

edospadoni deployed to mimir-integration - my-proxy-qa PR #42 February 20, 2026 10:53 — with Render View deployment

gsanchietti mentioned this pull request Feb 20, 2026

feat: add Grafana Mimir metrics infrastructure #41

Closed

gsanchietti force-pushed the mimir-integration branch from 8b5324c to 9ec5be6 Compare February 20, 2026 11:00

edospadoni deployed to mimir-integration - my-backend-qa PR #42 February 20, 2026 11:00 — with Render View deployment

edospadoni deployed to mimir-integration - my-proxy-qa PR #42 February 20, 2026 11:00 — with Render View deployment

edospadoni deployed to mimir-integration - my-mimir-qa PR #42 February 20, 2026 11:00 — with Render Active

edospadoni deployed to mimir-integration - my-collect-qa PR #42 February 20, 2026 11:00 — with Render View deployment

edospadoni deployed to mimir-integration - my-collect-qa PR #42 February 20, 2026 13:10 — with Render View deployment

edospadoni deployed to mimir-integration - my-collect-qa PR #42 February 20, 2026 13:16 — with Render View deployment

edospadoni deployed to mimir-integration - my-mimir-qa PR #42 February 24, 2026 16:13 — with Render Active

edospadoni temporarily deployed to mimir-integration - my-collect-qa PR #42 February 24, 2026 16:13 — with Render Destroyed

gsanchietti force-pushed the mimir-integration branch from b10c682 to 08c4b4c Compare February 24, 2026 16:15

edospadoni deployed to mimir-integration - my-collect-qa PR #42 February 24, 2026 16:15 — with Render View deployment

edospadoni deployed to mimir-integration - my-mimir-qa PR #42 February 24, 2026 16:15 — with Render Active

edospadoni deployed to mimir-integration - my-collect-qa PR #42 February 24, 2026 16:25 — with Render View deployment

edospadoni requested a deployment to mimir-integration - my-mimir-qa PR #42 February 24, 2026 16:38 — with Render In progress

edospadoni deployed to mimir-integration - my-mimir-qa PR #42 February 24, 2026 16:40 — with Render Active

edospadoni had a problem deploying to mimir-integration - my-mimir-qa PR #42 February 25, 2026 06:59 — with Render Failure

edospadoni had a problem deploying to mimir-integration - my-mimir-qa PR #42 February 25, 2026 07:12 — with Render Failure

edospadoni deployed to mimir-integration - my-proxy-qa PR #42 February 25, 2026 07:12 — with Render View deployment

edospadoni deployed to mimir-integration - my-backend-qa PR #42 February 25, 2026 07:12 — with Render View deployment

edospadoni deployed to mimir-integration - my-collect-qa PR #42 February 25, 2026 07:12 — with Render View deployment

gsanchietti and others added 30 commits April 10, 2026 14:19

fix: improve HostDown logic

210e68c

fix: improve alert.py script for resolving alerts

0b97409

fix: rename alert_history migration from 018 to 019 to avoid conflict…

16e9835

… with main

fix: reduce log verbosity

f8c9b32

fix (mimir scripts): fix small regression

b437537

chore: gitignore local CLAUDE.md shim

e69d25b

fix: fix regressions after rebase

ad8cc57

feat(ui): improve alerting draft

033faa2

fix(alerting): improve HostDown alert logic

5174a54

Fire the alert only for systems that have been inactive for at least 2 check intervals (120 seconds) to avoid flapping

fix(ui): draft, cleanup alerting

c208336

fix(alerting): force system_key on server side

5cbbd82

feat(backend): add API to manage alert silence from UI

2c93360

feat(ui): draft implementation for silences

222cf79

feat(backend): handle delete silence

2f3c12a

feat(ui): disable alert silence

c0c29fe

gsanchietti wants to merge 35 commits into main from mimir-integration
