Server-layer observability (uvicorn/granian/hypercorn) + live admin Observability dashboard by ancongui · Pull Request #146 · fireflyframework/fireflyframework-pyfly

ancongui · 2026-06-17T15:12:51Z

Summary

Adds observability for the ASGI server layer — until now pyfly only observed the application layer (the http_server_requests_seconds filter, tracing/correlation, process metrics). This surfaces metrics about the server itself across Uvicorn, Granian, and Hypercorn, with correct multi-worker aggregation, and a live admin Observability dashboard section.

Targets release v26.06.113.

How it works (3 cooperating mechanisms)

All write to the Prometheus registry and are auto-exposed at /actuator/prometheus; everything is gated on pyfly.server.observability.enabled (on under the web/core starters) and degrades to a no-op without prometheus_client.

ServerMetricsASGIMiddleware (web/adapters/starlette/asgi_server_metrics.py) — the primary, uniform source, installed outermost so it runs in every worker for every server/worker-count. Emits server_active_connections, server_in_flight_requests, server_requests_total.
ServerMetricsBinder (observability/server_metrics.py) — bound from the in-worker ASGI lifespan (beside register_process_metrics / ManagementServer). Emits server_workers, server_uptime_seconds, server_started_total/server_stopped_total, and optional server_native_connections.
ServerStatsPort (server/ports/server_stats.py) — best-effort per-adapter native stats; uvicorn surfaces true socket counts on the serve_async path, granian/hypercorn report workers+uptime only.
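A minimal sketch of the outermost pure-ASGI counting idea described above. This is not the actual ServerMetricsASGIMiddleware: the names InFlightMiddleware and ServerCounters are hypothetical, and plain in-memory integers stand in for the Prometheus gauge and counter so the example runs on the standard library alone.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ServerCounters:
    """In-memory stand-ins for the in-flight gauge and the requests counter."""
    in_flight: int = 0
    requests_total: int = 0


class InFlightMiddleware:
    """Hypothetical pure-ASGI wrapper, meant to be installed outermost."""

    def __init__(self, app, counters: ServerCounters) -> None:
        self.app = app
        self.counters = counters

    async def __call__(self, scope, receive, send) -> None:
        if scope["type"] != "http":
            # Lifespan and websocket scopes pass through untouched.
            await self.app(scope, receive, send)
            return
        self.counters.in_flight += 1
        self.counters.requests_total += 1
        try:
            await self.app(scope, receive, send)
        finally:
            # Decrement even if the app raises, so the gauge cannot leak.
            self.counters.in_flight -= 1


async def demo() -> tuple:
    counters = ServerCounters()
    seen = []

    async def app(scope, receive, send):
        # Record what the gauge reads while the request is being handled.
        seen.append(counters.in_flight)
        await send({"type": "http.response.start", "status": 200, "headers": []})
        await send({"type": "http.response.body", "body": b"ok"})

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        pass

    wrapped = InFlightMiddleware(app, counters)
    await wrapped({"type": "http", "method": "GET", "path": "/"}, receive, send)
    return seen[0], counters.in_flight, counters.requests_total
```

Because the wrapper lives inside the ASGI callable, it runs in whichever worker process handles the request, which is why this kind of source works the same way for any server and worker count.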

Why not just read native server stats? On the production pyfly run path, uvicorn.run(workers=N) forks workers that each build their own server, so server_state is unreachable cross-process. The ASGI middleware (in-worker) is the uniform source; native stats are enrichment.

Multi-worker aggregation

pyfly run enables prometheus_client multiprocess mode (PROMETHEUS_MULTIPROC_DIR set before forking) for workers > 1, so one scrape aggregates across all workers via MultiProcessCollector. This also fixes the prior per-worker gap for http_server_requests_*.
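A rough sketch of the prometheus_client multiprocess pattern this section relies on. The helper names init_multiprocess_dir and scrape_all_workers are hypothetical and are not the pyfly.observability.multiprocess API; prometheus_client is a third-party dependency and is imported lazily, so the directory is exported before the library is first loaded.

```python
import os
import tempfile


def init_multiprocess_dir() -> str:
    """Create a fresh directory and export it before any worker is forked."""
    path = tempfile.mkdtemp(prefix="prom-multiproc-")
    os.environ["PROMETHEUS_MULTIPROC_DIR"] = path
    return path


def scrape_all_workers() -> bytes:
    """Render one exposition that merges the samples written by every worker."""
    # Lazy import: requires prometheus_client to be installed, and keeps the
    # environment variable set before the library is loaded in this process.
    from prometheus_client import CollectorRegistry, generate_latest
    from prometheus_client import multiprocess

    registry = CollectorRegistry()
    # The collector reads the per-process files found in PROMETHEUS_MULTIPROC_DIR.
    multiprocess.MultiProcessCollector(registry)
    return generate_latest(registry)
```

In this arrangement the parent process would call init_multiprocess_dir() once before starting the workers, and the scrape handler would call scrape_all_workers() instead of rendering the default per-process registry.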

Admin dashboard

New live Observability view (Monitoring group): stat cards (workers, uptime, active connections, in-flight, requests/sec), rolling charts, a per-worker breakdown table, lifecycle, and links to Metrics/Traces. Backed by GET /admin/api/observability + the observability SSE stream.
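One way a requests/sec card could be derived is from successive samples of a monotonically increasing counter such as server_requests_total. The sketch below assumes that approach; RateTracker is a hypothetical name, and one tracker is created per REST caller or SSE stream so that concurrent consumers do not overwrite each other's previous sample.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RateTracker:
    """Per-stream state: the previous counter sample and its timestamp."""
    last_total: Optional[float] = None
    last_time: Optional[float] = None

    def update(self, total: float, now: float) -> float:
        """Return requests/sec since the previous sample (0.0 on the first)."""
        rate = 0.0
        if self.last_total is not None and self.last_time is not None:
            elapsed = now - self.last_time
            delta = total - self.last_total
            # A negative delta means the counter was reset; report 0 for that tick.
            if elapsed > 0 and delta >= 0:
                rate = delta / elapsed
        self.last_total, self.last_time = total, now
        return rate
```

Usage: each stream keeps its own tracker and calls update(current_total, time.monotonic()) on every tick. Sharing a single tracker between streams would shorten the measured interval for each of them and distort the rate.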

Config

pyfly.server.observability.{enabled,sample-interval-seconds,access-log}. Local Prometheus+Grafana stack added to docker-compose.yml (loopback-bound; ops/prometheus/prometheus.yml).
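For reference, the brace shorthand above expands to three property keys. The sketch below reads them from a flat mapping; ServerObservabilityConfig and from_properties are hypothetical names, not pyfly's configuration API, and an unset key is left as None rather than guessing the framework's defaults.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

PREFIX = "pyfly.server.observability."


@dataclass(frozen=True)
class ServerObservabilityConfig:
    """Hypothetical holder for the three keys; None means 'not set here'."""
    enabled: Optional[bool]
    sample_interval_seconds: Optional[float]
    access_log: Optional[bool]

    @classmethod
    def from_properties(cls, props: Mapping[str, Any]) -> "ServerObservabilityConfig":
        interval = props.get(PREFIX + "sample-interval-seconds")
        return cls(
            enabled=props.get(PREFIX + "enabled"),
            # Accept numeric strings, since property sources often yield text.
            sample_interval_seconds=None if interval is None else float(interval),
            access_log=props.get(PREFIX + "access-log"),
        )
```

The values passed in are whatever the deployment chooses; nothing here implies a particular default interval or access-log setting.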

Scope

Gunicorn is intentionally not added (stack stays async-only ASGI), but the ServerStatsPort + multiprocess design is gunicorn-ready.

Quality

TDD throughout; 4890 tests pass, ruff clean, mypy --strict clean (683 files).
A real-server E2E test (tests/server/test_server_observability_e2e.py) boots uvicorn via serve_async, fires HTTP traffic, and asserts the server_* meters move and are served.
Hardened against an adversarial multi-agent review (binder exception-safety + graceful-shutdown resilience, off-thread sampling, multiprocess dir cleanup + graceful scrape fallback, per-stream requests/sec, auto server-label resolution) and a security review (loopback-bound, no hardcoded Grafana admin password, anonymous read-only).

Docs updated: observability/server/admin module docs, README, ROADMAP, and the observability book chapter (EN + ES).

🤖 Generated with Claude Code

Add server-observability backend: a ServerStatsPort outbound port (best-effort native stats on the serve_async path), per-adapter sample() implementations, a pure-ASGI server-metrics middleware (the uniform primary source for connections/in-flight/requests across all servers and worker counts), and a ServerMetricsBinder that emits worker/uptime/lifecycle meters from the in-worker ASGI lifespan. Gated by pyfly.server.observability.* (on under web/core starters). Wired into both the Starlette and FastAPI create_app lifespans.

Add pyfly.observability.multiprocess (init dir before workers fork, build an aggregating MultiProcessCollector registry). /actuator/prometheus aggregates across workers when PROMETHEUS_MULTIPROC_DIR is set; cli run enables it for workers>1. Fixes the pre-existing per-worker gap for http_server_requests + server_* meters.

Add an ObservabilityProvider (reads server_* meters, multiprocess-aware, with a per-worker breakdown), REST + SSE routes (/admin/api/observability[,/sse]), and a live observability.js view (stat cards, rolling charts, per-worker table, links to Metrics/Traces) registered in the SPA router + sidebar.

…stack

End-to-end test boots a real uvicorn server via serve_async, fires HTTP traffic, and asserts the server_* meters move and are served at the exposition. Add a prometheus + grafana docker-compose stack (ops/prometheus/prometheus.yml) scraping /actuator/prometheus.

…ity view

Update observability/server/admin module docs, README + ROADMAP, and the observability book chapter (EN + ES) with the server_* metric catalog, the pyfly.server.observability.* config, multi-worker aggregation, and the live admin Observability dashboard section.

…urity review

- Binder: guard all gauge writes + run sample() off-thread; _run never dies silently; stop() always records server_stopped_total and cleans up even if the sampling task died (was: stop() re-raised a dead task's exception, breaking graceful shutdown). Mark workers dead in multiprocess mode on graceful stop.
- Resolve the concrete server type ('auto' -> uvicorn/granian/hypercorn) so the server_* metric label is meaningful; binder falls back off the 'auto' sentinel.
- Admin provider: honor pyfly.server.observability.enabled (disabled -> dashboard empty-state); fix falsy-zero native_connections; move requests/sec to per-stream state (was corrupted by sharing one provider across REST + SSE + tabs).
- Multiprocess: graceful scrape fallback when the dir is missing; atexit cleanup + stale-dir sweep so mmap dirs don't accumulate across restarts.
- ASGI exclusion matches /api/sse/ as a substring (custom admin paths too).
- docker-compose: bind prometheus/grafana to loopback, drop the hardcoded admin password, downgrade anonymous Grafana to read-only Viewer (security review).
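The binder hardening described above (off-thread sampling, a loop that survives failing samples, and a stop() that always records the stop) can be sketched roughly as follows. SamplingBinder is a hypothetical stand-in, not the real ServerMetricsBinder, and plain attributes replace the Prometheus meters.

```python
import asyncio
import contextlib


class SamplingBinder:
    """Hypothetical sketch of an exception-safe periodic sampler."""

    def __init__(self, sample, interval: float) -> None:
        self._sample = sample      # blocking callable returning a stats dict
        self._interval = interval
        self._task = None
        self.latest = {}
        self.errors = 0
        self.stopped_total = 0

    async def _run(self) -> None:
        while True:
            try:
                # Run the blocking sample off the event-loop thread.
                self.latest = await asyncio.to_thread(self._sample)
            except Exception:
                # A failing sample is counted, never allowed to end the loop.
                self.errors += 1
            await asyncio.sleep(self._interval)

    def start(self) -> None:
        self._task = asyncio.create_task(self._run())

    async def stop(self) -> None:
        task, self._task = self._task, None
        try:
            if task is not None:
                task.cancel()
                # Swallow the cancellation and any exception a dead task carries.
                with contextlib.suppress(asyncio.CancelledError, Exception):
                    await task
        finally:
            # Recorded unconditionally so graceful shutdown is always visible.
            self.stopped_total += 1


async def demo() -> tuple:
    calls = []

    def flaky():
        calls.append(1)
        if len(calls) == 1:
            raise RuntimeError("boom")  # the first sample fails on purpose
        return {"connections": 0}

    binder = SamplingBinder(flaky, interval=0.01)
    binder.start()
    await asyncio.sleep(0.1)
    await binder.stop()
    return binder.errors, binder.latest, binder.stopped_total
```

Note that a sampled value of 0 is a legitimate reading, so a consumer should test it with `is not None` rather than truthiness. That is the kind of falsy-zero issue the admin provider fix above refers to.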

ancongui added 8 commits June 17, 2026 16:42

release: v26.06.113 — server-layer observability

c43487a

Bump version 26.06.112 -> 26.06.113 and add the CHANGELOG entry for the server-layer observability feature (metrics across uvicorn/granian/hypercorn, multi-worker aggregation, live admin Observability dashboard).

style: apply ruff format to server-observability files

e7a2321

CI runs 'ruff format --check'; format the 4 new files to satisfy it.

ancongui merged commit 332776e into main Jun 17, 2026
6 checks passed

ancongui deleted the feat/server-observability branch June 17, 2026 16:05

ancongui merged 8 commits into main from feat/server-observability
