A full observability stack for a Gunicorn/FastAPI application, running entirely in Docker Compose. Covers all four pillars — metrics · logs · traces · profiles — with alerting out of the box.
Grafana dashboard: FastAPI Full Observability
- Dashboard Preview
- Architecture
- How the Backend Works
- Grafana Dashboard
- Generate Load
- Quick Start
- Stack
- Alerting
- Adding Your Own Service
- Multiple Environments
- Ports
Open http://localhost:3000 after starting the stack. Use the Project dropdown to filter by Docker Compose project name.
Metrics
- Backend exposes `/metrics` in Prometheus format (multiprocess-safe via `prometheus_multiproc`)
- Alloy pulls it every 15s — opt-in via the `metrics.scrape: "true"` Docker label on any service
- Alloy remote-writes to VictoriaMetrics
- vmalert queries VictoriaMetrics every 15s → fires to Alertmanager → Telegram / email
Logs
- Backend writes structured logfmt to stdout (includes `trace_id` on every line)
- Alloy reads container stdout via the Docker API — no log driver config needed
- Docker Compose labels (`project`, `service`, `container`) are attached as Loki stream labels
- Logs are queryable in Grafana with LogQL
Traces
- Backend's OpenTelemetry SDK sends spans via OTLP gRPC → Alloy (`:4317`) → Tempo
- Tempo generates span metrics (RPS, latency, errors per operation) and pushes them to VictoriaMetrics
- `trace_id` in log lines creates a live link from any log entry to its full trace
Profiles
- Pyroscope SDK pushes CPU flame graphs via HTTP → Alloy (`:4040`) → Pyroscope storage
- Grafana links profiles to traces via Tempo's `tracesToProfiles` integration
The backend/ directory is a minimal FastAPI + Gunicorn application wired with all four observability signals. It is intentionally kept simple — the goal is to show the instrumentation, not the business logic.
Every request passes through three middlewares in order:
RequestAccessMiddleware → MetricsMiddleware → FastAPI router
| Middleware | What it does |
|---|---|
| `RequestAccessMiddleware` | Generates `request_id`; writes a structured logfmt access log line with method, path, status, duration |
| `MetricsMiddleware` | Records `requests_total`, `responses_total`, `request_duration_seconds`, `requests_in_progress`, `exceptions_total` |
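The middleware pattern is simple to sketch. The following is a simplified illustration of the idea, not the project's actual code — plain Python attributes stand in for the real `prometheus_client` counters, gauges, and histograms:

```python
import time

class MetricsMiddleware:
    """Wrap the downstream handler; count requests, track in-progress
    requests, and record per-request durations."""

    def __init__(self, app):
        self.app = app
        self.requests_total = 0
        self.exceptions_total = 0
        self.in_progress = 0
        self.durations = []

    def __call__(self, request):
        self.requests_total += 1
        self.in_progress += 1
        start = time.perf_counter()
        try:
            return self.app(request)
        except Exception:
            self.exceptions_total += 1
            raise
        finally:
            # Runs whether the handler succeeded or raised
            self.durations.append(time.perf_counter() - start)
            self.in_progress -= 1

# Wrap a trivial "router" and send two requests through it
handler = MetricsMiddleware(lambda request: {"status": 200, "path": request})
handler("/items")
handler("/items")
```

In the real app the same shape is expressed as Starlette/FastAPI middleware, and the counters are Prometheus metric objects so they survive scraping and aggregation.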
Gunicorn forks multiple worker processes. The standard Prometheus client is not process-safe by default.
The backend uses prometheus_multiproc mode — each worker writes its metrics to a shared directory (/tmp/prometheus_multiproc). The /metrics endpoint aggregates all files before responding.
Worker lifecycle hooks in gunicorn.conf.py ensure per-worker gauges are cleaned up on exit.
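The multiprocess aggregation idea can be illustrated with a toy stand-in — JSON files instead of the mmap-backed files `prometheus_multiproc` actually uses, but the same write-per-worker, merge-on-read shape:

```python
import json
import os
import tempfile

# Shared directory, standing in for /tmp/prometheus_multiproc
MPROC_DIR = tempfile.mkdtemp()

def worker_record(pid, name, value=1):
    # Each worker process writes only to its own file, so no
    # cross-process locking is needed on the write path.
    path = os.path.join(MPROC_DIR, f"counters_{pid}.json")
    data = {}
    if os.path.exists(path):
        with open(path) as f:
            data = json.load(f)
    data[name] = data.get(name, 0) + value
    with open(path, "w") as f:
        json.dump(data, f)

def aggregate():
    # What the /metrics endpoint does: merge every worker's file
    # into one consistent snapshot before responding.
    totals = {}
    for fname in os.listdir(MPROC_DIR):
        with open(os.path.join(MPROC_DIR, fname)) as f:
            for name, value in json.load(f).items():
                totals[name] = totals.get(name, 0) + value
    return totals

# Two "workers" each served some requests
worker_record(101, "requests_total", 3)
worker_record(102, "requests_total", 2)
```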
Traces are sent via OTLP gRPC to Alloy (:4317) using BatchSpanProcessor.
FastAPIInstrumentor automatically creates a root span for every request.
PyroscopeSpanProcessor links each root span to a Pyroscope profile — enabling the Profiles button in Tempo.
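The tracing side amounts to a few lines of OpenTelemetry SDK wiring. A minimal sketch — the endpoint host `alloy` is assumed from the Compose network name, and the real app additionally registers its `PyroscopeSpanProcessor`:

```python
from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

app = FastAPI()

provider = TracerProvider()
# Batch finished spans in memory and ship them to Alloy's OTLP gRPC receiver
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://alloy:4317", insecure=True))
)
trace.set_tracer_provider(provider)

# Creates a root span for every incoming request
FastAPIInstrumentor.instrument_app(app)
```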
Every log line is emitted in logfmt format and includes:
- `level`, `timestamp`, `message`
- `request_id` — unique per request, also returned as the `X-Request-Id` response header
- `trace_id`, `span_id` — injected from the active OpenTelemetry span, enabling Logs → Traces navigation in Grafana
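Rendering logfmt is straightforward. A minimal sketch of the format (the field values here are made up for illustration):

```python
def logfmt(**fields):
    """Render key=value pairs; quote values containing spaces,
    per the logfmt convention."""
    parts = []
    for key, value in fields.items():
        value = str(value)
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{key}={value}")
    return " ".join(parts)

line = logfmt(
    level="info",
    message="GET /items 200",
    request_id="req-42",
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
    span_id="00f067aa0ba902b7",
)
```

Because `trace_id` and `span_id` are plain key=value pairs, Loki can parse them with `| logfmt` and Grafana can turn `trace_id` into a Tempo link.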
- Apdex — user satisfaction score based on latency thresholds
- Error Rate — percentage of 5xx responses
- Total Requests — cumulative request count
- Workers — live Gunicorn worker count (step graph, drops to 0 on crash)
- RPS total, broken down by path and by status code
- P50 latency — total and per path
- P95 and P99 latency — total and broken down by path
- Top 10 Slowest Endpoints (P95 bar gauge, color-coded green → red)
- Average Duration by endpoint
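Latency panels like these are typically built with `histogram_quantile` over the duration histogram. A hedged sketch — it assumes `request_duration_seconds` is exported as a Prometheus histogram with a `path` label, per the metrics listed in the middleware table, not a query copied from the shipped dashboard:

```promql
# P95 latency per path over a 5-minute window
histogram_quantile(0.95,
  sum by (le, path) (rate(request_duration_seconds_bucket[5m]))
)
```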
Requests currently being processed — total and by path. Useful for detecting request pile-ups.
- 4xx Rate by path — client errors
- 5xx Rate by path — server errors
- Exceptions by Type — unhandled Python exceptions with rate
- Service Map — visual request flow graph with avg latency and RPS per node
- Recent Traces — clickable list, opens full trace in Tempo
- Span RPS / P99 Latency / Error Rate — broken down per operation
- Log Volume by Level — histogram showing INFO / ERROR / WARNING over time
- Log Stream — live log view; each line includes `request_id`, `trace_id`, `span_id`
- CPU Time Consumed — total CPU usage over time across all workers
- Flame Graph — aggregated call stack for the selected time range; shows hottest functions by self/total CPU time
Every log line contains a trace_id linking it to a distributed trace.
Logs → Traces:
- Click any line in the Log Stream to expand it
- In the Links section, click "Open in Tempo"
From a Trace span you can jump to:
- Logs for this span — correlated log lines in Loki
- Span metrics — RPS / latency / error rate for that operation
- Profile — CPU flame graph for that request
To populate the dashboards with real traffic, run the included load script:
```shell
./load.sh                           # ~20 req/s against http://localhost:8000
./load.sh 50                        # custom rate
./load.sh 10 http://localhost:8001  # custom rate + URL
```
The script cycles through all endpoints — normal requests, 4xx, 5xx, slow — so every dashboard panel gets data within seconds.
Edit observability/alertmanager/config.yaml and fill in the placeholders:
```yaml
global:
  smtp_smarthost: '<SMTP_HOST>:<SMTP_PORT>'  # e.g. smtp.gmail.com:587
  smtp_from: '<SMTP_FROM>'
  smtp_auth_username: '<SMTP_USERNAME>'
  smtp_auth_password: '<SMTP_PASSWORD>'

receivers:
  - name: 'default-receiver'
    email_configs:
      - to: '<ALERT_EMAIL>'
    telegram_configs:
      - bot_token: '<TELEGRAM_BOT_TOKEN>'  # from @BotFather → /newbot
        chat_id: <TELEGRAM_CHAT_ID>        # from @userinfobot
```
If you only need Telegram, remove `email_configs`. If you only need email, remove `telegram_configs`.
```shell
docker compose up -d
```
Then open http://localhost:3000 → admin / admin
| Component | Role | Version |
|---|---|---|
| Grafana Alloy | Collector — scrapes metrics, collects logs, receives traces & profiles | v1.12.0 |
| VictoriaMetrics | Metrics storage (Prometheus-compatible) | v1.131.0 |
| vmalert | Evaluates PromQL alert rules against VictoriaMetrics | v1.131.0 |
| Alertmanager | Alert routing, deduplication & notifications | v0.29.0 |
| Loki | Log storage & querying | v3.5.9 |
| Tempo | Distributed trace storage + span metrics generation | v2.8.0 |
| Pyroscope | Continuous profiling storage | v1.17.0 |
| Grafana | Dashboards, Explore, cross-signal navigation | v12.4.0 |
| Backend | Example FastAPI app (Gunicorn + Uvicorn workers) | — |
Rules live in observability/vmalert/rules/fastapi.yaml, evaluated every 15s.
| Alert | Fires when | Severity | Delay |
|---|---|---|---|
| `BackendDown` | No metrics received from backend | critical | 1m |
| `HighErrorRate` | 5xx responses > 5% of total | warning | 2m |
| `HighLatencyP99` | p99 latency > 1s | warning | 2m |
- Critical alerts suppress warnings with the same `alertname` via inhibit rules
- All alerts route to `default-receiver` → Telegram + email
- To adjust thresholds, edit `expr` in `observability/vmalert/rules/fastapi.yaml`
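A rule in that file might look like the following sketch. The `expr` is an assumption built from the `responses_total` counter the MetricsMiddleware records, not a copy of the shipped rule:

```yaml
groups:
  - name: fastapi
    interval: 15s
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(responses_total{status=~"5.."}[5m]))
            / sum(rate(responses_total[5m])) > 0.05
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "5xx responses exceed 5% of total"
```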
To opt a service into metrics scraping, add these docker labels:
```yaml
services:
  my-service:
    labels:
      metrics.scrape: "true"           # required
      metrics.path: "/custom/metrics"  # optional, defaults to /metrics
```
Alloy auto-discovers the container and attaches `project`, `service`, `container` labels. No Alloy config changes are needed.
Logs are collected from all running containers automatically — no labels required.
Run dev and staging side by side:
```shell
docker compose -p dev up -d
docker compose -p staging up -d
```
Each project gets its own `project` label on all metrics and logs.
Switch between them in Grafana using the Project dropdown at the top of the dashboard.
| Service | Port | Purpose |
|---|---|---|
| Grafana | 3000 | Dashboards — http://localhost:3000 |
| Backend API | 8000 | FastAPI docs — http://localhost:8000/docs |
| Alloy | 12345 | Pipeline debug UI — http://localhost:12345 |