feat: add OpenTelemetry metrics for the hosted MCP server #50
Instrument the hosted HTTP transport with OpenTelemetry metrics, exported over OTLP/HTTP to the shared Appwrite observability stack (OTel Collector -> Prometheus/Mimir -> Grafana), mirroring the utopia-php/telemetry pattern used by the PHP services.

- New telemetry.py owns the MeterProvider, instruments, in-process active user/client TTL sets, and exception-safe record_* helpers. No-op unless the transport is http and an OTLP endpoint is configured, so self-hosted stdio never phones home and an unconfigured server stays silent.
- Wired at the operator/handler/auth boundaries: MCP request rate + latency, per-service/action Appwrite calls + errors, initializations (which agents connect), aggregated active users/clients, auth validation outcomes/reasons, write confirmations, search/docs usage, uploads, and build info.
- User ids (sub) are never used as labels; distinct counts are derived in process and exposed only as aggregate gauges.
- Config via standard OTEL_* env vars plus the cloud-style _APP_TELEMETRY_* aliases. Documented in AGENTS.md and compose.yaml.
- Unit tests use an in-memory metric reader (no collector required).
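The no-op gate and the exception-safe helpers described in this commit could look roughly like the sketch below. It is not the code from telemetry.py: the names `telemetry_enabled`, `exception_safe`, and `record_request` are hypothetical, and the `counter` argument is assumed to be any object with an `add(value, attributes)` method, the same shape as an OpenTelemetry Counter.

```python
import functools
import logging
import os

logger = logging.getLogger("telemetry_sketch")


def telemetry_enabled(transport: str, env=os.environ) -> bool:
    """Enable only for the hosted HTTP transport with an OTLP endpoint set."""
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "").strip()
    return transport == "http" and bool(endpoint)


def exception_safe(func):
    """Wrap a record_* helper so a telemetry failure never reaches the caller."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Swallow the error: metrics must not break request handling.
            logger.debug("telemetry helper %s failed", func.__name__, exc_info=True)
            return None

    return wrapper


@exception_safe
def record_request(counter, method: str) -> None:
    # Hypothetical helper: increments a request counter with one label.
    counter.add(1, {"method": method})
```

With this shape, a stdio server or a hosted server without `OTEL_EXPORTER_OTLP_ENDPOINT` never builds an exporter, and a failing instrument only produces a debug log line.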
Greptile Summary
This PR wires the hosted Appwrite MCP server into the shared OpenTelemetry stack by adding a new
Confidence Score: 5/5
Safe to merge: telemetry is fully isolated behind the [...] The change is additive and exception-safe throughout. The only findings are label-naming inconsistencies in the metrics (duplicate [...]
auth.py: two structurally different rejection paths share the same [...]
Drop the speculative _APP_TELEMETRY_* aliases. When OTEL_EXPORTER_OTLP_HEADERS is unset, build it from CF_ACCESS_CLIENT_ID + CF_ACCESS_CLIENT_SECRET so the deployment reuses the shared telemetry-auth secret instead of a combined one.
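The header fallback described in this commit can be sketched as follows. This is a minimal illustration, not the PR's code: the function name `resolve_otlp_headers` is hypothetical, and the sketch assumes the exporter is sent Cloudflare Access's service-token headers (`CF-Access-Client-Id` and `CF-Access-Client-Secret`) in the comma-separated `key=value` form that `OTEL_EXPORTER_OTLP_HEADERS` uses.

```python
import os


def resolve_otlp_headers(env=os.environ) -> str:
    """Return the OTLP headers string, preferring an explicit value."""
    explicit = env.get("OTEL_EXPORTER_OTLP_HEADERS", "").strip()
    if explicit:
        # An operator-supplied value always wins.
        return explicit

    client_id = env.get("CF_ACCESS_CLIENT_ID", "").strip()
    client_secret = env.get("CF_ACCESS_CLIENT_SECRET", "").strip()
    if client_id and client_secret:
        # Assumed header names: Cloudflare Access service-token headers.
        return (
            f"CF-Access-Client-Id={client_id},"
            f"CF-Access-Client-Secret={client_secret}"
        )

    # Neither source is configured: send no extra headers.
    return ""
```

Building the value only when both halves are present avoids sending a half-formed credential upstream.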
The assets cluster runs Alloy in the telemetry namespace with an OTLP receiver on :4318 that authenticates upstream and upserts the deployment.* resource attributes. So the app just points at it — no CF-Access secret, no header assembly, no OTEL_RESOURCE_ATTRIBUTES needed.
- upload SSRF guard: emit reason=no_host for missing-host (was conflated with reason=scheme)
- docs_search: return embedding duration from _rank instead of stashing it on the instance (removes a cross-request race on _last_embedding_duration_s)
- telemetry: only touch the active-user/client sets when enabled, so they can't grow unbounded when telemetry is off (pruning only runs via the gauge collection cycle, which is disabled then)
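The telemetry fix in the last bullet can be illustrated with a small TTL set. This is a sketch under assumptions, not the PR's implementation: the class name `ActiveSet`, its `touch`/`count` methods, and the `now` parameter (added here so the behaviour can be checked deterministically) are all hypothetical.

```python
import time


class ActiveSet:
    """Tracks keys seen within the last `ttl_seconds` (illustrative sketch)."""

    def __init__(self, ttl_seconds: float, enabled: bool):
        self._ttl = ttl_seconds
        self._enabled = enabled
        self._last_seen = {}  # key -> timestamp of the most recent touch

    def touch(self, key: str, now=None) -> None:
        # When telemetry is off nothing ever calls count(), so nothing
        # would prune; skipping the write keeps the dict from growing.
        if not self._enabled:
            return
        self._last_seen[key] = time.monotonic() if now is None else now

    def count(self, now=None) -> int:
        # Called from the gauge collection cycle: prune, then report.
        current = time.monotonic() if now is None else now
        cutoff = current - self._ttl
        self._last_seen = {
            key: seen for key, seen in self._last_seen.items() if seen >= cutoff
        }
        return len(self._last_seen)
```

Only the aggregate returned by `count` would be exported as a gauge value, so individual keys never become labels. A real implementation shared across request threads would likely also need a lock around both methods.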
What & why
The MCP server emitted no telemetry. The other Appwrite services use utopia-php/telemetry to export OpenTelemetry metrics over OTLP/HTTP to the shared stack (OTel Collector → Prometheus/Mimir → Grafana at telemetry.appwrite.systems). This wires the hosted HTTP MCP server into that same stack so we can see:
- clientInfo (Claude, Cursor, …) + OAuth client_id.
Approach
- telemetry.py owns the MeterProvider, all instruments, in-process active user/client TTL sets, and exception-safe record_* helpers. No-op unless transport is http and an OTLP endpoint is configured: self-hosted stdio never phones home; an unconfigured hosted server stays silent (mirrors the PHP None/NoTelemetry adapter).
- Wired at the operator/handler/auth boundaries (server.py, operator.py, auth.py, http_app.py, docs_search.py).
- User ids (sub) are never labels: distinct counts are derived in-process and exposed only via the aggregate gauges mcp.users.active / mcp.clients.active.
Metrics
Prefixed mcp.: mcp.requests, mcp.request.duration, mcp.tool.calls, mcp.appwrite.calls/.call.duration/.errors, mcp.write.confirmations, mcp.initializations, mcp.users.active, mcp.clients.active, mcp.auth.validations/.duration, mcp.search_tools.*, mcp.search_docs.*, mcp.context.requests, mcp.results.stored, mcp.resources.reads, mcp.uploads/.upload.bytes/.upload.errors, mcp.server.info, mcp.startup.validation.
Config
Standard OTEL_EXPORTER_OTLP_ENDPOINT / OTEL_EXPORTER_OTLP_HEADERS / OTEL_RESOURCE_ATTRIBUTES, plus the cloud-style _APP_TELEMETRY_OTLP_ENDPOINT / _APP_TELEMETRY_OTLP_HEADERS aliases. Documented in AGENTS.md and compose.yaml. Runtime values are set in the declarative deploy repo (assets-applications<env>/mcp/default.yaml), not in CI.
Testing
- tests/unit/test_telemetry.py (in-memory metric reader; no collector needed): label correctness, write-block, auth reasons, session dedupe, and the no-op path.
- otel/opentelemetry-collector: metrics land with correct names/attributes and the deployment.environment.name resource attribute flows through.
Dashboards
Companion Grafana dashboards (MCP/overview.json, MCP/adoption.json) are in a separate PR against the dashboards repo.