Add telemetry related calls (for pipeline monitoring) #648
nuclearcat merged 7 commits into kernelci:main
Conversation
Add TelemetryEvent model to the COLLECTIONS dict so the telemetry MongoDB collection is created at startup with TTL and compound indexes. Signed-off-by: Denys Fedoryshchenko <denys.f@collabora.com>
Add endpoint for pipeline services to submit telemetry events in bulk. Validates each event against the TelemetryEvent model and uses insert_many for efficient batch insertion. Signed-off-by: Denys Fedoryshchenko <denys.f@collabora.com>
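The bulk-insertion flow described in the commit above can be sketched roughly as follows. Note that `validate_batch` and `REQUIRED_FIELDS` are hypothetical names for illustration only; the actual endpoint validates each event against the full `TelemetryEvent` model from `kernelci.api.models`:

```python
# Hedged sketch of the bulk-submission flow: check each event dict
# for the required fields, then hand the whole batch to a single
# insert_many() call so insertion costs one round-trip, not N.
REQUIRED_FIELDS = {'kind', 'runtime'}  # assumed minimal schema


def validate_batch(payloads):
    """Return the validated events, or raise ValueError on the first
    event that is missing a required field."""
    for i, event in enumerate(payloads):
        missing = REQUIRED_FIELDS - event.keys()
        if missing:
            raise ValueError(f"event {i} missing fields: {sorted(missing)}")
    return payloads

# Usage (db.insert_many is the new bulk method added by this PR):
# db.insert_many(validate_batch(events))
```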
Add paginated query endpoint for telemetry events with support for filtering by any field (kind, runtime, device_type, job_name, result, is_infra_error, etc.) plus time range via since/until parameters. Signed-off-by: Denys Fedoryshchenko <denys.f@collabora.com>
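A minimal sketch of how such a field-plus-time-range filter could be assembled (`build_filter` and the `timestamp` field name are assumptions for illustration, not code taken from the PR):

```python
from datetime import datetime


def build_filter(params, since=None, until=None):
    """Build a MongoDB query dict from optional field filters plus an
    ISO 8601 time range applied to an assumed 'timestamp' field."""
    # Keep only the filters the caller actually supplied.
    query = {key: value for key, value in params.items() if value is not None}
    ts_filter = {}
    if since:
        ts_filter['$gte'] = datetime.fromisoformat(since)
    if until:
        ts_filter['$lte'] = datetime.fromisoformat(until)
    if ts_filter:
        query['timestamp'] = ts_filter
    return query
```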
Add aggregation endpoint that groups telemetry events by configurable fields (runtime, device_type, job_name, tree, branch, arch, kind, error_type) and returns pass/fail/incomplete/skip/infra_error counts. Supports filtering by kind, runtime, device_type, job_name, tree, branch, arch, and time range (since/until) before aggregation. Also adds a generic db.aggregate() method for running MongoDB aggregation pipelines. Signed-off-by: Denys Fedoryshchenko <denys.f@collabora.com>
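The grouping described above maps naturally onto a MongoDB aggregation pipeline: an optional `$match` stage for the pre-aggregation filters, then a `$group` stage keyed on the requested fields with conditional counters. The sketch below is illustrative (`stats_pipeline` and the exact counter names are assumptions, not the PR's implementation):

```python
def stats_pipeline(group_by, match=None):
    """Build an aggregation pipeline that groups events by the given
    fields and counts results per group."""
    pipeline = []
    if match:
        pipeline.append({'$match': match})
    pipeline.append({'$group': {
        # Group key: one '$field' reference per requested field.
        '_id': {field: f'${field}' for field in group_by},
        'total': {'$sum': 1},
        # $cond emits 1 for matching documents, 0 otherwise, so
        # $sum yields a per-group count of each result value.
        'passed': {'$sum': {'$cond': [{'$eq': ['$result', 'pass']}, 1, 0]}},
        'failed': {'$sum': {'$cond': [{'$eq': ['$result', 'fail']}, 1, 0]}},
        'infra_errors': {'$sum': {'$cond': ['$is_infra_error', 1, 0]}},
    }})
    return pipeline

# The generic db.aggregate() method added by this PR would then run
# such a pipeline against the telemetry collection.
```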
Add on-demand anomaly detection endpoint that identifies: 1. Runtime+device_type combinations with high infra error or failure rates exceeding a configurable threshold 2. Runtimes with recurring submission/connectivity errors Parameters: window (1h-48h), threshold (0.0-1.0), min_total (minimum event count to avoid noise from small samples). Signed-off-by: Denys Fedoryshchenko <denys.f@collabora.com>
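The thresholded-rate rule described in the commit above might look roughly like this (`find_anomalies` and the group-document shape are assumptions for illustration; the real endpoint computes the per-group counts server-side via aggregation first):

```python
def find_anomalies(groups, threshold=0.3, min_total=10):
    """Flag runtime+device_type groups whose failure or infra-error
    rate over the time window exceeds the threshold; groups with
    fewer than min_total events are skipped to avoid noise from
    small samples."""
    anomalies = []
    for group in groups:
        total = group['total']
        if total < min_total:
            continue  # sample too small to trust the rate
        for rate_key, count_key in (('fail_rate', 'failed'),
                                    ('infra_error_rate', 'infra_errors')):
            rate = group[count_key] / total
            if rate > threshold:
                anomalies.append({**group['_id'], rate_key: round(rate, 3)})
    return anomalies
```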
Force-pushed from 6568cca to 7152a42.
…efactor get_telemetry_stats(). We add a new db.insert_many method instead. Signed-off-by: Denys Fedoryshchenko <denys.f@collabora.com>
Force-pushed from 13ffe2e to 31a5c15.
Pull request overview
This PR adds telemetry-related API endpoints for pipeline monitoring, as referenced in issue #1419. The changes introduce a separate telemetry collection to track pipeline execution events with higher volume and different query patterns than the existing EventHistory collection.
Changes:
- Added four new telemetry endpoints: POST /telemetry for bulk event insertion, GET /telemetry for querying events, GET /telemetry/stats for aggregated statistics, and GET /telemetry/anomalies for anomaly detection
- Added database support methods (insert_many, aggregate) to handle telemetry operations
- Integrated TelemetryEvent model from kernelci.api.models into the database collection mapping
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 16 comments.
| File | Description |
|---|---|
| api/main.py | Adds four telemetry endpoints with authentication, filtering, aggregation, and anomaly detection capabilities |
| api/db.py | Adds TelemetryEvent to collection mapping and implements insert_many and aggregate methods for bulk operations |
```python
    """Get aggregated telemetry statistics.

    This is rule-based anomaly detection using
    thresholded empirical rates computed over
    a sliding (rolling) time window.
    This is not a full anomaly detection system
    with baselines or machine learning, but at
    last something to start with.

    Query parameters:
    - group_by: Comma-separated fields to group by
      (runtime, device_type, job_name, tree, branch, arch,
      kind, error_type)
    - kind: Filter by event kind before aggregating
    - runtime: Filter by runtime name
    - since/until: Time range (ISO 8601)

    Returns grouped counts with pass/fail/incomplete/infra_error
    breakdowns for result-bearing events.
    """
```
Misleading or incorrect documentation. The docstring states "This is rule-based anomaly detection using thresholded empirical rates..." but this endpoint (get_telemetry_stats) only provides aggregated statistics without any anomaly detection logic. Anomaly detection is actually performed by the get_telemetry_anomalies endpoint. This docstring should describe the stats aggregation functionality, not anomaly detection.
```python
if since:
    ts_filter['$gte'] = datetime.fromisoformat(since)
if until:
    ts_filter['$lte'] = datetime.fromisoformat(until)
```
Missing error handling for datetime.fromisoformat(). If 'since' or 'until' parameters contain invalid ISO 8601 format strings, this will raise a ValueError that is not caught, resulting in an unhandled 500 error instead of a user-friendly 400 Bad Request. Wrap these calls in a try-except block and raise HTTPException with status 400.
Suggested change:

```python
try:
    if since:
        ts_filter['$gte'] = datetime.fromisoformat(since)
    if until:
        ts_filter['$lte'] = datetime.fromisoformat(until)
except ValueError as exc:
    raise HTTPException(
        status_code=status.HTTP_400_BAD_REQUEST,
        detail="Invalid 'since' or 'until' timestamp; "
               "expected ISO 8601 format.",
    ) from exc
```
Related to: kernelci/kernelci-pipeline#1419