GET /analytics returns misleading bucket distribution when all scores are identical

In `app/engine/analytics.py`, `_compute_buckets()` uses `np.percentile()` to compute bucket boundaries. When all scores are identical (e.g., all `0.5`), every boundary collapses to the same value. The bucket loop then produces:

- Buckets p0–p75: `count=0`, `reduced_centroid=null`
- Final bucket p75–p100: all records pile here (due to `<=` in the last mask)
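The collapse can be reproduced outside the service with a few lines of NumPy. This is only a sketch: the quartile cut points and the mask logic (half-open intervals, with `<=` on the last bucket) are assumptions inferred from the behaviour described above, not a copy of `_compute_buckets()`.

```python
import numpy as np

# Eight identical scores, as a freshly registered model might produce.
scores = np.full(8, 0.5)

# Assumed quartile boundaries; every edge comes out as 0.5.
edges = np.percentile(scores, [0, 25, 50, 75, 100])

counts = []
last = len(edges) - 2
for i in range(len(edges) - 1):
    lo, hi = edges[i], edges[i + 1]
    if i == last:
        # Closed upper bound on the final bucket (assumed).
        mask = (scores >= lo) & (scores <= hi)
    else:
        # Half-open interval: empty whenever lo == hi.
        mask = (scores >= lo) & (scores < hi)
    counts.append(int(mask.sum()))

print(edges.tolist())  # [0.5, 0.5, 0.5, 0.5, 0.5]
print(counts)          # [0, 0, 0, 8]
```

Because `lo == hi` for every bucket, the half-open masks can never match, so only the closed final bucket receives records.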

The resulting distribution is misleading: `total_scored` is unaffected, but the bucket output carries no information. This is easy to trigger early in testing, when a newly registered model produces uniform scores.

Minimal fix in `_compute_buckets()`:

```python
if len(np.unique(scores)) == 1:
    centroid = reduced.mean(axis=0).tolist() if reduced is not None else None
    return [ScoreBucket(bucket_label="p0-p100", count=len(scores), reduced_centroid=centroid)]
```
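To make the guard testable in isolation, it could be pulled into a small helper along these lines. The `ScoreBucket` dataclass here is a hypothetical stand-in for the project's own model, and `degenerate_bucket` is an illustrative name, not an existing function in `analytics.py`.

```python
from dataclasses import dataclass
from typing import List, Optional

import numpy as np


@dataclass
class ScoreBucket:
    # Hypothetical stand-in; field names follow the snippet above.
    bucket_label: str
    count: int
    reduced_centroid: Optional[List[float]]


def degenerate_bucket(scores, reduced=None):
    """Return a single p0-p100 bucket if all scores are equal, else None."""
    if len(scores) == 0 or len(np.unique(scores)) != 1:
        return None  # caller falls through to the normal percentile path
    centroid = reduced.mean(axis=0).tolist() if reduced is not None else None
    return [
        ScoreBucket(
            bucket_label="p0-p100",
            count=len(scores),
            reduced_centroid=centroid,
        )
    ]


# Example: four identical scores with 2-D reduced embeddings.
uniform = np.full(4, 0.5)
reduced = np.array([[0.0, 2.0], [2.0, 4.0], [0.0, 2.0], [2.0, 4.0]])
buckets = degenerate_bucket(uniform, reduced)
```

With this shape, `_compute_buckets()` would call the helper first and return its result when it is not `None`, leaving the existing percentile loop untouched for non-uniform scores.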

Happy to submit a PR if helpful.
