GET /analytics returns misleading bucket distribution when all scores are identical #15

@sl628

Description

In app/engine/analytics.py, _compute_buckets() uses np.percentile() to compute bucket boundaries. When all scores are identical (e.g., all 0.5), every boundary collapses to the same value. The bucket loop then produces:

  • Buckets p0–p75: count=0, reduced_centroid=null
  • Final bucket p75–p100: all records pile here (due to <= in the last mask)
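The collapse described above can be reproduced without the application. This is a minimal sketch, not the actual code from app/engine/analytics.py: the function name bucket_counts, the quartile boundaries, and the exact mask expressions are assumptions inferred from the behaviour described in this issue.

```python
import numpy as np


def bucket_counts(scores):
    """Hypothetical stand-in for the bucket loop in _compute_buckets()."""
    scores = np.asarray(scores, dtype=float)
    percentiles = [0, 25, 50, 75, 100]  # assumed quartile boundaries
    bounds = np.percentile(scores, percentiles)
    counts = {}
    for i in range(len(percentiles) - 1):
        lo, hi = bounds[i], bounds[i + 1]
        if i == len(percentiles) - 2:
            # Last bucket is closed on the right, so it catches everything
            # when lo == hi.
            mask = (scores >= lo) & (scores <= hi)
        else:
            # Half-open interval is empty when lo == hi.
            mask = (scores >= lo) & (scores < hi)
        label = f"p{percentiles[i]}-p{percentiles[i + 1]}"
        counts[label] = int(mask.sum())
    return counts


uniform = bucket_counts([0.5] * 8)
print(uniform)
```

With eight identical scores, the first three buckets are empty and the last one holds all eight records, which matches the output reported here.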

The resulting distribution is misleading: total_scored is unaffected, but the bucket output carries no information about the scores. This is easy to trigger early in testing, when a newly registered model produces uniform scores.

Minimal fix in _compute_buckets():

# All scores identical: percentile boundaries collapse, so return one bucket.
if len(np.unique(scores)) == 1:
    centroid = reduced.mean(axis=0).tolist() if reduced is not None else None
    return [ScoreBucket(bucket_label="p0-p100", count=len(scores), reduced_centroid=centroid)]
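For anyone who wants to try the guard in isolation, here is a self-contained sketch. The ScoreBucket dataclass is a hypothetical stand-in that only mirrors the three fields used in the snippet above, and uniform_score_bucket is an illustrative helper name, not something that exists in the repository. It also treats an empty score array as "no special case", which is an assumption about the desired behaviour.

```python
from dataclasses import dataclass
from typing import List, Optional

import numpy as np


@dataclass
class ScoreBucket:
    # Hypothetical stand-in for the project's ScoreBucket model.
    bucket_label: str
    count: int
    reduced_centroid: Optional[List[float]]


def uniform_score_bucket(scores, reduced=None):
    """Return one p0-p100 bucket if every score is identical, else None."""
    scores = np.asarray(scores, dtype=float)
    if scores.size == 0 or len(np.unique(scores)) != 1:
        return None  # caller falls through to the normal percentile path
    centroid = None
    if reduced is not None:
        # Mean of the reduced vectors across all records.
        centroid = np.asarray(reduced, dtype=float).mean(axis=0).tolist()
    return ScoreBucket("p0-p100", int(scores.size), centroid)


bucket = uniform_score_bucket(
    [0.5, 0.5, 0.5, 0.5],
    reduced=[[0.0, 2.0], [2.0, 4.0], [0.0, 2.0], [2.0, 4.0]],
)
print(bucket)
```

Returning None for the non-uniform case keeps the existing percentile logic untouched, so the guard can sit at the top of _compute_buckets() as an early return.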

Happy to submit a PR if helpful.
