Add cost and latency to the model leaderboard #81

Merged
MaxGhenis merged 3 commits into main from feat/leaderboard-cost-latency
Jun 26, 2026

@MaxGhenis

What

Adds Cost / household and Latency columns to the model leaderboard.

Pipeline

  • analysis.model_cost_latency(predictions, price_overrides) computes per model:
    • costUsd (run total) and costPerHousehold (÷ distinct households), reusing usage_summary_by_model so totals match the existing usage CSV.
    • latencySeconds: the median, across households, of each household's summed request time. Median rather than mean or sum because the per-call timer wraps litellm's retry/backoff, so a few rate-limited calls inflate the mean; the median reflects a typical household.
  • config.PRICE_OVERRIDES_PER_1M fills cost for provider preview models litellm's price map doesn't cover yet — currently grok-build-0.1 at $1/$2 per 1M (https://x.ai/api), which otherwise showed blank cost.
  • Wired into build_dashboard_payload, so every modelStats entry carries them.
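
The aggregation described above can be sketched as follows. This is a minimal illustration, not the code in analysis.model_cost_latency: the row keys (model, household_id, elapsed_seconds), the function name, and the sample numbers are assumptions made for the example.

```python
from collections import defaultdict
from statistics import median


def model_cost_latency_sketch(rows, cost_by_model):
    """Hypothetical stand-in for the per-model cost/latency aggregation.

    rows: iterable of dicts with assumed keys model, household_id, elapsed_seconds.
    cost_by_model: dict mapping model name to total run cost in USD.
    """
    # Sum request time per (model, household) first ...
    per_household = defaultdict(lambda: defaultdict(float))
    for row in rows:
        per_household[row["model"]][row["household_id"]] += row["elapsed_seconds"]

    stats = {}
    for model, households in per_household.items():
        cost = cost_by_model.get(model)
        stats[model] = {
            "costUsd": cost,
            # ... divide the run total by the number of distinct households ...
            "costPerHousehold": None if cost is None else cost / len(households),
            # ... and take the median of the per-household sums.
            "latencySeconds": median(households.values()),
        }
    return stats


rows = [
    {"model": "m", "household_id": "h1", "elapsed_seconds": 20.0},
    {"model": "m", "household_id": "h1", "elapsed_seconds": 25.0},
    {"model": "m", "household_id": "h2", "elapsed_seconds": 50.0},
    {"model": "m", "household_id": "h3", "elapsed_seconds": 40.0},
    {"model": "m", "household_id": "h3", "elapsed_seconds": 560.0},  # retry/backoff outlier
]
stats = model_cost_latency_sketch(rows, {"m": 0.36})
```

In this made-up data the per-household sums are 45, 50 and 600 seconds, so the median stays at 50 seconds while the mean would be pulled above 230 seconds by the single slow household.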

UI

  • ModelStat gains costUsd / costPerHousehold / latencySeconds / totalTokens (all optional).
  • ModelLeaderboard adds the two columns (desktop 12-col grid + a mobile line) with tooltips. Cost is shown to 2 significant figures (range ~$0.002–$0.29); latency as 66s / 1.1m.

Verification

  • Regenerated the June run via export_country: zero change to any accuracy number (max drift 0.00000) — only the new fields are added. All 13 models populate; grok-build-0.1 ≈ $4.49.
  • 5 new unit tests (tests/test_cost_latency.py); test_analysis.py still green (79 passed).
  • eslint --max-warnings=0 clean; rendered locally — GPT-5.5 $0.12 / 1.1m, Gemini 3.1 Pro $0.085 / 45s, Opus 4.7 $0.29 / 53s.
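
A drift check like the one reported above could look roughly like this. It is only a sketch: the payload shape (model name mapped to a dict of numeric stats) and the field name accuracy are assumptions, not the actual structure produced by export_country.

```python
def max_abs_drift(before, after, fields=("accuracy",)):
    """Largest absolute change in the given numeric fields between two runs.

    before/after: dict mapping model name to a dict of stats (assumed shape).
    Fields that exist only in `after` (for example new cost fields) are ignored.
    """
    drift = 0.0
    for model, old_stats in before.items():
        new_stats = after[model]
        for field in fields:
            drift = max(drift, abs(new_stats[field] - old_stats[field]))
    return drift


before = {"model-a": {"accuracy": 0.5}, "model-b": {"accuracy": 0.75}}
after = {
    "model-a": {"accuracy": 0.5, "costUsd": 1.0},
    "model-b": {"accuracy": 0.75, "costUsd": 2.0},
}
drift = max_abs_drift(before, after)
```

A result of 0.0 here corresponds to the "only the new fields are added" outcome: existing numbers are unchanged and the extra keys do not affect the comparison.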

Data

The columns render empty against the current published artifact (it predates these fields). They populate on the next export / publish-dashboard — the pipeline change makes future regenerations include them automatically.

🤖 Generated with Claude Code

analysis.model_cost_latency() joins per-model cost (USD total and
per-household) and median per-household latency into the modelStats that
build_dashboard_payload emits. Cost reuses usage_summary_by_model; latency is
the median of each household's summed request-time, which is robust to the
occasional rate-limit retry that inflates the mean. config.PRICE_OVERRIDES_PER_1M
fills cost for provider preview models litellm cannot yet price (grok-build-0.1
at $1/$2 per 1M, https://x.ai/api).
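
As a rough illustration of how a per-1M-token override turns into a cost, consider the sketch below. The dictionary shape, the reading of "$1/$2" as input/output prices, and the function name are assumptions for the example; the real config.PRICE_OVERRIDES_PER_1M may be structured differently.

```python
# Assumed shape: model -> (input USD per 1M tokens, output USD per 1M tokens).
PRICE_OVERRIDES_PER_1M = {"grok-build-0.1": (1.0, 2.0)}


def override_cost_usd(model, prompt_tokens, completion_tokens,
                      overrides=PRICE_OVERRIDES_PER_1M):
    """Return the cost in USD from an override, or None if the model has none."""
    price = overrides.get(model)
    if price is None:
        return None  # caller would fall back to the regular price map
    input_per_1m, output_per_1m = price
    return (prompt_tokens * input_per_1m
            + completion_tokens * output_per_1m) / 1_000_000


# Hypothetical token counts, chosen only to make the arithmetic easy to follow.
example_cost = override_cost_usd("grok-build-0.1", 2_000_000, 500_000)
```

With these token counts the cost is 2 x $1 for input plus 0.5 x $2 for output, which comes to $3.00; a model without an override returns None rather than a guessed price.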

The leaderboard renders new "Cost / hh" and "Latency" columns (desktop grid +
mobile line). Verified by regenerating the June run: the new fields populate for
all 13 models (grok-build-0.1 at ~$4.49) with zero change to any accuracy number.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

vercel Bot commented Jun 25, 2026

The latest updates on your projects.

Project Deployment Actions Updated (UTC)
policybench-site Ready Ready Preview, Comment Jun 26, 2026 12:19am

- Leaderboard: per-household cost now formats to a consistent 3 decimals
  (was 2 significant figures, which varied 2-4 decimals and rounded the
  cheapest models to $0.00).
- Paper: new "Cost and latency" Results section with a per-model table
  (cost/household + median latency next to exact match) and a templated
  summary. Computed from the frozen run's predictions.csv.gz via the same
  model_cost_latency helper, so no snapshot regeneration or re-freeze.
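
The difference between the two cost formats can be illustrated in a few lines. This is a Python sketch of the formatting rules only, not the leaderboard's UI code; 0.12 and 0.085 are per-household costs quoted in the PR description, while 0.0021 is a hypothetical value near the bottom of the stated range.

```python
def format_two_sig_figs(value):
    # 2 significant figures: the number of decimals depends on the magnitude.
    return f"${value:.2g}"


def format_three_decimals(value):
    # Fixed 3 decimals: every row has the same width.
    return f"${value:.3f}"


costs = [0.12, 0.085, 0.0021]
sig_fig_labels = [format_two_sig_figs(c) for c in costs]
fixed_labels = [format_three_decimals(c) for c in costs]
# sig_fig_labels -> ['$0.12', '$0.085', '$0.0021']  (2, 3 and 4 decimals)
# fixed_labels   -> ['$0.120', '$0.085', '$0.002']  (always 3 decimals)
```

The fixed-decimal form trades a little precision on the cheapest models for a column whose values line up and compare at a glance.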

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Drop the switch to minutes above 60s; every model's median latency now
reads in seconds (e.g. 135s, not 2.2 min) for a single consistent unit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@MaxGhenis MaxGhenis merged commit fba3cfc into main Jun 26, 2026
6 checks passed
@MaxGhenis MaxGhenis deleted the feat/leaderboard-cost-latency branch June 26, 2026 00:26