feat: add average metrics per question to model detail page #27
kalyangupta12 wants to merge 2 commits into appwrite:main from
Conversation
Walkthrough
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/routes/model/$modelId.tsx (1)
173-179: Consider using a distinct icon for tool calls. Both "Total tokens" (line 153) and "Total tool calls" use the Layers icon. This could make it harder for users to visually differentiate between these metrics at a glance. Consider using a different icon for tool calls (e.g., Wrench or Zap from lucide-react).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In src/routes/model/$modelId.tsx around lines 173-179: the metric "Total tool calls" uses the same Layers icon as "Total tokens", making them hard to distinguish; update the object pushed into totalItems (the block that checks totalToolCalls > 0 and constructs {icon: Layers, label: "Total tool calls", ...}) to use a distinct icon component (for example Wrench or Zap from lucide-react) instead of Layers, and import that icon where icons are imported so the UI shows a different glyph for totalToolCalls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In src/routes/model/$modelId.tsx:
- Around line 234-244: The tooltip is only triggered on hover
(group-hover/tip:opacity-100) so keyboard users can't open it; make the trigger
element (the wrapper span that checks item.tooltip and contains the Info icon)
keyboard-focusable by adding tabIndex={0} and add focus/focus-visible utility
classes to mirror the hover behavior (e.g., group-focus/tip:opacity-100 or
group-focus-visible/tip:opacity-100) so the absolutely positioned tooltip span
becomes visible on focus as well; ensure the trigger retains appropriate ARIA
semantics (role="button" or aria-describedby linking to the tooltip) and keep
the tooltip span pointer-events-none and accessible text content (item.tooltip)
unchanged.
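Applied to the existing renderCard markup, the suggestion above could look roughly like the fragment below. This is a sketch, not the exact change the author shipped; the group-focus-visible/tip variant assumes Tailwind's named-group syntax already used for group-hover/tip:

```tsx
{item.tooltip && (
  <span
    className="group/tip relative"
    tabIndex={0}                 // make the trigger keyboard-focusable
    role="button"
    aria-label={item.tooltip}    // expose the tooltip text to screen readers
  >
    <Info
      size={12}
      className="opacity-40 group-hover/tip:opacity-70 group-focus-visible/tip:opacity-70 transition-opacity cursor-help"
    />
    <span className="pointer-events-none absolute bottom-full left-1/2 z-50 mb-1.5 -translate-x-1/2 whitespace-pre rounded-md bg-[#1e1e22] px-2.5 py-1.5 text-xs font-normal text-[#EDEDF0] opacity-0 shadow-lg ring-1 ring-white/10 transition-opacity group-hover/tip:opacity-100 group-focus-visible/tip:opacity-100">
      {item.tooltip}
    </span>
  </span>
)}
```

The only behavioral additions are tabIndex, the ARIA attributes, and the group-focus-visible/tip:opacity-* classes mirroring the existing hover classes.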
---
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fc23432f-d836-4e73-820f-a9e5d1fe5672
📒 Files selected for processing (1)
src/routes/model/$modelId.tsx
Greptile Summary: This PR enhances the model detail page by splitting the existing flat stats grid into two clearly labelled sections — Totals (total tokens, duration, cost, tool calls) and Averages (avg tokens/question, avg duration/question, avg speed, avg cost/question, avg tool calls/question) — and adds a tooltip on token cards that breaks down input vs. output counts. Key observations:
Confidence Score: 4/5
Important Files Changed
Last reviewed commit: "feat: add average me..."
```tsx
if (totalToolCalls > 0) {
  totalItems.push({
    icon: Layers,
    label: "Total tool calls",
    value: totalToolCalls.toLocaleString(),
  });
}
```
Same icon used for tokens and tool calls
Both "Total tokens" (line 153) and "Total tool calls" use Layers as their icon. The same duplication exists in the Averages section for "Avg tokens/question" (line 186) and "Avg tool calls/question" (line 218). Since these metrics represent conceptually distinct things, using the same icon makes the two cards visually indistinguishable at a glance and may confuse users scanning the stats section.
Consider using a more semantically appropriate icon from lucide-react for tool calls — for example Wrench or Hammer — to visually differentiate them from token counts:
```tsx
import { ..., Wrench } from "lucide-react";

// line 174-178
totalItems.push({
  icon: Wrench,
  label: "Total tool calls",
  value: totalToolCalls.toLocaleString(),
});
```

And similarly at line 217–221:

```tsx
avgItems.push({
  icon: Wrench,
  label: "Avg tool calls/question",
  value: avgToolCalls.toFixed(1),
});
```

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
```tsx
const renderCard = (item: StatCard) => (
  <div key={item.label} className="arena-card flex flex-col gap-1.5 p-3">
    <div className="flex items-center gap-1.5 text-[var(--text-secondary)]">
      <item.icon size={12} />
      <span className="text-xs">{item.label}</span>
    </div>
    <span className="inline-flex items-center gap-1.5 text-sm font-semibold text-[var(--text-primary)]">
      {item.value}
      {item.tooltip && (
        <span className="group/tip relative">
          <Info
            size={12}
            className="opacity-40 group-hover/tip:opacity-70 transition-opacity cursor-help"
          />
          <span className="pointer-events-none absolute bottom-full left-1/2 z-50 mb-1.5 -translate-x-1/2 whitespace-pre rounded-md bg-[#1e1e22] px-2.5 py-1.5 text-xs font-normal text-[#EDEDF0] opacity-0 shadow-lg ring-1 ring-white/10 transition-opacity group-hover/tip:opacity-100">
            {item.tooltip}
          </span>
        </span>
      )}
    </span>
  </div>
);
```
renderCard redefined on every render
renderCard is an inline function recreated on every render of ModelStats. While harmless today (the component receives a static model prop), defining it outside the component as a standalone module-level function would be a cleaner pattern. The function does not close over any changing state or props, so it can safely be extracted without any behaviour change.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
- Use Wrench icon for tool calls to distinguish from tokens (Layers)
- Make tooltip keyboard accessible with button element and focus states
- Add aria-label for screen readers
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/routes/model/$modelId.tsx (1)
167-172: Reuse the shared cost formatter here.
src/components/QuestionCard.tsx already formats costs via formatCost, but these new cards hardcode a separate toFixed(4)/threshold policy. Pulling the same helper into ModelStats will keep total and per-question cost strings consistent across the detail page. Also applies to: 215-220
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In src/routes/model/$modelId.tsx around lines 167-172: the model total and per-question cost formatting in ModelStats currently uses hardcoded toFixed(4) logic; import and use the shared formatCost helper (the function used by QuestionCard) instead of manual toFixed formatting so cost strings are consistent: replace the `$${model.totalCost.toFixed(4)}` usage in the total cost card and the similar formatting at the per-question section (lines 215-220) with calls to formatCost(model.totalCost) and formatCost(...) respectively, ensuring you add the import for formatCost at the top of the file.
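As a rough illustration, a shared formatter along these lines would centralize the threshold policy. The real formatCost in src/components/QuestionCard.tsx is not shown in this review, so the exact thresholds below are an assumption, not the project's actual implementation:

```typescript
// Hypothetical sketch of a shared cost formatter; the real formatCost
// in QuestionCard.tsx may use different thresholds and precision.
function formatCost(cost: number): string {
  if (cost === 0) return "$0.00";
  if (cost < 0.0001) return "<$0.0001"; // avoid rendering tiny costs as $0.0000
  if (cost < 0.01) return `$${cost.toFixed(4)}`;
  return `$${cost.toFixed(2)}`;
}
```

Both the total cost card and the per-question averages would then call the same helper, so a change in precision policy happens in one place.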
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c0398844-a937-410e-9c5c-b28836159a38
📒 Files selected for processing (1)
src/routes/model/$modelId.tsx
```tsx
// Compute average tool calls
const totalToolCalls =
  model.questionDetails?.reduce(
    (sum, q) => sum + (q.toolCallCount ?? 0),
    0,
  ) ?? 0;
const avgToolCalls =
  model.totalQuestions > 0 ? totalToolCalls / model.totalQuestions : 0;
```
Don't hide valid zero tool-call metrics.
QuestionDetail.toolCallCount is optional, and QuestionCard already treats != null as the “data exists” check. Here, ?? 0 plus the > 0 guards makes “no tool-call data” and “measured zero usage” look identical, and toFixed(1) can still show a non-zero average as 0.0. Track presence separately and render zero/small values when the metric is known.
💡 One way to preserve zero values without treating missing data as zero:

```diff
+const hasToolCallData =
+  model.questionDetails?.some((q) => q.toolCallCount != null) ?? false;
 const totalToolCalls =
   model.questionDetails?.reduce(
     (sum, q) => sum + (q.toolCallCount ?? 0),
     0,
   ) ?? 0;
 const avgToolCalls =
-  model.totalQuestions > 0 ? totalToolCalls / model.totalQuestions : 0;
+  hasToolCallData && model.totalQuestions > 0
+    ? totalToolCalls / model.totalQuestions
+    : 0;
 ...
-if (totalToolCalls > 0) {
+if (hasToolCallData) {
   totalItems.push({
     icon: Wrench,
     label: "Total tool calls",
     value: totalToolCalls.toLocaleString(),
   });
 }
 ...
-if (avgToolCalls > 0) {
+if (hasToolCallData && model.totalQuestions > 0) {
   avgItems.push({
     icon: Wrench,
     label: "Avg tool calls/question",
-    value: avgToolCalls.toFixed(1),
+    value:
+      avgToolCalls === 0
+        ? "0.0"
+        : avgToolCalls < 0.01
+          ? "<0.01"
+          : avgToolCalls < 0.1
+            ? avgToolCalls.toFixed(2)
+            : avgToolCalls.toFixed(1),
   });
 }
```

Also applies to: 174-179, 223-227
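The presence-tracking idea in this review can be isolated into two small pure helpers, shown below as a sketch. The names (computeAvgToolCalls, formatAvg) and the QuestionDetail shape are illustrative assumptions, not identifiers from the PR:

```typescript
interface QuestionDetail {
  toolCallCount?: number | null; // optional: missing means "no data", not zero
}

// Average over only the entries where tool-call data is present.
// Returns undefined when no entry has data, so callers can hide the card
// instead of conflating "unknown" with a measured zero.
function computeAvgToolCalls(details: QuestionDetail[]): number | undefined {
  const known = details.filter((q) => q.toolCallCount != null);
  if (known.length === 0) return undefined;
  const total = known.reduce((sum, q) => sum + (q.toolCallCount ?? 0), 0);
  return total / known.length; // may legitimately be 0
}

// Format small averages so toFixed(1) does not collapse them to "0.0".
function formatAvg(avg: number): string {
  if (avg === 0) return "0.0";
  if (avg < 0.01) return "<0.01";
  if (avg < 0.1) return avg.toFixed(2);
  return avg.toFixed(1);
}
```

With this split, a model whose questions all report toolCallCount: 0 still renders "0.0", while a model with no tool-call data at all renders nothing.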
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In src/routes/model/$modelId.tsx around lines 137-145: compute averages using
only entries where QuestionDetail.toolCallCount is present rather than treating
missing as zero: replace the current sum+null-coalescing logic by (a)
totalToolCalls = sum of q.toolCallCount for q where q.toolCallCount != null, (b)
toolCallCountKnown = count of q where q.toolCallCount != null, and (c)
avgToolCalls = toolCallCountKnown > 0 ? totalToolCalls / toolCallCountKnown :
undefined (or null) so you can render known zeros distinctly; update the
rendering to check toolCallCountKnown (or avg != null) before calling toFixed to
avoid hiding valid zero values. Do the same fix for the other average
computations referenced around the blocks using model.questionDetails at
avgToolCalls and the similar calculations at the later occurrences (the blocks
you noted at 174-179 and 223-227).
Summary
Related Issue
Addresses enhancement request for more insightful benchmark metrics
Test plan
Summary by CodeRabbit