
Conversation

@mldangelo
Member

Summary

Display rendered assertion values with substituted variables in the Evaluation tab instead of raw templates, improving readability for assertions with Nunjucks variables and loops.

Fixes #5988

Changes

Backend (src/assertions/index.ts)

  • Stores the rendered assertion value in metadata.renderedAssertionValue when the rendered result differs from the original template
  • Only stores string values that actually changed after Nunjucks rendering
  • Preserves backward compatibility (old evals fall back to the raw template); see the sketch below
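
For reference, a minimal sketch of the guard in src/assertions/index.ts (matching the diff quoted in the review comments further down; renderedValue and result are locals inside runAssertion):

// Only persist the rendered value when it is a string and actually changed
// after Nunjucks rendering; arrays and unchanged templates are skipped.
if (
  renderedValue !== undefined &&
  renderedValue !== assertion.value &&
  typeof renderedValue === 'string'
) {
  result.metadata = result.metadata || {};
  result.metadata.renderedAssertionValue = renderedValue;
}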

Frontend (src/app/src/pages/eval/components/EvaluationPanel.tsx)

  • Modified getValue() to prefer the rendered value from metadata
  • Maintains priority: context assertions → rendered values → original template
  • Falls back gracefully for older evals without rendered values (a condensed sketch follows this list)
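
A condensed sketch of that priority order, for illustration only: Assertion and GradingResult are the project's existing types, while CONTEXT_ASSERTION_TYPES (an assumed Set<string> of context-style assertion names) and the exact signature are placeholders — the real getValue() in EvaluationPanel.tsx handles more display details.

// Priority: context assertions → rendered value → original template.
function getValue(assertion: Assertion, result: GradingResult): string {
  // Context-style assertions prefer metadata.context when it is available.
  if (CONTEXT_ASSERTION_TYPES.has(assertion.type) && result.metadata?.context !== undefined) {
    const ctx = result.metadata.context;
    return Array.isArray(ctx) ? ctx.join('\n') : ctx;
  }
  // Prefer the rendered value stored by the backend, if present.
  if (result.metadata?.renderedAssertionValue !== undefined) {
    return result.metadata.renderedAssertionValue;
  }
  // Fall back to the raw (possibly templated) assertion value.
  const raw = assertion.value;
  if (raw != null && typeof raw === 'object') {
    return JSON.stringify(raw, null, 2);
  }
  return String(raw ?? '');
}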

Tests (test/assertions/runAssertion.test.ts)

  • Added 5 tests covering (an illustration of the substitution behavior follows this list):
    • Simple variable substitution
    • Complex loops (matching the exact example from the GitHub issue)
    • Static text handling (no unnecessary metadata)
    • JavaScript assertions with variables
    • Edge cases
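
The snippet below is not one of the PR's tests — just a self-contained Jest illustration of the Nunjucks substitution those tests exercise; the template text and variable names are made up to mirror the example that follows.

import { describe, expect, it } from '@jest/globals';
import nunjucks from 'nunjucks';

describe('rendered assertion values (illustrative)', () => {
  it('substitutes variables and loops the way the Evaluation tab now displays them', () => {
    const template =
      'Did {{ agent_name }} select the expected tools?\n' +
      '{% for turn in turns %}Turn {{ loop.index }}: Expected tool "{{ turn }}"\n{% endfor %}';
    const rendered = nunjucks.renderString(template, {
      agent_name: 'Virtual Assistant',
      turns: ['preview_create', 'preview_create', 'execute_create'],
    });

    expect(rendered).toContain('Did Virtual Assistant select the expected tools?');
    expect(rendered).toContain('Turn 3: Expected tool "execute_create"');
  });
});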

Example

Before (raw template):

Did {{ agent_name }} select the expected tools...
{% for turn in turns %}...{% endfor %}

After (rendered value):

Did Virtual Assistant select the expected tools across the conversation?
Turn 1: Expected tool "preview_create"
Turn 2: Expected tool "preview_create"
Turn 3: Expected tool "execute_create"

Testing

  • ✅ All 670 assertion tests passing
  • ✅ 5 new tests for rendered values
  • ✅ End-to-end verified with database storage
  • ✅ Context assertion tests verified (priority order correct)
  • ✅ No regressions

Verification

  • Ran npm run l && npm run f (no issues)
  • All tests pass
  • Backward compatible (no migration needed)

@use-tusk
Contributor

use-tusk bot commented Nov 9, 2025

⏩ No test scenarios generated (1de6826)


Symbols Not Exported

The following symbols were skipped because they are not exported and therefore cannot be tested. Export these symbols if you want them to be tested:

  • src/app/src/pages/eval/components/EvaluationPanel.tsx - (getValue)

Commit  | Status                         | Created (UTC)
74856d9 | ⏩ No test scenarios generated | Nov 9, 2025 5:00PM
f3e8a92 | ⏩ No test scenarios generated | Nov 10, 2025 1:08AM
2bb817a | ⏩ No test scenarios generated | Nov 10, 2025 1:13AM
76f3c1f | ⏩ No test scenarios generated | Nov 10, 2025 7:32AM
9192cf2 | ⏩ No test scenarios generated | Nov 10, 2025 7:38AM
b34da1b | ⏩ No test scenarios generated | Nov 10, 2025 7:52AM
1de6826 | ⏩ No test scenarios generated | Nov 10, 2025 8:01AM


@coderabbitai
Contributor

coderabbitai bot commented Nov 9, 2025

Note

.coderabbit.yaml has unrecognized properties

CodeRabbit is using all valid settings from your configuration. Unrecognized properties (listed below) have been ignored and may indicate typos or deprecated fields that can be removed.

⚠️ Parsing warnings (1)
Validation error: Unrecognized key(s) in object: 'ignore_draft_pr', 'focus_on_security_issues'
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
📝 Walkthrough

This pull request implements a feature to display rendered assertion values with substituted variables in the Evaluation tab of the UI. The implementation captures rendered assertion values during assertion execution and stores them in metadata, then reads and displays these values in the EvaluationPanel component. The logic includes fallback mechanisms to use original assertion values when rendered values are unavailable. Comprehensive test coverage validates variable substitution across various templating scenarios including Nunjucks loops, static templates, and JavaScript-based assertions.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • src/assertions/index.ts: Verify the conditional logic for storing renderedAssertionValue to ensure it only captures when the rendered value is a string and differs from the original assertion value; check that this metadata enrichment doesn't inadvertently affect control flow or assertion results.
  • src/app/src/pages/eval/components/EvaluationPanel.tsx: Validate the fallback logic in getValue that prioritizes metadata.renderedAssertionValue over raw values; ensure the context-related assertion handling correctly reads from metadata.context.
  • test/assertions/runAssertion.test.ts: Review the "Rendered assertion values" test suite to confirm it covers edge cases where rendering produces identical output to the original value (which should not be stored per the implementation logic).

Pre-merge checks

✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main feature: displaying rendered assertion values in the Evaluation tab UI, which aligns with the primary change across backend and frontend modifications.
Description check ✅ Passed The description comprehensively relates to the changeset, detailing backend metadata storage, frontend rendering preference, comprehensive tests, and backward compatibility—all present in the actual code changes.
Linked Issues check ✅ Passed The PR fully addresses issue #5988: stores rendered values in backend metadata, displays them in frontend UI with proper fallback, includes tests validating substitution of variables and loops, and maintains backward compatibility.
Out of Scope Changes check ✅ Passed All changes are in-scope: backend stores rendered values, frontend displays them, changelog documents the feature, and tests validate the implementation—all aligned with the feature request objective.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
src/app/src/pages/eval/components/EvaluationPanel.tsx (1)

30-30: Minor: Comment wording could be more precise.

The comment states "should always use metadata.context," but the code only uses it when available (line 36 checks existence). Consider rephrasing to reflect the conditional nature of this preference.

Apply this diff:

-  // These assertions require special handling and should always use metadata.context
+  // These assertions prefer metadata.context when available for special handling
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd51cb0 and 74856d9.

📒 Files selected for processing (4)
  • CHANGELOG.md (1 hunks)
  • src/app/src/pages/eval/components/EvaluationPanel.tsx (2 hunks)
  • src/assertions/index.ts (1 hunks)
  • test/assertions/runAssertion.test.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (14)
src/app/src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (src/app/CLAUDE.md)

src/app/src/**/*.{ts,tsx}: Never use fetch() directly; always use callApi() from @app/utils/api for all HTTP requests
Access Zustand state outside React components via store.getState(); do not call hooks outside components
Use the @app/* path alias for internal imports as configured in Vite

Files:

  • src/app/src/pages/eval/components/EvaluationPanel.tsx
src/app/src/{components,pages}/**/*.tsx

📄 CodeRabbit inference engine (src/app/CLAUDE.md)

src/app/src/{components,pages}/**/*.tsx: Use the class-based ErrorBoundary component (@app/components/ErrorBoundary) to wrap error-prone UI
Access theme via useTheme() from @mui/material/styles instead of hardcoding theme values
Use useMemo/useCallback only when profiling indicates benefit; avoid unnecessary memoization
Implement explicit loading and error states for components performing async operations
Prefer MUI composition and the sx prop for styling over ad-hoc inline styles

Files:

  • src/app/src/pages/eval/components/EvaluationPanel.tsx
**/*.{tsx,jsx}

📄 CodeRabbit inference engine (.cursor/rules/react-components.mdc)

**/*.{tsx,jsx}: Use icons from @mui/icons-material
Prefer commonly used icons from @mui/icons-material for intuitive experience

Files:

  • src/app/src/pages/eval/components/EvaluationPanel.tsx
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)

Prefer not to introduce new TypeScript types; reuse existing interfaces where possible

**/*.{ts,tsx}: Maintain consistent import order (Biome handles sorting)
Use consistent curly braces for all control statements
Prefer const over let and avoid var
Use object shorthand syntax when possible
Use async/await for asynchronous code
Use consistent error handling with proper type checks

**/*.{ts,tsx}: Use TypeScript with strict type checking enabled
Follow consistent import order (Biome will sort imports)
Use consistent curly braces for all control statements
Prefer const over let; avoid var
Use object property shorthand when possible
Use async/await for asynchronous code instead of raw promises/callbacks
When logging, pass sensitive data via the logger context object so it is auto-sanitized; avoid interpolating secrets into message strings
Manually sanitize sensitive objects with sanitizeObject before storing or emitting outside logging contexts

Files:

  • src/app/src/pages/eval/components/EvaluationPanel.tsx
  • test/assertions/runAssertion.test.ts
  • src/assertions/index.ts
src/**

📄 CodeRabbit inference engine (AGENTS.md)

Place core application/library logic under src/

Files:

  • src/app/src/pages/eval/components/EvaluationPanel.tsx
  • src/assertions/index.ts
src/app/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

In React app code under src/app, use callApi from @app/utils/api for all API requests; do not call fetch() directly

Files:

  • src/app/src/pages/eval/components/EvaluationPanel.tsx
src/app/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

React hooks: use useMemo for computed values (non-callables) and useCallback for stable function references (callables)

Files:

  • src/app/src/pages/eval/components/EvaluationPanel.tsx
CHANGELOG.md

📄 CodeRabbit inference engine (.cursor/rules/changelog.mdc)

CHANGELOG.md: Document all user-facing changes in CHANGELOG.md
Every pull request must add or update an entry in CHANGELOG.md under the [Unreleased] section
Follow Keep a Changelog structure under [Unreleased] with sections: Added, Changed, Fixed, Dependencies, Documentation, Tests, Removed
Each changelog entry must include the PR number formatted as (#1234) or temporary placeholder (#XXXX)
Each changelog entry must use a Conventional Commit prefix: feat:, fix:, chore:, docs:, test:, or refactor:
Each changelog entry must be concise and on a single line
Each changelog entry must be user-focused, describing what changed and why it matters to users
Each changelog entry must include a scope in parentheses, e.g., feat(providers): or fix(evaluator):
Use common scopes for consistency: providers, evaluator, webui or app, cli, redteam, core, assertions, config, database
Place all dependency updates under the Dependencies category
Place all test changes under the Tests category
Use categories consistently: Added for new features, Changed for modifications/refactors/CI, Fixed for bug fixes, Removed for removed features
After a PR number is assigned, replace (#XXXX) placeholders with the actual PR number
Be specific, use active voice, include context, and avoid repeating the PR title in changelog entries
Group related changes with multiple bullets in the same category when needed; use one entry per logical change

CHANGELOG.md: All user-facing changes require a CHANGELOG.md entry before creating a PR
Add entries under [Unreleased] in appropriate category (Added, Changed, Fixed, Dependencies, Documentation, Tests)
Each changelog entry must include PR number (#1234) or placeholder (#XXXX)
Use conventional commit prefixes in changelog entries (feat:, fix:, chore:, docs:, test:, refactor:)

CHANGELOG.md: Document all user-facing changes in CHANGELOG.md
Changelog entries must include the PR number in format (#1234)
Use conventional commit prefixes in changelog entries: feat:,...

Files:

  • CHANGELOG.md
test/**/*.{test,spec}.ts

📄 CodeRabbit inference engine (.cursor/rules/jest.mdc)

test/**/*.{test,spec}.ts: Mock as few functions as possible to keep tests realistic
Never increase the function timeout - fix the test instead
Organize tests in descriptive describe and it blocks
Prefer assertions on entire objects rather than individual keys when writing expectations
Clean up after tests to prevent side effects (e.g., use afterEach(() => { jest.resetAllMocks(); }))
Run tests with --randomize flag to ensure your mocks setup and teardown don't affect other tests
Use Jest's mocking utilities rather than complex custom mocks
Prefer shallow mocking over deep mocking
Mock external dependencies but not the code being tested
Reset mocks between tests to prevent test pollution
For database tests, use in-memory instances or proper test fixtures
Test both success and error cases for each provider
Mock API responses to avoid external dependencies in tests
Validate that provider options are properly passed to the underlying service
Test error handling and edge cases (rate limits, timeouts, etc.)
Ensure provider caching behaves as expected
Always include both --coverage and --randomize flags when running tests
Run tests in a single pass (no watch mode for CI)
Ensure all tests are independent and can run in any order
Clean up any test data or mocks after each test

Files:

  • test/assertions/runAssertion.test.ts
test/**/*.test.ts

📄 CodeRabbit inference engine (test/CLAUDE.md)

test/**/*.test.ts: Never increase Jest test timeouts; fix slow tests instead (avoid jest.setTimeout or large timeouts in tests)
Do not use .only() or .skip() in committed tests
Add afterEach(() => { jest.resetAllMocks(); }) to ensure mock cleanup
Prefer asserting entire objects (toEqual on whole result) rather than individual fields
Mock minimally: only external dependencies (APIs, databases), not code under test
Use Jest (not Vitest) APIs in this suite; avoid importing vitest
Import from @jest/globals in tests

Files:

  • test/assertions/runAssertion.test.ts
test/**

📄 CodeRabbit inference engine (test/CLAUDE.md)

Organize tests to mirror src/ structure (e.g., test/providers → src/providers, test/redteam → src/redteam)

Place tests under test/

Files:

  • test/assertions/runAssertion.test.ts
**/*.{test,spec}.{js,jsx,ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/gh-cli-workflow.mdc)

Avoid disabling or skipping tests (e.g., .skip, .only) unless absolutely necessary and documented

Files:

  • test/assertions/runAssertion.test.ts
test/**/*.{test,spec}.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

test/**/*.{test,spec}.{ts,tsx}: Follow Jest best practices using describe/it blocks
Test both success and error cases for all functionality

Files:

  • test/assertions/runAssertion.test.ts
test/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Follow Jest best practices with describe/it blocks in tests

Files:

  • test/assertions/runAssertion.test.ts
🧠 Learnings (16)
📚 Learning: 2025-10-24T22:41:44.088Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/gh-cli-workflow.mdc:0-0
Timestamp: 2025-10-24T22:41:44.088Z
Learning: Applies to CHANGELOG.md : Add entries under [Unreleased] in appropriate category (Added, Changed, Fixed, Dependencies, Documentation, Tests)

Applied to files:

  • CHANGELOG.md
📚 Learning: 2025-10-24T22:42:38.674Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-10-24T22:42:38.674Z
Learning: Applies to CHANGELOG.md : Add new entries under the 'Unreleased' section

Applied to files:

  • CHANGELOG.md
📚 Learning: 2025-10-24T22:41:09.485Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/changelog.mdc:0-0
Timestamp: 2025-10-24T22:41:09.485Z
Learning: Applies to CHANGELOG.md : Follow Keep a Changelog structure under [Unreleased] with sections: Added, Changed, Fixed, Dependencies, Documentation, Tests, Removed

Applied to files:

  • CHANGELOG.md
📚 Learning: 2025-10-24T22:42:38.674Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-10-24T22:42:38.674Z
Learning: Applies to CHANGELOG.md : Document all user-facing changes in CHANGELOG.md

Applied to files:

  • CHANGELOG.md
📚 Learning: 2025-10-27T08:53:44.103Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-27T08:53:44.103Z
Learning: Applies to CHANGELOG.md : Use standardized scopes: providers, webui, cli, assertions, api, config, deps, docs, tests, examples, redteam, site

Applied to files:

  • CHANGELOG.md
📚 Learning: 2025-10-24T22:42:38.674Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-10-24T22:42:38.674Z
Learning: Applies to CHANGELOG.md : Changelog entries should be user-focused: describe what changed, not how

Applied to files:

  • CHANGELOG.md
📚 Learning: 2025-07-18T17:26:23.087Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/jest.mdc:0-0
Timestamp: 2025-07-18T17:26:23.087Z
Learning: Applies to test/**/*.{test,spec}.ts : Prefer assertions on entire objects rather than individual keys when writing expectations

Applied to files:

  • test/assertions/runAssertion.test.ts
📚 Learning: 2025-10-05T17:00:56.424Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/CLAUDE.md:0-0
Timestamp: 2025-10-05T17:00:56.424Z
Learning: Applies to test/**/*.test.ts : Prefer asserting entire objects (toEqual on whole result) rather than individual fields

Applied to files:

  • test/assertions/runAssertion.test.ts
📚 Learning: 2025-10-24T22:42:38.674Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-10-24T22:42:38.674Z
Learning: Applies to test/**/*.{test,spec}.{ts,tsx} : Test both success and error cases for all functionality

Applied to files:

  • test/assertions/runAssertion.test.ts
📚 Learning: 2025-10-05T16:59:20.507Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/redteam/CLAUDE.md:0-0
Timestamp: 2025-10-05T16:59:20.507Z
Learning: Applies to src/redteam/test/redteam/**/*.ts : Add tests for new plugins under test/redteam/

Applied to files:

  • test/assertions/runAssertion.test.ts
📚 Learning: 2025-07-18T17:26:23.087Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/jest.mdc:0-0
Timestamp: 2025-07-18T17:26:23.087Z
Learning: Applies to test/**/*.{test,spec}.ts : Test error handling and edge cases (rate limits, timeouts, etc.)

Applied to files:

  • test/assertions/runAssertion.test.ts
📚 Learning: 2025-07-18T17:26:23.087Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/jest.mdc:0-0
Timestamp: 2025-07-18T17:26:23.087Z
Learning: Applies to test/**/*.{test,spec}.ts : Ensure provider caching behaves as expected

Applied to files:

  • test/assertions/runAssertion.test.ts
📚 Learning: 2025-07-18T17:26:23.087Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/jest.mdc:0-0
Timestamp: 2025-07-18T17:26:23.087Z
Learning: Applies to test/**/*.{test,spec}.ts : Ensure all tests are independent and can run in any order

Applied to files:

  • test/assertions/runAssertion.test.ts
📚 Learning: 2025-07-18T17:26:23.087Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: .cursor/rules/jest.mdc:0-0
Timestamp: 2025-07-18T17:26:23.087Z
Learning: Applies to test/**/*.{test,spec}.ts : Test both success and error cases for each provider

Applied to files:

  • test/assertions/runAssertion.test.ts
📚 Learning: 2025-10-05T16:58:47.598Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: src/providers/CLAUDE.md:0-0
Timestamp: 2025-10-05T16:58:47.598Z
Learning: Applies to src/providers/test/providers/**/*.test.{ts,tsx,js,jsx} : Provider tests must cover both success and error cases

Applied to files:

  • test/assertions/runAssertion.test.ts
📚 Learning: 2025-10-05T17:00:56.424Z
Learnt from: CR
Repo: promptfoo/promptfoo PR: 0
File: test/CLAUDE.md:0-0
Timestamp: 2025-10-05T17:00:56.424Z
Learning: Applies to test/providers/**/*.test.ts : Provider tests must cover: success case, error cases (4xx, 5xx, rate limits), configuration validation, and token usage tracking

Applied to files:

  • test/assertions/runAssertion.test.ts
🧬 Code graph analysis (1)
test/assertions/runAssertion.test.ts (2)
src/types/index.ts (3)
  • Assertion (560-560)
  • AtomicTestCase (749-749)
  • GradingResult (372-407)
src/assertions/index.ts (1)
  • runAssertion (211-413)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: Tusk Tester
  • GitHub Check: Redteam (Production API)
  • GitHub Check: Test on Node 24.x and windows-latest
  • GitHub Check: Build Docs
  • GitHub Check: Test on Node 24.x and ubuntu-latest
  • GitHub Check: Test on Node 20.x and macOS-latest
  • GitHub Check: Test on Node 22.x and macOS-latest
  • GitHub Check: Share Test
  • GitHub Check: Test on Node 22.x and ubuntu-latest
  • GitHub Check: Test on Node 22.x and windows-latest
  • GitHub Check: Redteam (Staging API)
  • GitHub Check: Test on Node 20.x and ubuntu-latest
  • GitHub Check: Test on Node 20.x and windows-latest
  • GitHub Check: webui tests
  • GitHub Check: Build on Node 24.x
  • GitHub Check: Build on Node 20.x
  • GitHub Check: Build on Node 22.x
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (2)
src/app/src/pages/eval/components/EvaluationPanel.tsx (1)

41-45: LGTM! Rendered value handling is solid.

The implementation correctly prioritizes metadata.renderedAssertionValue and handles edge cases appropriately:

  • The !== undefined check allows other falsy values (0, false, empty string) to be displayed correctly.
  • Object handling via JSON.stringify provides defensive coverage even though the backend stores only strings.
  • String conversion for non-objects is appropriate and consistent with the fallback logic below.
src/assertions/index.ts (1)

390-399: Code snippet in review appears outdated—verify current state of metadata property naming.

I found a critical discrepancy: The review snippet shows result.metadata.renderedAssertionValue being set at line 398, but grep searches across the codebase show the property is actually stored as result.metadata.n. Tests also confirm checking result.metadata?.n.

Confirmed findings:

  • Array values ARE processed with Nunjucks rendering at lines 340-351; the result remains an array
  • The metadata storage logic correctly excludes arrays via the typeof renderedValue === 'string' check
  • Handlers like xml.ts and similar.ts do accept and process array renderedValues
  • The concern about arrays being excluded is valid design (intentional type narrowing)

However, the property name mismatch (renderedAssertionValue vs .n) suggests the review comment may be based on a different code version. The current implementation stores rendered values in metadata.n, not metadata.renderedAssertionValue.

### Added

- feat(webui): display rendered assertion values with substituted variables in Evaluation tab instead of raw templates, improving readability for assertions with Nunjucks variables and loops (#5988)

Contributor


🛠️ Refactor suggestion | 🟠 Major

Add a Tests section for the new coverage introduced in this PR

Tests must be recorded under a “Tests” category in Unreleased.

Apply this diff to add a Tests section:

+ 
+ ### Tests
+ 
+ - test(assertions): add coverage for rendered assertion value metadata (variables, loops, static) and UI fallback priority (#6145)

As per coding guidelines.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
### Tests
- test(assertions): add coverage for rendered assertion value metadata (variables, loops, static) and UI fallback priority (#6145)
🤖 Prompt for AI Agents
In CHANGELOG.md around line 12, the Unreleased section lacks a "Tests"
subsection as required by our guidelines; add a "Tests" heading under Unreleased
and list the new test additions or coverage changes introduced by this PR (brief
bullet or single-line entries), ensuring formatting matches existing changelog
style and placement conventions.

@mldangelo mldangelo requested a review from Copilot November 10, 2025 01:01
Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds functionality to display rendered assertion values with substituted variables in the web UI's Evaluation tab, improving readability by showing the actual values instead of raw Nunjucks templates.

Key Changes

  • Backend stores rendered assertion values in metadata when they differ from templates
  • UI preferentially displays rendered values over raw template strings
  • Comprehensive test coverage for various template scenarios (simple variables, loops, complex templates, JavaScript assertions)

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File | Description
src/assertions/index.ts | Adds logic to store renderedAssertionValue in metadata when templates are rendered and differ from original values
src/app/src/pages/eval/components/EvaluationPanel.tsx | Updates getValue() to prefer renderedAssertionValue from metadata over raw template values, with proper handling for objects and primitives
test/assertions/runAssertion.test.ts | Adds comprehensive test suite covering variable substitution, loops, static text, complex templates, and JavaScript assertions
CHANGELOG.md | Documents the new feature in the Unreleased section


…data fields

Improve type safety by adding explicit TypeScript definitions for commonly-used
metadata fields instead of relying solely on index signature.

- Add renderedAssertionValue: string for rendered assertion values with substituted variables
- Add context: string | string[] for context-related assertions
- Add contextUnits: string[] for context unit tracking
- Preserve index signature [key: string]: any for extensibility

This eliminates the need for @ts-ignore suppressions and provides better
autocomplete and type checking when working with assertion metadata.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
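
A sketch of the resulting metadata shape (field names taken from the commit message above; the standalone interface name is an assumption — in the codebase these fields sit on the grading-result metadata type in src/types/index.ts):

// Assumed shape of assertion-result metadata after this commit.
interface GradingResultMetadata {
  // Rendered assertion value with Nunjucks variables substituted.
  renderedAssertionValue?: string;
  // Context used by context-related assertions.
  context?: string | string[];
  // Context units tracked for context assertions.
  contextUnits?: string[];
  // Index signature preserved for extensibility.
  [key: string]: any;
}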
@mldangelo mldangelo requested a review from Copilot November 10, 2025 01:10
Fix TypeScript compilation error by properly handling metadata.context which
can be string | string[]. The getValue() function now explicitly returns string
type and converts array contexts to newline-separated strings.

Also addresses code review feedback:
- Add Tests section to CHANGELOG.md as required by coding guidelines
- Add test for failure case (rendered values captured even when assertion fails)
- Add test for metadata preservation (existing metadata not overwritten)

Fixes build errors:
- src/app/src/pages/eval/components/EvaluationPanel.tsx(101,48): TS2345
- src/app/src/pages/eval/components/EvaluationPanel.tsx(121,76): TS2345

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Contributor

Copilot AI left a comment


Pull Request Overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.



Comment on lines +392 to +399
if (
renderedValue !== undefined &&
renderedValue !== assertion.value &&
typeof renderedValue === 'string'
) {
result.metadata = result.metadata || {};
result.metadata.renderedAssertionValue = renderedValue;
}

Copilot AI Nov 10, 2025


The condition renderedValue !== assertion.value may not work correctly when assertion.value is an array. Looking at the code flow (lines 340-351), renderedValue can be assigned an array where each element is rendered using nunjucks. This means if assertion.value is an array type, the comparison will always be true (array reference comparison), even if the content is identical after rendering. Consider handling array types explicitly or using deep equality comparison for arrays.
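
One possible way to address the array case this comment describes, sketched for illustration only (not part of the merged change; the helper name is made up):

// Compare rendered and original values structurally so arrays whose contents
// did not change are not treated as "different" by reference comparison.
function hasRenderingChangedValue(rendered: unknown, original: unknown): boolean {
  if (Array.isArray(rendered) && Array.isArray(original)) {
    return JSON.stringify(rendered) !== JSON.stringify(original);
  }
  return rendered !== original;
}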

Comment on lines 43 to 44
const value = result.metadata.renderedAssertionValue;
return typeof value === 'object' ? JSON.stringify(value, null, 2) : String(value);

Copilot AI Nov 10, 2025


According to the type definition in src/types/index.ts line 409, renderedAssertionValue is typed as string, but this code checks if it's an object and stringifies it. This inconsistency suggests either the type definition is incorrect or this check is unnecessary. Based on the implementation in src/assertions/index.ts lines 392-399, only string values are stored in renderedAssertionValue, making the object check unreachable code.

Suggested change
const value = result.metadata.renderedAssertionValue;
return typeof value === 'object' ? JSON.stringify(value, null, 2) : String(value);
return result.metadata.renderedAssertionValue;

mldangelo and others added 4 commits November 10, 2025 02:32
…Value handling

The renderedAssertionValue is always a string type according to the type
definition and backend implementation, so the typeof check for object
was unreachable code. Simplified to directly return the string value.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@mldangelo mldangelo merged commit 9f3a924 into main Nov 10, 2025
40 checks passed
@mldangelo mldangelo deleted the feature/render-assertion-values-in-ui branch November 10, 2025 08:08


Development

Successfully merging this pull request may close these issues.

"Value" column should substitute variable values in "Evaluation" tab of "Details"

2 participants