feat(pydantic-ai): add model fallback checks#98

Open
constantinius wants to merge 3 commits into main from constantinius/feat/pydantic-ai/model-fallback

@constantinius
Collaborator

Summary
Add support for object-style option values in framework config.json with per-value config overrides
Option values can now be a plain string (existing behavior) or { "value": "...", "overrides": {...} } to override modelOverrides, skip, or toolNameMapping for that specific option combination
Overrides are deep-merged into the framework config during test matrix expansion
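
As a rough illustration of the two accepted shapes, here is a minimal TypeScript sketch of how an option value could be normalized into a single form. The `FrameworkConfig` type is reduced to the three overridable keys, the `skip` shape is a guess, and `normalizeOptionValue` is a hypothetical helper name; none of this is code from the PR.

```typescript
// Assumed, simplified shapes: the real FrameworkConfig likely has more fields.
type FrameworkConfig = {
  modelOverrides?: Record<string, string>;
  skip?: { tests?: string[]; checks?: Record<string, unknown> };
  toolNameMapping?: Record<string, string>;
};

type OptionOverrides = Partial<
  Pick<FrameworkConfig, "modelOverrides" | "skip" | "toolNameMapping">
>;

// An option value is either a plain string or an object carrying overrides.
type OptionValue = string | { value: string; overrides: OptionOverrides };

// Hypothetical helper: turn both shapes into { value, overrides }.
function normalizeOptionValue(option: OptionValue): {
  value: string;
  overrides: OptionOverrides;
} {
  return typeof option === "string"
    ? { value: option, overrides: {} } // plain string: no overrides
    : option; // object form: already in the normalized shape
}

const rawOptions: OptionValue[] = [
  "single",
  {
    value: "fallback",
    overrides: { modelOverrides: { request: "some-other-model" } },
  },
];

const normalized = rawOptions.map(normalizeOptionValue);
```

Normalizing first means the rest of the matrix expansion only has to deal with one shape, which keeps the existing plain-string configs working unchanged.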
Example

{
  "options": {
    "modelSetup": [
      "single",
      {
        "value": "fallback",
        "overrides": {
          "modelOverrides": { "request": "some-other-model" }
        }
      }
    ]
  }
}
The "single" variant uses the framework's default config. The "fallback" variant overrides modelOverrides.request for all checks in that test run.
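
To make the expansion step concrete, the following sketch shows one way option lists could be expanded into combinations while collecting their overrides. It is an assumption-laden illustration: `expandOptionCombinations`, the `Combination` type, and the `"default-model"` value are hypothetical, and `skip` is left out to keep the example short.

```typescript
// Assumed, simplified override shape (skip omitted for brevity).
type Overrides = {
  modelOverrides?: Record<string, string>;
  toolNameMapping?: Record<string, string>;
};
type OptionValue = string | { value: string; overrides: Overrides };
type Combination = {
  resolvedOptions: Record<string, string>;
  overrides: Overrides;
};

// Hypothetical helper: cartesian product over all option keys,
// merging the overrides contributed by each chosen value.
function expandOptionCombinations(
  options: Record<string, OptionValue[]>,
): Combination[] {
  let combos: Combination[] = [{ resolvedOptions: {}, overrides: {} }];
  for (const [key, values] of Object.entries(options)) {
    const next: Combination[] = [];
    for (const combo of combos) {
      for (const raw of values) {
        const value = typeof raw === "string" ? raw : raw.value;
        const overrides: Overrides = typeof raw === "string" ? {} : raw.overrides;
        next.push({
          resolvedOptions: { ...combo.resolvedOptions, [key]: value },
          overrides: {
            modelOverrides: {
              ...combo.overrides.modelOverrides,
              ...overrides.modelOverrides,
            },
            toolNameMapping: {
              ...combo.overrides.toolNameMapping,
              ...overrides.toolNameMapping,
            },
          },
        });
      }
    }
    combos = next;
  }
  return combos;
}

const combos = expandOptionCombinations({
  modelSetup: [
    "single",
    {
      value: "fallback",
      overrides: { modelOverrides: { request: "some-other-model" } },
    },
  ],
});

// "default-model" stands in for whatever the framework config defines.
const frameworkModelOverrides: Record<string, string> = {
  request: "default-model",
};
const effective = combos.map((c) => ({
  ...frameworkModelOverrides,
  ...c.overrides.modelOverrides,
}));
```

With the example config, this yields two combinations: "single" keeps the framework default for `request`, while "fallback" replaces it with `some-other-model`.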

Motivation
Some framework variants need different validation expectations (e.g., a different expected model name, different checks to skip, or different tool name mappings). Previously this required duplicating the entire framework folder. With option overrides, a single config.json can express variant-specific config inline.
@linear-code
linear-code bot commented Mar 12, 2026

@constantinius constantinius requested a review from a team March 12, 2026 13:04

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

      ...framework.modelOverrides,
      ...optionOverrides.modelOverrides,
    }
  : framework.modelOverrides,

Shallow merge of skip overrides loses framework config

Medium Severity

The ...optionOverrides spread at line 668 shallow-merges skip (and toolNameMapping) into the framework config, while modelOverrides is explicitly deep-merged. If an option override provides skip: { checks: {...} }, it completely replaces the framework-level skip, losing any skip.tests entries. The same inconsistency exists in getOptionCombinations where skip from multiple option values across different keys overwrites rather than merges. The type system explicitly allows skip as an override, making this a trap for future usage.
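
The difference between the two merge behaviors can be sketched as follows. This is only an illustration under assumptions: `skip.tests` is taken to be a list of test names and `skip.checks` a map from test name to check identifiers; `mergeSkip`, `union`, and the `"tool-description"` identifier are hypothetical and not taken from the PR.

```typescript
// Assumed shape of the skip config.
type SkipConfig = { tests?: string[]; checks?: Record<string, string[]> };

// De-duplicating concatenation of two optional string lists.
function union(a: string[] = [], b: string[] = []): string[] {
  return Array.from(new Set(a.concat(b)));
}

// Hypothetical deep merge: keeps entries from both sides instead of
// letting the override replace the whole skip object.
function mergeSkip(base?: SkipConfig, override?: SkipConfig): SkipConfig {
  const sources: Record<string, string[]>[] = [
    base?.checks ?? {},
    override?.checks ?? {},
  ];
  const checks: Record<string, string[]> = {};
  for (const source of sources) {
    for (const [name, entries] of Object.entries(source)) {
      checks[name] = union(checks[name], entries);
    }
  }
  return { tests: union(base?.tests, override?.tests), checks };
}

const frameworkSkip: SkipConfig = { tests: ["Vision Agent Test"] };
const optionSkip: SkipConfig = {
  checks: { "Tool Call Agent Test": ["tool-description"] },
};

// Shallow spread: the option-level skip replaces the framework-level one,
// so skip.tests is lost.
const shallow = { ...{ skip: frameworkSkip }, ...{ skip: optionSkip } }.skip;

// Deep merge: both skip.tests and skip.checks survive.
const merged = mergeSkip(frameworkSkip, optionSkip);
```

Whether union semantics are the right choice (as opposed to letting an override clear a framework-level entry) is a design decision for the PR author; the sketch only shows how the framework-level entries could be preserved.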

  overrides: Partial<
    Pick<FrameworkConfig, "modelOverrides" | "skip" | "toolNameMapping">
  >;
};

Duplicate OptionValue type defined in two files

Low Severity

The OptionValue type is defined identically in both src/types.ts and src/runner/framework-config.ts, each referencing their own local FrameworkConfig. Neither definition is imported elsewhere — both are only consumed locally. This duplication increases maintenance burden: if one definition is updated, the other could easily fall out of sync.

@github-actions
github-actions bot commented Mar 12, 2026

🔴 AI SDK Integration Test Results

Status: 1 regression detected

Summary

Metric | main | PR | Change
Total Tests | 530 | 536 | +6
Passed | 229 | 232 | +3 ✅
Failed | 301 | 304 | +3 ⚠️

🔴 Regressions

These tests were passing on main but are now failing:

browser/openai :: Multi-Turn LLM Test (blocking)

Error: Browser test timed out (60s)

✅ Fixed

These tests were failing on main but are now passing:

  • cloudflare/anthropic :: Long Input LLM Test (blocking)
  • cloudflare/openai :: Long Input LLM Test (blocking)

🆕 New Tests

Passing (4):

  • python/pydantic-ai :: Basic Agent Test (async, single)
  • python/pydantic-ai :: Basic Agent Test (async, fallback)
  • python/pydantic-ai :: Conversation ID Agent Test (async, single)
  • python/pydantic-ai :: Conversation ID Agent Test (async, fallback)

Failing (8):

python/pydantic-ai :: Tool Call Agent Test (async, single)

Error: 3 check(s) failed:
Attribute validation failed:
  Span b1e24091: Attribute 'gen_ai.tool.description' must exist but is missing
  Span 846d42e1: Attribute 'gen_ai.tool.description' must exist but is missing
Tool call "add" should have argument "a"
Tool call "add" should have argument "b"
Tool call "multiply" should have argument "a"
Tool call "multiply" should have argument "b"
Tool "add" should have description "Add two numbers together" but has "undefined"
Tool "multiply" should have description "Multiply two numbers together" but has "undefined"

python/pydantic-ai :: Tool Call Agent Test (async, fallback)

Error: 3 check(s) failed:
Attribute validation failed:
  Span ada4875b: Attribute 'gen_ai.tool.description' must exist but is missing
  Span 9844b76d: Attribute 'gen_ai.tool.description' must exist but is missing
Tool call "add" should have argument "a"
Tool call "add" should have argument "b"
Tool call "multiply" should have argument "a"
Tool call "multiply" should have argument "b"
Tool "add" should have description "Add two numbers together" but has "undefined"
Tool "multiply" should have description "Multiply two numbers together" but has "undefined"

python/pydantic-ai :: Tool Error Agent Test (async, single)

Error: 2 check(s) failed:
Attribute validation failed:
  Span a6eed90f: Attribute 'gen_ai.tool.description' must exist but is missing
Tool call "read_file" should have argument "path"

python/pydantic-ai :: Tool Error Agent Test (async, fallback)

Error: 2 check(s) failed:
Attribute validation failed:
  Span 9eb65752: Attribute 'gen_ai.tool.description' must exist but is missing
Tool call "read_file" should have argument "path"

python/pydantic-ai :: Vision Agent Test (async, single)

Error: 1 check(s) failed:
Messages should not contain raw base64 data (should be redacted)
Messages should contain '[Blob substitute]' marker indicating binary content was redacted

python/pydantic-ai :: Vision Agent Test (async, fallback)

Error: 1 check(s) failed:
Messages should not contain raw base64 data (should be redacted)
Messages should contain '[Blob substitute]' marker indicating binary content was redacted

python/pydantic-ai :: Long Input Agent Test (async, single)

Error: 1 check(s) failed:
Message should be trimmed (length 25667 > 20000)
Message should be trimmed (length 25667 > 20000)

python/pydantic-ai :: Long Input Agent Test (async, fallback)

Error: 1 check(s) failed:
Message should be trimmed (length 25667 > 20000)
Message should be trimmed (length 25667 > 20000)

🗑️ Removed Tests

These tests existed on main but are not in the PR:

  • python/pydantic-ai :: Basic Agent Test (async)
  • python/pydantic-ai :: Tool Call Agent Test (async)
  • python/pydantic-ai :: Tool Error Agent Test (async)
  • python/pydantic-ai :: Vision Agent Test (async)
  • python/pydantic-ai :: Long Input Agent Test (async)
  • python/pydantic-ai :: Conversation ID Agent Test (async)

Test Matrix

Agent Tests

SDK | Basic Agent Test | Conversation ID Agent Test | Long Input Agent Test | Tool Call Agent Test | Tool Error Agent Test | Vision Agent Test
browser/langgraph | blk, combined; blk, compiled; blk, custom-state; blk, graph; blk, langchain; str, combined; str, compiled; str, custom-state; str, graph; str, langchain | blk, combined; blk, compiled; blk, custom-state; blk, graph; blk, langchain; str, combined; str, compiled; str, custom-state; str, graph; str, langchain | blk, combined; blk, compiled; blk, custom-state; blk, graph; blk, langchain; str, combined; str, compiled; str, custom-state; str, graph; str, langchain | blk, combined; blk, compiled; blk, custom-state; blk, graph; blk, langchain; str, combined; str, compiled; str, custom-state; str, graph; str, langchain | blk, combined; blk, compiled; blk, custom-state; blk, graph; blk, langchain; str, combined; str, compiled; str, custom-state; str, graph; str, langchain | blk, combined; blk, compiled; blk, custom-state; blk, graph; blk, langchain; str, combined; str, compiled; str, custom-state; str, graph; str, langchain
cloudflare/langgraph
cloudflare/vercel
nextjs/mastra
nextjs/vercel | blk; str | blk; str | blk; str | blk; str | blk; str | blk; str
node/langgraph
node/manual
node/mastra
node/vercel
php/laravel | blk; str | blk; str | blk; str | blk; str | blk; str | blk; str
python/langgraph | a; s | a; s | a; s | a; s | a; s | a; s
python/manual | a; s | a; s | a; s | a; s | a; s | a; s
python/openai-agents
python/pydantic-ai | 🗑️ a; ✅🆕 a, fallback; ✅🆕 a, single | 🗑️ a; ✅🆕 a, fallback; ✅🆕 a, single | 🗑️ a; ❌🆕 a, fallback; ❌🆕 a, single | 🗑️ a; ❌🆕 a, fallback; ❌🆕 a, single | 🗑️ a; ❌🆕 a, fallback; ❌🆕 a, single | 🗑️ a; ❌🆕 a, fallback; ❌🆕 a, single

Embedding Tests

SDK | Basic Embeddings Test
browser/google-genai
browser/langchain
browser/openai
cloudflare/google-genai
cloudflare/langchain
cloudflare/openai
cloudflare/vercel
nextjs/google-genai
nextjs/langchain
nextjs/openai
nextjs/vercel
node/google-genai
node/langchain
node/openai
node/vercel
php/laravel
python/google-genai | a, blk; s, blk
python/langchain | a, blk; s, blk
python/litellm | a, blk; s, blk
python/manual | a, blk; s, blk
python/openai | a, blk; s, blk

LLM Tests

SDK | Basic Error LLM Test | Basic LLM Test | Conversation ID LLM Test | Long Input LLM Test | Multi-Turn LLM Test | Vision LLM Test
browser/anthropic | blk; str | blk; str | blk; str | blk; str | blk; str | blk; str
browser/google-genai | blk; str | blk; str | blk; str | blk; str | blk; str | blk; str
browser/langchain | blk; str | blk; str | blk; str | blk; str | blk; str | blk; str
browser/openai | blk; str | blk; str | blk; str | blk; str | ❌📉 blk; str | blk; str
cloudflare/anthropic | blk; str | blk; str | blk; str | ✅🔧 blk; str | blk; str | blk; str
cloudflare/google-genai | blk; str | blk; str | blk; str | blk; str | blk; str | blk; str
cloudflare/langchain | blk; str | blk; str | blk; str | blk; str | blk; str | blk; str
cloudflare/openai | blk; str | blk; str | blk; str | ✅🔧 blk; str | blk; str | blk; str
nextjs/anthropic | blk; str | blk; str | blk; str | blk; str | blk; str | blk; str
nextjs/google-genai | blk; str | blk; str | blk; str | blk; str | blk; str | blk; str
nextjs/langchain | blk; str | blk; str | blk; str | blk; str | blk; str | blk; str
nextjs/openai | blk; str | blk; str | blk; str | blk; str | blk; str | blk; str
node/anthropic | blk; str | blk; str | blk; str | blk; str | blk; str | blk; str
node/google-genai | blk; str | blk; str | blk; str | blk; str | blk; str | blk; str
node/langchain | blk; str | blk; str | blk; str | blk; str | blk; str | blk; str
node/manual
node/openai | blk; str | blk; str | blk; str | blk; str | blk; str | blk; str
python/anthropic | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str
python/cohere | s, blk, v1; s, blk, v2; s, str, v1; s, str, v2 | s, blk, v1; s, blk, v2; s, str, v1; s, str, v2 | s, blk, v1; s, blk, v2; s, str, v1; s, str, v2 | s, blk, v1; s, blk, v2; s, str, v1; s, str, v2
python/google-genai | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str
python/langchain | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str
python/litellm | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str
python/manual | a, blk; s, blk | a, blk; s, blk | a, blk; s, blk | a, blk; s, blk | a, blk; s, blk
python/openai | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str | a, blk; a, str; s, blk; s, str

Legend: ✅ Pass | ❌ Fail | ✅🔧 Fixed | ❌📉 Regressed | ✅🆕 New (pass) | ❌🆕 New (fail) | 🗑️ Removed | str=streaming blk=blocking a=async s=sync hi=highlevel lo=lowlevel


Generated by AI SDK Integration Tests

Comment on lines 669 to 679
  executionMode: execMode,
  streamingMode: streamMode,
  resolvedOptions,
  // Deep-merge modelOverrides from framework config and option overrides
  modelOverrides: optionOverrides.modelOverrides
    ? {
        ...framework.modelOverrides,
        ...optionOverrides.modelOverrides,
      }
    : framework.modelOverrides,
},

Bug: The check to skip tests occurs before option-level skip overrides are processed. Consequently, tests that should be skipped for specific option combinations will still be executed.
Severity: MEDIUM

Suggested Fix

Move the skip check inside the optionCombinations loop. After creating the combined framework object by merging framework and optionOverrides, perform the skip check on this new object before pushing the test run to the matrix. This will ensure that option-specific skip rules are correctly applied.
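
The suggested ordering could look roughly like the sketch below, where the skip decision is made per option combination rather than once per framework. All names here (`buildMatrix`, `Framework`, `Combination`, `MatrixEntry`) are hypothetical stand-ins for the real code in `src/orchestrator.ts`, and `skip.tests` is assumed to be a list of test names.

```typescript
// Assumed, simplified shapes for illustration only.
type SkipConfig = { tests?: string[] };
type Framework = { name: string; skip?: SkipConfig };
type Combination = {
  resolvedOptions: Record<string, string>;
  overrides: { skip?: SkipConfig };
};
type MatrixEntry = {
  framework: string;
  test: string;
  resolvedOptions: Record<string, string>;
};

function buildMatrix(
  framework: Framework,
  tests: string[],
  combos: Combination[],
): MatrixEntry[] {
  const matrix: MatrixEntry[] = [];
  for (const test of tests) {
    for (const combo of combos) {
      // Combine framework-level and option-level skip lists first,
      // then decide whether this specific combination should run.
      const skipped = (framework.skip?.tests ?? []).concat(
        combo.overrides.skip?.tests ?? [],
      );
      if (skipped.indexOf(test) !== -1) continue;
      matrix.push({
        framework: framework.name,
        test,
        resolvedOptions: combo.resolvedOptions,
      });
    }
  }
  return matrix;
}

const matrix = buildMatrix(
  { name: "python/pydantic-ai", skip: { tests: ["Vision Agent Test"] } },
  ["Basic Agent Test", "Long Input Agent Test", "Vision Agent Test"],
  [
    { resolvedOptions: { modelSetup: "single" }, overrides: {} },
    {
      resolvedOptions: { modelSetup: "fallback" },
      overrides: { skip: { tests: ["Long Input Agent Test"] } },
    },
  ],
);
```

In this example the framework-level skip removes the Vision test for every combination, while the option-level skip removes the Long Input test only for "fallback", leaving three matrix entries.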

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/orchestrator.ts#L665-L679

Potential issue: The logic to determine if a test should be skipped is located at
`orchestrator.ts:631`, which checks the framework-level `skip.tests` array. This check
happens before the code iterates through `optionCombinations` starting at line 656.
Inside this loop, `optionOverrides` (which can contain a `skip` configuration) are
merged into the framework object for the matrix entry. However, because the decision to
include the test has already been made, any `skip` rules defined in `optionOverrides`
are effectively ignored. This means tests that are supposed to be skipped for specific
option combinations will still be added to the execution matrix and run.
