feat(pydantic-ai): add model fallback checks#98
Summary
Add support for object-style option values in framework config.json with per-value config overrides
Option values can now be a plain string (existing behavior) or { "value": "...", "overrides": {...} } to override modelOverrides, skip, or toolNameMapping for that specific option combination
Overrides are deep-merged into the framework config during test matrix expansion
Example
{
  "options": {
    "modelSetup": [
      "single",
      {
        "value": "fallback",
        "overrides": {
          "modelOverrides": { "request": "some-other-model" }
        }
      }
    ]
  }
}
The "single" variant uses the framework's default config. The "fallback" variant overrides modelOverrides.request for all checks in that test run.
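To illustrate how the two option-value forms might be normalized and expanded into test runs, here is a minimal TypeScript sketch. The type shapes and the `normalizeOption` and `expandOptions` helpers are hypothetical and are not taken from this PR; only `modelOverrides` is modeled.

```typescript
// Hypothetical, simplified shapes; the real FrameworkConfig likely has more fields.
type Overrides = { modelOverrides?: Record<string, string> };
type OptionValue = string | { value: string; overrides: Overrides };
type Combination = {
  resolvedOptions: Record<string, string>;
  overrides: Overrides;
};

// Turn either form of option value into { value, overrides }.
function normalizeOption(option: OptionValue): { value: string; overrides: Overrides } {
  return typeof option === "string" ? { value: option, overrides: {} } : option;
}

// Build the cartesian product over all option keys, accumulating overrides.
function expandOptions(options: Record<string, OptionValue[]>): Combination[] {
  let combos: Combination[] = [{ resolvedOptions: {}, overrides: {} }];
  for (const key of Object.keys(options)) {
    const next: Combination[] = [];
    for (const combo of combos) {
      for (const raw of options[key] ?? []) {
        const { value, overrides } = normalizeOption(raw);
        next.push({
          resolvedOptions: { ...combo.resolvedOptions, [key]: value },
          overrides: {
            // Later option keys win on conflicting modelOverrides entries.
            modelOverrides: {
              ...combo.overrides.modelOverrides,
              ...overrides.modelOverrides,
            },
          },
        });
      }
    }
    combos = next;
  }
  return combos;
}

const exampleCombos = expandOptions({
  modelSetup: [
    "single",
    { value: "fallback", overrides: { modelOverrides: { request: "some-other-model" } } },
  ],
});
console.log(JSON.stringify(exampleCombos, null, 2));
```

With the example config above, this yields two combinations: one for "single" with no overrides, and one for "fallback" carrying the `request` override.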
Motivation
Some framework variants need different validation expectations (e.g., a different expected model name, different checks to skip, or different tool name mappings). Previously this required duplicating the entire framework folder. With option overrides, a single config.json can express variant-specific config inline.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
        ...framework.modelOverrides,
        ...optionOverrides.modelOverrides,
      }
    : framework.modelOverrides,
Shallow merge of skip overrides loses framework config
Medium Severity
The ...optionOverrides spread at line 668 shallow-merges skip (and toolNameMapping) into the framework config, while modelOverrides is explicitly deep-merged. If an option override provides skip: { checks: {...} }, it completely replaces the framework-level skip, losing any skip.tests entries. The same inconsistency exists in getOptionCombinations where skip from multiple option values across different keys overwrites rather than merges. The type system explicitly allows skip as an override, making this a trap for future usage.
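The difference between the shallow spread and a recursive merge can be sketched as follows. This is an illustration only: `deepMerge` and `isPlainObject` are hypothetical helpers, and the test and check names are made up; the actual shapes of `skip.tests` and `skip.checks` are not shown in this PR.

```typescript
type Plain = { [key: string]: unknown };

function isPlainObject(v: unknown): v is Plain {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Recursively merge `override` into `base`. Nested plain objects are merged;
// arrays and primitives from `override` replace the value in `base`.
function deepMerge(base: Plain, override: Plain): Plain {
  const out: Plain = { ...base };
  for (const key of Object.keys(override)) {
    const existing = out[key];
    const value = override[key];
    out[key] =
      isPlainObject(existing) && isPlainObject(value)
        ? deepMerge(existing, value)
        : value;
  }
  return out;
}

// Made-up example values.
const frameworkConfig = { skip: { tests: ["vision-agent"] } };
const optionOverride = { skip: { checks: { "tool-call": ["model-name"] } } };

// Shallow spread: the option-level `skip` replaces the framework-level one,
// so `skip.tests` is lost.
const shallowMerged = { ...frameworkConfig, ...optionOverride };

// Recursive merge: both `skip.tests` and `skip.checks` survive.
const deepMerged = deepMerge(frameworkConfig, optionOverride);

console.log(JSON.stringify(shallowMerged));
console.log(JSON.stringify(deepMerged));
```

Whether arrays such as `skip.tests` should be replaced (as here) or concatenated is a separate design choice that the review comment does not settle.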
  overrides: Partial<
    Pick<FrameworkConfig, "modelOverrides" | "skip" | "toolNameMapping">
  >;
};
Duplicate OptionValue type defined in two files
Low Severity
The OptionValue type is defined identically in both src/types.ts and src/runner/framework-config.ts, each referencing their own local FrameworkConfig. Neither definition is imported elsewhere — both are only consumed locally. This duplication increases maintenance burden: if one definition is updated, the other could easily fall out of sync.
🔴 AI SDK Integration Test Results
Status: 1 regression detected
Summary
🔴 Regressions
These tests were passing on main but are now failing:
browser/openai :: Multi-Turn LLM Test (blocking)
Error: Browser test timed out (60s)
✅ Fixed
These tests were failing on main but are now passing:
🆕 New Tests
Passing (4):
Failing (8):
❌ python/pydantic-ai :: Tool Call Agent Test (async, single)
Error: 3 check(s) failed:
❌ python/pydantic-ai :: Tool Call Agent Test (async, fallback)
Error: 3 check(s) failed:
❌ python/pydantic-ai :: Tool Error Agent Test (async, single)
Error: 2 check(s) failed:
❌ python/pydantic-ai :: Tool Error Agent Test (async, fallback)
Error: 2 check(s) failed:
❌ python/pydantic-ai :: Vision Agent Test (async, single)
Error: 1 check(s) failed:
❌ python/pydantic-ai :: Vision Agent Test (async, fallback)
Error: 1 check(s) failed:
❌ python/pydantic-ai :: Long Input Agent Test (async, single)
Error: 1 check(s) failed:
❌ python/pydantic-ai :: Long Input Agent Test (async, fallback)
Error: 1 check(s) failed:
🗑️ Removed Tests
These tests existed on main but are not in the PR:
Test Matrix
Agent Tests
Embedding Tests
LLM Tests
Legend: ✅ Pass | ❌ Fail | ✅🔧 Fixed | ❌📉 Regressed | ✅🆕 New (pass) | ❌🆕 New (fail) | 🗑️ Removed
Abbreviations: str=streaming, blk=blocking, a=async, s=sync, hi=highlevel, lo=lowlevel
Generated by AI SDK Integration Tests
  executionMode: execMode,
  streamingMode: streamMode,
  resolvedOptions,
  // Deep-merge modelOverrides from framework config and option overrides
  modelOverrides: optionOverrides.modelOverrides
    ? {
        ...framework.modelOverrides,
        ...optionOverrides.modelOverrides,
      }
    : framework.modelOverrides,
},
Bug: The check to skip tests occurs before option-level skip overrides are processed. Consequently, tests that should be skipped for specific option combinations will still be executed.
Severity: MEDIUM
Suggested Fix
Move the skip check inside the optionCombinations loop. After creating the combined framework object by merging framework and optionOverrides, perform the skip check on this new object before pushing the test run to the matrix. This will ensure that option-specific skip rules are correctly applied.
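The suggested reordering could look roughly like the sketch below. It is not the code in `src/orchestrator.ts`: the `Framework`, `Combination`, and `MatrixEntry` shapes and the `buildMatrix` function are simplified, hypothetical stand-ins, and `skip.tests` is assumed to be an array of test names.

```typescript
// Hypothetical, simplified shapes for illustration only.
type Skip = { tests?: string[] };
type Framework = { name: string; skip?: Skip };
type Combination = {
  resolvedOptions: Record<string, string>;
  overrides: { skip?: Skip };
};
type MatrixEntry = {
  framework: Framework;
  test: string;
  resolvedOptions: Record<string, string>;
};

function buildMatrix(
  framework: Framework,
  tests: string[],
  combos: Combination[],
): MatrixEntry[] {
  const matrix: MatrixEntry[] = [];
  for (const test of tests) {
    for (const combo of combos) {
      // Merge first, so option-level skip rules are visible to the check below.
      const combined: Framework = {
        ...framework,
        skip: {
          tests: [
            ...(framework.skip?.tests ?? []),
            ...(combo.overrides.skip?.tests ?? []),
          ],
        },
      };
      // Skip check runs per combination, on the merged config.
      const skipped = (combined.skip?.tests ?? []).some((t) => t === test);
      if (skipped) continue;
      matrix.push({ framework: combined, test, resolvedOptions: combo.resolvedOptions });
    }
  }
  return matrix;
}
```

With the check inside the loop, a test can be skipped for one option combination (for example "fallback") while still running for another ("single").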
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/orchestrator.ts#L665-L679
Potential issue: The logic to determine if a test should be skipped is located at
`orchestrator.ts:631`, which checks the framework-level `skip.tests` array. This check
happens before the code iterates through `optionCombinations` starting at line 656.
Inside this loop, `optionOverrides` (which can contain a `skip` configuration) are
merged into the framework object for the matrix entry. However, because the decision to
include the test has already been made, any `skip` rules defined in `optionOverrides`
are effectively ignored. This means tests that are supposed to be skipped for specific
option combinations will still be added to the execution matrix and run.


Closes https://linear.app/getsentry/issue/TET-2038/come-up-with-a-way-to-test-anthropic-alloy