feat(pydantic-ai): add model fallback checks by constantinius · Pull Request #98 · getsentry/testing-ai-sdk-integrations

constantinius · 2026-03-12T13:04:39Z

Closes https://linear.app/getsentry/issue/TET-2038/come-up-with-a-way-to-test-anthropic-alloy

Summary

Add support for object-style option values in framework config.json with per-value config overrides.
Option values can now be a plain string (existing behavior) or { "value": "...", "overrides": {...} } to override modelOverrides, skip, or toolNameMapping for that specific option combination.
Overrides are deep-merged into the framework config during test matrix expansion.

Example

    {
      "options": {
        "modelSetup": [
          "single",
          {
            "value": "fallback",
            "overrides": { "modelOverrides": { "request": "some-other-model" } }
          }
        ]
      }
    }

The "single" variant uses the framework's default config. The "fallback" variant overrides modelOverrides.request for all checks in that test run.

Motivation

Some framework variants need different validation expectations (e.g., a different expected model name, different checks to skip, or different tool name mappings). Previously this required duplicating the entire framework folder. With option overrides, a single config.json can express variant-specific config inline.
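The expansion described above could look roughly like the following TypeScript. This is a minimal sketch, not the PR's implementation: `OptionOverrides`, `normalizeOptionValue`, and `expandOptions` are hypothetical names, and the field shapes of `modelOverrides`, `skip`, and `toolNameMapping` are assumed for illustration.

```typescript
// Hypothetical shapes; the real FrameworkConfig in the repository may differ.
type OptionOverrides = {
  modelOverrides?: Record<string, string>;
  skip?: { tests?: string[]; checks?: Record<string, string[]> };
  toolNameMapping?: Record<string, string>;
};

// An option value is either a plain string or an object carrying overrides.
type OptionValue = string | { value: string; overrides: OptionOverrides };

type ResolvedOption = { value: string; overrides: OptionOverrides };

// Normalize both forms to one shape so later code has a single path.
function normalizeOptionValue(v: OptionValue): ResolvedOption {
  return typeof v === "string" ? { value: v, overrides: {} } : v;
}

type Combination = {
  resolvedOptions: Record<string, string>;
  overrides: OptionOverrides;
};

// Cartesian product over all option keys, collecting each chosen value's overrides.
function expandOptions(options: Record<string, OptionValue[]>): Combination[] {
  let combos: Combination[] = [{ resolvedOptions: {}, overrides: {} }];
  for (const [key, values] of Object.entries(options)) {
    const next: Combination[] = [];
    for (const combo of combos) {
      for (const raw of values) {
        const { value, overrides } = normalizeOptionValue(raw);
        next.push({
          resolvedOptions: { ...combo.resolvedOptions, [key]: value },
          overrides: {
            // skip and toolNameMapping are simply replaced in this sketch.
            ...combo.overrides,
            ...overrides,
            // modelOverrides is merged one level deep so earlier keys survive.
            modelOverrides: {
              ...combo.overrides.modelOverrides,
              ...overrides.modelOverrides,
            },
          },
        });
      }
    }
    combos = next;
  }
  return combos;
}

// The example config from the description yields two combinations.
const combos = expandOptions({
  modelSetup: [
    "single",
    { value: "fallback", overrides: { modelOverrides: { request: "some-other-model" } } },
  ],
});
```

Normalizing every value to the object form first keeps the string case (existing behavior) and the object case on the same code path during expansion.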

linear-code · 2026-03-12T13:04:43Z

TET-2038 Come up with a way to test anthropic alloy

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.


cursor · 2026-03-12T13:09:06Z

src/orchestrator.ts

+                        ...framework.modelOverrides,
+                        ...optionOverrides.modelOverrides,
+                      }
+                    : framework.modelOverrides,


Shallow merge of skip overrides loses framework config

Medium Severity

The ...optionOverrides spread at line 668 shallow-merges skip (and toolNameMapping) into the framework config, while modelOverrides is explicitly deep-merged. If an option override provides skip: { checks: {...} }, it completely replaces the framework-level skip, losing any skip.tests entries. The same inconsistency exists in getOptionCombinations where skip from multiple option values across different keys overwrites rather than merges. The type system explicitly allows skip as an override, making this a trap for future usage.

Additional Locations (1)

src/orchestrator.ts#L889-L901
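One way to avoid the loss described in this comment is to merge `skip` field by field instead of spreading it. The sketch below assumes `skip` has only the `tests` and `checks` fields mentioned in the review, and that `checks` maps a name to a list of check identifiers; `mergeSkip` and the `"tool-description"` identifier are hypothetical and do not appear in the PR.

```typescript
// Assumed shape: the review mentions `skip.tests` and `skip.checks`, nothing more.
type SkipConfig = {
  tests?: string[];
  checks?: Record<string, string[]>;
};

// Combine two skip configs without dropping entries from either side.
function mergeSkip(base?: SkipConfig, override?: SkipConfig): SkipConfig | undefined {
  if (!base) return override;
  if (!override) return base;

  // Union of skipped tests, de-duplicated.
  const tests = Array.from(new Set([...(base.tests ?? []), ...(override.tests ?? [])]));

  // Per-key union of skipped checks.
  const checks: Record<string, string[]> = {};
  for (const source of [base.checks ?? {}, override.checks ?? {}]) {
    for (const [name, entries] of Object.entries(source)) {
      checks[name] = Array.from(new Set([...(checks[name] ?? []), ...entries]));
    }
  }
  return { tests, checks };
}

// Framework-level skip.tests survives an option override that only sets skip.checks.
const merged = mergeSkip(
  { tests: ["Vision Agent Test"] },
  { checks: { "Tool Call Agent Test": ["tool-description"] } },
);
```

The same helper could be applied when folding overrides from several option keys together, so that a later key adds to the skip list rather than replacing it.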

cursor · 2026-03-12T13:09:06Z

src/types.ts

+      overrides: Partial<
+        Pick<FrameworkConfig, "modelOverrides" | "skip" | "toolNameMapping">
+      >;
+    };


Duplicate OptionValue type defined in two files

Low Severity

The OptionValue type is defined identically in both src/types.ts and src/runner/framework-config.ts, each referencing their own local FrameworkConfig. Neither definition is imported elsewhere — both are only consumed locally. This duplication increases maintenance burden: if one definition is updated, the other could easily fall out of sync.

Additional Locations (1)

src/runner/framework-config.ts#L4-L12

github-actions · 2026-03-12T13:22:30Z

🔴 AI SDK Integration Test Results

Status: 1 regression detected

Summary

Metric	main	PR	Change
Total Tests	530	536	+6
Passed	229	232	+3 ✅
Failed	301	304	+3 ⚠️

🔴 Regressions

These tests were passing on main but are now failing:

browser/openai :: Multi-Turn LLM Test (blocking)

Error: Browser test timed out (60s)

✅ Fixed

These tests were failing on main but are now passing:

cloudflare/anthropic :: Long Input LLM Test (blocking)
cloudflare/openai :: Long Input LLM Test (blocking)

🆕 New Tests

Passing (4):

✅ python/pydantic-ai :: Basic Agent Test (async, single)
✅ python/pydantic-ai :: Basic Agent Test (async, fallback)
✅ python/pydantic-ai :: Conversation ID Agent Test (async, single)
✅ python/pydantic-ai :: Conversation ID Agent Test (async, fallback)

Failing (8):

❌ python/pydantic-ai :: Tool Call Agent Test (async, single)

Error: 3 check(s) failed:
Attribute validation failed:
  Span b1e24091: Attribute 'gen_ai.tool.description' must exist but is missing
  Span 846d42e1: Attribute 'gen_ai.tool.description' must exist but is missing
Tool call "add" should have argument "a"
Tool call "add" should have argument "b"
Tool call "multiply" should have argument "a"
Tool call "multiply" should have argument "b"
Tool "add" should have description "Add two numbers together" but has "undefined"
Tool "multiply" should have description "Multiply two numbers together" but has "undefined"

❌ python/pydantic-ai :: Tool Call Agent Test (async, fallback)

Error: 3 check(s) failed:
Attribute validation failed:
  Span ada4875b: Attribute 'gen_ai.tool.description' must exist but is missing
  Span 9844b76d: Attribute 'gen_ai.tool.description' must exist but is missing
Tool call "add" should have argument "a"
Tool call "add" should have argument "b"
Tool call "multiply" should have argument "a"
Tool call "multiply" should have argument "b"
Tool "add" should have description "Add two numbers together" but has "undefined"
Tool "multiply" should have description "Multiply two numbers together" but has "undefined"

❌ python/pydantic-ai :: Tool Error Agent Test (async, single)

Error: 2 check(s) failed:
Attribute validation failed:
  Span a6eed90f: Attribute 'gen_ai.tool.description' must exist but is missing
Tool call "read_file" should have argument "path"

❌ python/pydantic-ai :: Tool Error Agent Test (async, fallback)

Error: 2 check(s) failed:
Attribute validation failed:
  Span 9eb65752: Attribute 'gen_ai.tool.description' must exist but is missing
Tool call "read_file" should have argument "path"

❌ python/pydantic-ai :: Vision Agent Test (async, single)

Error: 1 check(s) failed:
Messages should not contain raw base64 data (should be redacted)
Messages should contain '[Blob substitute]' marker indicating binary content was redacted

❌ python/pydantic-ai :: Vision Agent Test (async, fallback)

Error: 1 check(s) failed:
Messages should not contain raw base64 data (should be redacted)
Messages should contain '[Blob substitute]' marker indicating binary content was redacted

❌ python/pydantic-ai :: Long Input Agent Test (async, single)

Error: 1 check(s) failed:
Message should be trimmed (length 25667 > 20000)
Message should be trimmed (length 25667 > 20000)

❌ python/pydantic-ai :: Long Input Agent Test (async, fallback)

Error: 1 check(s) failed:
Message should be trimmed (length 25667 > 20000)
Message should be trimmed (length 25667 > 20000)

🗑️ Removed Tests

These tests existed on main but are not in the PR:

python/pydantic-ai :: Basic Agent Test (async)
python/pydantic-ai :: Tool Call Agent Test (async)
python/pydantic-ai :: Tool Error Agent Test (async)
python/pydantic-ai :: Vision Agent Test (async)
python/pydantic-ai :: Long Input Agent Test (async)
python/pydantic-ai :: Conversation ID Agent Test (async)

Test Matrix

Agent Tests

SDK	Basic Agent Test	Conversation ID Agent Test	Long Input Agent Test	Tool Call Agent Test	Tool Error Agent Test	Vision Agent Test
browser/langgraph	❌_{blk, combined} ❌_{blk, compiled} ❌_{blk, custom-state} ❌_{blk, graph} ❌_{blk, langchain} ❌_{str, combined} ❌_{str, compiled} ❌_{str, custom-state} ❌_{str, graph} ❌_{str, langchain}	❌_{blk, combined} ❌_{blk, compiled} ❌_{blk, custom-state} ❌_{blk, graph} ❌_{blk, langchain} ❌_{str, combined} ❌_{str, compiled} ❌_{str, custom-state} ❌_{str, graph} ❌_{str, langchain}	❌_{blk, combined} ❌_{blk, compiled} ❌_{blk, custom-state} ❌_{blk, graph} ❌_{blk, langchain} ❌_{str, combined} ❌_{str, compiled} ❌_{str, custom-state} ❌_{str, graph} ❌_{str, langchain}	❌_{blk, combined} ❌_{blk, compiled} ❌_{blk, custom-state} ❌_{blk, graph} ❌_{blk, langchain} ❌_{str, combined} ❌_{str, compiled} ❌_{str, custom-state} ❌_{str, graph} ❌_{str, langchain}	❌_{blk, combined} ❌_{blk, compiled} ❌_{blk, custom-state} ❌_{blk, graph} ❌_{blk, langchain} ❌_{str, combined} ❌_{str, compiled} ❌_{str, custom-state} ❌_{str, graph} ❌_{str, langchain}	❌_{blk, combined} ❌_{blk, compiled} ❌_{blk, custom-state} ❌_{blk, graph} ❌_{blk, langchain} ❌_{str, combined} ❌_{str, compiled} ❌_{str, custom-state} ❌_{str, graph} ❌_{str, langchain}
cloudflare/langgraph	❌	❌	❌	❌	❌	❌
cloudflare/vercel	❌	❌	❌	❌	❌	❌
nextjs/mastra	✅	❌	—	✅	✅	✅
nextjs/vercel	✅_blk ✅_str	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str
node/langgraph	❌	❌	❌	❌	❌	❌
node/manual	✅	❌	✅	✅	✅	✅
node/mastra	❌	❌	❌	❌	❌	❌
node/vercel	✅	❌	❌	❌	❌	❌
php/laravel	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str
python/langgraph	❌_a ❌_s	❌_a ❌_s	❌_a ❌_s	❌_a ❌_s	❌_a ❌_s	❌_a ❌_s
python/manual	✅_a ✅_s	✅_a ✅_s	✅_a ✅_s	✅_a ✅_s	✅_a ✅_s	✅_a ✅_s
python/openai-agents	✅	✅	❌	✅	✅	❌
python/pydantic-ai	🗑️_a ✅🆕_{a, fallback} ✅🆕_{a, single}	🗑️_a ✅🆕_{a, fallback} ✅🆕_{a, single}	🗑️_a ❌🆕_{a, fallback} ❌🆕_{a, single}	🗑️_a ❌🆕_{a, fallback} ❌🆕_{a, single}	🗑️_a ❌🆕_{a, fallback} ❌🆕_{a, single}	🗑️_a ❌🆕_{a, fallback} ❌🆕_{a, single}

Embedding Tests

SDK	Basic Embeddings Test
browser/google-genai	❌
browser/langchain	❌
browser/openai	✅
cloudflare/google-genai	❌
cloudflare/langchain	❌
cloudflare/openai	✅
cloudflare/vercel	❌
nextjs/google-genai	❌
nextjs/langchain	❌
nextjs/openai	✅
nextjs/vercel	❌
node/google-genai	❌
node/langchain	❌
node/openai	✅
node/vercel	❌
php/laravel	❌
python/google-genai	❌_{a, blk} ❌_{s, blk}
python/langchain	❌_{a, blk} ❌_{s, blk}
python/litellm	❌_{a, blk} ✅_{s, blk}
python/manual	✅_{a, blk} ✅_{s, blk}
python/openai	✅_{a, blk} ✅_{s, blk}

LLM Tests

SDK	Basic Error LLM Test	Basic LLM Test	Conversation ID LLM Test	Long Input LLM Test	Multi-Turn LLM Test	Vision LLM Test
browser/anthropic	✅_blk ❌_str	✅_blk ✅_str	❌_blk ❌_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str
browser/google-genai	✅_blk ✅_str	✅_blk ✅_str	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str	✅_blk ✅_str
browser/langchain	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str
browser/openai	✅_blk ✅_str	✅_blk ❌_str	❌_blk ❌_str	✅_blk ❌_str	❌📉_blk ❌_str	✅_blk ❌_str
cloudflare/anthropic	✅_blk ❌_str	✅_blk ✅_str	❌_blk ❌_str	✅🔧_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str
cloudflare/google-genai	✅_blk ✅_str	✅_blk ✅_str	❌_blk ❌_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str
cloudflare/langchain	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str
cloudflare/openai	✅_blk ✅_str	✅_blk ✅_str	❌_blk ❌_str	✅🔧_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str
nextjs/anthropic	✅_blk ✅_str	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str
nextjs/google-genai	✅_blk ✅_str	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str
nextjs/langchain	✅_blk ✅_str	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str	❌_blk ❌_str
nextjs/openai	✅_blk ✅_str	✅_blk ❌_str	✅_blk ❌_str	✅_blk ❌_str	✅_blk ❌_str	✅_blk ❌_str
node/anthropic	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str
node/google-genai	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str
node/langchain	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str
node/manual	—	✅	✅	✅	✅	✅
node/openai	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str	✅_blk ✅_str
python/anthropic	✅_{a, blk} ❌_{a, str} ✅_{s, blk} ❌_{s, str}	✅_{a, blk} ❌_{a, str} ✅_{s, blk} ❌_{s, str}	✅_{a, blk} ❌_{a, str} ✅_{s, blk} ❌_{s, str}	✅_{a, blk} ❌_{a, str} ✅_{s, blk} ❌_{s, str}	✅_{a, blk} ❌_{a, str} ✅_{s, blk} ❌_{s, str}	✅_{a, blk} ❌_{a, str} ✅_{s, blk} ❌_{s, str}
python/cohere	❌_{s, blk, v1} ❌_{s, blk, v2} ❌_{s, str, v1} ❌_{s, str, v2}	❌_{s, blk, v1} ❌_{s, blk, v2} ❌_{s, str, v1} ❌_{s, str, v2}	❌_{s, blk, v1} ❌_{s, blk, v2} ❌_{s, str, v1} ❌_{s, str, v2}	—	❌_{s, blk, v1} ❌_{s, blk, v2} ❌_{s, str, v1} ❌_{s, str, v2}	—
python/google-genai	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}	❌_{a, blk} ❌_{a, str} ❌_{s, blk} ❌_{s, str}	❌_{a, blk} ❌_{a, str} ❌_{s, blk} ❌_{s, str}
python/langchain	❌_{a, blk} ❌_{a, str} ❌_{s, blk} ❌_{s, str}	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}	✅_{a, blk} ✅_{a, str} ✅_{s, blk} ✅_{s, str}
python/litellm	❌_{a, blk} ❌_{a, str} ❌_{s, blk} ❌_{s, str}	❌_{a, blk} ❌_{a, str} ✅_{s, blk} ✅_{s, str}	❌_{a, blk} ❌_{a, str} ✅_{s, blk} ✅_{s, str}	❌_{a, blk} ❌_{a, str} ✅_{s, blk} ✅_{s, str}	❌_{a, blk} ❌_{a, str} ✅_{s, blk} ✅_{s, str}	❌_{a, blk} ❌_{a, str} ✅_{s, blk} ✅_{s, str}
python/manual	—	✅_{a, blk} ✅_{s, blk}	✅_{a, blk} ✅_{s, blk}	✅_{a, blk} ✅_{s, blk}	✅_{a, blk} ✅_{s, blk}	✅_{a, blk} ✅_{s, blk}
python/openai	❌_{a, blk} ❌_{a, str} ❌_{s, blk} ❌_{s, str}	✅_{a, blk} ❌_{a, str} ✅_{s, blk} ❌_{s, str}	✅_{a, blk} ❌_{a, str} ✅_{s, blk} ❌_{s, str}	✅_{a, blk} ❌_{a, str} ✅_{s, blk} ❌_{s, str}	✅_{a, blk} ❌_{a, str} ✅_{s, blk} ❌_{s, str}	❌_{a, blk} ❌_{a, str} ❌_{s, blk} ❌_{s, str}

Generated by AI SDK Integration Tests

sentry · 2026-03-12T13:51:31Z

src/orchestrator.ts

                  executionMode: execMode,
                  streamingMode: streamMode,
                  resolvedOptions,
+                  // Deep-merge modelOverrides from framework config and option overrides
+                  modelOverrides: optionOverrides.modelOverrides
+                    ? {
+                        ...framework.modelOverrides,
+                        ...optionOverrides.modelOverrides,
+                      }
+                    : framework.modelOverrides,
                },


Bug: The check to skip tests occurs before option-level skip overrides are processed. Consequently, tests that should be skipped for specific option combinations will still be executed.
Severity: MEDIUM

Suggested Fix

Move the skip check inside the optionCombinations loop. After creating the combined framework object by merging framework and optionOverrides, perform the skip check on this new object before pushing the test run to the matrix. This will ensure that option-specific skip rules are correctly applied.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid.

Location: src/orchestrator.ts#L665-L679

Potential issue: The logic to determine if a test should be skipped is located at `orchestrator.ts:631`, which checks the framework-level `skip.tests` array. This check happens before the code iterates through `optionCombinations` starting at line 656. Inside this loop, `optionOverrides` (which can contain a `skip` configuration) are merged into the framework object for the matrix entry. However, because the decision to include the test has already been made, any `skip` rules defined in `optionOverrides` are effectively ignored. This means tests that are supposed to be skipped for specific option combinations will still be added to the execution matrix and run.
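The suggested fix can be sketched as follows: build the combined framework object per option combination, then decide whether to skip. This is an illustration under assumptions, not the code in src/orchestrator.ts; `buildMatrix`, `Framework`, `OptionCombination`, and `MatrixEntry` are hypothetical names, and only `skip.tests` is modeled.

```typescript
// Hypothetical, reduced types; the real orchestrator carries many more fields.
type SkipConfig = { tests?: string[] };
type Framework = { name: string; skip?: SkipConfig };
type OptionCombination = {
  resolvedOptions: Record<string, string>;
  overrides: { skip?: SkipConfig };
};
type MatrixEntry = {
  framework: Framework;
  test: string;
  resolvedOptions: Record<string, string>;
};

function buildMatrix(
  framework: Framework,
  tests: string[],
  optionCombinations: OptionCombination[],
): MatrixEntry[] {
  const matrix: MatrixEntry[] = [];
  for (const test of tests) {
    for (const combo of optionCombinations) {
      // Merge framework-level and option-level skip lists first...
      const combined: Framework = {
        ...framework,
        skip: {
          tests: [
            ...(framework.skip?.tests ?? []),
            ...(combo.overrides.skip?.tests ?? []),
          ],
        },
      };
      // ...then run the skip check against the merged object.
      if (combined.skip?.tests?.includes(test)) continue;
      matrix.push({ framework: combined, test, resolvedOptions: combo.resolvedOptions });
    }
  }
  return matrix;
}

// "A" is skipped for the whole framework, "B" only for the fallback variant.
const matrix = buildMatrix(
  { name: "python/pydantic-ai", skip: { tests: ["A"] } },
  ["A", "B", "C"],
  [
    { resolvedOptions: { modelSetup: "single" }, overrides: {} },
    { resolvedOptions: { modelSetup: "fallback" }, overrides: { skip: { tests: ["B"] } } },
  ],
);
```

With the check inside the inner loop, an option-level `skip` removes only the affected combination while the other variants of the same test still run.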


constantinius added 2 commits March 12, 2026 14:00

feat(pydantic-ai): add model fallback checks

eceb7c3

constantinius requested a review from a team March 12, 2026 13:04

cursor bot reviewed Mar 12, 2026


fix: template and config

08649cc

sentry bot reviewed Mar 12, 2026

constantinius wants to merge 3 commits into main from constantinius/feat/pydantic-ai/model-fallback
