Skip to content

DO NOT MERGE - test(pydantic-ai): test iterate-pr flow #5640

Closed
dingsdax wants to merge 2 commits intomasterfrom
test-pydantic-warden
Closed

DO NOT MERGE - test(pydantic-ai): test iterate-pr flow #5640
dingsdax wants to merge 2 commits intomasterfrom
test-pydantic-warden

Conversation

@dingsdax
Copy link
Contributor

Extract shared content serialization logic from ai_client.py and invoke_agent.py into dedicated helpers in spans/utils.py, and add support for pydantic-ai's ImageUrl message type.

Previously, both ai_client.py and invoke_agent.py duplicated the same inline logic for serializing BinaryContent items into span data — including the get_modality_from_mime_type call and BLOB_DATA_SUBSTITUTE redaction. This refactor centralizes that into _serialize_binary_content_item.

The new _serialize_image_url_item helper handles ImageUrl content items (introduced in recent pydantic-ai versions). It detects base64-encoded data URLs via a regex in consts.py and redacts their content while preserving regular HTTP URLs verbatim.

The tool_definition parameter was also removed from execute_tool_span — it was only used to set GEN_AI_TOOL_DESCRIPTION, and the pydantic-ai ToolDefinition type is an internal API that added an unnecessary coupling.

…Url support

Move shared binary/image serialization logic from ai_client.py and invoke_agent.py
into a shared _serialize_binary_content_item and _serialize_image_url_item helpers
in spans/utils.py. Add support for pydantic-ai's ImageUrl type, with base64 data
URLs redacted and regular HTTP URLs preserved. Remove unused tool_definition
parameter from execute_tool_span.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 11, 2026

Semver Impact of This PR

None (no version bump detected)

📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).


New Features ✨

  • (crons) Add owner field to MonitorConfig by julwhitney13 in #5610
  • (otlp) Add collector_url option to OTLPIntegration by sl0thentr0py in #5603
  • (pydantic-ai) Add tool description to execute_tool spans by ericapisani in #5596

Bug Fixes 🐛

  • (celery) Propagate user-set headers by sentrivana in #5581
  • (utils) Avoid double serialization of strings in safe_serialize by ericapisani in #5587

Documentation 📚

  • (openai-agents) Remove inapplicable comment by alexander-alderman-webb in #5495
  • Add AGENTS.md by sentrivana in #5579
  • Add set_attribute example to changelog by sentrivana in #5578

Internal Changes 🔧

Openai Agents

  • Do not fail on new tool fields by alexander-alderman-webb in #5625
  • Stop expecting a specific function name by alexander-alderman-webb in #5623
  • Set streaming header when library uses with_streaming_response() by alexander-alderman-webb in #5583
  • Replace mocks with httpx for streamed responses by alexander-alderman-webb in #5580
  • Replace mocks with httpx in non-MCP tool tests by alexander-alderman-webb in #5602
  • Replace mocks with httpx in MCP tool tests by alexander-alderman-webb in #5605
  • Replace mocks with httpx in handoff tests by alexander-alderman-webb in #5604
  • Replace mocks with httpx in API error test by alexander-alderman-webb in #5601
  • Replace mocks with httpx in non-error single-response tests by alexander-alderman-webb in #5600
  • Remove test for unreachable state by alexander-alderman-webb in #5584
  • Expect namespace tool field for new openai versions by alexander-alderman-webb in #5599

Other

  • (httpx) Resolve type checking failures by alexander-alderman-webb in #5626
  • (pyramid) Support alpha suffixes in version parsing by alexander-alderman-webb in #5618
  • Remove CodeQL action by sentrivana in #5616
  • Normalize dots in package names in populate_tox.py by alexander-alderman-webb in #5574
  • Do not run actions on potel-base by sentrivana in #5614

Other

  • DO NOT MERGE - test(pydantic-ai): test iterate-pr flow by dingsdax in #5640

🤖 This preview updates automatically when you update the PR.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 11, 2026

Codecov Results 📊

13 passed | Total: 13 | Pass Rate: 100% | Execution Time: 4.53s

All tests are passing successfully.

❌ Patch coverage is 0.00%. Project has 14039 uncovered lines.

Files with missing lines (6)
File Patch % Lines
ai_client.py 0.00% ⚠️ 147 Missing
tools.py 0.00% ⚠️ 92 Missing
invoke_agent.py 0.00% ⚠️ 81 Missing
utils.py 0.00% ⚠️ 32 Missing
execute_tool.py 0.00% ⚠️ 22 Missing
consts.py 0.00% ⚠️ 3 Missing

Generated by Codecov Action

Comment on lines 14 to 16
def execute_tool_span(
tool_name: str,
tool_args: "Any",
agent: "Any",
tool_type: str = "function",
tool_definition: "Optional[ToolDefinition]" = None,
tool_name: str, tool_args: "Any", agent: "Any", tool_type: str = "function"
) -> "sentry_sdk.tracing.Span":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removal of GEN_AI_TOOL_DESCRIPTION will break existing test

The code removes the logic that sets GEN_AI_TOOL_DESCRIPTION on execute_tool spans (lines 35-40 being deleted). However, the test test_tool_description_in_execute_tool_span in tests/integrations/pydantic_ai/test_pydantic_ai.py (lines 2835-2836) explicitly asserts that this field should be present and contain the tool's docstring. Since the test file is not listed as being modified in this PR, this change will cause test failures.

Verification

Verified by reading the test file at lines 2799-2836 which shows test_tool_description_in_execute_tool_span asserting SPANDATA.GEN_AI_TOOL_DESCRIPTION in tool_span['data']. Confirmed the test file is not in the PR's list of modified files. Checked grep results showing the deleted code was the only place setting this field for pydantic_ai execute_tool spans.

Identified by Warden code-review · JDR-KXH

Comment on lines +29 to +41
data_url_matches = DATA_URL_BASE64_REGEX.match(item.url)

if data_url_matches:
mime_type = data_url_matches[1] or "image"
return {
"type": "image",
"mime_type": mime_type,
"content": BLOB_DATA_SUBSTITUTE,
}

return {
"type": "image",
"content": str(item.url),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regex fails to match valid base64 data URLs with complex MIME types

The DATA_URL_BASE64_REGEX pattern ^data:([a-zA-Z]+/[a-zA-Z]+);base64,... only matches simple MIME types with alphanumeric characters. Valid MIME types like image/svg+xml or application/vnd.api+json (containing + or .) will not match, causing _serialize_image_url_item to fall through to the non-data-URL branch and expose the full base64-encoded image content in span data instead of redacting it.

Verification

Verified by reading consts.py (line 7-9) which defines the regex as ^data:([a-zA-Z]+/[a-zA-Z]+);base64,.... The character class [a-zA-Z]+ in the MIME subtype does not include +, ., or - characters. Compared with parse_data_uri in sentry_sdk/ai/utils.py (lines 43-72) which uses string splitting and handles edge cases gracefully.

Also found at 2 additional locations
  • sentry_sdk/integrations/pydantic_ai/consts.py:7-9
  • sentry_sdk/integrations/pydantic_ai/spans/ai_client.py:165-166

Identified by Warden code-review · NKQ-KBQ

@dingsdax dingsdax marked this pull request as ready for review March 11, 2026 17:25
@dingsdax dingsdax requested a review from a team as a code owner March 11, 2026 17:25
@dingsdax dingsdax changed the title ref(pydantic-ai): Extract content serialization helpers and add ImageUrl support test(pydantic-ai): test iterate-pr flow Mar 11, 2026
@dingsdax dingsdax removed the request for review from a team March 11, 2026 17:25
@dingsdax dingsdax changed the title test(pydantic-ai): test iterate-pr flow DO NOT MERGE - test(pydantic-ai): test iterate-pr flow Mar 11, 2026
The previous commit removed tool_definition from execute_tool_span, which
inadvertently dropped gen_ai.tool.description from spans. Restore description
support by extracting the description string from tool.tool_def in patches/tools.py
and passing it as Optional[str] to execute_tool_span, avoiding the ToolDefinition
type coupling while preserving the behavior.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
For data URLs containing base64-encoded images, the content is redacted.
For regular HTTP URLs, the URL string is preserved.
"""
data_url_matches = DATA_URL_BASE64_REGEX.match(item.url)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The function _serialize_image_url_item will raise a TypeError because it attempts a regex match on a Pydantic Url object without first converting it to a string.
Severity: HIGH

Suggested Fix

Convert item.url to a string before performing the regex match, similar to how it is handled elsewhere in the function. Change DATA_URL_BASE64_REGEX.match(item.url) to DATA_URL_BASE64_REGEX.match(str(item.url)).

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: sentry_sdk/integrations/pydantic_ai/spans/utils.py#L29

Potential issue: The function `_serialize_image_url_item` calls
`DATA_URL_BASE64_REGEX.match(item.url)` without first converting `item.url` to a string.
According to Pydantic V2's documentation, URL fields are stored as
`pydantic_core._pydantic_core.Url` objects, not strings. The `re.match` function expects
a string or bytes-like object, so this operation will raise a `TypeError`. While this
error is caught in the `ai_client.py` integration, it is not handled in
`invoke_agent.py`, which will cause the instrumentation to crash when an `ImageUrl` is
processed.

Did we get this right? 👍 / 👎 to inform future reviews.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is kicking off a free cloud agent to fix these issues. This run is complimentary, but you can enable autofix for all future PRs in the Cursor dashboard.

# Group 1: MIME type (e.g. "image/png"), Group 2: base64 data
DATA_URL_BASE64_REGEX = re.compile(
r"^data:([a-zA-Z]+/[a-zA-Z]+);base64,([A-Za-z0-9+/\-_]+={0,2})$"
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regex fails to match common MIME types, leaking data

High Severity

DATA_URL_BASE64_REGEX uses [a-zA-Z]+/[a-zA-Z]+ for the MIME type, which only matches pure alphabetic characters. Valid MIME types like video/mp4, audio/mp3, image/svg+xml, image/x-icon, and font/woff2 contain digits, hyphens, or plus signs and won't match. When a valid base64 data URL fails to match, _serialize_image_url_item falls through to the non-redacted path and exposes the full base64-encoded content in span data. An existing parse_data_uri utility in sentry_sdk/ai/utils.py already handles data URI parsing robustly per RFC 2397 and is used elsewhere in the codebase.

Additional Locations (1)
Fix in Cursor Fix in Web

from pydantic_ai.messages import BinaryContent, ImageUrl # type: ignore
except ImportError:
BinaryContent = None
ImageUrl = None
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused runtime imports of BinaryContent and ImageUrl

Low Severity

BinaryContent and ImageUrl are imported at runtime in spans/utils.py but never referenced anywhere in the file. The helper functions _serialize_image_url_item and _serialize_binary_content_item both accept item: "Any" and access attributes directly — the isinstance checks happen in the calling files (ai_client.py and invoke_agent.py), which have their own imports. These are dead runtime imports that add unnecessary coupling to pydantic_ai.messages.

Fix in Cursor Fix in Web

@dingsdax dingsdax closed this Mar 11, 2026
@dingsdax dingsdax deleted the test-pydantic-warden branch March 11, 2026 17:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant