Python: Fix GeminiChatClient dropping image/file content#6751
Conversation
GeminiChatClient._convert_message_contents only handled text and function_call content, so data/uri (image, PDF, audio) parts were silently dropped and never reached Gemini. Convert data URIs to inline_data Parts and external URIs to file_data Parts, warning on genuinely unconvertible content. Adds tests for the multimodal conversion paths. Fixes microsoft#6688 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Fixes GeminiChatClient multimodal input handling so Content.from_data(...) and Content.from_uri(...) (data URIs and external URIs) are converted into Gemini Part objects instead of being silently dropped, with new unit tests validating the behavior.
Changes:
- Added
data/urihandling inGeminiChatClient._convert_message_contentsvia a new_convert_data_or_uri_contenthelper. - Improved observability by warning (instead of debug-only) when multimodal content can’t be converted.
- Added tests covering inline data (
from_data, data-URIfrom_uri), external URIs, mixed text+image, and the non-base64 data-URI skip path.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| python/packages/gemini/agent_framework_gemini/_chat_client.py | Converts data/uri content into Gemini inline_data / file_data parts and warns on unconvertible inputs. |
| python/packages/gemini/tests/test_gemini_client.py | Adds regression/unit tests for multimodal conversion paths and warning behavior. |
There was a problem hiding this comment.
Automated Code Review
Reviewers: 5 | Confidence: 85%
✓ Correctness
This PR correctly fixes the Gemini multimodal content handling. The implementation properly handles data URIs (decoded to inline_data) and external URIs (file_data references). Edge cases (missing URI, non-base64 data URIs, decode failures) are handled with warnings. The only minor concern is that external URIs with no media_type pass None to types.Part.from_uri, which the author explicitly acknowledges as a design decision in the PR description.
✓ Security Reliability
The implementation correctly handles multimodal content conversion and has good defensive coding for the data-URI path (validating presence, base64 encoding, mime type, and catching decode failures). However, the external-URI path on line 724 can raise an unhandled ValueError from the Gemini SDK when content.media_type is None and the URL lacks a recognizable file extension, since Part.from_uri internally calls mimetypes.guess_type() and raises ValueError on failure. The data-URI path properly catches all exceptions and returns None with a warning, but the external-URI path lacks equivalent protection.
✓ Test Coverage
The new tests cover the primary happy paths well (from_data, data-URI from_uri, external from_uri, multimodal combo, and the non-base64 skip path). Assertions are meaningful — they verify specific attributes rather than just checking for no-exception. However, there are a few untested branches in the new _convert_data_or_uri_content helper: (1) external URI with media_type=None (a realistic scenario since the Content model allows it, passing None to the Gemini SDK), (2) the missing-URI warning path (line 704-706), and (3) the invalid-base64-decode warning path (line 718-720). The first gap is the most significant because the PR author explicitly asks reviewers to consider whether external URIs should require a media type.
✓ Failure Modes
The PR correctly fixes silent dropping of multimodal content. One operational failure mode exists: when an external URI has no media_type and the SDK cannot guess one from the URL,
types.Part.from_uriraises an unhandledValueError(line 724), crashing the request instead of gracefully skipping like all sibling error paths in the same method.
✓ Design Approach
The multimodal conversion approach is mostly aligned with the framework contract, but there is one non-blocking design gap: the new Gemini data-URI decoder uses permissive base64 decoding, some malformed
data:payloads will still be accepted as empty/corupted bytes instead of taking the new warning-and-skip path.
Suggestions
- Line 724: When
content.media_typeis None and the URI lacks a guessable file extension (e.g. presigned URLs, API endpoints),types.Part.from_uriraisesValueError. Consider wrapping in try/except or checking media_type upfront to match the defensive pattern used throughout the rest of the method.
Automated review by giles17's agents
Strip parameters (e.g. charset) from a data URI media type before passing it to Gemini, and wrap types.Part.from_uri so a URI with no media_type and no guessable extension is passed through as file_data without a mime type instead of raising ValueError. Adds tests for both paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Motivation & Context
When a multimodal message is sent to an agent backed by
GeminiChatClient—a
MessagecontainingContent.from_data(...)(image/PDF/audio) orContent.from_uri(...)— the file content was silently discarded before therequest reached Gemini. The model only ever saw the text parts, with no error or
warning, so it looked like Gemini was "ignoring" the attachment. This is the
documented, provider-agnostic way to send images, so multimodal scenarios were
effectively broken for the Gemini provider.
The root cause is
GeminiChatClient._convert_message_contents, which onlymatched
textandfunction_call; every other content type (includingdataand
uri) fell through to the catch-all and was dropped with alogger.debugline.
Description & Review Guide
What are the major changes?
case "data" | "uri"branch to_convert_message_contentsand a new_convert_data_or_uri_contenthelper inpackages/gemini/agent_framework_gemini/_chat_client.py. Data URIs(
type="data") are converted to Geminiinline_dataParts viatypes.Part.from_bytes(...)(decoding the base64 payload and deriving themime type from
media_typeor the URI header); external URIs (type="uri")become
file_dataParts viatypes.Part.from_uri(...).undecodable bytes) now warns instead of being silently dropped.
from_data, data-URIfrom_uri, external-URIfrom_uri, combined text+image, and the non-base64 skip path.What is the impact of these changes?
dropped. Behavior for text and function-call content is unchanged. No public
API changes.
What do you want reviewers to focus on?
should require a media type rather than letting the SDK infer it.
Related Issue
Fixes #6688
Contribution Checklist
breaking changelabel (or add "[BREAKING]" to the title prefix, before or after any language prefix) — a workflow keeps the label and title prefix in sync automatically.