Skip to content

Python: Fix GeminiChatClient dropping image/file content#6751

Open
giles17 wants to merge 2 commits into
microsoft:mainfrom
giles17:fix/gemini-multimodal-content
Open

Python: Fix GeminiChatClient dropping image/file content#6751
giles17 wants to merge 2 commits into
microsoft:mainfrom
giles17:fix/gemini-multimodal-content

Conversation

@giles17

@giles17 giles17 commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Motivation & Context

When a multimodal message is sent to an agent backed by GeminiChatClient
a Message containing Content.from_data(...) (image/PDF/audio) or
Content.from_uri(...) — the file content was silently discarded before the
request reached Gemini. The model only ever saw the text parts, with no error or
warning, so it looked like Gemini was "ignoring" the attachment. This is the
documented, provider-agnostic way to send images, so multimodal scenarios were
effectively broken for the Gemini provider.

The root cause is GeminiChatClient._convert_message_contents, which only
matched text and function_call; every other content type (including data
and uri) fell through to the catch-all and was dropped with a logger.debug
line.

Description & Review Guide

  • What are the major changes?

    • Added a case "data" | "uri" branch to _convert_message_contents and a new
      _convert_data_or_uri_content helper in
      packages/gemini/agent_framework_gemini/_chat_client.py. Data URIs
      (type="data") are converted to Gemini inline_data Parts via
      types.Part.from_bytes(...) (decoding the base64 payload and deriving the
      mime type from media_type or the URI header); external URIs (type="uri")
      become file_data Parts via types.Part.from_uri(...).
    • Genuinely unconvertible content (missing URI, non-base64 data URI, or
      undecodable bytes) now warns instead of being silently dropped.
    • Added tests covering from_data, data-URI from_uri, external-URI
      from_uri, combined text+image, and the non-base64 skip path.
  • What is the impact of these changes?

    • Multimodal input (images, PDFs, audio) now reaches Gemini instead of being
      dropped. Behavior for text and function-call content is unchanged. No public
      API changes.
  • What do you want reviewers to focus on?

    • The mime-type resolution for data URIs and whether external-URI handling
      should require a media type rather than letting the SDK infer it.

Related Issue

Fixes #6688

Contribution Checklist

  • The code builds clean without any errors or warnings
  • All unit tests pass, and I have added new tests where possible
  • The PR follows the Contribution Guidelines
  • This PR is linked to an issue and there is no other open PR for this issue (see Related Issue above).
  • This is not a breaking change. If it is a breaking change, add the breaking change label (or add "[BREAKING]" to the title prefix, before or after any language prefix) — a workflow keeps the label and title prefix in sync automatically.

GeminiChatClient._convert_message_contents only handled text and function_call content, so data/uri (image, PDF, audio) parts were silently dropped and never reached Gemini. Convert data URIs to inline_data Parts and external URIs to file_data Parts, warning on genuinely unconvertible content. Adds tests for the multimodal conversion paths.

Fixes microsoft#6688

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 25, 2026 21:07
@moonbox3 moonbox3 added the python Usage: [Issues, PRs], Target: Python label Jun 25, 2026
@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Python Test Coverage

Python Test Coverage Report •
FileStmtsMissCoverMissing
packages/gemini/agent_framework_gemini
   _chat_client.py4101097%399, 705–706, 715–716, 719–721, 869, 880
TOTAL42586509388% 

Python Unit Test Overview

Tests Skipped Failures Errors Time
8332 37 💤 0 ❌ 0 🔥 2m 7s ⏱️

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes GeminiChatClient multimodal input handling so Content.from_data(...) and Content.from_uri(...) (data URIs and external URIs) are converted into Gemini Part objects instead of being silently dropped, with new unit tests validating the behavior.

Changes:

  • Added data/uri handling in GeminiChatClient._convert_message_contents via a new _convert_data_or_uri_content helper.
  • Improved observability by warning (instead of debug-only) when multimodal content can’t be converted.
  • Added tests covering inline data (from_data, data-URI from_uri), external URIs, mixed text+image, and the non-base64 data-URI skip path.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
python/packages/gemini/agent_framework_gemini/_chat_client.py Converts data/uri content into Gemini inline_data / file_data parts and warns on unconvertible inputs.
python/packages/gemini/tests/test_gemini_client.py Adds regression/unit tests for multimodal conversion paths and warning behavior.

Comment thread python/packages/gemini/agent_framework_gemini/_chat_client.py

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 5 | Confidence: 85%

✓ Correctness

This PR correctly fixes the Gemini multimodal content handling. The implementation properly handles data URIs (decoded to inline_data) and external URIs (file_data references). Edge cases (missing URI, non-base64 data URIs, decode failures) are handled with warnings. The only minor concern is that external URIs with no media_type pass None to types.Part.from_uri, which the author explicitly acknowledges as a design decision in the PR description.

✓ Security Reliability

The implementation correctly handles multimodal content conversion and has good defensive coding for the data-URI path (validating presence, base64 encoding, mime type, and catching decode failures). However, the external-URI path on line 724 can raise an unhandled ValueError from the Gemini SDK when content.media_type is None and the URL lacks a recognizable file extension, since Part.from_uri internally calls mimetypes.guess_type() and raises ValueError on failure. The data-URI path properly catches all exceptions and returns None with a warning, but the external-URI path lacks equivalent protection.

✓ Test Coverage

The new tests cover the primary happy paths well (from_data, data-URI from_uri, external from_uri, multimodal combo, and the non-base64 skip path). Assertions are meaningful — they verify specific attributes rather than just checking for no-exception. However, there are a few untested branches in the new _convert_data_or_uri_content helper: (1) external URI with media_type=None (a realistic scenario since the Content model allows it, passing None to the Gemini SDK), (2) the missing-URI warning path (line 704-706), and (3) the invalid-base64-decode warning path (line 718-720). The first gap is the most significant because the PR author explicitly asks reviewers to consider whether external URIs should require a media type.

✓ Failure Modes

The PR correctly fixes silent dropping of multimodal content. One operational failure mode exists: when an external URI has no media_type and the SDK cannot guess one from the URL, types.Part.from_uri raises an unhandled ValueError (line 724), crashing the request instead of gracefully skipping like all sibling error paths in the same method.

✓ Design Approach

The multimodal conversion approach is mostly aligned with the framework contract, but there is one non-blocking design gap: the new Gemini data-URI decoder uses permissive base64 decoding, some malformed data: payloads will still be accepted as empty/corupted bytes instead of taking the new warning-and-skip path.

Suggestions

  • Line 724: When content.media_type is None and the URI lacks a guessable file extension (e.g. presigned URLs, API endpoints), types.Part.from_uri raises ValueError. Consider wrapping in try/except or checking media_type upfront to match the defensive pattern used throughout the rest of the method.

Automated review by giles17's agents

Strip parameters (e.g. charset) from a data URI media type before passing it to Gemini, and wrap types.Part.from_uri so a URI with no media_type and no guessable extension is passed through as file_data without a mime type instead of raising ValueError. Adds tests for both paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@giles17 giles17 marked this pull request as ready for review June 25, 2026 21:22

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 5 | Confidence: 85% | Result: All clear

Reviewed: Correctness, Security Reliability, Test Coverage, Failure Modes, Design Approach


Automated review by giles17's agents

@giles17 giles17 enabled auto-merge June 25, 2026 21:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

python Usage: [Issues, PRs], Target: Python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python: [Bug]: GeminiChatClient silently drops image/file (data/uri) content — multimodal input never reaches the model

3 participants