@Cozmopolit
Contributor

Motivation and Context

Fixes #13430

Currently, when a Semantic Kernel function returns ImageContent, the result is serialized to JSON, which loses the binary image data and prevents multimodal-capable models from processing the image.

This PR enables ImageContent preservation in tool/function results, allowing connectors with multimodal capabilities (Gemini 3+, Anthropic) to pass images natively to the model. This is essential for agentic workflows where tools generate or process images that the model needs to analyze.

Description

FunctionCallsProcessor (shared infrastructure)

  • Changed ProcessFunctionResult() return type from string to object
  • Added early return for ImageContent to preserve it for multimodal-capable connectors
  • Added ImageContentNotSupportedErrorMessage constant for consistent error messaging
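
As a rough illustration of the change described above, the early return could look something like the sketch below. This is not the code from the PR: the `ImageContent` record is a stand-in declared only so the snippet compiles on its own, the class name `FunctionResultSketch` is hypothetical, and the fallback serialization via `System.Text.Json` is an assumption about how non-image results are handled.

```csharp
using System;
using System.Text.Json;

// Stand-in for Semantic Kernel's ImageContent, declared here only so the
// sketch is self-contained; the real type has more members.
public sealed record ImageContent(ReadOnlyMemory<byte>? Data, string? MimeType);

public static class FunctionResultSketch
{
    // Returns object rather than string so an image can pass through untouched.
    public static object ProcessFunctionResult(object? functionResult)
    {
        // Early return: preserve the image so a multimodal-capable
        // connector can convert it to its native format later.
        if (functionResult is ImageContent image)
        {
            return image;
        }

        // Strings pass through as-is.
        if (functionResult is string text)
        {
            return text;
        }

        // Assumption: every other result type is serialized to JSON.
        return JsonSerializer.Serialize(functionResult);
    }
}
```

Because the return type is `object`, each connector has to decide what to do when it receives an `ImageContent` instead of a string, which is what the connector-specific changes below address.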

Google Gemini Connector (native support)

  • Extended FunctionResponsePart with Parts property for nested multimodal content
  • Added FunctionResponsePartContent class with InlineData support
  • Implemented CreateImageFunctionResponsePart() to convert ImageContent to Gemini's native inlineData format
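
The conversion to inlineData might be sketched as follows. This is an approximation under stated assumptions, not the connector's implementation: `ImageContent` is again a stand-in type, `GeminiImageSketch` is a hypothetical name, the JSON is built with `System.Text.Json.Nodes` rather than the connector's own `FunctionResponsePart` classes, and only the `name` and `parts` fields of `functionResponse` are shown. The two guard clauses mirror the behavior the new Gemini tests describe (missing data or missing MIME type throws InvalidOperationException); the exception messages are invented for the sketch.

```csharp
using System;
using System.Text.Json.Nodes;

// Stand-in for Semantic Kernel's ImageContent (assumption: only the
// binary data and MIME type matter for this sketch).
public sealed record ImageContent(ReadOnlyMemory<byte>? Data, string? MimeType);

public static class GeminiImageSketch
{
    public static JsonObject CreateImageFunctionResponsePart(string functionName, ImageContent image)
    {
        // URI-based images carry no bytes, so there is nothing to inline.
        if (image.Data is null || image.Data.Value.IsEmpty)
        {
            throw new InvalidOperationException("ImageContent must contain binary data.");
        }

        if (string.IsNullOrEmpty(image.MimeType))
        {
            throw new InvalidOperationException("ImageContent must specify a MIME type.");
        }

        // Inline data is sent as base64 text alongside its MIME type.
        string base64 = Convert.ToBase64String(image.Data.Value.Span);

        return new JsonObject
        {
            ["functionResponse"] = new JsonObject
            {
                ["name"] = functionName,
                ["parts"] = new JsonArray(
                    new JsonObject
                    {
                        ["inlineData"] = new JsonObject
                        {
                            ["mimeType"] = image.MimeType,
                            ["data"] = base64,
                        },
                    }),
            },
        };
    }
}
```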

OpenAI Connector (error handling)

  • Added ImageContent check with clear error message (API does not support images in tool results)

OpenAI Agents (error handling)

  • Added GetFunctionResultAsString() helper with ImageContent error handling

Amazon Bedrock Agents (error handling)

  • Added GetFunctionResultAsString() helper with ImageContent error handling
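
For the connectors that cannot accept images in tool results, the GetFunctionResultAsString() helper might look roughly like the sketch below. It assumes the helper returns the error message as the tool result rather than throwing, which is what the commit summary's wording ("return clear error message") suggests but does not confirm. The message text, the `ImageContent` stand-in, and the class name `AgentResultSketch` are all hypothetical.

```csharp
using System;
using System.Text.Json;

// Stand-in for Semantic Kernel's ImageContent, declared for self-containment.
public sealed record ImageContent(ReadOnlyMemory<byte>? Data, string? MimeType);

public static class AgentResultSketch
{
    // Hypothetical wording; the PR does not quote the constant's actual text.
    public const string ImageContentNotSupportedErrorMessage =
        "Error: this connector does not support ImageContent in function results.";

    public static string GetFunctionResultAsString(object? result) => result switch
    {
        // The API cannot take an image here, so report that to the model
        // instead of sending a JSON dump of the image object.
        ImageContent => ImageContentNotSupportedErrorMessage,

        // Strings are already in the form the API expects.
        string text => text,

        // Assumption: anything else is serialized to JSON.
        _ => JsonSerializer.Serialize(result),
    };
}
```

Returning the message as the function result means the model sees an explanation of the failure and can respond to it, instead of receiving serialized image metadata it cannot use.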

New Unit Tests

| Test | File |
|---|---|
| ItShouldPreserveImageContentWithoutSerialization | FunctionCallsProcessorTests.cs |
| FromChatHistoryImageContentInToolResultCreatesInlineDataPart | GeminiRequestTests.cs |
| FromChatHistoryImageContentWithoutDataThrowsInvalidOperationException | GeminiRequestTests.cs |
| FromChatHistoryImageContentWithoutMimeTypeThrowsInvalidOperationException | GeminiRequestTests.cs |
| VerifyAssistantMessageAdapterGetMessageWithImageContentInFunctionResult | AssistantMessageFactoryTests.cs |

Notes

  • No breaking changes: All changes are to internal APIs. The public FunctionResultContent.Result property is already object?.
  • Gemini version detection: The connector does not check the model version. If an older Gemini model does not support functionResponse.parts, the API will return an appropriate error.
  • URI-based ImageContent: Only ImageContent with binary data is supported. URI-based ImageContent will throw InvalidOperationException.
  • MistralAI: Out of scope - has its own ProcessFunctionResult implementation.
  • Related: Prepares infrastructure for PR #13419, ".Net: feat: Add Anthropic Connector for Claude models" (Anthropic Connector multimodal support).

Contribution Checklist

sk-pr-MultimodalToolResults.md

Enable ImageContent preservation in function results for multimodal-capable
connectors (Gemini 3+). Non-supporting connectors return a clear error message.

Changes:
- FunctionCallsProcessor: Return object instead of string, preserve ImageContent
- Gemini: Native support via FunctionResponse.Parts with inlineData
- OpenAI/Bedrock Agents: Error handling with ImageContentNotSupportedErrorMessage

Includes 5 new unit tests for ImageContent handling.

Fixes microsoft#13430

@Cozmopolit requested a review from a team as a code owner December 22, 2025 21:08
@moonbox3 added the .NET and kernel labels Dec 22, 2025
@github-actions bot changed the title from "feat(connectors): Support ImageContent in tool/function results" to ".Net: feat(connectors): Support ImageContent in tool/function results" Dec 22, 2025
Development

Successfully merging this pull request may close these issues.

.Net: Bug: ImageContent in tool/function results is serialized to JSON instead of native format