Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Nov 7, 2025

Address remaining feedback from PR #7025:

  • Add reference to Microsoft.Extensions.AI.Abstractions package
  • Replace manual CreateDataUri method with DataContent class
  • Use MemoryStream.GetBuffer() with Length instead of ToArray() to avoid array allocations
  • Update using statements
  • Verify all tests still pass
  • Code review completed
  • All tests verified
  • Address PR feedback: remove redundant reference and optimize memory usage
  • Optimize MemoryStream allocation with pre-sizing
  • Use Math.Min for safe int casting

Latest Changes (addressing @stephentoub feedback):

  • Use Math.Min(int.MaxValue, length) for safe casting to int when pre-sizing MemoryStream
  • Prevents potential overflow exceptions when dealing with very large files/streams (>2GB)
  • Applied to both FileStream.Length and Stream.Length casts
  • All 4 unit tests pass across all target frameworks (net8.0, net9.0, net10.0)

Benefits:

  • Code reuse: Leveraging existing, well-tested DataContent infrastructure
  • Better performance:
    • Eliminated all unnecessary array allocations
    • Pre-sized MemoryStream buffers to avoid reallocation during stream copying
  • Safer code: Handles edge cases with very large files gracefully
  • Cleaner code: Removed manual base64 encoding logic, pragma warnings, and redundant references
  • More concise: Consistent inline construction pattern for both overloads
Original prompt

Please submit a pr that addresses the remaining feedback on #7025 (comment)


💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Microsoft Reviewers: Open in CodeFlow

- Added reference to Microsoft.Extensions.AI.Abstractions
- Replaced manual CreateDataUri method with DataContent class
- Use MemoryStream.GetBuffer() with Length instead of ToArray() to avoid array allocations
- All existing tests pass

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Copilot AI changed the title [WIP] Address feedback on pull request 7025 Use DataContent from Microsoft.Extensions.AI for data URI generation Nov 7, 2025
Copilot AI requested a review from stephentoub November 7, 2025 21:19
Copilot finished work on behalf of stephentoub November 7, 2025 21:19
- Remove explicit Microsoft.Extensions.AI.Abstractions reference (available transitively)
- Change fileBytes from byte[] to ReadOnlyMemory<byte> to avoid ToArray()
- Simplify Stream reading code with inline DataContent construction

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Copilot AI requested a review from stephentoub November 7, 2025 22:08
Copilot finished work on behalf of stephentoub November 7, 2025 22:08
- Seed MemoryStream with FileStream.Length for FileInfo overload
- Seed MemoryStream with source.Length when source.CanSeek for Stream overload
- Consolidate DataContent construction into single statement for FileInfo overload

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Copilot AI requested a review from stephentoub November 7, 2025 22:21
Copilot finished work on behalf of stephentoub November 7, 2025 22:21
using (FileStream fs = new(source.FullName, FileMode.Open, FileAccess.Read, FileShare.Read, 1, FileOptions.Asynchronous))
{
using MemoryStream ms = new();
using MemoryStream ms = new((int)fs.Length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot, for these casts from Length to (int), let's instead do (int)Math.Min(int.MaxValue, fs.Length).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated both casts to use Math.Min(int.MaxValue, length) in c76000f.

@stephentoub stephentoub marked this pull request as ready for review November 7, 2025 22:25
Copilot AI review requested due to automatic review settings November 7, 2025 22:25
@stephentoub stephentoub enabled auto-merge (squash) November 7, 2025 22:25
- Replace (int)fs.Length with (int)Math.Min(int.MaxValue, fs.Length)
- Replace (int)source.Length with (int)Math.Min(int.MaxValue, source.Length)
- Prevents overflow when dealing with files/streams larger than int.MaxValue

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
auto-merge was automatically disabled November 7, 2025 22:33

Head branch was pushed to by a user without write access

Copilot AI requested a review from stephentoub November 7, 2025 22:34
Copilot finished work on behalf of stephentoub November 7, 2025 22:34
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the MarkItDownMcpReader class to use the DataContent class from Microsoft.Extensions.AI for creating data URIs instead of manually constructing them. This eliminates code duplication and leverages existing infrastructure.

Key changes:

  • Replaced manual base64 data URI construction with DataContent class usage
  • Changed from byte[] to ReadOnlyMemory<byte> for better memory efficiency
  • Removed the custom CreateDataUri method and associated pragma warnings

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants