Use DataContent from Microsoft.Extensions.AI for data URI generation #7027
- Added reference to Microsoft.Extensions.AI.Abstractions
- Replaced manual CreateDataUri method with DataContent class
- Use MemoryStream.GetBuffer() with Length instead of ToArray() to avoid array allocations
- All existing tests pass

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
...oft.Extensions.DataIngestion.MarkItDown/Microsoft.Extensions.DataIngestion.MarkItDown.csproj
src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/MarkItDownMcpReader.cs
- Remove explicit Microsoft.Extensions.AI.Abstractions reference (available transitively)
- Change fileBytes from byte[] to ReadOnlyMemory<byte> to avoid ToArray()
- Simplify Stream reading code with inline DataContent construction

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/MarkItDownMcpReader.cs
- Seed MemoryStream with FileStream.Length for FileInfo overload
- Seed MemoryStream with source.Length when source.CanSeek for Stream overload
- Consolidate DataContent construction into single statement for FileInfo overload

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
  using (FileStream fs = new(source.FullName, FileMode.Open, FileAccess.Read, FileShare.Read, 1, FileOptions.Asynchronous))
  {
-     using MemoryStream ms = new();
+     using MemoryStream ms = new((int)fs.Length);
@copilot, for these casts from Length to (int), let's instead do (int)Math.Min(int.MaxValue, fs.Length).
Updated both casts to use Math.Min(int.MaxValue, length) in c76000f.
- Replace (int)fs.Length with (int)Math.Min(int.MaxValue, fs.Length)
- Replace (int)source.Length with (int)Math.Min(int.MaxValue, source.Length)
- Prevents overflow when dealing with files/streams larger than int.MaxValue

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
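For readers following the review thread, here is a minimal sketch of why the clamp matters. The class and helper names (CapacityClamp, ToCapacity) are made up for illustration and are not part of the PR; only the Math.Min expression itself comes from the review comment. A bare cast of a length above int.MaxValue wraps to a negative number in an unchecked context, and a negative capacity is rejected by the MemoryStream constructor, whereas the clamped value is always a valid capacity hint.

```csharp
using System;
using System.IO;

public static class CapacityClamp
{
    // Hypothetical helper (not in the PR): clamp a 64-bit length to a valid Int32 capacity hint.
    // Math.Min resolves to the (long, long) overload, so the comparison happens before narrowing.
    public static int ToCapacity(long length) => (int)Math.Min(int.MaxValue, length);

    public static void Main()
    {
        long big = 3L * 1024 * 1024 * 1024; // 3 GiB, larger than int.MaxValue

        // A bare cast wraps around in an unchecked context and yields a negative value.
        Console.WriteLine(unchecked((int)big)); // -1073741824

        // The clamped value stays within the Int32 range.
        Console.WriteLine(ToCapacity(big)); // 2147483647

        // For ordinary sizes the hint is passed through unchanged.
        using MemoryStream ms = new(ToCapacity(1024));
        Console.WriteLine(ms.Capacity); // 1024
    }
}
```

Note that the clamp only keeps the capacity argument valid. A stream that really is close to int.MaxValue bytes may still fail to allocate, so this is a guard against a negative argument rather than support for multi-gigabyte inputs.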
Head branch was pushed to by a user without write access
Pull Request Overview
This PR refactors the MarkItDownMcpReader class to use the DataContent class from Microsoft.Extensions.AI for creating data URIs instead of manually constructing them. This eliminates code duplication and leverages existing infrastructure.
Key changes:
- Replaced manual base64 data URI construction with DataContent class usage
- Changed from byte[] to ReadOnlyMemory<byte> for better memory efficiency
- Removed the custom CreateDataUri method and associated pragma warnings
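To make the key changes concrete, the following is a rough sketch using only the base class library. ToDataUri is a hypothetical stand-in for the kind of manual base64 construction the PR removes (the actual CreateDataUri body is not shown in this conversation), and ReadAll illustrates the GetBuffer()-plus-Length pattern from the commit messages. In the PR itself the resulting ReadOnlyMemory<byte> is handed to DataContent from Microsoft.Extensions.AI rather than to a helper like this; that API is not reproduced here.

```csharp
using System;
using System.IO;
using System.Text;

public static class DataUriSketch
{
    // Hypothetical stand-in for a hand-written data URI helper (assumption, not the PR's code).
    public static string ToDataUri(ReadOnlyMemory<byte> data, string mediaType) =>
        $"data:{mediaType};base64,{Convert.ToBase64String(data.Span)}";

    // Copies a stream into memory and returns a view over the MemoryStream's internal buffer.
    // GetBuffer() exposes the backing array without copying, and the slice is limited to
    // ms.Length, the number of bytes actually written. ToArray() would allocate a second array.
    public static ReadOnlyMemory<byte> ReadAll(Stream source)
    {
        // Pre-size the buffer when the length is known, clamped to a valid Int32 capacity.
        MemoryStream ms = source.CanSeek
            ? new MemoryStream((int)Math.Min(int.MaxValue, source.Length))
            : new MemoryStream();

        source.CopyTo(ms);

        // The MemoryStream is intentionally not disposed: the returned memory wraps its buffer.
        return new ReadOnlyMemory<byte>(ms.GetBuffer(), 0, (int)ms.Length);
    }

    public static void Main()
    {
        using MemoryStream src = new(Encoding.UTF8.GetBytes("hello"));
        ReadOnlyMemory<byte> data = ReadAll(src);
        Console.WriteLine(ToDataUri(data, "text/plain")); // data:text/plain;base64,aGVsbG8=
    }
}
```

The slice bounds matter: the array returned by GetBuffer() can be longer than the data written to it, so using it without the Length limit would encode trailing unused bytes.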
Address remaining feedback from PR #7025:
Latest Changes (addressing @stephentoub feedback):
- Math.Min(int.MaxValue, length) for safe casting to int when pre-sizing MemoryStream
Benefits:
- DataContent infrastructure