Conversation
…mentations and tests Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/bb3ce72c-ce34-45fe-b196-16d776008a31 Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
…ary await in test helper Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/bb3ce72c-ce34-45fe-b196-16d776008a31 Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
Tagging subscribers to this area: @dotnet/area-system-diagnostics-process
Pull request overview
Adds a new synchronous Process.ReadAllLines(TimeSpan? timeout = default) API that multiplexes stdout/stderr and yields interleaved ProcessOutputLine instances without deadlocks, plus updates tests and the ref assembly.
Changes:
- Introduces Process.ReadAllLines(...) and shared buffer-to-line parsing helpers.
- Implements platform-specific synchronous multiplexing iterators for Windows (overlapped I/O + WaitAny) and Unix (poll(2) + non-blocking reads).
- Extends existing streaming tests to validate both sync and async line-reading paths.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/libraries/System.Diagnostics.Process/tests/ProcessStreamingTests.cs | Expands ReadAllLinesAsync test coverage to also exercise the new sync API. |
| src/libraries/System.Diagnostics.Process/src/System/Diagnostics/Process.Multiplexing.cs | Adds ReadAllLines(...) entrypoint and shared helpers for parsing/compacting line buffers. |
| src/libraries/System.Diagnostics.Process/src/System/Diagnostics/Process.Multiplexing.Windows.cs | Adds Windows-specific synchronous line multiplexing iterator using overlapped I/O. |
| src/libraries/System.Diagnostics.Process/src/System/Diagnostics/Process.Multiplexing.Unix.cs | Adds Unix-specific synchronous line multiplexing iterator using poll and non-blocking reads. |
| src/libraries/System.Diagnostics.Process/ref/System.Diagnostics.Process.cs | Adds the ReadAllLines(...) signature to the reference assembly. |
…UTF-16/UTF-32 support; fix Windows use-after-free by canceling pending IO in finally Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/a9eadcb8-dd77-4933-9877-920b21513e25 Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
…plexing.Unix.cs Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/a9eadcb8-dd77-4933-9877-920b21513e25 Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/libraries/System.Diagnostics.Process/tests/ProcessStreamingTests.cs:213
- The new sync path adds incremental decoding/line-splitting logic that’s meant to support multi-byte encodings (UTF-16/UTF-32, partial sequences across reads, BOM). The updated tests don’t currently exercise StandardOutputEncoding/StandardErrorEncoding variations; adding a theory that runs both sync and async with non-default encodings would help prevent regressions in the decoder/line parsing behavior.
public async Task ReadAllLines_ReadsBothOutputAndError(string standardOutput, string standardError, bool useAsync)
{
using Process process = StartLinePrintingProcess(
string.IsNullOrEmpty(standardOutput) ? DontPrintAnything : standardOutput,
string.IsNullOrEmpty(standardError) ? DontPrintAnything : standardError);
…UTF-16/UTF-32 Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/4a5f20d7-dcd7-415c-b5c8-e15806a7e5c0 Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/4a5f20d7-dcd7-415c-b5c8-e15806a7e5c0 Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
…reparePollFds Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/bfdc186c-a30a-4294-99ec-6b264c27af2f Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
adamsitnik
left a comment
LGTM
@tmds could you please review the Unix part?
@JeremyKuhne could you please review the Windows part?
The multiplexing is the same as in #126807, the most tricky part is the decoding.
@tarekgh could you please review the decoding part?
🤖 Copilot Code Review: PR #127106
Note: This review was generated by Copilot (Claude Opus 4.6) with additional analysis from Claude Sonnet 4.5. GPT-5.3-Codex was launched but did not complete within the timeout window.
Holistic Assessment
Motivation: Justified. Adds the synchronous …
Approach: Sound. The implementation follows the established patterns in …
Summary: …
Detailed Findings
✅ API Approval: Matches approved shape exactly
The …
✅ Resource Management: Correct buffer lifecycle
✅ Line Parsing Correctness: Handles \r, \n, \r\n correctly
✅ Encoding Handling: Decoder-based approach is correct
Using …
✅ Unsafe Code in Iterators: NativeOverlapped as nint is safe
On Windows, …
✅ Refactored Unix Helpers: Good code reuse
JeremyKuhne
left a comment
Windows changes look good to me
- CancelPendingIOIfNeeded is called anyway from the finally block
- move common logic for handling EOF to a helper method
- move common logic for handling bytes read to a helper method
- test improvements
SafePipeHandle outputHandle = GetSafeHandleFromStreamReader(_standardOutput!);
SafePipeHandle errorHandle = GetSafeHandleFromStreamReader(_standardError!);

byte[] outputByteBuffer = ArrayPool<byte>.Shared.Rent(InitialReadAllBufferSize);
I think we don't need separate byte buffers for output and error?
And if 4096 is considered ok for stackallocing, we don't need to rent it either.
I think we don't need separate byte buffers for output and error?
@copilot use a single byte buffer since we don't perform more than one read at once
Done in 7c196cc. ReadPipesToLines now uses a single byteBuffer shared between both pipes since we only perform one read at a time.
byte[] outputByteBuffer = ArrayPool<byte>.Shared.Rent(InitialReadAllBufferSize);
byte[] errorByteBuffer = ArrayPool<byte>.Shared.Rent(InitialReadAllBufferSize);
char[] outputCharBuffer = ArrayPool<char>.Shared.Rent(InitialReadAllBufferSize);
char[] errorCharBuffer = ArrayPool<char>.Shared.Rent(InitialReadAllBufferSize);
These might start out as Array.Empty, not sure if it's a meaningful improvement. It would avoid renting arrays for empty output/error.
I expect an empty stdout/stderr to be rare compared to having some output. And ArrayPool has been optimized so many times that it's really fast to get those arrays (especially compared to process startup time and the read syscalls).
}

// Cannot use stackalloc in an iterator method; use a regular array.
Interop.PollEvent[] pollFds = new Interop.PollEvent[2];
nit: PollForPipeActivity could stackalloc this.
nit: PollForPipeActivity could stackalloc this.
I am not sure if this would be possible, as we need to read from this buffer after PollForPipeActivity exits
}
else if (bytesRead == 0)
{
    done = FlushDecoderAndEmitRemainingChars(decoder, ref charBuffer, ref charStart, ref charEnd, standardError, lines);
nit: the method being called always returns true. Rather than making the reader look at the implementation to learn that, it could be a void method, and the caller could assign done = true after calling it. The same applies at the other call sites.
/// </param>
private static void SkipBomIfPresent(char[] charBuffer, int endIndex, ref int startIndex)
{
    if (startIndex < endIndex && charBuffer[startIndex] == '\uFEFF')
suggestion: I assume instead of startIndex < endIndex, this might be: Debug.Assert(startIndex == 0 && endIndex > startIndex);.
while (charEndIndex + charCount > charBuffer.Length)
{
    RentLargerBuffer(ref charBuffer, charEndIndex);
nit: consider passing to RentLargerBuffer the minimum required size rather than looping.
/// After line parsing, compacts remaining data to the front of the char buffer if it has reached
/// the end, or rents a larger buffer if the entire buffer is filled with a single incomplete line.
/// </summary>
private static void CompactOrGrowCharBuffer(ref char[] buffer, ref int startIndex, ref int endIndex)
Doesn't DecodeAndAppendChars do this compacting and growing already?
Doesn't DecodeAndAppendChars do this compacting and growing already?
@copilot answer this question
Yes, DecodeAndAppendChars handles compaction and growth for the decode step (ensuring there's room for newly decoded chars). CompactOrGrowCharBuffer handles compaction and growth for the line parsing step — after ParseLinesFromCharBuffer consumes complete lines, the remaining incomplete line data may need to be compacted or the buffer grown if a single line fills the entire buffer. They serve different phases: decode vs. post-parse.
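For readers following the thread, the post-parse phase described above can be sketched roughly as follows. This is an illustration only, not the PR's code: CompactOrGrow is a hypothetical name, and it calls Array.Resize where the PR rents a larger array from ArrayPool via RentLargerBuffer.

```csharp
using System;

// Hypothetical sketch of the post-parse step; not the PR's implementation.
// Assumption: Array.Resize stands in for renting from ArrayPool<char>.
static void CompactOrGrow(ref char[] buffer, ref int start, ref int end)
{
    if (end < buffer.Length)
    {
        return; // There is still room after the unconsumed data.
    }

    if (start > 0)
    {
        // Consumed lines left a gap at the front: slide the tail down.
        int remaining = end - start;
        Array.Copy(buffer, start, buffer, 0, remaining);
        start = 0;
        end = remaining;
    }
    else
    {
        // One incomplete line fills the whole buffer: grow it.
        Array.Resize(ref buffer, buffer.Length * 2);
    }
}

// "ab\n" has already been parsed as a line; "cd" is an incomplete line at the end.
char[] demo = "ab\ncd".ToCharArray();
int demoStart = 3, demoEnd = 5;
CompactOrGrow(ref demo, ref demoStart, ref demoEnd);
Console.WriteLine($"{demoStart}..{demoEnd}: {new string(demo, demoStart, demoEnd - demoStart)}");
```

In this sketch growth happens only when nothing has been consumed, which mirrors the doc comment's "single incomplete line" case.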
/// <summary>
/// If the first character at <paramref name="startIndex"/> is the Unicode BOM (U+FEFF),
/// advances <paramref name="startIndex"/> past it. Called once per stream after the first
/// decode that produces characters, to match <see cref="StreamReader"/> BOM-stripping behavior.
I think this is reproducing the StreamReader removing bytes while trying to detect encoding in:
That code is handling additional byte sequences. Should those also be handled?
Before that, it first checks the Preamble:
runtime/src/libraries/System.Private.CoreLib/src/System/IO/StreamReader.cs
Lines 629 to 632 in 333fe88
I think this is reproducing the StreamReader removing bytes while trying to detect encoding in:
runtime/src/libraries/System.Private.CoreLib/src/System/IO/StreamReader.cs
Line 471 in 333fe88
private void DetectEncoding()
That code is handling additional byte sequences. Should those also be handled?
@copilot please study the referenced code and check if other Process APIs that involve encoding do detect the encoding. For now just provide an answer.
StreamReader.DetectEncoding() handles byte-level BOM detection to auto-detect the encoding from raw bytes (UTF-8 BOM EF BB BF, UTF-16 LE FF FE, UTF-16 BE FE FF, UTF-32 LE FF FE 00 00, UTF-32 BE 00 00 FE FF). It changes the encoding and creates a new decoder when a BOM is found.
In the ReadAllLines/ReadAllLinesAsync path, the encoding is already known — it's explicitly set via StandardOutputEncoding/StandardErrorEncoding (or defaults to the system encoding). The Decoder is created from that known encoding. We only need to strip the character-level BOM (U+FEFF) that the decoder produces from byte-level BOMs, which is what SkipBomIfPresent does.
The other Process APIs (ReadAllText, ReadAllBytes, ReadAllTextAsync) also use explicitly-set encodings and don't perform byte-level encoding detection. So the current approach is consistent — no additional byte-level BOM handling is needed.
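The distinction between the byte-level BOM and the character-level BOM can be seen with a small standalone experiment. The sketch below is not the PR's code: it assumes a UTF-8 stream that begins with the preamble, and it inlines a check equivalent in spirit to SkipBomIfPresent rather than calling it.

```csharp
using System;
using System.Text;

// Build a UTF-8 payload that starts with the byte-level BOM (EF BB BF).
byte[] preamble = Encoding.UTF8.GetPreamble();
byte[] payload = Encoding.UTF8.GetBytes("hi\n");
byte[] bytes = new byte[preamble.Length + payload.Length];
preamble.CopyTo(bytes, 0);
payload.CopyTo(bytes, preamble.Length);

// A Decoder created from a known encoding does not strip the BOM;
// it surfaces it as the single character U+FEFF.
Decoder decoder = Encoding.UTF8.GetDecoder();
char[] chars = new char[16];
int end = decoder.GetChars(bytes, 0, bytes.Length, chars, 0, flush: false);

// Character-level strip, done once after the first decode that yields chars.
int start = 0;
bool hadBom = end > 0 && chars[0] == '\uFEFF';
if (hadBom)
{
    start++;
}

string text = new string(chars, start, end - start);
Console.WriteLine($"BOM seen: {hadBom}, remaining chars: {text.Length}");
```

Because the encoding is fixed up front, no re-detection or decoder swap is involved here, which is the point made in the reply above.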
int charCount = decoder.GetCharCount(byteBuffer, byteIndex, byteCount, flush);
if (charCount == 0)
{
    return;
GetChars is never called, so partial bytes are lost. Either use if (charCount == 0 && byteCount == 0) return, or simply remove this early-return block, since the loop (while (charEndIndex + 0 > ...)) is trivially skipped when charCount == 0.
GetChars is never called, so partial bytes are lost. Either use if (charCount == 0 && byteCount == 0) return, or simply remove this early-return block, since the loop (while (charEndIndex + 0 > ...)) is trivially skipped when charCount == 0.
@copilot address this feedback
Fixed in 7c196cc. The early return now checks charCount == 0 && byteCount == 0 so that GetChars is still called when there are input bytes (preserving partial byte sequences in the decoder's internal state).
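The failure mode behind this fix can be reproduced outside the PR with System.Text.Decoder alone. The sketch below is a standalone illustration, not the PR's DecodeAndAppendChars: it splits the 3-byte UTF-8 encoding of "世" across two simulated reads and compares calling GetChars on the first read with skipping it.

```csharp
using System;
using System.Text;

byte[] bytes = Encoding.UTF8.GetBytes("世"); // 3 bytes in UTF-8

// Correct path: hand every read to GetChars, even when no char is produced yet.
Decoder decoder = Encoding.UTF8.GetDecoder();
char[] chars = new char[8];
int countAfterFirst = decoder.GetCharCount(bytes, 0, 1, flush: false); // incomplete sequence
int written1 = decoder.GetChars(bytes, 0, 1, chars, 0, flush: false);  // byte is kept in decoder state
int written2 = decoder.GetChars(bytes, 1, 2, chars, written1, flush: false);
string decoded = new string(chars, 0, written1 + written2);

// Lossy path: simulates the old early return, where the first byte
// never reaches GetChars and is therefore never buffered.
Decoder lossy = Encoding.UTF8.GetDecoder();
char[] lossyChars = new char[8];
int lossyWritten = lossy.GetChars(bytes, 1, 2, lossyChars, 0, flush: false);
string lossyText = new string(lossyChars, 0, lossyWritten);

Console.WriteLine($"correct: {decoded == "世"}, lossy: {lossyText == "世"}");
```

GetCharCount does not update the decoder's state, so returning right after it drops the pending byte; only GetChars carries it over to the next read.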
[InlineData("utf-16", false)]
[InlineData("utf-32", true)]
[InlineData("utf-32", false)]
public async Task ReadAllLines_WorksWithNonDefaultEncodings(string encodingName, bool useAsync)
It would be good to test double-byte encodings too. It will be interesting to see how the decoder usage behaves with such encodings.
It would be good to test double-byte encodings too. It will be interesting to see how the decoder usage behaves with such encodings.
@copilot address this feedback and extend the test coverage with such encodings. Make sure the tests are passing locally before you push any changes
Added ReadAllLines_WorksWithMultiByteCharacters theory in 7c196cc. It uses CJK characters (U+4E16 U+754C = "世界") which require 3 bytes in UTF-8, 2 bytes in UTF-16, and 4 bytes in UTF-32, exercising multi-byte decoder state across reads for all three encodings with both sync and async paths.
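The byte counts quoted above, and the worst case of a sequence split at every byte boundary, can be checked with a short standalone program. This is not the test added in the PR; DecodeByteAtATime is a hypothetical helper that feeds a Decoder one byte per call.

```csharp
using System;
using System.Text;

// Hypothetical helper: decodes the input one byte per GetChars call,
// so every multi-byte sequence is split across "reads".
static string DecodeByteAtATime(Encoding encoding, byte[] bytes)
{
    Decoder decoder = encoding.GetDecoder();
    var result = new StringBuilder();
    char[] chars = new char[2]; // one byte can complete at most a surrogate pair
    for (int i = 0; i < bytes.Length; i++)
    {
        bool last = i == bytes.Length - 1;
        int written = decoder.GetChars(bytes, i, 1, chars, 0, flush: last);
        result.Append(chars, 0, written);
    }
    return result.ToString();
}

string text = "世界"; // U+4E16 U+754C
int utf8Bytes = Encoding.UTF8.GetByteCount(text);     // 3 bytes per character
int utf16Bytes = Encoding.Unicode.GetByteCount(text); // 2 bytes per character
int utf32Bytes = Encoding.UTF32.GetByteCount(text);   // 4 bytes per character
Console.WriteLine($"{utf8Bytes} / {utf16Bytes} / {utf32Bytes}");

foreach (Encoding encoding in new[] { Encoding.UTF8, Encoding.Unicode, Encoding.UTF32 })
{
    string roundTripped = DecodeByteAtATime(encoding, encoding.GetBytes(text));
    Console.WriteLine($"{encoding.WebName}: {roundTripped == text}");
}
```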
adamsitnik
left a comment
@copilot please address my feedback where I've tagged you directly
…, add multi-byte encoding tests Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/c08a9de7-b817-42f5-9272-0d54c0a69ad3 Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
Description
Implements Process.ReadAllLines(TimeSpan? timeout = default), a synchronous, single-threaded multiplexing API that reads interleaved lines from both stdout and stderr without deadlocks.
API approved in #125838 (comment).

Cross-platform core (Process.Multiplexing.cs)
- ReadAllLines() validates state eagerly via ValidateReadAllState, creates Decoder instances per stream, and delegates to the platform-specific ReadPipesToLines iterator
- Helpers: DecodeAndAppendChars (incremental byte→char decoding via Decoder), ParseLinesFromCharBuffer (scans for '\r' or '\n' via IndexOfAny, handles '\r\n' as a single terminator), EmitRemainingCharsAsLine (EOF partial line), CompactOrGrowCharBuffer (compact or grow via generic RentLargerBuffer<T>)
- DecodeAndAppendChars accepts ref int charStartIndex and always compacts the char buffer (moving unconsumed data to the front) before considering growth when charStartIndex > 0, ensuring RentLargerBuffer only copies the unconsumed tail rather than the entire buffer including the consumed prefix
- DecodeAndAppendChars early return guard checks both charCount == 0 and byteCount == 0 to avoid losing partial bytes that the Decoder has buffered internally but hasn't yet produced output chars for (e.g., first byte of a multi-byte UTF-16/UTF-32 sequence)
- SkipBomIfPresent strips the Unicode BOM (U+FEFF) from the start of each stream's char buffer after the first decode, matching StreamReader BOM-stripping behavior used by the async path
- Decodes into char[] buffers using Encoding.GetDecoder() per stream, correctly handling multi-byte encodings (UTF-16, UTF-32) and partial sequences across reads; byte-level 0x0A scanning is not used
- RentLargerBuffer<T> is a single generic helper used for both byte[] and char[] buffer growth (no duplication)

Windows (Process.Multiplexing.Windows.cs)
- ReadPipesToLines iterator using overlapped I/O with WaitAny multiplexing
- Stores NativeOverlapped* as nint to avoid pointer fields in the iterator state machine; casts back inside scoped unsafe blocks (C# 13 allows unsafe in iterators, but yield return must be in safe context)
- No DangerousAddRef/DangerousRelease needed: all Windows syscalls accept SafeHandle directly

Unix (Process.Multiplexing.Unix.cs)
- ReadPipesToLines iterator using poll(2) with non-blocking reads
- PollForPipeActivity helper accepts Span<Interop.PollEvent> and encapsulates PreparePollFds + timeout check + inline poll call (single triggered variable) + error/EINTR handling, shared by both ReadPipesToLines and ReadPipes to eliminate code duplication
- ReadPipes (non-iterator) uses stackalloc for the poll event buffer to avoid heap allocation, while the iterator-based ReadPipesToLines passes a heap-allocated array (implicitly converted to Span)
- HandlePipeLineRead helper encapsulates the read → decode → BOM check → parse lines → compact flow

Tests
- ReadAllLines_WorksWithNonDefaultEncodings theory tests UTF-8, UTF-16, and UTF-32 encodings for both sync and async paths, asserting identical output
- ReadAllLines_WorksWithMultiByteCharacters theory tests multi-byte character handling with UTF-8 (2-byte, 3-byte, 4-byte sequences), UTF-16 (surrogate pairs), and UTF-32, verifying correct decoding when multi-byte sequences may be split across read boundaries
- ReadAllLines_ReadsVeryLongLines theory tests buffer growth by producing lines over 8192 characters on both stdout and stderr
- EnumerateLines test helper populates List<string> capturedOutput, List<string> capturedError directly

Build fix
- Updates the ToTimeoutMilliseconds call to ProcessUtils.ToTimeoutMilliseconds to match the upstream refactor on main that moved this method from Process to ProcessUtils
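
The line-splitting rule described for ParseLinesFromCharBuffer ('\r' or '\n' found via IndexOfAny, with '\r\n' treated as one terminator) might look roughly like the sketch below. It is not the PR's code: SplitCompleteLines is a hypothetical name, and deferring a trailing '\r' until more data arrives is an assumption made here, not something the description confirms.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; returns complete lines and reports how many chars were consumed.
static List<string> SplitCompleteLines(ReadOnlySpan<char> buffer, out int consumed)
{
    var lines = new List<string>();
    int start = 0;
    while (start < buffer.Length)
    {
        int offset = buffer.Slice(start).IndexOfAny('\r', '\n');
        if (offset < 0)
        {
            break; // No terminator: the rest is an incomplete line.
        }

        int terminator = start + offset;
        if (buffer[terminator] == '\r')
        {
            // Assumption: a '\r' at the very end may be the first half of "\r\n",
            // so the decision is deferred until more data has been decoded.
            if (terminator + 1 == buffer.Length)
            {
                break;
            }

            lines.Add(buffer.Slice(start, offset).ToString());
            start = terminator + (buffer[terminator + 1] == '\n' ? 2 : 1);
        }
        else
        {
            lines.Add(buffer.Slice(start, offset).ToString());
            start = terminator + 1;
        }
    }

    consumed = start;
    return lines;
}

List<string> parsed = SplitCompleteLines("a\r\nb\nc\rd", out int parsedConsumed);
Console.WriteLine($"{string.Join("|", parsed)} (consumed {parsedConsumed})");
```

The unconsumed remainder ("d" in the usage above) is what a later compaction step or an EOF flush would handle.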