Skip to content

Add Process.ReadAllLines synchronous API#127106

Open
Copilot wants to merge 19 commits intomainfrom
copilot/add-readalllines-to-process
Open

Add Process.ReadAllLines synchronous API#127106
Copilot wants to merge 19 commits intomainfrom
copilot/add-readalllines-to-process

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 18, 2026

Description

Implements Process.ReadAllLines(TimeSpan? timeout = default) — a synchronous, single-threaded multiplexing API that reads interleaved lines from both stdout and stderr without deadlocks.

API approved in #125838 (comment).

using Process process = Process.Start(startInfo);
foreach (ProcessOutputLine line in process.ReadAllLines())
{
    if (line.StandardError)
        Console.ForegroundColor = ConsoleColor.Red;
    Console.WriteLine(line.Content);
    Console.ResetColor();
}

Cross-platform core (Process.Multiplexing.cs)

  • ReadAllLines() validates state eagerly via ValidateReadAllState, creates Decoder instances per stream, and delegates to platform-specific ReadPipesToLines iterator
  • Shared char-level helpers: DecodeAndAppendChars (incremental byte→char decoding via Decoder), ParseLinesFromCharBuffer (scans for '\r' or '\n' via IndexOfAny, handles '\r\n' as a single terminator), EmitRemainingCharsAsLine (EOF partial line), CompactOrGrowCharBuffer (compact or grow via generic RentLargerBuffer<T>)
  • DecodeAndAppendChars accepts ref int charStartIndex and always compacts the char buffer (moving unconsumed data to the front) before considering growth when charStartIndex > 0, ensuring RentLargerBuffer only copies the unconsumed tail rather than the entire buffer including the consumed prefix
  • DecodeAndAppendChars early return guard checks both charCount == 0 and byteCount == 0 to avoid losing partial bytes that the Decoder has buffered internally but hasn't yet produced output chars for (e.g., first byte of a multi-byte UTF-16/UTF-32 sequence)
  • SkipBomIfPresent strips the Unicode BOM (U+FEFF) from the start of each stream's char buffer after the first decode, matching StreamReader BOM-stripping behavior used by the async path
  • Line parsing operates on char[] buffers using Encoding.GetDecoder() per stream, correctly handling multi-byte encodings (UTF-16, UTF-32) and partial sequences across reads — byte-level 0x0A scanning is not used
  • RentLargerBuffer<T> is a single generic helper used for both byte[] and char[] buffer growth (no duplication)

Windows (Process.Multiplexing.Windows.cs)

  • ReadPipesToLines iterator using overlapped I/O with WaitAny multiplexing
  • Stores NativeOverlapped* as nint to avoid pointer fields in iterator state machine; casts back inside scoped unsafe blocks (C# 13 allows unsafe in iterators, but yield return must be in safe context)
  • Error event at index 0 in wait handle array ensures stderr is processed first when both streams have data
  • No DangerousAddRef/DangerousRelease needed — all Windows syscalls accept SafeHandle directly

Unix (Process.Multiplexing.Unix.cs)

  • ReadPipesToLines iterator using poll(2) with non-blocking reads
  • Uses a single shared byte buffer for both stdout and stderr reads (only one read is performed at a time), while maintaining separate char buffers per stream for incremental decoding state
  • PollForPipeActivity helper accepts Span<Interop.PollEvent> and encapsulates PreparePollFds + timeout check + inline poll call (single triggered variable) + error/EINTR handling, shared by both ReadPipesToLines and ReadPipes to eliminate code duplication
  • ReadPipes (non-iterator) uses stackalloc for the poll event buffer to avoid heap allocation, while the iterator-based ReadPipesToLines passes a heap-allocated array (implicitly converted to Span)
  • HandlePipeLineRead helper encapsulates the read → decode → BOM check → parse lines → compact flow

Tests

  • Extended existing streaming tests to validate both sync and async line-reading paths
  • ReadAllLines_WorksWithNonDefaultEncodings theory tests UTF-8, UTF-16, and UTF-32 encodings for both sync and async paths, asserting identical output
  • ReadAllLines_WorksWithMultiByteCharacters theory tests multi-byte character handling with UTF-8 (2-byte, 3-byte, 4-byte sequences), UTF-16 (surrogate pairs), and UTF-32, verifying correct decoding when multi-byte sequences may be split across read boundaries
  • ReadAllLines_ReadsVeryLongLines theory tests buffer growth by producing lines over 8192 characters on both stdout and stderr
  • EnumerateLines test helper populates List<string> capturedOutput, List<string> capturedError directly

Build fix

  • Updated ToTimeoutMilliseconds call to ProcessUtils.ToTimeoutMilliseconds to match upstream refactor on main that moved this method from Process to ProcessUtils

Copilot AI and others added 2 commits April 17, 2026 08:00
…mentations and tests

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/bb3ce72c-ce34-45fe-b196-16d776008a31

Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
…ary await in test helper

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/bb3ce72c-ce34-45fe-b196-16d776008a31

Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 18, 2026 14:12
Copilot AI review requested due to automatic review settings April 18, 2026 14:12
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @dotnet/area-system-diagnostics-process
See info in area-owners.md if you want to be subscribed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new synchronous Process.ReadAllLines(TimeSpan? timeout = default) API that multiplexes stdout/stderr and yields interleaved ProcessOutputLine instances without deadlocks, plus updates tests and the ref assembly.

Changes:

  • Introduces Process.ReadAllLines(...) and shared buffer-to-line parsing helpers.
  • Implements platform-specific synchronous multiplexing iterators for Windows (overlapped I/O + WaitAny) and Unix (poll(2) + non-blocking reads).
  • Extends existing streaming tests to validate both sync and async line-reading paths.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/libraries/System.Diagnostics.Process/tests/ProcessStreamingTests.cs Expands ReadAllLinesAsync test coverage to also exercise the new sync API.
src/libraries/System.Diagnostics.Process/src/System/Diagnostics/Process.Multiplexing.cs Adds ReadAllLines(...) entrypoint and shared helpers for parsing/compacting line buffers.
src/libraries/System.Diagnostics.Process/src/System/Diagnostics/Process.Multiplexing.Windows.cs Adds Windows-specific synchronous line multiplexing iterator using overlapped I/O.
src/libraries/System.Diagnostics.Process/src/System/Diagnostics/Process.Multiplexing.Unix.cs Adds Unix-specific synchronous line multiplexing iterator using poll and non-blocking reads.
src/libraries/System.Diagnostics.Process/ref/System.Diagnostics.Process.cs Adds the ReadAllLines(...) signature to the reference assembly.

Comment thread src/libraries/System.Diagnostics.Process/tests/ProcessStreamingTests.cs Outdated
Copilot AI and others added 2 commits April 19, 2026 12:15
…UTF-16/UTF-32 support; fix Windows use-after-free by canceling pending IO in finally

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/a9eadcb8-dd77-4933-9877-920b21513e25

Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
…plexing.Unix.cs

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/a9eadcb8-dd77-4933-9877-920b21513e25

Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/libraries/System.Diagnostics.Process/tests/ProcessStreamingTests.cs:213

  • The new sync path adds incremental decoding/line-splitting logic that’s meant to support multi-byte encodings (UTF-16/UTF-32, partial sequences across reads, BOM). The updated tests don’t currently exercise StandardOutputEncoding/StandardErrorEncoding variations; adding a theory that runs both sync and async with non-default encodings would help prevent regressions in the decoder/line parsing behavior.
        public async Task ReadAllLines_ReadsBothOutputAndError(string standardOutput, string standardError, bool useAsync)
        {
            using Process process = StartLinePrintingProcess(
                string.IsNullOrEmpty(standardOutput) ? DontPrintAnything : standardOutput,
                string.IsNullOrEmpty(standardError) ? DontPrintAnything : standardError);

Copilot AI and others added 2 commits April 19, 2026 12:47
…UTF-16/UTF-32

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/4a5f20d7-dcd7-415c-b5c8-e15806a7e5c0

Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/4a5f20d7-dcd7-415c-b5c8-e15806a7e5c0

Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
Copilot AI requested review from Copilot and removed request for Copilot April 19, 2026 12:48
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Copilot AI review requested due to automatic review settings April 24, 2026 21:04
Copy link
Copy Markdown
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tmds could you please review the Unix part?
@JeremyKuhne could you please review the Windows part?

The multiplexing is the same as in #126807, the most tricky part is the decoding.

@tarekgh could you please review the decoding part?

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

@github-actions
Copy link
Copy Markdown
Contributor

🤖 Copilot Code Review — PR #127106

Note

This review was generated by Copilot (Claude Opus 4.6) with additional analysis from Claude Sonnet 4.5. GPT-5.3-Codex was launched but did not complete within the timeout window.

Holistic Assessment

Motivation: Justified. Adds the synchronous ReadAllLines API that was approved in the API review of #125838. The API fills the gap between the existing async ReadAllLinesAsync and the synchronous ReadAllText/ReadAllBytes methods, enabling simple foreach-based consumption of interleaved stdout/stderr.

Approach: Sound. The implementation follows the established patterns in Process.Multiplexing.*.cs — poll-based multiplexing on Unix, overlapped IO on Windows — and reuses existing infrastructure (buffer management, timeout handling, handle safety). The char-level Decoder-based line parsing correctly handles multi-byte encodings and was introduced after reviewer feedback. Shared helpers (DecodeAndAppendChars, ParseLinesFromCharBuffer, CompactOrGrowCharBuffer, etc.) avoid duplication and are well-documented.

Summary: ⚠️ Needs Human Review. The code is correct and the API matches the approved shape. I found no bugs, but there are two design points worth human judgment: (1) the IEnumerable re-enumeration semantics, and (2) notable code duplication in the Windows path. Both are discussed below.


Detailed Findings

✅ API Approval — Matches approved shape exactly

The ref/ assembly change adds public IEnumerable<ProcessOutputLine> ReadAllLines(TimeSpan? timeout = default), which exactly matches the approved API shape from the bartonjs review comment. The linked issue has the api-approved label. No extra unapproved public surface was detected.

✅ Resource Management — Correct buffer lifecycle

RentLargerBuffer<T> (line 640 of Process.Multiplexing.cs) returns the old buffer to the pool before storing the new one in the ref parameter. Since iterator locals become state machine fields, the finally blocks in both ReadPipesToLines implementations correctly return the current (possibly grown) buffer, not the original. No double-return or leak. Verified for both Unix (lines 114–117) and Windows (lines 210–213).

✅ Line Parsing Correctness — Handles \r, \n, \r\n correctly

ParseLinesFromCharBuffer uses IndexOfAny('\r', '\n') and correctly:

  • Treats \r\n as a single terminator
  • Waits for more data when \r is at the buffer boundary (can't distinguish standalone \r from \r\n)
  • EmitRemainingCharsAsLine strips trailing \r at EOF, matching StreamReader.ReadLine behavior

✅ Encoding Handling — Decoder-based approach is correct

Using Encoding.GetDecoder() per stream ensures multi-byte encodings (UTF-16, UTF-32) are handled correctly across read boundaries. SkipBomIfPresent strips the BOM once per stream, matching StreamReader behavior.

✅ Unsafe Code in Iterators — NativeOverlapped as nint is safe

On Windows, NativeOverlapped* is stored as nint because iterator state machines can't have pointer fields. This is safe since the memory is unmanaged (NativeMemory.AllocZeroed) and the casts back to NativeOverlapped* occur inside scoped unsafe blocks. The finally block properly frees these via NativeMemory.Free.

✅ Refactored Unix Helpers — Good code reuse

PollForPipeActivity and PreparePollFds are shared between the new ReadPipesToLines iterator and the existing ReadPipes method. The order was changed (error FD first), which is a behavioral change for ReadPipes but has no observable effect since ReadPipes accumulates all bytes regardless of order.

⚠️ IEnumerable Re-enumeration Semantics — Needs human judgment

ReadAllLines() (line 144 of Process.Multiplexing.cs) calls ValidateReadAllState() eagerly (which sets both stream modes to SyncMode), then returns the ReadPipesToLines() iterator. This means:

  • A second call to ReadAllLines() correctly throws InvalidOperationException (mode is no longer Undefined)
  • But the returned IEnumerable<ProcessOutputLine> could theoretically be enumerated multiple times via .GetEnumerator(), with the second enumeration attempting to read from already-consumed pipes

This was previously noted in review comments. I believe this is acceptable since IEnumerable does not guarantee re-enumerability, but a human reviewer should confirm this is the intended contract.

⚠️ Windows ReadPipesToLines — Notable code duplication

The Windows ReadPipesToLines (lines 103–175 of Process.Multiplexing.Windows.cs) duplicates the decode → BOM check → parse → compact → re-queue logic between the isError and !isError branches. The Unix path avoids this with HandlePipeLineRead. The duplication exists because the Windows path interleaves overlapped IO operations with decoding, making extraction harder. This is a maintenance concern but not a correctness issue. A human reviewer should decide if the duplication is acceptable or if it should be refactored further.

💡 Test Coverage — Good, with minor gaps

Tests cover: basic output/error, interleaved output, large output, very long lines, BOM/encoding (UTF-8/UTF-16/UTF-32), cancellation/timeout, early break, and error paths. The useAsync parameter elegantly tests both sync and async paths in the same test methods.

Minor gaps that could be addressed in follow-up:

  • Empty lines ("line1\n\nline3") to verify empty strings are emitted
  • Standalone \r line endings (without \n)
  • Lines requiring multiple buffer growths (> 16KB per line)

💡 Generic RentLargerBuffer — Good improvement

Making RentLargerBuffer<T> generic and using Array.Copy instead of Buffer.BlockCopy is correct — BlockCopy only works with primitive types, while Array.Copy works with any type including char[]. This eliminates the need for a separate RentLargerCharBuffer method.

Generated by Code Review for issue #127106 ·

Copy link
Copy Markdown
Member

@JeremyKuhne JeremyKuhne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows changes look good to me

- CancelPendingIOIfNeeded is called anyway from the finally block
- move common logic for handling EOF to a helper method
- move common logic for handling bytes read to a helper method
- test improvements
SafePipeHandle outputHandle = GetSafeHandleFromStreamReader(_standardOutput!);
SafePipeHandle errorHandle = GetSafeHandleFromStreamReader(_standardError!);

byte[] outputByteBuffer = ArrayPool<byte>.Shared.Rent(InitialReadAllBufferSize);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need separate byte buffers for output and error?

And if 4096 is considered ok for stackallocing, we don't need to rent it either.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need separate byte buffers for output and error?

@copilot use a single byte buffer since we don't perform more than one read at once

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 7c196cc. ReadPipesToLines now uses a single byteBuffer shared between both pipes since we only perform one read at a time.

byte[] outputByteBuffer = ArrayPool<byte>.Shared.Rent(InitialReadAllBufferSize);
byte[] errorByteBuffer = ArrayPool<byte>.Shared.Rent(InitialReadAllBufferSize);
char[] outputCharBuffer = ArrayPool<char>.Shared.Rent(InitialReadAllBufferSize);
char[] errorCharBuffer = ArrayPool<char>.Shared.Rent(InitialReadAllBufferSize);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These might start out as Array.Empty, not sure if it's a meaningful improvement. It would avoid renting arrays for empty output/error.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect an empty std out/err to be rare compared to having some output. And ArrayPool was optimized so many times it's really fast to get those arrays (especially when it compared to process startup time and the read sys-calls)

}

// Cannot use stackalloc in an iterator method; use a regular array.
Interop.PollEvent[] pollFds = new Interop.PollEvent[2];
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: PollForPipeActivity could stackalloc this.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: PollForPipeActivity could stackalloc this.

I am not sure if this would be possible, as we need to read from this buffer after PollForPipeActivity exits

}
else if (bytesRead == 0)
{
done = FlushDecoderAndEmitRemainingChars(decoder, ref charBuffer, ref charStart, ref charEnd, standardError, lines);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the method that is called always returns true, rather than have to look at the implementation to know that, it might be a void method and this can assign done = true after calling it. Same at the other call sites.

/// </param>
private static void SkipBomIfPresent(char[] charBuffer, int endIndex, ref int startIndex)
{
if (startIndex < endIndex && charBuffer[startIndex] == '\uFEFF')
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: I assume instead of startIndex < endIndex, this might be: Debug.Assert(startIndex == 0 && endIndex > startIndex);.


while (charEndIndex + charCount > charBuffer.Length)
{
RentLargerBuffer(ref charBuffer, charEndIndex);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consider passing to RentLargerBuffer the minimum required size rather than looping.

/// After line parsing, compacts remaining data to the front of the char buffer if it has reached
/// the end, or rents a larger buffer if the entire buffer is filled with a single incomplete line.
/// </summary>
private static void CompactOrGrowCharBuffer(ref char[] buffer, ref int startIndex, ref int endIndex)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't DecodeAndAppendChars do this compacting and growing already?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't DecodeAndAppendChars do this compacting and growing already?

@copilot answer this question

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, DecodeAndAppendChars handles compaction and growth for the decode step (ensuring there's room for newly decoded chars). CompactOrGrowCharBuffer handles compaction and growth for the line parsing step — after ParseLinesFromCharBuffer consumes complete lines, the remaining incomplete line data may need to be compacted or the buffer grown if a single line fills the entire buffer. They serve different phases: decode vs. post-parse.

/// <summary>
/// If the first character at <paramref name="startIndex"/> is the Unicode BOM (U+FEFF),
/// advances <paramref name="startIndex"/> past it. Called once per stream after the first
/// decode that produces characters, to match <see cref="StreamReader"/> BOM-stripping behavior.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is reproducing the StreamReader removing bytes while trying to detect encoding in:

That code is handling additional byte sequences. Should those also be handled?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before that, it first checks the Preamble:

// Check for preamble before detect encoding. This is not to override the
// user supplied Encoding for the one we implicitly detect. The user could
// customize the encoding which we will loose, such as ThrowOnError on UTF8
if (IsPreamble())

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is reproducing the StreamReader removing bytes while trying to detect encoding in:

runtime/src/libraries/System.Private.CoreLib/src/System/IO/StreamReader.cs

Line 471 in 333fe88

private void DetectEncoding()
That code is handling additional byte sequences. Should those also be handled?

@copilot please study the referenced code and check if other Process APIs that involve encoding do detect the encoding. For now just provide an answer.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StreamReader.DetectEncoding() handles byte-level BOM detection to auto-detect the encoding from raw bytes (UTF-8 BOM EF BB BF, UTF-16 LE FF FE, UTF-16 BE FE FF, UTF-32 LE FF FE 00 00, UTF-32 BE 00 00 FE FF). It changes the encoding and creates a new decoder when a BOM is found.

In the ReadAllLines/ReadAllLinesAsync path, the encoding is already known — it's explicitly set via StandardOutputEncoding/StandardErrorEncoding (or defaults to the system encoding). The Decoder is created from that known encoding. We only need to strip the character-level BOM (U+FEFF) that the decoder produces from byte-level BOMs, which is what SkipBomIfPresent does.

The other Process APIs (ReadAllText, ReadAllBytes, ReadAllTextAsync) also use explicitly-set encodings and don't perform byte-level encoding detection. So the current approach is consistent — no additional byte-level BOM handling is needed.

int charCount = decoder.GetCharCount(byteBuffer, byteIndex, byteCount, flush);
if (charCount == 0)
{
return;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetChars is never called, partial bytes are loss. either use

if (charCount == 0 && byteCount == 0) return

or simply remove this early return block since the loop (while (charEndIndex + 0 > ...)) is trivially skipped when charCount == 0.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetChars is never called, partial bytes are loss. either use

if (charCount == 0 && byteCount == 0) return
or simply remove this early return block since the loop (while (charEndIndex + 0 > ...)) is trivially skipped when charCount == 0.

@copilot address this feedback

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 7c196cc. The early return now checks charCount == 0 && byteCount == 0 so that GetChars is still called when there are input bytes (preserving partial byte sequences in the decoder's internal state).

[InlineData("utf-16", false)]
[InlineData("utf-32", true)]
[InlineData("utf-32", false)]
public async Task ReadAllLines_WorksWithNonDefaultEncodings(string encodingName, bool useAsync)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to test double bytes encodings too. It will be interesting with the decoder usage with the encoding.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to test double bytes encodings too. It will be interesting with the decoder usage with the encoding.

@copilot address this feedback and extend the test coverage with such encodings. Make sure the tests are passing locally before you push any changes

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added ReadAllLines_WorksWithMultiByteCharacters theory in 7c196cc. It uses CJK characters (U+4E16 U+754C = "世界") which require 3 bytes in UTF-8, 2 bytes in UTF-16, and 4 bytes in UTF-32, exercising multi-byte decoder state across reads for all three encodings with both sync and async paths.

Copy link
Copy Markdown
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot please address my feedback where I've tagged you directly

[InlineData("utf-16", false)]
[InlineData("utf-32", true)]
[InlineData("utf-32", false)]
public async Task ReadAllLines_WorksWithNonDefaultEncodings(string encodingName, bool useAsync)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to test double bytes encodings too. It will be interesting with the decoder usage with the encoding.

@copilot address this feedback and extend the test coverage with such encodings. Make sure the tests are passing locally before you push any changes

SafePipeHandle outputHandle = GetSafeHandleFromStreamReader(_standardOutput!);
SafePipeHandle errorHandle = GetSafeHandleFromStreamReader(_standardError!);

byte[] outputByteBuffer = ArrayPool<byte>.Shared.Rent(InitialReadAllBufferSize);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need separate byte buffers for output and error?

@copilot use a single byte buffer since we don't perform more than one read at once

byte[] outputByteBuffer = ArrayPool<byte>.Shared.Rent(InitialReadAllBufferSize);
byte[] errorByteBuffer = ArrayPool<byte>.Shared.Rent(InitialReadAllBufferSize);
char[] outputCharBuffer = ArrayPool<char>.Shared.Rent(InitialReadAllBufferSize);
char[] errorCharBuffer = ArrayPool<char>.Shared.Rent(InitialReadAllBufferSize);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect an empty std out/err to be rare compared to having some output. And ArrayPool was optimized so many times it's really fast to get those arrays (especially when it compared to process startup time and the read sys-calls)

}

// Cannot use stackalloc in an iterator method; use a regular array.
Interop.PollEvent[] pollFds = new Interop.PollEvent[2];
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: PollForPipeActivity could stackalloc this.

I am not sure if this would be possible, as we need to read from this buffer after PollForPipeActivity exits

int charCount = decoder.GetCharCount(byteBuffer, byteIndex, byteCount, flush);
if (charCount == 0)
{
return;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetChars is never called, partial bytes are loss. either use

if (charCount == 0 && byteCount == 0) return
or simply remove this early return block since the loop (while (charEndIndex + 0 > ...)) is trivially skipped when charCount == 0.

@copilot address this feedback

/// After line parsing, compacts remaining data to the front of the char buffer if it has reached
/// the end, or rents a larger buffer if the entire buffer is filled with a single incomplete line.
/// </summary>
private static void CompactOrGrowCharBuffer(ref char[] buffer, ref int startIndex, ref int endIndex)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't DecodeAndAppendChars do this compacting and growing already?

@copilot answer this question

/// <summary>
/// If the first character at <paramref name="startIndex"/> is the Unicode BOM (U+FEFF),
/// advances <paramref name="startIndex"/> past it. Called once per stream after the first
/// decode that produces characters, to match <see cref="StreamReader"/> BOM-stripping behavior.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is reproducing the StreamReader removing bytes while trying to detect encoding in:

runtime/src/libraries/System.Private.CoreLib/src/System/IO/StreamReader.cs

Line 471 in 333fe88

private void DetectEncoding()
That code is handling additional byte sequences. Should those also be handled?

@copilot please study the referenced code and check if other Process APIs that involve encoding do detect the encoding. For now just provide an answer.

…, add multi-byte encoding tests

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/c08a9de7-b817-42f5-9272-0d54c0a69ad3

Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants