Skip to content

Optimize string.Split(char, ...)#125379

Open
hamarb123 wants to merge 12 commits intodotnet:mainfrom
hamarb123:main33
Open

Optimize string.Split(char, ...)#125379
hamarb123 wants to merge 12 commits intodotnet:mainfrom
hamarb123:main33

Conversation

@hamarb123
Copy link
Contributor

@hamarb123 hamarb123 commented Mar 10, 2026

In some cases string.Split(char, ...) was slower (significantly) than the single string seperator overload - this PR optimizes the char overload/s to have much closer performance (still a small gap in 1 case) - the single-char overloads are the most common case, so I have added the additional optimisation just for that for now - benchmarks below.

Main benchmark that was the focus - single char splitter with rare/no seperators (17% improvement on arm, 47% improvement on x64) - char almost matches or outperforms string now (could probably get it to match by adjusting the branches carefully - I can implement if wanted, but it will probably make it less readable):

X64 (AMD):
image

Arm:
image

Other cases have not regressed meaningfully (within margin of error) - some have even possibly improved slightly on x64: EgorBot/Benchmarks#28

Copilot AI review requested due to automatic review settings March 10, 2026 12:26
@hamarb123 hamarb123 marked this pull request as draft March 10, 2026 12:26
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Mar 10, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes string.Split(char, ...) by adding a dedicated fast path for the most common scenario: splitting on a single separator character, including a new vectorized implementation for scanning separator positions.

Changes:

  • Added a single-separator (separators.Length == 1) specialized path in MakeSeparatorListAny.
  • Introduced a new MakeSeparatorListVectorized(..., char c) overload to vectorize scanning for a single separator char.
  • Adjusted the existing 2–3 separator path to assume separators.Length >= 2 (since the 1-separator case is now handled earlier).

@hamarb123

This comment was marked as resolved.

@hamarb123
Copy link
Contributor Author

hamarb123 commented Mar 10, 2026

Seems to improve perf by ~25%, but there's still a substantial gap on X86 (would need another ~40% improvement to match string).

AMD:
image

Arm:
image

Intel:
image

…or non-existent, which is similar to what IndexOf does
@hamarb123

This comment was marked as outdated.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 6 comments.

Comment on lines +2049 to +2057
else if (separators.Length <= 1)
{
char sep0 = separators[0];
if (Vector128.IsHardwareAccelerated && source.Length >= Vector128<ushort>.Count * 2)
{
MakeSeparatorListVectorized(source, ref sepListBuilder, sep0);
return;
}

Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces a new single-separator fast path and a new vectorized implementation. Please add/extend tests that exercise string.Split(char, ...) for a single separator with input lengths around SIMD boundaries (e.g., below/at/above Vector128<ushort>.Count * 2, and similarly for 256/512 when available) to ensure correctness for long strings and to guard against out-of-range reads in vectorized paths.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot open a new pull request to apply changes based on this feedback and ensure the tests are also applicable considering the changes that have followed since.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth a try lol

@hamarb123

This comment was marked as outdated.

Copilot AI review requested due to automatic review settings March 10, 2026 14:12
@hamarb123

This comment was marked as outdated.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 9 comments.

@hamarb123 hamarb123 requested a review from Copilot March 10, 2026 14:26
@hamarb123
Copy link
Contributor Author

@MihuBot

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

@hamarb123
Copy link
Contributor Author

@EgorBot -amd -arm

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

// Test code from the C# Discord (not mine)

public class Bench
{
    [Benchmark]
    [Arguments("testttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt")]
    [Arguments("ttt,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdsat,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasds")]
    [Arguments("abcdef")] // mine
    [Arguments("abc,def")] // mine
    public string[] StringSplitByString(string str)
    {
        return str.Split(",");
    }

    [Benchmark]
    [Arguments("testttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt")]
    [Arguments("ttt,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdsat,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasdst,ts,afd,sadasdsad,sa,f,safasf,sd,fsa,f,sa,fd,s,a,dsa,d,sa,d,safs,a,,,das,dsafda,geadwadasdwad,a,sd,wa,s,d,wa,sd,wa,s,dw,asdw,a,sd,wa,d,wasds")]
    [Arguments("abcdef")] // mine
    [Arguments("abc,def")] // mine
    public string[] StringSplitByChar(string str)
    {
        return str.Split(',');
    }
}

Copilot AI review requested due to automatic review settings March 11, 2026 01:24
@hamarb123 hamarb123 marked this pull request as ready for review March 11, 2026 01:24
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/libraries/System.Private.CoreLib/src/System/SpanHelpers.Packed.cs:1147

  • These PackSources helpers were made internal to support new callers outside PackedSpanHelpers, but the file-level comment indicates this type exists to hide "private helpers". Consider updating the nearby commentary / documenting the intended usage constraints (e.g., only call when the relevant ISA is supported and CanUsePackedIndexOf has been validated) to avoid future misuse within the assembly.
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        [CompExactlyDependsOn(typeof(Avx512BW))]
        internal static Vector512<byte> PackSources(Vector512<short> source0, Vector512<short> source1)
        {
            Debug.Assert(Avx512BW.IsSupported);
            // Pack two vectors of characters into bytes. While the type is Vector256<short>, these are really UInt16 characters.
            // X86: Downcast every character using saturation.

@AndyAyersMS AndyAyersMS added the tenet-performance Performance related issue label Mar 11, 2026
Copilot AI review requested due to automatic review settings March 11, 2026 04:50
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-System.Runtime community-contribution Indicates that the PR has been added by a community member tenet-performance Performance related issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants