
Optimize HashSet intersecting #125256

Open
pentp wants to merge 1 commit into dotnet:main from pentp:hashset-intersect

pentp (Contributor) commented Mar 6, 2026

Optimize HashSet<T> intersection/subset/overlap calculations, especially for worst case scenarios.

| HashSet<int> | Overlap | Mean (main) | Error | Mean (PR) | Error |
| --- | ---: | ---: | ---: | ---: | ---: |
| IntersectWithEnumerable | 0 | 19,929.2 ns | 296.9 ns | 6,321.7 ns | 62.7 ns |
| IntersectWithEnumerable | 50 | 13,488.4 ns | 125.7 ns | 5,558.4 ns | 67.5 ns |
| IntersectWithEnumerable | 100 | 5,198.2 ns | 36.2 ns | 4,543.1 ns | 81.2 ns |
| IntersectWithHashSetWithSameComparer | 0 | 3,799.0 ns | 29.6 ns | 3,507.3 ns | 69.4 ns |
| IntersectWithHashSetWithSameComparer | 50 | 2,993.8 ns | 31.3 ns | 2,885.2 ns | 30.2 ns |
| IntersectWithHashSetWithSameComparer | 100 | 2,317.4 ns | 21.9 ns | 2,346.4 ns | 43.8 ns |
| IsSupersetOfHashSetWithSameComparer | 0 | 89.5 ns | 1.0 ns | 36.7 ns | 0.4 ns |
| IsSupersetOfHashSetWithSameComparer | 50 | 144.0 ns | 1.8 ns | 42.6 ns | 0.6 ns |
| IsSupersetOfHashSetWithSameComparer | 100 | 16,266.0 ns | 188.2 ns | 1,328.3 ns | 14.2 ns |
| OverlapsHashSetWithSameComparer | 0 | 1,500.0 ns | 10.6 ns | 1,125.5 ns | 11.7 ns |
| OverlapsHashSetWithSameComparer | 50 | 6.6 ns | 0.1 ns | 42.0 ns | 0.4 ns |
| OverlapsHashSetWithSameComparer | 100 | 6.6 ns | 0.2 ns | 21.3 ns | 0.3 ns |
| SymmetricExceptWithEnumerable | 0 | 7,691.9 ns | 90.7 ns | 7,318.7 ns | 131.3 ns |
| SymmetricExceptWithEnumerable | 50 | 15,198.2 ns | 129.9 ns | 8,324.3 ns | 160.2 ns |
| SymmetricExceptWithEnumerable | 100 | 20,029.7 ns | 331.0 ns | 6,445.5 ns | 34.6 ns |

| HashSet<string> | Overlap | Mean (main) | Error | Mean (PR) | Error |
| --- | ---: | ---: | ---: | ---: | ---: |
| IntersectWithEnumerable | 0 | 37,776.2 ns | 330.0 ns | 10,412.0 ns | 113.2 ns |
| IntersectWithEnumerable | 50 | 26,304.8 ns | 319.6 ns | 10,375.7 ns | 84.3 ns |
| IntersectWithEnumerable | 100 | 10,489.1 ns | 147.4 ns | 9,782.3 ns | 80.9 ns |
| IntersectWithHashSetWithSameComparer | 0 | 10,848.6 ns | 186.7 ns | 3,975.3 ns | 78.6 ns |
| IntersectWithHashSetWithSameComparer | 50 | 9,199.0 ns | 159.9 ns | 3,731.3 ns | 40.3 ns |
| IntersectWithHashSetWithSameComparer | 100 | 7,353.5 ns | 112.7 ns | 3,575.3 ns | 49.1 ns |
| IsSupersetOfHashSetWithSameComparer | 0 | 127.8 ns | 2.4 ns | 27.7 ns | 0.4 ns |
| IsSupersetOfHashSetWithSameComparer | 50 | 222.6 ns | 1.8 ns | 34.3 ns | 0.4 ns |
| IsSupersetOfHashSetWithSameComparer | 100 | 28,581.9 ns | 402.8 ns | 2,150.9 ns | 26.5 ns |
| OverlapsHashSetWithSameComparer | 0 | 5,600.5 ns | 92.6 ns | 1,295.9 ns | 14.7 ns |
| OverlapsHashSetWithSameComparer | 50 | 24.7 ns | 0.5 ns | 41.9 ns | 0.6 ns |
| OverlapsHashSetWithSameComparer | 100 | 27.2 ns | 0.3 ns | 17.6 ns | 0.2 ns |
| SymmetricExceptWithEnumerable | 0 | 13,148.2 ns | 246.2 ns | 11,817.7 ns | 140.1 ns |
| SymmetricExceptWithEnumerable | 50 | 27,093.3 ns | 168.6 ns | 13,416.3 ns | 163.8 ns |
| SymmetricExceptWithEnumerable | 100 | 40,047.7 ns | 508.4 ns | 11,062.4 ns | 94.7 ns |

Benchmark code

[GenericTypeArguments(typeof(int))] // value type
[GenericTypeArguments(typeof(string))] // reference type
public class Bench<T>
{
    public const int Size = 512;

    [Params(0, 50, 100)]
    public int Overlap;

    private HashSet<T> _mainSet;
    private T[][] _otherKeys;
    private HashSet<T>[] _otherSets;

    [GlobalSetup]
    public void Setup()
    {
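        // 2 * Size unique values: the first half fills the main set, the second half is disjoint from it.
        var all = ValuesGenerator.ArrayOfUniqueValues<T>(Size * 2);
        var keys = all.AsSpan(0, Size).ToArray();
        _mainSet = new(keys);

        // 0% overlap: no element in common with the main set.
        var keys0 = all.AsSpan(Size, Size).ToArray();
        var rnd = new Random(42);
        // 100% overlap: the same elements as the main set, in a different order.
        var keys100 = keys.AsSpan().ToArray();
        rnd.Shuffle(keys100);

        // ~50% overlap: a random half of all values, so about half of them come from the main set.
        rnd.Shuffle(all);
        var keys50 = all.AsSpan(0, Size).ToArray();

        // Indexed by Overlap / 50 in the benchmarks below.
        _otherKeys = [keys0, keys50, keys100];
        _otherSets = Array.ConvertAll(_otherKeys, x => new HashSet<T>(x));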
    }

    [Benchmark]
    public HashSet<T> IntersectWithHashSetWithSameComparer()
    {
        var hashSet = new HashSet<T>(_mainSet);
        hashSet.IntersectWith(_otherSets[Overlap / 50]);
        return hashSet;
    }

    [Benchmark]
    public HashSet<T> IntersectWithEnumerable()
    {
        var hashSet = new HashSet<T>(_mainSet);
        hashSet.IntersectWith(_otherKeys[Overlap / 50]);
        return hashSet;
    }

    [Benchmark]
    public HashSet<T> SymmetricExceptWithEnumerable()
    {
        var hashSet = new HashSet<T>(_mainSet);
        hashSet.SymmetricExceptWith(_otherKeys[Overlap / 50]);
        return hashSet;
    }

    [Benchmark]
    public bool IsSupersetOfHashSetWithSameComparer() => _mainSet.IsSupersetOf(_otherSets[Overlap / 50]);

    [Benchmark]
    public bool OverlapsHashSetWithSameComparer() => _mainSet.Overlaps(_otherSets[Overlap / 50]);
}

Copilot AI review requested due to automatic review settings March 6, 2026 06:00
dotnet-policy-service bot added the community-contribution label Mar 6, 2026

dotnet-policy-service (Contributor) commented

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.


Copilot AI left a comment


Pull request overview

This PR optimizes HashSet<T> set operations (intersection, subset, overlap, superset, symmetric except) by reducing unnecessary work in multiple ways: reusing stored hashcodes via a new Contains(ref Entry) overload, replacing Remove(value) with a more efficient RemoveAt(entries, index), using BitHelper.FindFirstUnmarked() to skip already-processed entries, and adding fast paths for Overlaps and IsSupersetOf when the other set is a HashSet<T> with compatible comparers. It also simplifies HashSetEqualityComparer by delegating to SetEquals.
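To make the "reusing stored hashcodes" point concrete, here is a minimal sketch of the idea. `ToySet`, `CountingComparer`, `FindByHash`, and `CountSharedWith` are hypothetical names written for illustration; they are not the `HashSet<T>` internals or the PR's `Contains(ref Entry)` overload. The sketch assumes both sets share one comparer, which is the only case where a hash code stored in one set is valid for a lookup in the other.

```csharp
using System;
using System.Collections.Generic;

var counting = new CountingComparer();
var main = new ToySet(counting, capacity: 8);
var other = new ToySet(counting, capacity: 8);
foreach (string s in new[] { "a", "b", "c", "d" }) main.Add(s);
foreach (string s in new[] { "c", "d", "e" }) other.Add(s);

int before = counting.HashCalls;
int shared = main.CountSharedWith(other);
Console.WriteLine($"shared={shared}, extra hash calls={counting.HashCalls - before}");

// Ordinal string comparer that counts how often a hash code is computed.
sealed class CountingComparer : IEqualityComparer<string>
{
    public int HashCalls;
    public bool Equals(string x, string y) => string.Equals(x, y);
    public int GetHashCode(string obj) { HashCalls++; return StringComparer.Ordinal.GetHashCode(obj); }
}

// Hypothetical fixed-capacity chained hash set (no resizing, no removal).
sealed class ToySet
{
    private struct Entry { public int HashCode; public int Next; public string Value; }

    private readonly IEqualityComparer<string> _comparer;
    private readonly int[] _buckets;   // 1-based index of the chain head; 0 means empty
    private readonly Entry[] _entries; // Add throws if more than 'capacity' items are inserted
    private int _count;

    public ToySet(IEqualityComparer<string> comparer, int capacity)
    {
        _comparer = comparer;
        _buckets = new int[capacity];
        _entries = new Entry[capacity];
    }

    public bool Add(string value)
    {
        int hash = _comparer.GetHashCode(value);
        if (FindByHash(value, hash)) return false;
        ref int bucket = ref _buckets[(uint)hash % (uint)_buckets.Length];
        _entries[_count] = new Entry { HashCode = hash, Next = bucket - 1, Value = value };
        bucket = ++_count;
        return true;
    }

    // Lookup that takes an already-computed hash code instead of hashing 'value' again.
    private bool FindByHash(string value, int hash)
    {
        int i = _buckets[(uint)hash % (uint)_buckets.Length] - 1;
        while (i >= 0)
        {
            ref Entry e = ref _entries[i];
            if (e.HashCode == hash && _comparer.Equals(e.Value, value)) return true;
            i = e.Next;
        }
        return false;
    }

    // Counts elements of 'other' that are also in this set, reusing other's stored hash codes.
    public int CountSharedWith(ToySet other)
    {
        if (!ReferenceEquals(_comparer, other._comparer))
            throw new ArgumentException("Stored hash codes are only reusable with the same comparer.");
        int shared = 0;
        for (int i = 0; i < other._count; i++)
            if (FindByHash(other._entries[i].Value, other._entries[i].HashCode)) shared++;
        return shared;
    }
}
```

In this sketch the comparison pass computes no new hash codes, so the program prints `shared=2, extra hash calls=0`. The saving matters most when hashing is expensive, which may be part of why the `HashSet<string>` numbers in the description improve more than the `HashSet<int>` ones.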

Changes:

  • Added Contains(ref Entry), RemoveAt, OverlapsHashSetWithSameComparer, and enhanced IsSubsetOfHashSetWithSameComparer/IntersectWithHashSetWithSameComparer to reuse pre-computed hashcodes when effective comparers match, falling back to standard Contains(value) otherwise.
  • Added BitHelper.TryMarkBit, IsUnmarked, FindFirstUnmarked methods and optimized ToIntArrayLength to support more efficient bit-marking operations in set algorithms.
  • Simplified HashSetEqualityComparer.Equals to delegate to SetEquals, changed EqualityComparersAreEqual/EffectiveEqualityComparersAreEqual to include fast reference-equality checks, and removed the EffectiveComparer property.
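As a rough illustration of the bit-marking helpers named in the list above, the sketch below implements a "try mark" and a "find first unmarked" operation over a plain `uint[]`. The names mirror the ones in this review, but the signatures, the `uint[]` storage, and the `-1` return value are assumptions; the actual `BitHelper` members in the PR may look different.

```csharp
using System;
using System.Numerics;

const int length = 70;                         // number of tracked entries
var bits = new uint[(length + 31) / 32];       // one bit per entry

foreach (int i in new[] { 0, 1, 2, 3, 5 }) TryMarkBit(bits, i);
bool again = TryMarkBit(bits, 5);              // bit 5 is already marked
int first = FindFirstUnmarked(bits, 0, length);
int next = FindFirstUnmarked(bits, 5, length);
Console.WriteLine($"{again} {first} {next}");

// Sets the bit and reports whether it was previously unmarked (hypothetical signature).
static bool TryMarkBit(uint[] bits, int index)
{
    uint mask = 1u << (index & 31);
    ref uint word = ref bits[index >> 5];
    bool wasUnmarked = (word & mask) == 0;
    word |= mask;
    return wasUnmarked;
}

// Returns the first unmarked index in [start, length), or -1 if none (hypothetical signature).
static int FindFirstUnmarked(uint[] bits, int start, int length)
{
    for (int i = start; i < length; )
    {
        // Invert so unmarked bits become 1, then drop the bits below 'i' within this word.
        uint candidates = ~bits[i >> 5] & (uint.MaxValue << (i & 31));
        if (candidates != 0)
        {
            int found = (i & ~31) + BitOperations.TrailingZeroCount(candidates);
            return found < length ? found : -1;
        }
        i = (i & ~31) + 32; // whole word is marked: jump to the next word
    }
    return -1;
}
```

This prints `False 4 6`. Scanning a word at a time lets a removal pass skip runs of marked entries instead of testing each index individually.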

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/libraries/Common/src/System/Collections/Generic/BitHelper.cs | Added TryMarkBit, IsUnmarked, FindFirstUnmarked methods; optimized ToIntArrayLength to use uint arithmetic |
| src/libraries/System.Private.CoreLib/src/System/Collections/Generic/HashSet.cs | Core optimizations: new Contains(ref Entry) and RemoveAt overloads, optimized IsSupersetOf/Overlaps fast paths, enhanced internal methods to reuse hashcodes, simplified comparer equality checks |
| src/libraries/System.Private.CoreLib/src/System/Collections/Generic/HashSetEqualityComparer.cs | Simplified Equals to delegate to x.SetEquals(y) instead of manual O(N²) comparison |

}

- return true;
+ return x.SetEquals(y);
Copilot AI commented Mar 6, 2026

Changing from the old per-element comparison with EqualityComparer<T>.Default to x.SetEquals(y) introduces a behavioral change for HashSetEqualityComparer when the two sets use different comparers.

Previously, elements were compared using EqualityComparer<T>.Default regardless of either set's comparer. Now, x.SetEquals(y) uses x's comparer. This can produce different results and also breaks the IEqualityComparer<T> contract: Equals(x, y) may not equal Equals(y, x) when x and y have different comparers, and GetHashCode (which still uses T.GetHashCode()) may be inconsistent with Equals when a custom comparer considers elements equal that the default comparer doesn't.

That said, the old code also had a bug: the O(N²) loop only checked that every element of y was in x, without checking the reverse, so Equals could return true for sets of different sizes. The new code is more correct in that regard. If this behavioral change is intentional, it would be worth documenting in the PR description.
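The asymmetry described above can be reproduced with the public API alone. This is a small sketch using two sets with different comparers; the variable names are illustrative, and the result shown is what released .NET versions produce for `SetEquals`.

```csharp
using System;
using System.Collections.Generic;

var ordinal = new HashSet<string>(StringComparer.Ordinal) { "a", "A" };
var ignoreCase = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "a" };

// Evaluated with ignoreCase's comparer: "a" and "A" both match its single element.
bool viaIgnoreCase = ignoreCase.SetEquals(ordinal);

// Evaluated with ordinal's comparer: "A" has no counterpart in { "a" }.
bool viaOrdinal = ordinal.SetEquals(ignoreCase);

Console.WriteLine($"{viaIgnoreCase} {viaOrdinal}");
```

This prints `True False`, so an `Equals(x, y)` that simply returns `x.SetEquals(y)` depends on argument order whenever the two sets disagree about element equality.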

pentp (Contributor, Author) commented

It's probably fundamentally impossible to make this class work with mismatching comparers. Like Copilot says, the old behavior was pretty broken and SetEquals doesn't fully solve it. We could consider changing it to only consider sets with matching comparers (and change GetHashCode also then - for example currently ignore-case sets will have mismatching hashcodes even if the contents differ only in case).
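A short sketch of the ignore-case hash code problem mentioned here, using the comparer returned by `HashSet<string>.CreateSetComparer()`. The hash values themselves vary per process because string hashing is randomized, so the example prints them rather than assuming specific numbers.

```csharp
using System;
using System.Collections.Generic;

var setComparer = HashSet<string>.CreateSetComparer();
var lower = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "abc" };
var upper = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "ABC" };

// Both sets use the same ignore-case comparer, so they contain "the same" element.
bool sameContents = lower.SetEquals(upper);

// The set comparer hashes elements with their default hash codes, which are case-sensitive,
// so these two values are expected to differ even though the sets compare equal.
int lowerHash = setComparer.GetHashCode(lower);
int upperHash = setComparer.GetHashCode(upper);

Console.WriteLine($"SetEquals: {sameContents}");
Console.WriteLine($"Hash codes: {lowerHash} vs {upperHash}");
```

If the class were restricted to sets with matching comparers, `GetHashCode` would presumably need to hash elements with that comparer for the two values above to agree.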

pentp (Contributor, Author) commented

@stephentoub what do you think would be most appropriate here?

Labels: area-System.Collections, community-contribution