Conversation
|
Tagging subscribers to this area: @dotnet/area-system-collections |
There was a problem hiding this comment.
Pull request overview
This PR optimizes HashSet<T> set operations (intersection, subset, overlap, superset, symmetric except) by reducing unnecessary work in multiple ways: reusing stored hashcodes via a new Contains(ref Entry) overload, replacing Remove(value) with a more efficient RemoveAt(entries, index), using BitHelper.FindFirstUnmarked() to skip already-processed entries, and adding fast paths for Overlaps and IsSupersetOf when the other set is a HashSet<T> with compatible comparers. It also simplifies HashSetEqualityComparer by delegating to SetEquals.
Changes:
- Added
Contains(ref Entry),RemoveAt,OverlapsHashSetWithSameComparer, and enhancedIsSubsetOfHashSetWithSameComparer/IntersectWithHashSetWithSameComparerto reuse pre-computed hashcodes when effective comparers match, falling back to standardContains(value)otherwise. - Added
BitHelper.TryMarkBit,IsUnmarked,FindFirstUnmarkedmethods and optimizedToIntArrayLengthto support more efficient bit-marking operations in set algorithms. - Simplified
HashSetEqualityComparer.Equalsto delegate toSetEquals, changedEqualityComparersAreEqual/EffectiveEqualityComparersAreEqualto include fast reference-equality checks, and removed theEffectiveComparerproperty.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
src/libraries/Common/src/System/Collections/Generic/BitHelper.cs |
Added TryMarkBit, IsUnmarked, FindFirstUnmarked methods; optimized ToIntArrayLength to use uint arithmetic |
src/libraries/System.Private.CoreLib/src/System/Collections/Generic/HashSet.cs |
Core optimizations: new Contains(ref Entry) and RemoveAt overloads, optimized IsSupersetOf/Overlaps fast paths, enhanced internal methods to reuse hashcodes, simplified comparer equality checks |
src/libraries/System.Private.CoreLib/src/System/Collections/Generic/HashSetEqualityComparer.cs |
Simplified Equals to delegate to x.SetEquals(y) instead of manual O(N²) comparison |
| } | ||
|
|
||
| return true; | ||
| return x.SetEquals(y); |
There was a problem hiding this comment.
Changing from the old per-element comparison with EqualityComparer<T>.Default to x.SetEquals(y) introduces a behavioral change for HashSetEqualityComparer when the two sets use different comparers.
Previously, elements were compared using EqualityComparer<T>.Default regardless of either set's comparer. Now, x.SetEquals(y) uses x's comparer. This can produce different results and also breaks the IEqualityComparer<T> contract: Equals(x, y) may not equal Equals(y, x) when x and y have different comparers, and GetHashCode (which still uses T.GetHashCode()) may be inconsistent with Equals when a custom comparer considers elements equal that the default comparer doesn't.
That said, the old code also had a bug: the O(N²) loop only checked that every element of y was in x, without checking the reverse, so Equals could return true for sets of different sizes. The new code is more correct in that regard. If this behavioral change is intentional, it would be worth documenting in the PR description.
There was a problem hiding this comment.
It's probably fundamentally impossible to make this class work with mismatching comparers. Like Copilot says, the old behavior was pretty broken and SetEquals doesn't fully solve it. We could consider changing it to only consider sets with matching comparers (and change GetHashCode also then - for example currently ignore-case sets will have mismatching hashcodes even if the contents differ only in case).
There was a problem hiding this comment.
@stephentoub what do you think would be most appropriate here?
Optimize
HashSet<T>intersection/subset/overlap calculations, especially for worst case scenarios.HashSet<int>HashSet<string>Benchmark code