Allow O2K_CONST_VEC to handle some TYP_SIMD32/64 #127390
tannergooding wants to merge 3 commits into dotnet:main
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Pull request overview
This PR is a follow-up to the introduction of `O2K_CONST_VEC`, extending assertion propagation so it can handle common `TYP_SIMD32/64` (`Vector256`/`Vector512` on xarch) constants when they are broadcasts of the low `TYP_SIMD16` lane, avoiding the need for heap allocation while still covering core constants like `Zero`, `AllBitsSet`, and `Create(cns)`.
Changes:
- Teach `optCreateAssertion` to accept `TYP_SIMD32/64` `GT_CNS_VEC` constants when they are lane-broadcasts, storing only the low 16 bytes in the assertion payload.
- Extend `optAssertionGenJtrue` handling to include `Vector256`/`Vector512` equality/inequality intrinsics on xarch.
- Teach `optConstantAssertionProp` to reconstruct `TYP_SIMD32/64` constants by broadcasting the stored `simd16_t` into the upper lanes.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/coreclr/jit/compiler.h | Updates/clarifies the O2K_CONST_VEC storage comment to document the “store v128, support wider via broadcast” approach. |
| src/coreclr/jit/assertionprop.cpp | Implements broadcast-detection for TYP_SIMD32/64 constant assertions (xarch), expands recognized SIMD equality intrinsics, and reconstructs widened constants during propagation. |
// sizes in the future by adding m_encodedVconFlags and tracking
// whether a different heap allocated value was used or other
// special cases like Indices or Sequences
simd16_t m_simdVal;
Do you think it's worth the complexity to detect SIMD32/SIMD64 values that we can fit into `simd16_t`, or should we just add `simd_t*` to this union and allocate it on the heap? Fewer changes + it will support all kinds of constants.
I would assume it is and may even be worthwhile to do for VEC_CNS as well. -- This is based on past throughput and allocation considerations we saw for VN and other phases.
Particularly on xarch, we are most typically hitting the TYP_SIMD32 path for hardware and so this would mean a lot more constants are heap allocating when they don't need to be.
We'd have to have an `if (size > 16) { speciallyHandleTheHeapAlloc }` anyways with such a scenario, so this approach shouldn't be adding extra overhead. Rather, it just optimizes and avoids the allocation for the most common cases. -- and I'd actually expect assertion prop doesn't care about most constants, particularly "arbitrary" ones, so it may be unimportant to ever add the heap allocation support.
We may even want to consider doing the same for GT_CNS_VEC so that we aren't forced to have a large node for all constants and to avoid needing to do repeat checks like "is this a broadcast".
@tannergooding I am not sure I follow. Your PR tries to fit `CNS_VEC` for SIMD32 and SIMD64 into `simd16_t` in `AssertionDsc2` for cases where the 16-byte components are the same. I just don't see why, when we can support all kinds of constants naturally; just that SIMD32 and SIMD64 will require a small arena allocation, because 99% of assertions are used for non-SIMD stuff and we don't want to increase the struct size for this niche case (pay to play). Basically, I suggest this change: https://github.com/EgorBo/runtime-1/pull/new/jit/wide-simd-assertions (just vibe-coded, probably can be simplified). I doubt we will see any noticeable regression anywhere (both TP and alloc size).
Example, my commit can fold this:
Vector256<long> Test(ref Vector256<long> v)
{
    var a = v;
    if (a == Vector256.Create(1, 2, 3, 4))
        return a + a; // folded into [2,4,6,8]
    return default;
}

while yours cannot.
> I just don't see why if we can support all kinds of constants naturally, just that SIMD32 and SIMD64 will require a small arena allocation
That would work too. The downside is that it's forcing an allocation for the most common cases when we can trivially avoid it for the majority scenario (most SIMD constants are broadcastable or repeating the V128 value).
Based on what we had seen in VN, there were enough simd32/64 constants that this added up and pessimized throughput. We fixed that by splitting it out into separate maps. We had similarly seen even in GenTreeVecCon and other areas that the cost of the size check and doing different things for each vector size was significantly less than always touching a full cache line.
I would expect that the "optimization" this PR is doing is still beneficial even with your full fix and that we may even want to do something similar for GenTreeVecCon itself to reduce the amount of data that has to be checked for the most common patterns/optimizations.
I'm fine with waiting for the "full fix" to go in and then comparing SPMI metrics or factoring it into this PR so we can compare and choose. Thoughts?
Assertions are created only for `x VectorX_op_Equality cns` today - do we really have so many of them to have an impact on TP?
It's small "now" but may matter more as this expands. It can be useful for optimizing out a lot of edge cases for floating-point, conversions, and other scenarios when we know a value is within a given constraint.
Up to you; https://github.com/EgorBo/runtime-1/pull/new/jit/wide-simd-assertions involves fewer changes and generates -8,949 bytes on Windows-x64 locally
Do you have numbers on the throughput (I'm guessing near 0%) and allocation impact?
Feel free to submit it as a PR to grab real numbers. I'm 100% sure TP is 0; the alloc rate might be visible (>0) but not big enough to care about, IMO
FYI: this PR produced a -3,260 byte diff on win-x64 and the `simd_t*` approach gives -8,949
Extracted it to #127401 to see the metrics and get full differences. We can take either or both, depending on what the numbers report.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Pull request overview
Extends JIT assertion propagation for O2K_CONST_VEC so it can cover common TYP_SIMD32/64 constants by treating them as a repeated/broadcast TYP_SIMD16 payload when all 128-bit lanes match. This builds on the existing vector-constant assertion kind to enable more constant propagation opportunities without introducing heap allocation for large vector constants.
Changes:
- Teach assertion creation (`optCreateAssertion`) to accept `TYP_SIMD32/64` `GT_CNS_VEC` when the value is a repeated `TYP_SIMD16` across all lanes (xarch).
- Teach constant assertion propagation (`optConstantAssertionProp`) to materialize `TYP_SIMD32/64` constants by broadcasting the stored `simd16_t` payload (xarch).
- Enable assertion generation for `Vector256`/`Vector512` equality/inequality intrinsics in `optAssertionGenJtrue` (xarch).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/coreclr/jit/compiler.h | Updates O2K_CONST_VEC storage comment to document the “store v128, support v256/v512 via broadcast” approach. |
| src/coreclr/jit/assertionprop.cpp | Adds lane-equality checks for TYP_SIMD32/64 constants and broadcasts the stored simd16_t when propagating back to TYP_SIMD32/64 uses; expands supported vector equality/inequality intrinsics. |
Closing in favor of #127402. The heap allocations are indeed not meaningful at scale and it's better to handle all constants.
This provides O2K_CONST_VEC support for all SIMD constants by heap allocating for values larger than TYP_SIMD16. It is an alternative to #127390.

Co-authored-by: EgorBo <egorbo@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Follow up to #127124. It handles certain `TYP_SIMD32/64` values by checking if they repeat the same `TYP_SIMD16` value across all "lanes". This allows handling many of the most common cases such as `Zero`, `AllBitsSet`, `Create(cns)`, etc.

There are some remaining cases like `Indices`, `CreateSequence(...)`, and others that aren't handled yet and might be good to support in the future, but that can be expanded as needed by having a set of flags for the vector constant case of the union.