JIT: Enable CHECK_IR earlier#127425
Conversation
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Move CHECK_IR enablement up one phase past PHASE_MORPH_IMPBYREF. - Default GTF_EXCEPT on in gtNewCallNode; clear for no-throw helpers in both gtNewHelperCallNode overloads. - Remove GT_QMARK !op1->CanCSE() assertion from fgDebugCheckFlags (per user direction). Not yet clean on aspnet2: 312 missing, 457 extra, 1 USEASG. Total assertions went from 945 -> 769 (else-if redistribution).
- fginline: after RET_EXPR substitution, run gtUpdateStmtSideEffects on the owning statement to refresh stale ancestor side-effect flags. - gentree: GT_RET_EXPR now sets GTF_CALL in OperRequiresCallFlag so that gtUpdateStmtSideEffects does not strip GTF_CALL from nested RET_EXPR nodes (the fginline walker uses GTF_CALL to decide whether to recurse). - lclmorph: when marking a call with GTF_CALL_M_RETBUFFARG_LCLOPT, also set GTF_ASG on the call, matching OperRequiresAsgFlag. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rofile this-arg - indirectcalltransformer.cpp ClonabilityVisitor: when the scout substitutes a RET_EXPR with its gtSubstExpr in a subsequent statement, remember it and call gtUpdateStmtSideEffects on the affected statement so ancestor STORE/ COMMA nodes no longer carry stale GTF_CALL. - fgprofile.cpp class profile instrumentation: after replacing the this-arg with a COMMA(stores, this), propagate GTF_ALL_EFFECT from the COMMA to the owning call so GTF_ASG/GTF_CALL flags stay consistent. - fginline.cpp LateDevirtualization: mark m_substitutedRetExpr = true after a successful gtSplitTree so the outer walker refreshes side effects on m_curStmt (safety belt for devirt-triggered substitutions). aspnet2 replay failures: 150 -> 106. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
fgRetypeImplicitByRefArgs retypes implicit byref args from TYP_STRUCT to TYP_BYREF but does not rewrite existing LCL_FLD/STORE_LCL_FLD references into those args. Those references are rewritten later in fgMorphExpandImplicitByRefArg during PHASE_MORPH_GLOBAL. During the intervening window the GTF_VAR_USEASG flag that physical promotion set on partial stores based on the old (struct) size no longer matches IsPartialLclFld (computed from the new byref size), which trips the post-phase CHECK_IR assert. Advance CHECK_IR enablement one more phase (to before PHASE_IMPBYREF_COPY_OMISSION) so the check first fires on the IR entering that phase, and temporarily suppress CHECK_IR across PHASE_MORPH_IMPBYREF where the IR is transiently inconsistent by design. CHECK_IR is restored for PHASE_MORPH_GLOBAL and all subsequent phases. Full SuperPMI replay (all 11 MCHs) is now clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
LocalAddressVisitor can rewrite IND(FIELD_ADDR(LCL_VAR)) into a LCL_FLD, which removes GTF_EXCEPT (and other effect bits) from the rewritten node. Ancestor nodes retain the stale effect flags, which tripped CHECK_IR once it was enabled earlier in the phase pipeline. Call gtUpdateStmtSideEffects once per modified statement to recompute ancestor effect flags. This matches the pattern already used by other phases that mutate IR shape (e.g. class-profile instrumentation). On a full SuperPMI replay this reduces 'Extra flags' asserts ~88% and 'Missing flags' asserts ~80% across all collections. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…truction With CHECK_IR enabled earlier in the phase pipeline (before PHASE_MORPH_IMPBYREF) the diagnostic check fires in several places where the importer constructs calls or intrinsics without accurately carrying over child side-effect flags. Fix the underlying IR construction sites: * importercalls.cpp: six sites that propagate arg/this/fptr flags to a call via `& GTF_GLOB_EFFECT` drop `GTF_ORDER_SIDEEFF`. Use `GTF_ALL_EFFECT` so volatile indirections and other ordering-sensitive side effects are carried onto the call node. * importercalls.cpp: `RuntimeHelpers.IsKnownConstant` is modelled as a `GT_INTRINSIC` that `OperRequiresCallFlag` treats as call-implemented. Explicitly set `GTF_CALL` on the constructed node (matching the pattern already used for other user-call intrinsics) so the check-IR invariant holds until morph folds the node to 0/1. * gentree.cpp (`gtNewSimdBinOpNode` small-integer shift path): after `fgMakeMultiUse` rewrites the existing `CreateScalar` operand into a COMMA containing a STORE_LCL_VAR, propagate the new child's side-effect flags onto the parent HWINTRINSIC. Measured on SuperPMI full replay: Missing-flags diagnostics: 1828 -> 10 (total across 11 MCHs) No asm diffs expected; the flags were already semantically required. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ndow PHASE_MORPH_IMPBYREF retypes implicit byref args from TYP_STRUCT to TYP_BYREF but does not rewrite existing LCL_FLD/STORE_LCL_FLD references into them; that rewrite happens later in fgMorphExpandImplicitByRefArg during PHASE_MORPH_GLOBAL. In the window between those two phases, the GTF_VAR_USEASG flag on such nodes was set for the original TYP_STRUCT size and may not match IsPartialLclFld computed against the new TYP_BYREF size, tripping the fgDebugCheckFlags invariant when CHECK_IR runs before PHASE_MORPH_GLOBAL. Replace the previous broad `activePhaseChecks &= ~CHECK_IR` suppression with a narrow predicate: a new debug-only `fgImplicitByRefLclFldsStale` flag gates the USEASG/IsPartialLclFld check so it is skipped only for STORE_LCL_FLD/LCL_FLD nodes whose local is an implicit-byref now typed TYP_BYREF. All other checks continue to run during the window. The flag must be true when the post-phase check fires at the end of PHASE_MORPH_IMPBYREF, so set it before DoPhase and clear it after PHASE_MORPH_GLOBAL. SuperPMI full replay: USEASG diagnostic count 309 max/MCH -> 0 across all 11 MCHs. No other assertion counts regress. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PHASE_MORPH_GLOBAL rewrites the stale LCL_FLD/STORE_LCL_FLD references to the now-TYP_BYREF implicit byref args (via fgMorphExpandImplicitByRefArg), so by the time that phase's post-phase CHECK_IR runs the IR is consistent again. Clear fgImplicitByRefLclFldsStale before DoPhase(PHASE_MORPH_GLOBAL) instead of after it, so the full USEASG/IsPartialLclFld invariant check runs at the end of morph rather than being relaxed. SuperPMI full replay: USEASG count remains 0 across all 11 MCHs; no new assertions surfaced by tightening the scope. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
fgCompactBlock merges a target block into a predecessor, preserving the set of flags listed in BBF_COMPACT_UPD from the target. The list did not include BBF_RECURSIVE_TAILCALL, so when an earlier inline pass split a block holding a recursive tail call and later passes compacted the subsequent empty tail back into a new predecessor, the merged block lost BBF_RECURSIVE_TAILCALL even though the recursive tail call statement itself ended up in that block. With CHECK_IR now running before PHASE_MORPH_GLOBAL, this trips the `block->HasFlag(BBF_RECURSIVE_TAILCALL)` invariant for recursive tailcall nodes in fgDebugCheckFlags. Add BBF_RECURSIVE_TAILCALL to BBF_COMPACT_UPD so the merged block retains the annotation. SuperPMI full replay: BBF_RECURSIVE_TAILCALL assertion count 7 max/MCH (8 occurrences) -> 0 across all 11 MCHs. No other categories regress. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…MD MinMax fixup Three related fixes where IR construction/rewrite sites introduced GTF_ASG, GTF_CALL, or GTF_EXCEPT on a descendant without updating ancestor flags, triggering CHECK_IR 'Missing flags on tree' when that check runs before PHASE_MORPH_IMPBYREF: 1. HandleHistogramProbeInstrumentor::Instrument (class-profile instrumentation): wrap the existing per-statement walk so that when a probe is actually inserted, we call gtUpdateStmtSideEffects on the statement. The inserter rewrites 'this' args with COMMA(STORE, COMMA(helper-call, LCL_VAR)), introducing new ASG/CALL/EXCEPT flags that must be propagated up the statement. Remove the earlier narrower fix at the inserter that only updated the immediate CALL's flags. 2. ValueInstrumentor::Instrument (value-profile instrumentation): when the length arg is rewritten to COMMA(helper-call, LCL_VAR), OR the new arg's effect flags onto the parent CALL so GTF_ASG surfaces there. 3. gtNewSimdMinMaxNode FixupScalar path: after fgMakeMultiUse rewrites op2 (a CreateScalarUnsafe) into a COMMA(STORE, LCL_VAR) and that COMMA is re-attached to the MaxScalar/MinScalar HWINTRINSIC, propagate the new COMMA's effect flags onto the HWINTRINSIC so GTF_ASG is reflected. Reduces max 'Missing flags' count per MCH from 4 to 3 across 11 MCHs in SuperPMI replay, and reduces affected MCHs from 4 to 3. No regressions in Extra/USEASG/tailcall categories. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… instrumentation The initial fix only updated the direct enclosing call, but the new GTF_ASG introduced by the wrapping COMMA(profile-helper, load) also needs to propagate up to ancestors (e.g., RETURN). Use the same gtUpdateStmtSideEffects pattern applied for HandleHistogramProbeInstrumentor: track per-statement whether any probe was inserted, and refresh the whole statement's side-effect flags when so. Eliminates the final Missing-flags CHECK_IR assertion (libraries_tests.run context #389863, StartsWith[long]) and brings the Missing category to 0 across all SuperPMI replay MCHs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…r swap In gtNewSimdBinOpNode's Shift expansion, fgMakeMultiUse on op2->Op(1) wraps the shift count in COMMA(STORE, LCL_VAR). GTF_ASG was OR'd onto op2 (CreateScalar) immediately. For GT_RSH the subsequent std::swap pulls the COMMA out of op2 into shiftCountDup (used under maskAmountOp), leaving op2->Op(1) as a plain LCL_VAR with no side effects. The OR on op2 is now stale, yielding an Extra GTF_ASG on CreateScalar. Move the OR to after the swap so we pick up flags of whatever ends up as op2->Op(1) — the COMMA for non-RSH, the plain LCL_VAR (no extra flags) for RSH. Eliminates the Extra GTF_ASG on CreateScalar across Vector*.Shift* contexts and drops replay max Extra count from 109 to 101, and from 10 to 9 affected MCHs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two CHECK_IR Extra-flag patterns surface when CHECK_IR is enabled before PHASE_MORPH_IMPBYREF: 1. fgInsertInlineeArgument bashes single-use temp LCL_VARs in-place to the original arg expression. Parents of the bashed node (NULLCHECK, FIELD_ADDR, ARR_LENGTH, ...) keep stale GTF_EXCEPT when the new child cannot fault (e.g. IND with GTF_IND_NONNULL/NONFAULTING). Fix: after argument insertion, walk all inlinee statements and call gtUpdateStmtSideEffects so ancestor flags are recomputed. Also: at nullcheck creation time, consult the original arg node alongside the temp; if the original is non-null, skip the redundant nullcheck entirely. 2. Forward Substitution only refreshed side effects when the substituted node had GTF_ALL_EFFECT. But replacing a possibly-null local use with a known non-null expression (e.g. IND of static-readonly init field) removes a potential fault from ancestors like ARR_LENGTH and FIELD_ADDR. Fix: always refresh side effects on the receiving statement after substitution. Full SuperPMI replay is now clean across all 11 MCH collections. Asm diffs show 3 size improvements (-10 bytes, -0.63% PerfScore) and no regressions, reflecting elimination of redundant null/fault checks. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Draft for now since I want to see CI's read on TP. There is a full inlinee tree walk that may need to be more surgical. |
There was a problem hiding this comment.
Pull request overview
Moves CHECK_IR earlier in the CoreCLR JIT phase pipeline so IR flag invariants (notably side-effect flags) are validated before morph, and adds targeted flag/side-effect refreshes to keep IR consistent during earlier phases.
Changes:
- Enable
PhaseChecks::CHECK_IRearlier (pre implicit-byref copy-omission), and scope a debug-only relaxation for a transient implicit-byrefUSEASGinvariant violation. - Refresh statement/node side-effect flags after various IR rewrites (local morphing, forward substitution, inlining RET_EXPR substitution/splitting, PGO probe insertion, GDV cloning prep).
- Adjust/propagate effect flags more precisely (e.g., use
GTF_ALL_EFFECTpropagation, ensure certain nodes carry requiredGTF_CALL/GTF_ASG/GTF_EXCEPTflags).
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/jit/lclmorph.cpp | Refresh stmt side effects after local-address rewrites; mark retbuf-local-optimized calls with GTF_ASG. |
| src/coreclr/jit/indirectcalltransformer.cpp | Update stmt side effects after substituting clonable GT_RET_EXPR nodes. |
| src/coreclr/jit/importercalls.cpp | Propagate GTF_ALL_EFFECT into call flags; ensure specific intrinsic placeholder carries GTF_CALL. |
| src/coreclr/jit/gentree.cpp | Treat GT_RET_EXPR as requiring GTF_CALL; default calls to GTF_EXCEPT; propagate effects in some SIMD builders. |
| src/coreclr/jit/forwardsub.cpp | Always recompute receiving statement side-effect flags after forward substitution. |
| src/coreclr/jit/fgprofile.cpp | Recompute stmt side effects after histogram/value probe insertion mutates call argument trees. |
| src/coreclr/jit/fginline.cpp | Recompute stmt side effects after RET_EXPR substitution/splitting; adjust nullcheck decision; refresh inlinee stmt side effects after arg insertion. |
| src/coreclr/jit/fgdiagnostic.cpp | Relax USEASG invariant check during implicit-byref retype window; remove GT_QMARK CSE assert. |
| src/coreclr/jit/compiler.hpp | Clear GTF_EXCEPT for helper calls proven NoThrow now that calls default to GTF_EXCEPT. |
| src/coreclr/jit/compiler.h | Add debug flag fgImplicitByRefLclFldsStale to scope check relaxation. |
| src/coreclr/jit/compiler.cpp | Enable CHECK_IR earlier; bracket implicit-byref stale window with fgImplicitByRefLclFldsStale. |
| src/coreclr/jit/block.h | Include BBF_RECURSIVE_TAILCALL in BBF_COMPACT_UPD set. |
Comments suppressed due to low confidence (1)
src/coreclr/jit/gentree.cpp:21459
- In this path fgMakeMultiUse can rewrite op2->Op(1) to a COMMA(tempStore, lcl), and for GT_RSH the subsequent std::swap can replace that operand with a plain LCL_VAR. Using
op2->gtFlags |= ...only ever adds flags and does not clear any stale effects op2 may still be carrying from its pre-rewrite operand, which can trip CHECK_IR as an "extra flags" failure. After mutating/swapping the operand, please recompute op2's effect flags (e.g., via gtUpdateNodeSideEffects(op2) or equivalent) rather than OR-ing bits in.
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
|
There is a decent TP hit in minopts, possibly from just the more "surgical" flag repairs. Will have to dig in and see what's up. |
…g shrinks LocalAddressVisitor::VisitStmt was calling gtUpdateStmtSideEffects whenever `m_stmtModified` was set, which triggered a full pre+post tree walk for every modified statement. Most lclmorph rewrites (e.g. local-address canonicalization at MorphLocalAddress) do not actually remove any side-effect bits, so the refresh is unnecessary in the common case. Track a separate `m_stmtSideEffectsShrunk` flag and only refresh ancestor flags when a rewrite is known to drop side effects (BashToConst, nullcheck NOP, IndirTransform::Nop, MorphLocalIndir's IND->LCL, address-comparison constant fold, and the promoted-struct-field replacements). For the common MorphLocalAddress site, only mark shrunk when the address node actually had side-effect bits before being reset to GTF_EMPTY. This restores the lclmorph throughput cost on Tier0 from +9.3% to under +1% on the aspnet2 minopts subset, and brings the overall Tier0 throughput delta on that subset from +1.65% down into noise (+0.05%). SPMI replay is clean and asmdiffs are unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR moves CHECK_IR (detailed IR invariant checking, especially flag correctness) earlier in the JIT phase sequence and updates multiple transforms to keep side-effect flags consistent as IR is rewritten before/around morphing.
Changes:
- Enable
PhaseChecks::CHECK_IRearlier (before implicit-byref copy omission analysis) and relax one debug-only invariant during the implicit-byref retyping window. - Refresh statement side-effect flags (
gtUpdateStmtSideEffects) after various rewrites/substitutions that can add/removeGTF_*effects. - Tighten/adjust effect flag propagation in call/import and selected SIMD/HWIntrinsic node construction.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/jit/lclmorph.cpp | Tracks when local-morph rewrites may stale ancestor effect flags and refreshes stmt side effects accordingly; marks retbuf-as-local calls as GTF_ASG. |
| src/coreclr/jit/indirectcalltransformer.cpp | Updates stmt side effects when substituting GT_RET_EXPR during clonability scanning. |
| src/coreclr/jit/importercalls.cpp | Propagates GTF_ALL_EFFECT (not just GTF_GLOB_EFFECT) into call flags; marks IsKnownConstant intrinsic as call-like pre-morph. |
| src/coreclr/jit/gentree.cpp | Adjusts operators requiring call flags; makes new calls default to GTF_EXCEPT; fixes up effect propagation for certain SIMD construction paths. |
| src/coreclr/jit/forwardsub.cpp | Always refreshes side-effect flags on the target statement after forward substitution. |
| src/coreclr/jit/fgprofile.cpp | Refreshes stmt side effects after instrumentation inserts COMMA-wrapped helpers/temps. |
| src/coreclr/jit/fginline.cpp | Refreshes side-effect flags after RET_EXPR substitution/splitting; adds null-check decision refinement; refreshes side effects across inlinee statements after arg insertion. |
| src/coreclr/jit/fgdiagnostic.cpp | Relaxes USEASG vs IsPartialLclFld checking during implicit-byref retyping window; updates required call-flag set. |
| src/coreclr/jit/compiler.hpp | Clears GTF_EXCEPT for no-throw helpers now that call nodes default to GTF_EXCEPT. |
| src/coreclr/jit/compiler.h | Adds a debug-only flag to scope the implicit-byref “stale LCL_FLD” invariant relaxation. |
| src/coreclr/jit/compiler.cpp | Moves enabling of CHECK_IR earlier; scopes the implicit-byref stale window around relevant phases. |
| src/coreclr/jit/block.h | Ensures BBF_RECURSIVE_TAILCALL is preserved when compacting blocks. |
When LocalAddressVisitor rewrites an address tree to a known non-null LCL_ADDR, ancestor indirections that previously held GTF_EXCEPT due to a potentially-null address can have stale flags. The narrowing introduced in 98c29c7 only refreshed when the address itself had side effects, missing this case and causing CHECK_IR 'Extra flags on tree' assertions in the libraries.pmi MCH (e.g., context 126507) once the check runs after Allocate Objects + Local morph. Always set m_stmtUpdateSideEffects = true here so the per-statement gtUpdateStmtSideEffects pass at the end of the visitor recomputes ancestor flags. The other narrowing sites in lclmorph.cpp remain narrow (they handle cases where the rewritten subtree's faulting potential does not change). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…LocalAddress The previous commit refreshed unconditionally inside MorphLocalAddress, but only the ProcessIndirection wide-indir caller has a parent indir whose GTF_EXCEPT can be made stale by collapsing the address to a known non-null LCL_ADDR. The EscapeAddress caller has no such parent, so the original narrow condition (refresh only when the address tree itself had effects) is sufficient there. Restores the MinOpts portion of the lclmorph throughput improvement: aspnet2 MinOpts goes from +0.45% back to +0.26% (matching the pre-correctness-fix measurement) while keeping replay clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Moves detailed IR invariant checking (notably effect/flag consistency) earlier in the JIT phase order, and updates/repairs effect-flag propagation in several transformations so CHECK_IR can run sooner without false positives.
Changes:
- Enable
PhaseChecks::CHECK_IRearlier (pre-morph) and scope a debug-only relaxation for the implicit-byrefUSEASG/IsPartialLclFldinvariant during the transient retyping window. - Add/expand
gtUpdateStmtSideEffectsrefreshes (and some new flag propagation) after IR rewrites that can add/removeGTF_ASG/GTF_CALL/GTF_EXCEPT. - Adjust call/intrinsic flag initialization and effect propagation (e.g., default
GTF_EXCEPTon calls; propagateGTF_ALL_EFFECTin more places).
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/jit/lclmorph.cpp | Tracks when local-morph rewrites can invalidate ancestor effect flags and refreshes statement side effects accordingly; ensures retbuf-as-local calls carry GTF_ASG. |
| src/coreclr/jit/indirectcalltransformer.cpp | Refreshes statement side effects when substituting GT_RET_EXPR during GDV chaining analysis. |
| src/coreclr/jit/importercalls.cpp | Uses GTF_ALL_EFFECT propagation for indirect-call/control expressions and args; marks certain intrinsics with GTF_CALL. |
| src/coreclr/jit/gentree.cpp | Makes call nodes default to GTF_EXCEPT; extends OperRequiresCallFlag to GT_RET_EXPR; fixes missing effect propagation in some SIMD/HWI construction paths. |
| src/coreclr/jit/forwardsub.cpp | Always refreshes receiving statement side effects after forward substitution to avoid stale effect flags. |
| src/coreclr/jit/fgprofile.cpp | Refreshes statement side effects after PGO probe insertion when instrumentation modifies call argument trees. |
| src/coreclr/jit/fginline.cpp | Refreshes statement side effects after RET_EXPR substitution/splitting and after inline argument insertion (currently via full inlinee walk). |
| src/coreclr/jit/fgdiagnostic.cpp | Relaxes USEASG/IsPartialLclFld debug invariant during the implicit-byref transient window; removes one GT_QMARK CSE assert. |
| src/coreclr/jit/compiler.hpp | Clears GTF_EXCEPT for NoThrow helpers now that calls default to GTF_EXCEPT. |
| src/coreclr/jit/compiler.h | Adds debug-only fgImplicitByRefLclFldsStale flag to scope invariant relaxation. |
| src/coreclr/jit/compiler.cpp | Enables CHECK_IR earlier and toggles fgImplicitByRefLclFldsStale around implicit-byref retyping/morph window; removes later enabling. |
| src/coreclr/jit/block.h | Includes BBF_RECURSIVE_TAILCALL in BBF_COMPACT_UPD so compaction preserves this block property. |
…ng occurred After commit 6c6fbb3, fgInlinePrependStatements unconditionally walks every statement of every inlinee block and calls gtUpdateStmtSideEffects. This walk is only required when fgInsertInlineeArgument bashed a single-use LCL_VAR temp in place (the path that calls argSingleUseNode->ReplaceWith). Other paths (gtNewTempStore, gtUnusedValNode, gtTryRemoveBoxUpstreamEffects) do not mutate pre-existing inlinee body trees and therefore cannot leave stale side-effect flags on inlinee parents. Track whether any in-place bash occurred via a bool out-parameter from fgInsertInlineeArgument and skip the refresh loop entirely when none did. Throughput on aspnet2: FullOpts (full): -0.64% Tier1 subset: -0.77% Replay clean and no asm diffs across all 11 SPMI collections. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
TP impact is now down to about 0.5% worst case: diffs Not sure how much lower it can go. We may be able to claw some of this back when optimizing, eg simplifying some of the extra checks in forward sub. |
- fgdiagnostic.cpp: tighten the implicit-byref-stale guard to OperIs(GT_STORE_LCL_FLD). GT_LCL_FLD cannot reach the GT_STORE_LCL_VAR/GT_STORE_LCL_FLD switch arm, and GT_STORE_LCL_VAR is whole-local so its USEASG/IsPartialLclFld relationship is not affected by the implicit-byref retyping window. - gentree.cpp gtNewSimdMinMaxNode: in the MIN !isNumber/!isScalar branch the inner block had a stray unconditional overwrite of needsFixup. The first assignment (IsVectorZero) was the wrong predicate for this case; the comment block above states fixup is needed when cns is -0. Collapse the two blocks into a single else that sets needsFixup from IsVectorNegativeZero(simdBaseType), matching the intent. - fginline.cpp SubstitutePlaceholdersAndDevirtualizeWalker: rename m_substitutedRetExpr to m_needsSideEffectRefresh. The flag is set both when a RET_EXPR is substituted and when LateDevirtualization performs gtSplitTree; the new name reflects the actual role. Replay clean across all 11 SPMI collections. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR moves PhaseChecks::CHECK_IR earlier in the JIT phase list (starting before implicit-byref copy omission analysis) and then adjusts several IR-mutating transforms to keep side-effect/flag invariants valid at the earlier check point. It also introduces a debug-only “stale implicit byref LCL_FLD” window to scope a temporary relaxation of the USEASG invariant.
Changes:
- Enable
CHECK_IRearlier and relax one debug invariant during the implicit-byref retype-to-rewrite window. - Add targeted
gtUpdateStmtSideEffectsrefreshes after rewrites that can add/removeGTF_*effect flags. - Adjust call/intrinsic flag propagation and defaults so
GTF_*summaries are consistent earlier.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/jit/lclmorph.cpp | Tracks when rewrites can stale ancestor effect flags and refreshes statement side effects. |
| src/coreclr/jit/indirectcalltransformer.cpp | Refreshes statement side effects if GT_RET_EXPR is substituted during clonability scouting. |
| src/coreclr/jit/importercalls.cpp | Switches effect propagation to GTF_ALL_EFFECT and annotates IsKnownConstant intrinsic with GTF_CALL. |
| src/coreclr/jit/gentree.cpp | Tightens/aligns required flags (e.g., GT_RET_EXPR call-flag requirement), adjusts call defaults, and fixes effect propagation in some SIMD helpers. |
| src/coreclr/jit/forwardsub.cpp | Always refreshes side-effect flags on the receiving statement after substitution. |
| src/coreclr/jit/fgprofile.cpp | Refreshes stmt side effects when probe insertion changes call subtree effects. |
| src/coreclr/jit/fginline.cpp | Adds conditional side-effect refresh after placeholder substitution/splits and after in-place arg temp bashing. |
| src/coreclr/jit/fgdiagnostic.cpp | Scopes a debug-only relaxation for USEASG checks during implicit-byref rewrite window; removes a GT_QMARK CSE assertion. |
| src/coreclr/jit/compiler.hpp | Updates helper-call flagging to clear GTF_EXCEPT for no-throw helpers (given new call defaults). |
| src/coreclr/jit/compiler.h | Adds debug-only fgImplicitByRefLclFldsStale state and updates inline argument helper signature. |
| src/coreclr/jit/compiler.cpp | Moves CHECK_IR enablement earlier and toggles the implicit-byref stale window around relevant phases. |
| src/coreclr/jit/block.h | Ensures BBF_RECURSIVE_TAILCALL is preserved during block compaction flag updates. |
| m_compiler->lvaSetHiddenBufferStructArg(lclNum); | ||
| escapeAddr = false; | ||
| callUser->gtCallMoreFlags |= GTF_CALL_M_RETBUFFARG_LCLOPT; | ||
| callUser->gtFlags |= GTF_ASG; |
| // op1 is not a known constant, we'll do the expansion in morph | ||
| retNode = new (this, GT_INTRINSIC) GenTreeIntrinsic(TYP_INT, op1, ni, method R2RARG(*entryPoint)); | ||
| retNode->gtFlags |= GTF_CALL; |
We only do a detailed check of some IR invariants (notably flags) after morph. Start moving the enabling earlier in the phase list (here, by two phases).
Allow a brief window between morphing implicit byrefs and morph where USEASG is not correct. We possibly should revisit the division of labor here.