GVN: invalidate moved-from droppable locals on Operand::Move#155296
GVN: invalidate moved-from droppable locals on Operand::Move#155296blackms wants to merge 1 commit intorust-lang:mainfrom
Conversation
`simplify_operand` was treating `Operand::Move(p)` and `Operand::Copy(p)`
identically and leaving the recorded `VnIndex` of `p.local` in place even after
the value had been consumed. A later aggregate rvalue with the same `VnIndex`
was then rewritten to `Operand::Copy(p.local)` by `visit_assign`, and
`StorageRemover` downgraded the original `Move` to a `Copy` as well, turning a
single move into a double-use of memory the callee had already dropped.
The reproducer is a pair of `FnOnce`-consumed `(Vec::new(),)` tuples, both of
which receive the same aggregate `VnIndex` because `Vec::new()` lowers to a
deterministic aggregate; GVN then made the second `call_once(move _7)` into
`call_once(copy _5)` after the first call had already taken ownership of `_5`.
This patch records `simplify_operand`'s `Move` branch separately and, when the
moved place's type `needs_drop`, drops everything GVN knows about the backing
local via a new `invalidate_local` helper. Calls reach this path through
`super_terminator` -> `visit_operand`, so terminator-arg moves are covered with
no extra plumbing. `!needs_drop` types are left untouched: for them `Move` and
`Copy` are observationally equivalent in post-borrowck MIR, so the existing
unification (e.g. moves of `&mut T`, `*mut T`, plain integer aggregates) is
preserved.
Three pre-existing mir-opt tests lose the unsound aggregate-to-copy rewrite as
a consequence:
- `tests/mir-opt/pre-codegen/try_identity.rs` (`old` desugaring): the
`Ok(x?)` identity is no longer collapsed into `_0 = copy _1` for the
generic `Result<T, E>`. The `new` desugaring is unaffected.
- `tests/mir-opt/gvn_copy_aggregate.rs::remove_storage_dead`: the rebuilt
`AlwaysSome::<T>::Some(_)` is no longer rewritten to `_0 = copy _3`.
- `tests/mir-opt/early_otherwise_branch_unwind.rs`: `EarlyOtherwiseBranch`
now emits a fresh `discriminant` read on the cleanup edge instead of
reusing the cached one.
The new tests pin the fix down:
- `tests/ui/mir/gvn-fnonce-miscompile.rs` runs the issue's MCVE under
`-Copt-level=3 -Zmir-enable-passes=+GVN` and asserts the second closure
sees a fresh, empty `Vec`.
- `tests/mir-opt/gvn_move_invalidation.rs` exercises the post-fix MIR shape
for both the droppable (no rewrite) and plain (constant-fold) cases.
|
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt |
|
I don't think this fix is correct; see #155241 (comment). |
|
Thanks for taking a look. I think there may be a mix-up: issue #155241 ended up containing two independent miscompiles, and the comment you linked is about the second one (@theemathas's This PR addresses the issue's first MCVE — the I verified each fix closes the MCVE it targets in isolation. If you have a specific case that reproduces on a tree with both PRs applied, please share — I haven't stress-tested combinations that might hit a third shape, and would want to know before this one goes further. |
|
What test case is fixed by this PR that is not fixed by #155291? |
|
So far there seems to be no indication that #155241 is a bug in GVN. Changing GVN likely merely works around the symptoms but not the cause of the bug. An LLM would of course not notice this, this requires a human to actually understand the problem. This is exactly why LLM-driven contributions are dangerous and generally unsuited for new contributors that are inexperienced in the codebase. |
|
Hi, we have seen that you've generated a lot of PRs recently. I'm going to ask that you slow down on opening PRs for the time being. Several of these are in very tricky areas of the compiler which can easily be wrong. And it's easy to overwhelm reviewers. Furthermore, recently we have seen waves of automated contributions that are not adequately self-reviewed, which is why we are being more cautious. Separately, there are concerns about #155296 and the potential for unsoundness (as mir-opts are often a source of unsoundness). Rather than try to engage in the PR thread about that PR, it would be better to discuss it on Zulip to talk through the change and why it may or may not be correct. Additionally, several reviewers have expressed concerns with these assisted contributions:
I am thus going to close the existing PRs from you that exhibit these characteristics that cause frustration with reviewers. If you would like to continue contributing:
Caution In my capacity as a venue moderator, I am hereby placing a temporary embargo on creation of additional new PRs from you before reviewers develop more confidence that you are self-reviewing and understand your contributions. The temporary embargo will be lifted if the reviewers have sufficient confidence in your contributions. This is a moderation warning. Before the temporary embargo is lifted, any further PRs from your account will be closed. If new PRs continue to be raised from your account before so, further moderation action may be applied. You may contact the moderation team to discuss your warning and temporary embargo. See our policy and contribution etiquette. |
Fixes #155241 (the MIR-GVN half of the issue; the
extern "rust-call"const-arg half is handled in #155291).Problem
simplify_operandwas treatingOperand::Move(p)andOperand::Copy(p)identically and leaving the recordedVnIndexofp.localin place even after the value had been consumed. A later aggregate rvalue with the sameVnIndexwas then rewritten toOperand::Copy(p.local)byvisit_assign, andStorageRemoverdowngraded the originalMoveto aCopyas well, turning a single move into a double-use of memory the callee had already dropped.The reproducer in the issue is a pair of
FnOnce-consumed(Vec::new(),)tuples, both of which receive the same aggregateVnIndexbecauseVec::new()lowers to a deterministicAggregatervalue. GVN then made the secondcall_once(move _7)intocall_once(copy _5)after the first call had already taken ownership of_5, leading to either anunreachable!()panic or heap corruption (SIGABRT on stable 1.94.1) at-Copt-level=2+.Fix
This patch records
simplify_operand'sMovebranch separately and, when the moved place's typeneeds_drop, drops everything GVN knows about the backing local via a newinvalidate_localhelper. Calls reach this path throughsuper_terminator -> visit_operand, so terminator-arg moves are covered with no extra plumbing.!needs_droptypes are intentionally left alone: for themMoveandCopyare observationally equivalent in the post-borrowck MIR GVN runs on, so the existing unification (e.g. moves of&mut T,*mut T, plain integer aggregates) is preserved.Soundness sketch
After
_5 = move _xGVN no longer believes_5holds a re-derivable value for any locally-trackedVnIndex, so it cannot rewrite a future_y = some_aggregate_with_same_vninto_y = copy _5. The invalidation is gated onneeds_dropbecause that is precisely when the post-move source is left in an "unspecified state": the callee may drop / move-out of the slot, and the original drop glue for_5may have been elided by drop elaboration (which ran before GVN).The narrower alternative — invalidate only at
TerminatorKind::Callarguments — was considered and rejected: the same unsoundness arises within a single basic block when a partial move (_x = move _5.field) is followed by another aggregate with the sameVnIndexas_5's, andStorageRemovercan no longer rescue that pattern.Test plan
run-passtesttests/ui/mir/gvn-fnonce-miscompile.rsruns the MCVE under-Copt-level=3 -Zmir-enable-passes=+GVNand asserts the second closure sees a fresh, emptyVec.tests/mir-opt/gvn_move_invalidation.rsexercises the post-fix MIR shape for both the droppable (no rewrite) and plain (constant-fold) cases.tests/mir-opt/pre-codegen/try_identity.rs(olddesugaring): theOk(x?)identity is no longer collapsed into_0 = copy _1for the genericResult<T, E>. Thenewdesugaring is unaffected.tests/mir-opt/gvn_copy_aggregate.rs::remove_storage_dead: the rebuiltAlwaysSome::<T>::Some(_)is no longer rewritten to_0 = copy _3.tests/mir-opt/early_otherwise_branch_unwind.rs:EarlyOtherwiseBranchnow emits a freshdiscriminantread on the cleanup edge instead of reusing the cached one.x test tests/mir-opt,tests/ui/mir,tests/ui/borrowck,tests/ui/closures,tests/ui/drop,tests/codegen-llvm,tests/assembly-llvmall pass onaarch64-apple-darwin --stage 1.x test tests/ui --stage 1made it past 13200/21004 cases with no failures before being interrupted for time.End-to-end verification
rustc 1.94.1:-Copt-level=2exit 134 (SIGABRT).-Copt-level=3 -Zmir-enable-passes=+GVNexit 133.Open questions
The following are genuine soundness / policy questions that survive this fix; flagging them for the mir-opt reviewers rather than trying to address them in this PR:
StorageRemover::visit_operandalso gate itsMove -> Copydowngrade on!needs_drop? This PR stops GVN from amplifying that downgrade into unsoundness, but the downgrade itself remains a footgun for any future pass that adds itself toreused_locals. Probably warrants a separate hardening PR.PassMode::Indirect { mode: ByMutRef, .. }) can be mutated by the callee. Today GVN ignores this entirely. This PR doesn't make that worse, but it's a real soundness gap that should be tracked separately.simplify_aggregate_to_copystill synthesisesOperand::Copy(place)from a place computed bytry_as_placewithout a proper liveness check. Theinvalidate_localmechanism here keeps the rewrite honest from the GVN side, but a principled liveness analysis would let us drop theneeds_dropheuristic and unify even droppable aggregates when it's safe — natural transition path to the "Option B" explored in the investigation.r? mir-opt
LLM disclosure: this PR was bootstrapped by a Claude-based coding agent team that read the GVN pass end-to-end, formed the root-cause hypothesis, and drafted the initial patch and tests. I then validated the work manually: reproduced the bug on system rustc 1.94.1 and on stage0, ran
x testacross mir-opt, ui/mir, ui/borrowck, ui/closures, ui/drop, codegen-llvm, and assembly-llvm onaarch64-apple-darwin --stage 1to confirm there are no regressions beyond the three pre-existing tests documented above, and reviewed the diff line-by-line before pushing. Happy to answer any question about the reasoning or rewrite anything that reads as machine-y.