Use unchecked indexing in integer _fmt_inner to keep bounds checks elided#155289
Use unchecked indexing in integer _fmt_inner to keep bounds checks elided#155289blackms wants to merge 1 commit intorust-lang:mainfrom
_fmt_inner to keep bounds checks elided#155289Conversation
|
Some changes occurred in integer formatting cc @tgross35 |
| //@ compile-flags: -Copt-level=3 | ||
| // Regression test for https://github.com/rust-lang/rust/issues/152061. | ||
| // | ||
| // `impl fmt::Display` for integers lowered through `_fmt_inner` in | ||
| // `library/core/src/fmt/num.rs` used to leave a `panic_bounds_check` | ||
| // path in optimized LLVM IR when LLVM failed to propagate the | ||
| // `assume`-based range information across LTO / `opt-level=z` | ||
| // boundaries. The implementation was rewritten to use | ||
| // `get_unchecked{_mut}` for the buffer writes, so the `panic_bounds_check` | ||
| // path must not appear regardless of the optimizer's propagation. |
There was a problem hiding this comment.
This is testing opt-level=3 but the problematic case seems to be opt-level=z. You probably want to use revisions to handle both. (//@ revisions: opt-3 opt-z then //@ compile-flags[opt-3]: ... and the same for opt-z)
library/core/src/fmt/num.rs
Outdated
| // SAFETY: `offset + 4 <= buf.len()` by the asserts above, and | ||
| // `pair1, pair2 < 100` so every `DECIMAL_PAIRS` index is `< 200` | ||
| // which is the exact length of `DECIMAL_PAIRS`. Using unchecked | ||
| // indexing here avoids relying on LLVM to elide the bounds | ||
| // checks, which can regress under `opt-level=z` + fat LTO | ||
| // (see https://github.com/rust-lang/rust/issues/152061). |
There was a problem hiding this comment.
These safety comments are wordy, please try to distill them. The reasoning can be split out of the safety comment since it's not part of the invariant.
|
Could you post a quick before+after codegen demo? I'm on the fence about whether we should bother with this at all, is there an LLVM bug? The entire standard library is going to be a lot larger and slower if trivial bounds checks can't be elided. |
…elided The `_fmt_inner` helper in `core::fmt::num` wrote into its `MaybeUninit<u8>` buffer through regular `[]` indexing and then relied on surrounding `core::hint::assert_unchecked` hints to convince LLVM to elide the resulting bounds checks. Under `-Copt-level=z` with fat LTO (and other configurations where the range information fails to propagate) the hints were dropped and the optimizer left a `panic_bounds_check` call on the hot integer-formatting path. Switch the per-pair buffer writes to `get_unchecked_mut` / `get_unchecked` on both `buf` and `DECIMAL_PAIRS`. Each `unsafe` block carries a short rationale comment (why unchecked indexing) and a tight `SAFETY` comment stating the invariants (`offset + N <= buf.len()`, and the numeric bound on the `DECIMAL_PAIRS` index). The existing `assert_unchecked`s are retained because they still document the `offset` / `buf.len()` invariants and help downstream codegen for the rest of the routine. Also add a codegen-llvm regression test with `opt-3` and `opt-z` revisions that checks the `Display` path for `usize` and `u64` does not contain any `panic_bounds_check`, with a paired sanity check to make sure the symbol itself is still emitted by the compiler for a real out-of-bounds index so the `CHECK-NOT`s cannot pass vacuously.
18e150a to
d3b2e82
Compare
|
Thanks for the review @tgross35! Addressed all three points in the amended commit and collected before/after data on the "should we bother" question. 1. Test revisions (
|
| BEFORE | AFTER | |
|---|---|---|
panic_bounds_check call sites in LTO IR |
29 | 26 |
Calls with the MCVE -4, 20 / -2, 10 signature (from _fmt_inner) |
2 | 0 |
Standalone _fmt_inner (u32) function with personality ptr @rust_eh_personality |
1 | 0 |
BEFORE uses Homebrew's stable rustc 1.94.1 (LLVM 21); AFTER uses stage1 built from this branch. Same source, same cargo flags. The two panic_bounds_check(i64 -2, i64 10, …) and (i64 -4, i64 20, …) calls that appear in the BEFORE IR are exactly the add nsw i64 %24, -4 pattern from the issue's MCVE. They disappear AFTER. Unblocking inlining of _fmt_inner also removes the separate u32 _fmt_inner function with its eh personality.
Binary __TEXT is page-aligned so I don't have a "N bytes smaller" number to show — the change is too small to cross a page. The IR-level elimination is the real measurable effect.
Given that LLVM 22 already fixes this upstream, I'm fine with either direction: land this as a defensive source-level guarantee on a hot formatting path, or close it and let 1.95 + LLVM 22 resolve the user impact. If you'd prefer to wait for LLVM 22, no objection from me — the motivation is genuinely weaker than it looked when I first reproduced the bug on stable. Your call.
Fixes #152061.
The
_fmt_innerhelper incore::fmt::numwrites into itsMaybeUninit<u8>buffer through regular[]indexing and relies on surroundingcore::hint::assert_uncheckedhints to convince LLVM to elide the resulting bounds checks. Under-Copt-level=zwith fat LTO the hints were dropped by LLVM 21 and the optimizer left apanic_bounds_checkcall on the hot integer-formatting path, regressing 1.92 → 1.93. LLVM 22 (shipping with 1.95) fixes the propagation upstream, so this change is primarily defensive: it removes the reliance on LLVM's propagation ofassert_unchecked-derived ranges across the_fmt_innerinliner/LTO boundary, and makesfmt::Displayfor integers usable today for 1.93 / 1.94 users on fat-LTO no-panic builds without waiting for LLVM 22.This PR switches the per-pair buffer writes to
get_unchecked_mut/get_uncheckedon bothbufandDECIMAL_PAIRS, in three places:Each
unsafeblock has a short rationale comment and a tightSAFETY:comment stating just the invariants (offset + N <= buf.len(), and the numeric bound on theDECIMAL_PAIRSindex). Theassert_uncheckeds themselves are retained — they still document theoffset/buf.len()invariant of the function and help downstream codegen for the parts of the routine outside theseunsafeblocks.A codegen-llvm regression test is added under
tests/codegen-llvm/issues/withopt-3andopt-zrevisions, checking that theDisplaypath forusizeandu64does not contain anypanic_bounds_check, paired with a sanity check that confirmspanic_bounds_checkis still the symbol emitted for a real non-elidable index — so theCHECK-NOTs cannot pass vacuously. Note: a single-crate codegen test can only check the caller's IR; the LTO-conditional aspect of the original bug is out of reach for this harness. A before/after IR demo against a real no_std-style bin with fat LTO is included in the review reply.r? libs
LLM disclosure: this PR was roughly a 50/50 collaboration between me and a Claude-based coding agent team. Agent-side: read
library/core/src/fmt/num.rs, identified the three_fmt_innerbounds-check call sites from the issue's MCVE, drafted theget_uncheckedconversion with the surroundingSAFETYcomments, and drafted the codegen-llvm regression test. Human-side: reproduced the original bug locally (stable + fat LTO + opt-z), reviewed the diff, validated that the patched stage1 eliminates the twopanic_bounds_check(-4, 20)/(-2, 10)calls from the LTO IR, and opened the PR. Happy to answer any question about the reasoning or the reproduction steps.