fix(dist): propagate v2 manifest checksum failure#4830
fix(dist): propagate v2 manifest checksum failure#4830cachebag wants to merge 2 commits intorust-lang:mainfrom
Conversation
…g "unchanged" When the v2 channel manifest's `.toml` did not match its published `.sha256`, `try_update_from_dist_` mapped `RustupError::ChecksumFailed` to `Ok(None)`. Callers interpret `Ok(None)` as "no update available", so `rustup update` printed only a `warn:` line, reported the toolchain as `unchanged`, and exited 0, even though the manifest could not be integrity-checked.
|
Reposting for grammar/spelling errors: Worth noting that this also tightens behaviour against static.rust-lang.org itself, where the v2 swallow was historically there to absorb brief mismatches during releases (cf. #3390). If you prefer, I think it would make sense to gate this on |
|
@cachebag Thanks for your investigation! As you have investigated already, I think the original behavior is somewhat intentional in that if the manifest update failed with checksum mismatch, it usually indicates an inconsistent state within the release server itself and in most of the cases it should be considered that the new Rust release is still in progress in the official release server.|
... and no, I don't want to make official release server that different in this sense; if the release server were to cause rustup to fail, the type of release server shouldn't have mattered. Swallowing the error in that routine completely is absolutely not ideal, but this does looks like the right thing to do so maybe we can try it first and see how it goes... In the worst of the cases we can downgrade that to a warning 🤔 |
|
Could we talk to infra about if such inconsistent states still happen and if they could be mitigated on their side somehow? |
@ChrisDenton I think we can try (cc @marcoieni) but since the error has been swallowed since a long time ago I doubt we can get any real feedback since. |
|
Could/should we retry it a few times and really fail if it still fails? |
|
@djc I think we could add retries but if this is mostly not a client-side problem then it should not help very much... Thinking about it, if each time we get the same "wrong" checksum then we do get more sure that this is a server-side problem, so retrying does have some benefits after all. |
|
@rami3l Also, I just want to mention that retries wouldn't have helped my case. The mirror was consistently serving stale content, so all retries would fail identically. |
|
@cachebag I suggest that we wait for a bit and see how the infra people relies. IMHO as you said this is most likely a server-side problem so it is hard to say whether issuing multiple retries will actually turns a failure into a success, what I added was just that the user could be more sure of the server-side problem if they happen to get the same shasum every time. |
Our nightly CI in a project I am working on started failing intermittently after we bumped MSRV. a bunch of agents were still on the old toolchain even though rustup update stable was reporting unchanged and exiting 0.
On one agent, rustc --version was a full release behind what our Artifactory mirror was serving. See below:
This PR drops the arm in
try_update_from_dist_so the error propagates.