[CI] Explain Flaky Failures #641
Conversation
This patch adds functionality to the premerge advisor to explain away flaky failures. For this, we just check that the observed failure has been occurring over a sufficiently long range of commits, in our case 200. Pull Request: llvm#641
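In rough terms, the check amounts to something like the following sketch (hypothetical names only; FailureSighting, is_likely_flaky, and FLAKY_COMMIT_THRESHOLD are illustrative, not the actual premerge-advisor code):

```python
from dataclasses import dataclass

# Assumed threshold from the description above: the failure must span
# at least this many commits before we call it flaky.
FLAKY_COMMIT_THRESHOLD = 200


@dataclass
class FailureSighting:
    commit_position: int  # Monotonic commit number where the failure was observed.
    message: str          # The exact failure message text.


def is_likely_flaky(sightings: list[FailureSighting]) -> bool:
    """Return True if identical failures span at least FLAKY_COMMIT_THRESHOLD commits."""
    if len(sightings) < 2:
        return False
    oldest = min(s.commit_position for s in sightings)
    newest = max(s.commit_position for s in sightings)
    return newest - oldest >= FLAKY_COMMIT_THRESHOLD
```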
Comment on diff: def _try_explain_flaky_failure(
Maybe a comment describing the particular definition of "flaky" this function is using would be handy?
It looks to me like it's testing whether this failure has been seen before over at least a 200-commit range between the oldest/newest sighting? What's to stop that being "it's been failing at HEAD for at least 200 revisions"?
Added a function-level docstring.
I've never seen a non-flaky test failure that has been failing for more than ~20 commits, so I think an order of magnitude more than that range should be safe. I thought about also adding a heuristic around the longest running subsequence to catch cases like this, but I think it adds complexity for little benefit:
- As mentioned, 200 is about an order of magnitude bigger than the ranges we've seen in practice.
- We're exactly matching the failure message, so the test would need to fail in exactly the same way to trigger this (see the sketch below).
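To illustrate the exact-match point, a hypothetical continuation of the earlier sketch (reusing the made-up FailureSighting and is_likely_flaky names; not the actual implementation):

```python
from collections import defaultdict

# Assumes FailureSighting and is_likely_flaky from the earlier sketch.

def group_by_message(history: list[FailureSighting]) -> dict[str, list[FailureSighting]]:
    """Group past sightings by their verbatim failure message."""
    groups: dict[str, list[FailureSighting]] = defaultdict(list)
    for sighting in history:
        groups[sighting.message].append(sighting)  # Keyed on the exact string.
    return dict(groups)


def explain_flaky(current_message: str, history: list[FailureSighting]) -> bool:
    """Only sightings with an identical message count toward the flakiness window."""
    matching = group_by_message(history).get(current_message, [])
    return is_likely_flaky(matching)
```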
I'd probably still go for a test that verifies the failures are discontiguous from ToT rather than a heuristic that guesses something is unlikely to be a long-standing failure at HEAD; it seems cheap/simple enough to do, and bad advice can be pretty confusing.
But I guess it's a start.
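A rough sketch of what that discontiguity check could look like (made-up names, not the actual implementation):

```python
def is_discontiguous_from_head(
    failing_positions: list[int],  # Commit positions where the failure was seen.
    passing_positions: list[int],  # Commit positions where the same test passed.
) -> bool:
    """Require at least one passing run newer than the most recent failure.

    A failure that persists all the way to tip-of-tree looks more like a real
    regression than a flake, so it should not be explained away.
    """
    if not failing_positions:
        return True
    newest_failure = max(failing_positions)
    return any(pos > newest_failure for pos in passing_positions)
```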
Yeah, I'll add that functionality soon. It would be good to have it for the failing-at-HEAD case too.