[CI] Explain Flaky Failures #641
Conversation
This patch adds functionality to the premerge advisor to explain away flaky failures. For this, we just check that the observed failure has been occurring over a sufficiently long range of commits, in our case 200. Pull Request: llvm#641
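In rough terms, the check amounts to something like the following sketch (hypothetical names only; FailureSighting, is_likely_flaky, and FLAKY_COMMIT_THRESHOLD are illustrative, not the actual premerge-advisor code):

```python
from dataclasses import dataclass

# Assumed threshold from the description above: the failure must span
# at least this many commits before we call it flaky.
FLAKY_COMMIT_THRESHOLD = 200


@dataclass
class FailureSighting:
    commit_position: int  # Monotonic commit number where the failure was observed.
    message: str          # The exact failure message text.


def is_likely_flaky(sightings: list[FailureSighting]) -> bool:
    """Return True if identical failures span at least FLAKY_COMMIT_THRESHOLD commits."""
    if len(sightings) < 2:
        return False
    oldest = min(s.commit_position for s in sightings)
    newest = max(s.commit_position for s in sightings)
    return newest - oldest >= FLAKY_COMMIT_THRESHOLD
```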
Comment on diff: def _try_explain_flaky_failure(
Maybe a comment describing the particular definition of "flaky" this function is using would be handy?
It looks to me like it's testing whether this failure has been seen before over at least a 200-commit range between the oldest/newest sighting? What's to stop that being "it's been failing at HEAD for at least 200 revisions"?
Added a function-level docstring.
I've never seen a non-flaky test failure that has been failing for more than ~20 commits, so I think an order of magnitude more than that range should be safe. I thought about also adding a heuristic around the longest running subsequence to catch cases like this, but I think it adds complexity for little benefit:
- As mentioned, 200 is about an order of magnitude bigger than the ranges we've seen in practice.
- We're exactly matching the failure message, so the test would need to fail in exactly the same way to trigger this (see the sketch below).
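To illustrate the exact-match point, a hypothetical continuation of the earlier sketch (reusing the made-up FailureSighting and is_likely_flaky names; not the actual implementation):

```python
from collections import defaultdict

# Assumes FailureSighting and is_likely_flaky from the earlier sketch.

def group_by_message(history: list[FailureSighting]) -> dict[str, list[FailureSighting]]:
    """Group past sightings by their verbatim failure message."""
    groups: dict[str, list[FailureSighting]] = defaultdict(list)
    for sighting in history:
        groups[sighting.message].append(sighting)  # Keyed on the exact string.
    return dict(groups)


def explain_flaky(current_message: str, history: list[FailureSighting]) -> bool:
    """Only sightings with an identical message count toward the flakiness window."""
    matching = group_by_message(history).get(current_message, [])
    return is_likely_flaky(matching)
```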
I'd probably still go for a test that verifies the failures are discontiguous from ToT rather than a heuristic that guesses something is unlikely to be a long-standing failure at HEAD; it seems cheap/simple enough to do, and bad advice can be pretty confusing.
But I guess it's a start.
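A rough sketch of what that discontiguity check could look like (made-up names, not the actual implementation):

```python
def is_discontiguous_from_head(
    failing_positions: list[int],  # Commit positions where the failure was seen.
    passing_positions: list[int],  # Commit positions where the same test passed.
) -> bool:
    """Require at least one passing run newer than the most recent failure.

    A failure that persists all the way to tip-of-tree looks more like a real
    regression than a flake, so it should not be explained away.
    """
    if not failing_positions:
        return True
    newest_failure = max(failing_positions)
    return any(pos > newest_failure for pos in passing_positions)
```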
Yeah, I'll add that functionality soon. It would be good to have it for the failing-at-HEAD case too.