fix exponential backoff overflow #1512
Conversation
There is a test program that demonstrates the problem on the playground: https://go.dev/play/p/wBxP0SpRRkt
Hey @hwh33, @myleshorton, @reflog, could one of you please take a look at this PR?

@magras I'm on it, thank you for reporting
reflog left a comment
There may be a subtle bug: if the first request fails before any success, nextWait starts at zero (the zero value for Go's time.Duration):
s.nextWait = retryWaitSeconds // only set on success
On first failure → wait = 0 → immediate retry with no backoff.
Recommendation: Initialize nextWait to retryWaitSeconds in the struct or handle the zero case in backoff():
if s.nextWait == 0 {
s.nextWait = retryWaitSeconds
}
The `backoff` function's wait time calculation completely overflows `time.Duration` on the 55th retry (after approximately 6 hours). This results in zero wait times, leading to the uncontrolled spawn of hundreds of goroutines, which can cause memory exhaustion and an OOM kill on Linux.
I don't know Go, but I found no native way to achieve that. So I took the second approach, with a slight deviation: the retry timeout is now clamped between a minimum and a maximum.
Pull request overview
This PR fixes a critical exponential backoff overflow bug that could lead to memory exhaustion and an OOM kill. The previous implementation used `math.Pow(2, float64(failCount))`, which overflows `time.Duration` after approximately 55 retries (6 hours), resulting in zero wait times and uncontrolled goroutine spawn.
Key Changes:
- Replaced counter-based exponential calculation with direct duration tracking to prevent overflow
- Renamed `retryWaitSeconds` to `minRetryWait` for clarity
- Changed struct field from `failCount int` to `nextWait time.Duration`
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>