fix(planner): correct retry wall-clock formula + lock table contract #212
Merged
Follow-up to #209. Coral platform-engineer review caught two items post-merge:

1. The wall-clock formula in taskMaxRetries' godoc was wrong. The executor increments t.RetryCount BEFORE calling retryBackoff (executor.go:189-196), so the smallest argument ever passed is 1, not 0. retryBackoff(1)=10s, retryBackoff(2)=20s, retryBackoff(N>=3) caps at 30s. Real wall-clock: 10s for N=1; 10s + 20s + 30s*(N-2) for N>=2. Operators sizing budgets from the prior comment would compute the wrong wall-clock; e.g. discoverPeersMaxRetries=20 was documented as ~9 min, but the real value is ~9.5 min.

2. The existing test in archive_test.go catches the regression (0 != 20) but doesn't pin the lookup table itself. A direct TestTaskMaxRetries makes a table swap fail loud; e.g. swapping discoverPeersMaxRetries for genesisConfigureMaxRetries would compile fine, but the table test would catch it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to #209. Coral platform-engineer review caught two items post-merge.
1. Wall-clock formula in taskMaxRetries godoc was wrong

The executor at internal/planner/executor.go:189-196 increments t.RetryCount before calling retryBackoff, so the smallest argument ever passed is 1, not 0: retryBackoff(1) = 10s, retryBackoff(2) = 20s, and retryBackoff(N≥3) caps at 30s. Real wall-clock for N retries:

    10s                        for N = 1
    10s + 20s + 30s*(N-2)      for N >= 2

The prior comment claimed 5s + 10s + 20s + 30s*(N-3) for N>=3. That sums the backoff for arguments 0 through N-1 instead of 1 through N: it starts at the wrong value (5s instead of 10s) and drops one 30s term, so it under-counts by 25s for every N≥3. Operators sizing budgets from the godoc would compute the wrong wall-clock; e.g. discoverPeersMaxRetries=20 was documented as ~9 min, but the real value is ~9.5 min.

Bigger drift on the larger budgets: genesisConfigureMaxRetries=180 is documented in bootstrap.go:151-153 as ~30 min, but actual wall-clock is ~89.5 min. Fixing bootstrap.go's comment is out of scope here (separate pre-existing bug; file as follow-up if it bites).

2. Lock the lookup table directly
The existing TestArchivePlanner_WithPeers (archive_test.go:79-83) catches the MaxRetries != discoverPeersMaxRetries regression but doesn't pin the lookup itself: a refactor that swaps discoverPeersMaxRetries for genesisConfigureMaxRetries in the switch would compile fine, and the assertion would still find a non-zero value. Adding a direct table test, TestTaskMaxRetries, locks the contract. Adding a new retry-aware task type then takes one switch entry plus one map entry, with no per-call-site test changes needed.
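A minimal sketch of that table-test shape, written as a standalone Go program so it runs without the repo. The TaskType type, the task-type strings and the switch body are hypothetical; only the names discoverPeersMaxRetries and genesisConfigureMaxRetries, their values (20 and 180), and the "unknown or empty task type returns 0" behavior come from this PR. In the repo the check would live in a _test.go file as TestTaskMaxRetries and report mismatches with t.Errorf.

```go
package main

import (
	"fmt"
	"os"
)

// TaskType and the constant names below are HYPOTHETICAL; only the two
// retry budgets (20 and 180) are taken from the PR description.
type TaskType string

const (
	TaskDiscoverPeers    TaskType = "discover-peers"
	TaskGenesisConfigure TaskType = "genesis-configure"
)

const (
	discoverPeersMaxRetries    = 20
	genesisConfigureMaxRetries = 180
)

// taskMaxRetries has one switch entry per retry-aware task type; anything
// else, including the empty string, gets 0 (no retries).
func taskMaxRetries(tt TaskType) int {
	switch tt {
	case TaskDiscoverPeers:
		return discoverPeersMaxRetries
	case TaskGenesisConfigure:
		return genesisConfigureMaxRetries
	default:
		return 0
	}
}

// expectedMaxRetries pins the contract: one map entry per task type.
var expectedMaxRetries = map[TaskType]int{
	TaskDiscoverPeers:    discoverPeersMaxRetries,
	TaskGenesisConfigure: genesisConfigureMaxRetries,
	"":                   0, // empty task type: no retry budget
	"unknown-task":       0, // unrecognised task type: no retry budget
}

// checkTaskMaxRetries returns the mismatches a table test would report.
func checkTaskMaxRetries() []string {
	var failures []string
	for tt, want := range expectedMaxRetries {
		if got := taskMaxRetries(tt); got != want {
			failures = append(failures, fmt.Sprintf("taskMaxRetries(%q) = %d, want %d", tt, got, want))
		}
	}
	return failures
}

func main() {
	failures := checkTaskMaxRetries()
	for _, f := range failures {
		fmt.Println("FAIL:", f)
	}
	if len(failures) > 0 {
		os.Exit(1)
	}
	fmt.Println("ok")
}
```

Swapping the two return values in the switch still compiles, but checkTaskMaxRetries then reports two mismatches (180 where 20 is expected, and 20 where 180 is expected), which is the failure mode the table test is meant to catch.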
Test plan
go test ./... green
Low Risk
Low risk: no runtime behavior changes, only a comment correction and a small unit test asserting the retry-budget mapping for task types.
Overview
Corrects the taskMaxRetries godoc to reflect the executor’s actual retry/backoff sequence (since RetryCount is incremented before retryBackoff is called).

Adds TestTaskMaxRetries to directly assert the task-type→max-retries mapping (including unknown/empty task types returning 0), helping prevent accidental regressions during refactors.

Reviewed by Cursor Bugbot for commit 252dc11.