fix(proton): classify server errors as permanent in batch retry loops#70
Merged
fix(proton): classify server errors as permanent in batch retry loops#70
Conversation
Add retry classification so Proton server errors (deterministic rejects like duplicate columns, syntax errors) fail fast via backoff.Permanent instead of retrying for 5 minutes. Transient errors (network timeout, EOF, connection reset) continue to retry with exponential backoff. - isProtonPermanentError: string-pattern match on proto.Exception format - PermanentIfServerError: wraps server errors with backoff.Permanent - Applied at all retry sites: processBatch, BulkImportStreamColumnar, ExecContext, and outer import retry boundary in task_run.go - Add unit tests for permanent/transient classification and retry behavior Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…fication Replace string-only matching with three-level detection: typed *proton.Exception first, g.ErrType.OrigErr recursion second, tightened string fallback (requiring digits after "code: ") last. This eliminates dependence on rendered error strings as the primary classification contract. Also distinguishes permanent server rejection from retry-budget-exhausted in the proton-to-proton import error message, and aligns the outer retryWithBackoff InitialInterval with the database helper. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
backoff.RetryNotify unwraps *backoff.PermanentError before returning, so errors.As could never find it. Use a closure-captured flag set when PermanentIfServerError wraps the error instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TestServerRejectionFlag verifies the pattern used in runProtonToProton: - permanent server error sets the closure flag (backoff unwraps PermanentError before returning, so errors.As cannot detect it) - transient error exhausting retry budget does NOT set the flag Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
backoff.Permanentinstead of retrying for up to 5 minutesprocessBatch,BulkImportStreamColumnar,ExecContext, and outer import retry boundary intask_run.goContext
Previously, all errors in batch retry loops were retried equally. A server rejection like "Column id specified more than once" would retry for the full 5-minute backoff budget, causing apparent deadlocks in CI. With this change, server errors are detected by matching the
proto.Exceptionstring format ("code: NN, message: ...") and wrapped withbackoff.Permanentto stop immediately.Test plan
TestPermanentServerError— 8 subtests verifying classification of server vs transient errorsTestTransientErrorRetries— proves permanent error = 1 attempt, transient error = retries until success🤖 Generated with Claude Code