fix: treat 429 retry budget exhaustion as terminal in async job orchestrator #1048
Draft
devin-ai-integration[bot] wants to merge 3 commits into
…strator Introduce RateLimitBudgetExhaustedException subclass of AirbyteTracedException. HttpClient raises this specific type when 429 retry budget is exhausted. AsyncJobOrchestrator treats it as a breaking exception, preventing cascading retries at orchestrator and platform levels. Co-Authored-By: bot_apk <apk@cognition.ai>
Testing This CDK Version
You can test this version of the CDK using the following:
# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1781128595-fix-rate-limit-budget-exhaustion#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1781128595-fix-rate-limit-budget-exhaustion
Co-Authored-By: bot_apk <apk@cognition.ai>
…teLimitBudgetExhaustedException Co-Authored-By: bot_apk <apk@cognition.ai>
Summary
When the `HttpClient` exhausts its configured 429 retry budget (e.g. `max_retries` in a manifest error handler), it raises `AirbyteTracedException` with `failure_type=transient_error`. `AsyncJobOrchestrator._is_breaking_exception` only breaks on `config_error`, so the orchestrator retries job creation (default 3×), and each attempt triggers the full HTTP retry budget again. After the orchestrator retries are exhausted, `_process_partitions_with_errors` wraps the failure as `system_error`, which the platform retries. Net result: a configured cap of 5 retries produces hundreds of `createReport` calls.

Fix: Introduce `RateLimitBudgetExhaustedException(AirbyteTracedException)` and use it in the rate-limit-exhaustion path of `HttpClient`. The orchestrator's `_is_breaking_exception` now also breaks on this type, so rate limit budget exhaustion propagates immediately instead of cascading through orchestrator and platform retry layers.

Resolves https://github.com/airbytehq/oncall/issues/12850:
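The retry amplification described in the summary can be illustrated with a small, self-contained model. This is only a sketch and not the CDK code: the classes below are simplified stand-ins for the real ones, `is_breaking_exception` and `count_http_calls` are hypothetical helpers, the `break_on_rate_limit` flag exists only to compare the old and new behavior, and the model assumes every request is answered with a 429 and that one attempt costs one initial call plus `http_max_retries` retries. Platform-level retries, which multiply the total further, are not modeled.

```python
from enum import Enum


class FailureType(Enum):
    # Stand-in for the CDK failure types; only the two values used below.
    config_error = "config_error"
    transient_error = "transient_error"


class AirbyteTracedException(Exception):
    """Simplified stand-in for the CDK base exception (assumption)."""

    def __init__(self, message, failure_type):
        super().__init__(message)
        self.failure_type = failure_type


class RateLimitBudgetExhaustedException(AirbyteTracedException):
    """Stand-in for the new subclass raised when the 429 budget is spent."""


def is_breaking_exception(exc, break_on_rate_limit):
    # New behavior: the dedicated type is terminal for the orchestrator.
    if break_on_rate_limit and isinstance(exc, RateLimitBudgetExhaustedException):
        return True
    # Old behavior: only config errors stop the orchestrator from retrying.
    return (
        isinstance(exc, AirbyteTracedException)
        and exc.failure_type == FailureType.config_error
    )


def count_http_calls(http_max_retries, orchestrator_attempts, break_on_rate_limit):
    """Count HTTP calls made while every request is rate limited."""
    calls = 0
    for _ in range(orchestrator_attempts):
        # One initial request plus the full HTTP retry budget per attempt.
        calls += 1 + http_max_retries
        exc = RateLimitBudgetExhaustedException(
            "429 retry budget exhausted", FailureType.transient_error
        )
        if is_breaking_exception(exc, break_on_rate_limit):
            break  # propagate immediately instead of retrying job creation
    return calls


before = count_http_calls(5, 3, break_on_rate_limit=False)
after = count_http_calls(5, 3, break_on_rate_limit=True)
print(before, after)
```

Under these assumptions, the orchestrator layer alone triples the number of calls before the fix, and the fix reduces it to a single pass through the HTTP retry budget.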
Breaking Change Evaluation
Not a breaking change. `RateLimitBudgetExhaustedException` is a subclass of `AirbyteTracedException`, so all existing `except AirbyteTracedException` handlers continue to catch it. No spec, schema, state, or stream changes.

Declarative-First Evaluation
N/A: this fix is in core CDK infrastructure (`HttpClient`, `AsyncJobOrchestrator`, `traced_exception`), not in a declarative connector component.

Test Coverage
- `test_given_rate_limit_budget_exhausted_when_start_job_then_break_immediately`: verifies the orchestrator breaks immediately (1 attempt, no retry)
- `test_given_rate_limit_budget_exhausted_with_running_jobs_then_abort_and_break`: verifies running jobs are aborted when rate limit exhaustion hits
- `test_given_429_budget_exhausted_then_raises_rate_limit_budget_exhausted_exception`: verifies `HttpClient` raises the specific exception type with the `transient_error` failure type
- `test_raise_on_http_errors_off_429`: verifies the exception type and message match

Link to Devin session: https://app.devin.ai/sessions/a8474287696b42888abeacf371306470
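The behavior checked by the two orchestrator tests listed under Test Coverage (one attempt only, running jobs aborted) can be sketched in isolation. Everything below is hypothetical scaffolding and not the CDK's `AsyncJobOrchestrator`: `FakeJob`, `FakeOrchestrator`, and `always_rate_limited` are invented for illustration, and the exception is a plain stand-in, whereas the PR's version subclasses `AirbyteTracedException`.

```python
class RateLimitBudgetExhaustedException(Exception):
    """Stand-in for the new exception type (assumption: simplified base class)."""


class FakeJob:
    """Hypothetical job handle that only records whether it was aborted."""

    def __init__(self, name):
        self.name = name
        self.aborted = False


class FakeOrchestrator:
    """Hypothetical orchestrator that retries job creation up to max_attempts."""

    def __init__(self, start_job, max_attempts=3):
        self._start_job = start_job
        self._max_attempts = max_attempts
        self.running_jobs = []
        self.attempts = 0

    def start(self, partition):
        for _ in range(self._max_attempts):
            self.attempts += 1
            try:
                job = self._start_job(partition)
            except RateLimitBudgetExhaustedException:
                # Terminal: abort whatever is already running, then re-raise
                # without consuming the remaining attempts.
                for running in self.running_jobs:
                    running.aborted = True
                raise
            except Exception:
                # Any other failure is retried until attempts run out.
                continue
            self.running_jobs.append(job)
            return job
        raise RuntimeError(f"could not start a job for {partition!r}")


def always_rate_limited(partition):
    raise RateLimitBudgetExhaustedException("429 retry budget exhausted")


orchestrator = FakeOrchestrator(always_rate_limited)
orchestrator.running_jobs.append(FakeJob("already-running"))
try:
    orchestrator.start("partition-1")
    raised = False
except RateLimitBudgetExhaustedException:
    raised = True
```

In this sketch the rate-limit failure is re-raised after a single attempt and the job that was already running is marked as aborted, while other exception types still go through the normal retry loop.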