
fix: treat 429 retry budget exhaustion as terminal in async job orchestrator#1048

Draft
devin-ai-integration[bot] wants to merge 3 commits into main from devin/1781128595-fix-rate-limit-budget-exhaustion


@devin-ai-integration

Contributor

Summary

When the HttpClient exhausts its configured 429 retry budget (e.g. max_retries in a manifest error handler), it raises AirbyteTracedException with failure_type=transient_error. AsyncJobOrchestrator._is_breaking_exception only treats config_error as breaking, so the orchestrator retries job creation (default 3×), and each retry spends the full HTTP retry budget again. Once the orchestrator's retries are exhausted, _process_partitions_with_errors wraps the failure as a system_error, which the platform then retries. Net result: a configured cap of 5 retries produces hundreds of createReport calls.
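To see why the layers multiply, here is a back-of-the-envelope sketch. It assumes every request hits 429 and that each layer re-runs the full layer below it. The function name and the platform_attempts and partitions parameters are hypothetical, and the orchestrator default of 3 is treated as total attempts (if it counts retries on top of the first try, pass 4 instead).

```python
def worst_case_create_calls(
    max_retries: int,
    orchestrator_attempts: int = 3,
    platform_attempts: int = 1,
    partitions: int = 1,
) -> int:
    """Upper bound on createReport requests when every attempt is rate limited.

    Hypothetical model: one HTTP request sequence is 1 initial try plus
    max_retries retries, and every outer layer repeats the whole inner layer.
    """
    http_attempts = 1 + max_retries
    return http_attempts * orchestrator_attempts * platform_attempts * partitions


# Orchestrator and HTTP layers only: 6 * 3 = 18 calls per sync attempt.
print(worst_case_create_calls(5))
# Hypothetical 5 platform attempts over 2 partitions: 6 * 3 * 5 * 2 = 180 calls.
print(worst_case_create_calls(5, platform_attempts=5, partitions=2))
# Breaking at the orchestrator (one attempt) leaves only the HTTP budget: 6 calls.
print(worst_case_create_calls(5, orchestrator_attempts=1))
```

Even with modest, assumed platform and partition counts the total reaches 180, the same order of magnitude as the call counts reported above, while a single orchestrator attempt caps each sequence at the configured HTTP budget.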

Fix: Introduce RateLimitBudgetExhaustedException(AirbyteTracedException) and use it in the rate-limit-exhaustion path of HttpClient. The orchestrator's _is_breaking_exception now also breaks on this type, so rate limit budget exhaustion propagates immediately instead of cascading through orchestrator and platform retry layers.

```diff
# traced_exception.py: new subclass
class RateLimitBudgetExhaustedException(AirbyteTracedException): ...

# http_client.py line ~309: raise the specific type instead of the base class
-   raise AirbyteTracedException(failure_type=FailureType.transient_error, ...)
+   raise RateLimitBudgetExhaustedException(failure_type=FailureType.transient_error, ...)

# job_orchestrator.py _is_breaking_exception: also break on the new type
    or isinstance(exception, RateLimitBudgetExhaustedException)
```
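A minimal, self-contained sketch of the new type and the widened breaking check. FailureType and AirbyteTracedException below are simplified stand-ins, not the CDK classes (the real ones carry more fields), and is_breaking_exception is a hypothetical free-function version of _is_breaking_exception; the real method body may be structured differently.

```python
from enum import Enum


class FailureType(Enum):
    # Simplified stand-in for the CDK's FailureType enum.
    config_error = "config_error"
    system_error = "system_error"
    transient_error = "transient_error"


class AirbyteTracedException(Exception):
    # Simplified stand-in for the CDK base class.
    def __init__(self, message: str = "", failure_type: FailureType = FailureType.system_error) -> None:
        super().__init__(message)
        self.message = message
        self.failure_type = failure_type


class RateLimitBudgetExhaustedException(AirbyteTracedException):
    """Raised when the 429 retry budget is spent: no new fields, only a distinct type."""


def is_breaking_exception(exception: BaseException) -> bool:
    # Existing rule: config errors break. New rule: rate-limit budget exhaustion breaks,
    # even though it still reports failure_type=transient_error.
    return (
        isinstance(exception, AirbyteTracedException)
        and exception.failure_type == FailureType.config_error
    ) or isinstance(exception, RateLimitBudgetExhaustedException)
```

Keying the decision on the exception type rather than on failure_type lets the error keep its transient classification while the orchestrator still stops retrying.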

Resolves https://github.com/airbytehq/oncall/issues/12850

Breaking Change Evaluation

Not a breaking change. RateLimitBudgetExhaustedException is a subclass of AirbyteTracedException, so every existing handler that catches AirbyteTracedException still catches it. No spec, schema, state, or stream changes.
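The compatibility argument can be checked in isolation. In this sketch the base class is a hypothetical minimal stand-in with failure_type as a plain string, and legacy_handler represents any call site written before the subclass existed.

```python
from typing import Callable


class AirbyteTracedException(Exception):
    # Hypothetical minimal stand-in; the real class carries more fields.
    def __init__(self, message: str, failure_type: str = "system_error") -> None:
        super().__init__(message)
        self.failure_type = failure_type


class RateLimitBudgetExhaustedException(AirbyteTracedException):
    """New subclass: adds no fields, only a more specific type."""


def legacy_handler(operation: Callable[[], None]) -> str:
    """A call site that only knows about the base class."""
    try:
        operation()
    except AirbyteTracedException as exc:
        # Also reached for the subclass; failure_type is whatever the raiser set.
        return f"{type(exc).__name__}:{exc.failure_type}"
    return "ok"


def exhaust_rate_limit_budget() -> None:
    raise RateLimitBudgetExhaustedException("429 retry budget exhausted", failure_type="transient_error")
```

Calling legacy_handler(exhaust_rate_limit_budget) returns "RateLimitBudgetExhaustedException:transient_error": the old handler runs unchanged, and only code that opts into the subclass check sees the difference.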

Declarative-First Evaluation

N/A — this fix is in core CDK infrastructure (HttpClient, AsyncJobOrchestrator, traced_exception), not in a declarative connector component.

Test Coverage

  • test_given_rate_limit_budget_exhausted_when_start_job_then_break_immediately: verifies the orchestrator breaks immediately (1 attempt, no retry)
  • test_given_rate_limit_budget_exhausted_with_running_jobs_then_abort_and_break: verifies running jobs are aborted when rate limit exhaustion hits
  • test_given_429_budget_exhausted_then_raises_rate_limit_budget_exhausted_exception: verifies the HttpClient raises the specific exception type with the transient_error failure type
  • Updated test_raise_on_http_errors_off_429: verifies the exception type and message match
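The intent of the first test (one attempt, then break) can be illustrated with a toy model of the orchestrator's job-creation loop. This is not the CDK implementation: run_with_orchestrator_retries, CountingCreateJob, and the max_attempts=3 default are hypothetical, and the final system_error wrapping only mimics the behavior the summary describes.

```python
from typing import Callable, Optional


class AirbyteTracedException(Exception):
    # Hypothetical minimal stand-in for the CDK base class.
    def __init__(self, message: str, failure_type: str = "system_error") -> None:
        super().__init__(message)
        self.failure_type = failure_type


class RateLimitBudgetExhaustedException(AirbyteTracedException):
    """Distinct type for a spent 429 retry budget."""


def _is_breaking(exc: Exception) -> bool:
    # Break on config errors (existing rule) or rate-limit budget exhaustion (new rule).
    return isinstance(exc, RateLimitBudgetExhaustedException) or (
        isinstance(exc, AirbyteTracedException) and exc.failure_type == "config_error"
    )


def run_with_orchestrator_retries(create_job: Callable[[], None], max_attempts: int = 3) -> None:
    """Toy job-creation loop: retry non-breaking errors, re-raise breaking ones at once."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            create_job()
            return
        except Exception as exc:
            if _is_breaking(exc):
                raise  # no further orchestrator attempts
            last_error = exc
    # Attempts exhausted on a non-breaking error: wrap it as a system_error.
    raise AirbyteTracedException(
        f"job creation failed after {max_attempts} attempts: {last_error}",
        failure_type="system_error",
    ) from last_error


class CountingCreateJob:
    """Fake job creator that always fails with the given error and counts calls."""

    def __init__(self, error: Exception) -> None:
        self.error = error
        self.calls = 0

    def __call__(self) -> None:
        self.calls += 1
        raise self.error
```

With a RateLimitBudgetExhaustedException the fake is called once and the same exception propagates; with a plain transient AirbyteTracedException it is called three times and surfaces as a system_error, which is the cascade the fix removes for the rate-limit case.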

Link to Devin session: https://app.devin.ai/sessions/a8474287696b42888abeacf371306470

…strator

Introduce RateLimitBudgetExhaustedException subclass of AirbyteTracedException.
HttpClient raises this specific type when 429 retry budget is exhausted.
AsyncJobOrchestrator treats it as a breaking exception, preventing cascading
retries at orchestrator and platform levels.

Co-Authored-By: bot_apk <apk@cognition.ai>
@devin-ai-integration

Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment, CI, and merge conflict monitoring

@github-actions


👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Tips and Tricks

Testing This CDK Version

You can test this version of the CDK using the following:

```bash
# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1781128595-fix-rate-limit-budget-exhaustion#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1781128595-fix-rate-limit-budget-exhaustion
```

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

Co-Authored-By: bot_apk <apk@cognition.ai>
@github-actions

github-actions Bot commented Jun 10, 2026


PyTest Results (Fast)

4 112 tests (+3): 4 100 passed ✅ (+3), 12 skipped 💤 (±0), 0 failed ❌ (±0); run time 7m 14s ⏱️ (-5s)
1 suite (±0), 1 file (±0)

Results for commit bec34b9. ± Comparison against base commit fd95ecf.

♻️ This comment has been updated with latest results.

…teLimitBudgetExhaustedException

Co-Authored-By: bot_apk <apk@cognition.ai>
@github-actions


PyTest Results (Full)

4 115 tests (+3): 4 103 passed ✅ (+4), 12 skipped 💤 (±0), 0 failed ❌ (-1); run time 11m 27s ⏱️ (+2m 2s)
1 suite (±0), 1 file (±0)

Results for commit bec34b9. ± Comparison against base commit fd95ecf.
