feat: add circuit breaker for upstream provider overload protection #75
Summary
Implement per-provider circuit breakers that detect upstream rate limiting and overload (HTTP 429, 503, and 529 responses) and temporarily stop sending requests to a provider while it is overloaded.
This completes the overload protection work by adding the aibridge-specific piece that could not be implemented as generic HTTP middleware in coderd, because it has to inspect the responses returned by upstream providers.
Key Features
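Circuit Breaker States
- Closed (0): requests are forwarded to the provider, and overload responses are counted.
- Open (1): requests are rejected without contacting the provider until the cooldown expires.
- Half-open (2): up to HalfOpenMaxRequests probe requests are let through to test whether the provider has recovered.
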
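Status Codes That Trigger Circuit Breaker
- 429 Too Many Requests
- 503 Service Unavailable
- 529 Overloaded (non-standard, provider-specific)

Other status codes (for example 400, 401, 500, 502) do not trigger the circuit breaker. They indicate different problems, such as malformed requests, authentication failures, or generic server errors, that circuit breaking would not help with.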
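The classification reduces to a small predicate. In this sketch, `isOverloadStatus` and `statusOverloaded` are hypothetical names. Because 529 is not a registered HTTP status (Anthropic's API, for example, uses it for overloaded errors), `net/http` has no constant for it.

```go
package main

import (
	"fmt"
	"net/http"
)

// statusOverloaded is the non-standard 529 code; net/http has no constant for it.
const statusOverloaded = 529

// isOverloadStatus reports whether an upstream response should count toward
// tripping the circuit breaker.
func isOverloadStatus(code int) bool {
	switch code {
	case http.StatusTooManyRequests, // 429
		http.StatusServiceUnavailable, // 503
		statusOverloaded: // 529
		return true
	default:
		// For example 400, 401, 500, 502: not an overload signal.
		return false
	}
}

func main() {
	for _, code := range []int{400, 401, 429, 500, 502, 503, 529} {
		fmt.Printf("%d counts toward tripping: %v\n", code, isOverloadStatus(code))
	}
}
```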
Default Configuration

| Setting | Default |
| --- | --- |
| Enabled | false |
| FailureThreshold | 5 |
| Window | 10s |
| Cooldown | 30s |
| HalfOpenMaxRequests | 3 |

New Prometheus Metrics
- `aibridge_circuit_breaker_state{provider}` - Current state (0=closed, 1=open, 2=half-open)
- `aibridge_circuit_breaker_trips_total{provider}` - Total number of times the circuit opened
- `aibridge_circuit_breaker_rejects_total{provider}` - Requests rejected due to an open circuit

Files Changed
- `circuit_breaker.go` - Core circuit breaker implementation
- `circuit_breaker_test.go` - Comprehensive test suite (13 tests)
- `bridge.go` - Integration into `RequestBridge`
- `interception.go` - Apply circuit breaker to intercepted requests
- `metrics.go` - Add Prometheus metrics

Testing
All tests pass.
Related
aibridgedinternal#1153