Add broker drain API #18709
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18709      +/-   ##
============================================
- Coverage     64.75%   64.74%   -0.01%
  Complexity     1319     1319
============================================
  Files          3391     3393       +2
  Lines        210891   211085     +194
  Branches      33105    33129      +24
============================================
+ Hits         136552   136677     +125
- Misses        63320    63385      +65
- Partials      11019    11023       +4
xiangfu0 left a comment
Found one high-signal issue; see inline comment.
      }

      private void startDrain() {
        if (!_draining.compareAndSet(false, true)) {
startDrain() marks the broker draining before getConnectedHelixManager() and the Helix writes succeed. Because BaseBrokerStarter exposes the admin API before _participantHelixManager.connect(), a startup-window POST /drain (or any Helix write failure here) returns 500 but still leaves _draining set, so later queries are rejected until restart. Please make this transition rollback-safe or only flip the local drain state after the Helix preconditions/writes succeed.
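One way to read the "rollback-safe" suggestion above is sketched below. This is a minimal illustration, not the PR's code: the class name RollbackSafeDrain and the Runnable parameter standing in for getConnectedHelixManager() plus the Helix writes are assumptions made for the example. The flag is still claimed with compareAndSet so concurrent callers cannot both start a drain, but it is cleared again if the cluster-side step throws.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of a rollback-safe drain transition; not the PR's implementation. */
public class RollbackSafeDrain {
  private final AtomicBoolean _draining = new AtomicBoolean(false);

  public boolean isDraining() {
    return _draining.get();
  }

  /**
   * Tries to start a drain.
   *
   * @param clusterWrites stand-in (assumption) for the Helix precondition check and writes
   * @return true if this call started the drain, false if a drain was already in progress
   */
  public boolean startDrain(Runnable clusterWrites) {
    // Claim the flag first so that only one caller performs the cluster writes.
    if (!_draining.compareAndSet(false, true)) {
      return false;
    }
    boolean succeeded = false;
    try {
      clusterWrites.run();
      succeeded = true;
      return true;
    } finally {
      // Roll back the local state if the writes failed, so that a failed
      // request does not leave the broker rejecting queries until restart.
      if (!succeeded) {
        _draining.set(false);
      }
    }
  }
}
```

With this shape the flag is briefly visible as set while the writes are in flight. The alternative named in the comment, flipping the local state only after the writes succeed, avoids that window but needs a separate guard against two concurrent drain requests.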
125baa7 to 9414905
Summary
- Add POST /drain and GET /drain/status admin APIs.
- Reject new queries with BrokerShuttingDown while waiting for already accepted queries to finish.
- Mark the broker shutdownInProgress=true, remove it from brokerResource, fail health checks while draining, and restore stale drain state on startup.
User manual
To gracefully drain the broker and stop it after accepted queries finish:
    curl -X POST "http://<broker-host>:<admin-port>/drain?timeoutMs=-1&shutdown=true"
timeoutMs=-1 uses the value of pinot.broker.delayShutdownTimeMs as the drain timeout. Use shutdown=false when an external orchestrator should terminate the process after the drain completes.
While draining, the broker health check returns 503 and new valid queries sent to this broker fail with BrokerShuttingDown (212), so clients should retry against another broker.
Sample query
If this query is sent to a draining broker, the broker returns BrokerShuttingDown instead of accepting it for execution.
Tests
- ./mvnw -pl pinot-broker -am -Dtest=BrokerDrainManagerTest,BrokerRequestHandlerDelegateTest -Dsurefire.failIfNoSpecifiedTests=false test
- ./mvnw spotless:apply -pl pinot-broker,pinot-spi,pinot-controller
- ./mvnw checkstyle:check -pl pinot-broker,pinot-spi,pinot-controller
- ./mvnw license:format -pl pinot-broker,pinot-spi,pinot-controller
- ./mvnw license:check -pl pinot-broker,pinot-spi,pinot-controller
- git diff --check upstream/master...HEAD
Review notes
- shutdown=true now briefly delays the asynchronous stop callback so that the admin server does not shut down before the drain API response has been sent.
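
The delayed stop described in the note above can be illustrated with a scheduled task. This is only a sketch of the general technique using java.util.concurrent; the class name DelayedStopSketch, the method stopAfterDelay, and the 200 ms delay are assumptions for the example, and the PR may implement the delay differently.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: run a stop callback after a short delay instead of immediately. */
public final class DelayedStopSketch {
  private DelayedStopSketch() {
  }

  /**
   * Schedules stopCallback to run once after delayMs and returns without waiting,
   * so the caller can finish writing its HTTP response first.
   */
  public static ScheduledFuture<?> stopAfterDelay(ScheduledExecutorService scheduler, Runnable stopCallback,
      long delayMs) {
    return scheduler.schedule(stopCallback, delayMs, TimeUnit.MILLISECONDS);
  }

  public static void main(String[] args) throws Exception {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    AtomicBoolean stopped = new AtomicBoolean(false);

    // The "handler" returns right after scheduling; the stop runs later on the scheduler thread.
    ScheduledFuture<?> pending = stopAfterDelay(scheduler, () -> stopped.set(true), 200L);
    System.out.println("response returned, stopped=" + stopped.get());

    pending.get(); // Wait for the delayed stop only to make the demo deterministic.
    System.out.println("after delay, stopped=" + stopped.get());
    scheduler.shutdown();
  }
}
```

A fixed delay narrows the race rather than removing it. A stricter design would trigger the stop from a response-completed hook, if the admin server exposes one.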