Make adaptive routing variables runtime adjustable #18807

Open
timothy-e wants to merge 1 commit into apache:master from timothy-e:timothy-e-enable-adaptive-routing-runtime-tuning

@timothy-e

Contributor

Summary

Makes three adaptive routing parameters adjustable at runtime via cluster config changes:

  • hybrid.score.exponent — the exponent used in the hybrid score formula (O+A+B)^N * C
  • ewma.alpha — the smoothing factor for exponential moving averages (latency and in-flight requests)
  • autodecay.window.ms — the time threshold after which EMA values auto-decay toward zero
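
For reviewers who want the arithmetic spelled out, here is a minimal sketch of how the exponent and alpha enter the calculations. It takes the score formula exactly as written above and treats `O`, `A`, `B`, and `C` as opaque inputs. The EWMA step is the textbook form and is an assumption about how the average is updated, not a copy of Pinot's code. The class and method names (`AdaptiveRoutingMath`, `hybridScore`, `ewmaStep`) are hypothetical.

```java
// Illustrative sketch only: names here are hypothetical, not Pinot's API.
final class AdaptiveRoutingMath {
  private AdaptiveRoutingMath() {
  }

  // Hybrid score in the form given in the summary: (O + A + B)^N * C.
  static double hybridScore(double o, double a, double b, double c, double exponent) {
    return Math.pow(o + a + b, exponent) * c;
  }

  // Textbook EWMA step (assumed form): alpha is the weight of the new sample.
  static double ewmaStep(double previous, double sample, double alpha) {
    return alpha * sample + (1.0 - alpha) * previous;
  }

  public static void main(String[] args) {
    System.out.println(hybridScore(2, 1, 1, 10, 3)); // 640.0, i.e. 4^3 * 10
    System.out.println(hybridScore(2, 1, 1, 10, 0)); // 10.0, exponent 0 leaves only C
    System.out.println(ewmaStep(50, 200, 0.5));      // 125.0
    System.out.println(ewmaStep(50, 200, 0.0));      // 50.0, alpha 0 ignores new samples
  }
}
```

Under these assumptions, setting the exponent to 0 reduces the score to `C` alone, and setting alpha to 0 freezes the average at its current value, which makes both settings easy to spot when testing.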

When any of these configs change, the new value is propagated to all existing ServerRoutingStatsEntry instances in both the SSE and MSE stats maps. New entries created after the change also use the updated values.
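
The propagation described above could look roughly like the sketch below. It is not the code in this PR: `RuntimeTuningSketch`, `StatsEntry`, `onAlphaChanged`, and the map field names are hypothetical stand-ins, and only one of the three parameters is shown. The idea is to publish the new value first so new entries pick it up, then push it to every existing entry in both maps.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a stats manager; not Pinot's actual classes.
final class RuntimeTuningSketch {
  // Hypothetical per-server entry holding a tunable parameter.
  static final class StatsEntry {
    private volatile double alpha;

    StatsEntry(double alpha) {
      this.alpha = alpha;
    }

    void setAlpha(double alpha) {
      this.alpha = alpha;
    }

    double getAlpha() {
      return alpha;
    }
  }

  private final Map<String, StatsEntry> sseStats = new ConcurrentHashMap<>();
  private final Map<String, StatsEntry> mseStats = new ConcurrentHashMap<>();
  private volatile double currentAlpha;

  RuntimeTuningSketch(double initialAlpha) {
    currentAlpha = initialAlpha;
  }

  // Entries are created lazily with whatever value is current at creation time.
  StatsEntry sseEntry(String server) {
    return sseStats.computeIfAbsent(server, s -> new StatsEntry(currentAlpha));
  }

  StatsEntry mseEntry(String server) {
    return mseStats.computeIfAbsent(server, s -> new StatsEntry(currentAlpha));
  }

  // Called when the config changes: publish first, then update existing entries.
  // An insert racing with this loop could still be missed; this sketch ignores that.
  void onAlphaChanged(double newAlpha) {
    currentAlpha = newAlpha;
    sseStats.values().forEach(e -> e.setAlpha(newAlpha));
    mseStats.values().forEach(e -> e.setAlpha(newAlpha));
  }

  public static void main(String[] args) {
    RuntimeTuningSketch manager = new RuntimeTuningSketch(0.5);
    StatsEntry existing = manager.sseEntry("server-1");
    manager.onAlphaChanged(0.0);
    System.out.println(existing.getAlpha());                     // 0.0
    System.out.println(manager.mseEntry("server-2").getAlpha()); // 0.0
  }
}
```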

Motivation

Make it easier to tune adaptive routing!

Testing

We've deployed this internally and used it to tune all three variables.

The testing we did before merging internally:

Deployed to a QA cluster.

curl -X POST localhost:9000/cluster/configs -H "Content-Type: application/json" -d '{"pinot.broker.adaptive.server.selector.ewma.alpha": "0"}'
{"status":"Updated cluster config."}
[Screenshot 2026-06-03 at 4 37 32 pm]
curl -X POST localhost:9000/cluster/configs -H "Content-Type: application/json" -d '{"pinot.broker.adaptive.server.selector.hybrid.score.exponent": "0"}'
{"status":"Updated cluster config."}

Harder to prove it worked, but we do see `Updated EWMA alpha to 0.0 and propagated to all entries.` in the logs.

curl -X POST localhost:9000/cluster/configs -H "Content-Type: application/json" -d '{"pinot.broker.adaptive.server.selector.autodecay.window.ms": "100000"}'
{"status":"Updated cluster config."}
[Screenshot 2026-06-03 at 4 48 02 pm]

We can see the stats stay high for a lot longer before they drop down.
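
To illustrate why a larger window keeps the stats high for longer, here is a small sketch of an auto-decaying average. It assumes decay means folding a zero sample into the average once no update has arrived within the window, checked on a periodic tick. That decay rule, the choice not to refresh the timestamp on decay, and the `AutoDecaySketch` class are all assumptions for illustration, not the behavior of Pinot's `ExponentialMovingAverage`.

```java
// Hypothetical sketch of an EWMA with an auto-decay window; not Pinot's implementation.
final class AutoDecaySketch {
  private final double alpha;
  private final long windowMs;
  private double average;
  private long lastUpdatedMs;

  AutoDecaySketch(double alpha, long windowMs, long nowMs) {
    this.alpha = alpha;
    this.windowMs = windowMs;
    this.lastUpdatedMs = nowMs;
  }

  // Fold a real sample into the average and remember when it arrived.
  void record(double sample, long nowMs) {
    average = alpha * sample + (1.0 - alpha) * average;
    lastUpdatedMs = nowMs;
  }

  // Periodic tick (assumed): if nothing arrived within the window, fold in a zero.
  void maybeDecay(long nowMs) {
    if (nowMs - lastUpdatedMs > windowMs) {
      average = (1.0 - alpha) * average;
    }
  }

  double average() {
    return average;
  }

  public static void main(String[] args) {
    AutoDecaySketch shortWindow = new AutoDecaySketch(0.5, 10_000L, 0L);
    AutoDecaySketch longWindow = new AutoDecaySketch(0.5, 100_000L, 0L);
    shortWindow.record(100.0, 0L);
    longWindow.record(100.0, 0L);

    // 20 seconds later with no traffic: only the short window has expired.
    shortWindow.maybeDecay(20_000L);
    longWindow.maybeDecay(20_000L);
    System.out.println(shortWindow.average()); // 25.0
    System.out.println(longWindow.average());  // 50.0
  }
}
```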

cc stripe-private-oss-forks/pinot-reviewers

Stripe-Original-Repo: stripe-private-oss-forks/pinot
Stripe-Monotonic-Timestamp: v2/2026-06-11T17:05:04Z/0
Stripe-Original-PR: https://git.corp.stripe.com/stripe-private-oss-forks/pinot/pull/669
@codecov-commenter

codecov-commenter commented Jun 18, 2026


Codecov Report

❌ Patch coverage is 84.25197% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.81%. Comparing base (d469cb1) to head (c8d5cee).
⚠️ Report is 8 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...erver/routing/stats/ServerRoutingStatsManager.java | 81.92% | 6 Missing and 9 partials ⚠️ |
| ...e/pinot/common/utils/ExponentialMovingAverage.java | 83.33% | 1 Missing and 4 partials ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18807      +/-   ##
============================================
+ Coverage     64.78%   64.81%   +0.03%     
  Complexity     1309     1309              
============================================
  Files          3381     3386       +5     
  Lines        209967   210247     +280     
  Branches      32891    32933      +42     
============================================
+ Hits         136020   136269     +249     
- Misses        62979    63002      +23     
- Partials      10968    10976       +8     

| Flag | Coverage Δ | |
|---|---|---|
| custom-integration1 | 100.00% <ø> (ø) | |
| integration | 100.00% <ø> (ø) | |
| integration1 | 100.00% <ø> (ø) | |
| integration2 | 0.00% <ø> (ø) | |
| java-21 | 64.81% <84.25%> (+0.03%) | ⬆️ |
| temurin | 64.81% <84.25%> (+0.03%) | ⬆️ |
| unittests | 64.81% <84.25%> (+0.03%) | ⬆️ |
| unittests1 | 56.99% <83.46%> (+0.03%) | ⬆️ |
| unittests2 | 37.25% <9.44%> (-0.01%) | ⬇️ |


@timothy-e timothy-e marked this pull request as ready for review June 19, 2026 14:56