admission: compute cpu time token refill rates #157029
Conversation
This commit introduces a linear model that computes cpu time token refill rates. cpuTimeTokenFiller uses this model to determine how many tokens to add per second to the buckets in cpuTimeTokenGranter.
Fixes: cockroachdb#154471
Release note: None.
tbg left a comment
Looks good! @sumeerbhola should also comment, but I have confidence that at a technical level, this is sound. I had some implementation suggestions as well in the old PRs, but this is overall coming out nicely.
@tbg reviewed 7 of 7 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)
pkg/util/admission/testdata/cpu_time_token_allocator line 74 at r1 (raw file):
tier1 3 2 # Test that burst budget will be respected, in case of refill rate change.
"burst budget" ambiguous. Let's always refer to things by their one sort of unique way. You mean the bucket size?
pkg/util/admission/testdata/cpu_time_token_allocator line 75 at r1 (raw file):
# Test that burst budget will be respected, in case of refill rate change.
resetInterval delta=1
Not having poked at the internals, I'm confused by what this means. We get delta=[1 1 1 1] in resetInterval. All the naming confusion aside, this means that all of the buckets have grown by one and that we add [1 1 1 1] to them when we call with the updated capacities [6 5 4 3]. But that's clearly not how it works because the output below is different. Could you explain?
pkg/util/admission/cpu_time_token_granter.go line 251 at r1 (raw file):
// getTokensUsedInInterval returns the net number of tokens deducted from the
// buckets, since the last call to getTokensUsedInInterval.
This also, unexpectedly, resets it. Make this clear in the comment, and also rename it to resetTokensUsedInInterval() int64 (getting the old number is just a side-effect).
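A minimal sketch of what the renamed method might look like, assuming the granter guards a running counter with a mutex (the field names are illustrative, not the PR's actual layout):
// Sketch only: field names are assumptions.
type cpuTimeTokenGranter struct {
	mu                   sync.Mutex
	tokensUsedInInterval int64
}

// resetTokensUsedInInterval returns the net number of tokens deducted
// from the buckets since the last call, and resets the running count.
// The reset is the primary effect; returning the old value is the
// side-effect.
func (g *cpuTimeTokenGranter) resetTokensUsedInInterval() int64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	used := g.tokensUsedInInterval
	g.tokensUsedInInterval = 0
	return used
}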
pkg/util/admission/testdata/cpu_time_token_granter line 25 at r1 (raw file):
kvtier0: tryGet(1) returned false
get-tokens-used
The directive will also need a rename.
pkg/util/admission/testdata/cpu_time_token_granter line 136 at r1 (raw file):
get-tokens-used
----
get-tokens-used-in-interval() returned 3
Would love some explanations for these, at least in enough places so that novice readers can piece together the remainder. These numbers currently come as a complete surprise to me. Sometimes it's zero when I think it should be nonzero, sometimes it's positive, sometimes it's negative, which I didn't know could happen.
pkg/util/admission/cpu_time_token_filler_test.go line 178 at r1 (raw file):
}

func TestCPUTimeTokenLinearModel(t *testing.T) {
Haven't reviewed yet, this comment serves as a reminder to myself.
pkg/util/admission/cpu_time_token_filler.go line 131 at r1 (raw file):
// cpuTimeTokenAllocator is to gradually allocate tokens every interval,
// while respecting bucketCapacity. The computation of the rate of tokens to add
// every interval is left to cpuTimeTokenLinearModel.
did you mean cpuTimeModel here?
pkg/util/admission/cpu_time_token_filler.go line 170 at r1 (raw file):
}
rates := a.model.getRefillRates()
This will pass 2x2 64-bit words via the stack (?).
Probably not an issue. Just made me check, since it's a common performance problem to accidentally pass large arrays via the stack and doing so at high frequency can really show up in benchmarks.
When done once per second for four words - probably doesn't hurt at all.
pkg/util/admission/cpu_time_token_filler.go line 175 at r1 (raw file):
for kind := range rates[wc] {
	toAllocateTokens := allocateFunc(
		rates[wc][kind], a.allocated[wc][kind], expectedRemainingTicksInInterval)
I got myself really confused here and I think the "rates" is why. These aren't rates, right? They're the actual capacities of the buckets. This is only very clear when looking at granter.refill. I understand that the interval is 1s, so in that sense the capacity is also a rate, but please try to make this a lot clearer. My strong preference on this is to update verbiage and names throughout to refer to these as capacities, and also to use types for the [a][b]int64 slices so that capacities vs token deltas are clearly distinguished.
pkg/util/admission/cpu_time_token_filler.go line 185 at r1 (raw file):
// resetInterval is called to signal the beginning of a new interval. allocateTokens
// adds the desired number of tokens every interval.
func (a *cpuTimeTokenAllocator) resetInterval(skipFittingLinearModel bool) {
The bool smells a bit like a wart. I see that this is only called in start(), once, before entering the loop. Could this all clean itself up if the loop instead initialized intervalStart() with zero (so that the first iteration enters the resetInterval branch)? What is the desired state of the token bucket before the refill loop has started? It seems okay for it to be empty for a split second.
Another approach (one I'd probably favor since it's so straightforward) could be making cpuTimeModel.fit responsible for doing whatever a sensible thing might be on the first call (probably setting an effectively infinite size).
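A sketch of the first alternative, with illustrative type and function names rather than the PR's actual ones: intervalStart is left at its zero value, so the very first tick takes the resetInterval branch and no skipFittingLinearModel bool is needed. It assumes 1ms ticks and a 1s interval, and ignores ticker jitter.
import "time"

type tokenAllocator interface {
	resetInterval()
	allocateTokens(remainingTicksInInterval int64)
}

func runFiller(alloc tokenAllocator, ticker *time.Ticker) {
	var intervalStart time.Time // zero value: the first tick resets
	for now := range ticker.C {
		if now.Sub(intervalStart) >= time.Second {
			alloc.resetInterval() // first call doubles as initialization
			intervalStart = now
		}
		// 1000 right after a reset, then 999, 998, ...
		ticksLeft := int64((time.Second - now.Sub(intervalStart)) / time.Millisecond)
		alloc.allocateTokens(ticksLeft)
	}
}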
pkg/util/admission/cpu_time_token_filler.go line 190 at r1 (raw file):
// call to fit to this one. We add it immediately to the bucket. The model itself
// handles filtering.
delta := a.model.fit()
Didn't quite get this. Let's say there's just one rate (for simplification) and it's 5. We fit the model, and the new rate is 4.
The comment says "difference from previous call to fit to this one" - I'd interpret this as: it returns the old rates (not the old delta - could be clearer) minus the new rates. But it's actually the other way around - new rate minus old rate: delta will be -1.
Okay, we call refill(-1,4). But why? If the bucket isn't full, we're subtracting from the bucket, but we only meant to change its capacity?
Probably missing something here.
If the old rate is 4 and the new one is 5: we're adding a token. That part makes sense - it gives the bucket a chance to fill up: if it was full before, it's full after. I don't know why it has to be that way - I would've expected it to keep the current amount and just fill at the updated rate - but it seems okay. An explanation for the behavior would be nice, though, or just an ack that it's an arbitrary choice.
pkg/util/admission/cpu_time_token_filler.go line 201 at r1 (raw file):
// cpuTimeModel abstracts cpuTimeLinearModel for testing.
type cpuTimeModel interface {
This is the place that needs the comments. It's where I'll end up if I browse the code and think "oh, I wonder what this fit() thing returns".
pkg/util/admission/cpu_time_token_filler.go line 204 at r1 (raw file):
init()
fit() [numResourceTiers][numBurstQualifications]int64
getRefillRates() [numResourceTiers][numBurstQualifications]int64
There are many of those [numTiers][numQuals] slices flying around. Sometimes they're deltas, sometimes they're refill rates, sometimes they're bucket sizes. I think that is confusing to have them all interchangeable. Can you see whether adding types for all of them makes things clearer? I expect that it will.
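One way the suggestion could look; the dimension constants exist in the package, but the type names here are invented for illustration:
// Distinct types make it a compile-visible error to mix up the arrays.
type refillRates [numResourceTiers][numBurstQualifications]int64      // tokens added per 1s interval
type bucketCapacities [numResourceTiers][numBurstQualifications]int64 // maximum tokens per bucket
type rateDeltas [numResourceTiers][numBurstQualifications]int64       // new rates minus old rates

type cpuTimeModel interface {
	init()
	fit() rateDeltas
	getRefillRates() refillRates
}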
pkg/util/admission/cpu_time_token_filler.go line 231 at r1 (raw file):
// evaluation path, such as compaction. AC continuously fits a linear model:
// total-cpu-time = linearCorrectionTerm * reported-cpu-time, where a is forced to be
// in the interval [1, 20].
Explain why that interval. The lower bound is clear: 10ms of reported time is definitely worth at least 10ms of real time. (I actually don't know if that's entirely true for our accounting since we're counting up time slices or the like, but it's definitely true enough).
The upper bound (20) is more interesting. We're saying we're not willing to inflate the measured work by more than 20x to reflect measured aggregate CPU utilization. I think this is akin to saying that we're pretty sure that we're capturing at least 5% of the true CPU usage caused by our request. We may not be capturing all of it, but it shouldn't routinely happen that a request captures, say, 5ms of CPU but the true CPU is north of 100ms - if there's more than that, it must be caused by something else, unrelated to our request, and it is better not to model it because (I think) whatever it is may be less important than the work we have under our control, or not, who knows, it is just better to let the model highlight that it's reached its bounds and not try to be clever. It's basically "bug"/unexpected behavior territory.
pkg/util/admission/cpu_time_token_filler.go line 233 at r1 (raw file):
// in the interval [1, 20].
//
// The higher priority buckets have higher target utlizations than the lowest
typo here and one line down: utlization
pkg/util/admission/cpu_time_token_filler.go line 244 at r1 (raw file):
// The time that fit was called last.
lastFitTime time.Time
// The comulative user/sys CPU time used since process start in
typo: comulative
pkg/util/admission/cpu_time_token_filler.go line 248 at r1 (raw file):
totalCPUTimeMillis int64
// The CPU capacity measured in vCPUs.
cpuCapacity float64
In other places (mma) we measure capacity in nanosec/sec (so just a 64bit int). Maybe just rename this to vcpus to avoid mixing with that definition of capacity. I would suggest using an int here but you're doing float arithmetic anyway, and dealing with 1E9 numbers can be annoying in tests.
pkg/util/admission/cpu_time_token_filler.go line 249 at r1 (raw file):
// The CPU capacity measured in vCPUs.
cpuCapacity float64
// The lineabr correction term, see the docs above cpuTimeTokenLinearModel.
typo: lineabr
My Goland shows typos like these btw, might be worth setting up if you haven't.
pkg/util/admission/cpu_time_token_filler.go line 296 at r1 (raw file):
// Get used CPU time.
totalCPUTimeMillis, _ := m.cpuMetricProvider.GetCPUInfo()
The capacity can change, btw. For example, a cgroup might be reconfigured. I'm sure things would break in multiple places today if that actually happened, but it seems fairly straightforward to be dynamic here? Or is there some reason capacity has to be fixed and only set in init? I could see init() just calling fit().
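A hypothetical sketch of the dynamic version, assuming (which the PR may not guarantee) that the discarded second return value of GetCPUInfo is the current capacity; if it is not, a separate provider method would be needed:
// Re-read capacity on every fit so cgroup reconfiguration is picked up.
totalCPUTimeMillis, cpuCapacity := m.cpuMetricProvider.GetCPUInfo()
m.cpuCapacity = cpuCapacity // previously only set in init()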
pkg/util/admission/cpu_time_token_filler.go line 308 at r1 (raw file):
// Update multiplier.
const lowCPUUtilFrac = 0.25
isLowCPUUtil := intCPUTimeNanos < int64(float64(elapsedSinceLastFit)*m.cpuCapacity*lowCPUUtilFrac)
nit: elapsedSinceLastFit.Nanoseconds() is identical but reads more easily, I think.
pkg/util/admission/cpu_time_token_filler.go line 312 at r1 (raw file):
// Ensure that low CPU utilization is not due to a flawed tokenToCPUTimeMultiplier
// by multiplicatively lowering it until we are below the upperBound. If we are already
// below uppperBound, we make no adjustment.
Putting an example here will save someone (like me) 10 minutes of reverse engineering what you're doing here every time we have to understand this. As a data point: I've tried and still don't.
pkg/util/admission/cpu_time_token_filler.go line 323 at r1 (raw file):
tokenToCPUTimeMultiplier := float64(intCPUTimeNanos) / float64(tokensUsed)
// Mulitplier is forced into the interval [1, 20].
typo: Mulitplier
pkg/util/admission/cpu_time_token_filler.go line 331 at r1 (raw file):
tokenToCPUTimeMultiplier = 1
}
// Decrease faster than increase. Giving out too many tokens can
Thinking out loud: the model is cpu_usage_nanos = f(request_nanos) = mult * request_nanos. So the bigger m is, the faster we refill the buckets. So lowering m is when we're tightening the belt. Okay, checks out.
pkg/util/admission/cpu_time_token_filler.go line 338 at r1 (raw file):
}
// Exponentially filter changes to the multiplier. 1s of data is noisy,
is "filter" the right work here? I (and the internet) would call this exponential smoothing.
pkg/util/admission/cpu_time_token_filler.go line 344 at r1 (raw file):
}
return m.updateRefillRates()
Catching a smell here in that this is its own little method. It's only called from .init otherwise. There's really no way to have there be only one path that goes through everything?
pkg/util/admission/cpu_time_token_filler.go line 350 at r1 (raw file):
// the admission.cpu_time_tokens.target_util cluster setting. updateRefillRates
// returns the delta refill rates. That is updateRefillRates returns the difference
// in tokens to add per interval (1s) from previous call to fit to this one.
I left a comment at the call site about the ambiguity in what is returned, just say explicitly that it's new minus old (unless I was on to something about questioning whether we need this returned at all, probably just confused though 😄 )
sumeerbhola left a comment
not finished reading. Flushing some initial comments.
@sumeerbhola reviewed all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @joshimhoff and @tbg)
pkg/util/admission/cpu_time_token_filler.go line 231 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Explain why that interval. The lower bound is clear: 10ms of reported time is definitely worth at least 10ms of real time. (I actually don't know if that's entirely true for our accounting since we're counting up time slices or the like, but it's definitely true enough).
The upper bound (20) is more interesting. We're saying we're not willing to inflate the measured work by more than 20x to reflect measured aggregate CPU utilization. I think this is akin to saying that we're pretty sure that we're capturing at least 5% of the true CPU usage caused by our request. We may not be capturing all of it, but it shouldn't routinely happen that a request captures, say, 5ms of CPU but the true CPU is north of 100ms - if there's more than that, it must be caused by something else, unrelated to our request, and it is better not to model it because (I think) whatever it is may be less important than the work we have under our control, or not, who knows, it is just better to let the model highlight that it's reached its bounds and not try to be clever. It's basically "bug"/unexpected behavior territory.
I agree we should say more here.
The reason we want to model it, even if the instrumentation should have been better (i.e., we should have captured more of the work explicitly) is that we do want to avoid very heavy queueing in the goroutine scheduler, which causes basic node/store health functionality to get starved. Say we have 20s of very heavy Go GC activity, or RPC activity, or something that can't be captured here -- it seems better to queue foreground work and limit the number of runnable goroutines (remember, we don't have the slot mechanism when this new scheme is running), than to throw up our hands and give up. I forget, but I think the number 20 was made up in the prototype.
pkg/util/admission/cpu_time_token_filler.go line 248 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
In other places (mma) we measure capacity in nanosec/sec (so just a 64bit int). Maybe just rename this to vcpus to avoid mixing with that definition of capacity. I would suggest using an int here but you're doing float arithmetic anyway, and dealing with 1E9 numbers can be annoying in tests.
It can't be an int because this (in the prototype) came from runtime.GetCPUCapacity() float64 which takes into account the cgroup, so can be fractional.
pkg/util/admission/cpu_time_token_filler.go line 296 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
The capacity can change, btw. For example, a cgroup might be reconfigured. I'm sure things would break in multiple places today if that actually happened, but it seems fairly straightforward to be dynamic here? Or is there some reason capacity has to be fixed and only set in init? I could see init() just calling fit().
+1 It definitely needs to be dynamic, and was in the prototype.
pkg/util/admission/cpu_time_token_filler.go line 233 at r1 (raw file):
// in the interval [1, 20].
//
// The higher priority buckets have higher target utlizations than the lowest
We have resourceTiers and within a tier, burstQualifications. I am guessing the "higher priority ..." is referring to the higher tier. Is that correct?
And if yes, the delta is fixed across the two burstQualifications in a tier, but not across tiers.
What is the utilization of the highest tier? We should probably set it to 100% since we will place the system tenant work in there.
sumeerbhola left a comment
@sumeerbhola reviewed 1 of 7 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @joshimhoff and @tbg)
pkg/util/admission/cpu_time_token_filler.go line 312 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Putting an example here will save someone (like me) 10 minutes of reverse engineering what you're doing here every time we have to understand this. As a data point: I've tried and still don't.
Perhaps something like
// With good integration with admission control, we hope to have a tokenToCPUTimeMultiplier < 2 (in experiments with the prototype, we have seen values as high as 3).
// Say tokenToCPUMultiplier was very high, say 10, and we observed < 25% utilization.
// We are paranoid about (a) an incorrect model being the cause of the low CPU utilization, and (b) the model being more inaccurate when fit at low utilization, so we don't want to fit the model at this low utilization. So we adopt the heuristic of leaving the multiplier unchanged (to avoid (b)), except for when the multiplier is large enough to cause utilization < 25% (if the unknown true multiplier was 1) (to avoid (a)). The upperBound will allow admission of 1/upperBound which should cause CPU utilization > 25% if the unknown true multiplier was 1.
Also worth making a note above this code block that computes the tokenToCPUMultiplier that this is a heuristic.
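One hypothetical reading of that heuristic in code form; the derivation of upperBound, the lowering factor, and the names are assumptions layered on the snippets quoted above:
// Only distrust the multiplier when it alone could explain the low
// utilization: with upperBound = 1/lowCPUUtilFrac, admitting
// 1/upperBound of capacity would already push utilization above
// lowCPUUtilFrac if the unknown true multiplier were 1.
upperBound := 1 / lowCPUUtilFrac // 4, with lowCPUUtilFrac = 0.25
if isLowCPUUtil && m.tokenToCPUTimeMultiplier > upperBound {
	// Lower multiplicatively each fit until below upperBound; the
	// factor here is an assumption, not the PR's constant.
	m.tokenToCPUTimeMultiplier *= 0.8
}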
pkg/util/admission/cpu_time_token_filler.go line 331 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Thinking out loud: the model is cpu_usage_nanos = f(request_nanos) = mult * request_nanos. So the bigger m is, the faster we refill the buckets. So lowering m is when we're tightening the belt. Okay, checks out.
The code is the same as the prototype, but the explanation is not quite correct. The prototype decreases the multiplier faster, since a smaller multiplier allows for more admission (since fewer CPU tokens get consumed for each 1 nanos of work admitted). We don't want to waste CPU if the model was not quite correct. Say for some reason we estimated the multiplier to be 20 and it should have been 2. With a 0.5 alpha we would get 11, then 6.5, 4.25, 3.125, so we have multiple seconds of underadmission. I picked 0.8 out of a hat.
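A sketch of that asymmetric smoothing, with the alphas taken from the comment above and the variable names assumed:
alpha := 0.5 // increases converge slowly
if newMultiplier < m.tokenToCPUTimeMultiplier {
	// Decreases converge fast: from 20 toward a true value of 2 this
	// gives 5.6, 2.7, 2.1, ... instead of 11, 6.5, 4.25, 3.125, ...
	alpha = 0.8
}
m.tokenToCPUTimeMultiplier = alpha*newMultiplier + (1-alpha)*m.tokenToCPUTimeMultiplier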
pkg/util/admission/cpu_time_token_filler.go line 344 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Catching a smell here in that this is its own little method. It's only called from .init otherwise. There's really no way to have there be only one path that goes through everything?
I'm not sure I understand. Everything preceding this call in this method is about computing a new tokenToCPUMultiplier. In the init case we simply set that multiplier to 1. So there is a shared method that does the needful after computing the multiplier, which seems normal.
pkg/util/admission/cpu_time_token_filler.go line 69 at r1 (raw file):
f.allocator.init()
f.allocator.allocateTokens(1)
f.allocator.resetInterval(true /* skipFittingLinearModel */)
The prototype had two methods on cpuTimeTokenAdjusterNew: adjustLocked and allocateTokenTicksLocked. Before the ticker started it would call adjustLocked followed by allocateTokenTicksLocked(int64(time.Second / time.Millisecond)). The next call after the ticker would typically be allocateTokenTicksLocked(999) (if the ticker was not slow). The same could work here with two methods:
type tokenAllocator interface {
	allocateTokens(remainingTicksInInterval int64)
resetInterval()
}
The first call to resetInterval would do the initialization.
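An illustrative call sequence under that interface, assuming a 1ms tick and a 1s interval:
a.resetInterval()      // first call doubles as initialization
a.allocateTokens(1000) // full interval remaining: 1s of 1ms ticks
// ...each subsequent tick, ~1ms apart...
a.allocateTokens(999) // then 998, 997, ... until the next resetInterval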
pkg/util/admission/cpu_time_token_filler.go line 205 at r1 (raw file):
fit() [numResourceTiers][numBurstQualifications]int64
getRefillRates() [numResourceTiers][numBurstQualifications]int64
}
I am not sure we need getRefillRates. Since the model is being driven by the cpuTimeAllocator, it can store the copy of the refill rates returned by the model. In which case init would also return the refill rates. This may be related to @tbg's comments.
And I don't think the following delta logic belongs in the model:
delta[tier][qual] = newRate - m.rates[tier][qual]
m.rates[tier][qual] = newRate
The model is just the model providing the new per-second rate. How that is used to adjust the token bucket should be fully in the place where the token buckets are kept.
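A sketch of where that logic could live instead, with the allocator keeping the previous rates; the names are assumed, and refill follows the refill(delta, capacity) shape discussed earlier in the thread:
newRates := a.model.fit() // model now returns only the new per-second rates
for tier := range newRates {
	for qual := range newRates[tier] {
		delta := newRates[tier][qual] - a.rates[tier][qual]
		a.rates[tier][qual] = newRates[tier][qual]
		a.granter.refill(delta, newRates[tier][qual])
	}
}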
pkg/util/admission/cpu_time_token_filler.go line 256 at r1 (raw file):
}
type tokenUsageTracker interface {
Can you add a comment here? Also, add an implementation reminder that in addition to the tokens used in the regular granter we should also get the elastic cpu tokens used. That corresponds to the following todo in the prototype:
// NB: this is the measurement from the elasticCPUGranter, not the elastic
// work that is being managed by our granter.
intElasticTokensUsed := int64(0) /* TODO(sumeer): get elastic tokens used and reset */
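A possible shape for the requested comment, with hypothetical wording that also picks up the rename suggested earlier in the thread:
// tokenUsageTracker reports the net number of CPU time tokens deducted
// since the last call, and resets the running count.
// TODO: also fold in the tokens used by the elasticCPUGranter, which is
// not managed by this granter (see the prototype's TODO above).
type tokenUsageTracker interface {
	resetTokensUsedInInterval() int64
}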