Oximeter: cache compiled regex. #10698
Open
jmcarp wants to merge 1 commit into
Per #10552, oximeter spends about 2/3 of its CPU time compiling `TIMESERIES_NAME_REGEX`. This happens because `validate_timeseries_name` checks each incoming sample against the regex and recompiles the regex every time it is invoked. To fix this, the patch compiles `TIMESERIES_NAME_REGEX` at most once using `LazyLock`.

Note: oximeter's observed CPU use isn't particularly high; its throughput is more of a concern. When oximeter's database batcher receives samples faster than it can flush them from its queue, it drops old samples. So until we make a larger architectural change, such as sharding oximeter, we need to maintain high throughput to avoid dropping samples under high volume. This change alone may not prevent all of the dropped samples reported in #10552, but it should help, and has no obvious downsides.

Part of #10552.
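For reviewers unfamiliar with the pattern, here is a minimal, std-only sketch of compiling once behind `LazyLock`. It is not the code in this patch: `NameMatcher`, `COMPILE_COUNT`, and the simplified naming rule are hypothetical stand-ins so the example needs no external crates. The real code would presumably hold a `regex::Regex` built from the actual pattern, which is not reproduced here.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

// Hypothetical counter, present only to show how often the "compile" step runs.
static COMPILE_COUNT: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for a compiled regex.
struct NameMatcher;

impl NameMatcher {
    // Stand-in for the expensive compile step that previously ran per sample.
    fn compile() -> Self {
        COMPILE_COUNT.fetch_add(1, Ordering::SeqCst);
        NameMatcher
    }

    // Simplified rule (an assumption, NOT the real TIMESERIES_NAME_REGEX):
    // two non-empty segments of lowercase ASCII letters, digits, or
    // underscores, joined by a single ':'.
    fn is_match(&self, name: &str) -> bool {
        let is_segment = |s: &str| {
            !s.is_empty()
                && s.chars()
                    .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_')
        };
        match name.split_once(':') {
            Some((target, metric)) => is_segment(target) && is_segment(metric),
            None => false,
        }
    }
}

// Initialized on first access, at most once, even when accessed from
// several threads; every later access reuses the same value.
static TIMESERIES_NAME_MATCHER: LazyLock<NameMatcher> =
    LazyLock::new(|| NameMatcher::compile());

// Called once per incoming sample; no longer pays the compile cost each time.
fn validate_timeseries_name(name: &str) -> bool {
    TIMESERIES_NAME_MATCHER.is_match(name)
}
```

However many times `validate_timeseries_name` is called, `NameMatcher::compile` runs only on the first call. `std::sync::LazyLock` is stable as of Rust 1.80.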