Skip to content

Conversation

@masseyke
Copy link
Member

@masseyke masseyke commented Nov 14, 2025

This fixes the N^2 performance problem described in #97222. In addition to restoring the previous partial fix (#130857), it does the following:

  1. IndicesQueryCache::getStats now accepts a Supplier so that we can only call IndicesQueryCache::getSharedRamSizeForAllShards if it is absolutely needed. This fixes an N^2 performance problem that Improving statsByShard performance when the number of shards is very large #130857 introduced. If a user called TransportIndicesStatsAction but did not request query cache stats, then before Improving statsByShard performance when the number of shards is very large #130857 we did not enter the N^2 loop (it was only entered if a user did request query cache stats). But after Improving statsByShard performance when the number of shards is very large #130857, we had the N^2 performance all the time. This is a pretty big problem for clusters with large shards since this is called very frequently (including every 30 seconds by a background task).
  2. It fixes the N^2 performance in TransportIndicesStatsAction by sharing state across all shardOperation calls on a single node using the new NodeContext feature from Adding NodeContext to TransportBroadcastByNodeAction #138057.

Closes #97222

… to avoid the N^2 performance in TransportIndicesStatsAction if the user did not ask for query cache stats (although it is still there if the user asks for query cache stats). It also avoid O(N) performance in TransportClusterStatsAction if the user did not ask for query cache stats
…performance when a user asks for query cache stats
@masseyke masseyke added >bug :Data Management/Stats Statistics tracking and retrieval APIs v9.3.0 labels Nov 14, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @masseyke, I've created a changelog YAML for you.

@masseyke masseyke requested a review from Copilot November 17, 2025 19:47
Copilot finished reviewing on behalf of masseyke November 17, 2025 19:49
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes an N^2 performance issue in stats collection for clusters with large numbers of shards. The key improvements are: (1) making IndicesQueryCache::getStats accept a Supplier<Long> to defer expensive calculations until needed, and (2) using the NodeContext feature to share cache totals computation across all shard operations on a node.

  • Modified IndicesQueryCache to compute shared RAM sizes more efficiently by pre-calculating totals once per node
  • Updated TransportIndicesStatsAction and TransportClusterStatsAction to use cached suppliers for cache totals
  • Changed all callers of getStats() to provide the precomputed shared RAM supplier

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
server/src/main/java/org/elasticsearch/indices/IndicesQueryCache.java Added methods to compute cache totals once and reuse them; changed getStats() to accept a Supplier<Long> for lazy evaluation; fixed typos in comments
server/src/main/java/org/elasticsearch/action/admin/indices/stats/TransportIndicesStatsAction.java Implemented NodeContext pattern using CachedSupplier to share cache totals across shard operations
server/src/main/java/org/elasticsearch/action/admin/cluster/stats/TransportClusterStatsAction.java Applied similar CachedSupplier pattern for cache totals computation
server/src/main/java/org/elasticsearch/indices/IndicesService.java Modified statsByShard() to precompute shared RAM once and pass it to each shard
server/src/main/java/org/elasticsearch/action/admin/indices/stats/CommonStats.java Updated getShardLevelStats() signature to accept Supplier<Long> for precomputed shared RAM
server/src/test/java/org/elasticsearch/indices/IndicesQueryCacheTests.java Updated all test calls to new getStats() API; added test helper methods; contains duplicate line that should be removed
server/src/test/java/org/elasticsearch/indices/IndicesServiceTests.java Updated mock setup to use argument matchers for new signature
server/src/test/java/org/elasticsearch/indices/IndicesServiceCloseTests.java Updated test calls to pass () -> 0L supplier to getStats()
server/src/test/java/org/elasticsearch/index/shard/IndexShardTests.java Updated test call to new getShardLevelStats() signature
server/src/test/java/org/elasticsearch/action/admin/cluster/stats/VersionStatsTests.java Updated test call to new getShardLevelStats() signature
docs/changelog/138126.yaml Added changelog entry for this PR
docs/changelog/130857.yaml Added changelog entry for related PR #130857

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@masseyke
Copy link
Member Author

masseyke commented Nov 18, 2025

To test this, I stood up a single-node cluster and loaded 30,000 shards with cache-test.sh. I also modified IndicesQueryCache.INDICES_QUERIES_CACHE_ALL_SEGMENTS_SETTING to always be true (for the sake of simplicity and not having to remember to set that on my cluster).
After loading 30k+ shards, here are what query times to _stats looked like (more than 20s):

ubuntu@ip-172-31-14-192:~$ time curl localhost:9200/_stats > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 32.6M    0 32.6M    0     0  1643k      0 --:--:--  0:00:20 --:--:-- 10.4M

real	0m20.348s
user	0m0.010s
sys	0m0.016s
ubuntu@ip-172-31-14-192:~$ time curl localhost:9200/_stats > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 32.6M    0 32.6M    0     0  1595k      0 --:--:--  0:00:20 --:--:-- 8924k

real	0m20.975s
user	0m0.007s
sys	0m0.014s

After applying the change in this PR, restarting the node, waiting for all 30,000+ shards to become allocated, and running the warm-up query from cache-test.sh, here is what it looked like:

ubuntu@ip-172-31-14-192:~/async-profiler-4.2-linux-x64$ time curl localhost:9200/_stats > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 33.7M    0 33.7M    0     0  36.9M      0 --:--:-- --:--:-- --:--:-- 36.9M

real	0m0.922s
user	0m0.009s
sys	0m0.010s
ubuntu@ip-172-31-14-192:~/async-profiler-4.2-linux-x64$ time curl localhost:9200/_stats > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 33.7M    0 33.7M    0     0  37.2M      0 --:--:-- --:--:-- --:--:-- 37.2M

real	0m0.914s
user	0m0.012s
sys	0m0.006s

Looking at the flamegraph for CommonStats.getShardLevelStats() during that query, the purple part on the right is all that's left of IndicesQueryCache.getStats():
Screenshot 2025-11-18 at 9 28 48 AM
Digging in a little more to just IndicesQueryCache.getStats(), the majority of the IndicesQueryCache.getStats() time is as expected in the initialization of the CachedSupplier that has been added to TransportIndicesStatsAction:
Screenshot 2025-11-18 at 9 30 40 AM

Performing only a request for query_cache stats, the response time is faster:

ubuntu@ip-172-31-14-192:~/async-profiler-4.2-linux-x64$ time curl localhost:9200/_stats/query_cache > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2484k    0 2484k    0     0  8996k      0 --:--:-- --:--:-- --:--:-- 9002k

real	0m0.285s
user	0m0.006s
sys	0m0.005s

And a request for something other than query_cache stats is also fast since we don't go through the query_cache path at all (one of the problems with the original fix):

ubuntu@ip-172-31-14-192:~/async-profiler-4.2-linux-x64$ time curl localhost:9200/_stats/indexing > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 4712k    0 4712k    0     0  16.1M      0 --:--:-- --:--:-- --:--:-- 16.2M

real	0m0.292s
user	0m0.004s
sys	0m0.005s

@masseyke masseyke marked this pull request as ready for review November 18, 2025 15:55
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Nov 18, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

Copy link
Contributor

@seanzatzdev seanzatzdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious what this second changelog entry is for

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh it's because I reverted #130857 and then un-reverted it. So we got the changelog for #130857 and for this PR. I'll remove the one for 130857

@masseyke masseyke merged commit d74334e into elastic:main Nov 18, 2025
34 checks passed
@masseyke masseyke deleted the fix-stats-performance branch November 18, 2025 19:26
@masseyke
Copy link
Member Author

masseyke commented Nov 19, 2025

Adding one last data point -- I now have 82,632 shards on a single node, and GET _stats is taking about 2.5s. Looking at it in the profiler, the runtime proportions of different pieces seem to be staying the same -- we just have a lot more shards. Within that 2.5s, the slowest pieces (in descending order) are:

  • IndexShard::docStats
  • IndexShard::segmentStats
  • IndexShard::translogStats
  • IndexShard::storeStats
  • IndexShard::indexingStats
  • IndexQueryCache::getStats (the subject of this PR)
  • IndexShard::denseVectorStats

@masseyke
Copy link
Member Author

Note: This hasn't been backported yet, but will be in the near future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Computing IndicesQueryCache stats is O(N²) in shard count

4 participants