[Query]: Adds ability to choose global vs local/focused statistics for FullTextScore #48431

Open
aayush3011 wants to merge 4 commits into Azure:main from aayush3011:users/akataria/fullTextImprovements

Conversation

Member

@aayush3011 aayush3011 commented Mar 16, 2026

Description

Why?

Cosmos DB's implementation of FullTextScore computes BM25 statistics (term frequency, inverse document frequency, and document length) across all documents in the container, including all physical and logical partitions.

While this provides a valid and comprehensive representation of statistics for the entire dataset, it introduces challenges for several common use cases:

  • Multi-tenant scenarios: Tenants often operate in very different domains, which can significantly change the distribution and importance of keywords. Using global statistics leads to distorted relevance rankings for individual tenants.
  • Large containers with many partitions: Computing statistics across hundreds or thousands of physical partitions can be time-consuming and expensive. Customers may prefer statistics derived from only a subset of partitions to improve performance and reduce RU consumption.

What?

This PR extends the flexibility of BM25 scoring so that developers can choose between:

  • Global (default): FullTextScore computes BM25 statistics across all documents in the container, regardless of any partition key filters. This is the existing behavior.
  • Local: When a query includes a partition key filter, BM25 statistics are computed only over the subset of documents within the specified partition key values. Scores and ranking reflect relevance within that partition-specific slice of data.
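To make the difference between the two scopes concrete, here is a minimal, self-contained sketch of how the same term can receive a different inverse document frequency depending on which documents feed the statistics. It uses the common BM25 formulation with `k1 = 1.2` and `b = 0.75`; the exact formula and constants used by Cosmos DB are not stated in this PR, so treat them as assumptions. The class name, the toy tenants, and the token lists are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a textbook BM25 computation, not the Cosmos DB implementation.
final class Bm25ScopeSketch {
    // Hypothetical tenant A: every document mentions "contract".
    static final List<List<String>> TENANT_A = Arrays.asList(
        Arrays.asList("contract", "law"),
        Arrays.asList("contract", "court"),
        Arrays.asList("contract", "appeal"));

    // Hypothetical tenant B: no document mentions "contract".
    static final List<List<String>> TENANT_B = Arrays.asList(
        Arrays.asList("patient", "dose"),
        Arrays.asList("patient", "trial"),
        Arrays.asList("clinic", "dose"));

    // "Global" corpus: all documents of both tenants.
    static List<List<String>> global() {
        List<List<String>> all = new ArrayList<>(TENANT_A);
        all.addAll(TENANT_B);
        return all;
    }

    // BM25 IDF with +1 smoothing inside the logarithm (assumed variant).
    static double idf(String term, List<List<String>> corpus) {
        long docsWithTerm = corpus.stream().filter(d -> d.contains(term)).count();
        return Math.log(1.0 + (corpus.size() - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
    }

    // BM25 score of a single term for one document, using statistics from the given corpus.
    static double score(String term, List<String> doc, List<List<String>> corpus) {
        final double k1 = 1.2;
        final double b = 0.75;
        double tf = doc.stream().filter(term::equals).count();
        double avgLen = corpus.stream().mapToInt(List::size).average().orElse(1.0);
        double norm = tf + k1 * (1 - b + b * doc.size() / avgLen);
        return idf(term, corpus) * tf * (k1 + 1) / norm;
    }

    public static void main(String[] args) {
        List<String> doc = TENANT_A.get(0);
        System.out.println("global statistics: " + score("contract", doc, global()));
        System.out.println("local statistics:  " + score("contract", doc, TENANT_A));
    }
}
```

In this toy data set "contract" looks fairly distinctive when statistics span both tenants, but it carries little weight inside tenant A, where every document contains it. That is the kind of distortion the multi-tenant bullet above describes.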

How?

A new CosmosFullTextScoreScope enum and setFullTextScoreScope() method are added to CosmosQueryRequestOptions:

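CosmosQueryRequestOptions options = new CosmosQueryRequestOptions();
options.setFullTextScoreScope(CosmosFullTextScoreScope.LOCAL);
options.setPartitionKey(new PartitionKey(tenantId));

// Bind @tenantId explicitly; a plain query string would leave the parameter unresolved.
SqlQuerySpec querySpec = new SqlQuerySpec(
   "SELECT TOP 10 * FROM c WHERE c.tenantId = @tenantId ORDER BY RANK FullTextScore(c.text, 'keywords')",
   Collections.singletonList(new SqlParameter("@tenantId", tenantId)));

container.queryItems(querySpec, options, Document.class);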

When CosmosFullTextScoreScope.LOCAL is set, the hybrid search aggregator uses only the query's target partition ranges (instead of all ranges) when executing the global statistics query. This is a client-side only change — no new HTTP headers are sent to the backend.
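The range selection described above boils down to a single conditional. The sketch below models it with plain Java types so it can run without the SDK; the `Scope` enum stands in for `CosmosFullTextScoreScope`, and the method name and string-based ranges are hypothetical simplifications of the SDK's feed range types.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: mirrors the GLOBAL/LOCAL decision with stand-in types.
final class StatisticsRangeSketch {
    // Hypothetical stand-in for CosmosFullTextScoreScope.
    enum Scope { GLOBAL, LOCAL }

    // LOCAL uses only the ranges targeted by the query; anything else
    // (including an unset, null scope) falls back to all ranges.
    static <R> List<R> statisticsTargetRanges(Scope scope, List<R> allRanges, List<R> targetRanges) {
        return scope == Scope.LOCAL ? targetRanges : allRanges;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("range-0", "range-1", "range-2");
        List<String> target = Arrays.asList("range-1");
        System.out.println(statisticsTargetRanges(Scope.GLOBAL, all, target));
        System.out.println(statisticsTargetRanges(Scope.LOCAL, all, target));
    }
}
```

Because the choice only affects which ranges the client sends the statistics query to, it is consistent with the statement that no new headers reach the backend.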

Bug Fixes (discovered during development)

  1. NullPointerException in DocumentQueryExecutionContextFactory.tryCacheQueryPlan:
    When executing hybrid search queries with a partition key filter, getQueryInfo() returned null (hybrid search queries use hybridSearchQueryInfo instead), causing an NPE in query plan caching. Added a null guard.
  2. Race condition in HybridSearchDocumentQueryExecutionContext.getComponentQueryResults:
    The flatMap operator ran multiple component queries concurrently, but the parent class (ParallelDocumentQueryExecutionContextBase) has shared mutable state (documentProducers list, metrics trackers, retry counters) that is not thread-safe. This intermittently caused ConcurrentModificationException and IllegalArgumentException ("retries must not be negative"). Fixed by replacing flatMap with concatMap to serialize component query initialization and execution. Each component query still executes its per-partition queries in parallel; only the sequencing across component queries is serialized.
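The shape of the second fix can be illustrated without Reactor. The sketch below is an analogue built on `CompletableFuture` rather than the SDK's actual `Flux` pipeline: component queries are chained one after another with `thenCompose` (playing the role of `concatMap`), while the partitions inside each component still run concurrently. All class, method, and variable names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Illustrative analogue only: CompletableFuture stands in for Reactor's Flux.
final class SequentialComponentsSketch {
    // Non-thread-safe shared state, standing in for the parent context's documentProducers list.
    private final List<String> producers = new ArrayList<>();

    List<String> registeredProducers() {
        return producers;
    }

    // One component query: register producers, then fan out to partitions in parallel.
    CompletableFuture<List<String>> runComponent(String component, List<String> partitions) {
        for (String partition : partitions) {
            producers.add(component + "@" + partition); // mutation of shared state
        }
        List<CompletableFuture<String>> futures = partitions.stream()
            .map(p -> CompletableFuture.supplyAsync(() -> component + ":" + p))
            .collect(Collectors.toList());
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
            .thenApply(done -> futures.stream().map(CompletableFuture::join).collect(Collectors.toList()));
    }

    // Components are chained, so a component starts only after the previous one has finished.
    CompletableFuture<List<String>> runAll(List<String> components, List<String> partitions) {
        CompletableFuture<List<String>> chain = CompletableFuture.completedFuture(new ArrayList<>());
        for (String component : components) {
            chain = chain.thenCompose(acc -> runComponent(component, partitions).thenApply(results -> {
                acc.addAll(results);
                return acc;
            }));
        }
        return chain;
    }

    public static void main(String[] args) {
        SequentialComponentsSketch sketch = new SequentialComponentsSketch();
        System.out.println(sketch.runAll(Arrays.asList("A", "B"), Arrays.asList("p1", "p2")).join());
    }
}
```

Starting every component at once would let several threads append to `producers` concurrently, which is the kind of unsynchronized access the bug description attributes to the original flatMap.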

Testing

  • Tests validate:
    - GLOBAL scope (default) cross-partition returns all matching results
    - Explicit GLOBAL matches default behavior
    - LOCAL scope + pk="2" returns only pk="2" results
    - LOCAL scope + pk="1" returns only pk="1" results
    - RRF queries work with both LOCAL and GLOBAL scopes

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

@aayush3011 aayush3011 marked this pull request as ready for review March 16, 2026 21:10
@aayush3011 aayush3011 requested review from a team and kirankumarkolli as code owners March 16, 2026 21:10
Copilot AI review requested due to automatic review settings March 16, 2026 21:10
@aayush3011
Member Author

/azp run java - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

Copilot AI left a comment

Pull request overview

This PR adds a CosmosFullTextScoreScope option to CosmosQueryRequestOptions that lets developers choose between GLOBAL (default, all partitions) and LOCAL (scoped to target partitions) BM25 statistics computation for hybrid search queries. It also fixes two bugs: a NPE in query plan caching for hybrid queries and a ConcurrentModificationException race condition in component query execution.

Changes:

  • New CosmosFullTextScoreScope enum with GLOBAL/LOCAL values, wired through CosmosQueryRequestOptions
  • Bug fix: null guard for queryInfo in tryCacheQueryPlan, and synchronized block in getComponentQueryResults to prevent concurrent modification
  • Tests updated with new partition key structure (/pk) and new test methods for LOCAL/GLOBAL scope validation

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
File Description
CosmosFullTextScoreScope.java New enum defining GLOBAL and LOCAL scopes
CosmosQueryRequestOptions.java Public getter/setter for fullTextScoreScope
CosmosQueryRequestOptionsImpl.java Implementation field, copy constructor, getter/setter
HybridSearchDocumentQueryExecutionContext.java Uses scope to select statistics target ranges; synchronized fix for race condition
DocumentQueryExecutionContextFactory.java Null guard for queryInfo in tryCacheQueryPlan
CHANGELOG.md Documents new feature and bug fixes
HybridSearchQueryTest.java Updated partition key, new tests for LOCAL/GLOBAL scope, updated expected results

@aayush3011
Member Author

/azp run java - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

if (hybridSearchQueryInfo.getRequiresGlobalStatistics()) {
// When FullTextScoreScope is GLOBAL (default), use allFeedRanges for statistics.
// When LOCAL, use only targetFeedRanges for scoped statistics.
List<FeedRangeEpkImpl> statisticsTargetRanges =
Member

@xinlian12 xinlian12 Mar 17, 2026

just a QQ:

  • does LOCAL require user to define a partitionKey? do we need any validation?
  • instead of defining the partitionKey in the request options, what if the partitionKey is defined in the sql query? do we have tests to verify that?
  • do we have tests to verify HPK cases if this is a valid case?
  • do we have tests to verify query + feedRange if this is a valid case?

/**
* Specifies the scope for computing BM25 statistics used by FullTextScore in hybrid search queries.
*/
public enum CosmosFullTextScoreScope {
Member

enum here is fixed? no changes in future?

*/
public CosmosFullTextScoreScope getFullTextScoreScope() {
CosmosFullTextScoreScope scope = this.actualRequestOptions.getFullTextScoreScope();
return scope != null ? scope : CosmosFullTextScoreScope.GLOBAL;
Member

Maybe let this.actualRequestOptions decide the default value, so that here we just return this.actualRequestOptions.getFullTextScoreScope()? Otherwise the public options and the internal implementation options could behave inconsistently.
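One way to read this suggestion is sketched below: the implementation class owns the default, and the public wrapper simply delegates, so both layers agree. The types here are hypothetical stand-ins, not the SDK's `CosmosQueryRequestOptions` or `CosmosQueryRequestOptionsImpl`.

```java
// Illustrative only: stand-in types showing a default applied in one place.
final class ScopeDefaultSketch {
    // Hypothetical stand-in for CosmosFullTextScoreScope.
    enum Scope { GLOBAL, LOCAL }

    // Stand-in for the internal implementation options.
    static final class OptionsImpl {
        private Scope scope; // null until explicitly set

        Scope getScope() {
            // The default lives here, so every caller sees the same value.
            return scope != null ? scope : Scope.GLOBAL;
        }

        void setScope(Scope scope) {
            this.scope = scope;
        }
    }

    // Stand-in for the public options wrapper.
    static final class PublicOptions {
        private final OptionsImpl impl = new OptionsImpl();

        Scope getScope() {
            return impl.getScope(); // pure delegation, no second default
        }

        PublicOptions setScope(Scope scope) {
            impl.setScope(scope);
            return this;
        }
    }

    public static void main(String[] args) {
        System.out.println(new PublicOptions().getScope());
        System.out.println(new PublicOptions().setScope(Scope.LOCAL).getScope());
    }
}
```

With this arrangement, internal code that compares the implementation getter against `GLOBAL` gets the same answer as code using the public getter.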

Member

@xinlian12 xinlian12 left a comment

LGTM, thanks

@xinlian12
Member

Deep Review Summary

PR Intent: Adds CosmosFullTextScoreScope enum (GLOBAL/LOCAL) for controlling BM25 statistics scope in hybrid search queries. Also fixes NPE in query plan caching and race condition (flatMap to concatMap) in hybrid search execution context.

Overall Assessment: The feature is well-designed and matches the .NET SDK's equivalent implementation. The bug fixes are correct. Main concerns are around the null-default inconsistency between public/Impl API layers, missing validation for the LOCAL-without-partition-key edge case, and test coverage gaps.

Existing Comments: Only a Copilot summary review (0 inline comments). No overlap with findings below.

Severity Count
🟡 Recommendation 6
🟢 Suggestion 3
💬 Observation 2

Top findings:

  1. Inconsistent null-default between public (GLOBAL) and Impl (null) getters creates a latent correctness trap
  2. LOCAL scope without partition key silently degenerates to GLOBAL with no warning
  3. @Ignore annotation commented out instead of properly removed

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

* @return the full text score scope, or null if not set (defaults to GLOBAL behavior).
*/
public CosmosFullTextScoreScope getFullTextScoreScope() {
return this.fullTextScoreScope;
Member

🟡 Recommendation · Reliability: Inconsistent Contract

Inconsistent null-default between public and Impl getters

The public CosmosQueryRequestOptions.getFullTextScoreScope() maps null to GLOBAL. This Impl getter returns raw null. The current usage site calls the public getter (safe), but any future internal code checking getFullTextScoreScope() == GLOBAL via the Impl class would get false for the default case — a latent correctness trap.

Suggestion: Make the Impl getter consistent:
```java
return this.fullTextScoreScope != null ? this.fullTextScoreScope : CosmosFullTextScoreScope.GLOBAL;
```

---
<sub>⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.</sub>

// When FullTextScoreScope is GLOBAL (default), use allFeedRanges for statistics.
// When LOCAL, use only targetFeedRanges for scoped statistics.
List<FeedRangeEpkImpl> statisticsTargetRanges =
this.cosmosQueryRequestOptions.getFullTextScoreScope() == CosmosFullTextScoreScope.LOCAL
Member

🟡 Recommendation · Reliability: Silent Misconfiguration

LOCAL scope without partition key silently degenerates to GLOBAL

When CosmosFullTextScoreScope.LOCAL is set but no partition key is specified, targetFeedRanges equals allFeedRanges (all partitions are targeted). The statistics query silently covers all partitions — identical to GLOBAL. In multi-tenant scenarios (the primary use case for LOCAL), accidentally omitting the partition key defeats the entire purpose of the feature.

Suggestion: Add a warning log when LOCAL scope is set but targetFeedRanges equals allFeedRanges:
```java
if (scope == LOCAL && targetFeedRanges.size() == allFeedRanges.size()) {
    logger.warn("LOCAL fullTextScoreScope set but no partition key filter specified; statistics computed across all partitions");
}
```

---
<sub>⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.</sub>
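A slightly stricter variant of the suggested check compares the contents of the two range lists rather than only their sizes. The sketch below is a hypothetical, SDK-free illustration: ranges are modeled as strings, and `java.util.logging` is used only to keep the example self-contained, which is not necessarily the logging facade the SDK uses.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.logging.Logger;

// Illustrative only: detects when LOCAL scope would not narrow the statistics at all.
final class LocalScopeWarningSketch {
    private static final Logger LOGGER = Logger.getLogger(LocalScopeWarningSketch.class.getName());

    // True when LOCAL is requested but the targeted ranges are exactly the full set of ranges.
    static boolean localScopeCoversAllRanges(boolean localScope, List<String> targetRanges, List<String> allRanges) {
        return localScope && new HashSet<>(targetRanges).equals(new HashSet<>(allRanges));
    }

    static void warnIfLocalScopeIsIneffective(boolean localScope, List<String> targetRanges, List<String> allRanges) {
        if (localScopeCoversAllRanges(localScope, targetRanges, allRanges)) {
            LOGGER.warning("LOCAL scope requested, but statistics will be computed across all ranges");
        }
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("range-0", "range-1");
        warnIfLocalScopeIsIneffective(true, all, all);                        // logs a warning
        warnIfLocalScopeIsIneffective(true, Arrays.asList("range-1"), all);   // stays silent
    }
}
```

Note that either form of the check would also fire for a container with a single range even when a partition key filter is present, so a warning of this kind is a hint rather than proof of a missing filter.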

import static org.assertj.core.api.Assertions.fail;

@Ignore("TODO: Ignore these test cases until the public emulator is released.")
//@Ignore("TODO: Ignore these test cases until the public emulator is released.")
Member

🟡 Recommendation · Maintainability: Ambiguous Change

@Ignore annotation commented out instead of removed

The @Ignore had a TODO about waiting for the public emulator. Commenting it out (//@Ignore(...)) rather than removing it leaves ambiguity — is the emulator now available? Will CI break in environments without it?

Suggestion: If the emulator is now available, remove the annotation entirely. If not, keep it and explain the CI strategy.


⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

.collectList().block();
assertThat(resultDocs).hasSize(15);
validateResults(Arrays.asList("51", "49", "24", "61", "54", "22", "2", "25", "75", "77", "57", "76", "66", "80", "85"), resultDocs);
validateResults(Arrays.asList("61", "54", "51", "49", "24", "2", "57", "22", "75", "25", "77", "76", "66", "80", "85"), resultDocs);
Member

🟡 Recommendation · Maintainability: Unverifiable Change

Expected result orderings changed without explanation

Seven existing test assertions had their expected result orderings changed. The partition key was changed from /id to /pk, which changes the physical partition layout and consequently BM25 statistics and scoring order. Without explanation, reviewers cannot verify the new expected values are correct.

Suggestion: Add a comment explaining why the expected ordering changed (e.g., partition key change redistributes documents, altering BM25 statistics).


⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

}

@Test(groups = {"query", "split"}, timeOut = TIMEOUT)
public void hybridQueryRRFWithLocalStatisticsTest() {
Member

🟡 Recommendation · Testing: Missing Coverage

No test for LOCAL scope without partition key

Tests cover LOCAL + partition key, but not the edge case where LOCAL is set without a partition key. Given the silent degeneration to GLOBAL behavior (finding above), this is the most likely user error scenario and should be tested to verify it degenerates gracefully.


⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

}

@Test(groups = {"query", "split"}, timeOut = TIMEOUT)
public void hybridQueryWithLocalStatisticsTest() {
Member

🟢 Suggestion · Testing: Weak Assertion

LOCAL scope tests do not assert result ordering

The LOCAL scope tests assert result count and partition key values, but not the order of results. For a scoring/ranking feature, ordering is the primary output. Different score computation (local vs global statistics) should produce different orderings.

Suggestion: Add validateResults(Arrays.asList(...), results) assertions for LOCAL scope, similar to the existing GLOBAL tests.


⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

Map<String, PartitionedQueryExecutionInfo> queryPlanCache) {
if (canCacheQuery(partitionedQueryExecutionInfo.getQueryInfo()) && !queryPlanCache.containsKey(query.getQueryText())) {
QueryInfo queryInfo = partitionedQueryExecutionInfo.getQueryInfo();
if (queryInfo != null && canCacheQuery(queryInfo) && !queryPlanCache.containsKey(query.getQueryText())) {
Member

🟢 Suggestion · Maintainability: Debugging Aid

NPE fix could benefit from trace logging

The null guard silently skips caching for hybrid search queries (which use hybridSearchQueryInfo instead of queryInfo). A trace log would help diagnose query plan cache miss rates.

Suggestion:
```java
if (queryInfo == null) {
    logger.trace("Skipping query plan caching: queryInfo is null (likely a hybrid search query)");
    return;
}
```

---
<sub>⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.</sub>

}

@Test(groups = {"query", "split"}, timeOut = TIMEOUT)
public void hybridQueryWithLocalStatisticsTest() {
Member

🟢 Suggestion · Cross-SDK: Test Coverage Gap

Test coverage gap relative to .NET SDK

The .NET SDK's equivalent implementation includes tests that validate score values differ between LOCAL and GLOBAL scope (not just result sets). The Java tests only verify result counts and partition key membership.

Suggestion: Consider adding a test that verifies LOCAL and GLOBAL produce different score rankings for the same query when data distribution differs across partitions.


⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

/**
* Specifies the scope for computing BM25 statistics used by FullTextScore in hybrid search queries.
*/
public enum CosmosFullTextScoreScope {
Member

💬 Observation · Design: Enum Pattern

Enum pattern is consistent with codebase

CosmosFullTextScoreScope uses a plain Java enum, consistent with CosmosVectorDataType, IndexingMode, and 15 other enums in the Cosmos data plane models package (zero uses of ExpandableStringEnum in the data plane). Since this enum is client-side only (never serialized over the wire), omitting @JsonValue/toString() is acceptable. Per Azure SDK guidelines, plain enums are allowed for input-only types (exception 2).


⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

// Use concatMap to serialize component query initialization. The parent class has shared mutable
// state (documentProducers, metrics trackers) that is not thread-safe for concurrent access.
// Each component query still executes its partition queries in parallel via the inner flatMap.
return rewrittenQueryInfos.concatMap(queryInfo -> {
Member

💬 Observation · Performance: Trade-off

concatMap serializes previously-parallel component query initialization

For queries with multiple FullTextScore components (e.g., RRF(FullTextScore(...), FullTextScore(...))), component query initialization is now sequential. Each component query still executes its per-partition queries in parallel internally. The latency impact depends on the number of components — typically 2-3, so the impact should be minor.


⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.
