[Query]: Adds ability to choose global vs local/focused statistics for FullTextScore#48431
[Query]: Adds ability to choose global vs local/focused statistics for FullTextScore#48431aayush3011 wants to merge 4 commits intoAzure:mainfrom
Conversation
|
/azp run java - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Pull request overview
This PR adds a CosmosFullTextScoreScope option to CosmosQueryRequestOptions that lets developers choose between GLOBAL (default, all partitions) and LOCAL (scoped to target partitions) BM25 statistics computation for hybrid search queries. It also fixes two bugs: a NPE in query plan caching for hybrid queries and a ConcurrentModificationException race condition in component query execution.
Changes:
- New
CosmosFullTextScoreScopeenum withGLOBAL/LOCALvalues, wired throughCosmosQueryRequestOptions - Bug fix: null guard for
queryInfointryCacheQueryPlan, and synchronized block ingetComponentQueryResultsto prevent concurrent modification - Tests updated with new partition key structure (
/pk) and new test methods for LOCAL/GLOBAL scope validation
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
CosmosFullTextScoreScope.java |
New enum defining GLOBAL and LOCAL scopes |
CosmosQueryRequestOptions.java |
Public getter/setter for fullTextScoreScope |
CosmosQueryRequestOptionsImpl.java |
Implementation field, copy constructor, getter/setter |
HybridSearchDocumentQueryExecutionContext.java |
Uses scope to select statistics target ranges; synchronized fix for race condition |
DocumentQueryExecutionContextFactory.java |
Null guard for queryInfo in tryCacheQueryPlan |
CHANGELOG.md |
Documents new feature and bug fixes |
HybridSearchQueryTest.java |
Updated partition key, new tests for LOCAL/GLOBAL scope, updated expected results |
You can also share your feedback on Copilot code review. Take the survey.
|
/azp run java - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
| if (hybridSearchQueryInfo.getRequiresGlobalStatistics()) { | ||
| // When FullTextScoreScope is GLOBAL (default), use allFeedRanges for statistics. | ||
| // When LOCAL, use only targetFeedRanges for scoped statistics. | ||
| List<FeedRangeEpkImpl> statisticsTargetRanges = |
There was a problem hiding this comment.
just a QQ:
- does LOCAL require user to define a partitionKey? do we need any validation?
- instead of defining the partitionKey in the request options, what if the partitionKey is defined in the sql query? do we have tests to verify that?
- do we have tests to verify HPK cases if this is a valid case?
- do we have tests to verify query + feedRange if this is a valid case?
| /** | ||
| * Specifies the scope for computing BM25 statistics used by FullTextScore in hybrid search queries. | ||
| */ | ||
| public enum CosmosFullTextScoreScope { |
There was a problem hiding this comment.
enum here is fixed? no changes in future?
| */ | ||
| public CosmosFullTextScoreScope getFullTextScoreScope() { | ||
| CosmosFullTextScoreScope scope = this.actualRequestOptions.getFullTextScoreScope(); | ||
| return scope != null ? scope : CosmosFullTextScoreScope.GLOBAL; |
There was a problem hiding this comment.
maybe let this.actualRequestOptions to decide the default value, and here we just return this.actualRequestOptions.getFullTextScoreScope()? else it could cause an inconsistent behavior between the public options vs internal implementation options
Deep Review SummaryPR Intent: Adds Overall Assessment: The feature is well-designed and matches the .NET SDK's equivalent implementation. The bug fixes are correct. Main concerns are around the null-default inconsistency between public/Impl API layers, missing validation for the LOCAL-without-partition-key edge case, and test coverage gaps. Existing Comments: Only a Copilot summary review (0 inline comments). No overlap with findings below.
Top findings:
|
| * @return the full text score scope, or null if not set (defaults to GLOBAL behavior). | ||
| */ | ||
| public CosmosFullTextScoreScope getFullTextScoreScope() { | ||
| return this.fullTextScoreScope; |
There was a problem hiding this comment.
🟡 Recommendation · Reliability: Inconsistent Contract
Inconsistent null-default between public and Impl getters
The public CosmosQueryRequestOptions.getFullTextScoreScope() maps null to GLOBAL. This Impl getter returns raw null. The current usage site calls the public getter (safe), but any future internal code checking getFullTextScoreScope() == GLOBAL via the Impl class would get false for the default case — a latent correctness trap.
Suggestion: Make the Impl getter consistent:
``java
return this.fullTextScoreScope != null ? this.fullTextScoreScope : CosmosFullTextScoreScope.GLOBAL;
---
<sub>⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.</sub>| // When FullTextScoreScope is GLOBAL (default), use allFeedRanges for statistics. | ||
| // When LOCAL, use only targetFeedRanges for scoped statistics. | ||
| List<FeedRangeEpkImpl> statisticsTargetRanges = | ||
| this.cosmosQueryRequestOptions.getFullTextScoreScope() == CosmosFullTextScoreScope.LOCAL |
There was a problem hiding this comment.
🟡 Recommendation · Reliability: Silent Misconfiguration
LOCAL scope without partition key silently degenerates to GLOBAL
When CosmosFullTextScoreScope.LOCAL is set but no partition key is specified, targetFeedRanges equals allFeedRanges (all partitions are targeted). The statistics query silently covers all partitions — identical to GLOBAL. In multi-tenant scenarios (the primary use case for LOCAL), accidentally omitting the partition key defeats the entire purpose of the feature.
Suggestion: Add a warning log when LOCAL scope is set but targetFeedRanges equals allFeedRanges:
``java
if (scope == LOCAL && targetFeedRanges.size() == allFeedRanges.size()) {
logger.warn("LOCAL fullTextScoreScope set but no partition key filter specified; statistics computed across all partitions");
}
---
<sub>⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.</sub>| import static org.assertj.core.api.Assertions.fail; | ||
|
|
||
| @Ignore("TODO: Ignore these test cases until the public emulator is released.") | ||
| //@Ignore("TODO: Ignore these test cases until the public emulator is released.") |
There was a problem hiding this comment.
🟡 Recommendation · Maintainability: Ambiguous Change
@ignore annotation commented out instead of removed
The @Ignore had a TODO about waiting for the public emulator. Commenting it out (//@Ignore(...)) rather than removing it leaves ambiguity — is the emulator now available? Will CI break in environments without it?
Suggestion: If the emulator is now available, remove the annotation entirely. If not, keep it and explain the CI strategy.
| .collectList().block(); | ||
| assertThat(resultDocs).hasSize(15); | ||
| validateResults(Arrays.asList("51", "49", "24", "61", "54", "22", "2", "25", "75", "77", "57", "76", "66", "80", "85"), resultDocs); | ||
| validateResults(Arrays.asList("61", "54", "51", "49", "24", "2", "57", "22", "75", "25", "77", "76", "66", "80", "85"), resultDocs); |
There was a problem hiding this comment.
🟡 Recommendation · Maintainability: Unverifiable Change
Expected result orderings changed without explanation
Seven existing test assertions had their expected result orderings changed. The partition key was changed from /id to /pk, which changes the physical partition layout and consequently BM25 statistics and scoring order. Without explanation, reviewers cannot verify the new expected values are correct.
Suggestion: Add a comment explaining why the expected ordering changed (e.g., partition key change redistributes documents, altering BM25 statistics).
| } | ||
|
|
||
| @Test(groups = {"query", "split"}, timeOut = TIMEOUT) | ||
| public void hybridQueryRRFWithLocalStatisticsTest() { |
There was a problem hiding this comment.
🟡 Recommendation · Testing: Missing Coverage
No test for LOCAL scope without partition key
Tests cover LOCAL + partition key, but not the edge case where LOCAL is set without a partition key. Given the silent degeneration to GLOBAL behavior (finding above), this is the most likely user error scenario and should be tested to verify it degenerates gracefully.
| } | ||
|
|
||
| @Test(groups = {"query", "split"}, timeOut = TIMEOUT) | ||
| public void hybridQueryWithLocalStatisticsTest() { |
There was a problem hiding this comment.
🟢 Suggestion · Testing: Weak Assertion
LOCAL scope tests do not assert result ordering
The LOCAL scope tests assert result count and partition key values, but not the order of results. For a scoring/ranking feature, ordering is the primary output. Different score computation (local vs global statistics) should produce different orderings.
Suggestion: Add validateResults(Arrays.asList(...), results) assertions for LOCAL scope, similar to the existing GLOBAL tests.
| Map<String, PartitionedQueryExecutionInfo> queryPlanCache) { | ||
| if (canCacheQuery(partitionedQueryExecutionInfo.getQueryInfo()) && !queryPlanCache.containsKey(query.getQueryText())) { | ||
| QueryInfo queryInfo = partitionedQueryExecutionInfo.getQueryInfo(); | ||
| if (queryInfo != null && canCacheQuery(queryInfo) && !queryPlanCache.containsKey(query.getQueryText())) { |
There was a problem hiding this comment.
🟢 Suggestion · Maintainability: Debugging Aid
NPE fix could benefit from trace logging
The null guard silently skips caching for hybrid search queries (which use hybridSearchQueryInfo instead of queryInfo). A trace log would help diagnose query plan cache miss rates.
Suggestion:
``java
if (queryInfo == null) {
logger.trace("Skipping query plan caching: queryInfo is null (likely a hybrid search query)");
return;
}
---
<sub>⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.</sub>| } | ||
|
|
||
| @Test(groups = {"query", "split"}, timeOut = TIMEOUT) | ||
| public void hybridQueryWithLocalStatisticsTest() { |
There was a problem hiding this comment.
🟢 Suggestion · Cross-SDK: Test Coverage Gap
Test coverage gap relative to .NET SDK
The .NET SDK's equivalent implementation includes tests that validate score values differ between LOCAL and GLOBAL scope (not just result sets). The Java tests only verify result counts and partition key membership.
Suggestion: Consider adding a test that verifies LOCAL and GLOBAL produce different score rankings for the same query when data distribution differs across partitions.
| /** | ||
| * Specifies the scope for computing BM25 statistics used by FullTextScore in hybrid search queries. | ||
| */ | ||
| public enum CosmosFullTextScoreScope { |
There was a problem hiding this comment.
💬 Observation · Design: Enum Pattern
Enum pattern is consistent with codebase
CosmosFullTextScoreScope uses a plain Java enum, consistent with CosmosVectorDataType, IndexingMode, and 15 other enums in the Cosmos data plane models package (zero uses of ExpandableStringEnum in the data plane). Since this enum is client-side only (never serialized over the wire), omitting @JsonValue/toString() is acceptable. Per Azure SDK guidelines, plain enums are allowed for input-only types (exception 2).
| // Use concatMap to serialize component query initialization. The parent class has shared mutable | ||
| // state (documentProducers, metrics trackers) that is not thread-safe for concurrent access. | ||
| // Each component query still executes its partition queries in parallel via the inner flatMap. | ||
| return rewrittenQueryInfos.concatMap(queryInfo -> { |
There was a problem hiding this comment.
💬 Observation · Performance: Trade-off
concatMap serializes previously-parallel component query initialization
For queries with multiple FullTextScore components (e.g., RRF(FullTextScore(...), FullTextScore(...))), component query initialization is now sequential. Each component query still executes its per-partition queries in parallel internally. The latency impact depends on the number of components — typically 2-3, so the impact should be minor.
Description
Why?
Cosmos DB's implementation of FullTextScore computes BM25 statistics (term frequency, inverse document frequency, and document length) across all documents in the container, including all physical and logical partitions.
While this provides a valid and comprehensive representation of statistics for the entire dataset, it introduces challenges for several common use cases:
What?
This PR extends the flexibility of BM25 scoring so that developers can choose between:
How?
A new CosmosFullTextScoreScope enum and setFullTextScoreScope() method are added to CosmosQueryRequestOptions:
When CosmosFullTextScoreScope.LOCAL is set, the hybrid search aggregator uses only the query's target partition ranges (instead of all ranges) when executing the global statistics query. This is a client-side only change — no new HTTP headers are sent to the backend.
Bug Fixes (discovered during development)
When executing hybrid search queries with a partition key filter, getQueryInfo() returned null (hybrid search queries use hybridSearchQueryInfo instead), causing a NPE in query plan caching. Added a null guard.
The flatMap operator ran multiple component queries concurrently, but the parent class (ParallelDocumentQueryExecutionContextBase) has shared mutable state (documentProducers list, metrics trackers, retry counters) that is not thread-safe. This caused ConcurrentModificationException and IllegalArgumentException: retries must not be negative intermittently. Fixed by replacing flatMap with concatMap to serialize component query initialization and execution. Each component query still executes its per-partition queries in parallel; only the sequencing across component queries is serialized.
Testing
- GLOBAL scope (default) cross-partition returns all matching results
- Explicit GLOBAL matches default behavior
- LOCAL scope + pk="2" returns only pk="2" results
- LOCAL scope + pk="1" returns only pk="1" results
- RRF queries work with both LOCAL and GLOBAL scopes
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines