-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Nregion synchronous commit feature #47757
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Adding unit tests Refactoring and cleanup
# Conflicts: # sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/ConsistencyWriter.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds support for N-region synchronous commit for less than strong consistency levels in the Azure Cosmos DB Java SDK. The feature introduces barrier write requests similar to global strong consistency, but specifically for N-region commit scenarios when the account has this capability enabled.
Changes:
- Added deserialization of
GlobalNRegionCommittedGLSNheader from backend RNTBD responses - Added
isNRegionSynchronousCommitEnabledproperty to DatabaseAccount and propagated it through GlobalEndpointManager and request context - Refactored ConsistencyWriter barrier request logic to support both global strong writes and N-region synchronous commit scenarios
- Added comprehensive unit tests for the new barrier request scenarios
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| RntbdConstants.java | Added GlobalNRegionCommittedGLSN response header constant with ID 0x0078 |
| RntbdResponseHeaders.java | Added deserialization support for GlobalNRegionCommittedGLSN header |
| WFConstants.java | Added GLOBAL_N_REGION_COMMITTED_GLSN backend header constant |
| StoreResponse.java | Added getNumberOfReadRegions() method to retrieve number of read regions from response headers |
| ConsistencyWriter.java | Refactored barrier request logic to support both global strong and N-region commit scenarios with dedicated helper methods |
| BarrierType.java | New enum to distinguish between barrier types (NONE, GLOBAL_STRONG_WRITE, N_REGION_SYNCHRONOUS_COMMIT) |
| RxDocumentClientImpl.java | Set NRegionSynchronousCommitEnabled flag in request context for write operations |
| RMResources.java | Added error message for N-region commit barrier not met scenarios |
| HttpConstants.java | Added sub-status code for N-region commit write barrier failures |
| GlobalEndpointManager.java | Added method to retrieve N-region synchronous commit enabled status from database account |
| DocumentServiceRequestContext.java | Added fields for N-region commit enabled flag and barrier type, renamed globalStrongWriteResponse to cachedWriteResponse |
| DatabaseAccount.java | Added method to check if N-region synchronous commit is enabled on the account |
| Constants.java | Added ENABLE_N_REGION_SYNCHRONOUS_COMMIT property constant |
| ConsistencyWriterTest.java | Added comprehensive unit tests for global strong and N-region commit barrier scenarios |
Comments suppressed due to low confidence (1)
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/DocumentServiceRequestContext.java:170
- The clone method does not copy the two new fields (nRegionSynchronousCommitEnabled and barrierType) added to DocumentServiceRequestContext. When a context is cloned, these fields will be null/default in the cloned instance, which could lead to incorrect behavior during request processing. Add the following assignments to the clone method:
context.nRegionSynchronousCommitEnabled = this.nRegionSynchronousCommitEnabled;
context.barrierType = this.barrierType;
public DocumentServiceRequestContext clone() {
DocumentServiceRequestContext context = new DocumentServiceRequestContext();
context.forceAddressRefresh = this.forceAddressRefresh;
context.forceRefreshAddressCache = this.forceRefreshAddressCache;
context.requestChargeTracker = this.requestChargeTracker;
context.timeoutHelper = this.timeoutHelper;
context.resolvedCollectionRid = this.resolvedCollectionRid;
context.sessionToken = this.sessionToken;
context.quorumSelectedLSN = this.quorumSelectedLSN;
context.globalCommittedSelectedLSN = this.globalCommittedSelectedLSN;
context.cachedWriteResponse = this.cachedWriteResponse;
context.originalRequestConsistencyLevel = this.originalRequestConsistencyLevel;
context.readConsistencyStrategy = this.readConsistencyStrategy;
context.resolvedPartitionKeyRange = this.resolvedPartitionKeyRange;
context.resolvedPartitionKeyRangeForCircuitBreaker = this.resolvedPartitionKeyRangeForCircuitBreaker;
context.resolvedPartitionKeyRangeForPerPartitionAutomaticFailover = this.resolvedPartitionKeyRangeForPerPartitionAutomaticFailover;
context.regionIndex = this.regionIndex;
context.usePreferredLocations = this.usePreferredLocations;
context.locationIndexToRoute = this.locationIndexToRoute;
context.regionalRoutingContextToRoute = this.regionalRoutingContextToRoute;
context.performLocalRefreshOnGoneException = this.performLocalRefreshOnGoneException;
context.effectivePartitionKey = this.effectivePartitionKey;
context.performedBackgroundAddressRefresh = this.performedBackgroundAddressRefresh;
context.cosmosDiagnostics = this.cosmosDiagnostics;
context.resourcePhysicalAddress = this.resourcePhysicalAddress;
context.throughputControlRequestContext = this.throughputControlRequestContext;
context.replicaAddressValidationEnabled = this.replicaAddressValidationEnabled;
context.endToEndOperationLatencyPolicyConfig = this.endToEndOperationLatencyPolicyConfig;
context.unavailableRegionsForPartition = this.unavailableRegionsForPartition;
context.crossRegionAvailabilityContextForRequest = this.crossRegionAvailabilityContextForRequest;
return context;
| @Test | ||
| public void isBarrierRequest() { |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test method is missing the TestNG groups attribute that is present on similar test methods in this file. For consistency with the other test methods, this should include @test(groups = "unit").
| public long getNumberOfReadRegions() { | ||
| int numberOfReadRegions = -1; | ||
| String numberOfReadRegionsString = this.getHeaderValue(WFConstants.BackendHeaders.NUMBER_OF_READ_REGIONS); | ||
| if (StringUtils.isNotEmpty(numberOfReadRegionsString)) { | ||
| try { | ||
| return Long.parseLong(numberOfReadRegionsString); | ||
| } catch (NumberFormatException e) { | ||
| // If value cannot be parsed as Long, return -1. | ||
| } | ||
| } | ||
| return numberOfReadRegions; | ||
| } |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method declares a return type of long but initializes the local variable numberOfReadRegions as int (-1). This creates a type mismatch. The variable should be declared as long to match the return type and the Long.parseLong operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
...smos/src/main/java/com/azure/cosmos/implementation/directconnectivity/ConsistencyWriter.java
Outdated
Show resolved
Hide resolved
.../src/test/java/com/azure/cosmos/implementation/directconnectivity/ConsistencyWriterTest.java
Outdated
Show resolved
Hide resolved
| options.setPartitionKeyDefinition(documentCollectionValueHolder.v.getPartitionKey()); | ||
|
|
||
| request.requestContext.setCrossRegionAvailabilityContext(crossRegionAvailabilityContextForRequest); | ||
| request.requestContext.setNRegionSynchronousCommitEnabled(this.globalEndpointManager.getNRegionSynchronousCommitEnabled()); |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The setNRegionSynchronousCommitEnabled is only being set in the createDocument flow (line 2692), but N-region synchronous commit should apply to all write operations (replace, upsert, delete, patch). Based on the pattern where setCrossRegionAvailabilityContext is called in multiple places throughout this file (lines 2253, 2691, 3077, 3377, 3610, 3783, 3972, etc.), setNRegionSynchronousCommitEnabled should be set consistently in all similar places where write operations occur to ensure the feature works correctly for all write request types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
| } | ||
|
|
||
| public Boolean getNRegionSynchronousCommitEnabled() { | ||
| return this.latestDatabaseAccount.isNRegionSynchronousCommitEnabled(); |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method could throw a NullPointerException if latestDatabaseAccount is null. The method should check for null and return a safe default value (e.g., false or null) before calling isNRegionSynchronousCommitEnabled(). Similar to how getEffectivePreferredRegions() handles the null case with locking, this method should also handle the null case appropriately.
| return this.latestDatabaseAccount.isNRegionSynchronousCommitEnabled(); | |
| this.databaseAccountReadLock.lock(); | |
| try { | |
| if (this.latestDatabaseAccount == null) { | |
| return null; | |
| } | |
| return this.latestDatabaseAccount.isNRegionSynchronousCommitEnabled(); | |
| } finally { | |
| this.databaseAccountReadLock.unlock(); | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both the comment about having to take a read lock as well as null handling are valid - the former even more important
...zure-cosmos/src/main/java/com/azure/cosmos/implementation/DocumentServiceRequestContext.java
Outdated
Show resolved
Hide resolved
| return request.requestContext.getNRegionSynchronousCommitEnabled() | ||
| && !this.useMultipleWriteLocations | ||
| && StringUtils.isNotEmpty(response.getHeaderValue(WFConstants.BackendHeaders.GLOBAL_N_REGION_COMMITTED_GLSN)) | ||
| && Long.parseLong(response.getHeaderValue(WFConstants.BackendHeaders.GLOBAL_N_REGION_COMMITTED_GLSN)) != -1 | ||
| && response.getNumberOfReadRegions() > 0; |
Copilot
AI
Jan 20, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method isNRegionSynchronousCommitBarrierRequest could throw a NumberFormatException when parsing the GLOBAL_N_REGION_COMMITTED_GLSN header value on line 481. The check on line 480 only verifies that the string is not empty, but doesn't validate that it's a valid parseable long. If the header contains an invalid numeric value, it will cause an unhandled exception. The parsing should be wrapped in a try-catch block, or the validation should be more robust.
| return request.requestContext.getNRegionSynchronousCommitEnabled() | |
| && !this.useMultipleWriteLocations | |
| && StringUtils.isNotEmpty(response.getHeaderValue(WFConstants.BackendHeaders.GLOBAL_N_REGION_COMMITTED_GLSN)) | |
| && Long.parseLong(response.getHeaderValue(WFConstants.BackendHeaders.GLOBAL_N_REGION_COMMITTED_GLSN)) != -1 | |
| && response.getNumberOfReadRegions() > 0; | |
| String globalCommittedGlsnHeader = | |
| response.getHeaderValue(WFConstants.BackendHeaders.GLOBAL_N_REGION_COMMITTED_GLSN); | |
| if (!request.requestContext.getNRegionSynchronousCommitEnabled() | |
| || this.useMultipleWriteLocations | |
| || StringUtils.isEmpty(globalCommittedGlsnHeader) | |
| || response.getNumberOfReadRegions() <= 0) { | |
| return false; | |
| } | |
| try { | |
| long globalCommittedGlsnValue = Long.parseLong(globalCommittedGlsnHeader); | |
| return globalCommittedGlsnValue != -1; | |
| } catch (NumberFormatException e) { | |
| // Malformed header value: treat as no barrier instead of throwing. | |
| return false; | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
…ntation/DocumentServiceRequestContext.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| private Mono<Boolean> waitForWriteBarrierAsync( | ||
| private String getErrorMessageForBarrierRequest(RxDocumentServiceRequest request) { | ||
| if (request.requestContext.getBarrierType() == BarrierType.N_REGION_SYNCHRONOUS_COMMIT) { | ||
| return String.format("ConsistencyWriter: Write barrier has not been met for n region synchronous commit request. SelectedGlobalCommittedLsn: %s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Log value of n - how many regions have to commit is configurable (should be part of the account metadata)?
| HttpConstants.SubStatusCodes.GLOBAL_STRONG_WRITE_BARRIER_NOT_MET); | ||
| } | ||
|
|
||
| boolean isBarrierRequest(RxDocumentServiceRequest request, StoreResponse response) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function has side effects - please reflect that in the name. The isXXX naming pattern is usually only used for functions withut any side-effects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference would probably be to just have a function "calculating" the BarrierType - and then use != NONE as condition for the caller whether isBarrierRequest is set and then modifcy the RequestConetxt there?
| globalCommittedLsn.v = Long.parseLong(headerValue); | ||
| } | ||
| else if ((headerValue = response.getHeaderValue(WFConstants.BackendHeaders.GLOBAL_N_REGION_COMMITTED_GLSN)) != null) { | ||
| globalCommittedLsn.v = Long.parseLong(headerValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if we need a separate property in the diagnostic string for N-Region Committed LSN (to represent the LSN) and also account-level N-Region Commit enablement.
|
|
||
| if (!v) { | ||
| logger.info("ConsistencyWriter: Write barrier has not been met for global strong request. SelectedGlobalCommittedLsn: {}", request.requestContext.globalCommittedSelectedLSN); | ||
| logger.info(this.getErrorMessageForBarrierRequest(request)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if error (if unrecoverable) or warn (if recoverable) is the better choice here (not introduced by this PR but nevertheless).
| request.requestContext.setBarrierType(BarrierType.N_REGION_SYNCHRONOUS_COMMIT); | ||
| return true; | ||
| } | ||
| request.requestContext.setBarrierType(BarrierType.NONE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the barrier expectation or that of wrapping write operation when is isNRegionSynchronousCommitBarrierRequest evaluates to false and barrier is required due to some weird response from the backend? Are we failing the wrapping write in a certain way?
Description
This PR adds support for nregion synchronous commit for less than strong consistency to the SDK
Major changes
Added isNRegionSynchronousCommitEnabled to DatabaseAccount to get if the feature is enabled on the account. This is then populated on the RequestContext for write requests so that it can be read from ConsistencyWriter.
Deserialize GlobalNRegionCommittedGLSN from backend RNTBD response headers
Prior to this, we used to issue barrier requests only for global strong account/requests. This PR introduce barrier request calls for write request when N-Region Synchronous commit is enabled for the account and N-Region Committed LSN is returned for less than strong consistency to enforce barrier writes
Testing
##Reference
Azure/azure-cosmos-dotnet-v3#5401
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines