Skip to content

[Cosmos][WIP]: Normalize region names passed as preferred or exclude regions.#49090

Open
jeet1995 wants to merge 14 commits intoAzure:mainfrom
jeet1995:squad/region-name-mapper
Open

[Cosmos][WIP]: Normalize region names passed as preferred or exclude regions.#49090
jeet1995 wants to merge 14 commits intoAzure:mainfrom
jeet1995:squad/region-name-mapper

Conversation

@jeet1995
Copy link
Copy Markdown
Member

@jeet1995 jeet1995 commented May 7, 2026

Problem

Customers passing region names in non-canonical forms (e.g., west us 3 or westus3 instead of West US 3) hit routing issues. The Java SDK stores region names in different representations (lowercased in maps, original case in lists), and some code paths use case-sensitive String.equals()/List.contains() — causing mismatches between user-provided and server-returned region names.

The .NET SDK solves this with a RegionNameMapper that normalizes all region input to canonical form at client construction. The Java SDK lacked this.

Core approach

Normalization: strip spaces + lowercase → lookup in a static map of known Azure regions → return canonical form.

Input After strip+lower Lookup result
westus3 westus3 West US 3
west us 3 westus3 West US 3
WEST US 3 westus3 West US 3
West US 3 westus3 West US 3 ✓ (no-op)

Escape hatch for unknown regions: If a region is not in the static map (e.g., a brand-new Azure region not yet compiled into the SDK), the input is returned as-is. This works because LocationCache applies toLowerCase() and uses CaseInsensitiveMap for all endpoint lookups — so even unknown regions match correctly as long as the customer's input has the same words as the server response (any casing works, spaces are optional for known regions only).

No deduplication: If a customer passes ["westus3", "West US 3"], both normalize to "West US 3" and the list will contain two identical entries. This is intentional — duplicate preferred regions are an obvious misconfiguration that the customer should fix, not something the SDK should silently mask.

Changes

RegionUtils.java (renamed from RegionNameToRegionIdMap.java) — single source of truth

  • Renamed to RegionUtils to better reflect its dual role: region ID mapping + region name normalization.
  • Region ID map: synced with the authoritative regionToIdMapping from Settings.xml (IDs 1–124). Used only for session token region-level progress tracking (localLsn).
  • Region name normalization: getCosmosDBRegionName(String) — static normalizer. Canonical names derived from the ID map. Unknown regions passed through as-is.
  • normalizeRegionNames(List<String>) and containsRegionIgnoreCase(List<String>, String) — utilities for batch normalization and case-insensitive region membership checks (moved out of LocationCache).
  • Built NORMALIZED_REGION_NAME_TO_REGION_ID_MAPPINGS programmatically from the ID map (was manually duplicated before).

ConnectionPolicy.setPreferredRegions()

  • Normalize each region via RegionUtils.getCosmosDBRegionName() at entry.
  • Unknown regions not in the static map are passed through as-is.

LocationCache.java

  • Constructor: apply RegionUtils.getCosmosDBRegionName() before toLowerCase() (defense-in-depth).
  • Bug fix (line 502): replaced case-sensitive List.contains() with RegionUtils.containsRegionIgnoreCase() in PPCB reevaluate logic — was causing excluded regions to be re-added as retry targets when casing didn't match.
  • Normalize user-configured exclude regions via RegionUtils.normalizeRegionNames() before comparison in getApplicableRegionRoutingContexts().

Tests

Unit tests (run locally, all pass)

  • 44 RegionUtilsNormalizationTest — case variants, space removal, passthrough, null/empty, unknown regions
  • 38 LocationCacheTest — 32 existing + 6 new integration tests with real Azure region names (preferred region and exclude region normalization)
  • 1 RegionUtilsTests — existing ID map consistency check

E2E tests (run in CI)

  • ExcludeRegionTests: 3 new tests (15 cases via DataProvider) — client with space-stripped preferred regions, exclude with space-stripped/uppercase names
  • FaultInjectionWithAvailabilityStrategyTestsBase: 1 new test — client with space-stripped preferred regions + eager availability strategy
  • PerPartitionCircuitBreakerE2ETests: 1 new test — client with space-stripped preferred regions + PPCB routing validation

Customers passing region names in non-canonical forms (e.g., 'west us 3'
instead of 'West US 3') hit routing issues because the Java SDK stores
region names in different forms and some comparisons use case-sensitive
String.equals()/List.contains().

Changes:
- Add RegionNameMapper: strips spaces + case-insensitive lookup against
  90+ known Azure regions to produce canonical names (e.g., 'westus3' or
  'west us 3' -> 'West US 3'). Unknown regions pass through as-is.
- ConnectionPolicy.setPreferredRegions(): normalize + order-preserving
  dedupe at entry point.
- LocationCache constructor: apply RegionNameMapper before toLowerCase
  for defense-in-depth.
- Fix case-sensitive List.contains() bug in reevaluate() (line 502):
  use containsRegionIgnoreCase() instead.
- Normalize user-configured exclude regions at point of use in
  getApplicableRegionRoutingContexts() to prevent mismatches with
  PPCB-derived lowercased region names.
- Add RegionNameMapperTest with 43 unit tests covering case variants,
  space removal, passthrough, null/empty handling.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions github-actions Bot added the Cosmos label May 7, 2026
jeet1995 and others added 7 commits May 7, 2026 10:12
The static region list in RegionNameMapper goes stale when new Azure
regions are added. Fix: add a ConcurrentHashMap-backed dynamic tier
that learns canonical region names from server responses.

- RegionNameMapper.registerRegionName(): registers canonical names from
  DatabaseAccountLocation (called from LocationCache.addRoutingContexts).
  After the first account read, even new regions like 'West US 4' can
  normalize 'westus4' → 'West US 4'.
- getCosmosDBRegionName(): checks static map first, then dynamic map.
- Add 2 new tests for dynamic registration behavior.
- 45/45 RegionNameMapperTest pass, 32/32 LocationCacheTest pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous commit had stash conflict markers (<<<<<<< Updated upstream /
>>>>>>> Stashed changes) left in RegionNameMapper.java and
RegionNameMapperTest.java. Rewrote both files clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Merge the separate RegionNameMapper into RegionNameToRegionIdMap as the
single source of truth for region names. This eliminates maintaining two
parallel region lists that can drift out of sync.

Changes:
- Delete RegionNameMapper.java — normalization logic moved into
  RegionNameToRegionIdMap.
- RegionNameToRegionIdMap now provides region ID mapping (existing) AND
  region name normalization (new) from one canonical list.
- Sync REGION_NAME_TO_REGION_ID_MAPPINGS with backend RegionToIdMap.cs:
  add Bleu France Central/South (107/108), Delos Cloud Germany
  Central/North (109/110), Singapore Central/North (111/112), fix
  'easteurope' → 'East Europe' (54).
- Build NORMALIZED_REGION_NAME_TO_REGION_ID_MAPPINGS programmatically
  from REGION_NAME_TO_REGION_ID_MAPPINGS instead of manual duplication.
- Normalization static map seeded from ID map keys + additional regions
  without IDs yet (from .NET SDK Regions.cs).
- Rename test: RegionNameMapperTest → RegionNameToRegionIdMapNormalizationTest.
- Update ConnectionPolicy and LocationCache references.
- All 78 tests pass (45 normalization + 32 LocationCache + 1 consistency).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add 7 tests to LocationCacheTest using real Azure region names to verify
that preferred regions and exclude regions work correctly with
non-canonical input:

- preferredRegions_lowercaseShouldMatchCanonical: 'west us 3' → West US 3
- preferredRegions_noSpacesShouldMatchCanonical: 'westus3' → West US 3
- preferredRegions_uppercaseShouldMatchCanonical: 'WEST US 3' → West US 3
- preferredRegions_duplicateAfterNormalizationShouldDedupe: 'westus3' +
  'West US 3' deduped to single entry
- excludeRegions_lowercaseNoSpacesShouldExclude: 'westus3' excludes
  West US 3
- excludeRegions_mixedCasingShouldExclude: 'EAST us' excludes East US
- excludeRegions_requestLevelNoSpacesShouldExclude: request-level
  'eastus' excludes East US

All 39 LocationCacheTest unit tests pass (32 existing + 7 new).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Create a second CosmosClient with space-stripped preferred regions
(e.g., 'westus3' instead of 'west us 3') and verify that routing and
region exclusion work identically to canonical names.

New tests:
- nonCanonicalPreferredRegions_shouldRouteCorrectly: client with
  space-stripped preferred regions routes to correct first region
  (7 operation types via DataProvider)
- nonCanonicalExcludeRegion_shouldSkipExcludedRegion: excluding with
  space-stripped name (e.g., 'westus3') correctly skips that region
  (7 operation types via DataProvider)
- uppercaseExcludeRegion_shouldSkipExcludedRegion: excluding with
  UPPERCASE name correctly skips that region

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add tests that create CosmosClients with space-stripped preferred regions
(e.g., 'westus3' instead of 'West US 3') and verify correct routing.

FaultInjectionWithAvailabilityStrategyTestsBase:
- Add nonCanonicalWriteableRegions field (space-stripped from server names)
- readAfterCreation_nonCanonicalPreferredRegions_shouldRouteCorrectly:
  creates client with space-stripped regions, reads with eager availability
  strategy, verifies first contacted region matches expected canonical name

PerPartitionCircuitBreakerE2ETests:
- nonCanonicalPreferredRegions_ppcbShouldStillRouteCorrectly: creates
  client with space-stripped regions, performs create+read, verifies
  diagnostics show routing to correct first preferred region

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Simplify RegionNameToRegionIdMap by removing the ConcurrentHashMap-backed
dynamic registration tier. Unknown regions are returned as-is, which is
sufficient because LocationCache's CaseInsensitiveMap + toLowerCase
handles the matching for any region the server returns.

- Remove DYNAMIC_NORMALIZED_TO_CANONICAL and registerRegionName()
- Remove registerRegionName() call from LocationCache.addRoutingContexts()
- Replace dynamic registration tests with passthrough assertion tests
- 84/84 tests pass (44 normalization + 39 LocationCache + 1 consistency)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jeet1995 jeet1995 changed the title Port RegionNameMapper from .NET SDK to normalize region names [Cosmos][WIP]: Normalize region names passed as preferred or exclude regions. May 7, 2026
jeet1995 and others added 5 commits May 7, 2026 12:21
…sible

Duplicate preferred regions after normalization (e.g., ['westus3', 'West US 3']
both becoming 'West US 3') are an obvious customer misconfiguration. The SDK
should not silently mask this — let the duplicates pass through so the customer
can see and fix their config.

Also clarify code comments for the escape hatch behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add 10 regions from the authoritative LocationNames.cs that were missing
from the normalization map: East US SLV, Southeast US, Southwest US,
South Central US 2, Southeast US 3, Southeast US 5, Northeast US 5,
India South Central, Southeast Asia 3, West Central US FRE.

Region ID mappings remain a subset (only regions with assigned IDs from
RegionToIdMap.cs). The normalization map is the superset sourced from
LocationNames.cs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the previous RegionToIdMap.cs-based ID map with the complete
regionToIdMapping from Settings.xml (IDs 1-124). This is the
authoritative source for region name ↔ ID mappings used for session
token region-level progress tracking.

- Add 44 new region IDs (74-124): Brazil Southeast, West US 3, Qatar
  Central, Italy North, East US 3, Saudi Arabia East, etc.
- Remove separate 'additional canonical names' block — all canonical
  names now derive from the ID map since Settings.xml is the superset.
- Remove 'Greece Central' which was not in any authoritative source.
- Update javadoc and code comments to reference Settings.xml as the
  authoritative source instead of RegionToIdMap.cs.
- 83/83 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ationCache

- Rename class to RegionUtils — better reflects its dual role (ID mapping
  + region name normalization).
- Move normalizeRegionNames() and containsRegionIgnoreCase() from
  LocationCache private helpers into RegionUtils as public static methods.
- Rename all test files to match: RegionUtilsNormalizationTest,
  RegionUtilsTests.
- Update all references across ConnectionPolicy, LocationCache,
  PartitionScopedRegionLevelProgress, RxDocumentClientImpl.
- 83/83 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jeet1995
Copy link
Copy Markdown
Member Author

jeet1995 commented May 7, 2026

/azp run java - cosmos - tests

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@jeet1995 jeet1995 marked this pull request as ready for review May 7, 2026 17:37
@jeet1995 jeet1995 requested a review from kirankumarkolli as a code owner May 7, 2026 17:37
Copilot AI review requested due to automatic review settings May 7, 2026 17:37
@jeet1995 jeet1995 requested review from a team as code owners May 7, 2026 17:37
@jeet1995
Copy link
Copy Markdown
Member Author

jeet1995 commented May 7, 2026

@sdkReviewAgent

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses Cosmos DB routing mismatches caused by non-canonical Azure region name inputs by introducing centralized region normalization and applying it to preferred/excluded region handling across the Cosmos Java SDK routing stack.

Changes:

  • Added RegionUtils as the single source of truth for region ID mappings and canonical region name normalization, and updated call sites to use it.
  • Normalized preferred/excluded regions in ConnectionPolicy and LocationCache, including a fix for a case-sensitive exclude-region check in PPCB reevaluation logic.
  • Added/updated unit and E2E tests to validate routing behavior with non-canonical region inputs and updated the Cosmos changelog entry.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RxDocumentClientImpl.java Switches region ID lookup to RegionUtils.
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/routing/RegionUtils.java Introduces region normalization + region ID mapping utilities.
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/routing/RegionNameToRegionIdMap.java Removes the old region mapping class in favor of RegionUtils.
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/routing/LocationCache.java Normalizes excluded regions and fixes PPCB exclude-region comparison behavior.
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/PartitionScopedRegionLevelProgress.java Updates region ID/name lookups to RegionUtils.
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/ConnectionPolicy.java Normalizes preferred regions at configuration time.
sdk/cosmos/azure-cosmos/CHANGELOG.md Documents the normalization + PPCB exclude-region fix.
sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/PerPartitionCircuitBreakerE2ETests.java Adds E2E coverage for PPCB routing with non-canonical preferred regions.
sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/routing/RegionUtilsNormalizationTest.java Adds unit coverage for normalization behavior.
sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/routing/LocationCacheTest.java Adds integration-style unit tests for preferred/exclude region normalization with real region names.
sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/RegionUtilsTests.java Updates the existing mapping-consistency test to the new RegionUtils.
sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/FaultInjectionWithAvailabilityStrategyTestsBase.java Adds E2E validation for availability strategy routing with non-canonical preferred regions.
sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/ExcludeRegionTests.java Adds E2E coverage for non-canonical preferred/exclude region inputs.

Comment on lines +38 to +44
public static final Map<String, Integer> REGION_NAME_TO_REGION_ID_MAPPINGS = new HashMap<String, Integer>() {
{
put("East US", 1);
put("East US 2", 2);
put("Central US", 3);
put("North Central US", 4);
put("South Central US", 5);
Comment on lines +5331 to +5332
assertThat(diagnosticsContext.getContactedRegionNames().iterator().next())
.isEqualTo(expectedFirstRegion);
@xinlian12
Copy link
Copy Markdown
Member

@sdkReviewAgent

return canonical;
}

return regionName;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering whether we should fallback to normalized version when not found from the map. Even in globalEndpointManager, we just use the normalized version for findings the regional endpoint

* returned as-is.</li>
* </ol>
*/
public class RegionUtils {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also change to use RegionUtils for GlobalEndpointManager as well


String normalized = regionName.toLowerCase(Locale.ROOT).replace(" ", "");

String canonical = NORMALIZED_TO_CANONICAL.get(normalized);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Suggestion — Forward Compatibility: Unknown regions with space variants won't match

For unknown regions (not in the static map), getCosmosDBRegionName returns the input as-is. This means two variant spellings of the same unknown region won't match if they differ by spaces:

  • Customer passes preferredRegions = ["futureregion"]
  • Server returns "Future Region" → stored as "future region" in addRoutingContexts
  • Preferred location "futureregion""future region"region not matched

The PR description acknowledges this: "spaces are optional for known regions only." The defense-in-depth via CaseInsensitiveMap + toLowerCase() handles case differences for unknown regions, but not space differences.

If forward compatibility for space-stripped unknown regions is desired, the fallback could return the normalized form instead of the original:

// Instead of: return regionName;
return normalized; // lowercase + no-spaces, matches server after toLowerCase

This would make "futureregion" and "Future Region" both collapse to "futureregion", matching in the lowercased endpoint map. The tradeoff: diagnostic logs would show the normalized form instead of the user's original input.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

}
};

public static final Map<Integer, String> REGION_ID_TO_NORMALIZED_REGION_NAME_MAPPINGS = new HashMap<Integer, String>() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — Maintenance: Derive REGION_ID_TO_NORMALIZED_REGION_NAME_MAPPINGS programmatically

This map is still 124 hand-maintained entries, while NORMALIZED_REGION_NAME_TO_REGION_ID_MAPPINGS (same data, inverse direction) was correctly refactored to be derived programmatically from REGION_NAME_TO_REGION_ID_MAPPINGS. The inconsistency creates a maintenance trap: when a new region is added to REGION_NAME_TO_REGION_ID_MAPPINGS, a contributor must also remember to update this manual map. Two of four maps auto-derive; this one doesn't.

The existing test regionIdToRegionNameConsistency catches inconsistencies, but a pit-of-success design would eliminate the risk entirely:

static {
    Map<Integer, String> idToNormalized = new HashMap<>();
    Map<String, Integer> normalizedToId = new HashMap<>();
    for (Map.Entry<String, Integer> entry : REGION_NAME_TO_REGION_ID_MAPPINGS.entrySet()) {
        String normalized = entry.getKey().toLowerCase(Locale.ROOT).replace(" ", "");
        normalizedToId.put(normalized, entry.getValue());
        idToNormalized.putIfAbsent(entry.getValue(), normalized);
    }
    NORMALIZED_REGION_NAME_TO_REGION_ID_MAPPINGS = Collections.unmodifiableMap(normalizedToId);
    REGION_ID_TO_NORMALIZED_REGION_NAME_MAPPINGS = Collections.unmodifiableMap(idToNormalized);
}

This eliminates ~130 lines of duplicated data and makes REGION_NAME_TO_REGION_ID_MAPPINGS the true single source of truth.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

/**
* Tests for {@link RegionUtils}
*/
public class RegionUtilsNormalizationTest {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — Testing: No unit tests for containsRegionIgnoreCase and normalizeRegionNames

These two new public utility methods have zero direct unit tests. containsRegionIgnoreCase is the direct replacement for the buggy List.contains() — it deserves focused tests documenting its contract. normalizeRegionNames silently drops null elements from the input (reducing list size), which is a behavioral contract that should be explicitly tested.

Suggested additions to RegionUtilsNormalizationTest:

// containsRegionIgnoreCase
assertThat(RegionUtils.containsRegionIgnoreCase(
    Arrays.asList("westus3"), "West US 3")).isTrue();
assertThat(RegionUtils.containsRegionIgnoreCase(
    Arrays.asList("West US 3"), "WEST US 3")).isTrue();
assertThat(RegionUtils.containsRegionIgnoreCase(
    null, "anything")).isFalse();
assertThat(RegionUtils.containsRegionIgnoreCase(
    Arrays.asList("East US"), "unknownRegion")).isFalse();

// normalizeRegionNames
assertThat(RegionUtils.normalizeRegionNames(
    Arrays.asList("westus3", "east us")))
    .containsExactly("West US 3", "East US");
assertThat(RegionUtils.normalizeRegionNames(null)).isEmpty();
assertThat(RegionUtils.normalizeRegionNames(
    Arrays.asList("East US", null, "westus3")))
    .containsExactly("East US", "West US 3"); // null dropped

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

}
String normalizedTarget = getCosmosDBRegionName(target);
for (String region : regions) {
if (getCosmosDBRegionName(region).equalsIgnoreCase(normalizedTarget)) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — Correctness: containsRegionIgnoreCase NPE if list contains null elements

If region is null, getCosmosDBRegionName(null) returns null (via the StringUtils.isEmpty guard), then null.equalsIgnoreCase(normalizedTarget) throws NullPointerException.

Current callers are safe — the sole production caller at LocationCache:506 passes a list already processed by normalizeRegionNames(), which filters nulls. But this is a package-visible utility method whose contract doesn't document the null-element restriction. A future caller could hit this.

for (String region : regions) {
    if (region != null && getCosmosDBRegionName(region).equalsIgnoreCase(normalizedTarget)) {
        return true;
    }
}

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.


if (Utils.tryGetValue(regionalRoutingContextsByRegionName, internalExcludeRegion, regionalRoutingContextValueHolder)) {
if (!regionalRoutingContextValueHolder.v.equals(firstApplicableRegionalRoutingContext) && !userConfiguredExcludeRegions.contains(internalExcludeRegion)) {
if (!regionalRoutingContextValueHolder.v.equals(firstApplicableRegionalRoutingContext) && !RegionUtils.containsRegionIgnoreCase(userConfiguredExcludeRegions, internalExcludeRegion)) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Recommendation — Testing: No regression test for the PPCB List.contains() bug fix

This line is the actual bug fix described in the PR — replacing case-sensitive List.contains(internalExcludeRegion) with RegionUtils.containsRegionIgnoreCase(userConfiguredExcludeRegions, internalExcludeRegion). It prevents excluded regions from being silently re-added as retry targets when casing didn't match.

However, no test exercises this specific code path. The reevaluate() method is only entered when applicableRegionalRoutingContexts.size() < 2 AND preferredRoutingContexts.size() > 1 AND PPCB internal exclude regions are present. None of the 6 new LocationCacheTest normalization tests pass internalExcludeRegions. The E2E PPCB test (nonCanonicalPreferredRegions_ppcbShouldStillRouteCorrectly) does a basic create+read without fault injection, so it never triggers the circuit breaker or enters reevaluate.

A targeted regression test would: (1) set up user-configured exclude regions in non-canonical form (e.g., "westus3"), (2) provide internalExcludeRegions in lowercased canonical form (e.g., "west us 3"), (3) trigger the reevaluate path (ensure only 1 applicable endpoint remains), (4) assert the internally excluded region is NOT re-added when it matches the user's exclude list after normalization.

Without this, a future refactor could reintroduce the case-sensitive contains() without breaking any test.

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

@xinlian12
Copy link
Copy Markdown
Member

Review complete (58:11)

Posted 5 inline comment(s).

Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage

- Fix comment indentation in LocationCache (line 349)
- Make REGION_NAME_TO_REGION_ID_MAPPINGS unmodifiable to prevent
  accidental mutation after initialization
- Derive REGION_ID_TO_NORMALIZED_REGION_NAME_MAPPINGS programmatically
  from the forward map — eliminates ~130 lines of manual duplication
- Return normalized form (lowercase, no spaces) for unknown regions
  instead of as-is — ensures space-stripped unknown regions match after
  LocationCache toLowerCase() (e.g., 'futureregion' matches 'future region')
- Add null guard in containsRegionIgnoreCase to prevent NPE on null
  list elements
- Fix PPCB test to use contains() instead of iterator().next() to avoid
  flaky assertion on Set iteration order
- Add unit tests for normalizeRegionNames() and containsRegionIgnoreCase()
  (9 new tests covering normalization, null/empty, null elements, matches)
- Update unknown-region tests to expect normalized form
- 90/90 tests pass (51 normalization + 38 LocationCache + 1 consistency)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jeet1995
Copy link
Copy Markdown
Member Author

jeet1995 commented May 7, 2026

/azp run java - cosmos - tests

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Copy Markdown
Member

@kushagraThapar kushagraThapar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tier 2 review — region normalization (PR #49090)

9 findings (4 major, 3 minor, 1 nit, 1 design-alignment) anchored against HEAD 71d9f8e. Several build on existing reviewer threads — explicit framing is included inline. Verified-negative findings (rigor receipts) and cross-SDK follow-ups are tracked in the router's internal record.

Findings index

ID Severity Anchor Concern
F1 Major (correctness) RegionUtils.java:240 Routing regression for unknown Azure regions whose canonical name contains spaces — cross-layer fix needed (also LocationCache.java:1073)
F2 Major (api) RegionUtils.java:188 getRegionId(String) does not normalize input — silent footgun for new callers
F3 Major (testing) FaultInjectionWithAvailabilityStrategyTestsBase.java:404 TreeSet.iterator().next() order-flaky — same anti-pattern you fixed in PPCB in this PR's HEAD commit
F4 Major (agentic) summary body below CosmosClientBuilder.preferredRegions(...) javadoc does not disclose silent input rewriting (file untouched but behavior-changed)
F5 Minor (docs) CHANGELOG.md:10 PR description / CHANGELOG and code disagree on unknown-region behavior
F6 Minor (testing) LocationCacheTest.java:960 No integration test exercises end-to-end routing for an UNKNOWN region
F7 Minor (code quality) RegionUtils.java:173 putIfAbsent on derived ID map silently masks future duplicate-ID errors
F8 Nit (api) summary body below Diagnostics output now reflects normalized form (RxDocumentClientImpl.java:702 is outside any modified hunk so no inline anchor)
F-DA1 Design alignment ConnectionPolicy.java:503 getPreferredRegions() getter now returns canonicalized strings — observable round-trip behavior change

Of the 9: 5 NET-NEW (F2, F4, F5, F8, F-DA1) and 4 ADDITIVE to existing reviewer threads with explicit framing (F1 builds on @xinlian12 @235; F3 builds on @copilot-pull-request-reviewer's PPCB comment that you've already addressed there; F6 distinct from @xinlian12's @506 PPCB-test ask; F7 builds on @xinlian12's @167 derivation refactor).


F4 · Major — Agentic / API surface · CosmosClientBuilder.preferredRegions(...) javadoc does not disclose silent input rewriting

Posted in summary because the affected file (CosmosClientBuilder.java) is untouched by this PR — no diff line to anchor against. But its observable behavior changes as a direct consequence of ConnectionPolicy.setPreferredRegions (see F-DA1 inline at ConnectionPolicy.java:503).

File: sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/CosmosClientBuilder.java:810-826 (untouched by this PR but newly behavior-changed by it)

This is the public, agentic-consumer-facing entry point for the new normalization behavior. Current javadoc says only "Sets the preferred regions … For example, 'East US' as the preferred region."

Effective behavior after this PR (verified at ConnectionPolicy.setPreferredRegions:489-504):

  • Recognized region inputs are canonicalized: "westus3", "WEST US 3", "west us 3""West US 3"
  • Unrecognized regions are normalized to lowercase no-spaces: "My Custom Region""mycustomregion"
  • Round-tripping the value through internal config returns a different string than the caller supplied

Why this matters (agentic dimension): Per the Java SDK pack agentic_consumer_surfaces, javadoc on CosmosClientBuilder is the primary contract LLM-generated client config relies on. Silently rewriting input violates the principle that public observable behavior must be self-documenting. LLMs / IaC code that snapshot-test preferredRegions(["westus3"]) and assert it back via getConnectionPolicy().getPreferredRegions() will see surprising round-trip diffs with no documentation grounding the behavior.

Verification: grep -B0 -A20 'public CosmosClientBuilder preferredRegions' CosmosClientBuilder.java returns the full javadoc block — zero occurrences of "normalize", "canonical", "case", or "case-insensitive". The behavior is documented only in the internal ConnectionPolicy implementation comment.

Suggested fix: Add a <p> paragraph to the CosmosClientBuilder.preferredRegions(...) javadoc:

"Region names are normalized to the canonical CosmosDB form: case and spacing variants of recognized regions ('westus3', 'WEST US 3') are mapped to the canonical name ('West US 3'). Unrecognized regions are normalized to lowercase no-spaces form for forward compatibility. Values returned by downstream getters may differ in casing/spacing from the values supplied here."

The same disclosure should be added to excludedRegions(...) since the symmetric normalization applies there too.


F8 · Nit — API · Diagnostics output now reflects normalized form rather than the customer's literal input

Posted in summary because the referenced line (RxDocumentClientImpl.java:702) is outside any modified hunk in this PR's diff — no inline anchor available. The behavior is a direct consequence of F-DA1 (inline at ConnectionPolicy.java:503).

connectionPolicy.getPreferredRegions() now returns the normalized list. Two customer-visible consequences:

Customer input Pre-PR diagnostics Post-PR diagnostics
["westus3"] ["westus3"] ["West US 3"] (improvement — canonical)
["My Future Region"] ["My Future Region"] ["myfutureregion"] (surprise — lowercased + space-stripped)

Why this matters: Customers who scan diagnostics output by exact-string match (regex over log files, dashboard alerts grepping for region names, etc.) will see their match patterns silently break for unrecognized regions.

Suggested fix: Add a one-line note to the CHANGELOG bullet (currently at CHANGELOG.md:10) explicitly calling out that:

  • diagnostics for recognized regions now display the canonical form
  • diagnostics for unrecognized regions display the lowercase-no-spaces form

Customers running region-name string matchers on diagnostics output should review their patterns.


⚠️ AI-generated review — may be incorrect. Posted by router pr-review-49090. Agree → resolve the conversation. Disagree → reply with your reasoning.

return canonical;
}

return normalized;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major — Correctness + design-alignment · Routing regression for unknown Azure regions whose canonical name contains spaces (cross-layer)

Building on @xinlian12's comment at RegionUtils.java:235 — the fix has to span two layers, and reframes the issue as a current-customer regression rather than forward-compat.

Files affected (cross-layer):

  • sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/routing/RegionUtils.java:240 (this anchor)
  • sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/routing/LocationCache.java:1073 (same change set)

Design-doc citation: cosmosdb-design-docs/09-global-distribution.md#key-classesLocationCache is documented as the source-of-truth for "Caches read/write endpoints per region". The PR establishes a one-sided normalization contract (user input is normalized; server-returned region names are not), which works only because the service consistently returns canonical names.

Concrete failure scenario (verified): Suppose Mexico Central ships as a new Azure region but isn't yet in the SDK's static map.

Path Behavior
Old (works) preferredRegions=["Mexico Central"] stored as-is → ctor toLowerCase()"mexico central" → matches server CaseInsensitiveMap key "mexico central"
New (silently broken) setPreferredRegions(["Mexico Central"])RegionUtils.getCosmosDBRegionName not in map → returns "mexicocentral" (lowercased + space-stripped, this line) → ctor lowercases again → still "mexicocentral" → server key is "mexico central" (only lowercased; spaces preserved) → MISS → silent fallback to default global endpoint

Evidence: Apache Commons CaseInsensitiveMap.convertKey() only lowercases keys; it does NOT strip spaces. So a server-returned region with a space stays keyed with a space. This is the asymmetry that breaks the lookup.

Why this differs from xinlian12's @235 thread: xinlian12 raised the user-side half framed as a 🟢 forward-compat concern. This finding (a) reframes it as a regression with concrete evidence, AND (b) shows the fix has to span TWO layers — even if RegionUtils:240 is fixed to actually pass-through-as-is, the second normalization at LocationCache:1073 would re-strip the spaces. Both layers must be fixed in lockstep.

Suggested fix (both layers must land together):

  1. RegionUtils.getCosmosDBRegionName (this line):

    return canonical != null ? canonical : regionName;  // pass-through ORIGINAL, not the space-stripped form

    This also matches what the PR description claims the unknown-region branch already does.

  2. LocationCache.DatabaseAccountLocationsInfo ctor (line 1073): once Blob storage hangs for files > about 3500 kb #1 is fixed, the existing getCosmosDBRegionName(loc).toLowerCase() call will produce the correct lowercase-with-spaces form for unknown regions, matching the server key. Both sides of the routing comparison then go through the same canonicalizer.

A regression test using a region NOT in the static map would catch this immediately (see F6).


⚠️ AI-generated review — may be incorrect. Posted by router pr-review-49090. Agree → resolve the conversation. Disagree → reply with your reasoning.

return REGION_ID_TO_NORMALIZED_REGION_NAME_MAPPINGS.getOrDefault(regionId, StringUtils.EMPTY);
}

public static int getRegionId(String regionName) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major — API · RegionUtils.getRegionId(String) does not normalize input — silent footgun for new callers

The new RegionUtils Javadoc positions itself as the single source of truth for Azure region name mappings. getCosmosDBRegionName() accepts any casing/spacing variant. But getRegionId() (this method) requires the lookup key to already be lowercase-no-spaces — so getRegionId("West US 3") returns -1, while getRegionId("westus3") returns 77.

Verification — all current callers happen to pre-normalize:

$ grep -rn "RegionUtils.getRegionId" sdk/cosmos/azure-cosmos/src/main/java
RxDocumentClientImpl.java:838:  if (RegionUtils.getRegionId(normalizedReadableRegion) == -1) { ...
PartitionScopedRegionLevelProgress.java:129:  int regionId = RegionUtils.getRegionId(normalizedRegionRoutedTo);
PartitionScopedRegionLevelProgress.java:404:  int regionId = RegionUtils.getRegionId(lesserPreferredRegionPkProbablyRequestedFrom);

So no current bug. But after this PR connectionPolicy.getPreferredRegions().get(0) returns canonical form like "West US 3" — the next maintainer who wires that into getRegionId silently gets -1 and the wrong code path (e.g., RxDocumentClientImpl.java:836-840 would treat the region as "unknown" and suppress PPCB progress tracking).

Suggested fix — make the contract symmetric with getCosmosDBRegionName:

public static int getRegionId(String regionName) {
    if (StringUtils.isEmpty(regionName)) return -1;
    String normalized = regionName.toLowerCase(Locale.ROOT).replace(" ", "");
    return NORMALIZED_REGION_NAME_TO_REGION_ID_MAPPINGS.getOrDefault(normalized, -1);
}

⚠️ AI-generated review — may be incorrect. Posted by router pr-review-49090. Agree → resolve the conversation. Disagree → reply with your reasoning.

assertThat(diagnosticsContext.getContactedRegionNames().size()).isGreaterThan(0);
// The first contacted region should match the first writeable region (lowercased canonical)
assertThat(diagnosticsContext.getContactedRegionNames().iterator().next())
.isEqualTo(FIRST_REGION_NAME);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major — Testing · TreeSet.iterator().next() makes new FI test order-flaky

@copilot-pull-request-reviewer flagged this anti-pattern on PerPartitionCircuitBreakerE2ETests.java:5332 — you fixed it there in 71d9f8e. The same anti-pattern remains in FaultInjectionWithAvailabilityStrategyTestsBase.java:403-404.

This assertion uses getContactedRegionNames().iterator().next() to assert the first preferred region was contacted. But CosmosDiagnosticsContext.getContactedRegionNames() returns a TreeSet<String> (verified at CosmosDiagnosticsContext.java:428-429). iterator().next() therefore returns the alphabetically smallest region, not the first-contacted.

The test enables availabilityStrategy(eagerThresholdAvailabilityStrategy) (line ~360 of this method) — so as soon as a 2nd contacted region's name sorts before the first preferred region's, the assertion will false-fail.

The kicker: the most recent commit 71d9f8e ("Address PR review comments") explicitly fixed this same anti-pattern in PerPartitionCircuitBreakerE2ETests.java (in response to copilot-bot's comment), but did not back-port the fix to this file (added one commit earlier in a5897e9d257).

Verification:

$ grep -n "iterator().next()" FaultInjectionWithAvailabilityStrategyTestsBase.java
403:    .iterator().next()).isEqualTo(FIRST_REGION_NAME);   # only remaining occurrence
$ grep -n "FIRST_REGION_NAME" FaultInjectionWithAvailabilityStrategyTestsBase.java
2991, 3032: existing call sites already use .contains(FIRST_REGION_NAME)

Suggested fix — same pattern you used in PPCB:

assertThat(diagnosticsContext.getContactedRegionNames())
    .as("contacted regions should include the first preferred region (lowercased)")
    .contains(FIRST_REGION_NAME);

⚠️ AI-generated review — may be incorrect. Posted by router pr-review-49090. Agree → resolve the conversation. Disagree → reply with your reasoning.

#### Breaking Changes

#### Bugs Fixed
* Fixed region name normalization: preferred regions and excluded regions passed in non-canonical forms (e.g., `"westus3"`, `"west us 3"`, `"WEST US 3"` instead of `"West US 3"`) are now normalized to the canonical CosmosDB format. Also fixed a case-sensitive `List.contains()` bug in the per-partition circuit breaker reevaluate logic that could cause excluded regions to be re-added as retry targets. - See [PR 49090](https://github.com/Azure/azure-sdk-for-java/pull/49090)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor — Documentation · PR description and code disagree on unknown-region behavior

The PR description states twice "If a region is not in the static map … the input is returned as-is." — and this CHANGELOG bullet implicitly relies on that same framing. But the code at RegionUtils.java:240 returns normalized (the lowercase-no-spaces form), NOT the input as-is. The class Javadoc on lines 220-227 of RegionUtils.java is consistent with the code — so it's the description and CHANGELOG framing that mislead.

This isn't just a documentation nit: @xinlian12 read the description and assumed the unknown-region routing failure described in F1 was impossible. Misleading framing here cost a reviewer cycle.

Suggested fix — pick one:

  • (a) Change the code to actually pass-through-as-is (which is also the fix for F1), and the description / CHANGELOG remain accurate, or
  • (b) Update the PR description and tighten this CHANGELOG bullet to clarify that unknown regions are normalized to lowercase-no-spaces (not pass-through), so future readers and reviewers don't make the same assumption.

⚠️ AI-generated review — may be incorrect. Posted by router pr-review-49090. Agree → resolve the conversation. Disagree → reply with your reasoning.

private static final URI EastUSEndpoint = createUrl("https://eastus.documents.azure.com");
private static final URI NorthEuropeEndpoint = createUrl("https://northeurope.documents.azure.com");

private static DatabaseAccount createDatabaseAccountWithRealRegions() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor — Testing · No integration test exercises end-to-end routing for an UNKNOWN region

Distinct from @xinlian12's @LocationCache.java:506 PPCB-reevaluate regression-test ask — LocationCacheTest also doesn't exercise the unknown-region routing path.

All six new LocationCacheTest integration tests use real Azure regions present in the static map (West US 3, East US, North Europe — see createDatabaseAccountWithRealRegions on this line). The unknown-region path through RegionUtils.getCosmosDBRegionNameLocationCache.DatabaseAccountLocationsInfo ctor → CaseInsensitiveMap lookup is exercised only at the unit-test level (RegionUtilsNormalizationTest.unknownRegionVariantsShouldCollapse), which asserts string transformation but not endpoint resolution.

Verification:

$ grep -rn "createCacheWith\|createDatabaseAccountWith" LocationCacheTest.java
# all six callers use createDatabaseAccountWithRealRegions (this line)

Why this matters: A single test using a fake region (e.g. "Pluto Central") and asserting getReadEndpoints() returns the corresponding endpoint would catch the F1 regression immediately — and lock in the symmetry contract going forward.

Suggested addition: A small createDatabaseAccountWithUnknownRegions helper + one test that builds the cache with ["Pluto Central"] (or any name guaranteed to NOT be in REGION_NAME_TO_REGION_ID_MAPPINGS), populates server-side readableLocations with the same string, and asserts the read endpoint resolves correctly through the cache.


⚠️ AI-generated review — may be incorrect. Posted by router pr-review-49090. Agree → resolve the conversation. Disagree → reply with your reasoning.


static {
// Derive both maps programmatically from REGION_NAME_TO_REGION_ID_MAPPINGS
Map<Integer, String> idToNormalized = new HashMap<>();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor — Code quality · putIfAbsent on derived ID map silently masks future duplicate-ID errors

Building on @xinlian12's @167 derivation refactor (now in 71d9f8e): the asymmetric put / putIfAbsent in the new block introduces a new defensive-coding gap.

This static block deriving REGION_ID_TO_NORMALIZED_REGION_NAME_MAPPINGS uses putIfAbsent on idToNormalized while the parallel write to NORMALIZED_REGION_NAME_TO_REGION_ID_MAPPINGS uses put on normalizedToId. This asymmetry is correct only because the hand-maintained source map currently has unique IDs.

A future copy-paste error duplicating an ID will be:

  • silently absorbed in one direction (putIfAbsent keeps the first entry, ignores the duplicate)
  • silently overwritten in the other (put keeps the last entry)

The two maps then go out of sync — and getRegionName(id) and getRegionId(name) no longer round-trip. This will only surface as a routing bug far from the source map edit.

Why this is distinct from xinlian12's @167 ask: that thread asked whether to derive the auxiliary maps programmatically (now resolved). This ask is about how to defend the derivation invariant going forward.

Suggested fix — pick one:

(a) Fail-fast on duplicates at class-load time:

for (Map.Entry<String, Integer> entry : REGION_NAME_TO_REGION_ID_MAPPINGS.entrySet()) {
    String normalized = entry.getKey().toLowerCase(Locale.ROOT).replace(" ", "");
    if (idToNormalized.put(entry.getValue(), normalized) != null) {
        throw new IllegalStateException("Duplicate region ID " + entry.getValue() + " in REGION_NAME_TO_REGION_ID_MAPPINGS");
    }
    if (normalizedToId.put(normalized, entry.getValue()) != null) {
        throw new IllegalStateException("Duplicate normalized region name " + normalized);
    }
}

(b) Add a unit test asserting REGION_NAME_TO_REGION_ID_MAPPINGS.values().size() == new HashSet<>(REGION_NAME_TO_REGION_ID_MAPPINGS.values()).size() — cheaper but requires the test stay alive.


⚠️ AI-generated review — may be incorrect. Posted by router pr-review-49090. Agree → resolve the conversation. Disagree → reply with your reasoning.

normalized.add(RegionUtils.getCosmosDBRegionName(region));
}
}
this.preferredRegions = normalized;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Design alignment · getPreferredRegions() getter now returns canonicalized strings, not customer input

Design-doc citation: cosmosdb-design-docs/09-global-distribution.md#key-classes

This setPreferredRegions(...) implementation now mutates the customer-supplied list — the value returned by getPreferredRegions() will differ from what the caller passed in. Recognized inputs ("westus3", "WEST US 3") become canonical ("West US 3"); unrecognized inputs become lowercase-no-spaces ("My Region""myregion").

This is observable behavior change on a public-getter round-trip:

  • Customers who snapshot-test setPreferredRegions(input) then read back via getPreferredRegions() will see a diff.
  • IaC / agentic-generated client code that compares the round-trip value will see surprising drift.
  • Diagnostics output that references connectionPolicy.getPreferredRegions() (e.g., RxDocumentClientImpl.java:702) now displays the normalized form rather than the literal customer input — see F8.

The change is intentional per the PR description, but the design contract should be documented explicitly. Two angles:

  1. Javadoc on the public surface (see F4) — the corresponding CosmosClientBuilder.preferredRegions(...) method needs a <p> paragraph disclosing this transformation, since that's the agentic-consumer-facing entry point.

  2. Design doc (cosmosdb-design-docs/09-global-distribution.md#key-classes) — a "Region Naming Convention" subsection establishing: SDKs MUST normalize both user-supplied and server-returned names through the same canonicalizer; for unknown regions, document the fallback form. This avoids future SDK refactors silently breaking the normalization-symmetry contract that F1 highlights.

No code change required on this line — this is a documentation-tracking item that pairs with F4 (javadoc) and F1 (symmetry of the contract).


⚠️ AI-generated review — may be incorrect. Posted by router pr-review-49090. Agree → resolve the conversation. Disagree → reply with your reasoning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants