Skip to content

[Cosmos] Share PartitionKeyRangeCache across CosmosClients targeting the same account#49560

Open
xinlian12 wants to merge 21 commits into
Azure:mainfrom
xinlian12:feature/shared-partition-key-range-cache
Open

[Cosmos] Share PartitionKeyRangeCache across CosmosClients targeting the same account#49560
xinlian12 wants to merge 21 commits into
Azure:mainfrom
xinlian12:feature/shared-partition-key-range-cache

Conversation

@xinlian12

@xinlian12 xinlian12 commented Jun 18, 2026

Copy link
Copy Markdown
Member

Description

Today every CosmosClient / CosmosAsyncClient owns its own RxPartitionKeyRangeCache, even when many clients in the same JVM are configured with the same service endpoint (a common pattern for multi-tenant / multi-credential apps and frameworks that recreate clients). The routing-map data is duplicated N times and /pkranges calls fan out N times for the same containers.

This PR moves the routing-map storage to a process-wide, refcounted registry keyed by the service endpoint URI configured on CosmosClientBuilder. The fetching path (which depends on the per-client network stack, auth, collection cache, diagnostics) stays per-client.

Design

Split RxPartitionKeyRangeCache into two layers:

  1. StorageAsyncCacheNonBlocking<String, CollectionRoutingMap>. Account-level data, naturally shareable. Now obtained from SharedPartitionKeyRangeCacheRegistry (process-wide singleton) keyed by the service endpoint URI.
  2. Fetcher — issues /pkranges, depends on per-client RxDocumentClientImpl, RxCollectionCache, diagnostics. Unchanged.

Scope of sharing

Two clients share the cache only when their service endpoint URIs compare equal via URI.equals (case-insensitive on host per RFC 3986). Clients configured with different endpoint URIs — including the global endpoint vs a regional endpoint of the same logical account — do not share.

The natural-looking alternative of keying by DatabaseAccount.getId() (so global + regional clients of the same account would share) was tried and rejected: the id returned from a regional endpoint is <globalId>-<service-normalised-region>, and recovering the global form requires brittle suffix-stripping against the readable/writable locations list. DatabaseAccount.getResourceId() (the _rid field) is not a documented canonical id at the protocol level. Rather than ship a fragile canonicalisation, the registry honestly keys on the builder-supplied URI.

Concurrency model

All registry state transitions go through ConcurrentHashMap.compute(...), which provides atomic per-key check-and-update.

Lifecycle

  • RxPartitionKeyRangeCache ctor acquires from the registry (bumps refcount).
  • RxPartitionKeyRangeCache implements Closeable; close() releases the refcount and is idempotent (guarded by AtomicBoolean).
  • RxDocumentClientImpl.close() calls LifeCycleUtils.closeQuietly(partitionKeyRangeCache).
  • A leak-safety net registers a one-shot cleanup with com.azure.core.util.ReferenceManager: if a client is GC'd without calling close(), the cleanup decrements the refcount once. A WARN log identifies the leaking endpoint.
  • When the last reference is released, the registry entry is evicted so idle endpoints don't pin memory.

Diagnostics

PARTITION_KEY_RANGE_LOOK_UP metadata diagnostics are unchanged: they continue to be recorded only on the actual /pkranges network fetch path (getRoutingMapForCollectionAsync), never on cache hits. A client that serves a lookup from the shared cache populated by a sibling client records no fetch diagnostic — identical to the pre-existing behaviour when a client's own per-instance cache satisfied a lookup. (An earlier revision moved recording to the outer lookup site so every lookup emitted a record; that changed customer-facing semantics and broke FaultInjectionMetadataRequestRuleTests, so it was reverted.)

Opt-out

System property COSMOS.SHARED_PARTITION_KEY_RANGE_CACHE_ENABLED=false restores per-client private caches.

Files

File Change
caches/SharedPartitionKeyRangeCacheRegistry.java NEW — process-wide registry singleton, keyed by service-endpoint URI
caches/RxPartitionKeyRangeCache.java 3-arg ctor (client, collectionCache, URI); registry-backed storage; idempotent close(); PKR_LOOK_UP diagnostics unchanged (recorded on the /pkranges fetch path only)
Configs.java New system property COSMOS.SHARED_PARTITION_KEY_RANGE_CACHE_ENABLED (default: enabled)
RxDocumentClientImpl.java Pass this.serviceEndpoint to the cache ctor; release the cache in close()
caches/SharedPartitionKeyRangeCacheRegistryTest.java NEW — 13 unit tests inc. 32-thread concurrency stress, GC-driven leak cleanup, URI host case-insensitivity, and a negative-case pin that regional vs global endpoints don't share
caches/RxPartitionKeyRangeCacheTest.java +5 unit tests: cross-client sharing, cross-endpoint isolation, close idempotency, ctor-lifecycle, cross-client force-refresh visibility
SharedPartitionKeyRangeCacheE2ETest.java NEW — e2e test against a real Cosmos endpoint: positive sharing on the same endpoint (cross-endpoint isolation is covered by unit tests, since CosmosClientBuilder normalises the endpoint URI so two distinct connectable endpoints can't be built in a single-endpoint test environment)
CHANGELOG.md Entry under 4.82.0-beta.1

Test plan

  • mvn install (azure-cosmos)
  • mvn checkstyle:check spotbugs:check (azure-cosmos + azure-cosmos-tests)
  • ✅ Unit tests pass: 24 tests, 0 failures (RxPartitionKeyRangeCacheTest + SharedPartitionKeyRangeCacheRegistryTest)
  • ⏳ E2e tests (SharedPartitionKeyRangeCacheE2ETest) registered under the emulator and fast Maven profiles — executed in CI against the configured Cosmos endpoint.

Key behavioural tests (unit)

  • twoCachesForSameEndpointShareRoutingMapStorage — client A populates the routing map, client B serves the same lookup with clientB.readPartitionKeyRanges invoked zero times.
  • cachesForDifferentEndpointsDoNotShareStorage — clients with different endpoint URIs each invoke their own readPartitionKeyRanges exactly once.
  • forceRefreshOnSharedCacheIsVisibleToSiblingClient — client A's force-refresh propagates to client B without B issuing its own fetch.
  • closeIsIdempotent — repeated close() calls do not drive refcount negative.
  • clientWithServiceEndpointAcquiresAndReleasesRegistryRefcount — regression guard for the RxDocumentClientImpl.close()partitionKeyRangeCache.close() wiring.
  • concurrentAcquireAndReleaseProducesConsistentRefcount — 32 threads × 200 ops, refcount ends at 0.
  • referenceManagerReleasesSharedCacheWhenOwnerIsGarbageCollected — leak-safety net: an unclosed client is reclaimed by ReferenceManager once GC'd.
  • acquireTreatsHostCaseInsensitivelyMatchingUriEquals — RFC 3986 host case-insensitivity flows through to the registry key.
  • regionalAndGlobalEndpointsDoNotShareStorage — pins the explicit scope: distinct endpoint URIs use distinct registry entries.
  • disabledFlagReturnsIsolatedCachesAndPreservesRegistryEmpty — opt-out preserves pre-sharing behaviour.

Key behavioural tests (e2e, real Cosmos endpoint)

  • twoClientsOnSameEndpointShareRoutingMapStorage — spins up two real CosmosAsyncClients configured with the same endpoint, performs PK-routed reads on both, and asserts they share the same AsyncCacheNonBlocking instance, the registry refcount accounts for both holders, and closing each client decrements the refcount by exactly one.
  • Cross-endpoint isolation (clients with distinct endpoint URIs use distinct registry entries) is pinned by unit tests — SharedPartitionKeyRangeCacheRegistryTest.acquireReturnsDifferentInstanceForDifferentEndpoints / regionalAndGlobalEndpointsDoNotShareStorage and RxPartitionKeyRangeCacheTest.cachesForDifferentEndpointsDoNotShareStorage — rather than e2e, because CosmosClientBuilder.validateConfig() strips path/query so two distinct connectable endpoint URIs can't be constructed against a single test endpoint.

Breaking changes

None. RxPartitionKeyRangeCache is in the implementation package; its ctor signature and its new Closeable supertype are not part of the public API surface. No customer-visible APIs change.

…ing the same account

Move the partition-key-range routing-map cache from per-CosmosClient to a
process-wide, refcounted registry keyed by service endpoint. Multiple
CosmosClient / CosmosAsyncClient instances in the same JVM targeting the
same Cosmos account now share a single AsyncCacheNonBlocking instance for
collection -> CollectionRoutingMap, eliminating duplicate routing-map
memory and redundant /pkranges fetches.

Design

- New SharedRoutingMapCacheRegistry (process-wide singleton) holds an
  AsyncCacheNonBlocking per endpoint URL plus an AtomicInteger refcount.
  All state transitions go through ConcurrentHashMap.compute, giving
  atomic per-key check-and-update without a global lock.
- RxPartitionKeyRangeCache: new ctor accepts the service endpoint;
  underlying routingMapCache is obtained from the registry. Implements
  Closeable; close() releases this client's reference and is idempotent.
- RxDocumentClientImpl: passes serviceEndpoint to the cache ctor and
  releases the cache reference in its close() path.
- Opt-out: COSMOS.SHARED_PARTITION_KEY_RANGE_CACHE_ENABLED=false restores
  the pre-sharing behaviour (each client owns a private cache).

Why this is safe

- PK-range data is account-level metadata, not credential-bound.
- AsyncCacheNonBlocking already enforces single-flight per key; sharing
  the instance strengthens that to "single in-flight /pkranges per
  (account, container) across all clients".
- The two-arg back-compat ctor resolves the endpoint from the client, so
  existing mocked tests continue to work (mock returns null endpoint ->
  isolated cache, matching today's behaviour).

Tests

- New SharedRoutingMapCacheRegistryTest: acquire/release sharing,
  refcount eviction, idempotent release, null-endpoint isolation,
  opt-out flag, 32-thread concurrent acquire/release stress.
- New RxPartitionKeyRangeCacheTest cases: two caches at same endpoint
  share storage (verified by mock /pkranges call count = 1, not 2),
  caches at different endpoints stay independent, close() is idempotent.
- Existing 7 RxPartitionKeyRangeCacheTest cases unchanged and passing.

Reference

Pattern matches Python (sdk/cosmos/azure-cosmos/azure/cosmos/_routing/
routing_map_provider.py) which uses module-level endpoint-keyed dicts
with refcounted cleanup. Adapted to Java idioms (ConcurrentHashMap.compute
instead of explicit RLock, Closeable instead of __del__).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 18, 2026 18:42
@xinlian12 xinlian12 requested review from a team and kirankumarkolli as code owners June 18, 2026 18:42

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reduces duplicated routing-map cache memory and redundant /pkranges requests by sharing the storage layer of RxPartitionKeyRangeCache across CosmosClient / CosmosAsyncClient instances that target the same Cosmos account (keyed by service endpoint), while keeping the per-client fetch path unchanged. The shared cache is managed by a process-wide, refcounted registry and can be disabled via a new system property for opt-out.

Changes:

  • Introduces SharedRoutingMapCacheRegistry (endpoint-keyed, refcounted) to share AsyncCacheNonBlocking<String, CollectionRoutingMap> across clients.
  • Updates RxPartitionKeyRangeCache to acquire shared storage by endpoint and to implement Closeable for refcount release on client shutdown.
  • Wires RxDocumentClientImpl.close() to release the cache reference, adds config flag plumbing, and adds targeted unit tests + changelog entry.
Show a summary per file
File Description
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RxDocumentClientImpl.java Passes endpoint into the cache ctor and releases the cache reference during client close.
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/Configs.java Adds COSMOS.SHARED_PARTITION_KEY_RANGE_CACHE_ENABLED flag (default true).
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/caches/SharedRoutingMapCacheRegistry.java New process-wide singleton registry for shared routing-map cache storage with refcounted eviction.
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/caches/RxPartitionKeyRangeCache.java Splits “storage” vs “fetcher” by sourcing storage from the shared registry and adding close() ref-release.
sdk/cosmos/azure-cosmos/CHANGELOG.md Documents the new sharing behavior and opt-out property.
sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/caches/SharedRoutingMapCacheRegistryTest.java New unit tests validating sharing, eviction, disabled behavior, and concurrency refcount correctness.
sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/caches/RxPartitionKeyRangeCacheTest.java Adds tests validating cross-client sharing, cross-endpoint isolation, and idempotent close behavior.

Copilot's findings

  • Files reviewed: 7/7 changed files
  • Comments generated: 1

@xinlian12 xinlian12 changed the title [Cosmos] Share PartitionKeyRangeCache across CosmosClients targeting the same account [NO REVIEW][Cosmos] Share PartitionKeyRangeCache across CosmosClients targeting the same account Jun 18, 2026
@xinlian12

Copy link
Copy Markdown
Member Author

@sdkReviewAgent

xinlian12 and others added 2 commits June 19, 2026 09:22
…e host matching

Switch SharedRoutingMapCacheRegistry's key type from String to URI so
URI.equals() — which is case-insensitive on the host component per RFC 3986
— is used for sharing identity. Previously, two clients built with
'https://Acct.documents.azure.com/' and 'https://acct.documents.azure.com/'
would fragment into two registry entries even though they target the same
account. With URI as the key the two collapse into a single shared entry.

This matches the spirit of the Rust SDK, which uses Url-based equality on
its AccountReference identity. Python uses raw string comparison; Java's
URI gives us strictly better behaviour for free.

Added a new test (acquireTreatsHostCaseInsensitivelyMatchingUriEquals)
that asserts URI.equals() considers the two casings equal AND that the
registry produces a single shared entry for them. Ran 34 cache unit tests,
0 failures.

No public API change. RxPartitionKeyRangeCache's three-arg ctor still
takes URI; only the internal field type changed (String -> URI).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…cross-SDK consistency

Confirmed via cross-SDK review that both peer Cosmos SDKs key sharing on the
user-supplied account endpoint URL, not on the account _rid:

- Python (sdk/cosmos/azure-cosmos/azure/cosmos/_routing/_routing_map_provider_common.py):
  _resolve_endpoint() returns client.url_connection (the input endpoint string)
  with no normalisation and no _rid lookup.
- Rust (sdk/cosmos/azure_data_cosmos_driver/src/models/account_reference.rs):
  AccountReference identity is endpoint-only via AccountEndpoint(Url) which
  Hash/Eq on the Url; PartialEq deliberately excludes credentials and backup
  endpoints. No _rid involvement.

This SDK should match. The "regional vs global endpoint to the same account"
case stays a known fragmentation case across all three SDKs rather than
something Java solves alone via _rid.

Why _rid keying was rejected after exploration:
1. Diverges from Python and Rust — increases mental-model and maintenance cost
   for cross-SDK contributors.
2. DatabaseAccount.getResourceId() returns the empty string in emulator and
   some service paths where the account JSON has no _rid (Resource.java:130
   delegates to JsonSerializable.getString(R_ID)). Would silently fall back
   and fragment differently than peers.
3. Brittle to init reorders: today GlobalEndpointManager.init() runs before
   cache construction, but any future refactor (lazy account fetch,
   offline-mode init) would silently break sharing. Endpoint URI is
   constructor-immutable; _rid depends on a successful prior network call.

Final shape:
- Registry keyed by URI (case-insensitive host via URI.equals).
- RxPartitionKeyRangeCache 3-arg ctor takes (client, collectionCache,
  serviceEndpoint URI). Two-arg ctor delegates with client.getServiceEndpoint().
- JavaDoc on SharedRoutingMapCacheRegistry now explicitly documents the
  cross-SDK alignment and the regional-endpoint fragmentation tradeoff.

All 34 cache unit tests still pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@xinlian12

Copy link
Copy Markdown
Member Author

Review complete (35:07)

Posted 7 inline comment(s).

Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage

xinlian12 and others added 3 commits June 19, 2026 09:54
…clients

Without this safety net, a customer that forgets to call CosmosClient.close()
would pin the shared partition-key-range cache entry for the lifetime of the
JVM. The owning RxPartitionKeyRangeCache holds a strong reference to the
shared AsyncCacheNonBlocking and the registry's refcount stays > 0 forever.

Peer SDKs handle this:
- Python: __del__ in PartitionKeyRangeCache calls release() as a GC fallback
  (sdk/cosmos/azure-cosmos/azure/cosmos/_routing/routing_map_provider.py L192).
- Rust: no Drop impl needed — the cache lives as a field on the driver and
  Rust ownership guarantees cleanup on driver drop.

Java cannot use java.lang.ref.Cleaner because azure-cosmos targets Java 8
(verified: sdk/parents/azure-client-sdk-parent/pom.xml <source>1.8</source>).
Solution uses the pre-Cleaner pattern: PhantomReference + ReferenceQueue +
daemon reaper thread. All Java 1.2+ APIs.

Design

- SharedRoutingMapCacheRegistry holds:
  * ReferenceQueue<Object> reaperQueue
  * Set<OwnerPhantom> livePhantoms (concurrent) — critical for correctness:
    the JVM only enqueues phantoms that are themselves still strongly
    reachable, so the registry must hold them alive until processed.
  * One daemon thread (cosmos-shared-pkr-cache-reaper) blocking on
    reaperQueue.remove().
- acquire(URI endpoint, Object owner): registers an OwnerPhantom on the
  owner, adds it to livePhantoms, returns AcquireResult { cache, phantom }.
- release(URI, cache, PhantomReference) — new 3-arg overload — clears the
  phantom and removes it from livePhantoms in addition to decrementing the
  refcount. This is the path RxPartitionKeyRangeCache.close() uses.
- When the owner becomes phantom-reachable, the reaper drains the queue,
  logs a WARN ("Leaked (unclosed) RxPartitionKeyRangeCache detected..."),
  calls release(endpoint, cache) to decrement refcount, then removes the
  phantom from livePhantoms.
- close() is still the right primary path; the reaper is a safety net that
  prevents permanent JVM-lifetime cache pinning, not a substitute.

Tests

- reaperReleasesSharedCacheWhenOwnerIsGarbageCollected: acquires in a helper
  method (so the test frame cannot keep owner alive), polls referenceCount
  while forcing System.gc() in a 15s window. Reaper warning is observable
  in test output.
- promptCloseClearsPhantomSoReaperDoesNotDoubleRelease: validates the
  prompt-close path clears the phantom and a subsequent GC produces no
  extra release.

36 cache unit tests pass (was 34, +2 new leak tests).

Key correctness note in code

The first attempt at this had a subtle bug: acquire() returned the phantom
in AcquireResult but the registry didn't hold it. Once the test discarded
the AcquireResult, the phantom became unreachable and the JVM never enqueued
it — the reaper sat idle forever. The livePhantoms set fixes this. The
fields/JavaDoc explicitly document the why.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… net

Replace the bespoke PhantomReference + ReferenceQueue + daemon-thread reaper
with com.azure.core.util.ReferenceManager.INSTANCE, the SDK-wide singleton
that already encapsulates this pattern. ReferenceManagerImpl:
- On Java 9+ delegates reflectively to java.lang.ref.Cleaner.
- On Java 8 (our baseline) uses an internal PhantomReference + daemon
  thread named "azure-sdk-referencemanager" — exactly the same mechanism
  this PR was reimplementing.

Confirmed in test output: the leak WARN is logged on the
"azure-sdk-referencemanager" thread, proving the azure-core path is wired.

Why this is better:
- Reuses supported, well-tested azure-core machinery instead of rolling
  our own. One thread per JVM regardless of how many SDK components opt
  into the pattern, instead of cosmos adding its own competing thread.
- Java 9+ automatically gets the Cleaner-based implementation (better
  shutdown semantics, less thread-stack overhead).
- Drops ~100 lines of bespoke phantom plumbing from
  SharedRoutingMapCacheRegistry (OwnerPhantom inner class, livePhantoms
  set, reaper loop). Net negative on code we maintain.

Design notes preserved:
- The lambda registered with ReferenceManager.INSTANCE.register MUST NOT
  capture `owner`, otherwise the owner never becomes phantom-reachable.
  We capture only the endpoint URI and the cache reference (both
  independent of the owner) and document this constraint in code.
- ReleaseHandle is a one-shot AtomicBoolean fulfilment flag shared between
  the prompt close() path and the deferred ReferenceManager cleanup, so
  whichever runs first wins via compareAndSet and the refcount is
  decremented exactly once.

36 cache unit tests still pass; the leak test was renamed to
referenceManagerReleasesSharedCacheWhenOwnerIsGarbageCollected to reflect
the new mechanism.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@xinlian12 xinlian12 force-pushed the feature/shared-partition-key-range-cache branch from 1a27dc2 to 9b43616 Compare June 19, 2026 20:39
xinlian12 and others added 2 commits June 19, 2026 14:56
Per PR feedback, comments in the shared-cache implementation were too verbose
and contained cross-SDK comparisons that don't add value to maintainers
reading the Java code. Trimmed everywhere:

- SharedRoutingMapCacheRegistry: removed Python/Rust comparison paragraphs,
  the "Cross-SDK consistency" and "Leaked-client safety net" walls of text,
  and condensed JavaDoc on individual methods. Kept only the critical
  "lambda must not capture owner" comment because it's a correctness
  invariant that's easy to break in a refactor.
- RxPartitionKeyRangeCache: removed the long ownerPhantom-style field
  comments; consolidated the class JavaDoc into two sentences.
- Configs: condensed the system-property comment to two lines.
- RxDocumentClientImpl: shortened the close-path log message.
- CHANGELOG entry: condensed to a single sentence describing the change
  and the opt-out flag.
- Tests: stripped the "First client / Second client" narration, the
  "must hit the shared cache" explanations, and the multi-paragraph
  preambles on the leak tests. Kept enough to explain the GC-related
  test setup since that's not obvious from the code.

Behavior unchanged; 36 cache unit tests still pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Renamed SharedRoutingMapCacheRegistry → SharedPartitionKeyRangeCacheRegistry
  for consistency with the class it serves (RxPartitionKeyRangeCache).
- Removed the test-only acquire(URI) overload that bypassed ReferenceManager
  registration; tests now use acquire(URI, owner) so the cleanup-action path
  is exercised end-to-end.
- Added clientWithServiceEndpointAcquiresAndReleasesRegistryRefcount:
  regression test guarding the RxDocumentClientImpl.close() →
  partitionKeyRangeCache.close() → refcount-- wiring. Constructs the cache
  via the 2-arg ctor (matching production) and asserts the refcount delta
  on construct and close.
- Added forceRefreshOnSharedCacheIsVisibleToSiblingClient: cross-client
  invalidation propagation. Client A populates → A force-refreshes after a
  simulated split → B's lookup sees A's refreshed value (same routing-map
  instance) without issuing its own /pkranges call. Asserts object identity
  on the shared CollectionRoutingMap.

38 cache unit tests pass (was 36).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@xinlian12

Copy link
Copy Markdown
Member Author

@sdkReviewAgent

Previous run failed in azure-cosmos-spark_3-3_2-12 with a
scala-maven-plugin classpath flake (xsbt/ZincCompiler$sbtAnalyzer$
ClassNotFoundException) unrelated to this PR's changes (PR touches
azure-cosmos core; Spark connector is unaffected). Empty commit to
re-run the pipeline.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per PR review feedback, switch the registry key from the service endpoint
URI to the Cosmos database account id returned by
`globalEndpointManager.getLatestDatabaseAccount().getId()`. Different
service endpoint URIs (aliases, regional endpoints) targeting the same
logical account now correctly share the cache. Also applies the
suggested CHANGELOG.md wording.

- SharedPartitionKeyRangeCacheRegistry: key type URI -> String accountId
  (rename acquire/release params and test-only `registeredAccountCount`).
- RxPartitionKeyRangeCache: ctor takes accountId; rename field
  sharedCacheEndpointKey -> sharedCacheAccountId; update Javadoc.
- RxDocumentClientImpl: pass databaseAccountSnapshot.getId() instead of
  this.serviceEndpoint when constructing the cache.
- Tests: update both registry and cache unit tests to use String account
  ids; drop URI-specific test `acquireTreatsHostCaseInsensitively...`
  (account ids are canonical, so the property doesn't apply) and add
  `emptyAccountIdReturnsIsolatedCacheAndDoesNotEnterRegistry` for the
  new isolated-cache fast-path on blank ids.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@xinlian12 xinlian12 changed the title [NO REVIEW][Cosmos] Share PartitionKeyRangeCache across CosmosClients targeting the same account [Cosmos] Share PartitionKeyRangeCache across CosmosClients targeting the same account Jun 22, 2026
@xinlian12

Copy link
Copy Markdown
Member Author

/azp run java - cosmos - tests

@xinlian12

Copy link
Copy Markdown
Member Author

/azp run java - cosmos - spark

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12

Copy link
Copy Markdown
Member Author

/azp run java - cosmos - kafka

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

1 similar comment
@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

…ment

Adds `SharedPartitionKeyRangeCacheE2ETest` exercising the sharing
semantics end-to-end against a real Cosmos endpoint (emulator/live):

- `twoClientsOnSameAccountShareRoutingMapStorage`: spins up two
  `CosmosAsyncClient` instances against the same endpoint, performs
  PK-routed reads on both, then via reflection asserts they share the
  same `AsyncCacheNonBlocking<String, CollectionRoutingMap>` instance,
  the registry refcount accounts for both holders, and closing each
  client decrements the refcount by exactly one.

- `clientUsingRegionalEndpointSharesCacheWithClientUsingGlobalEndpoint`:
  validates the motivating case for keying by account id — builds one
  client with the global endpoint and a second with a per-region
  endpoint discovered via `DatabaseAccount.getReadableLocations()`, and
  asserts both clients share the same routing-map storage and registry
  entry. Skipped when the account exposes only a single region or
  aliases the global endpoint to the regional one.

Also refreshes a stale `Configs.java` comment ("same endpoint" ->
"same Cosmos account") to match the registry's new keying scheme.

Verification:
- `mvn install` (azure-cosmos) + `mvn test-compile checkstyle:check` ✅
- 23 existing unit tests still pass ✅
- E2e tests run under the `emulator` and `fast` Maven profiles; they
  require a reachable Cosmos endpoint and were not executed locally
  here (no emulator available in this environment).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ssors

- Adopt the standard @factory(dataProvider = "simpleGatewayClient")
  + super(clientBuilder) pattern matching other TestSuiteBase derivatives;
  builds clients via getClientBuilder() so test-runner overrides apply
  uniformly. The regional-endpoint test saves/restores the shared
  builder's endpoint in a finally block to avoid poisoning later tests.

- Swap the hand-rolled HashMap document for TestObject.create() (used
  elsewhere in the suite); container path adjusted to /mypk accordingly.

- Replace ReflectionUtils calls with public accessors where available:
    * CosmosBridgeInternal.getAsyncDocumentClient(client) instead of
      ReflectionUtils.getAsyncDocumentClient(...).
    * rxDocumentClient.getPartitionKeyRangeCache() and
      rxDocumentClient.getGlobalEndpointManager() instead of
      reflection.
  Reflection is retained only for two test-only accesses:
    * ReflectionUtils.getRoutingMapAsyncCacheNonBlocking(...) — the
      inner AsyncCacheNonBlocking storage isn't exposed publicly.
    * SharedPartitionKeyRangeCacheRegistry.referenceCount(String) —
      package-private test accessor; widening it for an e2e test
      would pollute the implementation surface.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@xinlian12

Copy link
Copy Markdown
Member Author

/azp run java - cosmos - tests

@xinlian12

Copy link
Copy Markdown
Member Author

/azp run java - cosmos - spark

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@azure-pipelines

Copy link
Copy Markdown
No pipelines are associated with this pull request.

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12

Copy link
Copy Markdown
Member Author

/azp run java - cosmos - kafka

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

xinlian12 and others added 2 commits June 22, 2026 21:22
1. Canonicalise the Cosmos account id before keying the shared registry.

`DatabaseAccount.getId()` returns "<globalId>-<normalize(region)>" when
fetched from a regional endpoint, defeating cross-endpoint sharing. The
new e2e test `clientUsingRegionalEndpointSharesCacheWithClientUsingGlobalEndpoint`
caught this on a multi-region account (expected "te71ddbba19694a2a" but
saw "te71ddbba19694a2a-westcentralus").

Added `SharedPartitionKeyRangeCacheRegistry.canonicalAccountId(account)`:
walks the account's readable/writable locations, and when `rawId` equals
a location's regional-host prefix and ends with
"-" + `RegionNameNormalizer.normalize(loc.getName())`, strips that
suffix. The two-step match (regional-host equality + normalised-suffix
endsWith) is safe against global accounts whose names legitimately end
in a hyphenated region-shaped tail.

`RxDocumentClientImpl` now passes the canonical id to the cache ctor.
The e2e test's helper also canonicalises so its sharing assertions hold.
Added six unit tests covering null/empty, global passthrough, regional
strip, multi-word region normalisation, hyphen-tail non-match, and the
defining global=regional canonical-equality invariant.

2. Restore `PARTITION_KEY_RANGE_LOOK_UP` metadata diagnostic on cache hits.

The existing `CosmosDiagnosticsTest.directDiagnostics` asserts the
diagnostic appears in every operation's diagnostics. Pre-PR each fresh
client guaranteed a network fetch on its first request; post-PR the
second-and-later client's first request was a cache hit, so no diagnostic
was emitted. This was flagged as APIView observability concern earlier
and is now confirmed as a real regression.

Moved the `MetadataDiagnostics` recording from the inner network-fetch
path (`getRoutingMapForCollectionAsync.doFinally`) to the outer
`tryLookupAsync` / `tryGetRangeByPartitionKeyRangeId` lookups. Now exactly
one PKR_LOOK_UP record is emitted per lookup, regardless of hit/miss.
Duration on cache hits is the lookup latency (typically sub-millisecond);
on misses it includes the network fetch — giving customers a useful
hit/miss signal in addition to preserving the observability contract.

Verification:
- All 29 cache unit tests pass (23 existing + 6 new canonical-id tests).
- `mvn checkstyle:check` ✅
- `mvn spotbugs:check` ✅
- E2e tests will be exercised by the CI live-test pipelines.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…canonicalisation)

The previous attempt at sharing across global+regional endpoints relied on
stripping a trailing "-<RegionNameNormalizer.normalize(region)>" from
`DatabaseAccount.getId()`. As reviewer noted, this string transform is not
reliable as a cross-endpoint canonical identifier:

  * `DatabaseAccount.getResourceId()` (the `_rid` field) is not a documented
    canonical account id at the protocol level — test fixtures show
    "testaccount.documents.azure.com" but the live wire format isn't part
    of the public contract.
  * Suffix-stripping by normalised region names is fragile across future
    service-side region naming changes.

Revert to the simpler, robust design: key the registry by the service
endpoint `URI` configured on `CosmosClientBuilder`. Two clients configured
with the same endpoint share; clients configured with different endpoint
URIs — including the global vs a regional endpoint of the same logical
account — do NOT share. This honestly reflects what we can guarantee
without a documented canonical account identifier.

Changes:
  * `SharedPartitionKeyRangeCacheRegistry`: key type `String` -> `URI`,
    drop `canonicalAccountId` helper, restore `registeredEndpointCount`
    naming. Class javadoc spells out the sharing scope.
  * `RxPartitionKeyRangeCache`: ctor takes `URI serviceEndpoint`; field
    renamed `sharedCacheAccountId` -> `sharedCacheEndpointKey`.
  * `RxDocumentClientImpl`: passes `this.serviceEndpoint` to the cache ctor.
  * `SharedPartitionKeyRangeCacheRegistryTest`: rewrote to use URIs; kept
    13 unit tests including 32-thread concurrency stress, GC-driven leak
    cleanup, case-insensitive URI host matching, and a new negative test
    `regionalAndGlobalEndpointsDoNotShareStorage` that pins the scope.
  * `RxPartitionKeyRangeCacheTest`: updated 11 tests to construct caches
    with URIs.
  * `SharedPartitionKeyRangeCacheE2ETest`: replaced the regional-endpoint
    sharing test (which was the symptom that surfaced the canonicalisation
    bug) with `clientsOnDifferentEndpointsDoNotShareRoutingMapStorage` — a
    negative-case pin that two clients on different endpoint URIs use
    distinct registry entries. The same-endpoint sharing test is unchanged.
  * `Configs.java` and `CHANGELOG.md`: refreshed wording to "same service
    endpoint" instead of "same Cosmos account".

The earlier diagnostic fix is retained (moving `PARTITION_KEY_RANGE_LOOK_UP`
recording to the outer lookup paths so cache hits also emit the diagnostic);
that regression is independent of the keying scheme.

Verification:
  * 24 cache unit tests pass (13 registry + 11 cache).
  * `mvn checkstyle:check` ✅ on azure-cosmos and azure-cosmos-tests.
  * `mvn spotbugs:check` ✅ on azure-cosmos.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@xinlian12

Copy link
Copy Markdown
Member Author

/azp run java - cosmos - tests

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

…ostics to fetch-only

CI was red on two real, code-caused failures (the rest were infra flakes:
emulator 408 timeout in CosmosEncryptionItemSerializerTest, and the
random-injection TransientIOErrorsRetryingReadManyByPartitionKeyIteratorSpec).

1. SharedPartitionKeyRangeCacheE2ETest: remove
   clientsOnDifferentEndpointsDoNotShareRoutingMapStorage. CosmosClientBuilder
   .validateConfig() strips path/query, so two distinct connectable endpoint URIs
   cannot be constructed in a single-endpoint (emulator/fast) environment; both
   clients resolve to the same registry key and correctly share, so the assertion
   could never hold. The "different URIs don't share" contract is already covered
   by unit tests (acquireReturnsDifferentInstanceForDifferentEndpoints,
   regionalAndGlobalEndpointsDoNotShareStorage,
   cachesForDifferentEndpointsDoNotShareStorage). Also drop the now-unused
   TestConfigurations import. Fixes emulator + *_Tcp_Fast live suites.

2. RxPartitionKeyRangeCache: revert PARTITION_KEY_RANGE_LOOK_UP recording back to
   the actual /pkranges fetch path (getRoutingMapForCollectionAsync) instead of the
   outer lookup site. Recording on every lookup (including cache hits) broke
   FaultInjectionMetadataRequestRuleTests
   .faultInjectionServerErrorRuleTests_PartitionKeyRanges_DelayError, which asserts
   the first PKR_LOOK_UP entry's duration reflects the injected fetch delay. This
   restores the pre-PR, customer-facing semantics (recorded only on real fetches),
   matching the design described in the review discussion. Fixes the
   *_Tcp_MultiMaster live suites.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@xinlian12

Copy link
Copy Markdown
Member Author

Pushed 9162bb2315d to fix the two real, code-caused CI failures. Root-caused each failing leg of the last run (ci 6468419 / tests 6468421):

1. Emulator + every *_Tcp_Fast live suite — real failure (now fixed).
SharedPartitionKeyRangeCacheE2ETest.clientsOnDifferentEndpointsDoNotShareRoutingMapStorage asserted that an "alternate" endpoint (primary + /alt/ path) lands on a distinct registry entry. But CosmosClientBuilder.validateConfig() strips path/query (rebuilds as scheme://authority/), so both clients resolve to the same service-endpoint URI and correctly share — the assertion can never hold, and there's only one connectable endpoint in emulator/fast environments anyway. Removed the test; the "different URIs don't share" contract is already covered by unit tests: acquireReturnsDifferentInstanceForDifferentEndpoints, regionalAndGlobalEndpointsDoNotShareStorage, cachesForDifferentEndpointsDoNotShareStorage.

2. *_Tcp_MultiMaster live suites — real regression (now fixed).
Moving PARTITION_KEY_RANGE_LOOK_UP recording to the outer lookup site made it fire on cache hits too, so FaultInjectionMetadataRequestRuleTests.faultInjectionServerErrorRuleTests_PartitionKeyRanges_DelayError — which asserts the first PKR_LOOK_UP entry's duration equals the injected /pkranges fetch delay — picked a fast cache-hit entry. Reverted recording to the actual fetch path (getRoutingMapForCollectionAsync); restores the pre-PR semantics (recorded only on real fetches), matching the resolution of the earlier diagnostics review thread. PR description's Diagnostics section updated accordingly.

Infra flakes (not PR-related, no code change):

  • Encryption emulator: CosmosEncryptionItemSerializerTest.batchAndChangeFeedWithObjectNode — 408 RequestTimeout against the emulator (transitTime 7.2s).
  • Spark: TransientIOErrorsRetryingReadManyByPartitionKeyIteratorSpec:75 transientErrorCount > 0 — depends on a random 0.2 error-injection probability; pre-existing flake unrelated to this PR (no Spark connector changes here).

Local validation: azure-cosmos compile + checkstyle:check spotbugs:check green; azure-cosmos-tests checkstyle green; unit suites RxPartitionKeyRangeCacheTest + SharedPartitionKeyRangeCacheRegistryTest = 24 tests, 0 failures. Re-running CI below.

@xinlian12

Copy link
Copy Markdown
Member Author

/azp run java - cosmos - tests

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12

Copy link
Copy Markdown
Member Author

/azp run java - cosmos - tests

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants