
@sakthivelmanii
Collaborator

Description:

Currently, when creating a multiplexed session fails with any error, we store the exception in the session reference and rethrow that error to all subsequent requests. This causes the library to stall, since no further requests can be processed successfully. RPC requests are generally expected to fail transiently from time to time due to CPU, network, GFE and other factors.

Proposed solution:
We will retry creation of the multiplexed session for the duration of waitForMinSessions. If a client has cold-start issues (higher CPU/memory usage during initial startup), it can increase the waitForMinSessions duration to give the client more time to retry.
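A minimal, stdlib-only sketch of the proposed behavior, assuming the wait budget plays the role of waitForMinSessions: keep calling the creation attempt until it succeeds, a non-retryable error occurs, or the budget is used up, and rethrow the last error on timeout. The class `DeadlineRetry`, the method `callWithRetry`, and the `isRetryable` predicate are hypothetical names for illustration; this is not the code merged in this PR.

```java
import java.time.Duration;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical helper for illustration; not part of the java-spanner API. */
final class DeadlineRetry {
  private DeadlineRetry() {}

  /**
   * Calls {@code attempt} until it returns, throws an exception that {@code isRetryable}
   * rejects, or {@code maxWait} has elapsed. On timeout the last exception is rethrown.
   */
  static <T> T callWithRetry(
      Supplier<T> attempt, Duration maxWait, Predicate<RuntimeException> isRetryable) {
    final long deadline = System.nanoTime() + maxWait.toNanos();
    while (true) {
      try {
        return attempt.get(); // success: exit the loop immediately
      } catch (RuntimeException e) {
        // Overflow-safe comparison of nanoTime values.
        boolean budgetExhausted = System.nanoTime() - deadline >= 0;
        if (!isRetryable.test(e) || budgetExhausted) {
          throw e; // permanent error, or the wait budget is used up
        }
        // Transient error with time left: try again.
      }
    }
  }
}
```

The sketch retries back-to-back with no pause between attempts; a real client would likely add a backoff so it does not hammer the backend while it is already struggling.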

@sakthivelmanii sakthivelmanii requested review from a team as code owners January 6, 2026 08:11
@product-auto-label product-auto-label bot added size: l Pull request size is large. api: spanner Issues related to the googleapis/java-spanner API. labels Jan 6, 2026
@sakthivelmanii sakthivelmanii changed the title fix(spanner): Retry creation of multiplexed session fix: Retry creation of multiplexed session Jan 6, 2026
@gemini-code-assist

Summary of Changes

Hello @sakthivelmanii, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical reliability issue in the Spanner client library where transient failures during the initial creation of multiplexed sessions could lead to a complete stall. By introducing a retry mechanism for specific transient error codes (DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, UNAVAILABLE) during the waitForMinSessions period, the client library becomes more resilient to temporary network or resource issues. This change significantly improves the robustness of the Spanner client, especially during application startup or under fluctuating load conditions, ensuring that operations can proceed after transient hiccups.

Highlights

  • Multiplexed Session Creation Retry Logic: Implemented a retry mechanism for the creation of multiplexed sessions in the Spanner client library. This prevents the library from stalling indefinitely if initial session creation fails due to transient errors.
  • Configurable Retry Duration: The retry duration for multiplexed session creation is now configurable via the waitForMinSessions option in SessionPoolOptions, allowing users to adjust for cold start issues or other transient network conditions.
  • Specific Retryable Error Codes: Retries are specifically triggered for DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, and UNAVAILABLE error codes, ensuring that only transient, retryable failures lead to retries, while permanent errors are propagated immediately.
  • Enhanced Test Coverage: Added comprehensive unit tests to validate the new retry logic under various scenarios, including successful retries, non-retryable exceptions, and different delay configurations.



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a retry mechanism for creating multiplexed sessions, which is a great improvement for handling transient errors during client startup. The changes are well-structured, and the new tests cover the retry logic effectively.

I have a few suggestions to further enhance the implementation:

  • The retry loop in maybeWaitForSessionCreation can be made more efficient by exiting immediately on success or non-retryable errors.
  • Using an EnumSet for RETRYABLE_ERROR_CODES would be more performant.
  • The new test cases in MultiplexedSessionDatabaseClientMockServerTest have some code duplication that could be refactored into a helper method for better maintainability.
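The EnumSet suggestion could look roughly like the sketch below. `EnumSet` is represented internally as a bit vector, so `contains` is a cheap membership test. The `Code` enum here is a local stand-in for a status-code type such as gRPC's `Status.Code` (an assumption: the element type of the PR's `RETRYABLE_ERROR_CODES` is not shown in this conversation), and `RetryableCodes`/`isRetryable` are hypothetical names.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch; the real code would use the library's own status-code enum. */
final class RetryableCodes {
  /** Local stand-in for a subset of gRPC status codes (assumption, not the library type). */
  enum Code {
    OK, INVALID_ARGUMENT, NOT_FOUND, PERMISSION_DENIED,
    DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, UNAVAILABLE
  }

  // Only these transient codes trigger a retry of multiplexed session creation.
  private static final Set<Code> RETRYABLE_ERROR_CODES =
      EnumSet.of(Code.DEADLINE_EXCEEDED, Code.RESOURCE_EXHAUSTED, Code.UNAVAILABLE);

  private RetryableCodes() {}

  static boolean isRetryable(Code code) {
    return RETRYABLE_ERROR_CODES.contains(code);
  }
}
```

Codes outside the set, for example `NOT_FOUND`, are treated as permanent and would be propagated immediately instead of retried.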

Overall, this is a solid contribution that improves the robustness of the client library.

Comment on lines +231 to +245
Spanner testSpanner =
    SpannerOptions.newBuilder()
        .setProjectId("test-project")
        .setChannelProvider(channelProvider)
        .setCredentials(NoCredentials.getInstance())
        .setSessionPoolOption(
            SessionPoolOptions.newBuilder()
                .setUseMultiplexedSession(true)
                .setUseMultiplexedSessionForRW(true)
                .setUseMultiplexedSessionPartitionedOps(true)
                .setWaitForMinSessionsDuration(Duration.ofSeconds(1))
                .setFailOnSessionLeak()
                .build())
        .build()
        .getService();


medium

There's significant code duplication across the new test cases for creating Spanner instances with different SessionPoolOptions. To improve maintainability and reduce boilerplate, consider extracting this logic into a private helper method.

For example:

private Spanner createTestSpanner(SessionPoolOptions sessionPoolOptions) {
    return SpannerOptions.newBuilder()
        .setProjectId("test-project")
        .setChannelProvider(channelProvider)
        .setCredentials(NoCredentials.getInstance())
        .setSessionPoolOption(sessionPoolOptions)
        .build()
        .getService();
}

This helper could then be called from each test, passing in the specific SessionPoolOptions required for that test case. This would make the tests cleaner and easier to read.

@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: l Pull request size is large. labels Jan 6, 2026
@sakthivelmanii sakthivelmanii force-pushed the retry_multiplexed_session_till_session_wait_time branch from f86aa85 to 9f585a8 January 6, 2026 08:58
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Jan 6, 2026
@sakthivelmanii sakthivelmanii force-pushed the retry_multiplexed_session_till_session_wait_time branch 4 times, most recently from 04eab47 to f9260d1 January 6, 2026 11:37
@sakthivelmanii
Collaborator Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a valuable retry mechanism for creating multiplexed sessions, making the Spanner client more resilient to transient failures, particularly during cold starts. The implementation correctly extracts the session creation logic for reuse within a new retry loop in maybeWaitForSessionCreation, and the accompanying tests are comprehensive. I have one suggestion to improve the clarity of an error message in the case of a timeout, but otherwise, this is a solid and well-executed enhancement.

@sakthivelmanii sakthivelmanii force-pushed the retry_multiplexed_session_till_session_wait_time branch 6 times, most recently from eaab0fb to 8fea7db January 6, 2026 17:15
@sakthivelmanii sakthivelmanii force-pushed the retry_multiplexed_session_till_session_wait_time branch from 8fea7db to 26ddd73 January 6, 2026 17:30
Comment on lines 362 to 378
mockSpanner.setCreateSessionExecutionTime(
    SimulatedExecutionTime.ofMinimumAndRandomTimeAndExceptions(
        600,
        0,
        Arrays.asList(
            Status.DEADLINE_EXCEEDED
                .withDescription(
                    "CallOptions deadline exceeded after 22.986872393s. "
                        + "Name resolution delay 6.911918521 seconds. [closed=[], "
                        + "open=[[connecting_and_lb_delay=32445014148ns, was_still_waiting]]]")
                .asRuntimeException(),
            Status.DEADLINE_EXCEEDED
                .withDescription(
                    "CallOptions deadline exceeded after 22.986872393s. "
                        + "Name resolution delay 6.911918521 seconds. [closed=[], "
                        + "open=[[connecting_and_lb_delay=32445014148ns, was_still_waiting]]]")
                .asRuntimeException())));
Collaborator


If I understand this correctly, then it will wait for 600ms and then return a DEADLINE_EXCEEDED error based on the exceptions. This will again trigger a retry, which will time out. Can we:

  1. Lower the wait time to keep the tests as quick as possible
  2. Use a different retryable error code for the exceptions to show that the timeout error is really coming from the retry and not the exceptions that are being returned

public void testRetryWithDelayInExceptionWithInSessionCreationWaitTime() {
  mockSpanner.setCreateSessionExecutionTime(
      SimulatedExecutionTime.ofMinimumAndRandomTimeAndExceptions(
          200,
Collaborator


Can we lower this value to keep the tests as fast as possible?

@sakthivelmanii sakthivelmanii merged commit 735e29e into main Jan 7, 2026
60 of 62 checks passed
@sakthivelmanii sakthivelmanii deleted the retry_multiplexed_session_till_session_wait_time branch January 7, 2026 15:10

Labels

api: spanner Issues related to the googleapis/java-spanner API. size: l Pull request size is large.


2 participants