fix: Retry creation of multiplexed session #4288
Summary of Changes
Hello @sakthivelmanii, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request addresses a critical reliability issue in the Spanner client library where transient failures during the initial creation of multiplexed sessions could lead to a complete stall. It introduces a retry mechanism for specific transient error codes (DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, UNAVAILABLE).
Code Review
This pull request introduces a retry mechanism for creating multiplexed sessions, which is a great improvement for handling transient errors during client startup. The changes are well-structured, and the new tests cover the retry logic effectively.
I have a few suggestions to further enhance the implementation:
- The retry loop in maybeWaitForSessionCreation can be made more efficient by exiting immediately on success or non-retryable errors.
- Using an EnumSet for RETRYABLE_ERROR_CODES would be more performant.
- The new test cases in MultiplexedSessionDatabaseClientMockServerTest have some code duplication that could be refactored into a helper method for better maintainability.
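The EnumSet suggestion can be illustrated with a minimal, self-contained sketch. The `Code` enum and the `RetryableCodesSketch` class are hypothetical stand-ins so the example compiles without the Spanner client on the classpath; only the three retryable codes are taken from this PR, and the actual field in the PR may be declared differently.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

final class RetryableCodesSketch {
  // Hypothetical stand-in for the client library's error-code enum.
  enum Code {
    OK, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, UNAVAILABLE, PERMISSION_DENIED, NOT_FOUND
  }

  // EnumSet stores membership as a bit vector, so contains() is a cheap bit test.
  static final Set<Code> RETRYABLE_ERROR_CODES =
      Collections.unmodifiableSet(
          EnumSet.of(Code.DEADLINE_EXCEEDED, Code.RESOURCE_EXHAUSTED, Code.UNAVAILABLE));

  // Returns true only for the transient codes named in this PR.
  static boolean isRetryable(Code code) {
    return RETRYABLE_ERROR_CODES.contains(code);
  }
}
```

With only three codes the performance difference is small in practice; the main benefit is arguably that the intent (a fixed set of enum constants) is explicit.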
Overall, this is a solid contribution that improves the robustness of the client library.
...e-cloud-spanner/src/main/java/com/google/cloud/spanner/MultiplexedSessionDatabaseClient.java
    Spanner testSpanner =
        SpannerOptions.newBuilder()
            .setProjectId("test-project")
            .setChannelProvider(channelProvider)
            .setCredentials(NoCredentials.getInstance())
            .setSessionPoolOption(
                SessionPoolOptions.newBuilder()
                    .setUseMultiplexedSession(true)
                    .setUseMultiplexedSessionForRW(true)
                    .setUseMultiplexedSessionPartitionedOps(true)
                    .setWaitForMinSessionsDuration(Duration.ofSeconds(1))
                    .setFailOnSessionLeak()
                    .build())
            .build()
            .getService();
There's significant code duplication across the new test cases for creating Spanner instances with different SessionPoolOptions. To improve maintainability and reduce boilerplate, consider extracting this logic into a private helper method.
For example:
    private Spanner createTestSpanner(SessionPoolOptions sessionPoolOptions) {
      return SpannerOptions.newBuilder()
          .setProjectId("test-project")
          .setChannelProvider(channelProvider)
          .setCredentials(NoCredentials.getInstance())
          .setSessionPoolOption(sessionPoolOptions)
          .build()
          .getService();
    }

This helper could then be called from each test, passing in the specific SessionPoolOptions required for that test case. This would make the tests cleaner and easier to read.
f86aa85 to 9f585a8
04eab47 to f9260d1
/gemini review
Code Review
This pull request introduces a valuable retry mechanism for creating multiplexed sessions, making the Spanner client more resilient to transient failures, particularly during cold starts. The implementation correctly extracts the session creation logic for reuse within a new retry loop in maybeWaitForSessionCreation, and the accompanying tests are comprehensive. I have one suggestion to improve the clarity of an error message in the case of a timeout, but otherwise, this is a solid and well-executed enhancement.
eaab0fb to 8fea7db
8fea7db to 26ddd73
    mockSpanner.setCreateSessionExecutionTime(
        SimulatedExecutionTime.ofMinimumAndRandomTimeAndExceptions(
            600,
            0,
            Arrays.asList(
                Status.DEADLINE_EXCEEDED
                    .withDescription(
                        "CallOptions deadline exceeded after 22.986872393s. "
                            + "Name resolution delay 6.911918521 seconds. [closed=[], "
                            + "open=[[connecting_and_lb_delay=32445014148ns, was_still_waiting]]]")
                    .asRuntimeException(),
                Status.DEADLINE_EXCEEDED
                    .withDescription(
                        "CallOptions deadline exceeded after 22.986872393s. "
                            + "Name resolution delay 6.911918521 seconds. [closed=[], "
                            + "open=[[connecting_and_lb_delay=32445014148ns, was_still_waiting]]]")
                    .asRuntimeException())));
If I understand this correctly, then it will wait for 600ms and then return a DEADLINE_EXCEEDED error based on the exceptions. This will again trigger a retry, which will time out. Can we:
- Lower the wait time to keep the tests as quick as possible
- Use a different retryable error code for the exceptions to show that the timeout error is really coming from the retry and not the exceptions that are being returned
    public void testRetryWithDelayInExceptionWithInSessionCreationWaitTime() {
      mockSpanner.setCreateSessionExecutionTime(
          SimulatedExecutionTime.ofMinimumAndRandomTimeAndExceptions(
              200,
Can we lower this value to keep the tests as fast as possible?
Description:
Currently, when creation of the multiplexed session fails with any error, we store the exception in the session reference and re-throw that error to all subsequent requests. This causes the library to stall, since no further requests will be processed successfully. It is generally expected that any RPC request can fail occasionally due to CPU, network, GFE, and other factors.
Proposed solution:
We will retry creation of the multiplexed session for the duration of waitForMinSessions. If a client has cold-start issues (higher CPU/memory usage during initial start), it can increase the waitForMinSessions time to give the client longer to retry.
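The proposed behaviour can be sketched as a deadline-bounded retry loop. This is a simplified illustration under stated assumptions, not the code in this PR: `SessionCreationRetrySketch`, `Code`, `CreateSessionException`, `createWithRetry`, and the fixed `backoff` parameter are all hypothetical names introduced here, and the real client uses its own exception types and scheduling. The loop returns immediately on success, rethrows immediately on a non-retryable code, and gives up once the waitForMinSessions window has elapsed.

```java
import java.time.Duration;
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Supplier;

final class SessionCreationRetrySketch {
  // Hypothetical stand-in for the client's error codes.
  enum Code { DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, UNAVAILABLE, PERMISSION_DENIED }

  // Hypothetical stand-in for the exception thrown by a failed CreateSession call.
  static final class CreateSessionException extends RuntimeException {
    final Code code;

    CreateSessionException(Code code) {
      super(code.name());
      this.code = code;
    }
  }

  static final Set<Code> RETRYABLE =
      EnumSet.of(Code.DEADLINE_EXCEEDED, Code.RESOURCE_EXHAUSTED, Code.UNAVAILABLE);

  // Retries createSession until it succeeds, fails with a non-retryable code,
  // or the waitForMinSessions window has elapsed.
  static <T> T createWithRetry(
      Supplier<T> createSession, Duration waitForMinSessions, Duration backoff) {
    long deadline = System.nanoTime() + waitForMinSessions.toNanos();
    while (true) {
      try {
        return createSession.get(); // success: exit the loop immediately
      } catch (CreateSessionException e) {
        boolean expired = System.nanoTime() - deadline >= 0;
        if (!RETRYABLE.contains(e.code) || expired) {
          throw e; // non-retryable error or out of time: surface the last error
        }
        try {
          Thread.sleep(backoff.toMillis());
        } catch (InterruptedException interrupted) {
          Thread.currentThread().interrupt(); // preserve the interrupt status
          throw e;
        }
      }
    }
  }
}
```

A caller would pass the session-creation call as the supplier and the configured waitForMinSessions duration as the window, so a longer window simply allows more attempts during a slow cold start.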