[client] Fix stale metadata on readOnlyGateway by adding RetryableGatewayClientProxy by loserwang1024 · Pull Request #3390 · apache/fluss

loserwang1024 · 2026-05-27T12:55:27Z

Purpose

Linked issue: close #3389

Brief change log

Tests

API and Format

Documentation

…ewayClientProxy

loserwang1024 · 2026-05-27T12:56:27Z

@swuferhong @wuchong @fresh-borzoni , CC

fresh-borzoni

@loserwang1024 Thank you for the very important PR, left some comments, PTAL

fresh-borzoni · 2026-05-29T01:18:33Z

    private final AdminReadOnlyGateway readOnlyGateway;
    private final MetadataUpdater metadataUpdater;

+    private static final int READ_ONLY_GATEWAY_MAX_RETRIES = 3;


With maxRetries=3, bootstrap reinit needs 4 refreshes. You only get 3 per request.
Shall we loop inside updateMetadata until either success or null-triggered bootstrap?

fresh-borzoni · 2026-05-29T01:23:53Z

+                            cause);
+                    // Run metadata refresh and retry on a separate thread to avoid
+                    // blocking Netty IO threads that may complete the failed future.
+                    CompletableFuture.runAsync(


Do we want some backoff?
As it stands, the 3 retries fire within milliseconds, which seems wasteful with slow DNS or restarting pods.
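
To make the backoff suggestion concrete, here is a minimal sketch of capped exponential backoff with jitter. `RetryBackoff` and its parameters are hypothetical names used only for illustration; nothing here is taken from the PR or from Fluss.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper (not part of the PR): capped exponential backoff with jitter. */
final class RetryBackoff {
    private RetryBackoff() {}

    /** Upper bound of the delay before retry number {@code attempt} (0-based). */
    static long maxDelayMillis(int attempt, long baseMillis, long capMillis) {
        if (attempt < 0 || baseMillis <= 0 || capMillis < baseMillis) {
            throw new IllegalArgumentException("invalid backoff parameters");
        }
        long delay = baseMillis;
        // Double once per attempt, stopping at the cap; the guard avoids long overflow.
        for (int i = 0; i < attempt && delay < capMillis; i++) {
            delay = delay > capMillis / 2 ? capMillis : delay * 2;
        }
        return delay;
    }

    /** Random delay in [1, maxDelayMillis], so concurrent retriers do not fire in lockstep. */
    static long jitteredDelayMillis(int attempt, long baseMillis, long capMillis) {
        long max = maxDelayMillis(attempt, baseMillis, capMillis);
        return ThreadLocalRandom.current().nextLong(max) + 1;
    }
}
```

With a base of 100 ms and a cap of 2000 ms, the upper bounds for attempts 0, 1, and 2 are 100, 200, and 400 ms. The retry would then be scheduled after the jittered delay (for example through a `ScheduledExecutorService`) rather than sleeping on the retry thread.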

fresh-borzoni · 2026-05-29T01:25:15Z

+ *   <li>Metadata refresh is triggered, which marks the failed server as unavailable
+ *   <li>After N failed refreshes, all servers are marked unavailable, triggering re-initialization
+ *       from bootstrap servers
+ *   <li>The next retry succeeds with the refreshed server addresses


Ditto: this only holds when maxRetries > cluster_size.
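
The retry-budget arithmetic behind this comment can be sketched as a toy model. It assumes, based only on the review comments and the Javadoc above, that each refresh marks one failed server unavailable and that re-initialization from bootstrap happens on the first refresh after all servers are marked. `RetryBudgetModel` is a hypothetical name, and the real `MetadataUpdater` behavior may differ.

```java
/** Toy model (hypothetical, not Fluss code) of refreshes needed to reach bootstrap re-init. */
final class RetryBudgetModel {
    private RetryBudgetModel() {}

    /**
     * Returns true if a single request with {@code maxRetries} refreshes reaches
     * bootstrap re-initialization when all {@code clusterSize} known servers are stale.
     */
    static boolean recoversWithinOneRequest(int maxRetries, int clusterSize) {
        int unavailable = 0;
        for (int refresh = 1; refresh <= maxRetries; refresh++) {
            if (unavailable == clusterSize) {
                // Assumption: with every server marked, this refresh goes to bootstrap.
                return true;
            }
            // Assumption: otherwise this refresh only marks one more failed server.
            unavailable++;
        }
        return false;
    }
}
```

Under these assumptions a 3-server cluster needs 4 refreshes, so `maxRetries = 3` falls one short while `maxRetries = 4` recovers.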

fresh-borzoni · 2026-05-29T01:26:25Z

                GatewayClientProxy.createGatewayProxy(
                        metadataUpdater::getCoordinatorServer, client, AdminGateway.class);
-        this.readOnlyGateway =
+        AdminGateway rawReadOnlyGateway =


Shall we add a TODO for writes, since they are still broken?

fresh-borzoni · 2026-05-29T01:27:35Z

    private final AdminReadOnlyGateway readOnlyGateway;
    private final MetadataUpdater metadataUpdater;

+    private static final int READ_ONLY_GATEWAY_MAX_RETRIES = 3;


Shall we make it a ConfigOption to make it more operator-friendly?

fresh-borzoni · 2026-05-29T01:29:10Z

-        this.readOnlyGateway =
+        AdminGateway rawReadOnlyGateway =
                GatewayClientProxy.createGatewayProxy(
                        metadataUpdater::getRandomTabletServer, client, AdminGateway.class);


AdminReadOnlyGateway.class?

fresh-borzoni · 2026-05-29T01:59:55Z

+                            cause);
+                    // Run metadata refresh and retry on a separate thread to avoid
+                    // blocking Netty IO threads that may complete the failed future.
+                    CompletableFuture.runAsync(


runAsync without an executor uses ForkJoinPool.commonPool(). Should we use a dedicated executor instead?
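
A minimal sketch of the dedicated-executor alternative, using the two-argument `CompletableFuture.runAsync(Runnable, Executor)` overload. The class name, thread name prefix, and pool sizing are assumptions for illustration and do not come from the PR.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: run retry work on a named pool instead of the common pool. */
final class RetryExecutorSketch {
    private RetryExecutorSketch() {}

    /** Creates a small fixed pool of daemon threads with recognizable names. */
    static ExecutorService newRetryExecutor(int threads) {
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = runnable -> {
            Thread t = new Thread(runnable, "gateway-retry-" + counter.incrementAndGet());
            t.setDaemon(true); // do not keep the JVM alive for pending retries
            return t;
        };
        return Executors.newFixedThreadPool(threads, factory);
    }

    /** Runs a task via runAsync on the given executor and reports which thread ran it. */
    static String runOnRetryExecutor(ExecutorService executor) {
        CompletableFuture<String> threadName = new CompletableFuture<>();
        CompletableFuture.runAsync(
                () -> threadName.complete(Thread.currentThread().getName()), executor);
        return threadName.join();
    }
}
```

Owning the executor also gives the client a place to bound concurrency and to shut the pool down when the client closes.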

fresh-borzoni · 2026-05-29T02:05:03Z

+                            cause);
+                    // Run metadata refresh and retry on a separate thread to avoid
+                    // blocking Netty IO threads that may complete the failed future.
+                    CompletableFuture.runAsync(


Every retry here fires its own updateMetadata call, and that method's synchronized(this) block is the same one the data-plane paths use to refresh leader info.
Example: during a rolling upgrade, N concurrent failing admin calls × 3 retries all queue up behind one lock, and the data plane's refreshes wait in the same line.

Could we share one in-flight refresh across concurrent retriers?
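
One way to share a single in-flight refresh is a "single-flight" wrapper: the first caller starts the refresh and concurrent callers receive the same future. The sketch below is a generic illustration; `SingleFlightRefresh` is a hypothetical class, and the `Runnable` stands in for whatever call actually refreshes metadata.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: coalesce concurrent refresh requests into one in-flight call. */
final class SingleFlightRefresh {
    private final Runnable refreshAction; // stand-in for the real metadata refresh
    private final Executor executor;
    private final AtomicReference<CompletableFuture<Void>> inFlight = new AtomicReference<>();

    SingleFlightRefresh(Runnable refreshAction, Executor executor) {
        this.refreshAction = refreshAction;
        this.executor = executor;
    }

    /** Returns the in-flight refresh if one exists, otherwise starts a new one. */
    CompletableFuture<Void> refresh() {
        while (true) {
            CompletableFuture<Void> current = inFlight.get();
            if (current != null) {
                return current; // join the refresh that is already running
            }
            CompletableFuture<Void> mine = new CompletableFuture<>();
            if (inFlight.compareAndSet(null, mine)) {
                try {
                    executor.execute(() -> run(mine));
                } catch (RuntimeException rejected) {
                    // The executor refused the task: clear the slot and report the failure.
                    inFlight.compareAndSet(mine, null);
                    mine.completeExceptionally(rejected);
                }
                return mine;
            }
            // Lost the race to another caller; loop and pick up its future.
        }
    }

    private void run(CompletableFuture<Void> mine) {
        try {
            refreshAction.run();
            // Clear the slot before completing so later callers trigger a fresh refresh.
            inFlight.compareAndSet(mine, null);
            mine.complete(null);
        } catch (Throwable t) {
            inFlight.compareAndSet(mine, null);
            mine.completeExceptionally(t);
        }
    }
}
```

Each retrier would then chain its retry on the returned future, so N concurrent failures cost one refresh instead of N contended lock acquisitions.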

[client] Fix stale metadata on readOnlyGateway by adding RetryableGat…

662772d

…ewayClientProxy

fresh-borzoni reviewed May 29, 2026

loserwang1024 wants to merge 1 commit into apache:main from loserwang1024:retry-with-retry
