[Flink] Kvscan flink integration by polyzos · Pull Request #3383 · apache/fluss

polyzos · 2026-05-26T11:57:48Z

#3126 extended the Java client to support KV scan on the live RocksDB table.

This PR integrates that functionality into the Flink connector.

It introduces KvBatchSplit and KvBatchSplitState as a new split type in the Flink source, enabling the connector to perform bounded full-table scans on primary-key tables via the server-side KV scan API (FIP-17), rather than reading from snapshots.

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

binary-signal · 2026-05-29T17:10:49Z

+Enable the feature by setting `client.scanner.kv.server-side.enabled = true` on the table or as a SQL hint:
+
+- This is a **bounded** read. The source finishes once all buckets have been drained and does not continue reading the change-log.
+- On task restart, each bucket is rescanned from scratch. Progress within a scan session is not checkpointed, because an expired or invalidated server-side session cannot be resumed from a mid-point.


On task restart, each bucket is scanned again from the beginning. Progress within an active scan session is not checkpointed because expired or invalidated server-side sessions cannot be resumed from an intermediate position.
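The restart behavior described here can be illustrated with a small, self-contained sketch. It is a toy model, not the connector's code: `BucketRescanSketch`, `SplitState`, `scan`, and `readWithOneRestart` are hypothetical names, and the bucket is just an in-memory list. The point it shows is that the checkpointed state carries no scan position, so a restored task has no choice but to start the bucket again at the first row.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "rescan from scratch on restart"; not the Fluss API.
final class BucketRescanSketch {

    /** Checkpointed state: identifies the bucket only, deliberately without a scan position. */
    static final class SplitState {
        final int bucketId;

        SplitState(int bucketId) {
            this.bucketId = bucketId;
        }
    }

    /**
     * Reads the bucket from its first row. Throws once it reaches index failAfter,
     * which stands in for a lost scan session; a negative value means "never fail".
     */
    static List<String> scan(SplitState state, List<String> bucketRows, int failAfter) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < bucketRows.size(); i++) {
            if (i == failAfter) {
                throw new IllegalStateException(
                        "scan session lost for bucket " + state.bucketId + " at row " + i);
            }
            out.add(bucketRows.get(i));
        }
        return out;
    }

    /** One attempt that may fail, followed by a restart from the checkpointed state. */
    static List<String> readWithOneRestart(List<String> bucketRows, int failAfter) {
        SplitState checkpointed = new SplitState(0);
        try {
            return scan(checkpointed, bucketRows, failAfter);
        } catch (IllegalStateException sessionLost) {
            // Only the bucket id survives the restart, so the scan begins again at row 0.
            return scan(checkpointed, bucketRows, -1);
        }
    }

    public static void main(String[] args) {
        System.out.println(readWithOneRestart(List.of("a", "b", "c"), 2));
    }
}
```

In this model the rows read before the failure are simply read again by the second attempt, which is the cost of not checkpointing a mid-scan position.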

binary-signal · 2026-05-29T17:11:34Z

+
+- This is a **bounded** read. The source finishes once all buckets have been drained and does not continue reading the change-log.
+- On task restart, each bucket is rescanned from scratch. Progress within a scan session is not checkpointed, because an expired or invalidated server-side session cannot be resumed from a mid-point.
+- The feature is disabled by default (`false`). Without it, unbounded (streaming) reads on primary-key tables work as usual; bounded reads require the data-lake integration to be enabled.


When disabled, unbounded (streaming) reads on primary-key tables continue to work as usual. Bounded reads require data-lake integration unless server-side KV scanning is enabled.

binary-signal · 2026-05-29T17:12:50Z

+SELECT * FROM pk_table;
+```
+
+You can also enable the feature dynamically without storing it in the table metadata:


You can also enable server-side scanning dynamically without storing the option in table metadata:

binary-signal · 2026-05-29T17:14:14Z

+```
+
 ### Limit Read
 The Fluss source supports limiting reads for both primary-key tables and log tables, making it convenient to preview the latest `N` records in a table.


making it easy to preview the latest N records in a table.

binary-signal · 2026-05-29T17:15:30Z

+
+Fluss can perform a bounded full-table scan on a primary-key table directly via the server-side KV scan API.
+
+Enable the feature by setting `client.scanner.kv.server-side.enabled = true` on the table or as a SQL hint:


Enable this feature by setting `client.scanner.kv.server-side.enabled = true` in the table options or by using a SQL hint.

binary-signal · 2026-05-29T17:16:40Z

+
+Enable the feature by setting `client.scanner.kv.server-side.enabled = true` on the table or as a SQL hint:
+
+- This is a **bounded** read. The source finishes once all buckets have been drained and does not continue reading the change-log.


The source finishes after all buckets have been scanned and does not continue consuming the change log.

binary-signal · 2026-05-29T17:28:11Z

 | scan.partition.discovery.interval             | Duration   | 1min                                            | The time interval for the Fluss source to discover the new partitions for partitioned table while scanning. A non-positive value disables the partition discovery. The default value is 1 minute. Currently, since Fluss Admin#listPartitions(TablePath tablePath) requires a large number of requests to ZooKeeper in server, this option cannot be set too small, as a small value would cause frequent requests and increase server load. In the future, once list partitions is optimized, the default value of this parameter can be reduced. |
 | scan.kv.snapshot.lease.id                     | String     | UUID                                            | The lease ID used to protect acquired KV snapshots from deletion. If specified, the snapshots will be retained until either the consumer finishes processing all of them or the lease duration expires. By default, this value is set to a randomly generated UUID string if not explicitly provided.                                                                                                                                                                                                                                              |
 | scan.kv.snapshot.lease.duration               | Duration   | 1day                                            | The time period how long to wait before expiring the kv snapshot lease to avoid kv snapshot blocking to delete.                                                                                                                                                                                                                                                                                                                                                                                                                                    |
+| client.scanner.kv.server-side.enabled         | Boolean    | false                                           | Master switch for using the server-side KV scan (FIP-17) in bounded reads of primary-key tables when no KV snapshot file is available. When false (default), bounded primary-key reads fall back to the prior behavior (log-only when lake is enabled, or fail when lake is disabled). See [Full Scan of Primary Key Tables](engine-flink/reads.md#full-scan-of-primary-key-tables) for details. |


Enables server-side KV scanning (FIP-17) for bounded reads on primary-key tables when no KV snapshot file is available. When disabled (default), bounded reads fall back to the previous behavior: read from the log when data-lake integration is enabled, or fail when it is disabled.
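The fallback rules in this option description can be written down as a small decision function. The sketch below is only a model of the wording above, covering the case where no KV snapshot file is available; `BoundedReadPlanSketch`, `Plan`, and `choose` are hypothetical names and do not correspond to classes in the connector.

```java
// Hypothetical model of the option description; not the connector's code.
final class BoundedReadPlanSketch {

    /** Outcomes for a bounded primary-key read when no KV snapshot file is available. */
    enum Plan {
        SERVER_SIDE_KV_SCAN, // client.scanner.kv.server-side.enabled = true
        LOG_ONLY,            // switch off, data-lake integration enabled
        FAIL                 // switch off, data-lake integration disabled
    }

    /** Applies the rules as stated in the option table, in that order. */
    static Plan choose(boolean serverSideScanEnabled, boolean lakeEnabled) {
        if (serverSideScanEnabled) {
            return Plan.SERVER_SIDE_KV_SCAN;
        }
        return lakeEnabled ? Plan.LOG_ONLY : Plan.FAIL;
    }

    public static void main(String[] args) {
        // Default configuration (switch off) on a table without lake integration.
        System.out.println(choose(false, false));
        // Same table with the master switch turned on.
        System.out.println(choose(true, false));
    }
}
```

Checking the switch first mirrors the description, which presents the lake-dependent outcomes only as the behavior when the option is `false`.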

polyzos added 6 commits May 26, 2026 11:58

[flink] Add KvBatchSplit for server-side KV scan integration

3e4cc49

[flink] Emit KvBatchSplit for bounded PK reads without snapshot

8269b6d

[flink] Wire KvBatchSplit into FlinkSourceSplitReader

d7c4e0a

[flink] Gate bounded KvBatchSplit emission behind master switch

88e7fb2

[flink] redundand code cleanup and test

b48c90a

[flink] add a test with the client

00afd9e

polyzos mentioned this pull request May 26, 2026

[Flink] Extend the Flink connector to support KvScan #3293

Open

2 tasks

polyzos added the component=connector/flink label May 26, 2026

polyzos added this to the v1.0 milestone May 26, 2026

[flink] add documentation

ecdb181

polyzos force-pushed the kvscan-flink-integration branch from ddb782e to ecdb181 May 26, 2026 12:21

polyzos added 3 commits May 26, 2026 15:56

[flink] small improvements and tests

98d8c9f

[flink] add more tests

4f5e693

[flink] fix formatting issue

0de049a

polyzos requested a review from Copilot May 26, 2026 17:27

Copilot started reviewing on behalf of polyzos May 26, 2026 17:27 View session

Copilot AI reviewed May 26, 2026

polyzos requested a review from Copilot May 26, 2026 17:58

Copilot started reviewing on behalf of polyzos May 26, 2026 17:59 View session

Copilot AI reviewed May 26, 2026

polyzos requested a review from Copilot May 27, 2026 04:40

Copilot started reviewing on behalf of polyzos May 27, 2026 04:40 View session

Copilot AI reviewed May 27, 2026

Comment thread website/docs/engine-flink/options.md Outdated

Comment thread ...ink-common/src/main/java/org/apache/fluss/flink/source/enumerator/FlinkSourceEnumerator.java Outdated

[flink] address copilot comments

113bc4e

polyzos requested review from loserwang1024 and wuchong May 27, 2026 06:20

polyzos added 2 commits May 28, 2026 08:15

[flink] change message back to datalake enabled

e3c9db5

[flink] fix broken test

eec665b

binary-signal reviewed May 29, 2026

polyzos wants to merge 13 commits into apache:main from polyzos:kvscan-flink-integration

3 participants

