[FLINK] Support native checkpoint state propagation and fix Gluten task output metrics#12318

Open
zhanglistar wants to merge 5 commits into apache:main from zhanglistar:codex/flink-ckpt-through

@zhanglistar

What changes are proposed in this pull request?

This PR wires native source checkpoint state through Gluten Flink and fixes task-level output metrics for Gluten streaming operators. Depends on bigo-sg/velox#47 and bigo-sg/velox4j#37.

- Add Gluten source checkpoint state persistence through Flink operator ListState.
- Pass real Flink checkpoint IDs into native snapshot/complete/abort paths.
- Restore native checkpoint records during source initialization.
- Add native source metrics fallback for unique TableScan stats when Flink operator ID does not match the Velox plan node ID.
- Fix Gluten one-input and two-input operators to update task-level numRecordsOut based on the actual number of emitted rows.
- Change VectorOutputBridge.collect() to return emitted record count so task metrics reflect row output accurately.

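
For readers unfamiliar with the checkpoint flow, the first three items above can be illustrated with a minimal, dependency-free sketch. It is not the code in this PR: `NativeSourceSketch`, `SourceOperatorSketch`, and the string-encoded offset are hypothetical stand-ins, and a plain `List<byte[]>` plays the role of Flink operator ListState. It only shows the assumed shape of the flow: snapshot under the real checkpoint ID, persist the record, and restore it when the source is initialized.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Demo entry point: simulates a checkpoint followed by a failover. */
final class CheckpointSketchDemo {
    public static void main(String[] args) {
        List<byte[]> state = new ArrayList<>();   // stand-in for operator ListState
        SourceOperatorSketch op = new SourceOperatorSketch(state);
        op.nativeSource.consume(5);
        op.snapshotState(1L);                     // checkpoint ID comes from Flink
        op.nativeSource.complete(1L);
        op.nativeSource.consume(2);               // progress after the checkpoint is lost
        SourceOperatorSketch recovered = new SourceOperatorSketch(new ArrayList<>(state));
        System.out.println(recovered.nativeSource.offset()); // prints 5
    }
}

/** Hypothetical stand-in for the native source; NOT the velox4j API. */
final class NativeSourceSketch {
    private long offset;                                          // records consumed so far
    private final TreeMap<Long, Long> pending = new TreeMap<>();  // checkpoint ID -> offset

    void consume(long records) { offset += records; }

    /** Captures progress under the checkpoint ID supplied by the caller. */
    byte[] snapshot(long checkpointId) {
        pending.put(checkpointId, offset);
        return Long.toString(offset).getBytes(StandardCharsets.UTF_8);
    }

    /** Completing checkpoint N also retires all earlier pending snapshots. */
    void complete(long checkpointId) { pending.headMap(checkpointId, true).clear(); }

    /** Aborting drops only the snapshot taken for that checkpoint. */
    void abort(long checkpointId) { pending.remove(checkpointId); }

    /** Restores progress from the most recent persisted record, if any. */
    void restore(List<byte[]> records) {
        if (!records.isEmpty()) {
            byte[] last = records.get(records.size() - 1);
            offset = Long.parseLong(new String(last, StandardCharsets.UTF_8));
        }
    }

    long offset() { return offset; }
    int pendingSnapshots() { return pending.size(); }
}

/** Operator-side glue (hypothetical): persists native snapshots and restores on init. */
final class SourceOperatorSketch {
    final NativeSourceSketch nativeSource = new NativeSourceSketch();
    private final List<byte[]> listState;

    SourceOperatorSketch(List<byte[]> listState) {
        this.listState = listState;
        nativeSource.restore(listState);          // restore during initialization
    }

    void snapshotState(long checkpointId) {
        listState.clear();                        // keep only the latest record
        listState.add(nativeSource.snapshot(checkpointId));
    }
}
```

In the sketch, the two records consumed after checkpoint 1 are not part of the persisted state, so the recovered source resumes from offset 5.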
Why
Kafka source checkpointing needs native progress snapshots to be persisted by Flink and restored on failover. Also, Kafka -> Gluten calc -> blackhole jobs previously showed vertex-level write-records = 0 even when Gluten operator metrics were non-zero, because task-level output counters were not updated by the Gluten output path.
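
The metrics half of this can be sketched as follows, again as an assumption-laden illustration rather than the PR's code. `CounterSketch` stands in for a Flink counter such as the task-level numRecordsOut, and `OutputBridgeSketch.collect()` mimics the idea of a bridge that returns how many rows it emitted. All class names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Demo entry point: 5 input rows, a filter that keeps 3 of them. */
final class OutputMetricsSketchDemo {
    public static void main(String[] args) {
        FilterOperatorSketch op = new FilterOperatorSketch(row -> row % 2 == 1);
        op.processBatch(Arrays.asList(1, 2, 3, 4, 5));
        System.out.println(op.numRecordsOut.getCount()); // prints 3
    }
}

/** Hypothetical stand-in for a metrics counter (e.g. task-level numRecordsOut). */
final class CounterSketch {
    private long count;
    void inc(long n) { count += n; }
    long getCount() { return count; }
}

/** Hypothetical bridge: forwards a batch downstream and reports the emitted row count. */
final class OutputBridgeSketch<T> {
    private final List<T> downstream = new ArrayList<>();

    int collect(List<T> batch) {
        downstream.addAll(batch);
        return batch.size();   // caller uses this to update task metrics
    }
}

/** Hypothetical one-input operator that filters rows and counts what it actually emits. */
final class FilterOperatorSketch {
    final CounterSketch numRecordsOut = new CounterSketch();
    private final OutputBridgeSketch<Integer> bridge = new OutputBridgeSketch<>();
    private final Predicate<Integer> predicate;

    FilterOperatorSketch(Predicate<Integer> predicate) { this.predicate = predicate; }

    void processBatch(List<Integer> input) {
        List<Integer> kept = new ArrayList<>();
        for (Integer row : input) {
            if (predicate.test(row)) {
                kept.add(row);
            }
        }
        // Count emitted rows, not input rows and not batches.
        numRecordsOut.inc(bridge.collect(kept));
    }
}
```

Counting from the value returned by `collect()` keeps the counter tied to rows that actually left the operator, which is why a filter over 5 rows that keeps 3 reports 3.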

How was this patch tested?

Validation

mvn -pl runtime,ut -am -Dtest=GlutenStreamFilterTest,SourceTaskMetricsTest -DfailIfNoTests=false test
mvn -pl runtime,loader -am -DskipTests -DfailIfNoTests=false package

Ran a local Flink 1.19.2 Kafka smoke test:

- Kafka source produced 5 records.
- Gluten calc emitted 3 records after the filter.
- Flink REST vertex metrics showed write-records = 3.
- Job remained RUNNING.
- Checkpoints completed successfully.

Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored with Codex.

Copilot AI review requested due to automatic review settings June 18, 2026 03:22
@github-actions github-actions Bot added the FLINK label Jun 18, 2026

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

Copilot AI review requested due to automatic review settings June 18, 2026 09:45

@github-actions github-actions Bot added the INFRA label Jun 18, 2026