@fresh-borzoni
Contributor

@fresh-borzoni fresh-borzoni commented Jan 24, 2026

close #179

Purpose

Add a ScanBatch struct, the batch-level equivalent of ScanRecord, which wraps an Arrow RecordBatch with bucket and offset metadata.

  • Add ScanBatch struct to record/mod.rs with bucket, batch, and base_offset fields
  • Add fetch_batches_with_offsets() to CompletedFetch trait to return batches with their offsets
  • Update poll_batches() chain to return Vec<ScanBatch> instead of Vec<RecordBatch>
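
As a rough illustration of the shape described in the list above, the sketch below models ScanBatch with stand-in types. `TableBucket` and `Batch` are simplified placeholders (the real struct wraps an Arrow RecordBatch), and the field types, the `new` constructor, and the exact `last_offset()` semantics are assumptions rather than the crate's actual definitions.

```rust
/// Stand-in for the bucket identifier (assumption: the real type may carry more fields).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct TableBucket {
    pub table_id: i64,
    pub bucket_id: i32,
}

/// Stand-in for an Arrow `RecordBatch`; only the row count matters for this sketch.
#[derive(Debug, Clone)]
pub struct Batch {
    pub num_rows: usize,
}

/// Hypothetical sketch: a batch together with its bucket and the offset of its first row.
#[derive(Debug, Clone)]
pub struct ScanBatch {
    pub bucket: TableBucket,
    pub batch: Batch,
    pub base_offset: i64,
}

impl ScanBatch {
    pub fn new(bucket: TableBucket, batch: Batch, base_offset: i64) -> Self {
        Self { bucket, batch, base_offset }
    }

    /// Offset of the last row, assuming rows occupy consecutive offsets
    /// starting at `base_offset`. An empty batch yields `base_offset - 1`.
    pub fn last_offset(&self) -> i64 {
        self.base_offset + self.batch.num_rows as i64 - 1
    }
}
```

Keeping `base_offset` next to the batch means a consumer can derive the offset of any row by position without a separate lookup.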

This enables downstream consumers to track per-bucket progress and determine stopping conditions when using the batch API.
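
One way a consumer might use this metadata is sketched below: it records the next offset to read per bucket and reports when every bucket has reached a target end offset. `BatchMeta`, `Progress`, and the exclusive `stop_at` offsets are hypothetical names invented for this example, and the sketch assumes offsets start at 0 and are consecutive within a batch.

```rust
use std::collections::HashMap;

/// Minimal stand-in for per-batch metadata (assumption: bucket id, base offset, row count).
#[derive(Debug, Clone, Copy)]
pub struct BatchMeta {
    pub bucket_id: i32,
    pub base_offset: i64,
    pub num_rows: usize,
}

/// Tracks the next offset to read per bucket against a hypothetical stopping offset.
pub struct Progress {
    next_offset: HashMap<i32, i64>,
    /// Exclusive end offset per bucket; reading stops once it is reached.
    stop_at: HashMap<i32, i64>,
}

impl Progress {
    pub fn new(stop_at: HashMap<i32, i64>) -> Self {
        Self { next_offset: HashMap::new(), stop_at }
    }

    /// Record one polled batch by advancing the bucket's next offset.
    pub fn observe(&mut self, meta: BatchMeta) {
        let next = meta.base_offset + meta.num_rows as i64;
        let entry = self.next_offset.entry(meta.bucket_id).or_insert(0);
        *entry = (*entry).max(next);
    }

    /// True once every bucket with a stopping offset has reached it.
    /// Buckets not yet observed are treated as being at offset 0.
    pub fn is_done(&self) -> bool {
        self.stop_at.iter().all(|(bucket, end)| {
            self.next_offset.get(bucket).copied().unwrap_or(0) >= *end
        })
    }
}
```

A polling loop would call `observe` for each returned batch and exit when `is_done()` returns true.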

API and Format

No breaking changes. The poll_batches() return type changes from Vec<RecordBatch> to Vec<ScanBatch> (internal API).

@fresh-borzoni fresh-borzoni changed the title from "[TASK-179] Add more infor to RecordBatches" to "[TASK-179] Add more info for RecordBatches" Jan 24, 2026
@fresh-borzoni fresh-borzoni force-pushed the enhance-poll-batches-with-more-info branch from 4f3dc10 to 2e3b018 January 24, 2026 23:19
@fresh-borzoni
Contributor Author

@luoyuxia PTAL 🙏

Copilot AI left a comment

Pull request overview

This PR adds batch-level metadata support to the log scanning API by introducing a ScanBatch struct that wraps Arrow RecordBatch with bucket and offset information. This enables downstream consumers (like the DuckDB plugin mentioned in issue #179) to track per-bucket progress and determine stopping conditions when using the batch API.

Changes:

  • Added ScanBatch struct to wrap RecordBatch with bucket and base offset metadata
  • Updated RecordBatchLogScanner::poll() to return Vec<ScanBatch> instead of Vec<RecordBatch>
  • Modified CompletedFetch::fetch_batches() to return offset information alongside batches

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| crates/fluss/src/record/mod.rs | Introduces ScanBatch struct with bucket, batch, and base_offset fields, plus helper methods like last_offset() and comprehensive unit tests |
| crates/fluss/src/client/table/scanner.rs | Updates poll_batches() and related methods to return Vec<ScanBatch>, constructing batches with bucket metadata in fetch_batches_from_fetch() |
| crates/fluss/src/client/table/log_fetch_buffer.rs | Modifies CompletedFetch::fetch_batches() trait method and next_fetched_batch() to return tuples of (RecordBatch, base_offset) with correct offset tracking for sliced batches |
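
To make the note about offset tracking for sliced batches concrete, here is a minimal sketch of the arithmetic involved. It is not the code from log_fetch_buffer.rs: `slice_for_position` is a hypothetical helper, and it assumes a batch covers the consecutive offsets from `base_offset` up to, but not including, `base_offset + num_rows`. When the reader's position falls inside a batch, the leading rows are skipped and the reported base offset moves forward by the same amount.

```rust
/// Hypothetical helper: for a batch covering [base_offset, base_offset + num_rows),
/// return (rows_to_skip, new_base_offset, remaining_rows) for a reader at `position`.
/// Returns None when the whole batch lies before `position`.
pub fn slice_for_position(
    base_offset: i64,
    num_rows: usize,
    position: i64,
) -> Option<(usize, i64, usize)> {
    let end = base_offset + num_rows as i64;
    if position >= end {
        // Every row in this batch has already been consumed.
        return None;
    }
    // Rows before `position` are dropped; a position at or before the batch start skips none.
    let skip = (position - base_offset).max(0) as usize;
    Some((skip, base_offset + skip as i64, num_rows - skip))
}
```

With a real Arrow batch, the skip and remaining counts would presumably feed a zero-copy slice, while the new base offset would be the value returned alongside it.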

Contributor

@luoyuxia luoyuxia left a comment

@fresh-borzoni Thanks for the PR. LGTM! But CI fails.

@fresh-borzoni fresh-borzoni force-pushed the enhance-poll-batches-with-more-info branch from bf1155b to 7951287 January 25, 2026 06:31
@fresh-borzoni
Contributor Author

@luoyuxia Thanks for the review. Fixed CI/CD

@luoyuxia luoyuxia merged commit 2b5fc64 into apache:main Jan 25, 2026
13 checks passed
Development

Successfully merging this pull request may close these issues.

Allow to return more info in poll batches method