Add V2 batch format with statistics collection #2886

Open
platinumhamburg wants to merge 1 commit into apache:main from platinumhamburg:filter-v2-batch-format

@platinumhamburg
Contributor

Introduce a V2 batch format that collects min/max statistics for each column to enable efficient filter pushdown.

  • Add LogRecordBatchStatistics and related classes for statistics collection
  • Add StatisticsConfigUtils for parsing table.statistics.columns configuration
  • Extend DefaultLogRecordBatch to support V2 format with statistics
  • Place statistics data between header and records with StatisticsLength field
  • Add comprehensive tests for statistics collection and parsing
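
To make the layout described in the list above concrete, here is a minimal sketch of a batch that carries min/max statistics between its header and its records. It is not the code from this PR: the class `BatchStatsSketch`, its methods, the single int column, and the choice to write the statistics length as a 4-byte int directly after the header are all assumptions made for illustration. The real `LogRecordBatchStatistics` and `DefaultLogRecordBatch` may be organized quite differently.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch; names and byte layout are illustrative, not taken from the PR. */
public class BatchStatsSketch {

    /** Tracks min/max for a single int column as records are appended. */
    static final class IntColumnStats {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;

        void update(int value) {
            min = Math.min(min, value);
            max = Math.max(max, value);
        }

        /** False means no row in the batch can equal target, so the batch can be skipped. */
        boolean mayContain(int target) {
            return min <= target && target <= max;
        }
    }

    /** Assumed layout: [header][statsLength:int][min:int][max:int][records]. */
    static ByteBuffer encode(byte[] header, IntColumnStats stats, byte[] records) {
        int statsLength = 2 * Integer.BYTES;
        ByteBuffer buf = ByteBuffer.allocate(
                header.length + Integer.BYTES + statsLength + records.length);
        buf.put(header);
        buf.putInt(statsLength);
        buf.putInt(stats.min);
        buf.putInt(stats.max);
        buf.put(records);
        buf.flip();
        return buf;
    }

    /** Reads only the statistics, without touching the record bytes. */
    static IntColumnStats decodeStats(ByteBuffer batch, int headerLength) {
        ByteBuffer view = batch.duplicate();
        view.position(headerLength);
        int statsLength = view.getInt();
        IntColumnStats stats = new IntColumnStats();
        if (statsLength >= 2 * Integer.BYTES) {
            stats.min = view.getInt();
            stats.max = view.getInt();
        }
        return stats;
    }

    /** The length field lets a reader jump straight to the records. */
    static int recordsOffset(ByteBuffer batch, int headerLength) {
        return headerLength + Integer.BYTES + batch.getInt(headerLength);
    }

    public static void main(String[] args) {
        IntColumnStats stats = new IntColumnStats();
        for (int v : new int[] {42, 7, 19}) {
            stats.update(v);
        }
        byte[] header = new byte[4];
        ByteBuffer batch = encode(header, stats, new byte[] {1, 2, 3});
        IntColumnStats read = decodeStats(batch, header.length);
        System.out.println("min=" + read.min + " max=" + read.max);
        System.out.println("may contain 100? " + read.mayContain(100));
    }
}
```

In this sketch a reader evaluating a predicate such as `col = 100` only needs the header and the statistics block to decide whether the batch can be skipped, and the length field lets a reader that ignores statistics seek past them.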

Purpose

Linked issue: close #2885

Brief change log

Tests

API and Format

Documentation

@platinumhamburg platinumhamburg force-pushed the filter-v2-batch-format branch 4 times, most recently from dde562c to af26717 Compare March 17, 2026 04:13
@platinumhamburg platinumhamburg force-pushed the filter-v2-batch-format branch from af26717 to 836948c Compare March 17, 2026 04:38

Development

Successfully merging this pull request may close these issues.

[common] Add V2 batch format with statistics collection for filter pushdown
