Add RecordBatch::ensure_schema for schema evolution#9798

Open
huymq1710 wants to merge 1 commit intoapache:mainfrom
huymq1710:map_record_batch

@huymq1710

Which issue does this PR close?

Rationale for this change

When schema evolution happens, a table's schema may contain columns that do not appear in individual RecordBatches (e.g. older Parquet files missing columns added later). Query engines like DataFusion and Iceberg need to expand old batches with null values to align them with the merged schema.

What changes are included in this PR?

Add RecordBatch::ensure_schema(target_schema), a stateless method that adapts a batch to a target schema.

It follows the InfluxDB approach:
https://github.com/influxdata/influxdb3_core/blob/0f5ecbd6b17f83f7ad4ba55699fc2cd3e151cf94/arrow_util/src/util.rs#L28-L47

A SchemaMapper for repeated application (with a pre-computed mapping) will follow in a separate PR.
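
For readers unfamiliar with the idea, the adaptation described above can be sketched with a toy model that uses only the Rust standard library. This is not the code in this PR and not the arrow-array API: `ToyBatch`, `Column`, and `ensure_schema_sketch` are hypothetical names, columns are limited to nullable `i64`, and the choice to drop batch columns that are absent from the target is an assumption made for the illustration.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an Arrow column: nullable 64-bit integers.
pub type Column = Vec<Option<i64>>;

/// Hypothetical stand-in for a RecordBatch: ordered field names plus
/// one column per field, all with `num_rows` entries.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyBatch {
    pub fields: Vec<String>,
    pub columns: Vec<Column>,
    pub num_rows: usize,
}

/// Align `batch` to `target_fields` by name.
///
/// - A target field present in the batch reuses the existing column.
/// - A target field missing from the batch becomes an all-null column.
/// - Batch columns not named in the target are dropped (assumption).
pub fn ensure_schema_sketch(batch: &ToyBatch, target_fields: &[&str]) -> ToyBatch {
    // Index the source columns by field name for lookup.
    let by_name: HashMap<&str, &Column> = batch
        .fields
        .iter()
        .map(String::as_str)
        .zip(batch.columns.iter())
        .collect();

    // Build the output columns in target order.
    let columns: Vec<Column> = target_fields
        .iter()
        .map(|name| match by_name.get(name) {
            Some(col) => (*col).clone(),
            None => vec![None; batch.num_rows],
        })
        .collect();

    ToyBatch {
        fields: target_fields.iter().map(|s| s.to_string()).collect(),
        columns,
        num_rows: batch.num_rows,
    }
}
```

The function keeps no state between calls, so the name lookup is rebuilt for every batch; a pre-computed mapping such as the SchemaMapper mentioned above would avoid that repeated work.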

Are these changes tested?

Are there any user-facing changes?

New public method RecordBatch::ensure_schema on arrow-array

@huymq1710 huymq1710 changed the title Add RecordBatch::ensure_schema for schema evolutio Add RecordBatch::ensure_schema for schema evolution Apr 23, 2026
@github-actions github-actions Bot added the arrow Changes to the arrow crate label Apr 23, 2026

Development

Successfully merging this pull request may close these issues.

Add a way to map RecordBatch schema from one to another

1 participant