refactor: simplify dynamic state for Avro record projection #9419

Open
mzabaluev wants to merge 2 commits into apache:main from mzabaluev:optimize-projection

@mzabaluev
Contributor

Rationale for this change

The inner loop in Projector::project_record gives the optimizer fairly complicated dynamic data to branch on.
The sparse arrays in Projector are redundantly encoded: a None at an index in writer_to_reader must correspond to a Some at the same index in skip_decoders, and vice versa.
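
To make the redundancy concrete, here is a minimal sketch of state shaped like the parallel sparse arrays described above. It is not the actual arrow-avro code: the field names writer_to_reader and skip_decoders come from the description, while `ParallelState`, the unit-struct `Skipper`, the element types, and the string "actions" are assumptions made for illustration.

```rust
/// Hypothetical stand-in for a decoder that consumes and discards one writer field.
#[derive(Debug, Clone, PartialEq)]
struct Skipper;

/// Simplified sketch of the parallel sparse arrays, both indexed by writer
/// field position. The element types are assumptions.
struct ParallelState {
    writer_to_reader: Vec<Option<usize>>,
    skip_decoders: Vec<Option<Skipper>>,
}

impl ParallelState {
    /// Walks both arrays in lockstep and describes what happens to each
    /// writer field. Two of the four (Option, Option) combinations are
    /// invalid, but the types cannot rule them out, so the loop has to
    /// branch on them anyway.
    fn project(&self) -> Result<Vec<String>, String> {
        let mut actions = Vec::new();
        let pairs = self.writer_to_reader.iter().zip(&self.skip_decoders);
        for (i, (target, skip)) in pairs.enumerate() {
            match (target, skip) {
                (Some(r), None) => {
                    actions.push(format!("decode writer field {i} into reader field {r}"))
                }
                (None, Some(_)) => actions.push(format!("skip writer field {i}")),
                // Both Some or both None: the arrays disagree.
                _ => return Err(format!("inconsistent state for writer field {i}")),
            }
        }
        Ok(actions)
    }
}
```

In this sketch, a state with `Some` in both arrays at the same index compiles fine and is only caught at run time, which is the kind of invariant that has to be maintained by hand.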

What changes are included in this PR?

Refactor the record projection state into a single array of directive-like enums, one for each writer schema field.
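
A rough sketch of what a single array of directive-like enums could look like follows. The names `FieldDirective`, `Project`, `Skip`, and the unit-struct `Skipper` are hypothetical and chosen for illustration; the actual types in the PR may differ.

```rust
/// Hypothetical stand-in for a decoder that consumes and discards one writer field.
#[derive(Debug, Clone, PartialEq)]
struct Skipper;

/// Hypothetical per-field directive: exactly one entry per writer schema field.
#[derive(Debug, Clone, PartialEq)]
enum FieldDirective {
    /// Decode the writer field into the reader field at this index.
    Project(usize),
    /// Consume the writer field's bytes and discard them.
    Skip(Skipper),
}

/// Walks the directives in writer order and describes the action taken for
/// each field. There is one match per field and no invalid combination to
/// handle, because each slot holds exactly one variant.
fn project(directives: &[FieldDirective]) -> Vec<String> {
    directives
        .iter()
        .enumerate()
        .map(|(i, directive)| match directive {
            FieldDirective::Project(r) => {
                format!("decode writer field {i} into reader field {r}")
            }
            FieldDirective::Skip(_) => format!("skip writer field {i}"),
        })
        .collect()
}
```

With this shape, "either a mapping or a skipper" is enforced by the type rather than by keeping two arrays in sync.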

Are these changes tested?

Added a benchmark for record projection (the benchmark code is partially shared with #9397).
Somewhat counterintuitively to me, it shows no improvement on the more complex case with a mix of projected fields, but it does improve the simpler one-field projection cases.

Passes the existing tests.

@github-actions github-actions bot added the arrow (Changes to the arrow crate) and arrow-avro (arrow-avro crate) labels Feb 16, 2026
Instead of parallel sparse arrays that need to be zipped for
record projection, have a vector of enums that can either give a
writer-to-reader mapping or a skipper.
Contributor

@jecsand838 jecsand838 left a comment

@mzabaluev LGTM! Love the cleanup and these performance improvements.

Thank you so much for getting this up!

@alamb If possible this would be another good one to get in before release! 🤞
