fix: support system columns in dataset.take* operations #5722
Signed-off-by: Daniel Rammer <hamersaw@protonmail.com>
ACTION NEEDED: The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error, please inspect the "PR Title Check" action.
Xuanwo
left a comment
Thank you @hamersaw for working on this! Only have a question.
Ok(batch)
let row_addr_col: ArrayRef = Arc::new(UInt64Array::from(row_addrs));
if projection.must_add_row_offset {
I think there is an ordering problem here: if the user requests, for example, dataset.take(columns=["a", "_rowoffset", "b"]), then it will append the offset column at the end, making it dataset.take(columns=["a", "b", "_rowoffset"]).
I think instead of stripping it from requested_output_expr, we can keep it and inject the system column into the batch before calling project_batch, so that the projection will handle the order of the columns.
Thanks! This worked perfectly - just needed to inject ROW_OFFSET into the physical_projection schema the same way this is done in the scanner. I ran a ton of tests across all permutations and orderings of system / non-system columns and I feel quite confident.
Yeah, the project_batch method at the bottom will correct any misordering.
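The ordering issue raised in this thread can be illustrated with a small standalone sketch. It does not use the real Lance code: a "batch" here is a plain dict of column name to values, and `project_batch`, `take_append_last`, and `take_inject_first` are hypothetical stand-ins named after the ideas discussed above.

```python
# Hypothetical stand-ins: a "batch" is a dict of column name -> list of values.
ROW_OFFSET = "_rowoffset"


def project_batch(batch, requested):
    """Select columns in exactly the requested order."""
    return {name: batch[name] for name in requested}


def take_append_last(batch, offsets, requested):
    """Strip the system column, project, then append it at the end."""
    data_cols = [c for c in requested if c != ROW_OFFSET]
    out = project_batch(batch, data_cols)
    if ROW_OFFSET in requested:
        out[ROW_OFFSET] = offsets
    return out


def take_inject_first(batch, offsets, requested):
    """Inject the system column into the batch, then let projection order it."""
    widened = dict(batch)
    if ROW_OFFSET in requested:
        widened[ROW_OFFSET] = offsets
    return project_batch(widened, requested)


batch = {"a": [10, 20], "b": ["x", "y"]}
offsets = [0, 1]
requested = ["a", "_rowoffset", "b"]

# Column order produced by each strategy (dicts preserve insertion order).
appended = list(take_append_last(batch, offsets, requested))
injected = list(take_inject_first(batch, offsets, requested))
```

In this sketch the append-last strategy moves `_rowoffset` to the end, while injecting before projection keeps the caller's order, which is the behavior the review asked for.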
Looks like there are quite a few CI failures, could you fix those?
westonpace
left a comment
This looks good but can you add a few tests? Preferably python tests.
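One way such tests could be organized is to enumerate every ordering of a mixed system / non-system column list and check that the output keeps the requested order. The sketch below is only the shape of that check: `fake_take` is a hypothetical stand-in for the dataset's take call, not the real API, and it assumes row ids equal row offsets (as for a single fragment with no deletions).

```python
from itertools import permutations

SYSTEM_COLUMNS = ["_rowid", "_rowoffset"]
DATA_COLUMNS = ["a", "b"]


def fake_take(rows, indices, columns):
    """Hypothetical stand-in for a take call: {column: values} in request order."""
    out = {}
    for name in columns:
        if name in ("_rowid", "_rowoffset"):
            # Assumption for this sketch: row ids equal offsets.
            out[name] = list(indices)
        else:
            out[name] = [rows[i][name] for i in indices]
    return out


def check_all_orderings(rows, indices):
    """Assert output order matches the request for every permutation; return the count."""
    checked = 0
    for order in permutations(SYSTEM_COLUMNS + DATA_COLUMNS):
        result = fake_take(rows, indices, list(order))
        assert list(result) == list(order)
        checked += 1
    return checked


rows = [{"a": i, "b": str(i)} for i in range(5)]
num_checked = check_all_orderings(rows, [3, 1])
```

A real test would replace `fake_take` with the actual take operation on a small dataset and could be parametrized over the same permutations.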
Previously, "take*" operations did not support _rowid, _rowoffset, _row_created_at_version, and _row_last_updated_at_version. In this PR we add support for all of these columns.

We preserve these system columns through the initial schema projection so that they can be used to populate the correct flags when building the ProjectionPlan and PhysicalProjection structs.

- _rowid / _rowaddr: persisting these through to ProjectionPlan fields was enough to make them work.
- _rowoffset: required additionally (1) stripping the ROW_OFFSET field from ProjectionPlan requested_output_expr and (2) manually injecting the column using AddRowOffsetExec (after exposing some methods publicly).
- _row_created_at_version / _row_last_updated_at_version: required piping through flags to Fragment readers.

Closes #5615.
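The idea of keeping system columns in the requested list so they can drive flags can be sketched as follows. This is an illustration under assumptions, not the PR's code: `TakeFlags` and `plan_take` are hypothetical, and apart from `must_add_row_offset` (which appears in the diff excerpt) the flag names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class TakeFlags:
    # Only must_add_row_offset is taken from the diff; the rest are hypothetical names.
    with_row_id: bool = False
    with_row_addr: bool = False
    must_add_row_offset: bool = False
    with_row_created_at_version: bool = False
    with_row_last_updated_at_version: bool = False


# Map each system column name to the flag it would switch on.
SYSTEM_COLUMN_FLAGS = {
    "_rowid": "with_row_id",
    "_rowaddr": "with_row_addr",
    "_rowoffset": "must_add_row_offset",
    "_row_created_at_version": "with_row_created_at_version",
    "_row_last_updated_at_version": "with_row_last_updated_at_version",
}


def plan_take(columns):
    """Split requested columns into data columns and system-column flags."""
    flags = TakeFlags()
    data_columns = []
    for name in columns:
        flag = SYSTEM_COLUMN_FLAGS.get(name)
        if flag is None:
            data_columns.append(name)
        else:
            setattr(flags, flag, True)
    return data_columns, flags


data_columns, flags = plan_take(["a", "_rowoffset", "_row_created_at_version"])
```

Here the data columns are what would be read from storage, while the flags record which system columns need to be produced by the readers or injected afterwards.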