Skip to content

Conversation

@hamersaw
Copy link

@hamersaw hamersaw commented Jan 15, 2026

Previously, "take*" operations did not support _rowid, _rowoffset, _row_created_at_version, and _row_last_updated_at_version. In this PR we add support for all of these columns.

We preserve these system columns through the initial schema projection so that they can be used to populate the correct flags when building the ProjectionPlan and PhysicalProjection structs.

  • _rowid / _rowaddr: persisting these through to ProjectionPlan fields was enough to make them work
  • _rowoffset: required additionally (1) stripping ROW_OFFSET field from ProjectionPlan requested_output_expr and (2) manually injecting column using AddRowOffsetExec (after exposing some methods publicly)
  • _row_created_at_version / _row_last_updated_at_version: required piping through flags to Fragment readers.

Closes #5615.

Signed-off-by: Daniel Rammer <hamersaw@protonmail.com>
Signed-off-by: Daniel Rammer <hamersaw@protonmail.com>
Signed-off-by: Daniel Rammer <hamersaw@protonmail.com>
@github-actions
Copy link
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@hamersaw hamersaw changed the title bug: support system columns in take_scan fix: support system columns in take_scan Jan 15, 2026
@github-actions github-actions bot added the bug Something isn't working label Jan 15, 2026
Signed-off-by: Daniel Rammer <hamersaw@protonmail.com>
@codecov
Copy link

codecov bot commented Jan 15, 2026

Codecov Report

❌ Patch coverage is 85.10638% with 7 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance-core/src/datatypes/schema.rs 84.09% 5 Missing and 2 partials ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Collaborator

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @hamersaw for working on this! Only have a question.

Signed-off-by: Daniel Rammer <hamersaw@protonmail.com>
@hamersaw hamersaw marked this pull request as draft January 19, 2026 15:34
Signed-off-by: Daniel Rammer <hamersaw@protonmail.com>
Signed-off-by: Daniel Rammer <hamersaw@protonmail.com>
@hamersaw hamersaw changed the title fix: support system columns in take_scan fix: support system columns in dataset::take* operations Jan 20, 2026
Signed-off-by: Daniel Rammer <hamersaw@protonmail.com>
@hamersaw hamersaw changed the title fix: support system columns in dataset::take* operations fix: support system columns in dataset.take* operations Jan 20, 2026
Signed-off-by: Daniel Rammer <hamersaw@protonmail.com>
Signed-off-by: Daniel Rammer <hamersaw@protonmail.com>
@hamersaw hamersaw marked this pull request as ready for review January 20, 2026 13:43
Ok(batch)
let row_addr_col: ArrayRef = Arc::new(UInt64Array::from(row_addrs));

if projection.must_add_row_offset {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is an ordering problem here, if usr requests for example dataset.take(columns=["a, "_rowoffset", "b"], then it will append the offset column in the end making it dataset.take(columns=["a, "b", "_rowoffset"].

I think instead of stripping it from requested_output_expr, we can keep it, inject the system column into the batch before calling project_batch, so that projection will handle the order of the columns.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This worked perfectly - just needed to inject ROW_OFFSET into the physical_projection schema the same way this is done in the scanner. Did a ton of tests all permutations and orderings of system / non-system columns and I feel quite confident.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the project_batch method at the bottom will correct any misordering

Signed-off-by: Daniel Rammer <hamersaw@protonmail.com>
@hamersaw hamersaw requested review from Xuanwo and jackye1995 January 21, 2026 21:30
@jackye1995
Copy link
Contributor

looks like there are quite a few CI failures, could you fix those?

Copy link
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good but can you add a few tests? Preferably python tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: take_scan can take "_rowid" as meta column

4 participants