MINOR: Preserve empty list offsets during split transfers #31

Draft
telemenar wants to merge 7 commits into dremio:main from telemenar:codex/split-transfer-empty-list-offset

@telemenar

What changed

This is stacked on top of #30 and extends the same empty-offset-buffer handling into zero-length splitAndTransfer() paths for ListVector and LargeListVector.

For zero-length splits, the target vector now materializes the required empty offset entry and recursively invokes the child transfer pair with a zero-length range so nested list vectors get the same invariant treatment.
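
As a rough illustration of the zero-length behavior described above, the following sketch models each nesting level of a list vector as a plain `int[]` of offsets. It is not the Arrow implementation: `SplitOffsetsSketch` and `splitNested` are hypothetical names, and the array-per-level model is an assumption made only to keep the example self-contained.

```java
import java.util.Arrays;

// Hypothetical model, not the Arrow API: levels[d] holds the offsets of
// nesting depth d, with levels[0] being the outermost list.
final class SplitOffsetsSketch {

    /**
     * Splits entries [start, start + length) of the outermost list and
     * carries the matching child range down through every nested level.
     */
    static int[][] splitNested(int[][] levels, int start, int length) {
        int[][] out = new int[levels.length][];
        for (int depth = 0; depth < levels.length; depth++) {
            if (length == 0) {
                // Zero-length split: still materialize the single 0 offset,
                // and keep descending so nested levels get the same treatment.
                // The source array is never read, so it may be empty.
                out[depth] = new int[] {0};
                start = 0;
                continue;
            }
            int[] src = levels[depth];
            int base = src[start];
            int[] dst = new int[length + 1];
            for (int i = 0; i <= length; i++) {
                dst[i] = src[start + i] - base; // rebase so the target starts at 0
            }
            out[depth] = dst;
            // The child range covered by this split becomes the next level's range.
            length = src[start + length] - base;
            start = base;
        }
        return out;
    }

    public static void main(String[] args) {
        // Both levels start with zero-capacity offsets, as in the invalid state.
        int[][] repaired = splitNested(new int[][] {{}, {}}, 0, 0);
        System.out.println(Arrays.deepToString(repaired));
    }
}
```

In this model a zero-length split never returns an empty offsets array at any depth, which is the property the change aims to guarantee for the real vectors.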

Why

splitAndTransfer() should always return a valid allocated vector. A vector with no entries still needs, by spec, a value of 0 in its offsetBuffer; a list vector with a zero-capacity offsetBuffer is not valid.
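
The invariant can be expressed as a small check. This is a sketch under stated assumptions rather than Arrow's own validation code: `OffsetInvariantSketch`, `hasValidOffsets`, and `isValidEmpty` are hypothetical helpers, a `java.nio.ByteBuffer` stands in for the offset buffer, and 32-bit offsets are assumed (as for `ListVector`; `LargeListVector` uses 64-bit offsets).

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Arrow: models the rule that a list vector
// with valueCount entries needs valueCount + 1 offsets in its offset buffer.
final class OffsetInvariantSketch {
    static final int OFFSET_WIDTH = Integer.BYTES; // assumed 32-bit offsets

    /** True if the buffer holds valueCount + 1 non-negative, non-decreasing offsets. */
    static boolean hasValidOffsets(ByteBuffer offsets, int valueCount) {
        if (valueCount < 0) {
            return false;
        }
        long requiredBytes = ((long) valueCount + 1) * OFFSET_WIDTH;
        if (offsets.capacity() < requiredBytes) {
            return false; // a zero-capacity buffer fails here even for an empty vector
        }
        int previous = offsets.getInt(0);
        if (previous < 0) {
            return false;
        }
        for (int i = 1; i <= valueCount; i++) {
            int current = offsets.getInt(i * OFFSET_WIDTH);
            if (current < previous) {
                return false;
            }
            previous = current;
        }
        return true;
    }

    /** True if an empty vector's offset buffer holds the single required 0 entry. */
    static boolean isValidEmpty(ByteBuffer offsets) {
        return hasValidOffsets(offsets, 0) && offsets.getInt(0) == 0;
    }
}
```

Under this model, `isValidEmpty(ByteBuffer.allocate(0))` is false while `isValidEmpty(ByteBuffer.allocate(4))` is true, which mirrors the difference between the invalid state and the repaired one.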

#30 is still useful as serialization/export hardening, but this PR moves the repair closer to where the invalid empty-vector state can be introduced.

One grey area remains: getFieldBuffers() triggering allocation for an otherwise unallocated vector is useful as a last-line guard, but it is not the cleanest owner of the allocation invariant.

Testing

mvn -pl vector -Dmaven.gitcommitid.skip=true -Dsurefire.failIfNoSpecifiedTests=false -Dtest=TestListVector,TestLargeListVector test
mvn -pl vector spotless:apply

Targeted tests passed under both Netty and Unsafe allocator executions.

prashanthbdremio and others added 7 commits June 22, 2026 18:07
splitAndTransfer should always return a valid allocated vector. A vector with no entries still needs, by spec, a value of 0 in its offsetBuffer. A list vector with a zero-capacity offsetBuffer is therefore not valid.

This moves the empty-offset repair closer to where the invalid state is introduced by ensuring zero-length ListVector and LargeListVector split transfers materialize the required offset entry. Nested zero-length list transfers get the same treatment through the child transfer pair.

One grey area remains: getFieldBuffers() triggering allocation for an otherwise unallocated vector is useful as a last-line guard for serialization/export, but it is not the cleanest owner of the allocation invariant.
telemenar changed the title from "Preserve empty list offsets during split transfers" to "MINOR: Preserve empty list offsets during split transfers" on Jun 24, 2026
telemenar force-pushed the codex/split-transfer-empty-list-offset branch from 4c70b45 to 0ab8ab1 on June 24, 2026 17:40
2 participants