branch-4.1: [fix](column) avoid mutable nullable crc32c hashing #64944#65088
Open
github-actions[bot] wants to merge 1 commit into
Open
branch-4.1: [fix](column) avoid mutable nullable crc32c hashing #64944#65088github-actions[bot] wants to merge 1 commit into
github-actions[bot] wants to merge 1 commit into
Conversation
### What problem does this PR solve?
`ColumnNullable::update_crc32c_batch()` normalized nullable fixed-width
nested columns by mutating and replacing `_nested_column` in a logically
const hash path. If another path reads the same nullable column at the
same time, it can keep reading the old nested column or its raw data
after the hash path releases it.
Root cause: the CRC32C nullable hash path used mutable source-column
access to replace NULL payloads with nested default values before
hashing. That preserved hash semantics, but it violated the expectation
that `update_crc32c_batch()` only reads the source column.
A focused BEUT can reproduce the problem before this fix:
```text
==2227142==ERROR: AddressSanitizer: heap-use-after-free
READ of size 4
#0 doris::ColumnVector<(doris::PrimitiveType)5>::insert_indices_from(...)
be/ut_build_ASAN/../src/core/column/column_vector.cpp:369:21
#1 doris::ColumnVector<(doris::PrimitiveType)5>::insert_indices_from(...)
be/ut_build_ASAN/../src/core/column/column_vector.cpp:373:5
#2 doris::ColumnNullable::insert_indices_from(...)
be/ut_build_ASAN/../src/core/column/column_nullable.cpp:378:25
freed by thread T18 here:
#12 doris::ColumnNullable::update_crc32c_batch(unsigned int*, unsigned char const*) const
be/ut_build_ASAN/../src/core/column/column_nullable.cpp:194:86
SUMMARY: AddressSanitizer: heap-use-after-free
```
This PR keeps the nullable CRC32C hash result unchanged without mutating
the source nested column. For replaceable fixed-width nested columns,
NULL rows are hashed as the nested type's default payload through
`update_crc32c_batch_default_on_null()`, avoiding both source mutation
and per-block cloned-column materialization.
A similar mutable-source issue exists in
`ColumnNullable::filter_by_selector()`: the selector API is currently
non-const even though it conceptually reads the source column and writes
only to the destination column. That path is intentionally left
unchanged in this PR and will be handled separately, because fixing it
cleanly requires changing the older selector API contract and
dictionary-column scratch-buffer behavior.
### Release note
None
Contributor
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
Contributor
|
run buildall |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Cherry-picked from #64944