branch-4.1: [fix](column) avoid mutable nullable crc32c hashing #64944 #65088

Open
github-actions[bot] wants to merge 1 commit into branch-4.1 from auto-pick-64944-branch-4.1
github-actions Bot commented on Jul 1, 2026

Contributor

Cherry-picked from #64944

### What problem does this PR solve?


`ColumnNullable::update_crc32c_batch()` normalized nullable fixed-width
nested columns by mutating and replacing `_nested_column` in a logically
const hash path. If another path reads the same nullable column at the
same time, it can keep reading the old nested column or its raw data
after the hash path releases it.

Root cause: the CRC32C nullable hash path used mutable source-column
access to replace NULL payloads with nested default values before
hashing. That preserved hash semantics, but it violated the expectation
that `update_crc32c_batch()` only reads the source column.
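The hazard described above can be illustrated with a toy model. This is only a sketch, not the Doris implementation: `ToyNullable`, `hash_with_normalization()`, and `old_buffer_released_by_hash()` are hypothetical names, and a `std::shared_ptr` stands in for the real column ownership. It shows how a `const` method that swaps its nested buffer destroys the old buffer while a reader may still hold a raw pointer into it.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a nullable column that owns a nested buffer.
struct ToyNullable {
    // `mutable` lets a const method swap the buffer, mirroring a
    // "logically const" path that still mutates the source.
    mutable std::shared_ptr<std::vector<int>> nested;

    // Looks read-only to callers, but replaces the nested buffer.
    void hash_with_normalization() const {
        assert(nested != nullptr);
        auto normalized = std::make_shared<std::vector<int>>(*nested);
        // The old buffer is destroyed here if this was its last owner.
        nested = std::move(normalized);
    }
};

// Returns true if the buffer observed before the call is destroyed by it.
inline bool old_buffer_released_by_hash() {
    ToyNullable col{std::make_shared<std::vector<int>>(std::vector<int>{1, 2, 3})};
    // Tracks the lifetime of the old buffer without owning it.
    std::weak_ptr<std::vector<int>> observer = col.nested;
    // What a concurrent reader might be holding; never dereferenced here.
    const int* raw = col.nested->data();
    (void)raw;
    col.hash_with_normalization();
    return observer.expired();
}
```

In this model, any reader that captured `raw` before the call would be left with a dangling pointer afterwards.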

A focused BE unit test (BEUT) reproduces the problem before this fix:

```text
==2227142==ERROR: AddressSanitizer: heap-use-after-free
READ of size 4
    #0 doris::ColumnVector<(doris::PrimitiveType)5>::insert_indices_from(...)
       be/ut_build_ASAN/../src/core/column/column_vector.cpp:369:21
    #1 doris::ColumnVector<(doris::PrimitiveType)5>::insert_indices_from(...)
       be/ut_build_ASAN/../src/core/column/column_vector.cpp:373:5
    #2 doris::ColumnNullable::insert_indices_from(...)
       be/ut_build_ASAN/../src/core/column/column_nullable.cpp:378:25

freed by thread T18 here:
    #12 doris::ColumnNullable::update_crc32c_batch(unsigned int*, unsigned char const*) const
        be/ut_build_ASAN/../src/core/column/column_nullable.cpp:194:86

SUMMARY: AddressSanitizer: heap-use-after-free
```

This PR keeps the nullable CRC32C hash result unchanged without mutating
the source nested column. For replaceable fixed-width nested columns,
NULL rows are hashed as the nested type's default payload through
`update_crc32c_batch_default_on_null()`, avoiding both source mutation
and per-block cloned-column materialization.
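The idea can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `crc32c_extend()`, `hash_default_on_null()`, and `hash_after_normalizing()` are hypothetical helpers, `std::vector` stands in for the column types, and a bitwise software CRC32C stands in for the engine's real routine. The in-place overwrite in `hash_after_normalizing()` also simplifies the old path, which replaced the nested column rather than writing into it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitwise software CRC32C (Castagnoli polynomial, reflected form).
// A stand-in for the real hash routine; only the call shape matters here.
inline uint32_t crc32c_extend(uint32_t crc, const void* data, size_t len) {
    const auto* p = static_cast<const unsigned char*>(data);
    crc = ~crc;
    for (size_t i = 0; i < len; ++i) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Hypothetical non-mutating path: reads `values` and `null_map` only.
// NULL rows are hashed as the value-initialized payload T{} instead of
// overwriting the stored payload first.
template <typename T>
void hash_default_on_null(const std::vector<T>& values,
                          const std::vector<uint8_t>& null_map,
                          std::vector<uint32_t>& hashes) {
    assert(values.size() == null_map.size() && values.size() == hashes.size());
    const T default_value{};
    for (size_t i = 0; i < values.size(); ++i) {
        const T& payload = null_map[i] ? default_value : values[i];
        hashes[i] = crc32c_extend(hashes[i], &payload, sizeof(T));
    }
}

// Mutating approach for comparison: normalize NULL payloads, then hash.
template <typename T>
void hash_after_normalizing(std::vector<T>& values,
                            const std::vector<uint8_t>& null_map,
                            std::vector<uint32_t>& hashes) {
    assert(values.size() == null_map.size() && values.size() == hashes.size());
    for (size_t i = 0; i < values.size(); ++i) {
        if (null_map[i]) values[i] = T{};
    }
    for (size_t i = 0; i < values.size(); ++i) {
        hashes[i] = crc32c_extend(hashes[i], &values[i], sizeof(T));
    }
}
```

Both functions produce the same per-row hashes, but only the first leaves the source untouched, so it is safe to call while other readers hold pointers into the source.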

A similar mutable-source issue exists in
`ColumnNullable::filter_by_selector()`: the selector API is currently
non-const even though it conceptually reads the source column and writes
only to the destination column. That path is intentionally left
unchanged in this PR and will be handled separately, because fixing it
cleanly requires changing the older selector API contract and
dictionary-column scratch-buffer behavior.



### Release note

None
github-actions Bot requested a review from yiguolei as a code owner on July 1, 2026 07:21
@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hello-stephen

Contributor

run buildall
