[fix](be) Fix short-circuit row store nested access path read#65051
[fix](be) Fix short-circuit row store nested access path read#65051mrhhsg wants to merge 1 commit into
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
21ae27b to
4059bf3
Compare
|
/review |
There was a problem hiding this comment.
Review completed. I found two blocking issues.
Critical checkpoints:
- Goal/test proof: the non-cache path moves access-path slots to column-store rowid reads, but cache-enabled point queries still bypass that fallback; the new test does not prove the stored map leaf path.
- Scope: the change is focused, but the row-cache parallel path was missed.
- Concurrency/lifecycle: no new lock-order, thread-lifecycle, or ownership issue found in the changed code.
- Config/compatibility: the existing dynamic BE config
disable_storage_row_cache=falseexposes the skipped fallback. - Parallel paths: row-cache hits are a functional parallel path to rowid lookup and must be handled consistently.
- Test coverage: the regression covers the struct access path and a missing-map-key NULL result, but not the stored nested map value or a cache-hit repeat query.
- Data correctness: repeated cache-enabled short-circuit point queries can leave access-path columns unfilled or mismatched.
Subagent conclusions: OPT-1 was accepted as MAIN-1; TSC-1 was accepted as MAIN-2; no candidates were dismissed or merged as duplicates. Convergence round 1 ended with both live subagents reporting NO_NEW_VALUABLE_FINDINGS for the same final ledger/comment set.
User focus: no additional user-provided focus points were present.
| _reusable->get_col_default_values(), _reusable->include_col_uids())); | ||
| } | ||
| if (!_reusable->missing_col_uids().empty()) { | ||
| if (!_reusable->runtime_state()->enable_short_circuit_query_access_column_store()) { |
There was a problem hiding this comment.
This fallback is skipped for row-cache hits, so enabling disable_storage_row_cache=false can still return the old broken shape on repeated point queries. The first lookup calls BaseTablet::lookup_row_data() and inserts only the row-store JSONB blob into RowCache; the second lookup can hit _cached_row_data, deserialize only _include_col_uids, and continue before lookup_missing_columns_by_rowid() runs. Since this PR intentionally leaves access-path columns in _missing_col_uids, those columns are never read from column store on the cached path. Please either bypass row cache when missing columns exist, or keep enough row-location state to run this fallback after a cache hit as well.
| SELECT /*+ SET_VAR(enable_nereids_planner=true) */ | ||
| pk, | ||
| CHAR_LENGTH(struct_element(element_at(struct_element(deep, 'deep_map'), 'dense'), 'leaf')) AS char_len, | ||
| LENGTH(LOWER(struct_element(element_at(struct_element(deep, 'deep_map'), 'dense'), 'leaf'))) AS lower_len, |
There was a problem hiding this comment.
This query uses map key dense, but the inserted row only contains deep_map = map('b_key', ...), so both map-derived outputs are NULL even if reading the stored map value through the new access-path column-store path is still broken. Please query the inserted key, for example b_key, and regenerate the expected output with non-null char_len/lower_len values so the regression actually covers nested map payload materialization.
4059bf3 to
c818c65
Compare
|
/review |
There was a problem hiding this comment.
No additional non-duplicate blocking issue found in this pass.
Critical checkpoint conclusions:
- Goal and coverage: The PR targets short-circuit point queries on row-store MOW tables when nested access-path slots must be read from column store. The current code keeps access-path columns in the missing-column set, propagates slot access paths into
StorageReadOptions, and the regression covers normal, short-circuit, and repeated short-circuit reads with non-null map hit outputs. - Scope/focus: The change is narrow to point-query row-store fallback plus one regression case.
- Concurrency/lifecycle: No new locks or background lifecycle were added. The row-cache path is now bypassed when fallback columns exist, addressing the already-raised cache-hit concern.
- Configuration/session propagation: The existing
enable_short_circuit_query_access_column_storequery option is used by the BE fallback. The regression togglesdisable_storage_row_cacheand restores it infinally. - Compatibility and storage format: No new thrift fields, function symbols, or storage format changes were introduced by this PR.
- Parallel paths: The access-path propagation helper matches the existing rowid-fetcher pattern. I did not find another point-query path that needs the same change.
- Tests/results: The added
.outfile matches the updated SQL shape, including non-nullb_keyresults and repeated short-circuit coverage. I did not run build or regression tests locally in this runner. - Observability/performance: Existing point-query metrics/logging remain in place. The extra column-store rowid read is limited to slots requiring access paths.
Existing review context:
- The previous row-cache fallback issue is already covered by the existing inline thread and the current revision gates row-cache use on
missing_col_uids().empty(). - The previous test map-key issue is already covered by the existing inline thread and the current revision now queries inserted key
b_keywith non-null expected output.
User focus points: No additional user-provided focus points were supplied.
Subagent conclusions:
optimizer-rewrite: no candidates in the initial pass; convergence round 1 returnedNO_NEW_VALUABLE_FINDINGS.tests-session-config: no candidates in the initial pass; convergence round 1 returnedNO_NEW_VALUABLE_FINDINGS.- No subagent candidates became inline comments; the two known concerns were treated as duplicates of existing threads and rechecked against the current revision.
### What problem does this PR solve?
Issue Number: None
Problem Summary: Short-circuit point queries on row-store Unique Key MOW tables could deserialize full row-store complex column payloads into nested access-path pruned slots. The serialized layouts do not match and could crash the backend while reading nested struct/map/string fields. This change keeps access-path slots on the column-store rowid path and propagates their access paths to the segment iterator, while other row-store columns continue to be read from row store.
### Release note
None
### Check List (For Author)
- Test:
- Regression test: doris-local-regression.sh --network 10.26.20.3/24 all -d point_query_p0 -s test_short_circuit_rowstore_nested_complex -forceGenOut
- Regression test: doris-local-regression.sh --network 10.26.20.3/24 all -d point_query_p0 -s test_short_circuit_rowstore_nested_complex
- Format check: build-support/check-format.sh
- Diff check: git diff --check
- Static analysis: build-support/run-clang-tidy.sh --build-dir be/build_Release (not passed in local environment due existing clang-tidy/header setup issues)
- Behavior changed: No
- Does this need documentation: No
c818c65 to
f2ff791
Compare
|
/review |
There was a problem hiding this comment.
No additional non-duplicate blocking issue found for the current head f2ff791adfdfc50f60b09c3818ccf59bca4929ef.
Critical checkpoint conclusions:
- Goal and test proof: The PR targets short-circuit point queries on row-store MOW tables when nested access-path slots must be read from column store. The current code keeps access-path columns in the missing-column set, propagates slot access paths into
StorageReadOptions, bypasses row cache when fallback columns exist, and the regression covers normal, short-circuit, and repeated short-circuit reads with non-nullb_keymap hit outputs. - Scope/focus: The change is narrow to the point-query row-store fallback plus one regression case.
- Concurrency/lifecycle: No new locks, background lifecycle, static initialization, or ownership issue found in the changed code.
- Configuration/session propagation: The existing
enable_short_circuit_query_access_column_storequery option is used by the BE fallback. The regression togglesdisable_storage_row_cacheand restores it infinally. - Compatibility and storage format: No new thrift fields, function symbols, or storage format changes were introduced by this PR.
- Parallel paths: The access-path propagation helper matches the existing rowid-fetcher pattern, and row-cache lookup/insert are now gated consistently for missing-column fallback requests.
- Tests/results: The added
.outmatches the updated SQL shape, including non-nullb_keyresults and repeated short-circuit coverage. I did not run build or regression tests locally because this grafted checkout lacksthirdparty/installed. - Observability/performance: Existing point-query metrics/logging remain in place. The extra column-store rowid read is limited to slots requiring access paths.
Existing review context:
- The previous row-cache fallback issue is already covered by inline thread 3502746530 and is addressed in the current revision by gating row-cache use on
missing_col_uids().empty(). - The previous test map-key issue is already covered by inline thread 3502746533 and is addressed in the current revision by querying inserted key
b_keywith non-null expected output.
User focus points: No additional user-provided focus points were supplied.
Subagent conclusions:
optimizer-rewrite: no candidates in the initial pass; convergence round 1 returnedNO_NEW_VALUABLE_FINDINGSfor the final ledger/comment set.tests-session-config: no candidates in the initial pass; convergence round 1 returnedNO_NEW_VALUABLE_FINDINGSfor the final ledger/comment set.- No subagent candidates became inline comments; the two known concerns were treated as duplicates of existing threads and rechecked against the current revision.
What problem does this PR solve?
Issue Number: None
Problem Summary:
Short-circuit point queries on row-store Unique Key MOW tables could deserialize full row-store complex column payloads into nested access-path pruned slots. The serialized layouts do not match and could crash the backend while reading nested struct/map/string fields.
This change keeps slots that require nested access paths on the column-store rowid path by folding access-path column ids into the missing-column set after the original row-store include/missing calculation. It also propagates their access paths to the segment iterator and skips storage row cache when some slots still require column-store fallback, so a row-cache hit cannot bypass the fallback path. Other row-store columns continue to be read from row store.
Release note
None
Check List (For Author)
doris-local-regression.sh --network 10.26.20.3/24 all -d point_query_p0 -s test_short_circuit_rowstore_nested_complex -forceGenOutdoris-local-regression.sh --network 10.26.20.3/24 all -d point_query_p0 -s test_short_circuit_rowstore_nested_complexbuild-support/check-format.shgit diff --checkbuild-support/run-clang-tidy.sh --build-dir be/build_Release(not passed in local environment due existing clang-tidy/header setup issues)