Skip to content

[fix](be) Fix short-circuit row store nested access path read#65051

Draft
mrhhsg wants to merge 1 commit into
apache:masterfrom
mrhhsg:fix/short-circuit-rowstore-nested
Draft

[fix](be) Fix short-circuit row store nested access path read#65051
mrhhsg wants to merge 1 commit into
apache:masterfrom
mrhhsg:fix/short-circuit-rowstore-nested

Conversation

@mrhhsg

@mrhhsg mrhhsg commented Jun 30, 2026

Copy link
Copy Markdown
Member

What problem does this PR solve?

Issue Number: None

Problem Summary:

Short-circuit point queries on row-store Unique Key MOW tables could deserialize full row-store complex column payloads into nested access-path pruned slots. The serialized layouts do not match and could crash the backend while reading nested struct/map/string fields.

This change keeps slots that require nested access paths on the column-store rowid path by folding access-path column ids into the missing-column set after the original row-store include/missing calculation. It also propagates their access paths to the segment iterator and skips storage row cache when some slots still require column-store fallback, so a row-cache hit cannot bypass the fallback path. Other row-store columns continue to be read from row store.

Release note

None

Check List (For Author)

  • Test:
    • Regression test: doris-local-regression.sh --network 10.26.20.3/24 all -d point_query_p0 -s test_short_circuit_rowstore_nested_complex -forceGenOut
    • Regression test: doris-local-regression.sh --network 10.26.20.3/24 all -d point_query_p0 -s test_short_circuit_rowstore_nested_complex
    • Format check: build-support/check-format.sh
    • Diff check: git diff --check
    • Static analysis: build-support/run-clang-tidy.sh --build-dir be/build_Release (not passed in local environment due existing clang-tidy/header setup issues)
  • Behavior changed: No
  • Does this need documentation: No

@hello-stephen

Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@mrhhsg mrhhsg force-pushed the fix/short-circuit-rowstore-nested branch 2 times, most recently from 21ae27b to 4059bf3 Compare July 1, 2026 01:19
@mrhhsg

mrhhsg commented Jul 1, 2026

Copy link
Copy Markdown
Member Author

/review

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. I found two blocking issues.

Critical checkpoints:

  • Goal/test proof: the non-cache path moves access-path slots to column-store rowid reads, but cache-enabled point queries still bypass that fallback; the new test does not prove the stored map leaf path.
  • Scope: the change is focused, but the row-cache parallel path was missed.
  • Concurrency/lifecycle: no new lock-order, thread-lifecycle, or ownership issue found in the changed code.
  • Config/compatibility: the existing dynamic BE config disable_storage_row_cache=false exposes the skipped fallback.
  • Parallel paths: row-cache hits are a functional parallel path to rowid lookup and must be handled consistently.
  • Test coverage: the regression covers the struct access path and a missing-map-key NULL result, but not the stored nested map value or a cache-hit repeat query.
  • Data correctness: repeated cache-enabled short-circuit point queries can leave access-path columns unfilled or mismatched.

Subagent conclusions: OPT-1 was accepted as MAIN-1; TSC-1 was accepted as MAIN-2; no candidates were dismissed or merged as duplicates. Convergence round 1 ended with both live subagents reporting NO_NEW_VALUABLE_FINDINGS for the same final ledger/comment set.

User focus: no additional user-provided focus points were present.

_reusable->get_col_default_values(), _reusable->include_col_uids()));
}
if (!_reusable->missing_col_uids().empty()) {
if (!_reusable->runtime_state()->enable_short_circuit_query_access_column_store()) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fallback is skipped for row-cache hits, so enabling disable_storage_row_cache=false can still return the old broken shape on repeated point queries. The first lookup calls BaseTablet::lookup_row_data() and inserts only the row-store JSONB blob into RowCache; the second lookup can hit _cached_row_data, deserialize only _include_col_uids, and continue before lookup_missing_columns_by_rowid() runs. Since this PR intentionally leaves access-path columns in _missing_col_uids, those columns are never read from column store on the cached path. Please either bypass row cache when missing columns exist, or keep enough row-location state to run this fallback after a cache hit as well.

SELECT /*+ SET_VAR(enable_nereids_planner=true) */
pk,
CHAR_LENGTH(struct_element(element_at(struct_element(deep, 'deep_map'), 'dense'), 'leaf')) AS char_len,
LENGTH(LOWER(struct_element(element_at(struct_element(deep, 'deep_map'), 'dense'), 'leaf'))) AS lower_len,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query uses map key dense, but the inserted row only contains deep_map = map('b_key', ...), so both map-derived outputs are NULL even if reading the stored map value through the new access-path column-store path is still broken. Please query the inserted key, for example b_key, and regenerate the expected output with non-null char_len/lower_len values so the regression actually covers nested map payload materialization.

@mrhhsg mrhhsg force-pushed the fix/short-circuit-rowstore-nested branch from 4059bf3 to c818c65 Compare July 1, 2026 03:47
@mrhhsg

mrhhsg commented Jul 1, 2026

Copy link
Copy Markdown
Member Author

/review

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No additional non-duplicate blocking issue found in this pass.

Critical checkpoint conclusions:

  • Goal and coverage: The PR targets short-circuit point queries on row-store MOW tables when nested access-path slots must be read from column store. The current code keeps access-path columns in the missing-column set, propagates slot access paths into StorageReadOptions, and the regression covers normal, short-circuit, and repeated short-circuit reads with non-null map hit outputs.
  • Scope/focus: The change is narrow to point-query row-store fallback plus one regression case.
  • Concurrency/lifecycle: No new locks or background lifecycle were added. The row-cache path is now bypassed when fallback columns exist, addressing the already-raised cache-hit concern.
  • Configuration/session propagation: The existing enable_short_circuit_query_access_column_store query option is used by the BE fallback. The regression toggles disable_storage_row_cache and restores it in finally.
  • Compatibility and storage format: No new thrift fields, function symbols, or storage format changes were introduced by this PR.
  • Parallel paths: The access-path propagation helper matches the existing rowid-fetcher pattern. I did not find another point-query path that needs the same change.
  • Tests/results: The added .out file matches the updated SQL shape, including non-null b_key results and repeated short-circuit coverage. I did not run build or regression tests locally in this runner.
  • Observability/performance: Existing point-query metrics/logging remain in place. The extra column-store rowid read is limited to slots requiring access paths.

Existing review context:

  • The previous row-cache fallback issue is already covered by the existing inline thread and the current revision gates row-cache use on missing_col_uids().empty().
  • The previous test map-key issue is already covered by the existing inline thread and the current revision now queries inserted key b_key with non-null expected output.

User focus points: No additional user-provided focus points were supplied.

Subagent conclusions:

  • optimizer-rewrite: no candidates in the initial pass; convergence round 1 returned NO_NEW_VALUABLE_FINDINGS.
  • tests-session-config: no candidates in the initial pass; convergence round 1 returned NO_NEW_VALUABLE_FINDINGS.
  • No subagent candidates became inline comments; the two known concerns were treated as duplicates of existing threads and rechecked against the current revision.

### What problem does this PR solve?

Issue Number: None

Problem Summary: Short-circuit point queries on row-store Unique Key MOW tables could deserialize full row-store complex column payloads into nested access-path pruned slots. The serialized layouts do not match and could crash the backend while reading nested struct/map/string fields. This change keeps access-path slots on the column-store rowid path and propagates their access paths to the segment iterator, while other row-store columns continue to be read from row store.

### Release note

None

### Check List (For Author)

- Test:
    - Regression test: doris-local-regression.sh --network 10.26.20.3/24 all -d point_query_p0 -s test_short_circuit_rowstore_nested_complex -forceGenOut
    - Regression test: doris-local-regression.sh --network 10.26.20.3/24 all -d point_query_p0 -s test_short_circuit_rowstore_nested_complex
    - Format check: build-support/check-format.sh
    - Diff check: git diff --check
    - Static analysis: build-support/run-clang-tidy.sh --build-dir be/build_Release (not passed in local environment due existing clang-tidy/header setup issues)
- Behavior changed: No
- Does this need documentation: No
@mrhhsg mrhhsg force-pushed the fix/short-circuit-rowstore-nested branch from c818c65 to f2ff791 Compare July 1, 2026 07:42
@mrhhsg

mrhhsg commented Jul 1, 2026

Copy link
Copy Markdown
Member Author

/review

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No additional non-duplicate blocking issue found for the current head f2ff791adfdfc50f60b09c3818ccf59bca4929ef.

Critical checkpoint conclusions:

  • Goal and test proof: The PR targets short-circuit point queries on row-store MOW tables when nested access-path slots must be read from column store. The current code keeps access-path columns in the missing-column set, propagates slot access paths into StorageReadOptions, bypasses row cache when fallback columns exist, and the regression covers normal, short-circuit, and repeated short-circuit reads with non-null b_key map hit outputs.
  • Scope/focus: The change is narrow to the point-query row-store fallback plus one regression case.
  • Concurrency/lifecycle: No new locks, background lifecycle, static initialization, or ownership issue found in the changed code.
  • Configuration/session propagation: The existing enable_short_circuit_query_access_column_store query option is used by the BE fallback. The regression toggles disable_storage_row_cache and restores it in finally.
  • Compatibility and storage format: No new thrift fields, function symbols, or storage format changes were introduced by this PR.
  • Parallel paths: The access-path propagation helper matches the existing rowid-fetcher pattern, and row-cache lookup/insert are now gated consistently for missing-column fallback requests.
  • Tests/results: The added .out matches the updated SQL shape, including non-null b_key results and repeated short-circuit coverage. I did not run build or regression tests locally because this grafted checkout lacks thirdparty/installed.
  • Observability/performance: Existing point-query metrics/logging remain in place. The extra column-store rowid read is limited to slots requiring access paths.

Existing review context:

  • The previous row-cache fallback issue is already covered by inline thread 3502746530 and is addressed in the current revision by gating row-cache use on missing_col_uids().empty().
  • The previous test map-key issue is already covered by inline thread 3502746533 and is addressed in the current revision by querying inserted key b_key with non-null expected output.

User focus points: No additional user-provided focus points were supplied.

Subagent conclusions:

  • optimizer-rewrite: no candidates in the initial pass; convergence round 1 returned NO_NEW_VALUABLE_FINDINGS for the final ledger/comment set.
  • tests-session-config: no candidates in the initial pass; convergence round 1 returned NO_NEW_VALUABLE_FINDINGS for the final ledger/comment set.
  • No subagent candidates became inline comments; the two known concerns were treated as duplicates of existing threads and rechecked against the current revision.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants