
Dev 0701 #65068

Draft
Gabriel39 wants to merge 2 commits into apache:master from Gabriel39:dev_0701


@Gabriel39

Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Gabriel39 added 2 commits July 1, 2026 10:01
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: apache#63893

Problem Summary: Add the file scanner v2 reader stack for external file scans. This includes native readers for Parquet, CSV/TEXT, and JSON; JNI-backed table readers; schema projection; column mapping; predicate handling; reader statistics; page cache support; and the related BE/FE integration. It also restores the affected Parquet LZO regression cases by adding LZO page decompression support to Doris's thirdparty Arrow build for file scanner v2.

The change keeps VDirectInPredicate source-compatible with existing ordinary two-argument construction by defaulting the new HybridSet child-type flag to true. Dictionary-code rewrites can still pass false explicitly, while existing runtime filter tests continue to compile with the old call shape.
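The compatibility technique described above is a trailing default argument. The following is a minimal sketch, assuming a simplified constructor shape: `DirectInPredicateSketch`, `HybridSetSketch`, `hybrid_set_matches_child_type`, and the helper functions are hypothetical names for illustration, not the real VDirectInPredicate or HybridSet API, whose signatures are not shown on this page.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for the runtime filter value set (not Doris's HybridSet).
struct HybridSetSketch {};

// Hypothetical, simplified predicate whose constructor gained a new trailing flag.
class DirectInPredicateSketch {
public:
    // The new flag defaults to true, so existing two-argument call sites
    // (such as older runtime filter tests) still compile without changes.
    DirectInPredicateSketch(int node_id, std::shared_ptr<HybridSetSketch> filter,
                            bool hybrid_set_matches_child_type = true)
            : _node_id(node_id),
              _filter(std::move(filter)),
              _hybrid_set_matches_child_type(hybrid_set_matches_child_type) {}

    int node_id() const { return _node_id; }
    bool has_filter() const { return _filter != nullptr; }
    bool hybrid_set_matches_child_type() const { return _hybrid_set_matches_child_type; }

private:
    int _node_id;
    std::shared_ptr<HybridSetSketch> _filter;
    bool _hybrid_set_matches_child_type;
};

// Ordinary construction keeps the old two-argument call shape.
inline DirectInPredicateSketch make_ordinary_predicate(int node_id) {
    return DirectInPredicateSketch(node_id, std::make_shared<HybridSetSketch>());
}

// A dictionary-code rewrite opts out by passing false explicitly.
inline DirectInPredicateSketch make_dict_code_predicate(int node_id) {
    return DirectInPredicateSketch(node_id, std::make_shared<HybridSetSketch>(), false);
}
```

With this shape, old two-argument calls keep compiling unchanged, and only the call sites that need the non-default behavior have to pass `false`.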

Review follow-up fixes: RuntimeFilterExpr global-index slot rewriting now updates the executable _impl tree, enable_file_scanner_v2 is documented as default-on to match the FE default, and generated regression outputs are trimmed so the diff hygiene check passes.

### Release note

Support file scanner v2 readers for external file scan paths, including LZO-compressed Parquet reads in the new Parquet reader path.

### Check List (For Author)

- Test: Manual test
    - Verified apache-arrow-17.0.0-lzo.patch applies with patch -p1 --dry-run against Arrow 17 column_reader.cc
    - Ran bash -n thirdparty/build-thirdparty.sh thirdparty/download-thirdparty.sh
    - Ran build-support/clang-format.sh
    - Ran git diff --check
    - Attempted ./run-be-ut.sh --run --filter='RuntimeFilterExprSamplingTest.deep_clone_clones_impl_tree'; the local sandboxed run could not complete because the BE UT script first required JDK 17 setup and then needed submodule metadata writes and GitHub access for thirdparty dependencies. An escalated retry was not approved before the timeout.
    - Attempted ./run-be-ut.sh --run --filter='FileScannerV2Test.RewriteSlotRefsToGlobalIndexMatrix'; the local sandboxed run could not complete because the BE UT script needed submodule metadata writes and GitHub access for thirdparty dependencies. An escalated retry was not approved before the timeout.
    - Full BE unit tests and external regression tests were not run in this local environment
- Behavior changed: Yes. Adds file scanner v2 reader behavior and enables LZO-compressed Parquet reads through the new reader path
- Does this need documentation: No
@Gabriel39

Contributor Author

run buildall

Gabriel39 marked this pull request as draft July 1, 2026 04:22
@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally including the specific error message), and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it was modified, and what impacts the change might have.
  3. What features were added, and why they were added.
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hello-stephen

Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|---|---|
| Function Coverage | 77.39% (1896/2450) |
| Line Coverage | 64.43% (34060/52862) |
| Region Coverage | 64.83% (17524/27032) |
| Branch Coverage | 54.04% (9398/17390) |

@hello-stephen

Contributor

FE UT Coverage Report

Increment line coverage 60.00% (3/5) 🎉
Increment coverage report
Complete coverage report
