Skip to content

[fix](be) Validate task executor scan handles#65054

Merged
Gabriel39 merged 1 commit into
apache:masterfrom
Gabriel39:fix_0630
Jul 1, 2026
Merged

[fix](be) Validate task executor scan handles#65054
Gabriel39 merged 1 commit into
apache:masterfrom
Gabriel39:fix_0630

Conversation

@Gabriel39

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Task-executor scan scheduling could pass a null or invalid task handle into TimeSharingTaskExecutor. enqueue_splits and related paths cast the base TaskHandle to TimeSharingTaskHandle and immediately dereferenced the result, so a broken ScannerContext task-handle invariant caused BE to crash with SIGSEGV instead of returning a diagnostic error. This change validates scanner context, scan task, and task handle before submitting scan splits, and validates the task handle type at TimeSharingTaskExecutor entry points before dereferencing it.

Release note

None

Check List (For Author)

  • Test: Unit Test
    • Added TimeSharingTaskExecutorTest coverage for null and invalid task handles.
    • Tried: JDK_17=/usr/local/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home JAVA_HOME=/usr/local/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home ./run-be-ut.sh --run --filter='TimeSharingTaskExecutorTest.*'; build failed during CMake configure because thirdparty/installed is missing Snappy.
  • Behavior changed: Yes. Invalid task-executor scan handles now return InternalError instead of dereferencing a null cast result.
  • Does this need documentation: No

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Task-executor scan scheduling could pass a null or invalid task handle into TimeSharingTaskExecutor. enqueue_splits and related paths cast the base TaskHandle to TimeSharingTaskHandle and immediately dereferenced the result, so a broken ScannerContext task-handle invariant caused BE to crash with SIGSEGV instead of returning a diagnostic error. This change validates scanner context, scan task, and task handle before submitting scan splits, and validates the task handle type at TimeSharingTaskExecutor entry points before dereferencing it.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - Added TimeSharingTaskExecutorTest coverage for null and invalid task handles.
    - Tried: JDK_17=/usr/local/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home JAVA_HOME=/usr/local/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home ./run-be-ut.sh --run --filter='TimeSharingTaskExecutorTest.*'; build failed during CMake configure because thirdparty/installed is missing Snappy.
- Behavior changed: Yes. Invalid task-executor scan handles now return InternalError instead of dereferencing a null cast result.
- Does this need documentation: No
@hello-stephen

Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Gabriel39

Copy link
Copy Markdown
Contributor Author

run buildall

@Gabriel39

Copy link
Copy Markdown
Contributor Author

/review

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found one blocking test issue: the new invalid-handle BE UT constructs TimeSharingTaskExecutor with uninitialized thread-count fields, so the added coverage is nondeterministic.

Critical checkpoint conclusions: the production goal is clear and the scanner/executor validation paths look focused; concurrency and lifecycle paths around ScannerContext scheduling, split re-enqueue, remove_task cleanup, and the one-shot submit overload were checked without finding another substantiated issue; no new config, persistence, wire/storage compatibility, FE-BE protocol, or observability concern was introduced by this patch; test coverage was added, but the new test setup must be fixed before it is reliable. Existing inline review context was empty. No additional user focus was provided. Validation limit: BE UT/build was not run because this runner is missing .worktree_initialized and thirdparty/installed; the live PR patch reverse-applied cleanly with whitespace errors enabled.

Subagent conclusions: optimizer-rewrite reported no optimizer/parallel scheduling candidate; tests-session-config reported TEST-1, which was merged into MAIN-1 and became the inline comment below. Convergence round 1 ended with both live subagents replying NO_NEW_VALUABLE_FINDINGS for the same ledger/comment set.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anchor-repaired review submission. The substantiated issue remains MAIN-1: the new invalid-handle BE UT constructs TimeSharingTaskExecutor with uninitialized thread-count fields, so the added coverage is nondeterministic.

Critical checkpoint conclusions: the production scanner/executor validation paths are focused and I did not find another substantiated production correctness issue after tracing ScannerContext scheduling, split re-enqueue, remove_task cleanup, and the one-shot submit overload. No new config, persistence, wire/storage compatibility, FE-BE protocol, or observability concern was introduced by this patch. Test coverage was added, but the new test setup must be fixed before it is reliable. Existing inline review context was empty; no additional user focus was provided. Validation limit: BE UT/build was not run because this runner is missing .worktree_initialized and thirdparty/installed; the live PR patch reverse-applied cleanly with whitespace errors enabled.

Subagent conclusions: optimizer-rewrite reported no optimizer/parallel scheduling candidate; tests-session-config reported TEST-1, merged into MAIN-1. After repairing the inline anchor, convergence round 2 ended with both live subagents replying NO_NEW_VALUABLE_FINDINGS for the corrected ledger/comment set.

auto ticker = std::make_shared<TestingTicker>();

TimeSharingTaskExecutor::ThreadConfig thread_config;
thread_config.thread_name = "invalid_task_handle";

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new test leaves thread_config.max_thread_num and thread_config.min_thread_num uninitialized here. ThreadConfig only defaults max_queue_size; the constructor copies the two indeterminate ints into _max_threads/_min_threads, and init() immediately calls _try_create_thread(_min_threads, ...). That makes this test nondeterministic: it can try to create an arbitrary number of worker threads or fail depending on stack contents. Please initialize both fields, like the adjacent tests do, before constructing the executor.

@hello-stephen

Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 50.00% (18/36) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 55.12% (21729/39420)
Line Coverage 38.61% (208044/538891)
Region Coverage 34.69% (163947/472669)
Branch Coverage 35.69% (71860/201348)

@hello-stephen

Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 66.67% (24/36) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 74.16% (28535/38478)
Line Coverage 58.00% (310729/535700)
Region Coverage 54.78% (260061/474774)
Branch Coverage 56.13% (113155/201591)

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Jul 1, 2026
@Gabriel39 Gabriel39 merged commit 023b766 into apache:master Jul 1, 2026
33 of 34 checks passed
github-actions Bot pushed a commit that referenced this pull request Jul 1, 2026
Task-executor scan scheduling could pass a null or
invalid task handle into TimeSharingTaskExecutor. enqueue_splits and
related paths cast the base TaskHandle to TimeSharingTaskHandle and
immediately dereferenced the result, so a broken ScannerContext
task-handle invariant caused BE to crash with SIGSEGV instead of
returning a diagnostic error. This change validates scanner context,
scan task, and task handle before submitting scan splits, and validates
the task handle type at TimeSharingTaskExecutor entry points before
dereferencing it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/4.0.x-conflict dev/4.1.x

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants