[fix](be) Validate task executor scan handles #65054
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Task-executor scan scheduling could pass a null or invalid task handle into TimeSharingTaskExecutor. enqueue_splits and related paths cast the base TaskHandle to TimeSharingTaskHandle and immediately dereferenced the result, so a broken ScannerContext task-handle invariant caused BE to crash with SIGSEGV instead of returning a diagnostic error. This change validates scanner context, scan task, and task handle before submitting scan splits, and validates the task handle type at TimeSharingTaskExecutor entry points before dereferencing it.
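The validation pattern described in the summary can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `Status`, `TaskHandle`, `TimeSharingTaskHandle`, `OtherTaskHandle`, and `enqueue_split_checked` are simplified, hypothetical stand-ins, and the real Doris types and signatures may differ.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for a status type (not the Doris Status class).
class Status {
public:
    static Status OK() { return Status(true, ""); }
    static Status InternalError(std::string msg) { return Status(false, std::move(msg)); }
    bool ok() const { return _ok; }
    const std::string& message() const { return _msg; }

private:
    Status(bool ok, std::string msg) : _ok(ok), _msg(std::move(msg)) {}
    bool _ok;
    std::string _msg;
};

// Hypothetical handle hierarchy: a polymorphic base and two derived types.
struct TaskHandle {
    virtual ~TaskHandle() = default;
};
struct TimeSharingTaskHandle : TaskHandle {
    int queued_splits = 0;
};
struct OtherTaskHandle : TaskHandle {};

// Validate the handle before dereferencing it: reject null first, then
// reject a handle whose dynamic type is not TimeSharingTaskHandle.
Status enqueue_split_checked(const std::shared_ptr<TaskHandle>& handle) {
    if (handle == nullptr) {
        return Status::InternalError("task handle is null");
    }
    auto ts_handle = std::dynamic_pointer_cast<TimeSharingTaskHandle>(handle);
    if (ts_handle == nullptr) {
        return Status::InternalError("task handle is not a TimeSharingTaskHandle");
    }
    ++ts_handle->queued_splits;  // safe: the cast result was checked above
    return Status::OK();
}

// Usage: both kinds of bad handle yield an error; a valid handle is accepted.
void demo_enqueue_validation() {
    assert(!enqueue_split_checked(nullptr).ok());
    assert(!enqueue_split_checked(std::make_shared<OtherTaskHandle>()).ok());
    auto handle = std::make_shared<TimeSharingTaskHandle>();
    assert(enqueue_split_checked(handle).ok());
    assert(handle->queued_splits == 1);
}
```

Returning an error status at the entry point keeps a broken invariant from becoming a null dereference deeper in the call path.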
### Release note
None
### Check List (For Author)
- Test: Unit Test
- Added TimeSharingTaskExecutorTest coverage for null and invalid task handles.
- Tried: JDK_17=/usr/local/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home JAVA_HOME=/usr/local/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home ./run-be-ut.sh --run --filter='TimeSharingTaskExecutorTest.*'; build failed during CMake configure because thirdparty/installed is missing Snappy.
- Behavior changed: Yes. Invalid task-executor scan handles now return InternalError instead of dereferencing a null cast result.
- Does this need documentation: No
run buildall
/review
Found one blocking test issue: the new invalid-handle BE UT constructs TimeSharingTaskExecutor with uninitialized thread-count fields, so the added coverage is nondeterministic.
Critical checkpoint conclusions: the production goal is clear and the scanner/executor validation paths look focused; concurrency and lifecycle paths around ScannerContext scheduling, split re-enqueue, remove_task cleanup, and the one-shot submit overload were checked without finding another substantiated issue; no new config, persistence, wire/storage compatibility, FE-BE protocol, or observability concern was introduced by this patch; test coverage was added, but the new test setup must be fixed before it is reliable. Existing inline review context was empty. No additional user focus was provided. Validation limit: BE UT/build was not run because this runner is missing .worktree_initialized and thirdparty/installed; the live PR patch reverse-applied cleanly with whitespace errors enabled.
Subagent conclusions: optimizer-rewrite reported no optimizer/parallel scheduling candidate; tests-session-config reported TEST-1, which was merged into MAIN-1 and became the inline comment below. Convergence round 1 ended with both live subagents replying NO_NEW_VALUABLE_FINDINGS for the same ledger/comment set.
Anchor-repaired review submission. The substantiated issue remains MAIN-1: the new invalid-handle BE UT constructs TimeSharingTaskExecutor with uninitialized thread-count fields, so the added coverage is nondeterministic.
Critical checkpoint conclusions: the production scanner/executor validation paths are focused and I did not find another substantiated production correctness issue after tracing ScannerContext scheduling, split re-enqueue, remove_task cleanup, and the one-shot submit overload. No new config, persistence, wire/storage compatibility, FE-BE protocol, or observability concern was introduced by this patch. Test coverage was added, but the new test setup must be fixed before it is reliable. Existing inline review context was empty; no additional user focus was provided. Validation limit: BE UT/build was not run because this runner is missing .worktree_initialized and thirdparty/installed; the live PR patch reverse-applied cleanly with whitespace errors enabled.
Subagent conclusions: optimizer-rewrite reported no optimizer/parallel scheduling candidate; tests-session-config reported TEST-1, merged into MAIN-1. After repairing the inline anchor, convergence round 2 ended with both live subagents replying NO_NEW_VALUABLE_FINDINGS for the corrected ledger/comment set.

    auto ticker = std::make_shared<TestingTicker>();

    TimeSharingTaskExecutor::ThreadConfig thread_config;
    thread_config.thread_name = "invalid_task_handle";

The new test leaves thread_config.max_thread_num and thread_config.min_thread_num uninitialized here. ThreadConfig only defaults max_queue_size; the constructor copies the two indeterminate ints into _max_threads/_min_threads, and init() immediately calls _try_create_thread(_min_threads, ...). That makes this test nondeterministic: it can try to create an arbitrary number of worker threads or fail depending on stack contents. Please initialize both fields, like the adjacent tests do, before constructing the executor.
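A minimal sketch of the requested fix, assuming a struct shaped the way this comment describes. The `ThreadConfig` below is a hypothetical mirror rather than the Doris definition, and the thread counts and the `max_queue_size` default are illustrative values, not the ones the adjacent tests use.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the layout described above: only max_queue_size
// has a default member initializer; the two thread counts have none.
struct ThreadConfig {
    std::string thread_name;
    int max_thread_num;      // indeterminate after default-initialization
    int min_thread_num;      // indeterminate after default-initialization
    int max_queue_size = 0;  // assumed default, for illustration only
};

// Build a fully initialized config for the test. Every field without a
// default is assigned before the struct is read or copied.
ThreadConfig make_test_thread_config() {
    ThreadConfig cfg;
    cfg.thread_name = "invalid_task_handle";
    cfg.max_thread_num = 1;  // assumed value
    cfg.min_thread_num = 1;  // assumed value
    return cfg;
}

// Usage: the executor would now see deterministic thread counts.
void demo_thread_config() {
    const ThreadConfig cfg = make_test_thread_config();
    assert(cfg.min_thread_num == 1);
    assert(cfg.max_thread_num >= cfg.min_thread_num);
}
```

Giving the two fields default member initializers in the struct itself would prevent the same mistake in future tests, but that is a production change beyond what this comment asks for.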
PR approved by at least one committer and no changes requested.