[fix](mtmv): serialize alterJob with running tasks to prevent concurrent refresh on same MTMV#64958
Open
yujun777 wants to merge 1 commit into
…t refresh

ALTER MTMV REFRESH ON COMMIT calls alterJob(), which does dropJob() + createJob(), creating a new MTMVJob instance with a new ReentrantReadWriteLock. If the CREATE MTMV immediate build task is still running under the old job's writeLock, the new on-commit task can acquire the new job's writeLock concurrently, causing two refresh tasks to operate on the same MTMV simultaneously and triggering "partition not found".

Fix: In alterJob(), acquire the existing job's writeLock before drop+create, ensuring all running tasks complete before the job is rebuilt. This serializes the alter with any in-flight tasks using the same lock instance.

Co-Authored-By: Claude <noreply@anthropic.com>
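The core of the race is that mutual exclusion depends on lock identity, not on which MTMV the lock is meant to protect. The following is a minimal, self-contained sketch of that point only; `Job` and `canAcquireWhileHeld` are hypothetical names made up for illustration and are not the Doris classes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockIdentityDemo {
    /** Hypothetical stand-in for a job object that owns its own lock. */
    static final class Job {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    }

    /**
     * Holds held's write lock on the calling thread, then asks a second thread
     * whether it can take contender's write lock at that moment.
     */
    static boolean canAcquireWhileHeld(Job held, Job contender) throws Exception {
        ExecutorService other = Executors.newSingleThreadExecutor();
        held.lock.writeLock().lock();
        try {
            return other.submit(() -> {
                // tryLock() returns immediately instead of blocking.
                boolean acquired = contender.lock.writeLock().tryLock();
                if (acquired) {
                    contender.lock.writeLock().unlock();
                }
                return acquired;
            }).get();
        } finally {
            held.lock.writeLock().unlock();
            other.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Job oldJob = new Job();
        Job newJob = new Job(); // models the job recreated by drop + create

        // Same instance: the second thread is excluded.
        System.out.println("same job: " + canAcquireWhileHeld(oldJob, oldJob));      // false
        // Recreated instance: nothing stops the second thread.
        System.out.println("recreated job: " + canAcquireWhileHeld(oldJob, newJob)); // true
    }
}
```

In this sketch the second call returning `true` corresponds to the on-commit task proceeding while the immediate build task still holds the old job's lock.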
Author: run buildall
TPC-H: Total hot run time: 29395 ms
TPC-DS: Total hot run time: 173548 ms
ClickBench: Total hot run time: 25.19 s
FE Regression Coverage Report: Increment line coverage
Author: run cloud_p0
FE Regression Coverage Report: Increment line coverage
morrySnow
approved these changes
Jul 1, 2026
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
Problem
ALTER MTMV REFRESH ON COMMIT calls `alterJob()`, which does `dropJob()` + `createJob()`, creating a new MTMVJob instance with a new ReentrantReadWriteLock. If the CREATE MTMV immediate build task is still running under the old job's writeLock, the new on-commit task can acquire the new job's writeLock concurrently, causing two refresh tasks to operate on the same MTMV simultaneously.

This race triggers "partition not found" when the on-commit task reads partition metadata via `UpdateMvByPartitionCommand.constructTableWithPredicates()` while the immediate build task is in the middle of replacing partitions.

Root Cause
- `alterJob()` drops the old job and creates a new one
- The new job has its own `ReentrantReadWriteLock` instance
- `MTMVTask.runTask()` acquires the writeLock from `getJobOrJobException()`
- `getJobOrJobException()` returns the new job with a different lock

Fix
In `alterJob()`, acquire the existing job's writeLock before drop+create, ensuring all running tasks complete before the job is rebuilt. This serializes the alter with any in-flight tasks using the same lock instance.

Change
Single file: `fe/fe-core/src/main/java/org/apache/doris/mtmv/MTMVJobManager.java`

- … (`isReplay=true`) unchanged
- … `oldJob` …

🤖 Generated with Claude Code
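The locking order described under Fix can be sketched roughly as below. This is not the code from `MTMVJobManager.java`: the `Job` class, the `jobs` map, the `events` list, the null branch, and `runScenario()` are all hypothetical scaffolding, assumed only to show that taking the existing job's write lock before drop + create makes the alter wait for an in-flight task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SerializedAlterSketch {
    /** Hypothetical stand-in for a job: each instance owns its own lock. */
    static final class Job {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    }

    static final Map<String, Job> jobs = new ConcurrentHashMap<>();
    static final List<String> events = new CopyOnWriteArrayList<>();

    /** Rebuilds the job, but only after taking the existing job's write lock. */
    static void alterJob(String name) {
        Job oldJob = jobs.get(name);
        if (oldJob == null) {            // assumption: nothing to serialize with
            jobs.put(name, new Job());
            return;
        }
        oldJob.lock.writeLock().lock();  // blocks until in-flight tasks release it
        try {
            jobs.remove(name);           // drop
            jobs.put(name, new Job());   // create, with a fresh lock
            events.add("job-rebuilt");
        } finally {
            oldJob.lock.writeLock().unlock();
        }
    }

    /** Runs a task under the current job's lock, alters concurrently, returns the event order. */
    static List<String> runScenario() throws InterruptedException {
        jobs.clear();
        events.clear();
        Job oldJob = new Job();
        jobs.put("mv1", oldJob);
        CountDownLatch taskHoldsLock = new CountDownLatch(1);

        Thread task = new Thread(() -> {
            oldJob.lock.writeLock().lock();
            try {
                taskHoldsLock.countDown();
                // Keep "refreshing" until the alter thread is queued on this lock.
                while (!oldJob.lock.hasQueuedThreads()) {
                    Thread.yield();
                }
                events.add("task-done");
            } finally {
                oldJob.lock.writeLock().unlock();
            }
        });
        task.start();
        taskHoldsLock.await();

        Thread alter = new Thread(() -> alterJob("mv1"));
        alter.start();
        task.join();
        alter.join();
        return new ArrayList<>(events);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runScenario()); // [task-done, job-rebuilt]
    }
}
```

The task only finishes once the alter thread is already waiting on the lock, so the recorded order shows the rebuild happening strictly after the running task completes.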