HIVE-29572: ACID Compaction: Cleaner should mark a compaction failed when its txn is aborted by kuczoram · Pull Request #6498 · apache/hive

kuczoram · 2026-05-18T13:33:00Z

What changes were proposed in this pull request?

When a compaction is fetched by the cleaner, first check the state of the compaction's transaction. If it is already aborted, delete nothing, but mark the compaction as failed.

Why are the changes needed?

It can happen that a compaction is marked as finished and get into "ready for cleaning" state, but the compaction txn stays open. And when the timeout reached, the txn gets aborted.
With min.history.level, a compaction like this can block the cleaning for all consecutive compaction, even after the txn is aborted.

Does this PR introduce any user-facing change?

No

How was this patch tested?

The scenarios are covered in TestCleaner unit test.

…when its txn is aborted

sonarqubecloud · 2026-05-26T11:53:28Z

Quality Gate passed

Issues
17 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

deniskuzZ · 2026-05-27T08:13:32Z


+    if (ci.nextTxnId == 0 && ci.txnId > 0 &&
+        (ci.type == CompactionType.MAJOR || ci.type == CompactionType.MINOR || ci.type == CompactionType.REBALANCE)) {
+      TxnStatus status = txnHandler.getTransactionStatus(ci.txnId);


i don't think that is an optimal solution.
if i get it right, it requires checking every cleanup request for an open txn and creates additional stress on backend DB

It can happen that a compaction is marked as finished and get into "ready for cleaning" state, but the compaction txn stays open. And when the timeout reached, the txn gets aborted.

could we handle this in abortTxn? if txnType=3 (compaction) cleanup the COMPACTION_QUEUE as well?

With min.history.level, a compaction like this can block the cleaning for all consecutive compaction, even after the txn is aborted.

if i get it right, it's not the case with min.history.writeid, however, the check is generic

It checks for the transaction state only for compactions which don't have nextTxnId. If I understand it correctly if a compaction's txn is committed successfully, the nextTxnId is filled. So it will mean just one additional HMS call for compactions without successfully committed txn. For compaction's with comitted txn, it will do nothing.

I tried an other approach as well, when I made the compactions failed in the AbortTxnsFunction, but I didn't really like that either. I didn't find a way to call the markFailed from there, so I either had to extend the SQLs there or fetch the compactioninfo for all txn ids and then check if the condition's are matched. Like it has to be a compaction, not a soft-delete and it has to be in ready-for-cleaning state, because if it is in working state, we cannot fail it, as it could be revoked. At the end I felt that going with this approach could have more side effects. So I went with this one, but I can go back to making the failure in AbortTxnsFunction if you'd like that.

With min.history.level, a compaction like this can block the cleaning for all consecutive compaction, even after the txn is aborted.

if i get it right, it's not the case with min.history.writeid, however, the check is generic

No, with min.history.writeid it is not blocking the following cleaners. Because in that case what happens is that the cleaner's highwatermark will be 0, so it won't delete anything, but there is no remainder checking for min.history.writeid, so it will consider the cleaning as successful. Marks the compaction as successful and just goes on. With min.history.level, it will clean nothing, but finds deltas which should be deleted, so it will stay in the queue with "ready for cleaning" state.

If I understand it correctly if a compaction's txn is committed successfully, the nextTxnId is filled.

why uncommited txn would be even eligible in first place? is it a race between mark ready-for-cleaning and commit/abort? what if we mark after commit/abort or make commit atomic?

asf-ci-hive added tests pending tests unstable and removed tests pending tests unstable labels May 18, 2026

kuczoram force-pushed the HIVE-29572 branch from 529e2cb to 16b5564 Compare May 21, 2026 08:43

asf-ci-hive added tests pending tests passed and removed tests unstable tests pending labels May 21, 2026

kuczoram changed the title ~~[WIP] HIVE-29572: ACID Compaction: Cleaner should check the state of the compaction txn before start cleaning~~ HIVE-29572: Cleaner should mark a compaction failed when its txn is aborted May 22, 2026

HIVE-29572: ACID Compaction: Cleaner should mark a compaction failed …

9117b88

…when its txn is aborted

kuczoram changed the title ~~HIVE-29572: Cleaner should mark a compaction failed when its txn is aborted~~ HIVE-29572: ACID Compaction: Cleaner should mark a compaction failed when its txn is aborted May 22, 2026

kuczoram force-pushed the HIVE-29572 branch from 16b5564 to 9117b88 Compare May 22, 2026 08:08

asf-ci-hive added tests pending tests unstable and removed tests passed tests pending tests unstable labels May 22, 2026

asf-ci-hive added tests passed and removed tests pending labels May 26, 2026

deniskuzZ reviewed May 27, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HIVE-29572: ACID Compaction: Cleaner should mark a compaction failed when its txn is aborted#6498

HIVE-29572: ACID Compaction: Cleaner should mark a compaction failed when its txn is aborted#6498
kuczoram wants to merge 1 commit into
apache:masterfrom
kuczoram:HIVE-29572

kuczoram commented May 18, 2026 •

edited

Loading

Uh oh!

sonarqubecloud Bot commented May 26, 2026

Uh oh!

deniskuzZ May 27, 2026 •

edited

Loading

Uh oh!

deniskuzZ May 27, 2026 •

edited

Loading

Uh oh!

deniskuzZ May 27, 2026

Uh oh!

kuczoram May 27, 2026 •

edited

Loading

Uh oh!

kuczoram May 27, 2026

Uh oh!

deniskuzZ May 27, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

kuczoram commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Uh oh!

sonarqubecloud Bot commented May 26, 2026

Quality Gate passed

Uh oh!

deniskuzZ May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

deniskuzZ May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

deniskuzZ May 27, 2026

Choose a reason for hiding this comment

Uh oh!

kuczoram May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kuczoram May 27, 2026

Choose a reason for hiding this comment

Uh oh!

deniskuzZ May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

kuczoram commented May 18, 2026 •

edited

Loading

deniskuzZ May 27, 2026 •

edited

Loading

deniskuzZ May 27, 2026 •

edited

Loading

kuczoram May 27, 2026 •

edited

Loading

deniskuzZ May 27, 2026 •

edited

Loading