Core, Spark: Clean up uncommitted files when a staged table is aborted#16388
Open
wombatu-kun wants to merge 1 commit into
Open
Core, Spark: Clean up uncommitted files when a staged table is aborted#16388wombatu-kun wants to merge 1 commit into
wombatu-kun wants to merge 1 commit into
Conversation
StagedSparkTable.abortStagedChanges() was an empty `// TODO: clean up`, so when an atomic CTAS/RTAS (or the snapshot/migrate actions) failed after the staged write but before commitStagedChanges(), the manifest list and manifests already written into the uncommitted transaction were leaked. For a staged CREATE the table is never registered, so those orphans are unreachable by removeOrphanFiles and leak permanently. This adds a best-effort `default void abortTransaction()` to the Transaction API, overridden in BaseTransaction to run the existing cleanUp() (cleanAllUpdates() + deleteUncommittedFiles()) and delegated by CommitCallbackTransaction. StagedSparkTable.abortStagedChanges() now calls it across all supported Spark versions (3.4, 3.5, 4.0, 4.1). The default is a no-op rather than throwing because abort runs from catch/finally blocks where a secondary exception would mask the original failure; the underlying deletion is already best-effort and idempotent, so calling abort after a failed commit is safe. Out of scope (unchanged, pre-existing): executor-written data files in the write-succeeds-then-commit-fails path, which match Iceberg's existing create-transaction behavior and are handled by SparkWrite.abort() on write-job failure. Tests: a new abort case in core TestCreateTransaction, and a byte-identical TestStagedSparkTable added to all four Spark version trees. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
StagedSparkTable.abortStagedChanges()was an empty// TODO: clean up. Spark calls it to roll back an atomic CTAS/RTAS, and it is also used by the snapshot/migrate actions. Iceberg's internal transaction cleanup only runs whencommitTransaction()is actually invoked and then fails. When a failure happens after the staged write but beforecommitStagedChanges()is called (for example,SnapshotTableSparkAction/MigrateTableSparkActionfailing while importing data, or an interrupted CTAS), nothing cleaned the manifest list and manifests already written into the uncommitted transaction. For a staged CREATE the table is never registered, so those orphans have no table metadata pointing at them and are unreachable byremoveOrphanFiles— they leak permanently.Changes
default void abortTransaction()to theTransactionAPI. The default is a documented no-op, so existing implementations are unaffected.BaseTransactionto run the existingcleanUp()(cleanAllUpdates()+deleteUncommittedFiles()) — the same cleanup Iceberg already performs when a create/replace transaction's own commit fails.CommitCallbackTransaction; the post-commit callback is intentionally not run on abort.StagedSparkTable.abortStagedChanges()totransaction.abortTransaction()in all supported Spark versions (3.4, 3.5, 4.0, 4.1).java.method.addedToInterfaceentry to.palantir/revapi.yml.A no-op default (rather than throwing) is used because
abortTransaction()runs fromcatch/finallyblocks where a secondary exception would mask the original failure. The underlying deletion is already best-effort and idempotent (CatalogUtil.deleteFileswallowsNotFoundException,cleanAllUpdates()suppresses failures), so calling abort after a failedcommitStagedChanges()is safe.Out of scope
Executor-written data files in the write-succeeds-then-commit-fails path are not deleted here. That matches Iceberg's existing create-transaction behavior, and those files are handled by
SparkWrite.abort()on write-job failure.Testing
TestCreateTransaction(runs across all format-version templates): stages a create transaction, performs an append, asserts the manifest and manifest list exist, callsabortTransaction(), asserts they are removed and the table is not created, and verifies a secondabortTransaction()does not throw.TestStagedSparkTableadded to all four Spark version trees: stages a CREATE viaSparkCatalog.stageCreate, routes an append into the staged transaction, callsabortStagedChanges(), and asserts the uncommitted manifest/manifest-list files are deleted and the table is not created. The session-catalog parameterization is skipped (it producesRollbackStagedTable, notStagedSparkTable). Per Spark version: 3 configs (hive/hadoop/rest) pass, 1 skipped, 0 failures../gradlew revApiCheck,spotlessApply -DallModules, and the tests above all pass.🤖 Generated with Claude Code