feat(transaction): prune statistics files for expired snapshots #2667

Open
dhruvarya-db wants to merge 1 commit into apache:main from dhruvarya-db:feat-expire-snapshots-prune-statistics

@dhruvarya-db dhruvarya-db commented Jun 17, 2026

Contributor

Which issue does this PR close?

Part of #2145. Extends #2591 (the ExpireSnapshotsAction) and #2664 (the history.expire.* defaults).

What changes are included in this PR?

When ExpireSnapshotsAction expires snapshots, it now also drops the statistics and partition-statistics metadata entries tied to each expired snapshot, mirroring Java RemoveSnapshots (TableMetadata.Builder.removeSnapshots, which calls removeStatistics/removePartitionStatistics for each removed snapshot). For every expired snapshot that has a statistics entry, commit() emits a RemoveStatistics update; likewise RemovePartitionStatistics for a partition-statistics entry. Only entries that actually exist produce an update, matching Java's behavior of recording a change only when an entry is present.

This change is metadata-only: physical deletion of the Puffin files those entries point at belongs to the file-cleanup maintenance operation (mirroring Java's FileCleanupStrategy), not this PR.
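As a rough illustration of the only-when-present rule described above, here is a minimal, self-contained sketch. It is not the code in this PR: `Metadata`, `Update`, and `statistics_removals` are hypothetical stand-ins, and the real iceberg-rust types and signatures may differ.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the two metadata updates named in this PR.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Update {
    RemoveStatistics { snapshot_id: i64 },
    RemovePartitionStatistics { snapshot_id: i64 },
}

/// Hypothetical, simplified table metadata: each map goes from a snapshot id
/// to the path of the statistics file registered for that snapshot.
#[derive(Debug, Default)]
pub struct Metadata {
    pub statistics: HashMap<i64, String>,
    pub partition_statistics: HashMap<i64, String>,
}

impl Metadata {
    /// Returns the statistics entry for `snapshot_id`, if one exists.
    pub fn statistics_for_snapshot(&self, snapshot_id: i64) -> Option<&String> {
        self.statistics.get(&snapshot_id)
    }

    /// Returns the partition-statistics entry for `snapshot_id`, if one exists.
    pub fn partition_statistics_for_snapshot(&self, snapshot_id: i64) -> Option<&String> {
        self.partition_statistics.get(&snapshot_id)
    }
}

/// Builds the removal updates for the expired snapshots, in input order.
/// An update is emitted only when the corresponding entry is present, so a
/// snapshot without statistics contributes nothing.
pub fn statistics_removals(metadata: &Metadata, expired_snapshot_ids: &[i64]) -> Vec<Update> {
    let mut updates = Vec::new();
    for &snapshot_id in expired_snapshot_ids {
        if metadata.statistics_for_snapshot(snapshot_id).is_some() {
            updates.push(Update::RemoveStatistics { snapshot_id });
        }
        if metadata.partition_statistics_for_snapshot(snapshot_id).is_some() {
            updates.push(Update::RemovePartitionStatistics { snapshot_id });
        }
    }
    updates
}
```

In this sketch, entries belonging to snapshots that are not in `expired_snapshot_ids` are never inspected, so a retained snapshot keeps its statistics. No file is touched: the function only produces metadata updates.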

Are these changes tested?

Yes. New unit tests cover: an expired snapshot's statistics and partition-statistics entries are removed while a retained snapshot keeps its own; an expired snapshot with no statistics emits no removal; and only the statistics variant that is actually present is removed.

When ExpireSnapshotsAction expires snapshots, also drop the statistics and
partition-statistics metadata entries tied to each expired snapshot, mirroring
Java RemoveSnapshots.removeSnapshots. This is metadata-only: the puffin files
those entries point at are deleted by the higher-level file-cleanup operation.

@viirya viirya left a comment

Member

Verified the design holds up. The key subtlety: commit() builds the updates vec and hands it straight to ActionCommit::new(...), so these RemoveStatistics / RemovePartitionStatistics updates do not go through TableMetadataBuilder::remove_statistics (which is itself only-when-present, gated on if previous.is_some()). That means the explicit statistics_for_snapshot(...).is_some() / partition_statistics_for_snapshot(...).is_some() guards here are necessary, not redundant — without them, expiring a snapshot with no statistics would emit a spurious RemoveStatistics. This matches Java RemoveSnapshots, which records a removal only when an entry exists.

test_only_present_statistics_variant_is_removed pins exactly that invariant (snapshot with statistics but no partition-statistics → only RemoveStatistics emitted), and the three tests together cover present-both / present-neither / present-one. Metadata-only scope (puffin file deletion left to file-cleanup) is the right boundary and mirrors Java's FileCleanupStrategy split.

Pulled the branch and ran it: all 32 transaction::expire_snapshots tests pass, clippy + rustfmt clean.

LGTM. (Note this stacks on #2591, which should land first.)
