
Only one retention cycle in progress at a time #2797

Merged
pjfanning merged 3 commits into apache:main from He-Pin:port-single-retention-cycle
Apr 4, 2026

Conversation

@He-Pin
Member

@He-Pin He-Pin commented Mar 28, 2026

Motivation

Port the upstream "only one retention cycle in progress at a time" guard from akka/akka-core commit 57b750a3dc (which is now Apache licensed) to Pekko's event-sourced persistence.

When SnapshotCountRetentionCriteria is configured, a full retention cycle (snapshot → delete events → delete snapshots) can take significant time. If events are persisted faster than retention completes, overlapping cycles can race against each other, causing non-deterministic ordering of snapshot/event deletions and test flakiness.

Modification

Core change — BehaviorSetup:

  • Add retentionInProgress flag with lifecycle tracking methods (retentionProgressSaveSnapshotStarted, retentionProgressSaveSnapshotEnded, retentionProgressDeleteEventsStarted, retentionProgressDeleteEventsEnded, retentionProgressDeleteSnapshotsStarted, retentionProgressDeleteSnapshotsEnded)
  • Each method logs the retention lifecycle at DEBUG level for diagnostics
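
A minimal sketch of how such a flag and its lifecycle methods could fit together. This is not the Pekko source: the class name `RetentionProgress`, the simplified signatures, the `debug` callback standing in for the logger, and the rule that a failed step ends the cycle are all assumptions made for illustration.

```scala
object RetentionProgressSketch {

  // Hypothetical stand-in for the state added to BehaviorSetup; not the Pekko source.
  // `debug` replaces the real logger so the sketch has no dependencies.
  final class RetentionProgress(debug: String => Unit) {

    // Assumed to be touched only from the actor's own message processing,
    // so a plain var is enough here.
    private var retentionInProgress = false

    def isRetentionInProgress(): Boolean = retentionInProgress

    def retentionProgressSaveSnapshotStarted(seqNr: Long): Unit = {
      debug(s"Starting retention at seqNr [$seqNr], saving snapshot.")
      retentionInProgress = true
    }

    def retentionProgressSaveSnapshotEnded(seqNr: Long, success: Boolean): Unit =
      stepEnded("saving snapshot", seqNr, success, lastStep = false)

    def retentionProgressDeleteEventsStarted(seqNr: Long): Unit =
      debug(s"Retention at seqNr [$seqNr], deleting events.")

    def retentionProgressDeleteEventsEnded(seqNr: Long, success: Boolean): Unit =
      stepEnded("deleting events", seqNr, success, lastStep = false)

    def retentionProgressDeleteSnapshotsStarted(seqNr: Long): Unit =
      debug(s"Retention at seqNr [$seqNr], deleting snapshots.")

    def retentionProgressDeleteSnapshotsEnded(seqNr: Long, success: Boolean): Unit =
      stepEnded("deleting snapshots", seqNr, success, lastStep = true)

    // Assumption: a failed step, or the end of the last step, completes the cycle.
    private def stepEnded(step: String, seqNr: Long, success: Boolean, lastStep: Boolean): Unit = {
      val outcome = if (success) "succeeded" else "failed"
      debug(s"Retention at seqNr [$seqNr], $step $outcome.")
      if (!success || lastStep) retentionInProgress = false
    }
  }

  def main(args: Array[String]): Unit = {
    val progress = new RetentionProgress(msg => println(msg))
    progress.retentionProgressSaveSnapshotStarted(3L)
    progress.retentionProgressSaveSnapshotEnded(3L, success = true)
    progress.retentionProgressDeleteEventsStarted(3L)
    progress.retentionProgressDeleteEventsEnded(3L, success = true)
    progress.retentionProgressDeleteSnapshotsStarted(3L)
    progress.retentionProgressDeleteSnapshotsEnded(3L, success = true)
    println(s"in progress: ${progress.isRetentionInProgress()}")
  }
}
```

The sketch assumes all three steps run in every cycle. A real implementation would also have to end the cycle early when event deletion or snapshot deletion is disabled by the retention criteria.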

Running.scala:

  • When SnapshotWithRetention triggers, check setup.isRetentionInProgress()
  • If a retention cycle is already running, skip both the snapshot and its retention (intentional: saving a snapshot without its associated retention would accumulate orphaned snapshots that are never cleaned up)
  • The next retention cycle at a higher seqNr will cover the range of the skipped one
  • Log at DEBUG level (not INFO) to avoid noise under load
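
The guard can be illustrated with a small, dependency-free simulation. The types `Trigger`, `CycleCompleted` and `Decision` are hypothetical, and the `retentionInProgress` argument plays the role of `setup.isRetentionInProgress()`; the actual code in Running.scala is structured differently.

```scala
object RetentionGuardSketch {

  sealed trait Decision
  case object StartSnapshotAndRetention extends Decision
  case object SkipSnapshotAndRetention extends Decision

  sealed trait Event
  // SnapshotWithRetention fired after persisting the event with this seqNr.
  final case class Trigger(seqNr: Long) extends Event
  // The last step of the running retention cycle finished.
  case object CycleCompleted extends Event

  // Hypothetical guard: skip snapshot and retention together while a cycle is running.
  def onSnapshotWithRetention(retentionInProgress: Boolean): Decision =
    if (retentionInProgress) SkipSnapshotAndRetention else StartSnapshotAndRetention

  // Replays a sequence of events and records the decision taken for each trigger.
  def simulate(events: List[Event]): List[(Long, Decision)] = {
    var inProgress = false
    events.flatMap {
      case Trigger(seqNr) =>
        val decision = onSnapshotWithRetention(inProgress)
        if (decision == StartSnapshotAndRetention) inProgress = true
        Some(seqNr -> decision)
      case CycleCompleted =>
        inProgress = false
        None
    }
  }

  def main(args: Array[String]): Unit = {
    // The cycle started at seqNr 3 is still running when seqNr 6 triggers.
    val events = List(Trigger(3L), Trigger(6L), CycleCompleted, Trigger(9L))
    simulate(events).foreach { case (seqNr, decision) => println(s"seqNr $seqNr: $decision") }
  }
}
```

In this run the trigger at seqNr 6 is skipped, and the cycle started at seqNr 9 covers the range the skipped one would have handled.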

ExternalInteractions.scala:

  • Simplify internalDeleteSnapshots — always delete from minSequenceNr = 0L instead of tracking a windowed lower bound. This simplifies the logic and is safe because Pekko's built-in snapshot stores handle range deletions efficiently.
  • Remove the fromSequenceNr parameter
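
To see why a fixed lower bound of 0 matters once cycles can be skipped, here is a sketch that models the snapshot store as a set of sequence numbers. The windowed formula (`upper - keepNSnapshots * snapshotEveryNEvents`) is an assumption about the removed `deleteLowerSequenceNr`, and the stored sequence numbers are made up for the example.

```scala
object SnapshotDeletionSketch {

  // Removes every snapshot whose sequence number lies in [minSequenceNr, maxSequenceNr].
  def deleteRange(snapshots: Set[Long], minSequenceNr: Long, maxSequenceNr: Long): Set[Long] =
    snapshots.filterNot(seqNr => seqNr >= minSequenceNr && seqNr <= maxSequenceNr)

  // Assumed shape of the removed windowed lower bound; not taken from the Pekko source.
  def windowedLowerBound(upper: Long, keepNSnapshots: Int, snapshotEveryNEvents: Int): Long =
    math.max(0L, upper - keepNSnapshots.toLong * snapshotEveryNEvents)

  def main(args: Array[String]): Unit = {
    // Snapshot every 3 events, keep 2. The cycles at seqNr 15, 18 and 21 were skipped,
    // so the snapshot at 9 was never deleted and no snapshots exist between 12 and 24.
    val stored = Set(9L, 12L, 24L)
    val upper = 18L // upper bound used by the cycle at seqNr 24

    val windowed = deleteRange(stored, windowedLowerBound(upper, 2, 3), upper)
    val fromZero = deleteRange(stored, 0L, upper)

    println(s"windowed lower bound leaves: ${windowed.toList.sorted}")
    println(s"lower bound 0 leaves:        ${fromZero.toList.sorted}")
  }
}
```

With the windowed bound the snapshot at 9 falls below the window and is left behind; with a lower bound of 0 only the snapshot at 24 remains.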

RetentionCriteriaImpl.scala:

  • Remove deleteLowerSequenceNr method (no longer needed)

Tests:

  • EventSourcedBehaviorRetentionSpec — update all expectDeleteSnapshotCompleted calls to single-param form
  • RetentionCriteriaSpec — remove deleteLowerSequenceNr assertions, reformat long line to multi-line for readability

Result

  • No more overlapping retention cycles, preventing non-deterministic snapshot/event deletion ordering
  • All 11 retention spec tests pass ✅
  • All 4 RetentionCriteriaSpec tests pass ✅

@He-Pin He-Pin marked this pull request as ready for review March 28, 2026 06:58
@He-Pin He-Pin added this to the 2.0.0-M2 milestone Mar 28, 2026
@He-Pin He-Pin force-pushed the port-single-retention-cycle branch from a6ef064 to 9519350 Compare March 28, 2026 07:43
@He-Pin He-Pin requested review from Roiocam, Copilot and pjfanning March 28, 2026 08:03
Contributor

Copilot AI left a comment


Pull request overview

This PR updates the typed persistence retention flow to avoid overlapping retention cycles (snapshot/event deletion), which could otherwise interfere with each other.

Changes:

  • Track retention-cycle progress in BehaviorSetup and add a guard in Running to avoid starting a new retention cycle while a previous one is still in progress.
  • Simplify snapshot deletion to delete snapshots up to a max sequence number (dropping the previous lower-bound window logic).
  • Update retention-related tests to match the adjusted deletion criteria and retention behavior.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| persistence-typed/src/main/scala/org/apache/pekko/persistence/typed/internal/Running.scala | Adds the “only one retention at a time” guard and wires retention progress tracking into snapshot/event/snapshot deletion flow. |
| persistence-typed/src/main/scala/org/apache/pekko/persistence/typed/internal/BehaviorSetup.scala | Introduces retentionInProgress state and helper methods to track retention lifecycle across async steps. |
| persistence-typed/src/main/scala/org/apache/pekko/persistence/typed/internal/ExternalInteractions.scala | Changes snapshot deletion to always use minSequenceNr = 0L and updates logging/signature accordingly. |
| persistence-typed/src/main/scala/org/apache/pekko/persistence/typed/internal/RetentionCriteriaImpl.scala | Removes now-unused deleteLowerSequenceNr logic from internal retention criteria implementation. |
| persistence-typed/src/main/scala/org/apache/pekko/persistence/typed/internal/EventSourcedBehaviorImpl.scala | Updates BehaviorSetup construction to pass the new retentionInProgress parameter. |
| persistence-typed/src/test/scala/org/apache/pekko/persistence/typed/internal/RetentionCriteriaSpec.scala | Adjusts expectations to match removal of lower-bound deletion logic. |
| persistence-typed-tests/src/test/scala/org/apache/pekko/persistence/typed/scaladsl/EventSourcedBehaviorWatchSpec.scala | Updates test BehaviorSetup construction for the new parameter. |
| persistence-typed-tests/src/test/scala/org/apache/pekko/persistence/typed/scaladsl/EventSourcedBehaviorRetentionSpec.scala | Updates assertions to match snapshot deletion criteria changes (min sequence no longer asserted). |


@He-Pin He-Pin force-pushed the port-single-retention-cycle branch 4 times, most recently from 8cac580 to bd445ef Compare March 28, 2026 12:47
Member

@pjfanning pjfanning left a comment


lgtm - part of #2730

@pjfanning
Member

compilation issues

[03-30 13:00:21.923] [error] -- [E007] Type Mismatch Error: /home/runner/work/pekko/pekko/persistence-typed-tests/src/test/scala/org/apache/pekko/persistence/typed/scaladsl/EventSourcedBehaviorRetentionSpec.scala:292:56 
[03-30 13:00:21.923] [error] 292 |      snapshotSignalProbe.expectDeleteSnapshotCompleted(6, 0)
[03-30 13:00:21.923] [error]     |                                                        ^^^^
[03-30 13:00:21.923] [error]     |                                                  Found:    (Int, Int)
[03-30 13:00:21.924] [error]     |                                                  Required: Long
[03-30 13:00:21.924] [error]     |
[03-30 13:00:21.924] [error]     | longer explanation available when compiling with `-explain`
[03-30 13:00:22.234] [error] -- [E007] Type Mismatch Error: /home/runner/work/pekko/pekko/persistence-typed-tests/src/test/scala/org/apache/pekko/persistence/typed/scaladsl/EventSourcedBehaviorRetentionSpec.scala:298:56 
[03-30 13:00:22.234] [error] 298 |      snapshotSignalProbe.expectDeleteSnapshotCompleted(9, 3)
[03-30 13:00:22.234] [error]     |                                                        ^^^^
[03-30 13:00:22.235] [error]     |                                                  Found:    (Int, Int)
[03-30 13:00:22.235] [error]     |                                                  Required: Long
[03-30 13:00:22.235] [error]     |
[03-30 13:00:22.235] [error]     | longer explanation available when compiling with `-explain`
[03-30 13:00:23.289] [error] two errors found

Member

@pjfanning pjfanning left a comment


due to compile issue

He-Pin and others added 2 commits April 2, 2026 00:40
Track retention lifecycle steps with mutable retentionInProgress state
in BehaviorSetup. Key changes:

- Add retentionInProgress flag and 6 progress tracking methods with
  detailed debug logging to BehaviorSetup.
- Skip new retention cycle when previous one has not completed yet,
  logging at INFO level. Next retention will cover skipped retention.
- Simplify internalDeleteSnapshots to always use minSequenceNr=0,
  preventing leftover snapshots when retention is skipped.
- Remove now-unnecessary deleteLowerSequenceNr from
  SnapshotCountRetentionCriteriaImpl.
- Fix upstream logging placeholder mismatch bug in
  retentionProgressDeleteEventsEnded (2 placeholders, 1 argument).

The retention process for SnapshotCountRetentionCriteria:
1. Save snapshot when shouldSnapshotAfterPersist returns
   SnapshotWithRetention.
2. Delete events (when deleteEventsOnSnapshot=true), in background.
3. Delete snapshots (when isOnlyOneSnapshot=false), in background.
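
The three steps above can be sketched as a function from the retention settings to the steps one cycle performs. The `Criteria` case class and `Step` objects are hypothetical, and treating `isOnlyOneSnapshot` as `keepNSnapshots == 1` is an assumption rather than something stated here.

```scala
object RetentionStepsSketch {

  // Hypothetical mirror of the snapshot-count retention settings.
  final case class Criteria(snapshotEveryNEvents: Int, keepNSnapshots: Int, deleteEventsOnSnapshot: Boolean) {
    // Assumption: "only one snapshot" means exactly one snapshot is kept.
    def isOnlyOneSnapshot: Boolean = keepNSnapshots == 1
  }

  sealed trait Step
  case object SaveSnapshot extends Step
  case object DeleteEvents extends Step
  case object DeleteSnapshots extends Step

  // Steps of one retention cycle, in the order listed in the commit message.
  def retentionSteps(criteria: Criteria): List[Step] = {
    val deleteEvents = if (criteria.deleteEventsOnSnapshot) List(DeleteEvents) else Nil
    val deleteSnapshots = if (!criteria.isOnlyOneSnapshot) List(DeleteSnapshots) else Nil
    SaveSnapshot :: (deleteEvents ++ deleteSnapshots)
  }

  def main(args: Array[String]): Unit = {
    println(retentionSteps(Criteria(3, 2, deleteEventsOnSnapshot = true)))
    println(retentionSteps(Criteria(3, 1, deleteEventsOnSnapshot = false)))
  }
}
```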

Upstream: akka/akka-core@57b750a3dc
Cherry-picked from akka/akka-core v2.8.0, which is now Apache licensed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ationale comments

- Reformat RetentionCriteriaSpec expected list to multi-line for readability
- Change 'Skipping retention' log level from INFO to DEBUG to avoid log noise
- Add design rationale comment explaining why snapshot+retention are skipped
  together (prevents orphaned snapshots that would never be cleaned up)
- Add Scaladoc explaining why minSequenceNr=0L is used (simplifies logic,
  safe for built-in snapshot stores)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@He-Pin He-Pin force-pushed the port-single-retention-cycle branch from fe8ea15 to 259edc8 Compare April 1, 2026 16:41
@He-Pin He-Pin requested a review from pjfanning April 1, 2026 16:45
@pjfanning
Member

compile issue in tests listed above is still there

and also this

[04-01 16:57:50.456] [error] /home/runner/work/pekko/pekko/persistence-typed-tests/src/test/scala/org/apache/pekko/persistence/typed/scaladsl/EventSourcedBehaviorRetentionSpec.scala:138:15: private method nextPid in class EventSourcedBehaviorRetentionSpec is never used
[04-01 16:57:50.456] [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unused-privates, site=org.apache.pekko.persistence.typed.scaladsl.EventSourcedBehaviorRetentionSpec.nextPid
[04-01 16:57:50.457] [error]   private def nextPid(): PersistenceId = PersistenceId.ofUniqueId(s"c${pidCounter.incrementAndGet()}")
[04-01 16:57:50.458] [error]               ^
[04-01 16:57:51.402] [error] three errors found

…on spec

Lines 292 and 298 still used the old two-argument form after the API
change to single-argument expectDeleteSnapshotCompleted(Long).
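
For illustration, a sketch of the call shape after the change. The helper below is a hypothetical stand-in with the same name and a single `Long` parameter; the real probe method in the spec does more than return its argument.

```scala
object ExpectDeleteSketch {

  // Hypothetical stand-in for the single-argument probe helper used in the spec.
  def expectDeleteSnapshotCompleted(maxSequenceNr: Long): Long = maxSequenceNr

  def main(args: Array[String]): Unit = {
    // An Int literal is widened to Long, so the single-argument call compiles.
    println(expectDeleteSnapshotCompleted(6))
    println(expectDeleteSnapshotCompleted(9))

    // The old two-argument form no longer compiles. Scala 3 appears to read the
    // two arguments as one (Int, Int) value, which is what the CI log reports:
    // expectDeleteSnapshotCompleted(6, 0)
  }
}
```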
@He-Pin
Member Author

He-Pin commented Apr 4, 2026

@pjfanning Please take a look

Member

@pjfanning pjfanning left a comment


lgtm

@pjfanning pjfanning merged commit 38e7217 into apache:main Apr 4, 2026
9 checks passed
4 participants