CEP-45: Lost witness marker race by bdeggleston · Pull Request #4892 · apache/cassandra

bdeggleston · 2026-06-17T20:17:49Z

Don't truncate journal segments until witnessed offsets they contain are flushed. Also moves MutationTrackingService startup to after the commit log is replayed
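The ordering this PR enforces can be sketched as a small model. It assumes a segment is safe to truncate only after two things have happened: an sstable flush has covered its mutations, and the witnessed offsets it contains have been persisted. All names below (SegmentTracker, ClearReplaySnapshot and their methods) are hypothetical. They are only loosely modeled on the PendingClearReplay class and the pendingClearReplaySize() metric that appear in the diff, and this is not the MutationJournal implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical immutable snapshot of segments whose needs-replay flag may be cleared.
final class ClearReplaySnapshot
{
    final Set<Long> segments;

    ClearReplaySnapshot(Set<Long> segments)
    {
        this.segments = Set.copyOf(segments);
    }
}

// Hypothetical model of the truncation rule; not Cassandra's MutationJournal.
final class SegmentTracker
{
    private final Map<Long, Boolean> needsReplay = new HashMap<>();
    // Segments already covered by an sstable flush that are still waiting
    // for their witnessed offsets to be persisted.
    private final Set<Long> pendingClearReplay = new HashSet<>();

    synchronized void addSegment(long id)
    {
        needsReplay.put(id, true);
    }

    // A flush covering the segment does NOT clear the flag directly; it only queues the segment.
    synchronized void onSstableFlushed(long id)
    {
        if (needsReplay.containsKey(id))
            pendingClearReplay.add(id);
    }

    // Taken before persisting witnessed offsets, so segments queued while the
    // offsets are being written are not cleared by this round.
    synchronized ClearReplaySnapshot snapshotPending()
    {
        return new ClearReplaySnapshot(pendingClearReplay);
    }

    // Called only after the witnessed offsets have been persisted.
    synchronized void clearReplay(ClearReplaySnapshot snapshot)
    {
        for (long id : snapshot.segments)
            needsReplay.put(id, false);
        pendingClearReplay.removeAll(snapshot.segments);
    }

    synchronized int pendingClearReplaySize()
    {
        return pendingClearReplay.size();
    }

    // A segment may be truncated only once it no longer needs replay.
    synchronized boolean canTruncate(long id)
    {
        return Boolean.FALSE.equals(needsReplay.get(id));
    }
}
```

In this model a persisting task would call snapshotPending(), write the offsets, and only then call clearReplay(snapshot). A crash between those last two steps leaves the segments marked as needing replay, which is the safe direction.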


frankgh

I've added a couple of comments, but the patch looks good in general.

frankgh · 2026-06-18T17:42:48Z

        );
+        pendingClearReplaySize = Metrics.register(
+                factory.createMetricName("PendingClearReplaySize"),
+                () -> MutationJournal.instance().pendingClearReplaySize()


Do we need to worry about the case where mutation tracking (MT) is disabled, and handle the IllegalStateException thrown when the instance is null?
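One way to address this is sketched below. It assumes the gauge is evaluated lazily on every read, checks whether the singleton exists before dereferencing it, and reports 0 otherwise. DemoJournal, enable() and isEnabled() are hypothetical stand-ins, and LongSupplier stands in for the metrics library's gauge type. Catching the IllegalStateException would also work, but checking first avoids throwing on every metrics read.

```java
import java.util.function.LongSupplier;

// Hypothetical singleton that only exists when mutation tracking is enabled.
final class DemoJournal
{
    private static volatile DemoJournal instance; // null while disabled; never reset once set

    private final long pendingClearReplaySize;

    private DemoJournal(long pendingClearReplaySize)
    {
        this.pendingClearReplaySize = pendingClearReplaySize;
    }

    static void enable(long pendingClearReplaySize)
    {
        instance = new DemoJournal(pendingClearReplaySize);
    }

    static boolean isEnabled()
    {
        return instance != null;
    }

    static DemoJournal instance()
    {
        DemoJournal journal = instance;
        if (journal == null)
            throw new IllegalStateException("mutation tracking is disabled");
        return journal;
    }

    long pendingClearReplaySize()
    {
        return pendingClearReplaySize;
    }
}

final class DemoGauges
{
    // Reports 0 instead of throwing when mutation tracking is disabled.
    // The check-then-read is safe here only because instance is never reset to null.
    static LongSupplier pendingClearReplaySize()
    {
        return () -> DemoJournal.isEnabled() ? DemoJournal.instance().pendingClearReplaySize() : 0L;
    }
}
```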

frankgh · 2026-06-18T17:43:28Z

+    // opaque / immutable list of segments that we should clear the needs-replay flag on
+    public static class PendingClearReplay
+    {
+        private ImmutableSet<Long> segments;


NIT: can we make this final?

Suggested change

private ImmutableSet<Long> segments;

private final ImmutableSet<Long> segments;

frankgh · 2026-06-18T21:29:26Z


    private void shutdownBlocking() throws InterruptedException
    {
        ClusterMetadataService.instance().log().removeListener(tcmListener);


Should we conditionally remove the listener here, only if the service was started?

Suggested change

boolean wasStarted;
synchronized (this)
{
    wasStarted = started;
    if (wasStarted)
        ClusterMetadataService.instance().log().removeListener(tcmListener);
}

Additionally, we need to synchronize access to the volatile started variable.

bdeggleston: No, it's just a no-op if the tcmListener isn't registered.
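The no-op argument assumes listener removal is idempotent. If the log's listener collection behaves like a standard Java collection's remove(Object) (an assumption; the TCM log implementation isn't shown in this PR), unconditional removal on shutdown is safe even when start() never registered the listener. The hypothetical ListenerRegistry below illustrates this.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical registry; assumes removal behaves like Collection.remove(Object).
final class ListenerRegistry<L>
{
    private final List<L> listeners = new CopyOnWriteArrayList<>();

    void addListener(L listener)
    {
        listeners.add(listener);
    }

    // Returns false and leaves the registry unchanged if the listener was never
    // added, so a shutdown path can call this without checking whether start() ran.
    boolean removeListener(L listener)
    {
        return listeners.remove(listener);
    }

    int size()
    {
        return listeners.size();
    }
}
```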

frankgh · 2026-06-18T21:30:00Z

        executor.awaitTermination(1, TimeUnit.MINUTES);
+        // attempt to persist offsets and mark segments as
+        // not needing replay one last time before shutdown
+        if (started)


Suggested change

if (started)

if (wasStarted)

frankgh · 2026-06-18T21:30:49Z

        ClusterMetadataService.instance().log().removeListener(tcmListener);
        activeReconciler.shutdownBlocking();
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);


Should we log if we fail to shut down here?

Suggested change

executor.awaitTermination(1, TimeUnit.MINUTES);

if (!executor.awaitTermination(1, TimeUnit.MINUTES))
{
    logger.warn("Mutation tracking executor did not terminate within 1 minute; forcing shutdown");
}
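A common shape for this, adapted from the pattern in the ExecutorService Javadoc, is sketched below. The suggested log message says "forcing shutdown", so this sketch also calls shutdownNow() after the timeout so that the behavior matches the message; whether the real shutdown path should interrupt in-flight work is a separate design choice. ExecutorShutdown and shutdownGracefully are hypothetical names, and java.util.logging stands in for the project's logger.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

final class ExecutorShutdown
{
    private static final Logger logger = Logger.getLogger(ExecutorShutdown.class.getName());

    // Returns true if the executor terminated on its own within the timeout.
    static boolean shutdownGracefully(ExecutorService executor, long timeout, TimeUnit unit)
    {
        executor.shutdown(); // stop accepting new tasks; already-queued tasks still run
        try
        {
            if (executor.awaitTermination(timeout, unit))
                return true;
            logger.warning("Executor did not terminate within " + timeout + " " + unit + "; forcing shutdown");
            executor.shutdownNow(); // interrupt running tasks and drop queued ones
            return false;
        }
        catch (InterruptedException e)
        {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return false;
        }
    }
}
```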

frankgh · 2026-06-18T21:46:24Z

+     * To improve startup, we periodically save our view of mutation ids that we've witnessed to disk as part of this
+     * class. Any ids witnessed since the last time this class was run are reconstructed by replaying the journal.
+     *
+     * However, if an sstable is flushed is after the most recent LogStatePersister run, AND it marks a segment as no


NIT:

Suggested change

* However, if an sstable is flushed is after the most recent LogStatePersister run, AND it marks a segment as no

* However, if an sstable is flushed after the most recent LogStatePersister run, AND it marks a segment as no

frankgh · 2026-06-18T21:51:59Z

+            TableMetadata table = Schema.instance.getTableMetadata(keyspaceName, tableName);
+            DecoratedKey dk = Murmur3Partitioner.instance.decorateKey(ByteBufferUtil.bytes(key));
+            MutationSummary summary = MutationTrackingService.instance().createSummaryForKey(dk, table.id, false);
+            if (summary.size() == 0)


NIT:

Suggested change

if (summary.size() == 0)

if (summary.isEmpty())

frankgh

+1 looks good to me

bdeggleston · 2026-06-20T05:12:19Z

Just pushed up a small test fix.

Unlike normal writes, mutation tracking no-ops a write if we’ve already seen it, which is a reasonable optimization when we’re tracking each write. Unfortunately, this can bite us on startup. If witnessed offsets are flushed to disk before the memtables containing those mutations are flushed to sstables, then on startup mutation tracking will think it has already seen all of those mutations and won’t write them to the memtable, losing a bunch of data in the process.

The fix is pretty simple and just does what commit log replay does. Commit log replay applies its mutations with makeDurable set to false, which means apply to the memtable but not the commit log. So I updated applyInternalTracked to also take a makeDurable flag. If this is false, we now skip making the mutation durable, and we apply the mutation to the memtable whether we’ve seen it before or not.
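The behavior described above can be modeled in a few lines. This is a hypothetical sketch: TrackedStore and applyTracked are illustrative names, not Cassandra's applyInternalTracked signature, and it assumes mutation ids are plain longs and that the durable log and memtable can be represented as lists. It shows why the replay path has to bypass the already-witnessed check.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the fix; not Cassandra's Mutation/Keyspace API.
final class TrackedStore
{
    final Set<Long> witnessed = new HashSet<>();     // may be restored from persisted offsets on startup
    final List<Long> durableLog = new ArrayList<>();
    final List<Long> memtable = new ArrayList<>();

    void applyTracked(long mutationId, boolean makeDurable)
    {
        if (makeDurable)
        {
            // Normal write path: a mutation we've already witnessed is a no-op.
            if (!witnessed.add(mutationId))
                return;
            durableLog.add(mutationId);
            memtable.add(mutationId);
            return;
        }
        // Replay path (makeDurable == false): the mutation is already durable, so
        // don't log it again, and apply it to the memtable even if the persisted
        // witness state claims we've already seen it.
        witnessed.add(mutationId);
        memtable.add(mutationId);
    }
}
```

In this model, routing the replayed ids through the makeDurable == true branch instead reproduces the bug: ids that were witnessed before the crash are skipped and never reach the memtable.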

CEP-45: Lost witness marker race

6a4f54f

Don't truncate journal segments until witnessed offsets they contain are flushed. Also moves MutationTrackingService startup to after the commit log is replayed

bdeggleston requested a review from frankgh June 17, 2026 20:17

frankgh reviewed Jun 18, 2026


review feedback

6f9cc80

frankgh approved these changes Jun 18, 2026


fix startup when witnessed mutations are ahead of journal replay

e8b3075

bdeggleston wants to merge 3 commits into apache:cep-45-mutation-tracking from bdeggleston:C21443-lost-witness-all


2 participants

