Skip to content

consensus: persist AppQC, blocks, and CommitQCs with async persistence#2896

Open
wen-coding wants to merge 53 commits intomainfrom
wen/persist_appqc_and_blocks
Open

consensus: persist AppQC, blocks, and CommitQCs with async persistence#2896
wen-coding wants to merge 53 commits intomainfrom
wen/persist_appqc_and_blocks

Conversation

@wen-coding
Copy link
Contributor

@wen-coding wen-coding commented Feb 16, 2026

Summary

Crash-safe persistence for availability state (AppQC, signed lane proposals, and CommitQCs). All I/O is fully asynchronous — no disk operations on the critical path or under locks.

  • Extract consensus/persist/ sub-package: Generic A/B file persistence (Persister[T]) with crash-safe alternating writes. BlockPersister manages per-block files in a blocks/ subdirectory. CommitQCPersister manages per-road-index CommitQC files. No-op implementations (NewNoOpPersister, NewNoOpBlockPersister, NewNoOpCommitQCPersister) for test/disabled paths.
  • Generic Persister[T proto.Message] interface: Strongly-typed persistence API; concrete abPersister[T] handles A/B file strategy. NewNoOpPersister[T] silently discards writes. A/B suffixes are unexported; WriteRawFile helper for corruption tests.
  • Block-file persistence (persist/blocks.go): Each signed lane proposal stored as <lane_hex>_<blocknum>.pb. On load, blocks are sorted and truncated at the first gap. PersistBatch encapsulates the I/O: persist blocks, update tips, clean up old files. When noop, all disk I/O is skipped but cursor/tips tracking still works.
  • CommitQC persistence (persist/commitqcs.go): Each CommitQC stored as commitqc_<roadindex>.pb. On load, QCs are sorted and truncated at the first gap. Needed for reconstructing FullCommitQCs on restart. When noop, disk I/O is skipped but next index tracking still works.
  • Single persist goroutine with ordered writes: One background goroutine (always spawned — real or no-op) watches inner state directly (the in-memory state is the queue). collectPersistBatch acquires the lock, waits for new data, reads persistence cursors (inner.nextBlockToPersist, inner.nextCommitQCToPersist) directly from inner state, clamps past pruned entries, and collects the batch. I/O runs with no lock held. No channel, no backpressure. Write order: blocks → CommitQCs → AppQC → delete old CommitQCs — guaranteeing a matching CommitQC is always on disk before its AppQC, and the AppQC is persisted before old CommitQCs are deleted (crash-safe ordering).
  • Persistence cursors in inner state: nextBlockToPersist (per-lane) and nextCommitQCToPersist track how far persistence has progressed. Always initialized — the no-op persist goroutine bumps them immediately. Reconstructed from disk on restart. collectPersistBatch reads these cursors under lock (no parameter passing); markPersisted updates them after successful I/O.
  • Cursor clamp for prune safety: Between persist iterations (lock not held), prune can delete map entries. Cursors are clamped to q.first before reading to prevent nil pointer dereference. Regression test (TestStateWithPersistence) reliably catches this.
  • No-op persister pattern: When persistence is disabled (stateDir is None), newNoOpPersisters() creates no-op implementations for all three persisters. The persist goroutine always runs (same code path as production) — it just skips disk I/O and immediately bumps cursors. This eliminates all nil checks for disabled persistence and ensures tests exercise the exact same control flow as production.
  • persisters struct — pure I/O, always initialized: Groups the three disk persisters (appQC, blocks, commitQCs). Always present on State (not wrapped in Option). All inner state access goes through State methods (collectPersistBatch, markPersisted), keeping a clean separation between orchestration and I/O.
  • Gate consensus on CommitQC persistence: PushCommitQC no longer publishes latestCommitQC directly — the persist goroutine publishes it after writing to disk (or immediately for no-op persisters) via markPersisted. Since consensus subscribes to LastCommitQC() to advance views, it won't proceed until the CommitQC is durable — preventing CommitQC loss on crash.
  • Gate voting on block persistence (avail/subscriptions.go): RecvBatch only yields blocks below the nextBlockToPersist watermark, so votes are only signed for durably written blocks.
  • Wire persistence into availability state (avail/state.go): NewState accepts stateDir, initialises the A/B persister (for AppQC), BlockPersister, and CommitQCPersister, and loads persisted data on restart. Returns error for corrupt state.
  • Restore state on restart (avail/inner.go): Uses queue.reset() to set starting positions, then pushBack to reload entries. Finally calls inner.prune() with the persisted AppQC to advance all queues — same code path as runtime. Returns error for corrupt state (non-consecutive CommitQCs, AppQC without matching CommitQC on disk).
  • Clean up orphaned block files: DeleteBefore removes files for pruned blocks and orphaned lanes (from previous committees). Driven by the persist goroutine observing laneFirsts.
  • prune() returns (bool, error): Simplified from returning a laneFirsts map — callers only need to know if pruning occurred. The persist goroutine reads q.first directly.
  • Test injection via newState constructors: consensus/state.go exposes newState() accepting a custom Persister for test mocks (e.g. failPersister), avoiding fragile field mutation after construction. avail/state.go's NewState accepts stateDir as Option[string].
  • queue.reset() method: Clearly sets the starting position of an empty queue, replacing misleading prune() calls during initialization.
  • Latency TODO: Documents a potential positive feedback loop where slow block persistence causes batches to grow, which makes PersistBatch slower, which delays nextBlockToPersist and RecvBatch, which delays voting, which grows batches further. Mitigation (e.g. persisting one block at a time) deferred.

Ref: sei-protocol/sei-v3#512

Test plan

  • persist/blocks_test.go: load/store, gap truncation, DeleteBefore, orphaned lane cleanup, header mismatch, corrupt files
  • persist/commitqcs_test.go: load/store, gap truncation, DeleteBefore, corrupt files, mismatched index
  • persist/persist_test.go: A/B file crash safety, seq management, corrupt fallback, generic typed API
  • avail/state_test.go: fresh start, load AppQC, load blocks, load both, load commitQCs, load commitQCs with AppQC, corrupt data, headers returns ErrPruned for blocks before loaded range
  • avail/state_test.go (TestStateWithPersistence): end-to-end persist + prune race regression test (5 iterations with disk persistence; reliably catches cursor race without the clamp fix)
  • avail/inner_test.go: newInner with loaded state, newInner with all three (AppQC + CommitQCs + blocks), newInner error cases (non-consecutive CommitQCs, AppQC without matching CommitQC), nextBlockToPersist reconstruction, nextCommitQCToPersist reconstruction, votes queue advancement
  • avail/queue_test.go: newQueue, pushBack, reset, prune, stale prune, prune past next
  • consensus/inner_test.go: consensus inner persistence round-trip, persist error propagation via newState injection
  • data/state_test.go: data state tests

@github-actions
Copy link

github-actions bot commented Feb 16, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedFeb 27, 2026, 11:17 PM

@codecov
Copy link

codecov bot commented Feb 16, 2026

Codecov Report

❌ Patch coverage is 82.97362% with 71 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.53%. Comparing base (1c50d43) to head (0371ae1).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...mint/internal/autobahn/consensus/persist/blocks.go 78.46% 14 Missing and 14 partials ⚠️
sei-tendermint/internal/autobahn/avail/state.go 84.34% 9 Missing and 9 partials ⚠️
...t/internal/autobahn/consensus/persist/commitqcs.go 80.85% 9 Missing and 9 partials ⚠️
...ei-tendermint/internal/autobahn/consensus/state.go 58.33% 4 Missing and 1 partial ⚠️
sei-tendermint/internal/autobahn/avail/inner.go 94.73% 1 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##             main    #2896       +/-   ##
===========================================
+ Coverage   58.13%   77.53%   +19.39%     
===========================================
  Files        2113       20     -2093     
  Lines      174071     1749   -172322     
===========================================
- Hits       101204     1356    -99848     
+ Misses      63812      253    -63559     
+ Partials     9055      140     -8915     
Flag Coverage Δ
sei-chain-pr 79.31% <80.57%> (?)
sei-db 70.41% <ø> (+0.90%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
sei-tendermint/internal/autobahn/avail/queue.go 100.00% <100.00%> (ø)
...endermint/internal/autobahn/avail/subscriptions.go 86.66% <100.00%> (+0.30%) ⬆️
...ei-tendermint/internal/autobahn/consensus/inner.go 63.21% <100.00%> (ø)
...int/internal/autobahn/consensus/persist/persist.go 79.66% <100.00%> (ø)
sei-tendermint/internal/autobahn/avail/inner.go 92.30% <94.73%> (+0.87%) ⬆️
...ei-tendermint/internal/autobahn/consensus/state.go 83.03% <58.33%> (+0.10%) ⬆️
sei-tendermint/internal/autobahn/avail/state.go 77.13% <84.34%> (+3.38%) ⬆️
...t/internal/autobahn/consensus/persist/commitqcs.go 80.85% <80.85%> (ø)
...mint/internal/autobahn/consensus/persist/blocks.go 78.46% <78.46%> (ø)

... and 2097 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@wen-coding wen-coding force-pushed the wen/persist_appqc_and_blocks branch from ebf93df to f4a9c1e Compare February 17, 2026 04:50
@wen-coding wen-coding changed the title Port sei-v3 PR #512: persist AppQC and blocks to disk consensus: persist AppQC and blocks to disk Feb 18, 2026
@wen-coding wen-coding changed the title consensus: persist AppQC and blocks to disk consensus: persist AppQC and blocks, async block fsync Feb 18, 2026
@wen-coding wen-coding changed the title consensus: persist AppQC and blocks, async block fsync consensus: persist AppQC and blocks in avail Feb 18, 2026
wen-coding and others added 7 commits February 20, 2026 10:15
Extract generic A/B file persistence into a reusable consensus/persist/
sub-package and add block-file persistence for crash-safe availability
state recovery.

Changes:
- Move persist.go and persist_test.go into consensus/persist/ (git mv to
  preserve history), exporting Persister, NewPersister, WriteAndSync,
  SuffixA, SuffixB.
- Add persist/blocks.go: per-block file persistence using
  <lane_hex>_<blocknum>.pb files in a blocks/ subdirectory, with load,
  delete-before, and header-mismatch validation.
- Wire avail.NewState to accept stateDir, create A/B persister for
  AppQC and BlockPersister for signed lane proposals, and restore both
  on restart (contiguous block runs, queue alignment).
- Update avail/state.go to persist AppQC on prune and delete obsolete
  block files after each AppQC advance.
- Thread PersistentStateDir from consensus.Config through to
  avail.NewState.
- Expand consensus/inner.go doc comment with full persistence design
  (what, why, recovery, write behavior, rebroadcasting).
- Move TestRunOutputsPersistErrorPropagates to consensus/inner_test.go
  for proper package alignment.
- Add comprehensive tests for blocks persistence (empty dir, multi-lane,
  corrupt/mismatched skip, DeleteBefore, filename roundtrip).

Ref: sei-protocol/sei-v3#512
Co-authored-by: Cursor <cursoragent@cursor.com>
Move persisted data loading (AppQC deserialization and block loading)
into a dedicated function for readability.

Co-authored-by: Cursor <cursoragent@cursor.com>
Move block sorting, contiguous-prefix extraction, and gap truncation
from avail/inner.go into persist/blocks.go so all disk-recovery logic
lives in one place. This isolates storage concerns in the persistence
layer, simplifying newInner and preparing for a future storage backend
swap.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
PushBlock and ProduceBlock now add blocks to the in-memory queue
immediately and send a persist job to a background goroutine via a
buffered channel. The background writer fsyncs each block to disk
and advances a per-lane blockPersisted cursor under the inner lock.

RecvBatch gates on this cursor so votes are only signed for blocks
that have been durably written to disk. When persistence is disabled
(testing), the cursor is nil and RecvBatch falls back to bq.next.

Co-authored-by: Cursor <cursoragent@cursor.com>
newInner no longer takes a separate persistEnabled bool; loaded != nil
already implies persistence is enabled. Tests with loaded data now
correctly reflect this.

Co-authored-by: Cursor <cursoragent@cursor.com>
blockPersisted is reconstructed from disk on restart, not persisted
itself. Move its creation to just above the block restoration loop
(past the loaded==nil early return) so the code reads top-down.

Co-authored-by: Cursor <cursoragent@cursor.com>
@wen-coding wen-coding force-pushed the wen/persist_appqc_and_blocks branch from 05beddb to 2f0bbad Compare February 20, 2026 18:46
wen-coding and others added 6 commits February 20, 2026 10:48
Co-authored-by: Cursor <cursoragent@cursor.com>
Move persistCh, persistJob, and the writer loop from avail/State into
BlockPersister.Queue + BlockPersister.Run, so callers just call Queue()
and the persist layer owns the channel, buffer sizing, and drain loop.

Queue blocks with context to avoid holes in the sequential
blockPersisted cursor (which would permanently stall voting).
Call sites use utils.IgnoreAfterCancel to swallow shutdown errors.

Co-authored-by: Cursor <cursoragent@cursor.com>
…l/sei-chain into wen/persist_appqc_and_blocks
@wen-coding wen-coding changed the title consensus: persist AppQC and blocks in avail consensus: persist AppQC and blocks with async block persistence Feb 20, 2026
wen-coding and others added 5 commits February 22, 2026 09:27
Remove redundant loop that explicitly zeroed every lane. Map zero-values
handle lanes without loaded blocks; only lanes with blocks on disk need
an explicit write. Add comment explaining why starting at 0 is safe.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Add comment explaining why votes are not persisted and why the votes
  queue must be advanced past loaded blocks on restart.
- Consolidate redundant tests: fold blockPersisted assertions into
  existing tests, remove TestNewInnerLoadedBlocksContiguousPrefix.
- Add test that headers() returns ErrPruned for blocks before the
  loaded range (verifies votes queue advancement prevents hangs).

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace require.Contains(err.Error(), "...") with require.Error(err).
Callers don't branch on specific error messages, so string matching
adds no value; the test name already documents what is being rejected.

Co-authored-by: Cursor <cursoragent@cursor.com>
BlockPersister now owns the per-lane contiguous persistence cursor and
passes the exclusive upper bound to the callback. The caller no longer
needs to compute n+1 or guard against out-of-order completion.

This localizes the ordering assumption (FIFO queue) inside
BlockPersister, so switching to parallel storage only requires changing
BlockPersister.Run.

Co-authored-by: Cursor <cursoragent@cursor.com>
Also add TODO for retry on persistence failure.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Contributor

@pompon0 pompon0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updated version looks very nice! Good work! There are still a few missing pieces. I've left comments.

wen-coding and others added 4 commits February 25, 2026 13:39
Replace three separate Option fields (persister, blockPersist,
commitQCPersist) with one Option[persisters] struct — persistence is
all-or-nothing, no partial subsets.

Return error from newInner if loaded commitQCs have gaps, instead of
silently skipping. The persister already guarantees consecutive output,
so this is defense-in-depth.

Co-authored-by: Cursor <cursoragent@cursor.com>
Move all persistence goroutine logic from *State methods to *persisters
methods (run, collectBatch). Store Watch/AtomicSend pointers in persisters
to avoid value-copy disconnection. Inline persistLoop into the spawn
closure. Remove cp != nil guards since commitQCs persister is always
present when persisters exists.

Co-authored-by: Cursor <cursoragent@cursor.com>
Remove inner/appQCSend pointers from persisters — it should not access
State's locked inner. Move collectPersistBatch and markBlocksPersisted
to *State methods. Spawn persist goroutines directly in State.Run()
where both inner and disk persisters are naturally accessible.

Made-with: Cursor
The laneFirsts map was only checked for nil (pruned or not). The persist
goroutine reads q.first directly in collectPersistBatch, so the map is
unnecessary.

Made-with: Cursor
wen-coding and others added 4 commits February 25, 2026 13:55
Remove the separate appQCPersist goroutine and AtomicSend plumbing.
AppQC is now persisted in the main persist loop after CommitQCs,
guaranteeing the matching CommitQC is on disk first. This prevents
an inconsistent state where a persisted AppQC references a CommitQC
that hasn't been written yet.

Made-with: Cursor
When persistence is enabled, PushCommitQC no longer publishes to
latestCommitQC immediately. Instead, the persist goroutine publishes
after writing the CommitQC to disk via markPersisted. Since consensus
subscribes to LastCommitQC() to advance views, it won't proceed until
the CommitQC is durable. This prevents losing CommitQCs on crash.

Analogous to nextBlockToPersist gating RecvBatch for blocks.

Made-with: Cursor
// When persistence is disabled, publish immediately.
// When enabled, the persist goroutine publishes after writing to disk,
// so consensus won't advance until the CommitQC is durable.
if inner.nextBlockToPersist == nil {
Copy link
Contributor

@pompon0 pompon0 Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't compare to nil, either make nextBlockToPersist into an Option, or check persisters field presence instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternatively, you can move branching to the persisters logic - i.e. have a dummy persister task which will just bump nextBlockToPersist and latestCommitQC without persisting anything - it would make the control flow in tests and in prod more similar imo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added dummy persisters.

func (s *State) Run(ctx context.Context) error {
return scope.Run(ctx, func(ctx context.Context, scope scope.Scope) error {
if pers, ok := s.persisters.Get(); ok {
scope.SpawnNamed("persist", func() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is a rather large task, I feel like it deserves having its own method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
commitQCCur = qc.Index() + 1
}
if len(batch.commitQCs) > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have now similar problem here - you delete commitQCs before bumping AppQC, therefore we can end up in a state where persisted AppQC does not have a corresponding AppQC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, changed it so that we persist AppQC before delete, hopefully in WAL we don't need delete any more.

commitQCCur := pers.commitQCs.LoadNext()
var lastPersistedAppQC utils.Option[*types.AppQC]
for {
batch, err := s.collectPersistBatch(ctx, blockCur, commitQCCur)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of things I feel that will be important in the final version, consider adding a todo:

  • I suspect that having block persistence block commitQC persistence will degrade consensus latency significantly (imo they should be fsynced independently)
  • let's consider a situation in which our nodes' disk is (just slightly) slower than other nodes' disks. In such a situation our batches will grow (network will move slightly faster). With larger batches, PersistBatch will slow down even more and making us fall behind even more, making the batches larger, etc (i.e. our node will choke on persistence). What I mean is that the larger the batch, the larger the avg latency from receiving a proposal to voting for availability. It would be nice to make this latency independent from the amount of blocks to be persisted at any given time. I'm afraid that if for some reason majority of nodes will end up with large batches just once (let's say because of a network hiccup), they will never recover from having large batches (batched blocks -> batched votes -> batched tipcut -> batch of new capacity -> and so on). I'm not sure if it is clear what I'm trying to say here, we can discuss offline.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I understand your concern. Several observations:

  1. We do eventually need to keep up with the incoming blocks, if any of the pipeline can't keep up, then we are in trouble and can't support this flow, this may be persistence right now, but the same argument can be applied to execution and storage and other components. I think we do need real benchmarking to see where we are
  2. Currently we don't set a limit on number of blocks we persist, if turns out persistence is the bottleneck after benchmarking, then we can be smarter in handling batches. Maybe instead of trying to save everything, we can sometimes wait a bit so we don't have to persist blocks which are quickly pruned

Adding a TODO for the concern now, this is definitely valid concern but probably too big to address in this PR.

// collectPersistBatch waits for new blocks or commitQCs and collects them under lock.
func (s *State) collectPersistBatch(
ctx context.Context,
blockCur map[types.LaneID]types.BlockNumber,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I feel like passing blockCur and commitQCCur is redundant here - you have lastQC and nextBlockToPersist in the inner state already.

Copy link
Contributor Author

@wen-coding wen-coding Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think we don't have to pass them in here, changed.

wen-coding and others added 4 commits February 27, 2026 11:14
Moved DeleteBefore after AppQC persist so a crash between the two
never leaves the on-disk AppQC pointing at a deleted CommitQC.
Extracted the persist loop into its own runPersist method.

Made-with: Cursor
Instead of conditionally spawning the persist goroutine and checking
nextBlockToPersist == nil in PushCommitQC and RecvBatch, always run the
persist goroutine with either real or no-op persisters. The no-op
persisters skip disk I/O but still track cursors, making the test and
production code paths identical.

Made-with: Cursor
Instead of batching all block writes and updating nextBlockToPersist
once at the end, persist each block individually and advance the cursor
after each fsync. This makes vote latency equal to single-block write
time regardless of backlog, preventing the positive feedback loop where
slow persistence causes ever-growing batches.

Also moves persistence cursors (nextBlockToPersist, nextCommitQCToPersist)
into inner state, read directly by collectPersistBatch under lock.

Made-with: Cursor
Comment on lines +675 to +679
for lane, q := range inner.blocks {
if inner.nextBlockToPersist[lane] < q.next {
return true
}
}

Check warning

Code scanning / CodeQL

Iteration over map Warning

Iteration over map may be a possible source of non-determinism
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants