consensus: persist AppQC, blocks, and CommitQCs with async persistence#2896
wen-coding wants to merge 53 commits into main from wen/persist_appqc_and_blocks
Conversation
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
Codecov Report
❌ Patch coverage is
Additional details and impacted files:

@@            Coverage Diff            @@
##             main    #2896      +/-  ##
===========================================
+ Coverage   58.13%   77.53%   +19.39%
===========================================
  Files        2113       20     -2093
  Lines      174071     1749   -172322
===========================================
- Hits       101204     1356    -99848
+ Misses      63812      253    -63559
+ Partials     9055      140     -8915
ebf93df to f4a9c1e
Extract generic A/B file persistence into a reusable consensus/persist/ sub-package and add block-file persistence for crash-safe availability state recovery.

Changes:
- Move persist.go and persist_test.go into consensus/persist/ (git mv to preserve history), exporting Persister, NewPersister, WriteAndSync, SuffixA, SuffixB.
- Add persist/blocks.go: per-block file persistence using <lane_hex>_<blocknum>.pb files in a blocks/ subdirectory, with load, delete-before, and header-mismatch validation.
- Wire avail.NewState to accept stateDir, create an A/B persister for AppQC and a BlockPersister for signed lane proposals, and restore both on restart (contiguous block runs, queue alignment).
- Update avail/state.go to persist AppQC on prune and delete obsolete block files after each AppQC advance.
- Thread PersistentStateDir from consensus.Config through to avail.NewState.
- Expand the consensus/inner.go doc comment with the full persistence design (what, why, recovery, write behavior, rebroadcasting).
- Move TestRunOutputsPersistErrorPropagates to consensus/inner_test.go for proper package alignment.
- Add comprehensive tests for blocks persistence (empty dir, multi-lane, corrupt/mismatched skip, DeleteBefore, filename roundtrip).

Ref: sei-protocol/sei-v3#512
Co-authored-by: Cursor <cursoragent@cursor.com>
Move persisted data loading (AppQC deserialization and block loading) into a dedicated function for readability. Co-authored-by: Cursor <cursoragent@cursor.com>
Move block sorting, contiguous-prefix extraction, and gap truncation from avail/inner.go into persist/blocks.go so all disk-recovery logic lives in one place. This isolates storage concerns in the persistence layer, simplifying newInner and preparing for a future storage backend swap. Co-authored-by: Cursor <cursoragent@cursor.com>
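The sorting and gap-truncation rule mentioned above can be illustrated with a toy helper (hypothetical, not the actual persist/blocks.go code): sort the block numbers loaded from disk and keep only the contiguous run starting at the lowest number, since anything after a gap cannot be safely replayed.

```go
package main

import (
	"fmt"
	"sort"
)

// contiguousPrefix sorts the loaded block numbers and truncates at the
// first gap. Illustrative sketch of the recovery rule, not the real code.
func contiguousPrefix(nums []uint64) []uint64 {
	if len(nums) == 0 {
		return nil
	}
	sort.Slice(nums, func(i, j int) bool { return nums[i] < nums[j] })
	out := []uint64{nums[0]}
	for _, n := range nums[1:] {
		if n != out[len(out)-1]+1 {
			break // first gap: everything after it is unusable on restart
		}
		out = append(out, n)
	}
	return out
}

func main() {
	// 8 is missing, so 9 and 10 are dropped.
	fmt.Println(contiguousPrefix([]uint64{7, 5, 6, 9, 10})) // [5 6 7]
}
```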
Co-authored-by: Cursor <cursoragent@cursor.com>
PushBlock and ProduceBlock now add blocks to the in-memory queue immediately and send a persist job to a background goroutine via a buffered channel. The background writer fsyncs each block to disk and advances a per-lane blockPersisted cursor under the inner lock. RecvBatch gates on this cursor so votes are only signed for blocks that have been durably written to disk. When persistence is disabled (testing), the cursor is nil and RecvBatch falls back to bq.next. Co-authored-by: Cursor <cursoragent@cursor.com>
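The pattern described above can be sketched with toy types (names are illustrative, not the sei-chain ones): blocks enter the in-memory queue immediately, a background writer fsyncs them in FIFO order and advances a persisted cursor under the lock, and voting only reads blocks strictly below that cursor.

```go
package main

import (
	"fmt"
	"sync"
)

// lane is a toy stand-in for a per-lane block queue with async persistence.
type lane struct {
	mu             sync.Mutex
	next           uint64        // next block number to assign
	blockPersisted uint64        // exclusive upper bound of fsynced blocks
	jobs           chan uint64   // buffered persist queue
	done           chan struct{} // closed when the writer drains
}

func newLane() *lane {
	l := &lane{jobs: make(chan uint64, 16), done: make(chan struct{})}
	go l.writer()
	return l
}

// PushBlock appends to the in-memory queue and enqueues a persist job.
// The real code blocks with a context when the channel is full.
func (l *lane) PushBlock() uint64 {
	l.mu.Lock()
	n := l.next
	l.next++
	l.mu.Unlock()
	l.jobs <- n
	return n
}

// writer consumes jobs in FIFO order; after each (elided) fsync it advances
// the cursor under the lock.
func (l *lane) writer() {
	defer close(l.done)
	for n := range l.jobs {
		// ... write block n to disk and fsync here ...
		l.mu.Lock()
		l.blockPersisted = n + 1
		l.mu.Unlock()
	}
}

// votableUpTo is what RecvBatch-style gating would read: only blocks below
// this bound may be voted on.
func (l *lane) votableUpTo() uint64 {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.blockPersisted
}

func main() {
	l := newLane()
	for i := 0; i < 3; i++ {
		l.PushBlock()
	}
	close(l.jobs) // demo shutdown; real code runs until cancellation
	<-l.done
	fmt.Println(l.votableUpTo()) // 3
}
```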
newInner no longer takes a separate persistEnabled bool; loaded != nil already implies persistence is enabled. Tests with loaded data now correctly reflect this. Co-authored-by: Cursor <cursoragent@cursor.com>
blockPersisted is reconstructed from disk on restart, not persisted itself. Move its creation to just above the block restoration loop (past the loaded==nil early return) so the code reads top-down. Co-authored-by: Cursor <cursoragent@cursor.com>
05beddb to 2f0bbad
Co-authored-by: Cursor <cursoragent@cursor.com>
Move persistCh, persistJob, and the writer loop from avail/State into BlockPersister.Queue + BlockPersister.Run, so callers just call Queue() and the persist layer owns the channel, buffer sizing, and drain loop. Queue blocks with context to avoid holes in the sequential blockPersisted cursor (which would permanently stall voting). Call sites use utils.IgnoreAfterCancel to swallow shutdown errors. Co-authored-by: Cursor <cursoragent@cursor.com>
…l/sei-chain into wen/persist_appqc_and_blocks
Remove redundant loop that explicitly zeroed every lane. Map zero-values handle lanes without loaded blocks; only lanes with blocks on disk need an explicit write. Add comment explaining why starting at 0 is safe. Co-authored-by: Cursor <cursoragent@cursor.com>
- Add comment explaining why votes are not persisted and why the votes queue must be advanced past loaded blocks on restart.
- Consolidate redundant tests: fold blockPersisted assertions into existing tests, remove TestNewInnerLoadedBlocksContiguousPrefix.
- Add test that headers() returns ErrPruned for blocks before the loaded range (verifies votes queue advancement prevents hangs).

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace require.Contains(err.Error(), "...") with require.Error(err). Callers don't branch on specific error messages, so string matching adds no value; the test name already documents what is being rejected. Co-authored-by: Cursor <cursoragent@cursor.com>
BlockPersister now owns the per-lane contiguous persistence cursor and passes the exclusive upper bound to the callback. The caller no longer needs to compute n+1 or guard against out-of-order completion. This localizes the ordering assumption (FIFO queue) inside BlockPersister, so switching to parallel storage only requires changing BlockPersister.Run. Co-authored-by: Cursor <cursoragent@cursor.com>
Also add TODO for retry on persistence failure. Co-authored-by: Cursor <cursoragent@cursor.com>
pompon0 left a comment
The updated version looks very nice! Good work! There are still a few missing pieces. I've left comments.
Replace three separate Option fields (persister, blockPersist, commitQCPersist) with one Option[persisters] struct — persistence is all-or-nothing, no partial subsets. Return error from newInner if loaded commitQCs have gaps, instead of silently skipping. The persister already guarantees consecutive output, so this is defense-in-depth. Co-authored-by: Cursor <cursoragent@cursor.com>
Move all persistence goroutine logic from *State methods to *persisters methods (run, collectBatch). Store Watch/AtomicSend pointers in persisters to avoid value-copy disconnection. Inline persistLoop into the spawn closure. Remove cp != nil guards since commitQCs persister is always present when persisters exists. Co-authored-by: Cursor <cursoragent@cursor.com>
Remove inner/appQCSend pointers from persisters — it should not access State's locked inner. Move collectPersistBatch and markBlocksPersisted to *State methods. Spawn persist goroutines directly in State.Run() where both inner and disk persisters are naturally accessible. Made-with: Cursor
The laneFirsts map was only checked for nil (pruned or not). The persist goroutine reads q.first directly in collectPersistBatch, so the map is unnecessary. Made-with: Cursor
Remove the separate appQCPersist goroutine and AtomicSend plumbing. AppQC is now persisted in the main persist loop after CommitQCs, guaranteeing the matching CommitQC is on disk first. This prevents an inconsistent state where a persisted AppQC references a CommitQC that hasn't been written yet. Made-with: Cursor
When persistence is enabled, PushCommitQC no longer publishes to latestCommitQC immediately. Instead, the persist goroutine publishes after writing the CommitQC to disk via markPersisted. Since consensus subscribes to LastCommitQC() to advance views, it won't proceed until the CommitQC is durable. This prevents losing CommitQCs on crash. Analogous to nextBlockToPersist gating RecvBatch for blocks. Made-with: Cursor
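The deferred publication can be sketched with toy types (hypothetical names, not the sei-chain State): PushCommitQC only stores in memory, and markPersisted, called by the persist goroutine after the fsync, performs the publish that consensus subscribes to.

```go
package main

import "fmt"

// state is an illustrative stand-in for the availability state.
type state struct {
	pending   []uint64 // in-memory CommitQC indices awaiting persistence
	latest    uint64   // published watermark consensus subscribes to
	published bool
}

// PushCommitQC stores the CommitQC in memory only. Note: no publish here;
// consensus must not advance views on a CommitQC that a crash could lose.
func (s *state) PushCommitQC(idx uint64) {
	s.pending = append(s.pending, idx)
}

// markPersisted is invoked by the persist goroutine after the CommitQC at
// idx has been durably written; only then is the watermark published.
func (s *state) markPersisted(idx uint64) {
	s.latest = idx
	s.published = true
}

func main() {
	s := &state{}
	s.PushCommitQC(4)
	fmt.Println(s.published) // false: not yet durable
	s.markPersisted(4)
	fmt.Println(s.latest, s.published) // 4 true
}
```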
Made-with: Cursor
// When persistence is disabled, publish immediately.
// When enabled, the persist goroutine publishes after writing to disk,
// so consensus won't advance until the CommitQC is durable.
if inner.nextBlockToPersist == nil {
Don't compare to nil; either make nextBlockToPersist into an Option, or check the persisters field's presence instead.
alternatively, you can move branching to the persisters logic - i.e. have a dummy persister task which will just bump nextBlockToPersist and latestCommitQC without persisting anything - it would make the control flow in tests and in prod more similar imo.
Added dummy persisters.
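The dummy-persister suggestion can be sketched as follows (hypothetical interface and names): both the real and the no-op variant satisfy the same interface, so production and test code run the identical persist loop, and the no-op one simply bumps the cursor without touching disk.

```go
package main

import "fmt"

// blockPersister is an illustrative interface: Persist durably stores
// block n and returns the new exclusive cursor.
type blockPersister interface {
	Persist(n uint64) (next uint64, err error)
}

type diskPersister struct{}

func (diskPersister) Persist(n uint64) (uint64, error) {
	// ... write <lane>_<n>.pb and fsync (elided in this sketch) ...
	return n + 1, nil
}

type noOpPersister struct{}

// Persist skips I/O entirely but still advances the cursor, so control
// flow in tests matches production.
func (noOpPersister) Persist(n uint64) (uint64, error) { return n + 1, nil }

func main() {
	for _, p := range []blockPersister{diskPersister{}, noOpPersister{}} {
		cur := uint64(0)
		for i := 0; i < 3; i++ {
			cur, _ = p.Persist(cur)
		}
		fmt.Println(cur) // 3 for both implementations
	}
}
```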
func (s *State) Run(ctx context.Context) error {
	return scope.Run(ctx, func(ctx context.Context, scope scope.Scope) error {
		if pers, ok := s.persisters.Get(); ok {
			scope.SpawnNamed("persist", func() error {
nit: this is a rather large task; I feel it deserves its own method.
	}
	commitQCCur = qc.Index() + 1
}
if len(batch.commitQCs) > 0 {
We now have a similar problem here: you delete CommitQCs before bumping the AppQC, so we can end up in a state where the persisted AppQC does not have a corresponding CommitQC on disk.
Good catch. Changed it so that we persist the AppQC before the delete; hopefully with a WAL we won't need the delete at all.
commitQCCur := pers.commitQCs.LoadNext()
var lastPersistedAppQC utils.Option[*types.AppQC]
for {
	batch, err := s.collectPersistBatch(ctx, blockCur, commitQCCur)
A couple of things I feel will be important in the final version; consider adding a TODO:
- I suspect that having block persistence block commitQC persistence will degrade consensus latency significantly (imo they should be fsynced independently)
- let's consider a situation in which our node's disk is (just slightly) slower than other nodes' disks. In that situation our batches will grow (the network will move slightly faster than us). With larger batches, PersistBatch will slow down even more, making us fall behind further and the batches larger still (i.e. our node will choke on persistence). What I mean is: the larger the batch, the larger the average latency from receiving a proposal to voting for availability. It would be nice to make this latency independent of the number of blocks waiting to be persisted at any given time. I'm afraid that if for some reason a majority of nodes ends up with large batches just once (say, because of a network hiccup), they will never recover from having large batches (batched blocks -> batched votes -> batched tipcut -> batch of new capacity -> and so on). I'm not sure if it's clear what I'm trying to say here; we can discuss offline.
Yeah I understand your concern. Several observations:
- We do eventually need to keep up with the incoming blocks; if any part of the pipeline can't keep up, we are in trouble and can't support this flow. That part may be persistence right now, but the same argument applies to execution, storage, and other components. I think we need real benchmarking to see where we are.
- Currently we don't set a limit on the number of blocks we persist. If it turns out persistence is the bottleneck after benchmarking, we can be smarter about handling batches: maybe instead of trying to save everything, we can sometimes wait a bit so we don't have to persist blocks that are quickly pruned.
Adding a TODO for this now; it's definitely a valid concern but probably too big to address in this PR.
| // collectPersistBatch waits for new blocks or commitQCs and collects them under lock. | ||
| func (s *State) collectPersistBatch( | ||
| ctx context.Context, | ||
| blockCur map[types.LaneID]types.BlockNumber, |
nit: I feel like passing blockCur and commitQCCur is redundant here - you have lastQC and nextBlockToPersist in the inner state already.
Yeah I think we don't have to pass them in here, changed.
Moved DeleteBefore after AppQC persist so a crash between the two never leaves the on-disk AppQC pointing at a deleted CommitQC. Extracted the persist loop into its own runPersist method. Made-with: Cursor
Instead of conditionally spawning the persist goroutine and checking nextBlockToPersist == nil in PushCommitQC and RecvBatch, always run the persist goroutine with either real or no-op persisters. The no-op persisters skip disk I/O but still track cursors, making the test and production code paths identical. Made-with: Cursor
Instead of batching all block writes and updating nextBlockToPersist once at the end, persist each block individually and advance the cursor after each fsync. This makes vote latency equal to single-block write time regardless of backlog, preventing the positive feedback loop where slow persistence causes ever-growing batches. Also moves persistence cursors (nextBlockToPersist, nextCommitQCToPersist) into inner state, read directly by collectPersistBatch under lock. Made-with: Cursor
Made-with: Cursor
Summary
Crash-safe persistence for availability state (AppQC, signed lane proposals, and CommitQCs). All I/O is fully asynchronous — no disk operations on the critical path or under locks.
- consensus/persist/ sub-package: Generic A/B file persistence (Persister[T]) with crash-safe alternating writes. BlockPersister manages per-block files in a blocks/ subdirectory. CommitQCPersister manages per-road-index CommitQC files. No-op implementations (NewNoOpPersister, NewNoOpBlockPersister, NewNoOpCommitQCPersister) for test/disabled paths.
- Persister[T proto.Message] interface: Strongly-typed persistence API; concrete abPersister[T] handles the A/B file strategy. NewNoOpPersister[T] silently discards writes. A/B suffixes are unexported; WriteRawFile helper for corruption tests.
- Block persistence (persist/blocks.go): Each signed lane proposal stored as <lane_hex>_<blocknum>.pb. On load, blocks are sorted and truncated at the first gap. PersistBatch encapsulates the I/O: persist blocks, update tips, clean up old files. When no-op, all disk I/O is skipped but cursor/tips tracking still works.
- CommitQC persistence (persist/commitqcs.go): Each CommitQC stored as commitqc_<roadindex>.pb. On load, QCs are sorted and truncated at the first gap. Needed for reconstructing FullCommitQCs on restart. When no-op, disk I/O is skipped but next-index tracking still works.
- Persist goroutine: Reads inner state directly (the in-memory state is the queue). collectPersistBatch acquires the lock, waits for new data, reads the persistence cursors (inner.nextBlockToPersist, inner.nextCommitQCToPersist) directly from inner state, clamps past pruned entries, and collects the batch. I/O runs with no lock held. No channel, no backpressure. Write order: blocks → CommitQCs → AppQC → delete old CommitQCs — guaranteeing a matching CommitQC is always on disk before its AppQC, and the AppQC is persisted before old CommitQCs are deleted (crash-safe ordering).
- Persistence cursors in inner state: nextBlockToPersist (per-lane) and nextCommitQCToPersist track how far persistence has progressed. Always initialized — the no-op persist goroutine bumps them immediately. Reconstructed from disk on restart. collectPersistBatch reads these cursors under lock (no parameter passing); markPersisted updates them after successful I/O.
- Prune race: prune can delete map entries. Cursors are clamped to q.first before reading to prevent a nil pointer dereference. A regression test (TestStateWithPersistence) reliably catches this.
- No-op persisters: When persistence is disabled (stateDir is None), newNoOpPersisters() creates no-op implementations for all three persisters. The persist goroutine always runs (same code path as production) — it just skips disk I/O and immediately bumps cursors. This eliminates all nil checks for disabled persistence and ensures tests exercise the exact same control flow as production.
- persisters struct — pure I/O, always initialized: Groups the three disk persisters (appQC, blocks, commitQCs). Always present on State (not wrapped in Option). All inner state access goes through State methods (collectPersistBatch, markPersisted), keeping a clean separation between orchestration and I/O.
- Durable CommitQC publication: PushCommitQC no longer publishes latestCommitQC directly — the persist goroutine publishes it after writing to disk (or immediately for no-op persisters) via markPersisted. Since consensus subscribes to LastCommitQC() to advance views, it won't proceed until the CommitQC is durable — preventing CommitQC loss on crash.
- Vote gating (avail/subscriptions.go): RecvBatch only yields blocks below the nextBlockToPersist watermark, so votes are only signed for durably written blocks.
- Wiring (avail/state.go): NewState accepts stateDir, initialises the A/B persister (for AppQC), BlockPersister, and CommitQCPersister, and loads persisted data on restart. Returns an error for corrupt state.
- Restore (avail/inner.go): Uses queue.reset() to set starting positions, then pushBack to reload entries. Finally calls inner.prune() with the persisted AppQC to advance all queues — same code path as runtime. Returns an error for corrupt state (non-consecutive CommitQCs, AppQC without a matching CommitQC on disk).
- Cleanup: DeleteBefore removes files for pruned blocks and orphaned lanes (from previous committees). Driven by the persist goroutine observing laneFirsts.
- prune() returns (bool, error): Simplified from returning a laneFirsts map — callers only need to know if pruning occurred. The persist goroutine reads q.first directly.
- newState constructors: consensus/state.go exposes newState() accepting a custom Persister for test mocks (e.g. failPersister), avoiding fragile field mutation after construction. avail/state.go's NewState accepts stateDir as Option[string].
- queue.reset() method: Clearly sets the starting position of an empty queue, replacing misleading prune() calls during initialization.
- Known concern (TODO): slow persistence makes PersistBatch slower, which delays nextBlockToPersist and RecvBatch, which delays voting, which grows batches further. Mitigation (e.g. persisting one block at a time) deferred.

Ref: sei-protocol/sei-v3#512
Test plan
- persist/blocks_test.go: load/store, gap truncation, DeleteBefore, orphaned lane cleanup, header mismatch, corrupt files
- persist/commitqcs_test.go: load/store, gap truncation, DeleteBefore, corrupt files, mismatched index
- persist/persist_test.go: A/B file crash safety, seq management, corrupt fallback, generic typed API
- avail/state_test.go: fresh start, load AppQC, load blocks, load both, load commitQCs, load commitQCs with AppQC, corrupt data, headers returns ErrPruned for blocks before the loaded range
- avail/state_test.go (TestStateWithPersistence): end-to-end persist + prune race regression test (5 iterations with disk persistence; reliably catches the cursor race without the clamp fix)
- avail/inner_test.go: newInner with loaded state, newInner with all three (AppQC + CommitQCs + blocks), newInner error cases (non-consecutive CommitQCs, AppQC without matching CommitQC), nextBlockToPersist reconstruction, nextCommitQCToPersist reconstruction, votes queue advancement
- avail/queue_test.go: newQueue, pushBack, reset, prune, stale prune, prune past next
- consensus/inner_test.go: consensus inner persistence round-trip, persist error propagation via newState injection
- data/state_test.go: data state tests
persist/blocks_test.go: load/store, gap truncation, DeleteBefore, orphaned lane cleanup, header mismatch, corrupt filespersist/commitqcs_test.go: load/store, gap truncation, DeleteBefore, corrupt files, mismatched indexpersist/persist_test.go: A/B file crash safety, seq management, corrupt fallback, generic typed APIavail/state_test.go: fresh start, load AppQC, load blocks, load both, load commitQCs, load commitQCs with AppQC, corrupt data, headers returns ErrPruned for blocks before loaded rangeavail/state_test.go(TestStateWithPersistence): end-to-end persist + prune race regression test (5 iterations with disk persistence; reliably catches cursor race without the clamp fix)avail/inner_test.go: newInner with loaded state, newInner with all three (AppQC + CommitQCs + blocks), newInner error cases (non-consecutive CommitQCs, AppQC without matching CommitQC), nextBlockToPersist reconstruction, nextCommitQCToPersist reconstruction, votes queue advancementavail/queue_test.go: newQueue, pushBack, reset, prune, stale prune, prune past nextconsensus/inner_test.go: consensus inner persistence round-trip, persist error propagation via newState injectiondata/state_test.go: data state tests