
Wait for blocksync goroutines on Stop to fix leveldb shutdown panic #3415

Draft
masih wants to merge 1 commit into main from masih/panic-leveldb-iter-tm

Conversation

masih (Collaborator) commented May 11, 2026

`Reactor.OnStart` and `BlockPool.OnStart` started their long-running goroutines (`requestRoutine`, `poolRoutine`, `processBlockSyncCh`, `processPeerUpdates`, `makeRequestersRoutine`) with raw `go fn(ctx)` using the outer context. They were therefore not registered with the BaseService WaitGroup, and `Stop()` never waited for them. The outer `ctx` also outlived `Stop`, so the goroutines kept running after `Stop` returned.

During node shutdown this raced `nodeImpl.OnStop`'s `blockStore.Close()`: `poolRoutine`, still inside `SaveBlock -> Base() -> bs.db.Iterator`, observed its leveldb table reader released and panicked with "leveldb/table: reader released".

Route each goroutine through `BaseService.Spawn` so it is tracked by the WaitGroup and bound to `inner.ctx`. `Stop()` now cancels them and blocks until they exit, which happens before the node closes the BlockStore DB. Add a regression test that asserts no blocksync goroutines remain after `Reactor.Stop()` returns.


Note

Medium Risk
Touches blocksync shutdown/concurrency by changing how long-running goroutines are started and awaited, which can affect node sync and shutdown behavior. Scope is limited and covered by a new regression test, but failures could manifest as hangs or incomplete shutdown.

Overview
Fixes blocksync shutdown ordering by starting Reactor and BlockPool long-running routines via BaseService.Spawn() instead of raw go calls, ensuring they are bound to the service’s inner context and are waited on during Stop().

Updates the `Reactor.OnStop` documentation to reflect that `Stop()` now blocks until `requestRoutine`, `poolRoutine`, `processBlockSyncCh`, and `processPeerUpdates` exit, and adds a regression test (`TestReactor_OnStopWaitsForGoroutines`) asserting that no blocksync goroutines remain after `Reactor.Stop()` returns, preventing the LevelDB shutdown panic.

Reviewed by Cursor Bugbot for commit e4972b7. Bugbot is set up for automated code reviews on this repo.

github-actions Bot commented May 11, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | May 11, 2026, 1:23 PM |


codecov Bot commented May 11, 2026

Codecov Report

❌ Patch coverage is 71.42857% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.24%. Comparing base (0543e0e) to head (e4972b7).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| sei-tendermint/internal/blocksync/reactor.go | 66.66% | 8 Missing ⚠️ |
Additional details and impacted files


@@           Coverage Diff           @@
##             main    #3415   +/-   ##
=======================================
  Coverage   59.24%   59.24%           
=======================================
  Files        2110     2110           
  Lines      174149   174170   +21     
=======================================
+ Hits       103175   103193   +18     
- Misses      62041    62044    +3     
  Partials     8933     8933           
| Flag | Coverage Δ |
| --- | --- |
| sei-chain-pr | 70.59% <71.42%> (?) |
| sei-db | 70.41% <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
| --- | --- |
| sei-tendermint/internal/blocksync/pool.go | 81.58% <100.00%> (+1.01%) ⬆️ |
| sei-tendermint/internal/blocksync/reactor.go | 61.53% <66.66%> (+0.27%) ⬆️ |

@masih masih marked this pull request as ready for review May 11, 2026 13:50
@masih masih requested review from sei-will and wen-coding May 11, 2026 13:54
@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.



```go
r.Spawn("poolRoutine", func(ctx context.Context) error {
	r.poolRoutine(ctx, false)
	return nil
})
```

SwitchToConsensus receives prematurely-cancelled inner context

Medium Severity

Switching `poolRoutine` from `go r.poolRoutine(ctx, false)` to `r.Spawn(...)` changes the `ctx` it receives from the outer (node-scoped) context to the Reactor's `inner.ctx`. Inside `poolRoutine`, this `ctx` is passed to `r.consReactor.SwitchToConsensus(ctx, state, ...)` on line 495. The consensus reactor now receives a context that is cancelled when the blocksync `Reactor.Stop()` runs, rather than when the node shuts down. This can prematurely cancel consensus operations that are expected to outlive the blocksync reactor.

Additional Locations (1)


@masih masih marked this pull request as draft May 11, 2026 17:35
masih (Collaborator, Author) commented May 11, 2026

Marked back as draft to take a closer look at reactor code before opening back up for review

@pompon0 pompon0 self-requested a review May 11, 2026 17:45

3 participants