sql/backfill: report panics in backfill goroutines#169062

Merged: trunk-io[bot] merged 1 commit into cockroachdb:master from rafiss:backfill-handle-panic on Apr 27, 2026

@rafiss (Collaborator) commented Apr 24, 2026

The index backfiller and the MVCC index merger each spawn goroutines that, until now, had no panic recovery. A panic inside the goroutine spawned by indexBackfiller.Run (or one re-thrown from a ctxgroup worker by g.Wait) would tear down the SQL pod with no Sentry report and no CRDB-formatted log entry — only the Go runtime's bare stderr dump.
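To illustrate how a worker panic can resurface at the call to `Wait`, here is a minimal sketch. The `group` type is hypothetical and written only for this example; it is not ctxgroup's implementation, and it only mimics the re-throw behavior the description mentions.

```go
package main

import (
	"fmt"
	"sync"
)

// group is a hypothetical worker group used only for illustration.
type group struct {
	wg       sync.WaitGroup
	mu       sync.Mutex
	panicVal any
}

// Go runs fn in a new goroutine and captures the first panic it raises.
func (g *group) Go(fn func()) {
	g.wg.Add(1)
	go func() {
		// Deferred calls run in reverse order: the recover below runs
		// first, then Done, so panicVal is set before Wait can return.
		defer g.wg.Done()
		defer func() {
			if r := recover(); r != nil {
				g.mu.Lock()
				if g.panicVal == nil {
					g.panicVal = r
				}
				g.mu.Unlock()
			}
		}()
		fn()
	}()
}

// Wait blocks until all workers finish, then re-panics in the caller's
// goroutine if any worker panicked.
func (g *group) Wait() {
	g.wg.Wait()
	g.mu.Lock()
	r := g.panicVal
	g.mu.Unlock()
	if r != nil {
		panic(r)
	}
}

func main() {
	defer func() {
		// Recovered here only so the demo exits cleanly.
		fmt.Println("Wait re-panicked with:", recover())
	}()
	var g group
	g.Go(func() { panic("worker failed") })
	g.Wait()
}
```

In this sketch the panic lands in whichever goroutine calls `Wait`, so a goroutine that calls `Wait` with no recovery of its own would still bring the process down.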

Switch the indexBackfiller goroutine to stopper.RunAsyncTaskEx so that the stopper's recover wrapper reports the panic to Sentry before re-panicking. The MVCC index merger keeps its bare goroutine because Run depends on g.Wait returning before the deferred memory monitor cleanup runs (a refused stopper task would force an early return with workers still using the bound account); instead it gets a defer logcrash.RecoverAndReportPanic so the same Sentry visibility applies.

Both changes are defense in depth: the SQL pod still crashes after a panic in either path, but now the crash is observable instead of silent.
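The report-then-re-panic shape can be sketched with the standard library alone. Both `reportPanic` and `withPanicReport` are hypothetical names invented for this example; they stand in for a crash reporter and a recover wrapper and are not the `stop` or `logcrash` APIs.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	mu       sync.Mutex
	reported []string
)

// reportPanic is a hypothetical stand-in for a crash reporter.
// Here it only records the panic value in memory.
func reportPanic(r any) {
	mu.Lock()
	defer mu.Unlock()
	reported = append(reported, fmt.Sprint(r))
}

// reportedPanics returns a copy of the values recorded so far.
func reportedPanics() []string {
	mu.Lock()
	defer mu.Unlock()
	return append([]string(nil), reported...)
}

// withPanicReport runs fn. If fn panics, it reports the panic value and
// then re-panics, so the crash is recorded but not swallowed.
func withPanicReport(fn func()) {
	defer func() {
		if r := recover(); r != nil {
			reportPanic(r)
			panic(r)
		}
	}()
	fn()
}

func main() {
	defer func() {
		// Recovered here only so the demo exits cleanly; without this
		// outer recover the process would still crash.
		r := recover()
		fmt.Println("re-panicked with:", r)
		fmt.Println("reported:", reportedPanics())
	}()
	withPanicReport(func() { panic("backfill worker failed") })
}
```

The wrapper deliberately does not suppress the panic: the caller still sees it, which matches the point that the crash remains but is now recorded first.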

Informs: #169059
Epic: none

Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
@rafiss rafiss requested review from mw5h and spilchen April 24, 2026 15:33
@rafiss rafiss requested a review from a team as a code owner April 24, 2026 15:33
blathers-crl Bot commented Apr 24, 2026

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

trunk-io Bot (Contributor) commented Apr 24, 2026

😎 Merged successfully - details.

@cockroach-teamcity (Member)

This change is Reviewable

@rafiss added the backport-25.4.x, backport-26.1.x, and backport-26.2.x labels Apr 24, 2026

@spilchen (Contributor) left a comment

:lgtm: nice find

@spilchen made 2 comments.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on mw5h and rafiss).


pkg/sql/rowexec/indexbackfiller.go line 578 at r1 (raw file):

	// we loop over progCh, which is closed only after the goroutine returns.
	if startErr := ib.flowCtx.Stopper().RunAsyncTaskEx(ctx, stop.TaskOpts{
		TaskName: "indexBackfiller-runBackfill",

nit: we don't typically use camel case for task names. I'm fine if you want to ignore this, I just thought it looked a bit odd.

@rafiss (Collaborator, Author) left a comment

TFTR!

/trunk merge

@rafiss made 2 comments.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on mw5h and spilchen).


pkg/sql/rowexec/indexbackfiller.go line 578 at r1 (raw file):

Previously, spilchen wrote…

nit: we don't typically use camel case for task names. I'm fine if you want to ignore this, I just thought it looked a bit odd.

we use camel case (and also a few other non-standard formats) for this in a few other places:

TaskName: "backupDataProcessor.runBackupProcessor",

TaskName: "generativeSplitAndScatter-worker",

TaskName: "txnCommitter: making txn commit explicit",

since there's no unified convention, i'll keep this as is

@trunk-io trunk-io Bot merged commit ac091d7 into cockroachdb:master Apr 27, 2026
29 checks passed

blathers-crl Bot commented Apr 27, 2026

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


merge conflict cherry-picking 5fdf1a2 to blathers/backport-release-25.4-169062

Backport to branch 25.4.x failed. See errors above.


merge conflict cherry-picking 5fdf1a2 to blathers/backport-release-26.1-169062

Backport to branch 26.1.x failed. See errors above.


merge conflict cherry-picking 5fdf1a2 to blathers/backport-release-26.2-169062

Backport to branch 26.2.x failed. See errors above.


Labels

backport-25.4.x, backport-26.1.x, backport-26.2.x, target-release-26.3.0

3 participants