release-26.1: sql/backfill: report panics in backfill goroutines #169207
Merged
trunk-io[bot] merged 1 commit into cockroachdb:release-26.1 on Apr 28, 2026
The index backfiller and the MVCC index merger each spawn goroutines that, until now, had no panic recovery. A panic inside the goroutine spawned by indexBackfiller.Run (or one re-thrown from a ctxgroup worker by g.Wait) would tear down the SQL pod with no Sentry report and no CRDB-formatted log entry: only the Go runtime's bare stderr dump.

Switch the indexBackfiller goroutine to stopper.RunAsyncTaskEx so that the stopper's recover wrapper reports the panic to Sentry before re-panicking. The MVCC index merger keeps its bare goroutine because Run depends on g.Wait returning before the deferred memory monitor cleanup runs (a refused stopper task would force an early return with workers still using the bound account); instead it gets a defer logcrash.RecoverAndReportPanic so the same Sentry visibility applies.

Both changes are defense in depth: the SQL pod still crashes after a panic in either path, but now the crash is observable instead of silent.

Informs: cockroachdb#169059
Epic: none
Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Thanks for opening a backport. Before merging, please confirm that the change does not break backwards compatibility and otherwise complies with the backport policy. Include a brief release justification in the PR description explaining why the backport is appropriate. All backports must be reviewed by the TL for the owning area. While the stricter LTS policy does not yet apply, please exercise judgment and consider gating non-critical changes behind a disabled-by-default feature flag when appropriate.
😎 Merged successfully - details.
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
/trunk merge
Backport 1/1 commits from #169062 on behalf of @rafiss.
The index backfiller and the MVCC index merger each spawn goroutines that, until now, had no panic recovery. A panic inside the goroutine spawned by indexBackfiller.Run (or one re-thrown from a ctxgroup worker by g.Wait) would tear down the SQL pod with no Sentry report and no CRDB-formatted log entry — only the Go runtime's bare stderr dump.
Switch the indexBackfiller goroutine to stopper.RunAsyncTaskEx so that the stopper's recover wrapper reports the panic to Sentry before re-panicking. The MVCC index merger keeps its bare goroutine because Run depends on g.Wait returning before the deferred memory monitor cleanup runs (a refused stopper task would force an early return with workers still using the bound account); instead it gets a defer logcrash.RecoverAndReportPanic so the same Sentry visibility applies.
Both changes are defense in depth: the SQL pod still crashes after a panic in either path, but now the crash is observable instead of silent.
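For readers unfamiliar with the report-then-re-panic pattern described above, here is a minimal standalone sketch in plain Go. It does not use the CockroachDB stopper or logcrash packages: `reportAndRepanic`, `simulate`, and the in-memory report slice are hypothetical stand-ins, and the outer `recover` exists only so the demo process survives long enough to show what happened. In the real change the re-panic is allowed to crash the pod.

```go
package main

import (
	"fmt"
	"sync"
)

// reportAndRepanic is meant to be used with defer. If the surrounding
// function is panicking, it hands the panic value to report (a stand-in
// for a crash reporter) and then re-panics with the same value, so the
// failure is observed but not swallowed. Hypothetical helper, not the
// CockroachDB API.
func reportAndRepanic(report func(recovered any)) {
	if r := recover(); r != nil {
		report(r)
		panic(r)
	}
}

// simulate runs a panicking worker in its own goroutine and returns
// what was reported and which panic value escaped the worker.
func simulate() (reported []string, escaped any) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// Demo-only: catch the re-panic so this program can inspect it.
		// Production code would let the process crash here.
		defer func() { escaped = recover() }()

		func() {
			defer reportAndRepanic(func(r any) {
				reported = append(reported, fmt.Sprint(r))
			})
			panic("backfill worker failed")
		}()
	}()
	wg.Wait()
	return reported, escaped
}

func main() {
	reported, escaped := simulate()
	fmt.Printf("reported: %v\n", reported)
	fmt.Printf("escaped:  %v\n", escaped)
}
```

The point of the sketch is the ordering: the report happens first, and the panic still propagates afterwards, which matches the "observable instead of silent" goal without changing crash behavior.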
Informs: #169059
Epic: none
Release note: None
Release justification: Observability for production crashes: defense-in-depth fix that enables Sentry reporting for otherwise-silent SQL pod crashes during index backfill/merge.