chore(deps): bump go-openaudio ETL to halt-on-block-error (#323) #883
Merged
Picks up OpenAudio/go-openaudio#323.

Before this, a processBlock failure in pkg/etl was swallowed with `continue`, advancing past the failed block. That leaves a hidden core_indexed_blocks gap because MAX(height) jumps ahead, so the api/health_check block_diff probes look healthy while the skipped block is silently lost forever. This is the same pattern that produced the ray52726 dropped-signup earlier this week.

After this bump, a processBlock failure halts indexBlocks() and returns the error. The pod crash-restart path picks up from MAX(core_indexed_blocks.height), the prefetcher re-hands us the failed block, and we retry. Transient failures self-heal on one restart; persistent corruption crashloops loudly (worth verifying pod-restart alerting is wired; covered in the #323 review notes).

Bump is one-commit-clean (4d1c9dfdfb52..819100b28c94).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
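The difference between the two behaviors can be sketched in a few lines of Go. This is a minimal illustration, not the actual `pkg/etl` code: `processBlock` here is a stand-in that fails at one chosen height, and `indexSwallow`, `indexHalt`, and `errBadBlock` are hypothetical names invented for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// errBadBlock simulates a processBlock failure (hypothetical, for illustration).
var errBadBlock = errors.New("processBlock failed")

// processBlock is a stand-in that fails only at height failAt.
func processBlock(height, failAt int64) error {
	if height == failAt {
		return errBadBlock
	}
	return nil
}

// indexSwallow models the old behavior: a failed block is skipped with
// continue, and the loop keeps advancing past it.
func indexSwallow(from, to, failAt int64) []int64 {
	var indexed []int64
	for h := from; h <= to; h++ {
		if err := processBlock(h, failAt); err != nil {
			continue // the failed height is never recorded
		}
		indexed = append(indexed, h)
	}
	return indexed
}

// indexHalt models the new behavior: the first failure stops the loop and
// the error is returned, so the indexed tip stays just below the bad block.
func indexHalt(from, to, failAt int64) ([]int64, error) {
	var indexed []int64
	for h := from; h <= to; h++ {
		if err := processBlock(h, failAt); err != nil {
			return indexed, fmt.Errorf("block %d: %w", h, err)
		}
		indexed = append(indexed, h)
	}
	return indexed, nil
}

func main() {
	fmt.Println(indexSwallow(1, 5, 3)) // [1 2 4 5]
	got, err := indexHalt(1, 5, 3)
	fmt.Println(got, err) // [1 2] block 3: processBlock failed
}
```

In the swallow variant the highest indexed height reaches 5 even though block 3 is missing. In the halt variant the tip stays at 2, so a restart that resumes from the tip would see block 3 again.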
raymondjacobson added a commit that referenced this pull request on May 30, 2026
…)" (#885)

## Summary

Reverts #883, pinning go-openaudio back to `v1.3.1-0.20260529221831-4d1c9dfdfb52`.

The halt-on-error behavior from upstream go-openaudio#323 is correct in isolation, but is **incompatible with the current dual-run state**: Python and api-side ETL both write to overlapping tables, and the on-chain plays bridge from #881 doesn't ON CONFLICT-protect against rows Python has already written. So:

- Pre-#883: the failure was silently swallowed by `continue`. ETL was effectively a no-op on essentially every block since #881 deployed, but block_diff stayed green because Python's writes kept `MAX(blocks.height)` moving. Block-level data loss masked by Python carrying the load.
- Post-#883: the same failure crashes the indexing loop. We saw it tonight at `processBlock failed` on block 25415514, reproducibly across pod restarts because Python writes the same plays in the same block before the ETL gets to it. Once #884 (the api-wrapper fix that makes that halt actually exit the process) ships, every pod would crashloop the moment it tries to index any recent block.

So shipping #883 + #884 without first handling the cross-writer collision points would convert today's silent wedge into a continuous outage that takes the parity jobs (`IndexChallengesJob`, `UserListeningHistory`, `HourlyPlayCounts`, etc.) down with the ETL. Strictly worse.

## Plan

1. **This PR**: pin upstream back to the pre-halt version. Today's silent wedge stays in place (bad, but bounded) and the parity jobs keep ticking.
2. Close #884 (already done). The diagnosis there is correct, but it amplifies #883's bad sequencing, so we re-land it after #883 is safe to re-ship.
3. Revert OpenAudio/go-openaudio#323 upstream too, so no future bump trips this accidentally.
4. **Audit + fix the cross-writer collision points in pkg/etl**: start with the plays bridge (#881), apply the same ON CONFLICT pattern #319 used for the `blocks` table. Then sweep anywhere else ETL and Python touch the same row.
5. Re-land go-openaudio#323, then api#883, then api#884 (in that order). At that point the halt-on-error guarantee is honest.

## Bump details (revert direction)

| | from | to |
|---|---|---|
| `github.com/OpenAudio/go-openaudio` | `v1.3.1-0.20260529230137-819100b28c94` | `v1.3.1-0.20260529221831-4d1c9dfdfb52` |
| `github.com/OpenAudio/go-openaudio/pkg/etl` | `v1.3.1-0.20260529230137-819100b28c94` | `v1.3.1-0.20260529221831-4d1c9dfdfb52` |

## Test plan

- [x] `go build ./...` clean.
- [ ] After deploy: confirm new pod boots, no `processBlock failed` halt log on block 25415514 (it'll go back to silent `continue`).
- [ ] Verify parity jobs still tick and block_diff stays at 0 (no functional change vs. pre-#883 prod).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
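The cross-writer collision described in the plan can be modelled in memory. The sketch below is an assumption-laden illustration, not the plays bridge itself: `playKey`, `playsTable`, and both insert methods are hypothetical, and the real fix would presumably be an `INSERT ... ON CONFLICT DO NOTHING` statement rather than a Go map. It only shows why a strict insert fails when Python has already written the row, while a conflict-tolerant insert turns the second write into a no-op.

```go
package main

import (
	"errors"
	"fmt"
)

// playKey is a hypothetical unique key for a play row; the real columns
// are not given in the source.
type playKey struct {
	UserID  int64
	TrackID int64
	Slot    int64
}

// errDuplicate stands in for a unique-constraint violation.
var errDuplicate = errors.New("duplicate key")

// playsTable maps each key to the writer that inserted it.
type playsTable map[playKey]string

// insertStrict fails if the row already exists, like a plain INSERT
// against a unique constraint.
func (t playsTable) insertStrict(k playKey, writer string) error {
	if _, exists := t[k]; exists {
		return errDuplicate
	}
	t[k] = writer
	return nil
}

// insertIgnoreConflict leaves an existing row untouched and reports
// whether a new row was written, modelling ON CONFLICT DO NOTHING.
func (t playsTable) insertIgnoreConflict(k playKey, writer string) bool {
	if _, exists := t[k]; exists {
		return false
	}
	t[k] = writer
	return true
}

func main() {
	table := playsTable{}
	k := playKey{UserID: 1, TrackID: 2, Slot: 3}

	// Python writes the play first.
	_ = table.insertStrict(k, "python")

	// A strict insert from the ETL now fails, which under halt-on-error
	// would stop the indexing loop on every restart.
	fmt.Println(table.insertStrict(k, "etl"))

	// A conflict-tolerant insert is a no-op and the block can proceed.
	fmt.Println(table.insertIgnoreConflict(k, "etl"), table[k])
}
```

With the conflict-tolerant variant, either writer can arrive first and the block still processes cleanly, which is the precondition the plan sets for re-landing halt-on-error.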
## Summary

Picks up OpenAudio/go-openaudio#323.

Before that change, a `processBlock` failure in `pkg/etl` was swallowed with `continue`, advancing past the failed block. `core_indexed_blocks` is keyed by `MAX(height)`, so the api's `health_check?max_core_indexer_block_diff=...` probes look healthy while the skipped block is silently lost forever, and the prefetcher works forward from the indexed tip so it never re-hands a skipped block. That's the same pattern that produced the `ray52726` dropped-signup we tracked down earlier this week (`blocks_number_key` collision on the per-block tx → `continue` → silent gap).

After this bump, a `processBlock` failure causes `indexBlocks()` to return the error. The pod crash-restart path picks up from `MAX(core_indexed_blocks.height)`, the prefetcher re-hands the previously-failed block, and we retry it. Transient failures self-heal on one restart; persistent corruption crashloops loudly.

## Bump details

| | from | to |
|---|---|---|
| `github.com/OpenAudio/go-openaudio` | `v1.3.1-0.20260529221831-4d1c9dfdfb52` | `v1.3.1-0.20260529230137-819100b28c94` |
| `github.com/OpenAudio/go-openaudio/pkg/etl` | `v1.3.1-0.20260529221831-4d1c9dfdfb52` | `v1.3.1-0.20260529230137-819100b28c94` |

`gh api .../compare/4d1c9dfdfb52...819100b2` confirms this is one commit only: #323's merge, no drive-bys.

## Operational note

This trades "silent data loss" for "loud crashloop on persistent corruption." Worth confirming that pod-restart / unhealthy-pod alerting on `core-indexer` is wired so the loud failure doesn't also go unnoticed. If it isn't, that's a quick follow-up, but even without it this PR is strictly better than today.

## Test plan

- `go build ./...`, `go vet ./...` clean.
- `go mod tidy` clean (no transitive surprises).
- `./indexer/...` tests pass (the existing user_pubkey + user_events hook tests exercise the pkg/etl integration path).
- `blocks_behind` stays at 0 (no behavior change on the happy path).

🤖 Generated with Claude Code
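To make the "looks healthy while a block is missing" point concrete, here is a small sketch contrasting a tip-based lag check with an explicit gap scan. Both helpers (`tipLag`, `missingHeights`) are hypothetical and operate on an in-memory slice; they are not the health_check implementation, and a real check would run against `core_indexed_blocks`.

```go
package main

import (
	"fmt"
	"sort"
)

// tipLag models a MAX(height)-style probe: how far the highest indexed
// height trails the chain tip. It cannot see holes below the tip.
func tipLag(chainTip int64, indexed []int64) int64 {
	var tip int64
	for _, h := range indexed {
		if h > tip {
			tip = h
		}
	}
	return chainTip - tip
}

// missingHeights returns every height absent between the lowest and
// highest indexed heights.
func missingHeights(indexed []int64) []int64 {
	if len(indexed) == 0 {
		return nil
	}
	// Sort a copy so the caller's slice is not reordered.
	sorted := append([]int64(nil), indexed...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	var gaps []int64
	for i := 1; i < len(sorted); i++ {
		for h := sorted[i-1] + 1; h < sorted[i]; h++ {
			gaps = append(gaps, h)
		}
	}
	return gaps
}

func main() {
	// Block 102 was skipped by a swallowed failure.
	indexed := []int64{100, 101, 103, 104}

	fmt.Println(tipLag(104, indexed))    // 0
	fmt.Println(missingHeights(indexed)) // [102]
}
```

The tip-based check reports zero lag for this input, while the gap scan surfaces the skipped height. Halting on failure keeps such a gap from forming in the first place.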