Revert "fix(etl): halt on block indexing failure (#323)" #324

Merged
raymondjacobson merged 1 commit into main from revert-323-halt-on-block-error
May 30, 2026

@raymondjacobson
Contributor

Summary

Reverts #323.

The halt-on-error change was correct in isolation, but it is incompatible with the current dual-run state, in which the legacy Python discovery-provider indexer and the pkg/etl indexer both write to overlapping tables in the api/ schema. Specifically, because the on-chain plays bridge from AudiusProject/api#881 doesn't ON CONFLICT-protect against rows Python has already written, the per-block savepoint fails on essentially every recent block, and #323 turned that failure into a hard halt instead of the silent continue it used to be.

Observed in prod on AudiusProject/api tonight: after #323 was consumed, the api-side ETL repeatedly halted on block 25415514 (Python had written its plays first). Each pod restart re-attempted the same block and hit the same halt. AudiusProject/api#883 (the consumer bump) and AudiusProject/api#884 (the api-side errgroup fix that would actually make the halt exit the process) are being closed/reverted in lockstep with this PR; the diagnosis there is correct, but the deploy sequencing isn't.
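For reviewers who want the two behaviors side by side, here is a minimal sketch of the difference between the silent-continue path and the halt path. It is not the pkg/etl code: `indexBlocks`, `result`, and `errConflict` are hypothetical names invented for illustration, and the per-block savepoint is reduced to a callback that either succeeds or fails.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for the failure raised when another writer has
// already inserted the same row. Hypothetical; not the real pkg/etl error.
var errConflict = errors.New("duplicate key: row already written by another indexer")

// result summarizes one indexing pass.
type result struct {
	Indexed []int64 // blocks that were processed successfully
	Skipped []int64 // blocks whose failure was swallowed
}

// indexBlocks processes blocks in order. With haltOnError=false a failing
// block is recorded and skipped (the pre-#323 behavior as described in this
// PR). With haltOnError=true the first failure stops the pass and is
// returned to the caller.
func indexBlocks(blocks []int64, process func(int64) error, haltOnError bool) (result, error) {
	var res result
	for _, b := range blocks {
		if err := process(b); err != nil {
			if haltOnError {
				return res, fmt.Errorf("block %d: %w", b, err)
			}
			res.Skipped = append(res.Skipped, b)
			continue
		}
		res.Indexed = append(res.Indexed, b)
	}
	return res, nil
}

func main() {
	// Simulate a block that always collides with rows from the other writer.
	process := func(b int64) error {
		if b == 25415514 {
			return errConflict
		}
		return nil
	}
	blocks := []int64{25415513, 25415514, 25415515}

	// Silent continue: the pass finishes, but one block is missing.
	res, err := indexBlocks(blocks, process, false)
	fmt.Println("skip mode:", res.Indexed, res.Skipped, err)

	// Halt: the pass stops at the colliding block; a restart would retry
	// the same block and stop again.
	res, err = indexBlocks(blocks, process, true)
	fmt.Println("halt mode:", res.Indexed, res.Skipped, err)
}
```

In this sketch the halt mode never gets past the colliding block no matter how often it is retried, which matches the restart loop described above. The skip mode keeps moving at the cost of a hole in the data.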

Plan

  1. This PR: revert #323 (fix(etl): halt on block indexing failure). Restores the silent-skip-on-failure behavior, which is bad for data quality but fine for keeping the indexer running while dual-run is the operating mode.
  2. AudiusProject/api#885 (Revert "chore(deps): bump go-openaudio ETL to halt-on-block-error (#883)"): pins pkg/etl back to the pre-#323 version.
  3. Audit pkg/etl for every cross-writer collision point with Python. Start with the plays bridge and mirror the blocks adopt-by-hash + ON CONFLICT pattern introduced in #319 (fix(etl): tolerate co-existing writers + process each block atomically), then sweep anywhere else both writers touch the same row.
  4. Re-land #323 only after step 3 has been verified end-to-end, so the halt-on-error guarantee is honest in the dual-run state.
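To illustrate what step 3 is aiming for, here is a rough in-memory sketch of a strict insert versus a conflict-tolerant one. It only models the semantics of `INSERT` versus `INSERT ... ON CONFLICT DO NOTHING`; the real fix would live in the SQL issued by pkg/etl. The `playKey` fields, the `table` type, and both method names are assumptions made up for this example and do not reflect the actual schema.

```go
package main

import (
	"errors"
	"fmt"
)

// errDuplicate models a unique-constraint violation.
var errDuplicate = errors.New("duplicate key")

// playKey is a hypothetical unique key for a play row; the real key
// columns are not specified in this PR.
type playKey struct {
	UserID  int64
	TrackID int64
	Slot    int64
}

// table is an in-memory stand-in for a table with a unique key.
// The value records which writer inserted the row.
type table struct {
	rows map[playKey]string
}

func newTable() *table {
	return &table{rows: make(map[playKey]string)}
}

// insertStrict behaves like a plain INSERT: a second write to the same
// key fails, which would abort the enclosing per-block unit of work.
func (t *table) insertStrict(k playKey, writer string) error {
	if _, exists := t.rows[k]; exists {
		return errDuplicate
	}
	t.rows[k] = writer
	return nil
}

// insertTolerant behaves like INSERT ... ON CONFLICT DO NOTHING: an
// existing row is left untouched and no error is raised. The return
// value reports whether a new row was actually written.
func (t *table) insertTolerant(k playKey, writer string) bool {
	if _, exists := t.rows[k]; exists {
		return false
	}
	t.rows[k] = writer
	return true
}

func main() {
	k := playKey{UserID: 1, TrackID: 2, Slot: 25415514}

	strict := newTable()
	_ = strict.insertStrict(k, "python")
	fmt.Println("strict second write:", strict.insertStrict(k, "etl"))

	tolerant := newTable()
	tolerant.insertTolerant(k, "python")
	fmt.Println("tolerant second write inserted:", tolerant.insertTolerant(k, "etl"))
	fmt.Println("row owner:", tolerant.rows[k])
}
```

With the tolerant form, whichever writer arrives second becomes a no-op instead of an error, so a block that Python has already written would no longer fail its savepoint. Only after every shared write path behaves this way would halt-on-error report genuine failures rather than dual-run collisions.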

Test plan

  • cd pkg/etl && go build ./... clean.
  • go test . still passes (the change is back to the original continue path).

🤖 Generated with Claude Code

@raymondjacobson raymondjacobson merged commit 7be5f08 into main May 30, 2026
2 of 3 checks passed
@raymondjacobson raymondjacobson deleted the revert-323-halt-on-block-error branch May 30, 2026 00:05