Isolate per-file indexing failures instead of aborting the whole realm#5389
Isolate per-file indexing failures instead of aborting the whole realm#5389richardhjtan wants to merge 2 commits into
Conversation
tryToVisit rethrew any non-404 error from a file visit, which unwound the entire fromScratch/incremental visit loop and skipped batch.done(). Since batch.done() is what promotes the working table into the live boxel_index, a single file's transport-level failure (e.g. a prerender timeout) discarded every other already-successfully-indexed file in the same job, leaving the realm mounted with an empty or stale search index and no error surfaced. Catch non-404 errors per file and record a file-error entry for that URL instead, mirroring the error_doc pattern already used for in-band render errors, so the rest of the realm still indexes and commits.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6ae5858e07
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let entry: FileErrorIndexEntry = { | ||
| type: 'file-error', | ||
| error, | ||
| }; | ||
| await this.batch.updateEntry(url, entry); |
There was a problem hiding this comment.
Don't leave card instances tombstoned on visit failures
When an incremental reindex hits a transient prerenderVisit failure for an existing .json card, Batch.invalidate() has already inserted tombstones for every production row at that URL, including the instance row. This catch only overwrites the file tombstone with a file-error, so batch.done() promotes the untouched instance tombstone and removes the card from search/instance results instead of preserving the last-known-good instance with an error row like the in-band render-error path does. This affects existing card files whenever the fused visit throws before indexCardWithResult runs.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
[Claude Code 🤖] Confirmed — Batch.invalidate() tombstones every existing type at a URL up front (instance + file for a card), and the file-error write alone left the instance tombstone to be promoted, silently deleting the card from search on a transient error. Fixed in 5e88b86: now also writes an instance-error entry when the failed URL is a card, so the last-known-good instance survives the same way the in-band render-error path preserves it.
There was a problem hiding this comment.
Pull request overview
This PR makes realm indexing more robust by preventing a single file’s transport-level prerender/visit failure from aborting the entire indexing batch (and therefore preventing batch.done() from promoting the working index into the live boxel_index table).
Changes:
- Updates
IndexRunner.tryToVisitto treat non-404 visit failures as per-file failures by writing a persistedfile-errorindex entry for the URL. - Ensures dependency-row cache invalidation and
stats.fileErrorsaccounting happen on this new error path.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // A transport-level failure (prerender timeout/abort, network error) | ||
| // never reaches performCardIndexing/performFileIndexing's own | ||
| // error-entry construction — visitFileForIndexingFused rethrows | ||
| // before calling indexCardWithResult/indexFileWithResults. Left | ||
| // uncaught here, one file's failure propagates out of the |
There was a problem hiding this comment.
[Claude Code 🤖] Fair point on the missing regression test for the mid-batch isolation behavior — flagged as open in the PR description since simulating a transport-level prerender abort isn't trivial in the current IndexRunner test harness. Open to suggestions on the lowest-effort way to cover it if you have one.
Batch.invalidate() tombstones every existing type at a URL up front, including the instance row for a previously-indexed card. The prior commit's file-error entry only overwrote the file tombstone, so batch.done() would promote the untouched instance tombstone and silently remove an existing card from search over a transient error (e.g. a prerender timeout) instead of preserving it with an error row the way the in-band render-error path does. Re-check whether the failed URL is a card and, if so, also write an instance-error entry alongside the file-error one.
Summary
IndexRunner.tryToVisitrethrew any non-404 error from a file's fused visit (e.g. a prerender-visit timeout), which unwound the entirefromScratch/incrementalvisit loop and skippedbatch.done().batch.done()is what promotes the working table into the liveboxel_index, a single file's transport-level failure discarded every other already-successfully-indexed file in the same job — leaving the realm mounted but with an empty or stale search index, and no error surfaced anywhere.boxel_index.file-errorentry for that URL, mirroring the existingerror_docpattern already used for in-band render errors, so the rest of the realm still indexes and commits.Test plan
tsc --noEmitclean on the changed fileboxel_indexcorrectlyIndexRunnertest harness; open to suggestions on the least-effort way to cover this🤖 Generated with Claude Code