PR Review: fix(rivetkit): disk I/O error on destroy vfs
This PR fixes a production race condition where a WebSocket handler running past the actor sleep grace period could trigger a cryptic SQLite "disk I/O error" when the VFS was torn down mid-query. The approach is solid and well tested: poison the KV store before acquiring the mutex, capture VFS errors via an onError callback, and rethrow them with a developer-friendly message.
What is Good
Poison-before-mutex ordering is correct. Setting poisoned = true before waiting on the lock means any query holding the mutex when close() is called will see the poison flag the moment it resumes and attempts the next KV operation. No TOCTOU window.
The onError callback chain (VFS -> KV store -> DB module) surfaces the original JS error that was previously swallowed behind a SQLite error code, which is what makes the descriptive rethrow possible.
Error message is actionable: it names c.abortSignal as the remedy and explains why the failure happened, which is much better than a bare "disk I/O error".
Four distinct fixture actors cover the key failure modes (single handler, concurrent handlers, active writes, raw handle). This is solid coverage of a subtle race condition.
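For readers unfamiliar with the pattern, the poison-before-mutex ordering praised above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: SketchKvStore, SketchDb, withLock, and the error text are all hypothetical names, and the promise-chain lock stands in for whatever mutex the real module uses.

```typescript
// Hypothetical sketch of poison-before-mutex. None of these names come from the PR.
class SketchKvStore {
  private poisoned = false;
  private data = new Map<string, string>();

  poison(): void {
    this.poisoned = true;
  }

  put(key: string, value: string): void {
    // Every KV operation checks the flag first, so a poisoned store fails fast.
    if (this.poisoned) {
      throw new Error("KV store is closed (sketch message)");
    }
    this.data.set(key, value);
  }

  has(key: string): boolean {
    return this.data.has(key);
  }
}

class SketchDb {
  private kv: SketchKvStore;
  // Promise chain used as a simple async mutex (an assumption for this sketch).
  private tail: Promise<void> = Promise.resolve();

  constructor(kv: SketchKvStore) {
    this.kv = kv;
  }

  private withLock<T>(fn: () => Promise<T>): Promise<T> {
    const run = this.tail.then(fn);
    // Keep the chain alive even if fn rejects.
    this.tail = run.then(() => undefined, () => undefined);
    return run;
  }

  // A "query" that performs two KV operations with a pause in between.
  write(key: string, value: string, pauseMs: number): Promise<void> {
    return this.withLock(async () => {
      this.kv.put(`${key}:start`, value);
      await new Promise<void>((resolve) => setTimeout(resolve, pauseMs));
      this.kv.put(`${key}:finish`, value); // throws if poisoned during the pause
    });
  }

  async close(): Promise<void> {
    // Poison first, so the in-flight lock holder sees the flag on its next KV operation.
    this.kv.poison();
    // Then wait for the lock, so teardown happens only after that holder has bailed out.
    await this.withLock(async () => {});
  }
}
```

Because the flag is set before close() queues for the lock, there is no interval in which the lock holder can start another KV operation against a store that is about to be torn down.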
Issues
1. lastVfsError only benefits the first concurrent failure
lastVfsError is a single slot: the first query to catch an error clears it (lastVfsError = null), so any other concurrent query that fails from the same poison event rethrows the raw SQLite error. Under concurrent load, users may therefore still see "disk I/O error" for the second and later failures. Keeping lastVfsError set until a dedicated reset, or tracking a sticky flag instead of a nullable slot, would close the gap.
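One possible shape for the sticky variant suggested in issue 1, as a sketch under assumptions: VfsErrorSlot, record, wrap, and reset are hypothetical names, and the wrapped message format is illustrative rather than the PR's actual wording.

```typescript
// Hypothetical sketch; not the PR's implementation.
class VfsErrorSlot {
  private captured: Error | null = null;

  // Called from the onError callback. Keeps the first error so the root cause survives.
  record(err: Error): void {
    if (this.captured === null) {
      this.captured = err;
    }
  }

  // Called when a query fails. Does not clear the slot, so every concurrent
  // failure caused by the same poison event gets the descriptive message.
  wrap(sqliteError: Error): Error {
    if (this.captured === null) {
      return sqliteError;
    }
    return new Error(
      `${this.captured.message} (SQLite reported: ${sqliteError.message})`,
    );
  }

  // Explicit reset point, for example when a fresh database handle is opened.
  reset(): void {
    this.captured = null;
  }
}
```

The trade-off is that a stale error could be attached to an unrelated later failure if reset() is never called, so the reset point needs to be chosen deliberately.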
2. Test 1 assertion needs clarification
The test "ws handler exceeding grace period should still complete db writes" expects msg-finish to appear in the DB log, but the fixture sleepWsMessageExceedsGrace delays 2000ms (grace = 200ms) before the second DB write. After the grace period expires the KV store is poisoned, so the second write should fail, not succeed. If the intent is to assert that the handler gets a descriptive error rather than hanging, the assertion should check for an error. If the intent is that the first write (committed before the delay) is visible, the assertion should target that earlier entry instead.
3. Fragile timing in fixtures and tests
Constants like ACTIVE_DB_WRITE_DELAY_MS = 5 combined with ACTIVE_DB_GRACE_PERIOD = 50 leave very little margin on a loaded CI runner. The concurrent-handler test also relies on waitFor with wall-clock delays. State-polling (repeatedly reading actor state until a condition is met) instead of fixed sleeps would make these more reliable.
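A generic polling helper along the lines suggested in issue 3 might look like the sketch below. pollUntil and its options are hypothetical and not part of the test utilities in this PR; the read callback stands in for whatever call fetches actor state.

```typescript
// Hypothetical helper: polls read() until done(value) is true or the timeout elapses.
async function pollUntil<T>(
  read: () => Promise<T>,
  done: (value: T) => boolean,
  { timeoutMs = 5000, intervalMs = 25 } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await read();
    if (done(value)) {
      return value;
    }
    if (Date.now() >= deadline) {
      throw new Error(`pollUntil: condition not met within ${timeoutMs}ms`);
    }
    // Short pause between reads. The timeout is a generous upper bound, not the expected wait.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A test would then wait on the observable condition (for example, a sleep counter reaching 1) instead of sleeping for a fixed multiple of the grace period, so a slow CI runner only makes the test slower rather than flaky.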
4. handlerErrors is declared but never written
In sleepWsConcurrentDbExceedsGrace, handlerErrors: [] as string[] is initialised in state and asserted to be empty, but the handler never pushes to it when an error occurs. The assertion is vacuously true. Either catch errors in the handler and push to handlerErrors, or remove the field and its assertion.
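If the field is kept, the first option in issue 4 could be as small as the wrapper sketched here. FixtureState and runRecordingErrors are hypothetical. The real fixture's state shape and handler signature may differ.

```typescript
// Hypothetical shapes; only the handlerErrors field name comes from the fixture.
interface FixtureState {
  handlerErrors: string[];
}

// Runs the handler body and records any failure in state instead of dropping it.
async function runRecordingErrors(
  state: FixtureState,
  work: () => Promise<void>,
): Promise<void> {
  try {
    await work();
  } catch (err) {
    state.handlerErrors.push(err instanceof Error ? err.message : String(err));
  }
}
```

With errors actually recorded, an assertion that handlerErrors is empty can fail, which is what makes it meaningful.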
Nits
Constants in sleep-db.ts are declared with const and re-exported via export { ... }. Using export const directly matches the style of SLEEP_DB_TIMEOUT elsewhere in the file.
A few comments use dashes as separators. Per CLAUDE.md, prefer periods to separate clauses instead of dashes.