fix(rivetkit): disk I/O error on destroy vfs#4530

Merged
NathanFlurry merged 1 commit into main from
03-29-fix_rivetkit_disk_i_o_error_on_destroy_vfs
Mar 31, 2026

Conversation

@NathanFlurry
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app

railway-app bot commented Mar 30, 2026

🚅 Deployed to the rivet-pr-4530 environment in rivet-frontend

Service            | Status                      | Updated (UTC)
kitchen-sink       | ❌ Build Failed (View Logs) | Mar 31, 2026 at 2:39 am
frontend-inspector | 🕒 Building (View Logs)     | Mar 31, 2026 at 2:39 am
frontend-cloud     | 🕒 Building (View Logs)     | Mar 31, 2026 at 2:39 am
website            | 😴 Sleeping (View Logs)     | Mar 31, 2026 at 2:10 am
mcp-hub            | ✅ Success (View Logs)      | Mar 30, 2026 at 6:07 pm
ladle              | ❌ Build Failed (View Logs) | Mar 30, 2026 at 6:07 pm

@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4530 March 30, 2026 18:06 Destroyed
Member Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude bot commented Mar 30, 2026

PR Review: fix(rivetkit): disk I/O error on destroy vfs

This PR fixes a production race condition where a WebSocket handler running past the actor sleep grace period could trigger a cryptic SQLite "disk I/O error" when the VFS was torn down mid-query. The approach — poisoning the KV store before acquiring the mutex, then capturing VFS errors via an onError callback and rethrowing them with a developer-friendly message — is solid and well-tested.


What is Good

  • Poison-before-mutex ordering is correct. Setting poisoned = true before waiting on the lock means any query holding the mutex when close() is called will see the poison flag the moment it resumes and attempts the next KV operation. No TOCTOU window.
  • onError callback chain through VFS -> KV store -> DB module converts previously swallowed SQLite error codes into the original JS error, enabling the descriptive rethrow.
  • Error message is actionable: it names c.abortSignal as the remedy and explains why the failure happened, which is much better than a bare "disk I/O error".
  • Four distinct fixture actors cover the key failure modes (single handler, concurrent handlers, active writes, raw handle). This is solid coverage of a subtle race condition.
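
The poison-before-mutex ordering described above can be sketched as follows. This is an illustrative stand-in, not rivetkit's actual internals: the class and member names (`KvStore`, `poisoned`, `busy`) are hypothetical, and a serialized promise chain stands in for the real mutex.

```typescript
// Hedged sketch of poison-before-mutex: close() flips the flag BEFORE
// waiting on in-flight work, so any queued query observes the poison the
// moment it resumes. All names here are illustrative, not rivetkit's API.
class KvStore {
	private poisoned = false;
	private data = new Map<string, string>();
	// Serializes operations; stands in for the real mutex.
	private busy: Promise<unknown> = Promise.resolve();

	private run<T>(op: () => T): Promise<T> {
		const next = this.busy.then(() => {
			// A query queued behind close() resumes here and must see the
			// poison flag before touching the underlying VFS.
			if (this.poisoned) {
				throw new Error(
					"KV store is closed: the actor was destroyed while this " +
						"handler was still running (use c.abortSignal to stop early)",
				);
			}
			return op();
		});
		this.busy = next.catch(() => {});
		return next;
	}

	get(key: string): Promise<string | undefined> {
		return this.run(() => this.data.get(key));
	}

	set(key: string, value: string): Promise<void> {
		return this.run(() => void this.data.set(key, value));
	}

	async close(): Promise<void> {
		// Poison BEFORE waiting: no TOCTOU window between the flag write
		// and the next KV operation of an in-flight query.
		this.poisoned = true;
		await this.busy.catch(() => {});
	}
}
```

Because the flag is written synchronously before `close()` awaits anything, there is no interleaving in which a query passes the poison check after teardown has begun.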

Issues

1. lastVfsError only benefits the first concurrent failure

lastVfsError is a single slot: the first query to catch an error clears it (lastVfsError = null), so any subsequent concurrent queries that fail from the same poison event rethrow the raw SQLite error. This means under concurrent load users may still see "disk I/O error" for the second and later failures. Not clearing lastVfsError until a dedicated reset (or using a flag rather than a nullable) would close the gap.
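
One way to close the gap, sketched below under assumed names (`VfsErrorLatch`, `onError`, `wrap`, `reset` are hypothetical, not the PR's actual identifiers): keep the captured error latched for every concurrent failure and clear it only via a dedicated reset.

```typescript
// Hedged sketch: latch the VFS error instead of clearing it after the
// first rethrow, so every concurrent query failing from the same poison
// event gets the descriptive message. Names are illustrative.
class VfsErrorLatch {
	private lastVfsError: Error | null = null;

	// Called from the VFS onError callback.
	onError(err: Error): void {
		this.lastVfsError = err;
	}

	// Called from the DB module when a query fails with a raw SQLite code.
	wrap(raw: Error): Error {
		if (this.lastVfsError) {
			// Do NOT null the slot here: later concurrent failures from the
			// same poison event should also see the descriptive error.
			return new Error(
				"Database was destroyed while a handler was still running. " +
					"Use c.abortSignal to stop work when the actor shuts down. " +
					`Caused by: ${this.lastVfsError.message}`,
			);
		}
		return raw;
	}

	reset(): void {
		this.lastVfsError = null; // dedicated reset, e.g. on reopen
	}
}
```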

2. Test 1 assertion needs clarification

The test "ws handler exceeding grace period should still complete db writes" expects msg-finish to appear in the DB log, but the fixture sleepWsMessageExceedsGrace delays 2000ms (grace = 200ms) before the second DB write. After the grace period expires the KV store is poisoned, so the second write should fail, not succeed. If the intent is to assert that the handler gets a descriptive error rather than hanging, the assertion should check for an error. If the intent is that the first write (committed before the delay) is visible, the assertion should target that earlier entry instead.

3. Fragile timing in fixtures and tests

Constants like ACTIVE_DB_WRITE_DELAY_MS = 5 combined with ACTIVE_DB_GRACE_PERIOD = 50 leave very little margin on a loaded CI runner. The concurrent-handler test also relies on waitFor with wall-clock delays. State-polling (repeatedly reading actor state until a condition is met) instead of fixed sleeps would make these more reliable.
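
A state-polling helper along these lines would replace the fixed sleeps. This is a generic sketch, not code from the PR; `read` stands in for whatever reads the actor's state.

```typescript
// Hedged sketch of state-polling: repeatedly re-check a condition with a
// deadline instead of sleeping a fixed wall-clock delay, so a loaded CI
// runner only slows the test down rather than flaking it.
async function pollUntil<T>(
	read: () => Promise<T>,
	done: (value: T) => boolean,
	{ timeoutMs = 5000, intervalMs = 25 } = {},
): Promise<T> {
	const deadline = Date.now() + timeoutMs;
	for (;;) {
		const value = await read();
		if (done(value)) return value;
		if (Date.now() > deadline) {
			throw new Error(`pollUntil: condition not met within ${timeoutMs}ms`);
		}
		await new Promise((resolve) => setTimeout(resolve, intervalMs));
	}
}
```

The test then asserts on the first state that satisfies the condition, and the only timing constant left is a generous overall timeout.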

4. handlerErrors is declared but never written

In sleepWsConcurrentDbExceedsGrace, handlerErrors: [] as string[] is initialised in state and asserted to be empty, but the handler never pushes to it when an error occurs. The assertion is vacuously true. Either catch errors in the handler and push to handlerErrors, or remove the field and its assertion.
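
The first option could look like the sketch below. The `State` shape and `doDbWrite` are hypothetical stand-ins for the fixture's actual code, shown only to illustrate recording the error before rethrowing.

```typescript
// Hedged sketch: catch handler failures and push them into
// state.handlerErrors so the test's emptiness assertion is meaningful.
interface State {
	handlerErrors: string[];
}

async function handleMessage(
	state: State,
	doDbWrite: () => Promise<void>,
): Promise<void> {
	try {
		await doDbWrite();
	} catch (err) {
		// Record the failure so an assertion on handlerErrors can actually
		// fail when a write is rejected, then rethrow for the caller.
		state.handlerErrors.push(err instanceof Error ? err.message : String(err));
		throw err;
	}
}
```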


Nits

  • Constants in sleep-db.ts are declared with const and re-exported via export { ... }. Using export const directly matches the style of SLEEP_DB_TIMEOUT elsewhere in the file.
  • A few comments use dashes as separators. Per CLAUDE.md, prefer periods to separate clauses instead of dashes.

@pkg-pr-new

pkg-pr-new bot commented Mar 30, 2026

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@4530

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@4530

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@4530

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@4530

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@4530

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@4530

@rivetkit/sqlite-vfs

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sqlite-vfs@4530

@rivetkit/traces

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/traces@4530

@rivetkit/workflow-engine

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/workflow-engine@4530

@rivetkit/virtual-websocket

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/virtual-websocket@4530

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@4530

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@4530

commit: 1d237a0

@NathanFlurry NathanFlurry force-pushed the 03-29-fix_rivetkit_disk_i_o_error_on_destroy_vfs branch from 3ae002d to ac63d07 Compare March 30, 2026 18:12
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4530 March 30, 2026 18:12 Destroyed
@NathanFlurry NathanFlurry force-pushed the 03-29-fix_rivetkit_disk_i_o_error_on_destroy_vfs branch from ac63d07 to 1d237a0 Compare March 31, 2026 01:56
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4530 March 31, 2026 01:56 Destroyed
@NathanFlurry NathanFlurry force-pushed the 03-29-fix_rivetkit_disk_i_o_error_on_destroy_vfs branch from 1d237a0 to 212ed3c Compare March 31, 2026 02:00
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4530 March 31, 2026 02:00 Destroyed
@NathanFlurry NathanFlurry marked this pull request as ready for review March 31, 2026 02:20
@NathanFlurry NathanFlurry force-pushed the 03-29-fix_rivetkit_disk_i_o_error_on_destroy_vfs branch from 212ed3c to 6ea1614 Compare March 31, 2026 02:39
@railway-app railway-app bot temporarily deployed to rivet-frontend / rivet-pr-4530 March 31, 2026 02:39 Destroyed
@NathanFlurry NathanFlurry merged commit d0e47ea into main Mar 31, 2026
13 of 20 checks passed
@NathanFlurry NathanFlurry deleted the 03-29-fix_rivetkit_disk_i_o_error_on_destroy_vfs branch March 31, 2026 02:40