[codex] Reproduce reused-sleep replay divergence in core runtime #2169

Draft
pranaygp wants to merge 4 commits into stable from codex/runtime-only-reused-sleep-repro

Conversation

@pranaygp
Contributor

@pranaygp pranaygp commented May 29, 2026

What this proves

This adds core-runtime regression and discriminator tests for the observed Promise.race([iterator.next(), reused sleep]) divergence. The tests drive setupWorkflowContext() with explicitly ordered in-memory event histories, so they do not involve DynamoDB, Postgres, world-local, a Vercel deployment, or network timing.

The original ordered durable history is:

hook_created
wait_created
hook_received
step_created setupStep
step_started setupStep
step_completed setupStep
wait_completed
step_created drainStep

That history records that the hook branch won: the durable next operation is drainStep. Replay on current stable instead follows the sleep branch and attempts to consume syncNextStep, reporting the same path-divergence corruption observed in hosted runs.
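As a rough illustration of how the recorded branch can be read off the history above, the sketch below encodes those eight events and looks up the first step created once both race inputs are in the log. The `HistoryEvent` shape and the `recordedNextStep` helper are hypothetical names invented for this example; they are not part of the runtime or of the tests in this PR.

```typescript
// Hypothetical event shape for illustration; the real runtime's event types may differ.
type HistoryEvent = { type: string; stepName?: string };

// The original ordered durable history from the PR description.
const originalHistory: HistoryEvent[] = [
  { type: 'hook_created' },
  { type: 'wait_created' },
  { type: 'hook_received' },
  { type: 'step_created', stepName: 'setupStep' },
  { type: 'step_started', stepName: 'setupStep' },
  { type: 'step_completed', stepName: 'setupStep' },
  { type: 'wait_completed' },
  { type: 'step_created', stepName: 'drainStep' },
];

// Returns the name of the first step created after both race inputs
// (hook_received and wait_completed) appear in the log, or undefined
// if either input is missing or no later step was recorded.
function recordedNextStep(history: HistoryEvent[]): string | undefined {
  const hookIdx = history.findIndex((e) => e.type === 'hook_received');
  const waitIdx = history.findIndex((e) => e.type === 'wait_completed');
  if (hookIdx < 0 || waitIdx < 0) return undefined;
  const bothReady = Math.max(hookIdx, waitIdx);
  return history
    .slice(bothReady + 1)
    .find((e) => e.type === 'step_created')?.stepName;
}

const nextStep = recordedNextStep(originalHistory);
```

Under these assumptions `nextStep` is the hook-side step, which is the operation a faithful replay would have to reach next.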

Early waiter across a drain

The hosted repro was subsequently changed to install iterator.next() before syncStatusSurfaceLikeStep, matching this PR's original positive control. That narrows the original boundary, but it is not a complete workaround when the loop reuses its sleep.

This PR now includes a two-iteration history matching the remaining window:

iteration 0: progressStep -> create reused wait -> hook wins -> drainStep starts
during drainStep: second hook_received and reused wait_completed become ready
iteration 1: installs iterator.next() before progressStep

There are two order-controlled tests for that history:

  • If the buffered second hook is delivered first, replay follows the recorded drainStep branch and passes.
  • If the reused wait_completed is delivered first, the recorded next operation is progressStep, but current stable replay consumes the hook branch and attempts drainStep.

The failing error is:

Corrupted event log: step event step_created ... belongs to "progressStep", but the current step consumer is "drainStep"

This reproduces the failure direction seen in wrun_01KSV07R3NQ9C26F4E0D0RTA8S from a complete, ordered in-memory event history. Moving iterator.next() before the progress step cannot cover the interval while the previous hook-winning drainStep is awaited.
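The failure direction can be pictured with a simplified consistency check: the recorded step event names one step, while the replaying workflow is asking for another. The sketch below is a hypothetical model, not the runtime's implementation; `StepEvent` and `checkStepConsumer` are invented names, and the message omits the portion that is elided in the error quoted above.

```typescript
// Hypothetical, simplified model of a replay consistency check.
// It is not the runtime's code; names and message format are assumptions.
type StepEvent = { type: 'step_created'; stepName: string };

function checkStepConsumer(event: StepEvent, currentConsumer: string): void {
  // Replay is consistent only if the step the workflow code is about to run
  // is the step the durable log recorded at this position.
  if (event.stepName !== currentConsumer) {
    throw new Error(
      `Corrupted event log: step event ${event.type} belongs to "${event.stepName}", ` +
        `but the current step consumer is "${currentConsumer}"`,
    );
  }
}

// Wait-first case: the log recorded progressStep, replay attempts drainStep.
let divergence = '';
try {
  checkStepConsumer({ type: 'step_created', stepName: 'progressStep' }, 'drainStep');
} catch (err) {
  divergence = (err as Error).message;
}
```

In the hook-first control the two names agree, so the same check would pass silently.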

Expected failing validation

fnm exec --using 24 pnpm --filter '@workflow/core...' build
fnm exec --using 24 pnpm --filter @workflow/core exec vitest run src/hook-sleep-interaction.test.ts --reporter=verbose

The targeted suite deterministically fails in both synchronous and asynchronous deserialization modes. This PR is intentionally test-only and expected to be red: its purpose is to preserve the minimal runtime-only reproductions while the fix is developed.

Promise-shape discriminator

For the original one-race history, both the mapped race

Promise.race([
  iterator.next().then((value) => ({ kind: 'hook', value })),
  pendingSleep.then(() => ({ kind: 'sleep' })),
]);

and raw Promise.race([iterator.next(), pendingSleep]) fail with the same drainStep versus syncNextStep divergence. Installing the iterator read before setupStep makes that original history pass.
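To see why the promise shape alone is unlikely to discriminate, the following sketch uses plain JavaScript promises, with no workflow runtime involved. In this model both the mapped and the raw race settle with whichever input was resolved first, so the outcome depends only on delivery order. The `deferred` and `raceWinner` helpers are hypothetical and exist only for this example.

```typescript
type Branch = 'hook' | 'sleep';

// Minimal deferred helper: a promise plus its externally callable resolver.
function deferred<T>(): { promise: Promise<T>; resolve: (value: T) => void } {
  let resolve!: (value: T) => void;
  const promise = new Promise<T>((res) => {
    resolve = res;
  });
  return { promise, resolve };
}

// Builds the race in the requested shape, resolves both inputs in the
// requested order, and reports which branch the race settled with.
async function raceWinner(resolveFirst: Branch, shape: 'mapped' | 'raw'): Promise<Branch> {
  const hook = deferred<string>();
  const sleep = deferred<undefined>();

  const race: Promise<Branch> =
    shape === 'mapped'
      ? Promise.race([
          hook.promise.then((): Branch => 'hook'),
          sleep.promise.then((): Branch => 'sleep'),
        ])
      : Promise.race([hook.promise, sleep.promise]).then(
          // In this model the sleep resolves with undefined and the hook with a payload.
          (value): Branch => (value === undefined ? 'sleep' : 'hook'),
        );

  if (resolveFirst === 'hook') {
    hook.resolve('payload');
    sleep.resolve(undefined);
  } else {
    sleep.resolve(undefined);
    hook.resolve('payload');
  }
  return race;
}
```

This only models standard promise semantics; it says nothing about how the runtime orders event delivery during replay, which is what the tests in this PR exercise.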

The new two-iteration test demonstrates why that source-level adjustment does not resolve the overall bug: after the first hook value is consumed, no next iterator read is pending while the hook-side drain step is awaited. Reused sleep completion and buffered hook delivery can still be consumed into a trajectory different from the recorded next operation.
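The drain window can be modelled with a small buffered queue. In the sketch below, a payload delivered while a read is installed goes straight to that reader, whereas a payload delivered with no pending read is only buffered. `BufferedHook` is a hypothetical class written for this illustration and is not the runtime's hook implementation.

```typescript
// Hypothetical model of a hook iterator with buffering; not the runtime's code.
class BufferedHook<T> {
  private buffered: T[] = [];
  private readers: Array<(value: T) => void> = [];

  // Install a read. It is served immediately if a payload is already buffered.
  next(onValue: (value: T) => void): void {
    if (this.buffered.length > 0) {
      onValue(this.buffered.shift() as T);
    } else {
      this.readers.push(onValue);
    }
  }

  // Deliver a payload. It goes to the oldest pending reader, else into the buffer.
  deliver(value: T): 'handed-to-reader' | 'buffered' {
    const reader = this.readers.shift();
    if (reader) {
      reader(value);
      return 'handed-to-reader';
    }
    this.buffered.push(value);
    return 'buffered';
  }

  get pendingReaders(): number {
    return this.readers.length;
  }
}

const hook = new BufferedHook<string>();
const seen: string[] = [];

// Iteration 0: the read is installed early, so the first payload reaches it.
hook.next((value) => {
  seen.push(value);
});
const firstDelivery = hook.deliver('first');

// While the hook-side drain step is awaited, no read is installed,
// so a second payload arriving in that window is only buffered.
const readersDuringDrain = hook.pendingReaders;
const secondDelivery = hook.deliver('second');
```

In this model the second payload and a completed reused sleep would both be ready before iteration 1 installs its read, which is the window the two-iteration test targets.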

Relationship to the candidate fix

#2048 repairs the original single-iteration waiter-installation reproduction. I also applied the new two-iteration drain-window test to its current candidate commit (6164a6dd9) locally: the hook-first control passes, but the wait-first case still fails in both deserialization modes with the same progressStep versus drainStep corruption. The new test therefore captures a remaining runtime boundary not covered by that candidate repair.

@changeset-bot

changeset-bot Bot commented May 29, 2026

⚠️ No Changeset found

Latest commit: e354f37

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@vercel
Contributor

vercel Bot commented May 29, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| example-nextjs-workflow-turbopack | Ready | Preview, Comment | May 30, 2026 12:50am |
| example-nextjs-workflow-webpack | Ready | Preview, Comment | May 30, 2026 12:50am |
| example-workflow | Ready | Preview, Comment | May 30, 2026 12:50am |
| workbench-astro-workflow | Ready | Preview, Comment | May 30, 2026 12:50am |
| workbench-express-workflow | Ready | Preview, Comment | May 30, 2026 12:50am |
| workbench-fastify-workflow | Ready | Preview, Comment | May 30, 2026 12:50am |
| workbench-hono-workflow | Ready | Preview, Comment | May 30, 2026 12:50am |
| workbench-nitro-workflow | Ready | Preview, Comment | May 30, 2026 12:50am |
| workbench-nuxt-workflow | Ready | Preview, Comment | May 30, 2026 12:50am |
| workbench-sveltekit-workflow | Ready | Preview, Comment | May 30, 2026 12:50am |
| workbench-tanstack-start-workflow | Ready | Preview, Comment | May 30, 2026 12:50am |
| workbench-vite-workflow | Ready | Preview, Comment | May 30, 2026 12:50am |
| workflow-swc-playground | Ready | Preview, Comment | May 30, 2026 12:50am |
| workflow-tarballs | Ready | Preview, Comment | May 30, 2026 12:50am |
| workflow-web | Ready | Preview, Comment | May 30, 2026 12:50am |

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| workflow-docs | Skipped | | May 30, 2026 12:50am |

@github-actions
Contributor

github-actions Bot commented May 29, 2026

🧪 E2E Test Results

Some tests failed

Summary

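| Category | Passed | Failed | Skipped | Total |
| --- | --- | --- | --- | --- |
| ❌ ▲ Vercel Production | 900 | 1 | 67 | 968 |
| ✅ 💻 Local Development | 970 | 0 | 86 | 1056 |
| ✅ 📦 Local Production | 970 | 0 | 86 | 1056 |
| ✅ 🐘 Local Postgres | 970 | 0 | 86 | 1056 |
| ❌ 🌍 Community Worlds | 15 | 69 | 0 | 84 |
| ✅ 📋 Other | 492 | 0 | 36 | 528 |
| Total | 4317 | 70 | 361 | 4748 |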

❌ Failed Tests

▲ Vercel Production (1 failed)

example (1 failed):

  • health check (queue-based) - workflow and step endpoints respond to health check messages

🌍 Community Worlds (69 failed)

mongodb-dev (1 failed):

  • dev e2e should rebuild on imported step dependency change

redis-dev (1 failed):

  • dev e2e should rebuild on imported step dependency change

turso-dev (1 failed):

  • dev e2e should rebuild on imported step dependency change

turso (66 failed):

  • addTenWorkflow | wrun_01KSV5M6P0YQ746CTD79AJPKCA
  • addTenWorkflow | wrun_01KSV5M6P0YQ746CTD79AJPKCA
  • wellKnownAgentWorkflow (.well-known/agent) | wrun_01KSV5NAKF026J118XEVKKKZ58
  • should work with react rendering in step
  • promiseAllWorkflow | wrun_01KSV5MDHM4T28F28FG9WYQ8P3
  • promiseRaceWorkflow | wrun_01KSV5MJ4PD923Y5G7DN7R01QP
  • promiseAnyWorkflow | wrun_01KSV5MNCK5XVYVNJEF6MFQ7MS
  • importedStepOnlyWorkflow | wrun_01KSV5NSQZRW0PW4273Y41N9RS
  • readableStreamWorkflow | wrun_01KSV5MQFYPEGZJWA1EW3D7FVS
  • hookWorkflow | wrun_01KSV5N4S41VMBSBJGJSDB4B80
  • hookWorkflow is not resumable via public webhook endpoint | wrun_01KSV5NB51RA04RWKM1AAYGAKH
  • webhookWorkflow | wrun_01KSV5NG4JAKKYGRNYASJY44GM
  • sleepingWorkflow | wrun_01KSV5NQ5CSHF1NQNP1EVNBREA
  • parallelSleepWorkflow | wrun_01KSV5P5GTYYD2VABKAHA62HJR
  • nullByteWorkflow | wrun_01KSV5P9TQER1C0DN4K84J8C4P
  • workflowAndStepMetadataWorkflow | wrun_01KSV5PCWVJYFXGE3935HWBPE9
  • outputStreamWorkflow no startIndex (reads all chunks)
  • outputStreamWorkflow positive startIndex (skips first chunk)
  • outputStreamWorkflow negative startIndex (reads from end)
  • outputStreamWorkflow - getTailIndex and getStreamChunks getTailIndex returns correct index after stream completes
  • outputStreamWorkflow - getTailIndex and getStreamChunks getTailIndex returns -1 before any chunks are written
  • outputStreamWorkflow - getTailIndex and getStreamChunks getStreamChunks returns same content as reading the stream
  • outputStreamInsideStepWorkflow - getWritable() called inside step functions | wrun_01KSV5RRJ09BWM6DHT67S899EP
  • fetchWorkflow | wrun_01KSV5S5Z7PP8XQ1VN5EBJKAA1
  • promiseRaceStressTestWorkflow | wrun_01KSV5S9481QRGKMFQ0R1ATDDG
  • error handling error propagation workflow errors nested function calls preserve message and stack trace
  • error handling error propagation workflow errors cross-file imports preserve message and stack trace
  • error handling error propagation step errors basic step error preserves message and stack trace
  • error handling error propagation step errors cross-file step error preserves message and function names in stack
  • error handling retry behavior regular Error retries until success
  • error handling retry behavior FatalError fails immediately without retries
  • error handling retry behavior RetryableError respects custom retryAfter delay
  • error handling retry behavior maxRetries=0 disables retries
  • error handling catchability FatalError can be caught and detected with FatalError.is()
  • error handling not registered WorkflowNotRegisteredError fails the run when workflow does not exist
  • error handling not registered StepNotRegisteredError fails the step but workflow can catch it
  • error handling not registered StepNotRegisteredError fails the run when not caught in workflow
  • hookCleanupTestWorkflow - hook token reuse after workflow completion | wrun_01KSV5W14VEBSQE28W2RFDYX5Y
  • concurrent hook token conflict - two workflows cannot use the same hook token simultaneously | wrun_01KSV5WAHSZX4HG8PZZ13VH14K
  • hookDisposeTestWorkflow - hook token reuse after explicit disposal while workflow still running | wrun_01KSV5WQ0YZFTS86MA5BHYX0NP
  • stepFunctionPassingWorkflow - step function references can be passed as arguments (without closure vars) | wrun_01KSV5X34RFQR3C06RNAJXAXAR
  • stepFunctionWithClosureWorkflow - step function with closure variables passed as argument | wrun_01KSV5X95CGFK3BV0RNKXVMHH4
  • closureVariableWorkflow - nested step functions with closure variables | wrun_01KSV5XDBJMQ01NGS9F6BN1WPF
  • spawnWorkflowFromStepWorkflow - spawning a child workflow using start() inside a step | wrun_01KSV5XFGTESABQ48KKRV8Y22Y
  • health check (queue-based) - workflow and step endpoints respond to health check messages
  • health check (CLI) - workflow health command reports healthy endpoints
  • pathsAliasWorkflow - TypeScript path aliases resolve correctly | wrun_01KSV5XTGBVXTS7T298F7A9VBE
  • Calculator.calculate - static workflow method using static step methods from another class | wrun_01KSV5XYHVDYPMRVKF6C7MF9ZB
  • AllInOneService.processNumber - static workflow method using sibling static step methods | wrun_01KSV5Y3R6AS2VYFSPH6EYCM93
  • ChainableService.processWithThis - static step methods using this to reference the class | wrun_01KSV5Y9HFJAR99CP60ZNSVFEQ
  • thisSerializationWorkflow - step function invoked with .call() and .apply() | wrun_01KSV5YEMBC8HFVP5A47ZEK17Z
  • customSerializationWorkflow - custom class serialization with WORKFLOW_SERIALIZE/WORKFLOW_DESERIALIZE | wrun_01KSV5YMV3X196VM0FTJ3CQ3X5
  • instanceMethodStepWorkflow - instance methods with "use step" directive | wrun_01KSV5YTD13Y7BW74H5T58MBQT
  • crossContextSerdeWorkflow - classes defined in step code are deserializable in workflow context | wrun_01KSV5Z3TSKG1WN319H74WE900
  • stepFunctionAsStartArgWorkflow - step function reference passed as start() argument | wrun_01KSV5ZB3RZEFR9M1RZBE59XF3
  • cancelRun - cancelling a running workflow | wrun_01KSV5ZG5BGDBMTS9RAH32HSAE
  • cancelRun via CLI - cancelling a running workflow | wrun_01KSV5ZQS5NXP8FTP28HT5JMMC
  • pages router addTenWorkflow via pages router
  • pages router promiseAllWorkflow via pages router
  • pages router sleepingWorkflow via pages router
  • hookWithSleepWorkflow - hook payloads delivered correctly with concurrent sleep | wrun_01KSV600YWBEACJ1A1G68RYC71
  • sleepInLoopWorkflow - sleep inside loop with steps actually delays each iteration | wrun_01KSV60F4YR3T95PR87M8SXD38
  • sleepWithSequentialStepsWorkflow - sequential steps work with concurrent sleep (control) | wrun_01KSV60T2N539SNCKPZ85Z6N1C
  • importMetaUrlWorkflow - import.meta.url is available in step bundles | wrun_01KSV6109CK4YJJE4C5PH6HCZV
  • metadataFromHelperWorkflow - getWorkflowMetadata/getStepMetadata work from module-level helper (#1577) | wrun_01KSV6126D92RWN0HV4XK5NTM3
  • resilient start: addTenWorkflow completes when run_created returns 500 | wrun_01KSV6144X9AHX08HDWRFMS8V5

Details by Category

❌ ▲ Vercel Production
App Passed Failed Skipped
✅ astro 81 0 7
❌ example 80 1 7
✅ express 81 0 7
✅ fastify 81 0 7
✅ hono 81 0 7
✅ nextjs-turbopack 86 0 2
✅ nextjs-webpack 86 0 2
✅ nitro 81 0 7
✅ nuxt 81 0 7
✅ sveltekit 81 0 7
✅ vite 81 0 7
✅ 💻 Local Development
App Passed Failed Skipped
✅ astro-stable 82 0 6
✅ express-stable 82 0 6
✅ fastify-stable 82 0 6
✅ hono-stable 82 0 6
✅ nextjs-turbopack-canary 69 0 19
✅ nextjs-turbopack-stable 88 0 0
✅ nextjs-webpack-canary 69 0 19
✅ nextjs-webpack-stable 88 0 0
✅ nitro-stable 82 0 6
✅ nuxt-stable 82 0 6
✅ sveltekit-stable 82 0 6
✅ vite-stable 82 0 6
✅ 📦 Local Production
App Passed Failed Skipped
✅ astro-stable 82 0 6
✅ express-stable 82 0 6
✅ fastify-stable 82 0 6
✅ hono-stable 82 0 6
✅ nextjs-turbopack-canary 69 0 19
✅ nextjs-turbopack-stable 88 0 0
✅ nextjs-webpack-canary 69 0 19
✅ nextjs-webpack-stable 88 0 0
✅ nitro-stable 82 0 6
✅ nuxt-stable 82 0 6
✅ sveltekit-stable 82 0 6
✅ vite-stable 82 0 6
✅ 🐘 Local Postgres
App Passed Failed Skipped
✅ astro-stable 82 0 6
✅ express-stable 82 0 6
✅ fastify-stable 82 0 6
✅ hono-stable 82 0 6
✅ nextjs-turbopack-canary 69 0 19
✅ nextjs-turbopack-stable 88 0 0
✅ nextjs-webpack-canary 69 0 19
✅ nextjs-webpack-stable 88 0 0
✅ nitro-stable 82 0 6
✅ nuxt-stable 82 0 6
✅ sveltekit-stable 82 0 6
✅ vite-stable 82 0 6
❌ 🌍 Community Worlds
App Passed Failed Skipped
❌ mongodb-dev 4 1 0
❌ redis-dev 4 1 0
❌ turso-dev 4 1 0
❌ turso 3 66 0
✅ 📋 Other
App Passed Failed Skipped
✅ e2e-local-dev-nest-stable 82 0 6
✅ e2e-local-dev-tanstack-start-stable 82 0 6
✅ e2e-local-postgres-nest-stable 82 0 6
✅ e2e-local-postgres-tanstack-start-stable 82 0 6
✅ e2e-local-prod-nest-stable 82 0 6
✅ e2e-local-prod-tanstack-start-stable 82 0 6

📋 View full workflow run


Some E2E test jobs failed:

  • Vercel Prod: failure
  • Local Dev: success
  • Local Prod: success
  • Local Postgres: success
  • Windows: failure

Check the workflow run for details.

expect(result).toEqual(['first', 'second']);
});

it('should let a queued hook payload win when a reused wait completes after the step that installs the race', async () => {
Contributor

@vercel vercel Bot May 29, 2026


Three newly added, intentionally-red reproduction tests use plain it(...) instead of it.fails(...), breaking the core package's CI test suite (vitest run src) on this and every subsequent PR until the fix lands.


@vercel vercel Bot temporarily deployed to Preview – workflow-docs May 30, 2026 00:46 Inactive
