diff --git a/research/API_CANDIDATES.md b/research/API_CANDIDATES.md new file mode 100644 index 0000000..2f3e822 --- /dev/null +++ b/research/API_CANDIDATES.md @@ -0,0 +1,988 @@ +# TanStack Workflow — Top 3 API Design Candidates + +> Three concrete API designs, each implementing the same canonical workflow end-to-end. Prioritized for type safety, composable primitives, and platform agnosticism — TanStack's core tenets applied to durable execution. + +--- + +## Design philosophy: the non-negotiables + +Every candidate below adheres to the same foundational commitments. The candidates differ only in _how_ they express these. + +1. **Type safety is paramount.** Step input/output types are inferred from closures, never declared twice. Event payload types are inferred from schemas. Workflow return types flow through to call sites and React hooks. Zero casts in user code. No `any` leakage at boundaries. +2. **Composable primitives.** `step`, `sleep`, `waitForEvent`, `invoke`, `parallel`, `race`, `compensate` are first-class units. They can be defined once and reused across workflows. +3. **Platform agnosticism.** The workflow _definition_ is decoupled from storage, runtime, and deployment target. Adapters wire the engine to Postgres / SQLite / D1 / Durable Objects / Redis / in-memory at startup. The same definition runs on Vercel, Cloudflare Workers, Fly machines, Bun servers, and Node containers. +4. **Headless / framework-agnostic core.** Pure TS in `@tanstack/workflow-core` with zero framework deps. React / Solid / Vue / Svelte bindings ship as separate packages with `useWorkflow*` hooks. +5. **No decorators, no class-based APIs.** Function-first, like the rest of TanStack. +6. **Standard Schema, observable run streams, updater patterns.** Borrow existing TanStack design language wherever possible. +7. **Auto-versioned executions.** Every run pins to the code SHA that started it; new deploys can't break in-flight workflows. +8. 
**Storage is the only source of truth.** No in-process workflow→workflow calls. Everything goes through the storage adapter so cross-runtime invocation works. + +These eight commitments are settled. The candidates below differ on _how_ they express the workflow, not on what guarantees the engine provides. + +--- + +## The canonical example + +To make comparisons concrete, every candidate implements the same workflow. It exercises everything the API needs to cover. + +**User onboarding flow:** + +1. Load the user's profile (step with typed return) +2. Kick off a child workflow to send onboarding emails (subworkflow invoke, detached) +3. Sleep 24 hours +4. Race up to 7 days for one of: an `approved` event, a `rejected` event, or timeout +5. If approved: charge their card + activate their account, with saga compensation if activate fails +6. If rejected or timed out: mark inactive +7. Return a discriminated-union result the call site can pattern-match against + +In TypeScript, the expected signature of the final exported workflow is: + +```typescript +type OnboardWorkflow = Workflow< + // Input + { userId: string }, + // Output (discriminated union) + | { status: 'active'; userId: string; by: string; chargeId: string } + | { status: 'inactive'; userId: string; reason: 'rejected' | 'timed_out' }, + // Events + { approved: { approverId: string }; rejected: { reason: string } } +> +``` + +All three candidates must produce a workflow with this inferred shape from the user's code, with zero explicit type annotations beyond schema declarations. 
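To make "pattern-match against" concrete, here is a minimal sketch of how a call site might consume the discriminated-union output exhaustively. The `OnboardResult` alias and the `describeResult` helper are hypothetical names used only for illustration; they are not part of any candidate API.

```typescript
// Output shape copied from the expected OnboardWorkflow signature above.
type OnboardResult =
  | { status: 'active'; userId: string; by: string; chargeId: string }
  | { status: 'inactive'; userId: string; reason: 'rejected' | 'timed_out' }

// Hypothetical call-site helper: narrows on `status`, no casts needed.
function describeResult(result: OnboardResult): string {
  switch (result.status) {
    case 'active':
      // Narrowed: `by` and `chargeId` exist here, `reason` does not.
      return `${result.userId} activated by ${result.by} (charge ${result.chargeId})`
    case 'inactive':
      return `${result.userId} inactive: ${result.reason}`
    default: {
      // If a new variant is added to the union, this assignment stops compiling.
      const unreachable: never = result
      return unreachable
    }
  }
}
```

The `never` assignment in the default branch is a common TypeScript idiom for exhaustiveness; whether the library would also ship a matcher utility is left open here.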
+ +Shared imports for all three candidates: + +```typescript +import { z } from 'zod' +import { + loadProfile, // (userId: string) => Promise<{ id: string; cardId: string; email: string }> + chargeCard, // (cardId: string, amount: number) => Promise<{ chargeId: string }> + refundCharge, // (chargeId: string) => Promise + activateAccount, // (userId: string) => Promise + markInactive, // (userId: string) => Promise + sendEmail, // (userId: string, template: 'welcome' | 'day-2' | 'day-7') => Promise +} from '~/integrations' +``` + +--- + +# Candidate 1 — Definition-object + ctx primitives + +> **The conservative pick.** Most familiar to TanStack users. Mirrors `useQuery({ queryKey, queryFn })`, `createCollection({...})`, `createRoute({...})`. Lowest implementation risk. Ships first, debuggable, no build-toolchain dependency. + +## Shape + +A workflow is a definition object passed to `createWorkflow()`. The `run` function receives a `ctx` object exposing every durable primitive as a method. Step identity comes from explicit string labels (with an optional build transform that derives them from lexical position). + +## Hello world + +```typescript +import { createWorkflow } from '@tanstack/workflow' + +export const hello = createWorkflow({ + name: 'hello', + run: async (ctx, input: { name: string }) => { + await ctx.step('greet', () => console.log(`hi, ${input.name}`)) + return { greeted: input.name } + }, +}) +``` + +The exported `hello` is both the definition and the typed handle. No registration step, no class, no decorator. 
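The hello world relies on `ctx.step('greet', …)` being durable. As a rough sketch of what that could mean, the block below journals each step result under its label and replays it on re-execution. It assumes an in-memory `Map` as the journal; `Journal`, `makeCtx`, and `replayDemo` are hypothetical names, and a real engine would persist through a storage adapter instead.

```typescript
// Hypothetical in-memory journal keyed by step label.
type Journal = Map<string, unknown>

function makeCtx(journal: Journal) {
  return {
    // Runs `fn` once per label; later calls with the same label replay the
    // recorded result instead of re-executing the side effect.
    async step<T>(label: string, fn: () => T | Promise<T>): Promise<T> {
      if (journal.has(label)) return journal.get(label) as T
      const result = await fn()
      journal.set(label, result)
      return result
    },
  }
}

// Executes the same workflow body twice against one journal, as would happen
// when a run resumes after a crash or a sleep.
async function replayDemo() {
  const journal: Journal = new Map()
  let sideEffects = 0
  const body = (ctx: ReturnType<typeof makeCtx>) =>
    ctx.step('greet', () => {
      sideEffects += 1
      return 'hi'
    })

  const first = await body(makeCtx(journal))
  const replayed = await body(makeCtx(journal))
  return { first, replayed, sideEffects }
}
```

Because the lookup is by label, this sketch also shows why renaming a label while runs are in flight is risky: the replay would miss the journal entry and execute the closure again.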
+ +## Subworkflow + +```typescript +const emailSeries = createWorkflow({ + name: 'email-series', + input: z.object({ userId: z.string() }), + run: async (ctx, { userId }) => { + await ctx.step('welcome', () => sendEmail(userId, 'welcome')) + await ctx.sleep('to-day-2', '1d') + await ctx.step('day-2', () => sendEmail(userId, 'day-2')) + await ctx.sleep('to-day-7', '5d') + await ctx.step('day-7', () => sendEmail(userId, 'day-7')) + }, +}) +``` + +## Reusable step primitive + +```typescript +import { defineStep } from '@tanstack/workflow' + +const charge = defineStep({ + name: 'charge-card', + retries: 5, + backoff: { kind: 'exponential', baseMs: 1000, maxMs: 60_000 }, + run: async (cardId: string, amount: number) => chargeCard(cardId, amount), + // ^? returns { chargeId: string } + compensate: async (result) => refundCharge(result.chargeId), +}) +``` + +`charge` is callable as `charge(ctx, cardId, amount)` — explicit ctx threading. It carries its retry policy, name, and compensation logic with it. + +## Full onboarding workflow + +```typescript +export const onboard = createWorkflow({ + name: 'onboard', + input: z.object({ userId: z.string() }), + events: { + approved: z.object({ approverId: z.string() }), + rejected: z.object({ reason: z.string() }), + }, + retries: 2, + timeout: '14d', + run: async (ctx, { userId }) => { + const profile = await ctx.step('load-profile', () => loadProfile(userId)) + // ^? { id: string; cardId: string; email: string } + + await ctx.invoke(emailSeries, { userId }, { detached: true }) + + await ctx.sleep('cooldown', '1d') + + const decision = await ctx.race( + { + approved: ctx.waitForEvent('approved'), + rejected: ctx.waitForEvent('rejected'), + }, + { timeout: '7d' }, + ) + // ^? 
{ type: 'approved'; data: { approverId: string } } + // | { type: 'rejected'; data: { reason: string } } + // | { type: 'timeout' } + + if (decision.type === 'approved') { + const result = await ctx.saga(async (s) => { + const c = await s.step(charge, profile.cardId, 999) + // ^? { chargeId: string } + const a = await s.step('activate', () => activateAccount(userId)) + return { c, a } + }) + return { + status: 'active' as const, + userId, + by: decision.data.approverId, + chargeId: result.c.chargeId, + } + } + + await ctx.step('mark-inactive', () => markInactive(userId)) + return { + status: 'inactive' as const, + userId, + reason: + decision.type === 'rejected' + ? ('rejected' as const) + : ('timed_out' as const), + } + }, +}) +``` + +## Inferred type at call site + +```typescript +const handle = await workflow.start(onboard, { userId: '123' }) +// ^? WorkflowHandle< +// | { status: 'active'; userId: string; by: string; chargeId: string } +// | { status: 'inactive'; userId: string; reason: 'rejected' | 'timed_out' } +// > + +const result = await handle.result() +// ^? same discriminated union +if (result.status === 'active') { + console.log(result.chargeId) // type-narrowed + console.log(result.reason) // TS error +} +``` + +Events are typed at the publish call site: + +```typescript +await workflow.publish( + onboard.events.approved, + { approverId: 'admin-7' }, + { runId }, +) +// ^^^ payload type checked against the schema +``` + +## React binding + +```typescript +import { useWorkflowRun, useWorkflowStream } from '@tanstack/react-workflow' +import { onboard } from '~/workflows/onboard' + +function OnboardingStatus({ runId }: { runId: string }) { + const run = useWorkflowRun(onboard, runId) + // ^? UseWorkflowRunResult< + // | { status: 'active'; ... } + // | { status: 'inactive'; ... 
} + // > + + if (run.state === 'running' && run.currentStep === 'load-profile') { + return + } + if (run.state === 'completed' && run.result.status === 'active') { + return + } + if (run.state === 'failed') return + return null +} +``` + +A streaming variant for live updates: + +```typescript +function LiveProgress({ runId }: { runId: string }) { + const stream = useWorkflowStream(onboard, runId) + // ^? events stream — typed step transitions, event arrivals, completions + return +} +``` + +## Engine wiring (platform agnosticism) + +```typescript +// workflow.server.ts — Production: Vercel + Neon +import { createEngine } from '@tanstack/workflow' +import { postgresStorage } from '@tanstack/workflow-postgres' +import { cronRuntime } from '@tanstack/workflow-cron' +import { onboard, emailSeries } from './workflows' + +export const workflow = createEngine({ + storage: postgresStorage({ client: neonClient }), + runtime: cronRuntime({ batchSize: 50, budgetMs: 25_000 }), + workflows: [onboard, emailSeries], +}) + +// Same workflows, Cloudflare DO: +export const workflow = createEngine({ + storage: durableObjectStorage({ binding: env.WORKFLOW_DO }), + runtime: durableObjectRuntime(), + workflows: [onboard, emailSeries], +}) + +// Same workflows, in-memory for dev/test: +export const workflow = createEngine({ + storage: memoryStorage(), + runtime: memoryRuntime(), + workflows: [onboard, emailSeries], +}) +``` + +The workflow definitions don't change. Only the engine adapters do. + +## How types flow through this design + +``` +input schema ───► ctx.run's second arg + │ + ▼ +events schema map ───► ctx.waitForEvent('name') return type + │ + ▼ +step closure return ───► await ctx.step('name', fn) return type + │ + ▼ +run() return type ───► Workflow + │ + ▼ + WorkflowHandle + │ + ▼ + useWorkflowRun().result +``` + +Every layer infers from the layer above. Zero manual generic specification by the user. 
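The inference chain in the diagram can be approximated with a single generic identity function. The sketch below is an assumption-laden illustration, not the library's types: `Schema`, `Ctx`, `WorkflowDef`, `defineWorkflow`, and `runWithEvents` are hypothetical names, and the hand-written schemas stand in for `z.object(...)`.

```typescript
// Minimal stand-in for a schema library: anything with a typed `parse`.
interface Schema<T> {
  parse(value: unknown): T
}

// Hypothetical context: `waitForEvent` is typed from the events map.
interface Ctx<TEvents> {
  waitForEvent<K extends keyof TEvents>(name: K): Promise<TEvents[K]>
}

interface WorkflowDef<TInput, TEvents, TOutput> {
  name: string
  input: Schema<TInput>
  events: { [K in keyof TEvents]: Schema<TEvents[K]> }
  run: (ctx: Ctx<TEvents>, input: TInput) => Promise<TOutput>
}

// Identity function whose only job is to let TypeScript infer all three generics.
function defineWorkflow<TInput, TEvents, TOutput>(
  def: WorkflowDef<TInput, TEvents, TOutput>,
): WorkflowDef<TInput, TEvents, TOutput> {
  return def
}

const userIdSchema: Schema<{ userId: string }> = {
  parse(value) {
    const userId = (value as { userId?: unknown } | null)?.userId
    if (typeof userId !== 'string') throw new Error('userId must be a string')
    return { userId }
  },
}

const approvedSchema: Schema<{ approverId: string }> = {
  parse(value) {
    const approverId = (value as { approverId?: unknown } | null)?.approverId
    if (typeof approverId !== 'string') throw new Error('approverId must be a string')
    return { approverId }
  },
}

// No annotations below: input, event payload, and output types are all inferred.
const inferenceDemo = defineWorkflow({
  name: 'inference-demo',
  input: userIdSchema,
  events: { approved: approvedSchema },
  run: async (ctx, { userId }) => {
    const evt = await ctx.waitForEvent('approved') // { approverId: string }
    return { userId, by: evt.approverId }
  },
})

// Test-only runner that feeds pre-supplied event payloads to the body.
async function runWithEvents<TInput, TEvents, TOutput>(
  def: WorkflowDef<TInput, TEvents, TOutput>,
  rawInput: unknown,
  payloads: TEvents,
): Promise<TOutput> {
  const ctx: Ctx<TEvents> = { waitForEvent: async (name) => payloads[name] }
  return def.run(ctx, def.input.parse(rawInput))
}
```

Typing `events` as a mapped type over `TEvents` lets TypeScript recover the payload map from the schema object, which is what allows `ctx.waitForEvent('approved')` to be typed without a manual generic.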
+ +## Composable primitive: `defineStep` + +```typescript +type StepDef<TArgs extends unknown[], TReturn> = { + name: string + retries?: number + backoff?: BackoffPolicy + timeout?: Duration + run: (...args: TArgs) => Promise<TReturn> + compensate?: (result: TReturn) => Promise<void> +} + +function defineStep<TArgs extends unknown[], TReturn>( + def: StepDef<TArgs, TReturn>, +): (ctx: WorkflowContext, ...args: TArgs) => Promise<TReturn> +``` + +A defined step is a function that takes `ctx` as its first argument. This is the explicit form. Composability is good, since any step can be extracted into a reusable primitive, but every caller has to thread `ctx` through. + +## Pros and cons + +**Pros:** + +- Most TanStack-familiar (definition-object pattern from Query, DB, Router, Form) +- All inference flows through a single object, so it is easy to reason about +- Stack traces are clean: `ctx.step` calls appear by name +- No build-tool dependency required +- Easy to test: `run` is just a function +- Easy to migrate to from Inngest (very similar API) +- Adapter pattern is straightforward + +**Cons:** + +- String labels are a footgun: rename `ctx.step('charge', ...)` to `'charge-card'` and in-flight runs break +- The `ctx` parameter must thread through every helper function, which is a composability tax +- `ctx.step('name', fn)` is verbose compared to `step(fn)` +- The repetition of step names in code adds visual noise + +**Mitigations:** + +- Optional `@tanstack/workflow-vite` / SWC / esbuild plugin that derives step IDs from lexical position at build time, allowing `ctx.step(fn)` without the label +- `defineStep` carries the name so reusable primitives don't repeat it +- Runtime check: if a workflow tries to call an unknown step during replay, throw a clear "Workflow non-determinism" error pointing to the missing step + +--- + +# Candidate 2 — Builder chain with middleware + +> **The TanStack Start-shaped pick.** Mirrors `createServerFn().validator().middleware().handler()`. 
Brings _middleware composition_ as a first-class concern — workflows are recipes built up incrementally with reusable middleware that can extend the context, add observability, enforce auth, or inject dependencies. Each method call narrows types. + +## Shape + +A workflow is built by chaining typed methods. Each call returns a more-narrowly-typed builder. The chain ends with `.handler()` which seals the workflow. The distinctive feature: `.middleware()` and `.use()` for composing cross-cutting concerns into the workflow's context. + +## Hello world + +```typescript +import { createWorkflow } from '@tanstack/workflow' + +export const hello = createWorkflow() + .name('hello') + .input(z.object({ name: z.string() })) + .handler(async (ctx, { name }) => { + await ctx.step('greet', () => console.log(`hi, ${name}`)) + return { greeted: name } + }) +``` + +## Reusable middleware + +This is the distinctive primitive in Candidate 2 — middleware that wraps the workflow with cross-cutting behavior and can extend `ctx`: + +```typescript +import { createMiddleware } from '@tanstack/workflow' + +// Add auth context — narrows ctx to require it downstream +const requireUser = createMiddleware().handler(async ({ ctx, next, input }) => { + const user = await ctx.step('load-user', () => loadProfile(input.userId)) + return next({ ctx: { ...ctx, user } }) +}) + +// Add tracing — wraps every step in a span +const traced = createMiddleware().handler(async ({ ctx, next }) => { + const tracer = ctx.tracer.startWorkflow(ctx.workflowName, ctx.runId) + try { + return await next({ ctx: { ...ctx, tracer } }) + } finally { + tracer.end() + } +}) + +// Add structured logging +const logged = createMiddleware().handler(async ({ ctx, next }) => { + ctx.logger.info('workflow.start', { runId: ctx.runId }) + const result = await next({ ctx }) + ctx.logger.info('workflow.complete', { runId: ctx.runId }) + return result +}) +``` + +## Reusable step primitive (same as Candidate 1) + +```typescript +const 
charge = defineStep({ + name: 'charge-card', + retries: 5, + backoff: { kind: 'exponential', baseMs: 1000 }, + run: async (cardId: string, amount: number) => chargeCard(cardId, amount), + compensate: async (result) => refundCharge(result.chargeId), +}) +``` + +## Subworkflow + +```typescript +const emailSeries = createWorkflow() + .name('email-series') + .input(z.object({ userId: z.string() })) + .middleware(traced) + .handler(async (ctx, { userId }) => { + await ctx.step('welcome', () => sendEmail(userId, 'welcome')) + await ctx.sleep('to-day-2', '1d') + await ctx.step('day-2', () => sendEmail(userId, 'day-2')) + await ctx.sleep('to-day-7', '5d') + await ctx.step('day-7', () => sendEmail(userId, 'day-7')) + }) +``` + +## Full onboarding workflow + +```typescript +export const onboard = createWorkflow() + .name('onboard') + .input(z.object({ userId: z.string() })) + .event('approved', z.object({ approverId: z.string() })) + .event('rejected', z.object({ reason: z.string() })) + .retries(2) + .timeout('14d') + .middleware(logged) + .middleware(traced) + .middleware(requireUser) // adds ctx.user; downstream gets typed access + .handler(async (ctx, { userId }) => { + // ctx.user is now typed thanks to requireUser middleware + const profile = ctx.user + // ^? 
{ id: string; cardId: string; email: string } + + await ctx.invoke(emailSeries, { userId }, { detached: true }) + + await ctx.sleep('cooldown', '1d') + + const decision = await ctx.race( + { + approved: ctx.waitForEvent('approved'), + rejected: ctx.waitForEvent('rejected'), + }, + { timeout: '7d' }, + ) + + if (decision.type === 'approved') { + const result = await ctx.saga(async (s) => { + const c = await s.step(charge, profile.cardId, 999) + const a = await s.step('activate', () => activateAccount(userId)) + return { c, a } + }) + return { + status: 'active' as const, + userId, + by: decision.data.approverId, + chargeId: result.c.chargeId, + } + } + + await ctx.step('mark-inactive', () => markInactive(userId)) + return { + status: 'inactive' as const, + userId, + reason: + decision.type === 'rejected' + ? ('rejected' as const) + : ('timed_out' as const), + } + }) +``` + +## Type narrowing through the chain + +Each method call adds to the builder's type signature. The handler at the end sees the full accumulated shape: + +```typescript +declare const onboard: WorkflowBuilder< + // Input + { userId: string }, + // Events + { approved: { approverId: string }; rejected: { reason: string } }, + // Context extensions accumulated from middleware + { user: { id: string; cardId: string; email: string } } +> +``` + +After `.handler(fn)` it becomes: + +```typescript +declare const onboard: Workflow< + { userId: string }, + | { status: 'active'; userId: string; by: string; chargeId: string } + | { status: 'inactive'; userId: string; reason: 'rejected' | 'timed_out' }, + { approved: { approverId: string }; rejected: { reason: string } } +> +``` + +## Middleware composition pattern + +The big idea: reusable workflow _layers_ you compose into specific workflows. 
This becomes huge for organization-wide concerns: + +```typescript +// Define once, reuse everywhere +export const orgConventions = compose( + logged, + traced, + requireUser, + withFeatureFlags, +) + +// Apply to many workflows +export const onboard = createWorkflow() + .name('onboard') + .middleware(orgConventions) +// ... rest + +export const offboard = createWorkflow() + .name('offboard') + .middleware(orgConventions) +// ... rest +``` + +This is functionally equivalent to `createServerFn` + middleware patterns in TanStack Start — the user gets a feeling of familiarity. + +## Engine wiring (identical to Candidate 1) + +```typescript +export const workflow = createEngine({ + storage: postgresStorage({ client: pool }), + runtime: cronRuntime({ batchSize: 50 }), + workflows: [onboard, emailSeries], +}) +``` + +## Pros and cons + +**Pros:** + +- Middleware composition is a first-class, distinctive primitive — no other workflow library does this cleanly +- Mirrors TanStack Start's `createServerFn` patterns, so devs already in the TanStack ecosystem feel at home +- Incremental type narrowing makes the chain self-documenting +- Cross-cutting concerns (auth, tracing, logging, feature flags) compose naturally +- The chain's terminal `.handler()` enforces a clear "done building" boundary +- No build transform required + +**Cons:** + +- Verbose — every workflow definition is several method calls deep +- Type narrowing through deep chains can be slow on the TS compiler (Effect-style chains have this problem at scale) +- The "middleware extends ctx" pattern is powerful but has a learning curve +- Same string-label step ID footgun as Candidate 1 +- Step-level composability is still tied to explicit ctx threading +- Refactoring a workflow (reordering middleware) can shift ctx shape in non-obvious ways + +**Where it wins:** Organizations with shared cross-cutting concerns (auth, tracing, feature flags) and a desire for consistent workflow patterns across many definitions. 
Startup teams will love this; library users will love this. + +**Where it loses:** Solo devs writing small workflows feel the ceremony. Single-file scripts feel over-engineered. + +--- + +# Candidate 3 — Implicit context, hooks-inspired primitives + +> **The DX bet.** Build-time AST transform threads context implicitly via AsyncLocalStorage. Primitives are imported functions called directly inside a workflow body — no `ctx` parameter, no string labels. Step identity comes from lexical AST position. Highest composability, lowest boilerplate, biggest implementation investment. + +## Shape + +A workflow is a plain async function passed to `workflow()`. Inside the body, you call imported primitives (`step`, `sleep`, `waitForEvent`, `invoke`) directly. The build transform inserts a stable identity into each call site based on the AST position; the runtime uses AsyncLocalStorage to provide the workflow context to each primitive. Without the transform, the runtime falls back to source-position via `Error.stack` (slower; explicit IDs always available as escape hatch). + +This is the most "TanStack Query hooks"-feeling design: primitives compose like hooks, but for durable execution. + +## Hello world + +```typescript +import { workflow, step } from '@tanstack/workflow' + +export const hello = workflow(async (input: { name: string }) => { + await step(() => console.log(`hi, ${input.name}`)) + return { greeted: input.name } +}) +``` + +That's the entire program. No name, no schema, no ctx parameter. The build transform inserts step identity. The runtime infers the workflow name from the export name (or file path) — overridable via the second argument. 
+ +## With explicit options + +```typescript +export const hello = workflow( + async (input: { name: string }) => { + await step(() => console.log(`hi, ${input.name}`)) + return { greeted: input.name } + }, + { name: 'hello', retries: 3, timeout: '1m' }, +) +``` + +## Reusable step primitive + +The killer feature: a step _is just a function_. You write a normal async function that uses `step()` internally; callers invoke it like any other function. Context is threaded via AsyncLocalStorage. No ctx threading. No boilerplate. + +```typescript +import { step, defineStep } from '@tanstack/workflow' + +// Inline composition — just write a function +async function chargeWithRetry(cardId: string, amount: number) { + return step(() => chargeCard(cardId, amount), { + retries: 5, + backoff: { kind: 'exponential', baseMs: 1000 }, + compensate: (r) => refundCharge(r.chargeId), + }) +} + +// Or hoisted form with metadata +const chargeWithRetry = defineStep( + async (cardId: string, amount: number) => chargeCard(cardId, amount), + { retries: 5, backoff: { kind: 'exponential', baseMs: 1000 } }, +) +``` + +Both `chargeWithRetry` calls are just function calls. No ctx parameter: + +```typescript +const result = await chargeWithRetry(profile.cardId, 999) +// ^? { chargeId: string } +``` + +This is the most composable form possible. Helpers compose like any other async functions. 
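The `compensate` option on the steps above implies saga semantics: when a later step fails, compensations for already-completed steps run in reverse order. The following is a minimal sketch of that behavior under stated assumptions. `runSaga`, `SagaScope`, and `sagaDemo` are hypothetical names, the scope is passed explicitly for brevity, and nothing here is journaled or retried as a real engine would require.

```typescript
type Compensation = () => void | Promise<void>

interface SagaScope {
  // Runs `action`; on success, registers `compensate` for possible rollback.
  step<T>(
    action: () => T | Promise<T>,
    compensate?: (result: T) => void | Promise<void>,
  ): Promise<T>
}

async function runSaga<T>(body: (s: SagaScope) => Promise<T>): Promise<T> {
  const undo: Compensation[] = []
  const scope: SagaScope = {
    async step(action, compensate) {
      const result = await action()
      if (compensate) undo.push(() => compensate(result))
      return result
    },
  }
  try {
    return await body(scope)
  } catch (err) {
    // Roll back in reverse order of completion, then surface the original error.
    for (const compensation of undo.reverse()) await compensation()
    throw err
  }
}

// Charge succeeds, activation fails, so the charge is refunded.
async function sagaDemo(): Promise<string[]> {
  const log: string[] = []
  try {
    await runSaga(async (s) => {
      const charge = await s.step(
        () => ({ chargeId: 'ch_1' }),
        (r) => {
          log.push(`refund ${r.chargeId}`)
        },
      )
      await s.step((): string => {
        throw new Error('activate failed')
      })
      return charge
    })
  } catch (err) {
    log.push(`failed: ${(err as Error).message}`)
  }
  return log
}
```

A production version would also need to decide what happens when a compensation itself throws; this sketch simply lets that error propagate.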
+ +## Subworkflow + +```typescript +const emailSeries = workflow(async (input: { userId: string }) => { + await step(() => sendEmail(input.userId, 'welcome')) + await sleep('1d') + await step(() => sendEmail(input.userId, 'day-2')) + await sleep('5d') + await step(() => sendEmail(input.userId, 'day-7')) +}) +``` + +## Full onboarding workflow + +```typescript +import { + workflow, + step, + sleep, + waitForEvent, + race, + invoke, + saga, + defineEvent, +} from '@tanstack/workflow' + +// Events declared as standalone typed primitives +const approved = defineEvent('approved', z.object({ approverId: z.string() })) +const rejected = defineEvent('rejected', z.object({ reason: z.string() })) + +export const onboard = workflow( + async (input: { userId: string }) => { + const profile = await step(() => loadProfile(input.userId)) + // ^? { id: string; cardId: string; email: string } + + await invoke(emailSeries, { userId: input.userId }, { detached: true }) + + await sleep('1d') + + const decision = await race( + { + approved: waitForEvent(approved), + rejected: waitForEvent(rejected), + }, + { timeout: '7d' }, + ) + // ^? { type: 'approved'; data: { approverId: string } } + // | { type: 'rejected'; data: { reason: string } } + // | { type: 'timeout' } + + if (decision.type === 'approved') { + const result = await saga(async () => { + const c = await chargeWithRetry(profile.cardId, 999) + const a = await step(() => activateAccount(input.userId)) + return { c, a } + }) + return { + status: 'active' as const, + userId: input.userId, + by: decision.data.approverId, + chargeId: result.c.chargeId, + } + } + + await step(() => markInactive(input.userId)) + return { + status: 'inactive' as const, + userId: input.userId, + reason: + decision.type === 'rejected' + ? 
('rejected' as const) + : ('timed_out' as const), + } + }, + { + name: 'onboard', + input: z.object({ userId: z.string() }), + events: [approved, rejected], + retries: 2, + timeout: '14d', + }, +) +``` + +The body reads like normal application code. No `ctx.step('foo', () => …)` ceremony. No string labels. The build transform makes step identity stable across renames. + +## How the build transform works + +For each `step(fn)`, `sleep(...)`, `waitForEvent(...)`, `invoke(...)`, `race({...})`, `saga(...)` call inside a `workflow(...)` body, the transform inserts a stable `__id__` argument derived from the AST position within the workflow: + +```typescript +// User writes: +await step(() => loadProfile(input.userId)) + +// Transform produces: +await step(() => loadProfile(input.userId), { __id__: 'onboard:0' }) +``` + +The IDs are: + +- Stable across renames of the workflow function (positional, not name-based) +- Stable across whitespace/comment changes (AST-based, not line-based) +- Stable across JS minification (the `__id__` argument survives) +- Visible to devtools via source maps + +Without the transform, a runtime fallback uses `Error.stack` to derive a position-based ID (~10× slower, less reliable across bundlers, fine for dev). Users can always pass explicit IDs as the escape hatch: `step(() => loadProfile(...), { id: 'load-profile' })`. + +## Build tool integrations + +- `@tanstack/workflow-vite` — Vite plugin +- `@tanstack/workflow-swc` — SWC plugin +- `@tanstack/workflow-esbuild` — esbuild plugin +- `@tanstack/workflow-babel` — Babel preset for Webpack/older toolchains +- `@tanstack/workflow-rollup` / `@tanstack/workflow-rolldown` — Rollup family + +The transform logic lives in one place (`@tanstack/workflow-transform-core`), the per-bundler packages are thin shims. 
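To reason about the ID scheme without a real bundler plugin, the sketch below models a workflow body as an ordered list of primitive call sites and assigns IDs the way the `'onboard:0'` example suggests. This is a simplified model, not an AST transform: `CallSite` and `assignIds` are hypothetical names, and the `name:index` format is an assumption taken from the example above.

```typescript
// Hypothetical model of a parsed workflow body: the durable-primitive call
// sites in source order, as a transform would visit them.
interface CallSite {
  primitive: 'step' | 'sleep' | 'waitForEvent' | 'invoke' | 'race' | 'saga'
  explicitId?: string // the `{ id: '...' }` escape hatch
}

// Assigns `${workflowName}:${index}` to each call site; explicit IDs win.
function assignIds(workflowName: string, sites: CallSite[]): string[] {
  return sites.map((site, index) => site.explicitId ?? `${workflowName}:${index}`)
}

const v1Sites: CallSite[] = [
  { primitive: 'step' },
  { primitive: 'sleep' },
  { primitive: 'step', explicitId: 'mark-inactive' },
]

// A later deploy inserts one new step at the top of the body.
const v2Sites: CallSite[] = [{ primitive: 'step' }, ...v1Sites]

const v1Ids = assignIds('onboard', v1Sites)
// ['onboard:0', 'onboard:1', 'mark-inactive']
const v2Ids = assignIds('onboard', v2Sites)
// ['onboard:0', 'onboard:1', 'onboard:2', 'mark-inactive']
```

Under this simple scheme, formatting changes leave IDs untouched, but inserting a call earlier in the body shifts every later positional ID. That is presumably where pinning each run to the code SHA that started it (commitment 7) matters, and why the explicit `id` escape hatch stays useful.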
+ +## React binding + +The hook API mirrors Candidate 1; bindings don't change with definition style: + +```typescript +function OnboardingStatus({ runId }: { runId: string }) { + const run = useWorkflowRun(onboard, runId) + if (run.state === 'completed' && run.result.status === 'active') { + return + } + return null +} +``` + +## Engine wiring (identical to Candidate 1) + +```typescript +export const workflow = createEngine({ + storage: postgresStorage({ client: pool }), + runtime: cronRuntime({ batchSize: 50 }), + workflows: [onboard, emailSeries], +}) +``` + +## How types flow through this design + +Identical type-flow guarantees to Candidates 1 and 2, just with cleaner syntax at the call sites. The `workflow()` factory is overloaded to extract input/event/return types from the function signature + options: + +```typescript +function workflow<TInput, TOutput, TEvents>( + fn: (input: TInput) => Promise<TOutput>, + options?: WorkflowOptions<TInput, TEvents>, +): Workflow<TInput, TOutput, TEvents> +``` + +The `step` primitive: + +```typescript +function step<T>( + fn: () => Promise<T>, + options?: StepOptions<T>, +): Promise<T> +``` + +`step` looks up the workflow context from AsyncLocalStorage and journals the result. No `ctx` parameter is needed because the workflow body is wrapped in an `AsyncLocalStorage.run()` scope. 
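The ambient-context mechanism can be sketched with Node's `AsyncLocalStorage` from `node:async_hooks`. The block is illustrative only: `RunContext`, `runWorkflow`, `double`, and `alsDemo` are hypothetical names, the journal is an in-memory `Map`, and IDs fall back to call order because no build transform is involved.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks'

// Hypothetical per-run context; a real engine would hold a storage-backed journal.
interface RunContext {
  runId: string
  journal: Map<string, unknown>
  nextIndex: number
}

const storage = new AsyncLocalStorage<RunContext>()

// Ambient `step`: no ctx parameter, the context comes from the enclosing run scope.
async function step<T>(fn: () => T | Promise<T>, options: { id?: string } = {}): Promise<T> {
  const ctx = storage.getStore()
  if (!ctx) {
    throw new Error('step() called outside a workflow context.')
  }
  // Without a transform-provided ID, fall back to call order within the run.
  const id = options.id ?? `auto:${ctx.nextIndex++}`
  if (ctx.journal.has(id)) return ctx.journal.get(id) as T
  const result = await fn()
  ctx.journal.set(id, result)
  return result
}

// Wraps the workflow body in an AsyncLocalStorage scope for one run.
function runWorkflow<TInput, TOutput>(
  body: (input: TInput) => Promise<TOutput>,
  input: TInput,
  runId: string,
): Promise<TOutput> {
  const ctx: RunContext = { runId, journal: new Map(), nextIndex: 0 }
  return storage.run(ctx, () => body(input))
}

// A helper that uses `step` without receiving ctx.
async function double(n: number): Promise<number> {
  return step(() => n * 2)
}

async function alsDemo() {
  const inside = await runWorkflow(async (n: number) => double(n), 21, 'run-1')
  // Calling step() with no enclosing run rejects with a clear error.
  const outside = await step(() => 1).then(
    () => 'no error',
    (err: Error) => err.message,
  )
  return { inside, outside }
}
```

The store follows the async call chain started inside `storage.run`, which is why `double` can call `step` at any nesting depth without a parameter.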
+ +## The composability win, illustrated + +In Candidates 1 and 2, a helper that wants to use steps must take ctx: + +```typescript +// Candidate 1/2 +async function ensureProfile(ctx: WorkflowContext, userId: string) { + let profile = await ctx.step('try-load', () => loadProfile(userId)) + if (!profile) { + await ctx.step('create-profile', () => createProfile(userId)) + profile = await ctx.step('reload', () => loadProfile(userId)) + } + return profile +} + +// Caller: +const profile = await ensureProfile(ctx, userId) +``` + +In Candidate 3, the helper is just an async function — context flows ambiently: + +```typescript +// Candidate 3 +async function ensureProfile(userId: string) { + let profile = await step(() => loadProfile(userId)) + if (!profile) { + await step(() => createProfile(userId)) + profile = await step(() => loadProfile(userId)) + } + return profile +} + +// Caller: +const profile = await ensureProfile(userId) +``` + +The helper is callable from inside any workflow body, from any depth of nesting, from inside loops, from inside other helpers. No ctx threading ever. + +This is **why** Candidate 3 wins on the "composable primitives" axis — the primitives compose at the _function_ level, not the _method-on-context_ level. + +## The non-determinism guardrail + +Because primitives use AsyncLocalStorage, calling `step()` _outside_ a workflow body must throw a clear error: + +```typescript +// Anywhere outside a workflow body: +await step(() => doStuff()) +// ↳ Error: step() called outside a workflow context. +// Did you mean to wrap this code in workflow(...) ? +``` + +The error is loud, actionable, and impossible to silently swallow. + +## Pros and cons + +**Pros:** + +- **Highest composability.** Helpers are just async functions. They compose anywhere. +- **Lowest boilerplate.** Workflow bodies read like normal application code. +- **Stable step identity for free** (with transform). Renames don't break in-flight runs. 
+- **Cleanest debugger experience.** Stack traces show your code, not framework methods. +- **Strongest "TanStack DX" feel** — the same conceptual leap that made `useQuery()` win over `connect()(Component)` in 2019. +- **Same type safety guarantees** as Candidates 1 and 2. + +**Cons:** + +- **Build-tool dependency.** Must ship plugins for every major bundler. The maintenance burden is real — each major bundler version bump (Vite 7, esbuild API change, SWC migration) requires updates. +- **AsyncLocalStorage runtime cost.** Small but real per-step overhead. +- **"Magical" — context flow is invisible.** Stack traces show normal code, but the _why does this step have this ID_ answer is "the transform did it." Onboarding requires teaching the transform. +- **Edge runtime gotchas.** AsyncLocalStorage works on Node 16+, Bun, Deno, and Cloudflare Workers (with the `nodejs_compat` flag). Workers without that flag would need an explicit polyfill. +- **"Calling step() outside a workflow" must throw.** Easy to do accidentally in tests/scripts. 
+ +**Mitigations:** + +- Provide a `runtimeOnly` fallback (no transform) that uses explicit IDs or stack-trace-derived IDs +- Provide a `runWithWorkflow(definition, input, fn)` test helper that sets up the AsyncLocalStorage scope +- Document the "must call inside workflow body" invariant prominently +- Ship a CLI that detects misuse: `npx tanstack-workflow check` + +--- + +# Side-by-side comparison + +## The same step extracted as a helper + +```typescript +// Candidate 1 — explicit ctx threading +async function chargeWithRetry( + ctx: WorkflowContext, + cardId: string, + amount: number, +) { + return ctx.step('charge', () => chargeCard(cardId, amount), { retries: 5 }) +} +// Caller: const r = await chargeWithRetry(ctx, cardId, 999) + +// Candidate 2 — same as Candidate 1, plus middleware available +async function chargeWithRetry( + ctx: WorkflowContext, + cardId: string, + amount: number, +) { + return ctx.step('charge', () => chargeCard(cardId, amount), { retries: 5 }) +} +// Caller: const r = await chargeWithRetry(ctx, cardId, 999) + +// Candidate 3 — pure function composition +async function chargeWithRetry(cardId: string, amount: number) { + return step(() => chargeCard(cardId, amount), { retries: 5 }) +} +// Caller: const r = await chargeWithRetry(cardId, 999) +``` + +## Trade-offs at a glance + +| Concern | Candidate 1 | Candidate 2 | Candidate 3 | +| ---------------------------- | ----------------------------------- | ---------------------------------- | --------------------------------------------- | +| Type safety | ✅ Strong, definition-object-driven | ✅ Strong, chain-narrowed | ✅ Strong, inferred from primitive signatures | +| Step-level composability | ⚠️ Requires explicit ctx threading | ⚠️ Requires explicit ctx threading | ✅ Pure function composition | +| Cross-cutting composability | ⚠️ Manual wrapping | ✅ Middleware chain | ⚠️ Manual wrapping | +| Platform agnosticism | ✅ Runtime only | ✅ Runtime only | ⚠️ Requires build transform integration | +| 
Familiarity to TanStack devs | ✅ Highest (Query/DB/Router) | ✅ High (Start createServerFn) | ⚠️ Novel | +| Step rename safety | ⚠️ Footgun (string labels) | ⚠️ Footgun (string labels) | ✅ Safe (lexical position) | +| Boilerplate | Medium | Highest | Lowest | +| Implementation risk | Low | Low | High (build toolchain) | +| Debugger / stack traces | Clean | Clean | Cleanest | +| Time-to-1.0 | Fastest | Fast | Slowest (transform work) | +| AsyncLocalStorage required | No | No | Yes | +| TS compiler stress | Low | Medium-high (deep chains) | Low | + +## Hello world line count + +- Candidate 1: 6 lines +- Candidate 2: 7 lines +- Candidate 3: 4 lines + +## Full onboarding workflow line count (signal only — measured from the examples above, excluding shared imports) + +- Candidate 1: ~45 lines +- Candidate 2: ~50 lines +- Candidate 3: ~40 lines + +The line count is close. The difference is felt in _helper_ code, where Candidate 3 saves dozens of lines per helper because ctx threading is gone. + +--- + +# Recommendation + +## Phased adoption — the hybrid + +The strongest play is to ship Candidate 1 as the foundation and graduate to Candidate 3 sugar via an opt-in build transform. This preserves platform agnosticism (the library works without the transform) while delivering best-in-class DX for users who opt in. + +**Phase 1 (1.0):** Ship Candidate 1 — `createWorkflow({ name, input, run })` with explicit `ctx`. Battle-test the engine, storage adapters, runtime adapters, devtools. No transform. Lowest risk. + +**Phase 2 (1.x):** Add an optional `@tanstack/workflow-vite` (and SWC / esbuild / Babel) plugin that lets users omit the string-label argument on `ctx.step(fn)`. Lexical-position ID derivation. Still Candidate 1 shape; renames become safe; users don't have to change anything. + +**Phase 3 (2.0 or experimental):** Introduce Candidate 3 — `workflow(async (input) => …)` with implicit context. Users who want maximum DX opt in. 
Backwards compatibility maintained by keeping Candidate 1's API in the package. + +**Skip Candidate 2** unless middleware composition becomes a strong community ask. The TanStack Start `createServerFn` shape is great for HTTP handlers but workflow-level middleware tends to fragment into many tiny wrappers in practice (auth + tracing + logging + retry + idempotency); composing them via middleware chains becomes hard to read. The same benefits can be delivered via Candidate 1's `defineStep` carrying its own behavior — and via explicit utility functions. + +## If you have to pick one + +**Pick Candidate 1.** It's the safest path to 1.0, the closest match to existing TanStack conventions, and the most likely to land cleanly across every deployment target without toolchain risk. Ship it. Iterate the engine quality, the devtools, the adapters, the docs. Win the durability quality + DX battle on those merits. + +Then layer in the lexical-position transform as Phase 2 — it's a strict improvement over explicit string labels, with zero breaking changes to the runtime API. + +Consider Candidate 3 a long-term ambition. It's the right destination, but only after the engine itself is rock-solid and the transform is battle-tested across bundlers. The "implicit context" leap is the same magnitude of conceptual jump that React Hooks were in 2018-2019 — worth doing, worth doing carefully, not worth rushing. + +## What to validate before committing + +1. **Prototype Candidate 1's `defineStep` ergonomics.** Write 10 real workflow examples; see if explicit ctx threading bothers contributors in practice. +2. **Prototype Candidate 3's build transform against Vite + esbuild + SWC.** A weekend spike is enough to know if the toolchain integration is feasible at TanStack's quality bar. +3. **Test middleware composition in a Candidate 2 prototype.** Decide if it's a feature or a distraction. 
The bar: does a real org-wide-conventions composition actually compose cleanly across 5+ workflows without TS performance regressions? +4. **Test inference depth.** Race + saga + sub-workflow nested 3-deep — does TS infer the full discriminated union? Where does inference fail and require annotations? +5. **Validate the engine's primitives are stable.** All three candidates share the same engine. Build the engine first; the API skin can change. + +## Final word + +The three designs above are all _viable_. None of them are wrong. The differences are about _how much DX risk you're willing to take vs. how fast you want to ship_. TanStack's history is shipping safe defaults and iterating toward magic — Query started without Suspense, Router started without code generation, Start started without RSC. The same playbook applies: **ship Candidate 1, layer in the transform, eventually reach Candidate 3 syntax.** + +The hedge isn't picking the perfect API on day one. The hedge is shipping the engine + adapters + devtools + Start integration before anyone else does it with TanStack-grade type inference and headless framework adapters. The API skin can evolve. + +Ship Candidate 1. Ship it well. The market wants this. diff --git a/research/EXPLICIT_VERSIONING.md b/research/EXPLICIT_VERSIONING.md new file mode 100644 index 0000000..0542b9f --- /dev/null +++ b/research/EXPLICIT_VERSIONING.md @@ -0,0 +1,390 @@ +# Explicit Versioning — Drop the Fingerprint, Lock the Content + +> Alternative design: instead of runtime fingerprinting, require explicit versions on every workflow, keep old versions alive in the registry alongside current, and use a build-time lock file to detect accidental modification. Cleaner, simpler, and eliminates the spurious-refusal problem entirely. + +## The core idea + +A workflow declares its version explicitly. Every run is pinned to a version at start. Old versions stay loaded alongside the current one until they drain. 
The developer's job is to bump the version when behavior changes; the toolchain's job is to catch them when they forget. + +```typescript +export const onboard = defineWorkflow({ + name: 'onboard', + version: 'v3', + previousVersions: [ + { version: 'v1', run: onboardV1Body }, + { version: 'v2', run: onboardV2Body }, + ], + input: z.object({ userId: z.string() }), + agents: { ... }, + run: async function* ({ input, agents }) { + // current behavior + }, +}) +``` + +That's the whole mechanic. No `patched()`. No fingerprint. No AST walker. No spurious refusals. + +## Why this is better than fingerprinting + +Alem's current design tries to _detect_ skew. This design tries to _prevent_ it. + +| Concern | Fingerprint approach | Explicit-version approach | +| --------------------------------------------------------------- | -------------------------------------------------------- | ----------------------------------------------------- | +| Prettier reformat kills runs | Yes (whitespace-sensitive) | No — version didn't change | +| Comment / log change kills runs | Yes | No | +| Cross-build minifier drift | Yes | No | +| Adding behavior to active path silently corrupts in-flight runs | Patch mode: yes | No — old version still resumes | +| Developer overhead | `patched()` gates around every change in patch mode | Bump a version string | +| Toolchain dependency | AST walker, build-time plugin, ESLint rule for `patched` | One ESLint rule + optional lock file | +| Mental model | "Hash my source code, refuse if it differs" | "Old code resumes old runs; new code starts new runs" | +| Multi-version cost | Memory: one workflow definition | Memory: N workflow definitions (one per live version) | +| Drain semantics | Manual via `selectWorkflowVersion` | Automatic — registry routes by run's pinned version | +| Failure mode when developer forgets | Patch mode: silent corruption | Lint rule errors at commit time | + +The trade-off is **explicitness over magic**. 
The developer is in charge of declaring when behavior has changed. In exchange, every spurious-refusal failure mode goes away, and the operational story becomes trivial: "ship code, drain old versions, remove when empty." + +## The mechanical model + +### Definition + +```typescript +defineWorkflow({ + name: 'onboard', + version: 'v3', // required, unique per name + previousVersions: [ // optional; for in-flight runs + { version: 'v1', run: bodyV1 }, + { version: 'v2', run: bodyV2 }, + ], + input: ..., + output: ..., + state: ..., + agents: ..., + run: async function* () { ... }, // the v3 body +}) +``` + +### Engine registration + +```typescript +const engine = createEngine({ + workflows: [onboard, sendEmails, ...], + storage: postgresStorage({ ... }), + runtime: workerRuntime(), +}) +``` + +The engine builds a `(name, version) → handler` lookup table from each workflow's `version` + every entry in its `previousVersions`. Internally: + +```typescript +const registry = { + 'onboard:v1': onboard.previousVersions[0].run, + 'onboard:v2': onboard.previousVersions[1].run, + 'onboard:v3': onboard.run, + 'sendEmails:v1': sendEmails.run, + // ... +} +``` + +### Run start + +When a new run begins: + +1. Look up the workflow by name → get current version (`'v3'`) +2. Persist `{ workflowName: 'onboard', version: 'v3' }` to the run record +3. Drive `onboard.run` (the v3 body) + +### Run resume + +When resuming a run (process restart, multi-instance routing, queued resume): + +1. Load run record → get pinned `(name, version)` = `('onboard', 'v2')` +2. Look up handler in registry → `onboard.previousVersions[1].run` +3. Drive that exact body, replaying log entries by position +4. 
If lookup fails (version not in registry) → orphan inspector (escape hatches: abandon, restart, manual advance) + +### Drain + +Storage exposes a count query: + +```typescript +const live = await storage.countRuns({ + workflowName: 'onboard', + groupBy: 'version', + status: 'active', +}) +// { v1: 0, v2: 14, v3: 234 } +``` + +Devtools renders: + +> **onboard** +> +> - v1: 0 active runs · safe to remove +> - v2: 14 active runs · expected drain in 6h +> - v3: 234 active runs · current + +When `v1` drains to zero, the developer removes its entry from `previousVersions` on next deploy. Tree-shaking drops the dead code from the bundle. Operationally: trivial. + +## How the developer bumps a version + +Manual: + +```typescript +// Before +export const onboard = defineWorkflow({ + name: 'onboard', + version: 'v2', + previousVersions: [{ version: 'v1', run: onboardV1Body }], + run: async function* ({ input, agents }) { + // v2 behavior + }, +}) + +// After making a behavior change +export const onboard = defineWorkflow({ + name: 'onboard', + version: 'v3', + previousVersions: [ + { version: 'v1', run: onboardV1Body }, + { version: 'v2', run: onboardV2Body }, // moved + ], + run: async function* ({ input, agents }) { + // v3 behavior (the new code) + }, +}) +``` + +CLI codemod: + +```bash +npx @tanstack/workflow bump onboard +``` + +This command: + +1. Reads `workflows/onboard.ts` +2. Captures the current `run` body +3. Appends `{ version: '', run: }` to `previousVersions` +4. Increments the version string (or prompts for a custom one) +5. Leaves the new `run` as a TODO comment for the developer +6. Updates `.tanstack/workflows.lock` (see below) + +Now the developer writes the new body. Old behavior is preserved verbatim. 
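At the data level, steps 2 to 4 of the codemod amount to a small pure transformation. The sketch below is only an illustration under stated assumptions: `VersionedDefinition`, `nextVersion`, and `bumpDefinition` are hypothetical names, versions are assumed to follow the `v1`, `v2`, ... scheme, and the real codemod would rewrite source files rather than objects.

```typescript
type RunBody = (input: unknown) => unknown

// Hypothetical, simplified shape of a versioned workflow definition.
interface VersionedDefinition {
  name: string
  version: string
  previousVersions: Array<{ version: string; run: RunBody }>
  run: RunBody
}

// Increments 'v2' to 'v3'; custom version strings need an explicit choice.
function nextVersion(version: string): string {
  const match = /^v(\d+)$/.exec(version)
  if (match === null) {
    throw new Error(`Cannot auto-increment custom version "${version}"`)
  }
  return `v${Number(match[1]) + 1}`
}

// Moves the current body into previousVersions and installs the new body.
function bumpDefinition(
  def: VersionedDefinition,
  newRun: RunBody,
): VersionedDefinition {
  return {
    ...def,
    version: nextVersion(def.version),
    previousVersions: [
      ...def.previousVersions,
      { version: def.version, run: def.run },
    ],
    run: newRun,
  }
}
```

The old body is carried over by reference and never edited, which is the property the lock file later verifies.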
+ +## The lock file — sanity check without runtime cost + +A sidecar file in the repo, committed to git: + +```json +// .tanstack/workflows.lock +{ + "onboard": { + "v1": { "hash": "sha256:abc...", "lockedAt": "2026-04-01" }, + "v2": { "hash": "sha256:def...", "lockedAt": "2026-04-15" }, + "v3": { "hash": "sha256:ghi...", "lockedAt": "2026-05-20" } + } +} +``` + +Generated and updated by `npx @tanstack/workflow lock` after a version bump or any intentional change. The hash is computed over the AST of the run function body (not the source text, so formatting / comments are ignored). + +**ESLint rule `lockfile-integrity`:** + +At lint time: + +1. Recompute the AST hash of every version's `run` body +2. Compare to the locked hash +3. Mismatch → error: `Workflow 'onboard' version 'v2' has been modified since being locked. If this is intentional, run 'npx @tanstack/workflow bump onboard'. If this is accidental, revert your changes.` + +**ESLint rule `current-version-bumped`:** + +At lint time, on the _current_ version (`onboard.run`): + +1. Recompute its AST hash +2. Compare to the locked hash +3. Mismatch → error: `Workflow 'onboard' current body changed without bumping the version. Run 'npx @tanstack/workflow bump onboard' if this is a behavior change, or 'npx @tanstack/workflow lock' if this is just a refactor of unchanged behavior.` + +The developer is forced to make an explicit call: "is this a behavior change or not?" If yes, bump. If not, just re-lock. + +This puts the fingerprint at **lint time**, not **runtime**. Same correctness guarantees, none of the runtime false-positive cost. The lock file is human-readable and git-diffable. + +## What `patched()` was for, and why it's gone + +In the old design, `patched()` existed because keeping multiple workflow source bodies alive simultaneously was expensive — the source-text fingerprint made it operationally heavy to ship `selectWorkflowVersion`-based deploys. 
So instead of "two versions side by side," you wrote "one version with an if-gate." + +With explicit versioning: + +- Multiple versions side by side is the _default_, not the exception +- The whole if-gate ceremony goes away +- Each version is just a function in `previousVersions` +- Tree-shaking keeps the cost reasonable (dead versions drop out as they're removed from the array) + +`patched()` becomes a footnote. Maybe still useful for inline migrations _within_ a single version (e.g., "in v3, the third execution of this step uses a new branch"), but rarely needed. + +## What changes in the engine + +Minimal: + +1. **Registry interface.** Add `previousVersions` to `defineWorkflow`. The engine builds the `(name, version) → handler` lookup at construction. +2. **Pin version at start.** Already in Alem's design (the version goes into the run record). +3. **Resume by version lookup.** Replace fingerprint-comparison with version-lookup. If the run's pinned version isn't in the registry → orphan inspector instead of `RUN_ERROR`. +4. **Drop the fingerprint module.** ~250 LoC gone. The FNV-1a code is gone. The `Function.prototype.toString()` calls are gone. Whitespace sensitivity is gone. +5. **Drop the `patches: [...]` field and `patched()` primitive.** Or keep them as a deprecated escape hatch for users migrating from Alem's existing branch. 
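Items 1 and 3 can be sketched in a few lines. This is a minimal illustration, not the engine's actual code: `buildRegistry`, `resolveRun`, `RunRecord`, and the `Resolution` union are hypothetical names, and the `name:version` key format simply mirrors the registry listing shown earlier in this document.

```typescript
type RunBody = (input: unknown) => unknown

// Hypothetical, simplified definition and run-record shapes.
interface WorkflowDefinition {
  name: string
  version: string
  previousVersions?: Array<{ version: string; run: RunBody }>
  run: RunBody
}

interface RunRecord {
  runId: string
  workflowName: string
  version: string
}

type Resolution =
  | { kind: 'resume'; run: RunBody }
  | { kind: 'orphan'; runId: string }

// Builds the (name, version) -> handler lookup at engine construction.
function buildRegistry(workflows: WorkflowDefinition[]): Map<string, RunBody> {
  const registry = new Map<string, RunBody>()
  const register = (name: string, version: string, run: RunBody): void => {
    const key = `${name}:${version}`
    if (registry.has(key)) {
      throw new Error(`Duplicate workflow version: ${key}`)
    }
    registry.set(key, run)
  }
  for (const workflow of workflows) {
    for (const previous of workflow.previousVersions ?? []) {
      register(workflow.name, previous.version, previous.run)
    }
    register(workflow.name, workflow.version, workflow.run)
  }
  return registry
}

// Resumes on the pinned version, or hands the run to the orphan inspector.
function resolveRun(
  registry: Map<string, RunBody>,
  record: RunRecord,
): Resolution {
  const run = registry.get(`${record.workflowName}:${record.version}`)
  return run === undefined
    ? { kind: 'orphan', runId: record.runId }
    : { kind: 'resume', run }
}
```

Returning an `orphan` result instead of throwing keeps the "version not in registry" case a routing decision rather than a run failure, which matches the intent of replacing `RUN_ERROR` with the orphan inspector.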
+ +What stays: + +- Generators + `yield*` (the whole engine model) +- CAS conflict handling +- Idempotency keys +- All the primitives (`step`, `sleep`, `waitForSignal`, `approve`, `now`, `uuid`, `retry`) +- Replay engine itself +- Cross-version registry (now the default, not optional) + +## What changes in the developer experience + +What goes away: + +- `patches: ['name-1', 'name-2']` ceremony on definitions +- `if (yield* patched('name-1')) { new } else { old }` in run bodies +- Worrying that `console.log` will kill runs +- Worrying that Prettier reformat will kill runs +- Worrying about minifier behavior across builds + +What's new: + +- Version string is required (one extra field per workflow) +- `previousVersions` accumulates as runs persist longer than a deploy cycle +- `.tanstack/workflows.lock` is committed to the repo +- ESLint rules `lockfile-integrity` + `current-version-bumped` +- `npx @tanstack/workflow bump ` codemod +- `npx @tanstack/workflow lock` for intentional non-behavioral edits + +Net: the developer thinks about versioning when they're making a real change, and never has to think about it otherwise. Compare to fingerprinting where they have to think about deploy timing for every change to _any_ workflow source. + +## Operational tier story under this design + +Reframing the tiers from `SRC_SKEW_AND_RESUMPTION.md`: + +### Tier 1: Workflows <1 hour + +> Ship code freely. Most of the time you won't bump versions; even if you change the body, in-flight runs at deploy time complete in minutes. If anyone is mid-flight when you deploy, they continue on their pinned (previous) version. No drama. + +### Tier 2: Workflows 1–24 hours + +> Same as Tier 1, but expect 1–2 days of in-flight runs on the old version after a meaningful change. Bump the version when you change behavior; lint will tell you when to. + +### Tier 3: Workflows 1–7 days + +> Same model. `previousVersions` accumulates a few entries over a quarter. 
Devtools shows you when each version drains. Remove them on a subsequent deploy. + +### Tier 4: Workflows 7–30 days + +> Same model. You may end up with 5–10 historical versions in `previousVersions` over a year. Tree-shaking + lazy-loading per version keeps the bundle reasonable. Devtools surfaces drain timelines. + +### Tier 5: Workflows >30 days + +> Same model — and this is the big win. Long-running workflows that you couldn't reliably ship under fingerprinting now Just Work. The cost is bundle size for accumulated historical versions; the benefit is that 90-day workflows complete on the exact code they started on, even after months of deploys. + +The operational story collapses to one rule across all tiers: **bump the version when behavior changes; the rest is automatic.** + +## Risks and edge cases + +### 1. Developer forgets to bump + +Lint rule `current-version-bumped` catches this at commit time. If lint isn't running, the developer can ship a body change without a version bump, and in-flight runs corrupt silently. + +**Mitigation:** ship the lint rule as part of `@tanstack/eslint-plugin-workflow`. Make `lockfile-integrity` and `current-version-bumped` part of the recommended preset. Add a CI check (`npx @tanstack/workflow check`) that verifies the lock file matches the workflows on disk. + +### 2. Developer manually edits `previousVersions[].run` + +Same risk. Same mitigation. Lint catches it via `lockfile-integrity`. + +### 3. Lock file goes stale or gets deleted + +The CI check should fail loudly if `.tanstack/workflows.lock` doesn't exist or has missing entries for declared versions. Treat it like `pnpm-lock.yaml` — required, committed, enforced. + +### 4. Bundle bloat from accumulated `previousVersions` + +For very long-running workflows (>30 days) with frequent versioning, you could accumulate many historical bodies. 
Mitigation: + +- Tree-shake each version (already happens — they're function expressions, untouched by your active path) +- Lazy-load via dynamic import: `previousVersions: [{ version: 'v1', loader: () => import('./onboard-v1') }]` +- Devtools surfaces "v1 has 0 runs; remove on next deploy" actionable warning + +The cost should be small in practice. Most workflows accumulate 2–4 historical versions; rarely more than 10. + +### 5. Workflow name collisions + +Same as today — the engine enforces name uniqueness at registration time. With versioning, the constraint becomes `(name, version)` unique. Two `defineWorkflow({ name: 'onboard', version: 'v3' })` calls in the same engine throw at startup. + +### 6. Two developers concurrently bump to the same version string + +Git merge conflict — surfaces the problem at PR time. Resolve by picking one (and renaming the other). The lock file diff makes the conflict visible. + +### 7. Version strings as opaque identifiers + +`v1`, `v2`, `v3` is the easiest scheme. But `version: '2026-05-20-feat-tenant-isolation'` is equally valid. Or git SHA. Or any uniqueness-respecting string. The engine doesn't care — it's just a map key. + +Recommendation: the codemod defaults to monotonically incremented integers (`v1`, `v2`, ...) but allows custom strings via `--version 2026-05-20`. + +### 8. What if the engine needs to know "which versions are active right now" for drain decisions? + +Storage already groups by `(workflowName, version)`. The devtools query is: + +```sql +SELECT workflow_name, version, COUNT(*) +FROM workflow_runs +WHERE status IN ('running', 'paused') +GROUP BY workflow_name, version +``` + +Cheap. Indexed. No engine support needed beyond the existing run table. + +### 9. What about workflows that genuinely need to migrate state mid-flight? + +E.g., "we added a required field to state; old runs don't have it." 
This is a real problem fingerprinting doesn't solve either — strict mode kills the run, patch mode lets you write a migration in user code. + +With explicit versioning: the old version's run code continues to work on its old state shape. If you need to upgrade an old run to a new state shape mid-flight, that's a deliberate operation — provide a `migrateRun(runId, newVersion, transform)` API that's used explicitly by an operator. + +## Honest costs of this design + +1. **Boilerplate increase.** Every workflow gets a `version` field. Trivial. +2. **`previousVersions` array grows over time.** Acceptable in practice; lazy-loading helps for extreme cases. +3. **Lock file maintenance.** One CI failure if the developer forgets to run `lock`. Equivalent to `pnpm-lock.yaml` discipline. +4. **No automatic detection of "compatible" changes.** Formatting and comment edits leave the AST hash untouched, but a behavior-preserving refactor of the current version still changes the hash and requires `npx @tanstack/workflow lock` to update it. Not a real cost: `lock` is a one-command operation. +5. **Mental shift from "ship and pray" to "ship and version."** Initial onboarding cost; long-term clarity gain. + +## Why this is the right call + +The fingerprint approach is trying to _guess_ what the developer meant. AST fingerprinting is just a more sophisticated guess. Both can be wrong — too strict (spurious kills) or too loose (silent corruption). + +Explicit versioning **stops guessing**. The developer tells the engine when behavior changed. The engine routes runs to the right code. The toolchain catches the developer when they forget. + +This is the same shift TypeScript itself made: "stop guessing types, declare them." The cost is more typing; the gain is that the system stops being wrong. + +For a durable execution library where the failure mode of being wrong is "silently corrupt a 30-day workflow," "stop guessing" is overwhelmingly the right call. + +## Conversation update for Alem + +Add to the questions: + +1. 
**Open to dropping the source-text fingerprint entirely and replacing it with explicit versioning + a lock file?** The runtime-correctness story is simpler, the false-positive surface goes away, and `patched()` becomes redundant. The cost is a `version: 'v1'` field on every workflow and a `.tanstack/workflows.lock` committed to the repo. +2. **Comfortable with `previousVersions: [...]` as the primary multi-version mechanism?** It makes the cross-version registry the default rather than an opt-in feature. +3. **Would you keep `patched()` as a deprecated escape hatch, or drop it cleanly?** If anyone's already using it on the branch, deprecation; otherwise just remove. +4. **Lint + codemod tooling.** Is `@tanstack/eslint-plugin-workflow` something the workflow repo owns, or shared with the AI repo? + +## Updated bottom line + +The skew + resumption story under explicit versioning is simpler than under fingerprinting. The library ships with: + +1. `defineWorkflow({ name, version, previousVersions, run, ... })` as the definition +2. Engine that routes runs to their pinned version +3. `npx @tanstack/workflow bump ` codemod +4. `.tanstack/workflows.lock` + `npx @tanstack/workflow lock` to maintain it +5. `@tanstack/eslint-plugin-workflow` with `lockfile-integrity` + `current-version-bumped` rules +6. Devtools showing per-version run counts + drain status +7. Orphan inspector for the rare run whose version isn't in the registry + +No fingerprint module. No AST walker at runtime. No build-time plugin matrix for fingerprinting (the lint rule's hash computation runs in the lint process, not in user code). No `patched()` ceremony. No whitespace sensitivity. No spurious refusals. + +This is a strictly better foundation than fingerprinting. Worth the engine refactor. Worth telling Alem. 
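For reference, the lint-time comparison behind items 4 and 5 of the list above might look like the sketch below. It is an illustration under stated assumptions: the AST hashes are taken as precomputed inputs, `checkLock`, `BodyHash`, and `Violation` are hypothetical names, and `missing-lock-entry` is an invented label for the stale-lock case rather than a rule named in this document.

```typescript
// Mirrors the .tanstack/workflows.lock layout: name -> version -> entry.
interface LockEntry {
  hash: string
  lockedAt: string
}
type LockFile = Record<string, Record<string, LockEntry>>

// One recomputed AST hash per declared workflow version.
interface BodyHash {
  workflowName: string
  version: string
  hash: string
  isCurrent: boolean
}

interface Violation {
  rule: 'lockfile-integrity' | 'current-version-bumped' | 'missing-lock-entry'
  workflowName: string
  version: string
}

// Compares recomputed hashes against the lock file and reports mismatches.
function checkLock(lock: LockFile, bodies: BodyHash[]): Violation[] {
  const violations: Violation[] = []
  for (const body of bodies) {
    const { workflowName, version } = body
    const entry = lock[workflowName]?.[version]
    if (entry === undefined) {
      violations.push({ rule: 'missing-lock-entry', workflowName, version })
      continue
    }
    if (entry.hash !== body.hash) {
      // A changed current body needs a bump or re-lock; a changed historical
      // body is an integrity failure.
      const rule = body.isCurrent ? 'current-version-bumped' : 'lockfile-integrity'
      violations.push({ rule, workflowName, version })
    }
  }
  return violations
}
```

Because the function only compares strings, it runs entirely in the lint or CI process and adds nothing to the runtime path.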
diff --git a/research/PRIOR_ART_AI_ORCHESTRATION.md b/research/PRIOR_ART_AI_ORCHESTRATION.md new file mode 100644 index 0000000..4e4d5ae --- /dev/null +++ b/research/PRIOR_ART_AI_ORCHESTRATION.md @@ -0,0 +1,263 @@ +# Prior Art: `@tanstack/ai-orchestration` + +> **Critical finding.** Alem Tuzlak and Tom Beckenham have built a substantial generator-based workflow engine inside the TanStack AI repo over the last ten days (May 10–20, 2026). It lives on the `feat/durable-workflows` branch and is not yet merged or released. This document inventories what's there and updates the design recommendation in [API_CANDIDATES.md](API_CANDIDATES.md). + +## Location + +- Branch: `origin/feat/durable-workflows` +- Package: `packages/typescript/ai-orchestration/` in `github.com/TanStack/ai` +- Sibling integrations: `ai-client/src/workflow-client.ts`, `ai-react/src/use-workflow.ts` +- Working example: `examples/ts-react-chat/src/lib/workflows/article-workflow.ts` +- Doc: `docs/getting-started/workflows.md` (Tom shipped this earlier today) +- Status: package README says **"v0 prototype"** + +## API shape — generator-based + +Alem landed on the **async-generator** pattern that my candidates research considered briefly and dismissed as "too divergent." It works beautifully here. The workflow body is a normal `async function*`; `yield*` is how you call durable primitives. 
+ +```typescript +import { + approve, + defineAgent, + defineWorkflow, + fail, + succeed, +} from '@tanstack/ai-orchestration' + +const articleWorkflow = defineWorkflow({ + name: 'article', + input: z.object({ topic: z.string() }), + output: z.union([ + z.object({ ok: z.literal(true), article: Draft }), + z.object({ ok: z.literal(false), reason: z.string() }), + ]), + state: z.object({ + phase: z.enum([ + 'drafting', + 'reviewing', + 'editing', + 'awaiting-approval', + 'done', + ]), + draft: Draft.optional(), + }), + agents: { writer, legal, editor }, + run: async function* ({ input, state, agents }) { + state.phase = 'drafting' + const draft = yield* agents.writer({ topic: input.topic }) + state.draft = draft + + state.phase = 'reviewing' + const review = yield* agents.legal({ draft }) + if (review.verdict === 'block') + return fail(`legal: ${review.findings.join('; ')}`) + + state.phase = 'awaiting-approval' + const decision = yield* approve({ title: 'Publish?' }) + if (!decision.approved) return fail('user denied') + + state.phase = 'done' + return succeed({ article: draft }) + }, +}) +``` + +The full article-workflow.ts example exercises: typed input/output/state, three agents, schema-validated agent outputs, approval pauses with free-text feedback, a revision loop with `for` and `if`, discriminated-union return via `succeed`/`fail`. + +## What's already shipped + +### Definitions + +- `defineAgent({ name, input, output, run })` — typed wrapper around any text/JSON producer (typically `chat()` from `@tanstack/ai`) +- `defineWorkflow({ name, version, input, output, state, agents, initialize, defaultStepRetry, patches, run })` — compose agents into a generator +- `defineOrchestrator({ ... })` — router-driven agent loop (alt shape for "agent picks next agent") +- `defineRouter({ ... 
})` — orchestrator routing decisions + +### Generator primitives (all `yield*`-able) + +- `step(name, fn, { retry, timeout })` — durable side effects; engine journals the return value; replay short-circuits +- `sleep(duration)` / `sleepUntil(date)` — durable timers +- `waitForSignal(name, options)` — pause for external event with optional timeout +- `approve({ title, description })` — typed human-in-the-loop pause; returns `{ approved, feedback }` +- `now()` — deterministic timestamp (journaled per call) +- `uuid()` — deterministic ID (journaled per call) +- `patched(name)` — Temporal-style mid-flight workflow migration gate +- `retry(generator, options)` — wrap a sub-generator in a retry policy +- `bindAgents()` — internal, binds agent map to ctx +- `succeed(data)` / `fail(reason)` — discriminated result helpers + +### Engine internals + +- `runWorkflow(definition, options)` — server-side execution entrypoint +- Replay engine that survives process restart +- Fingerprint-based source-change detection (refuses replay across workflow source changes unless `patches` is declared) +- CAS conflict handling for multi-instance routing — idempotent retry + signal_lost detection +- Per-step + workflow-level retry policies with backoff +- Per-step timeout with AbortSignal propagation +- Publisher hook for multi-node event fan-out +- State diff via hand-rolled JSON Patch (RFC 6902) — streams to clients between yields +- Split `RunStore` interface: state + step log + +### Client surface + +- `WorkflowClient` in `@tanstack/ai-client` — headless client with `start`, `attach`, `signal`, `approve`, `stop` +- Client-provided `runId` + `signalId` for idempotency +- Connection adapter pattern: `WorkflowConnectionAdapter` (e.g. 
`fetchWorkflowEvents('/api/workflow')`) +- `attach(runId)` — resubscribe to a running workflow from a different client +- `steps-snapshot` API for catching up on history + +### React binding (`useWorkflow` in `@tanstack/ai-react`) + +```typescript +const wf = useWorkflow({ + connection: fetchWorkflowEvents('/api/workflow'), +}) + +// Returns: { state, output, status, start, stop, attach, signal, approve } +``` + +Stable client identity (mirrors `useChat` memo pattern). State updates stream in via JSON Patch. + +### Server surface + +- `parseWorkflowRequest(req)` — parses incoming workflow HTTP requests +- Composable with any HTTP framework (Start, Hono, Express, etc.) + +### Cross-version registry + +- `createWorkflowRegistry()` + `selectWorkflowVersion()` — routes incoming runs to the right workflow version based on a caller-supplied version identifier + +### Run store + +- `inMemoryRunStore()` — bundled for dev/tests +- `RunStore` interface for plugging in Postgres / SQLite / DO / Redis adapters (not yet implemented — this is the obvious next layer) + +### Test coverage (13 test files) + +`engine.attach.test.ts`, `engine.cas.test.ts`, `engine.durability.test.ts`, `engine.idempotency.test.ts`, `engine.patched.test.ts`, `engine.primitives.test.ts`, `engine.publisher.test.ts`, `engine.retry.test.ts`, `engine.signals.test.ts`, `engine.smoke.test.ts`, `engine.timeout.test.ts`, `in-memory-store.test.ts`, `registry.test.ts`. + +## Where it overlaps with my Candidates document + +This is essentially **a fourth candidate that I considered and dismissed too quickly**: generator-based workflows. My durable-execution research did cover it as "coroutine / async-state-machine compilation" (the Azure Durable Functions pattern), and I noted that `yield` is conceptually clean but TS inference through `yield` is awkward. 
Alem's implementation proves both points: + +- ✅ `yield*` is genuinely clean — every yield is a checkpoint boundary, no AST transform needed +- ✅ Step identity is determined by yield position, so renames don't break in-flight runs +- ✅ Determinism contract is built-in: anything not yielded is pre-yield user code, anything yielded is journaled +- ⚠️ TS inference through generators still has rough edges, but the `AsyncGenerator` return type plus careful generic threading make `yield* agents.writer({ topic })` infer the agent's output type correctly +- ⚠️ Mistyping `yield` instead of `yield*` is a real footgun — the README and docs both call this out explicitly + +The generator pattern delivers most of what my Candidate 3 (implicit context + build transform) was reaching for, **without requiring a build transform**. That's a strict improvement. + +## Where it sits vs my candidates + +| Trait | My C1 | My C2 | My C3 | `ai-orchestration` | +| --------------------------- | -------------------------------- | -------------------------------- | ----------------------------- | ------------------------------------------------ | +| Definition style | Object config | Builder chain | Plain function | Object config | +| Step identity | String labels | String labels | Lexical AST position | Generator yield position | +| Context threading | Explicit `ctx` | Explicit `ctx` | AsyncLocalStorage + transform | Generator delegation via `yield*` | +| Build transform needed | No | No | Yes | No | +| Step rename safety | Footgun (mitigated by transform) | Footgun (mitigated by transform) | Safe | Safe — yield position is structural | +| Composability | OK (ctx threading) | OK + middleware | Excellent | Excellent (sub-generators compose) | +| Type safety | Strong | Strong | Strong | Strong (with `AsyncGenerator` generic threading) | +| `yield`-vs-`yield*` footgun | N/A | N/A | N/A | Yes (documented) | + +The `ai-orchestration` design is essentially **the best of Candidates 1 and 3 without 
the build transform cost**. The cost is the generator footgun, which is a real but smaller tax than maintaining a build-tool plugin matrix. + +## What's not there yet (the obvious next layer) + +The engine is excellent. The adapter ecosystem is the gap: + +1. **Storage adapters.** Only `inMemoryRunStore`. No Postgres, SQLite, D1, Durable Objects, or Redis adapter yet. The `RunStore` interface looks clean and pluggable. +2. **Runtime adapters.** Currently the engine runs in-process. No cron-driven, worker-driven, or DO-alarm runtime modes for serverless / actor deployments. +3. **Framework bindings beyond React.** Only `@tanstack/ai-react`. No Solid, Vue, Svelte bindings yet (though `ai-solid-ui`, `ai-vue-ui` exist for the chat side). +4. **Devtools.** Nothing dedicated. The JSON Patch state streaming is the substrate but no inspector / timeline / replay-debugger UI. +5. **Start integration.** No `@tanstack/workflow-start` shim yet — users wire `parseWorkflowRequest` into route handlers manually. +6. **Storage-level transparency.** No documented schema / SQL examples for `RunStore` implementations — needed for the "you can `SELECT *` against it" positioning. +7. **Saga / compensation primitive.** Not present yet (though it composes naturally as a `try/finally` over sub-generators). +8. **Parallel / fan-out primitive.** Not yet — users would write `Promise.all` over generator drivers manually. A `parallel()` or `race()` primitive would be a natural addition. +9. **Child workflow invocation.** I didn't see `invoke(otherWorkflow)` — agents are the composition unit. Sub-workflow composition may not be needed if `agents.*` covers it, but cross-version routing across true child workflows would be a future need. + +## Updated strategic recommendation + +The standalone `@tanstack/workflow` library question is no longer "what API design?" 
— it's "**what's the relationship to `ai-orchestration`?**" + +Three viable paths: + +### Path A — Extract: promote the engine to `@tanstack/workflow-core` + +`ai-orchestration` keeps the AI-flavored layer (agents, chat integration, state streaming for AI UIs). The pure engine — `defineWorkflow`, the primitives, the run store, the replay engine, the server — moves into `@tanstack/workflow-core`. `ai-orchestration` declares it as a peer dep and adds the agent/chat layer on top. + +**Pros:** + +- Single engine, no duplication +- Workflow code without AI agents becomes idiomatic (no more `agents: {}` on every definition) +- AI users continue to use `ai-orchestration` which adds the agents + chat sugar +- Adapter ecosystem (postgres, sqlite, do, redis, devtools, framework bindings) attaches to the core, benefits both + +**Cons:** + +- Refactor work — Alem's engine isn't packaged for extraction yet +- API churn for `ai-orchestration` users (who don't exist yet — it's v0) + +**Recommendation: this is the right path.** The engine is too good to fork. The "agents required in defineWorkflow" coupling is the only thing that needs unwinding, and it's mechanical. + +### Path B — Promote: rename `ai-orchestration` to `@tanstack/workflow` + +Make agents optional. Drop the AI-specific framing. The package is just "TanStack Workflow" and agents become one of several integrations (alongside future storage adapters and framework bindings). + +**Pros:** + +- Cleanest end state +- One package, one engine, one positioning story +- Agents become a sub-feature, not the headline + +**Cons:** + +- Forces a position before there are users — but since it's v0, this is the _easiest_ time to do it +- Coordination with Alem on package name + AI repo vs. dedicated `workflow` repo + +**Recommendation: do this if you're willing to merge `workflow` into `ai` (or vice versa).** Cleanest, but a structural call. 
+ +### Path C — Build parallel and replace later + +Ship a new `@tanstack/workflow` separately. Let `ai-orchestration` continue as is. Plan migration later. + +**Recommendation: don't.** Wasteful — Alem's engine is ahead of where any fresh re-implementation would be in three months. + +### My pick + +**Path A**, extract the engine into `@tanstack/workflow-core`, keep `ai-orchestration` as the AI-flavored layer, and use the empty `/Users/tannerlinsley/GitHub/workflow` repo to house the core + the storage/runtime adapter ecosystem + devtools + non-React framework bindings + Start integration. + +The package layout becomes: + +``` +@tanstack/workflow-core (extracted from ai-orchestration) +@tanstack/workflow-postgres +@tanstack/workflow-sqlite +@tanstack/workflow-d1 +@tanstack/workflow-durable-object +@tanstack/workflow-redis +@tanstack/workflow-cron (runtime adapter) +@tanstack/workflow-worker (runtime adapter) +@tanstack/workflow-start (HTTP entry for Start) +@tanstack/workflow-devtools +@tanstack/react-workflow (extracted from ai-react) +@tanstack/solid-workflow +@tanstack/vue-workflow +@tanstack/svelte-workflow +@tanstack/ai-orchestration (stays as the AI agents + chat layer on top) +``` + +## Conversation to have with Alem + +1. **Was `ai-orchestration` designed as the future of `@tanstack/workflow` or specifically scoped to AI?** The README says "Generator-based workflows and orchestrators for TanStack AI" — but the primitives (step, sleep, waitForSignal, approve) aren't AI-specific. Worth confirming intent. +2. **How married are you to the `agents` parameter being required in `defineWorkflow`?** Making it optional unlocks the broader workflow market without breaking AI users. +3. **What's the next 30 days of roadmap on `ai-orchestration`?** If storage adapters and devtools are on his plate, coordinate. If not, the new repo can take them. +4. **Where should adapter packages live?** AI repo (alongside `ai-orchestration`)? Workflow repo? Cross-repo monorepo? +5. 
**Has the generator-vs-async-await design been pressure-tested with non-AI workloads?** ETL pipelines, payment processing, transactional sagas — the article-workflow example is rich but it's AI-shaped. + +## Updated bottom line + +The strategic position changes from **"design and build a workflow library"** to **"productize and extend an existing internal workflow library."** That's a much better starting position. Six months of work on the API shape, engine semantics, replay logic, idempotency, CAS, fingerprinting, and patched migrations is already in. What remains is the surrounding ecosystem — storage adapters, runtime adapters, framework bindings, devtools, deployment matrix, marketing positioning — which is exactly where TanStack's distribution and design strengths shine. + +This is faster to market and lower-risk than a from-scratch build. It also keeps the headline pitch intact: type-safe durable workflows, deployment-agnostic, headless, no SaaS lock-in, devtools included. diff --git a/research/README.md b/research/README.md new file mode 100644 index 0000000..7bb76cb --- /dev/null +++ b/research/README.md @@ -0,0 +1,19 @@ +# Research archive + +Point-in-time design notes from the planning phase that led to `@tanstack/workflow-core`. **Not maintained.** Treat as historical context for why the engine is shaped the way it is. + +If you want current docs, see [/docs](../docs/) and [packages/workflow-core/README.md](../packages/workflow-core/README.md). The current API and engine may differ from what's described here in places — these were exploratory snapshots, not specs. 
+ +## Contents + +| File | What it covered | Status of recommendations | +| -------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ | +| [RESEARCH.md](RESEARCH.md) | Competitive landscape (Inngest, Trigger.dev, Temporal, DBOS, Hatchet, Cloudflare Workflows, Mastra, LangGraph.js, AI SDK), technical patterns, deployment architecture, market positioning, Lovable / AI-app-builder distribution play. | Largely intact; informs how the project is framed and positioned. | +| [API_CANDIDATES.md](API_CANDIDATES.md) | Three API designs evaluated: definition-object + ctx, builder-chain + middleware (Candidate 2), implicit-context hooks-style. | **Candidate 2 won.** The shipping API is closure-based with ctx-as-arg and typed middleware. | +| [PRIOR_ART_AI_ORCHESTRATION.md](PRIOR_ART_AI_ORCHESTRATION.md) | Inventory of Alem Tuzlak + Tom Beckenham's existing generator-based engine in `@tanstack/ai-orchestration` ([TanStack/ai#542](https://github.com/TanStack/ai/pull/542)) — the parent we extracted from. | Engine extracted. AI surface (agents, orchestrators, AG-UI events) stays in `ai-orchestration`. | +| [SRC_SKEW_AND_RESUMPTION.md](SRC_SKEW_AND_RESUMPTION.md) | Analysis of fingerprint-based source-skew handling and its gaps (Prettier reformat / minifier drift / silent corruption in patch mode). | Motivated the move to explicit versioning. | +| [EXPLICIT_VERSIONING.md](EXPLICIT_VERSIONING.md) | Alternative design: explicit `version` + `previousVersions` registry + lint-time lock file, replacing runtime fingerprinting. | **Shipped.** `createWorkflow({ version }).previousVersions([...])` + version-routing engine. 
Lockfile + ESLint plugin still to come. | + +## How these came to be + +Generated during the design sprint that preceded the engine extraction. Some of the prose talks about "TanStack Run" — the project was briefly named that before reverting to "TanStack Workflow". Treat that as a footnote. diff --git a/research/RESEARCH.md b/research/RESEARCH.md new file mode 100644 index 0000000..693c07c --- /dev/null +++ b/research/RESEARCH.md @@ -0,0 +1,585 @@ +# TanStack Workflow — Research Dump + +> Compiled May 19, 2026. Eight parallel research streams; mix of web sources, local TanStack repo reads, and synthesis. Confidence is highest on prior-art landscape and TanStack's own signals, weaker on Q1–Q2 2026 specifics where memory had to fill in for some research streams. + +--- + +## 0. The headline finding: you have already publicly drafted this library + +Two artifacts in the TanStack blog plus today's empty `/Users/tannerlinsley/GitHub/workflow/` directory converge on a single conclusion: this is the "massive new library" you've already teased, and you've publicly sketched the API. + +### a. "Directives and the Platform Boundary" — 2025-10-24 + +In your own post arguing against `'use workflow'` / `'use step'` directives, you wrote the contrast example as a near-complete API: + +```js +import { workflow, step } from '@workflows/workflow' + +export const sendEmail = workflow( + async (input) => { + /* ... */ + }, + { retries: 3, timeout: '1m' }, +) + +export const handle = step( + 'fetchUser', + async () => { + /* ... */ + }, + { cache: 60 }, +) +``` + +You also list, verbatim, what you think the "real problems worth solving" are: + +- Server execution boundaries +- Streaming and async workflows +- Distributed runtime primitives +- Durable tasks +- Caching semantics + +This is the manifesto. You picked workflow as the worked example because you'd been thinking about it. + +### b. 
"The State of TanStack, Two Years of Full-Time OSS" — 2025-11-24 + +> "And yes, we've already started work on a massive new library that will take most of next year to get off the ground. It's one of the biggest things we've ever attempted. I can't share details yet, but it will open a new chapter for the entire ecosystem." + +### c. TanStack DB 0.6 — 2026-03-25 + +The DB team (Sam Willis, Kevin De Porre) shipped `createEffect` and explicitly framed DB+DO+SQLite as "a durable state engine for agent workflows, not just a UI data layer." This is the persistence substrate landing in advance of the workflow layer. + +### d. The empty `/Users/tannerlinsley/GitHub/workflow/` directory created today + +Self-explanatory. The work has started. + +### Directional read + +1. **Import-shaped, not directive-shaped.** You've publicly argued for this. +2. **`workflow(fn, options)` + `step(name, fn, options)`** is the API skeleton. +3. **Options-rich**: `retries`, `timeout`, `cache` shown; expect `idempotencyKey`, `concurrency`, `version`, `signal`. +4. **Type-safe, framework-agnostic, headless** — TanStack defaults apply. +5. **TanStack DB is being positioned as the substrate.** `createEffect` with `onEnter` / `AbortSignal` is exactly the shape a workflow runtime needs. +6. **2026 ramp.** "Most of next year" from Nov 2025 means Q4 2026 alpha-ish, with public surface emerging through 2026. + +--- + +## 1. The competitive landscape (May 2026) + +Five camps. Know where you fit before deciding what to build. 
+ +| Camp | Examples | DX feel | What they win on | What they lose on | +| ------------------------------- | ----------------------------------------------------------------- | ----------------------------- | ----------------------------------------------- | ----------------------------------------------------------- | +| **Enterprise durable engines** | Temporal, Cadence, Restate | "Constrained async + sandbox" | Industrial-grade durability, deep introspection | Heavy ops, Java-shaped APIs, painful in TS | +| **TS-first serverless SaaS** | Inngest, Trigger.dev v4 | "Just write a function" | Best DX, fast onboarding, marketplaces | Vendor lock-in, pricing surprises, control-plane stickiness | +| **DB-backed durable execution** | DBOS, Hatchet, Resonate | "Postgres is the engine" | Lightest ops, transactional with business state | Smaller communities, TS often second-class | +| **Platform-tied** | Cloudflare Workflows, AWS Step Functions, Azure Durable Functions | Varies | Cheap, zero-ops, deep platform integration | Total vendor lock-in | +| **AI-agent frameworks** | LangGraph.js, Mastra, Vercel AI SDK, Inngest AgentKit | "Graph of LLM calls" | LLM-shaped primitives, streaming, tool loops | Narrow scope, durability is bolted on (except LangGraph) | + +### Detailed profiles + +**Temporal** — Event-sourced replay. Workflow code runs in an isolated v8 sandbox; every `Date.now()` / `Math.random()` / `setTimeout` is intercepted and made deterministic; history is the source of truth. TS SDK is solid but Java-shaped. Determinism rules and mid-flight versioning are the dominant pain points. Cluster needs Cassandra/MySQL/Postgres + optional Elasticsearch. Mid-tier of price/complexity unless on Temporal Cloud. **TS DX: 6/10.** + +**Inngest** — "Inngest calls you, not the other way around." Event-driven functions that expose a single HTTP endpoint (`/api/inngest`); their cloud invokes it per step, memoizing step results by label. Best DX in category. 
Excellent local dev server. Pain: HTTP overhead per step (10-step workflow = 10 invocations), step renames break in-flight runs, self-host trails cloud. **TS DX: 9/10.** + +**Trigger.dev v4** — Recently GA in 2025. Heap-state checkpointing means you write plain async code without thinking about step boundaries (they snapshot the V8 heap and restore on a different machine). v4 made self-hosting actually approachable — Docker Compose with built-in registry/object storage, official Kubernetes Helm chart with integrated Postgres + Redis + storage. They build & run your code on their infra. Strong AI workflow story. **TS DX: 8.5/10.** ([self-hosting docs](https://trigger.dev/blog/self-hosting-trigger-dev-v4-docker), [v4 GA](https://trigger.dev/launchweek/2/trigger-v4-ga)) + +**Restate** — Single Rust binary, embedded RocksDB, no external DB required. Combines workflows, virtual actors (called Virtual Objects), and durable RPC. Stateless SDK handler called over HTTP/2 by the runtime. Operationally the lightest of the heavyweights. Ex-Apache Flink team. Newer; smaller community. **TS DX: 7.5/10.** + +**Hatchet** — Postgres-only. Single binary + Postgres = your entire workflow engine. Heavy use of `SELECT … FOR UPDATE SKIP LOCKED`. YC W24. Honest about Postgres ceilings at very high throughput. TS SDK lags Python/Go. **TS DX: 6.5/10.** + +**Resonate** — Durable Promises as the universal primitive. Apache 2.0 Go binary, SQLite or Postgres backing. Ex-Temporal team. Niche but credible. **TS DX: 7/10.** + +**DBOS** — Stonebraker / MIT pedigree. Postgres is the engine — workflow state lives in your application's own database. Decorators (`@DBOS.workflow()`, `@DBOS.transaction()`). True exactly-once for the transactional path. Decorator-heavy feels dated in modern TS. **TS DX: 6.5/10.** + +**Cloudflare Workflows** — `class extends WorkflowEntrypoint` with `step.do()`. Backed by Durable Objects. Bound to Cloudflare runtime (V8 isolates, npm caveats). 
Zero ops if you're already on CF. **TS DX: 7/10.** + +**AWS Step Functions** — ASL JSON or CDK. Standard vs Express flavors. Bulletproof and integrated with every AWS service, but JSON DSL hell at scale and AWS lock-in. **TS DX: 5/10.** + +**Azure Durable Functions** — Generator-based orchestrators (`yield context.df.callActivity(...)`). Replay-based like Temporal. `yield` losing type info is a real TS pain. Azure-only. **TS DX: 5.5/10.** + +**Defer** — RIP. YC-backed TS background jobs platform; shut down 2024. Lesson: the TS-only background jobs space is crowded; me-too kills you. + +### AI-flavored workflow tier (this is the hot growth segment) + +**Mastra** — 22k+ stars in 15 months, 300k+ weekly npm downloads by Jan 2026 GA. Open-source TS AI agent framework. Six primitives: agents, workflows, tools, memory, RAG, evals. Workflows are deterministic step graphs with suspend/resume + time-travel debugging. From the Gatsby team. Includes "Studio" UI. ([Mastra docs](https://mastra.ai/docs), [GitHub](https://github.com/mastra-ai/mastra)) + +**Vercel AI SDK 6** — Unified `generateObject`+`generateText`. `ToolLoopAgent` class for the standard tool-use loop. `stopWhen: stepCountIs(N)` configures multi-step. Streaming-first, not durable. ([AI SDK 6 blog](https://vercel.com/blog/ai-sdk-6)) + +**LangGraph.js** — Built-in persistence layer: a `Checkpointer` saves state after every node execution. PostgreSQL/Redis backed. Resumable across deploys and process restarts. Diagrid wrote a pointed critique titled "Checkpoints Are Not Durable Execution" arguing LangGraph's model falls short of real durable execution for production agent workflows. ([LangGraph durable execution](https://docs.langchain.com/oss/javascript/langgraph/durable-execution), [Diagrid critique](https://www.diagrid.io/blog/checkpoints-are-not-durable-execution-why-langgraph-crewai-google-adk-and-others-fall-short-for-production-agent-workflows)) + +**XState v5** — Actor model is the focal abstraction. 
Actors are deeply (recursively) persisted in v5, unlike v4. Restate published a guide on combining XState + Restate to get durable state machines on serverless. Statecharts are the right tool when the model genuinely is "states with transitions"; ceremonial when the model is "sequence of steps." ([Restate + XState](https://www.restate.dev/blog/persistent-serverless-state-machines-with-xstate-and-restate), [XState v5 release](https://stately.ai/blog/2023-12-01-xstate-v5)) + +**Inngest AgentKit / Trigger.dev AI tasks** — Both have shipped AI-shaped APIs on top of their existing engines. They're bolted on rather than ground-up AI, but they ride durable execution which is a real advantage. + +--- + +## 2. Technical architecture patterns + +Eight patterns underlie everything above. Pick your pattern; everything else follows. + +### Pattern 1: Event-sourced replay (Temporal, Cadence) + +Workflow code is re-executed from history every time it needs to make progress. Every primitive call (`activity`, `sleep`, `condition`, `setHandler`) is intercepted and checked against the event log. If the result is in history → return cached; else → record a command and block. + +**Pros:** Best-in-class introspection, full event log, code reads like normal async/await +**Cons:** Strict determinism, mid-flight versioning via `patched()`, sandbox quirks (most npm packages break in workflow code) +**Storage cost:** ~300 events for a workflow with 100 activities + 50 timers + +### Pattern 2: Continuation / checkpoint persistence (Restate) + +Persist execution context after each step. Resume by loading. Replay still happens within a single invocation, but only `ctx.*` boundaries are journaled — code outside is just user code. 
+ +**Pros:** Smaller journal, more forgiving determinism contract, stateless service runs anywhere +**Cons:** Suspension only at `ctx.*` boundaries, less granular debuggability + +### Pattern 3: Step-as-DAG / IR (AWS SFN, Inngest) + +Workflow body is a description that compiles to an execution graph. The runtime traverses the graph; user code is invoked piecemeal. + +For Inngest specifically: each `step.run` call boundary causes the runtime to **re-invoke your function over HTTP** with the previously-cached results substituted. This is replay-via-HTTP, which is why "anything not wrapped in `step.run` runs on every step" is the Inngest gotcha. + +**Pros:** User code is stateless, edge/serverless friendly, easy versioning (deploy new code), DX feels like writing a function +**Cons:** HTTP overhead per step, step names are identifiers (rename = break in-flight) + +### Pattern 4: Coroutine / async-state-machine (DBOS, Azure Durable Functions, Effect) + +Lean on language-native suspension points (`async/await`, `yield`, generators) so the compiler already knows where to checkpoint. DBOS does this with decorators + Postgres journaling. Effect does it with its fiber scheduler. + +**Pros:** Feels like native code, lower runtime overhead than full replay +**Cons:** Tied to a specific runtime model, versioning still hard + +### Pattern 5: Reactive / observable (XState statecharts) + +A workflow is a finite hierarchy of states with event-driven transitions. The interpreter drives transitions. + +**Pros:** Visual reasoning, parallel regions, hierarchical states +**Cons:** Long durations require external timer events, type ergonomics for deeply nested hierarchies get heavy, not built for "30-day sleep, then call API" + +### Pattern 6: Virtual actors (Cloudflare DO, Orleans, Dapr) + +Per-ID addressable, single-threaded, durable, with self-scheduling alarms. A _substrate_, not a workflow library on its own — Cloudflare Workflows is the workflow API on top of DO. 
+ +**Pros:** Concurrency control free, durable timers built in, signals = messages +**Cons:** Platform lock-in, cross-actor coordination is on you + +### Pattern 7: DB-backed queues (Hatchet, pg-boss, BullMQ) + +Just a job queue (Postgres/Redis) with a thin workflow layer. Each step is a DB transaction. Wake-up via `LISTEN/NOTIFY` (Postgres), Redis streams, or polling. + +**Pros:** BYO database, transactional consistency with business state, debuggable via SQL +**Cons:** Lower-level user code, Postgres has scaling ceilings, no built-in signals + +### Pattern 8: Saga pattern (cross-cutting) + +Sequence of local transactions, each with a compensating action. Run compensations in reverse on failure. Orthogonal to execution pattern — every other pattern can implement sagas. + +**Pros:** Distributed transactions without 2PC +**Cons:** Compensations must be idempotent, easy to write incorrect ones + +### The cross-cutting dimensions that actually matter + +| Dimension | Strong determinism (Temporal) | Soft determinism (Inngest/Restate) | None (XState/queues) | +| --------------------------- | ----------------------------- | ---------------------------------- | ------------------------- | +| **Determinism enforcement** | Sandbox + intercepted globals | Step-boundary memoization | N/A | +| **User code feels like** | Constrained async | Normal async with `ctx.run` | Imperative or declarative | +| **Mid-flight versioning** | Painful (`patched()`) | Manageable (named steps) | Schema migration | +| **Edge/serverless fit** | Bad (workers hold state) | Excellent (stateless calls) | Excellent (just code) | +| **BYO database** | No | Sometimes | Yes | + +--- + +## 3. Deployment-agnostic architecture + +This is the biggest unique angle for TanStack and the one where existing players are weakest. + +### The runtime archetype trinity + +Every deployment target reduces to one of three archetypes: + +1. **Always-on process** — Fly machines, Railway, Render, Node, Bun, Docker, Fargate. 
Can hold sockets, run pollers, keep `LISTEN/NOTIFY` open. Easy mode. +2. **Scale-to-zero serverless** — Vercel Functions, Lambda, Netlify, CF Workers (no DO), Deno Deploy. Each invocation is fresh. State must live externally. Long sleeps = "park the workflow, cron will wake it." +3. **Durable actor** — CF Durable Objects, Deno KV queues, Temporal/Restate workers. A named, persistent instance with its own storage and self-scheduling alarms. + +A deployment-agnostic library means: the same workflow definition compiles to all three, and the user picks per deployment. + +### The deployment matrix (key cells) + +| Target | Long-lived process? | Native durable storage | Native scheduler | +| -------------------------- | ---------------------- | ------------------------------- | ----------------------------- | +| Vercel Functions | No | Vercel Postgres, KV, Blob | Vercel Cron (1 min) | +| Cloudflare Workers | No (per-request) | KV, R2, D1, Hyperdrive→Postgres | Cron Triggers (1 min), Queues | +| Cloudflare Durable Objects | **Yes** (actor) | Per-object SQLite | `state.storage.setAlarm()` | +| AWS Lambda | No | DynamoDB, RDS, S3 | EventBridge | +| Fly Machines | Yes (scale-to-zero ok) | Volumes, LiteFS, Postgres | App-level cron | +| Bun / Node / Docker | Yes | Any | Any | +| Deno Deploy | No | Deno KV (FoundationDB) | Cron + Queues | + +### Three storage shapes worth shipping as first-class + +1. **Relational** — Postgres / SQLite / D1 / libsql. `SKIP LOCKED` or polling. +2. **Key-value with sorted sets** — Redis / Upstash / Deno KV. Hash for state, ZSET for timers. +3. **Single-actor SQLite** — Cloudflare DO, embedded driver, no network. + +DynamoDB and friends become community adapters. 
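The "ZSET for timers" idea in the key-value shape can be sketched in memory. This is an illustrative model only: `SortedTimerSet`, `schedule`, and `claimDue` are hypothetical names, not part of any adapter, and a real Redis adapter would need the range read and the removal to happen atomically (for example in a script or transaction).

```typescript
// Hypothetical in-memory model of a sorted set keyed by fire time.
// Score = fireAt (epoch ms), member = runId.
type TimerEntry = { runId: string; fireAt: number }

class SortedTimerSet {
  private entries: TimerEntry[] = []

  // Insert while keeping entries in ascending fireAt order
  // (the role a sorted-set add would play in a real store).
  schedule(runId: string, fireAt: number): void {
    const entry: TimerEntry = { runId, fireAt }
    const idx = this.entries.findIndex((e) => e.fireAt > fireAt)
    if (idx === -1) this.entries.push(entry)
    else this.entries.splice(idx, 0, entry)
  }

  // Remove and return up to `limit` timers whose fireAt is <= now,
  // earliest first. A claimed timer is never returned twice.
  claimDue(now: number, limit: number): TimerEntry[] {
    const due = this.entries.filter((e) => e.fireAt <= now).slice(0, limit)
    this.entries = this.entries.filter((e) => !due.includes(e))
    return due
  }
}
```

Because claiming removes the entries it returns, a second tick at the same instant sees nothing left to do, which is the property the timer store has to provide in any of the three shapes.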
+ +### TanStack's adapter pattern, applied + +The lesson from `db-sqlite-persistence-core/src/persisted.ts` and the TanStack DB `SQLiteDriver` interface: **the engine owns the schema; the driver is just a transport.** Five methods (`exec`, `query`, `run`, `transaction`, `transactionWithDriver`) and the same engine runs on Node `better-sqlite3` or Durable Object SQL. + +For workflow, the minimum driver surface looks like: + +```typescript +// Type arguments below (e.g. `Run`, `void`, `T`) are assumed for illustration; +// the exact record types are not specified in these notes. +interface WorkflowStorage { + // Run lifecycle + createRun(run: NewRun): Promise<Run> + getRun(id: string): Promise<Run | null> + updateRun(id: string, patch: Partial<Run>): Promise<void> + + // Step memoization (THE durable execution log) + getStepResult(runId: string, stepId: string): Promise<StepResult | null> + putStepResult( + runId: string, + stepId: string, + result: StepResult, + ): Promise<void> + + // Timers + scheduleTimer(timer: ScheduledTimer): Promise<void> + claimDueTimers(limit: number, now: Date): Promise<ScheduledTimer[]> + + // Event waits + registerEventWait(wait: EventWait): Promise<void> + matchEvents(name: string, payload: unknown): Promise<EventWait[]> + + // Atomic claim — only one worker gets a given run + claimNextRun(workerId: string, lockMs: number): Promise<Run | null> + releaseRun(runId: string): Promise<void> + + // Transaction wrapper + transaction<T>(fn: (tx: WorkflowStorage) => Promise<T>): Promise<T> +} + +interface WorkflowRuntime { + start(engine: WorkflowEngine): Promise<void> + stop(): Promise<void> + onWake?(callback: () => void): void // for LISTEN/NOTIFY-style push +} +``` + +### Three runtime adapters + +- **`@tanstack/workflow-cron`** — handler you wire into Vercel Cron / CF Cron / EventBridge / `Deno.cron`. Batches work per tick. 1-minute granularity acceptable. +- **`@tanstack/workflow-worker`** — always-on poller loop with optional `LISTEN/NOTIFY` push. For Fly / Railway / Render / Node. +- **`@tanstack/workflow-durable-object`** — one DO per run, alarm-driven. Coupled with `@tanstack/workflow-do-storage`. 
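To make "batches work per tick" concrete for the cron-style runtime, here is a minimal sketch. Everything in it is an assumption for illustration: `TickStorage` is a cut-down slice of the storage surface, and `createCronTick` and the `resume` callback are hypothetical names, not an existing API.

```typescript
// Hypothetical slice of the storage surface that a cron tick needs.
interface TickStorage {
  claimDueTimers(limit: number, now: Date): Promise<{ runId: string }[]>
}

// Hypothetical factory: returns a handler to call once per cron invocation.
// Each call claims at most `batchSize` due timers and resumes their runs.
function createCronTick(
  storage: TickStorage,
  resume: (runId: string) => Promise<void>,
  batchSize = 50,
) {
  return async function tick(now: Date = new Date()): Promise<number> {
    const due = await storage.claimDueTimers(batchSize, now)
    for (const timer of due) {
      // Sequential on purpose: keeps a single tick's load predictable.
      await resume(timer.runId)
    }
    // Number of runs resumed; the caller can use it to decide whether
    // to tick again immediately or wait for the next schedule.
    return due.length
  }
}
```

The handler holds no state between invocations, so under these assumptions the same function could sit behind any scheduled trigger; only the code that calls `tick()` would differ per platform.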
+ +### What breaks deployment agnosticism (be honest about these) + +- **Long sleeps** with cron runtime have ≥1-minute precision skew +- **Large fan-out** stresses Postgres (10k inserts), DynamoDB (write capacity), CF Workers Queues (per-msg price) +- **Parent-child workflows across runtimes** must always go through storage, never in-process calls — adds latency but preserves portability +- **Postgres on Cloudflare Workers** requires Neon serverless driver or Hyperdrive, not raw `pg` +- **Workflow non-determinism** (`Math.random()` outside `step.run`) must be detected and thrown loudly, not silently allowed to diverge +- **Cloudflare KV is not viable as state store** — eventual consistency breaks workflow semantics +- **CPU-heavy steps on Workers** hit the 30-sec CPU limit; document for users + +### The single most important architectural choice + +**Everything goes through storage.** Never in-process function calls between workflows or steps. Storage is the only thing the library can guarantee is shared across cold serverless invocations, across DOs, across worker processes. + +--- + +## 4. 
The market gap — where TanStack actually has a hedge + +### What the existing crowd does badly + +| Painpoint (recurring across HN/Reddit/blog sentiment) | Existing libs | +| ------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------- | +| Step IDs are string-keyed black boxes; refactoring breaks in-flight runs | Inngest, Trigger.dev, CF Workflows | +| Workflow payloads degrade to `any` once they cross serialization | Temporal, Step Functions, all SaaS engines | +| Surprise bills when step count spikes | Inngest, Temporal Cloud, Trigger.dev cloud | +| Can't `SELECT *` to see what's stuck | All SaaS engines | +| Self-host is "supported but you're on your own" | Inngest, Trigger.dev (less true after v4), Restate | +| Streaming LLM tokens + durable workflows is awkward | Almost everyone — Vercel AI SDK does streaming but not durable, Inngest/Trigger do durable but streaming is bolted on | +| Local dev story is "run a cluster" or "use our cloud" | Temporal, Trigger.dev, Restate | +| Observability is bolted on (OTEL is uneven) | Most engines | +| Migrating in-flight workflows during a deploy is terrifying | Temporal (sharp), DBOS (better), others variable | +| Vendor abstractions leak into your business code | Inngest `step.*`, Temporal `proxyActivities`, SFN ASL | + +### What TanStack uniquely can do + +1. **End-to-end inference through every step.** The TanStack Router / Query pattern of types flowing through call sites is the single biggest underserved demographic in workflows. Step IO inferred from closure types; event payloads typed from a Standard Schema; invoke call sites carry full return types. + +2. **Embedded engine, not a control plane.** `npm install`, write a function, await it. Postgres / SQLite / DO are _adapters_, not requirements. Closest analog is DBOS — but DBOS is decorator-heavy and Python-first. + +3. 
**Core + framework adapters.** `@tanstack/react-workflow`, `@tanstack/solid-workflow` with `useWorkflowRun(id)` and `useWorkflowStream(id)` hooks. **Nobody** has shipped framework binding hooks for workflows. Bury Inngest's dashboard inside the user's own app. + +4. **In-app, framework-agnostic devtools.** A drop-in inspector showing every run, every step, every retry, every payload — using your data, your servers. Inngest's UI is great but it's their dashboard, not yours. + +5. **TanStack DB as the persistence + reactive substrate.** `createEffect` already exposes `onEnter` / `onExit` / `onUpdate` with `AbortSignal`. Workflow runs become rows in a DB collection; the workflow engine reacts to query results. + +6. **TanStack Start native integration.** Server functions return workflow handles. Route loaders subscribe to runs. Streaming RSC + durable workflow + suspense — that combination doesn't exist anywhere else cleanly. + +### Three positioning angles, ranked + +**#1 — "Durable execution that flows with your types, not against them."** Headline pitch. The gap nobody is filling. Modeled after Router/Query inference. Most defensible. Most TanStack-true. + +**#2 — "The workflow engine that's just a library."** Practical pitch. Postgres-backed by default, SQLite for dev, no SaaS account. Devtools drop into your app. Direct attack on the "I just want background jobs that survive a deploy" segment. + +**#3 — "Streaming + durable. The AI-app workflow engine."** Flagship use case in launch materials, not the headline. The AI space is too volatile to bet positioning on. But streaming-first durable execution is a real gap — Vercel AI SDK does streaming, not durability; Inngest does durability, streaming is bolted on; LangGraph.js does checkpoints, but Diagrid's critique that "checkpoints are not durable execution" lands. + +### The threats — be honest + +1. **Inngest and Trigger.dev have 2–4 years and engineering teams 10× larger.** Out-DXing them on day one is hard. +2. 
**Building a durable engine is genuinely difficult.** Temporal has 100+ engineers and still finds bugs at scale. First versions will have rough edges. +3. **AI workflow space moves faster than TanStack's typical cadence.** LangGraph, Mastra, AI SDK ship monthly. +4. **Workflow code is sticky.** TAM is mostly greenfield, so the addressable market grows more slowly than the one Query/Table addressed at launch. +5. **"Headless" pattern is harder to apply to workflows than to UI.** Differentiation must come from types, devtools, deployability. +6. **DBOS, Hatchet already occupy the "your DB is the engine" niche.** TanStack must be meaningfully better on TS DX, not just present. +7. **Funding model unclear.** Workflow libraries have a smaller installed base ceiling than Query. + +--- + +## 5. Recommended MVP scope (six-month path to 1.0) + +### Packages + +- **`@tanstack/workflow-core`** — Engine, executor, step contract, run lifecycle, retry, sleep/timer, signals, run observable. Pure TS, zero deps. In-memory adapter bundled for dev. +- **`@tanstack/workflow-postgres`** — Production-grade Postgres adapter. Transparent, documented schema. `FOR UPDATE SKIP LOCKED` claims. `LISTEN/NOTIFY` for low-latency wake-up. Works with `pg`, `postgres.js`, `drizzle`, Neon serverless, Hyperdrive. +- **`@tanstack/workflow-sqlite`** — `better-sqlite3` / `node:sqlite` / `libsql` / Turso. Default for self-hosted single-process. +- **`@tanstack/workflow-d1`** — Cloudflare D1 variant. +- **`@tanstack/workflow-durable-object`** — Storage + runtime coupled. One DO per run. +- **`@tanstack/workflow-redis`** — ioredis / Upstash REST. +- **`@tanstack/workflow-cron`** — Wake-up handler for Vercel Cron / CF Cron / EventBridge / Deno Cron. +- **`@tanstack/workflow-worker`** — Always-on poller. +- **`@tanstack/react-workflow`** — `useWorkflow`, `useWorkflowRun`, `useWorkflowStream`. +- **`@tanstack/solid-workflow`** — Same hooks. Ship at launch to credibly claim "framework-agnostic." 
+- **`@tanstack/workflow-devtools`** — Framework-agnostic core with React/Solid bindings. +- **`@tanstack/workflow-start`** — TanStack Start integration. Server functions return workflow handles. + +### API skeleton (informed by your blog post) + +```typescript +import { workflow } from '@tanstack/workflow' +import { z } from 'zod' + +// Assumed to be defined elsewhere: loadProfile, activate, and a configured `client`. +export const onboard = workflow( + { + name: 'onboard', + input: z.object({ userId: z.string() }), + events: { + approved: z.object({ approverId: z.string() }), + rejected: z.object({ reason: z.string() }), + }, + run: async (ctx, { userId }) => { + const profile = await ctx.step('load-profile', () => loadProfile(userId)) + // ^? Profile (inferred from loadProfile's return type) + + await ctx.sleep('1d') + + const decision = await ctx.waitForEvent('approved', { timeout: '7d' }) + // ^? { approverId: string } | null + + if (!decision) return { status: 'timed_out' as const, userId } + + await ctx.step('activate', () => activate(userId)) + return { status: 'active' as const, userId, by: decision.approverId } + }, + }, + { + retries: 3, + timeout: '1h', // presumably per step; the run itself sleeps for a day + version: 'auto', // pinned to build SHA by default + }, +) + +// Calling site: +const handle = await client.start(onboard, { userId: '123' }) +// ^? WorkflowHandle<{ status: 'timed_out'; userId: string } +// | { status: 'active'; userId: string; by: string }> +``` + +### Six design commitments (the manifesto) + +1. **Native `async/await`, no sandbox.** Step-boundary memoization, not full replay. The Inngest / Restate model. +2. **Steps are typed callbacks, not labels.** Inferred names from lexical position via a build-time transform; explicit string names as fallback. Renaming a step doesn't break in-flight runs. +3. **Versioning is automatic and on by default.** Every run pins to the code SHA that started it. New deploys can't break in-flight runs. +4. **Single source of truth: storage.** No in-process workflow-to-workflow calls. Everything goes through the storage adapter. +5. 
**Schema-typed events.** Standard Schema everywhere. Zod / Valibot / ArkType all work. +6. **No decorators.** Function-first. `as const` and `satisfies` for type narrowing. + +### Explicitly out of scope for 1.0 + +- Hosted cloud plane +- Multi-tenant isolation, RBAC, audit logging — Pro tier later +- Distributed engine beyond a single Postgres / single-region storage — phase 2 +- Vue / Svelte / Angular bindings — phase 2 or community +- Visual workflow builder — never +- Built-in integrations directory (a la Inngest's 100+) — let users compose with normal code + +### Launch story + +> **Type-safe durable workflows for TanStack apps. Postgres-backed. Self-hosted by default. Streaming-aware. Devtools included.** + +--- + +## 6. The business model + +Ranked by fit with TanStack's history: + +1. **OSS core + Pro devtools / Pro adapters (recommended).** Free engine, free core devtools, paid Pro for: multi-tenant isolation, audit logging, advanced replay debugger, SSO for devtools UI, premium adapters (Temporal-compat shim, etc.). Closest to TanStack Table/Form/Router Pro pattern. + +2. **OSS + Start Cloud bundling.** Workflow is a free library; the durable hosting is part of Start Cloud. Pulls Start adoption. + +3. **OSS + hosted "TanStack Workflows" cloud.** Same shape as Inngest/Trigger.dev. Higher revenue ceiling but **operationally brutal** — Inngest/Trigger are 20–50 engineer teams largely because of ops cost. Wrong scope for first 18 months. + +4. **Pure OSS like Query/Table.** Sponsorship + halo. Lowest revenue, highest community velocity. Always viable. + +**Strongest fit: #1 with optional #2.** OSS core must be genuinely complete — no crippleware. Query and Table set that expectation. + +--- + +## 7. Open questions to resolve before committing + +1. **Build-time transform vs runtime-only?** Stable step identity via lexical position requires a build transform. Worth the toolchain commitment? 
(Probably yes — it's the durable execution version of "automatic key inference.") +2. **TanStack DB as required substrate or optional substrate?** Tightly coupled DB ↔ Workflow is unique to TanStack and a huge moat, but it adds friction for users who want to drop the workflow lib into a non-DB app. +3. **Standard Schema first vs Zod first?** Standard Schema is the right call but tooling is still maturing. +4. **Embedded engine vs separate worker process?** The same engine code can do both via the runtime adapter. But which is the _default_ a new user gets out of the box? +5. **Cloudflare DO storage vs Postgres adapter as the "showcase" production setup?** DO has the best DX but locks to Cloudflare; Postgres is universal but adds infra. +6. **AI SDK integration story.** Tight integration (workflow knows about LLM streams) vs loose (workflow doesn't know, just calls AI functions inside steps). +7. **Naming.** "Workflow" is fine but generic. "TanStack Tasks"? "TanStack Durable"? "TanStack Run"? The repo is `workflow` — keep it. + +--- + +## 8. 
References + +### Primary TanStack sources + +- [Directives and the Platform Boundary](https://tanstack.com/blog/directives-and-the-platform-boundary) — Tanner, 2025-10-24 (the API sketch lives here) +- [The State of TanStack, Two Years of Full-Time OSS](https://tanstack.com/blog/tanstack-2-years) — Tanner, 2025-11-24 (the teaser) +- [TanStack DB 0.6](https://tanstack.com/blog/tanstack-db-0.6-app-ready-with-persistence-and-includes) — 2026-03-25 (the substrate) +- [TanStack AI Code Mode](https://tanstack.com/blog/tanstack-ai-code-mode) — 2026-04-08 (orchestration framing) + +### Workflow platforms + +- [Trigger.dev v4 GA](https://trigger.dev/launchweek/2/trigger-v4-ga) +- [Trigger.dev self-hosting docs](https://trigger.dev/docs/self-hosting/overview) +- [Trigger.dev v4 Docker self-hosting](https://trigger.dev/blog/self-hosting-trigger-dev-v4-docker) +- [Trigger.dev v4 Kubernetes self-hosting](https://trigger.dev/blog/self-hosting-trigger-dev-v4-kubernetes) +- [Inngest GitHub](https://github.com/inngest/inngest) +- [Inngest pricing](https://www.inngest.com/pricing) +- [Mastra GitHub](https://github.com/mastra-ai/mastra) +- [Mastra docs](https://mastra.ai/docs) +- [LangGraph durable execution](https://docs.langchain.com/oss/javascript/langgraph/durable-execution) +- [LangGraph GitHub](https://github.com/langchain-ai/langgraph) +- [Diagrid: "Checkpoints Are Not Durable Execution"](https://www.diagrid.io/blog/checkpoints-are-not-durable-execution-why-langgraph-crewai-google-adk-and-others-fall-short-for-production-agent-workflows) (sharp critique of LangGraph/CrewAI/Google ADK) +- [Vercel AI SDK 6](https://vercel.com/blog/ai-sdk-6) +- [Vercel AI SDK docs](https://ai-sdk.dev/docs/introduction) +- [XState v5 release](https://stately.ai/blog/2023-12-01-xstate-v5) +- [XState GitHub](https://github.com/statelyai/xstate) +- [Restate + XState integration](https://www.restate.dev/blog/persistent-serverless-state-machines-with-xstate-and-restate) +- [LangGraph vs Temporal for 
AI Agents](https://medium.com/data-science-collective/langgraph-vs-temporal-for-ai-agents-durable-execution-architecture-beyond-for-loops-a1f640d35f02) (March 2026) +- [Agent framework comparison: LangChain vs LangGraph vs CrewAI vs PydanticAI vs Mastra vs Vercel AI SDK](https://www.speakeasy.com/blog/ai-agent-framework-comparison) (Speakeasy) + +--- + +## 9. Distribution — the underrated half of the hedge + +Library quality wins narrowly. Library distribution wins broadly. TanStack Query didn't beat SWR purely on merit — it won because it got embedded in every template, scaffold, tutorial, and LLM training corpus. Workflow has even better distribution dynamics if played right. + +### a. AI app builders are a massive force multiplier + +**Lovable** is the obvious one. Their generated apps already use TanStack Query. AI workflows (agent loops, background generation, scheduled tasks, webhook fan-outs) are increasingly core to what they ship. If TanStack Workflow lands in their default scaffold, you get instant adoption at _app-generation scale_ — millions of apps, not millions of installs. Each generated app becomes a real codebase that keeps the dependency forever. + +Same dynamic applies to: + +- **v0 (Vercel)** — generates Next.js apps; will push its own story, but already adopts pieces like the AI SDK that compose with workflow +- **Bolt.new (StackBlitz)** — multi-framework, will pick whatever's idiomatic +- **Replit Agent** — full-stack including workflows; underrated reach +- **Same.new, Genie, Devin, Codex, Manus** — long tail +- **Cursor / Claude Code / Windsurf** — IDE-level code suggestion; even more powerful than scaffolds because every codebase is touched + +The mental model: **every AI app generator is a distribution channel that compounds over time.** Get into the prompt of the major ones early. + +### b. The LLM training-data flywheel + +TanStack Query won partly because every LLM trained on JS code now defaults to suggesting it. 
Same dynamic: + +- Ship the library publicly early so it's in training corpora +- Publish tutorials, recipes, comparison posts (especially "vs Inngest" / "vs Trigger.dev" / "vs Temporal" content) +- Get the API into "Building Effective Agents"-style cookbooks +- Get GitHub stars early (LLMs use star count as a quality proxy) +- Encourage shadcn-style copy-paste recipes that bake the library into example code + +By the next training cut, every AI coding assistant suggests TanStack Workflow when a user asks "how should I handle background jobs in TypeScript?" + +### c. Agent SDK substrate positioning + +Don't compete with agent frameworks. Be their substrate. + +- **Anthropic's Claude Agent SDK + "Building Effective Agents" patterns** — they need durable execution; TanStack Workflow can ship a `@tanstack/workflow-anthropic-agents` integration +- **OpenAI Agents SDK for JS** — same +- **Vercel AI SDK 6 agent loops** — `ToolLoopAgent` doesn't survive crashes; wrap it in a workflow and it does +- **Mastra, LangGraph.js, Inngest AgentKit** — these are graph orchestrators; they need a durable execution layer underneath +- **Cloudflare Agents SDK** — natively pairs with Durable Objects adapter + +The pitch to each: "We are not your competitor. We are the durability layer your agents need." Mastra's workflow primitives, LangGraph.js's checkpointer, AI SDK's `stopWhen` — these are all _attempts_ at durability but each has gaps. TanStack Workflow becomes the layer they all compose with. + +### d. The post-graduation pipeline + +Lovable / v0 / Bolt users eventually outgrow their generator. If they're already using TanStack Workflow when they "eject" or migrate to a real codebase, they keep using it. Same for the Tauri/Expo crowd that starts with TanStack DB for local-first apps and then needs background jobs. The library inherits the persistence story across mobile, desktop, web, edge — same code everywhere. + +### e. 
Cloudflare / edge ecosystem alignment + +Cloudflare's first-party Workflows product is fine but locks you in. CF's DevRel will signal-boost a deployment-agnostic library that ships a _great_ DO adapter — because it makes Workers more attractive for AI workloads without the lock-in story being a blocker. Same dynamic with Bun, Deno, and Fly. Each platform team has incentive to amplify the library that makes their runtime look good for workflows. + +### f. Integration ecosystem partners + +The natural webhook → workflow integrations write themselves: + +- **Clerk / WorkOS / Stack Auth** → user lifecycle workflows (onboarding, billing, deprovisioning) +- **Stripe / Polar / Lemon Squeezy** → payment lifecycle workflows +- **Resend / Loops / Postmark** → email orchestration with retries +- **Anthropic / OpenAI / Replicate / fal** → long-running LLM jobs +- **Trigger.dev / Inngest** (yes, the competitors) → adapters that wrap TanStack Workflow definitions so users can move workloads in both directions + +Each of these is a co-marketing opportunity. "TanStack Workflow + Resend in 5 minutes" lands as a tutorial that gets indexed and trained on. + +### g. Mobile and desktop runtime parity + +Already half-solved by TanStack DB. React Native + Expo with SQLite, Tauri desktop, Electron — same `@tanstack/workflow-sqlite` adapter works in all of them. Most workflow libraries can't even consider these targets. Mobile push notifications driving workflow signals, offline-first apps with locally-queued workflows, Tauri apps with embedded durable jobs — this is uncontested territory. + +### h. The Nozzle dogfood credibility play + +Nozzle does crawling, ranking, ETL — heavy workflow workloads at real scale. Battle-testing TanStack Workflow at Nozzle in production before public launch buys exactly the credibility Temporal got from Uber/Cadence and Inngest got from being founded by Twilio veterans. 
"Running in production at Nozzle for six months before launch" is a launch-tweet hook by itself. + +### i. The scaffold defaults play + +- **TanStack Start scaffold ships with `@tanstack/workflow` wired in by default** +- **create-t3-app** integration tier +- **shadcn-style registry** — `npx tanstack-workflow add email-onboarding` drops in a working onboarding flow with adapter + types +- **Vercel / Cloudflare / Bun / Railway templates** in their respective marketplaces +- Vite / Astro / Hono / Hattip / Nitro examples in the library docs + +### j. The conference / content cadence + +Workflow ships → talk at React Summit, JS Nation, ViteConf, Cloudflare Connect, AI Engineer Summit. Series of "Workflow Patterns" posts. YouTube series on building durable agents. Theo / Web Dev Cody / Lee Robinson coverage. The library has more launch surface area than most TanStack libraries because it spans the full stack (frontend hooks → server functions → durable execution → AI agents) instead of being purely client-side. + +### k. The "no platform required" tweet + +The single sharpest one-line pitch for distribution: + +> **Inngest, but it's just a library. Temporal, but you don't run Cassandra. Trigger.dev, but no docker compose. Mastra, but durable. Your existing Postgres is the engine.** + +That's the tweet that gets quote-retweeted. + +--- + +## 10. The bottom line + +You have a hedge. The TS workflow space is crowded but the leaders have all converged on the same vendor-shaped, lock-in-prone, control-plane model. Nobody has built **"durable execution that's just a library, with types that flow end-to-end, devtools that drop into your app, and adapters for every deployment target."** That's the TanStack-shaped gap. Lead with type safety, follow with self-host-first, treat AI streaming as the flagship example not the headline. + +The execution risk is the engine itself, not the positioning. 
Building a durable engine that handles retry storms, deterministic replay edge cases, in-flight versioning, and Postgres-at-scale is genuinely hard work — but the architecture is well-understood and you have the pieces (TanStack DB persistence, Start adapters, DevTools framework, ecosystem distribution) to compose rather than build everything from scratch. + +The market wants this. The Diagrid post arguing that checkpoint-based agent frameworks aren't really durable execution, the steady drumbeat of Inngest pricing complaints, the fragmentation of AI workflow tools — all signal demand for a principled, type-first, deployment-agnostic answer. TanStack is uniquely positioned to be that answer. + +Ship it. diff --git a/research/SRC_SKEW_AND_RESUMPTION.md b/research/SRC_SKEW_AND_RESUMPTION.md new file mode 100644 index 0000000..a73cdb1 --- /dev/null +++ b/research/SRC_SKEW_AND_RESUMPTION.md @@ -0,0 +1,251 @@ +# Source Skew & Resumption — What's There, What's Missing + +> Detailed assessment of how `@tanstack/ai-orchestration` handles the two hardest durable-execution problems: source-code changes between deploys (skew) and resuming runs after process restarts. With a concrete punch list of what's missing. + +## TL;DR + +Alem's design uses the right **default** — strict fingerprint refusal, no silent corruption — but the **operational story for long-running workflows is undertold**. Three real gaps: + +1. **Fingerprint is whitespace-sensitive** because it hashes `Function.prototype.toString()`. Prettier reformat, minifier choice, or build-tool bump → spurious mismatches → in-flight runs killed. +2. **Patched mode is correctness-by-discipline.** Nothing prevents a developer from adding a yield without wrapping it in `patched()`, which silently shifts positional indices for in-flight runs in patch-versioned mode. +3. **No automatic side-by-side version drain.** When fingerprint mismatches, runs error out. 
The `selectWorkflowVersion` registry exists but requires caller-supplied versioning and explicit deploy-pipeline integration — not automatic. + +For workflows under 24 hours, this is fine. For 1–7 day workflows, manageable with discipline. For 30-day workflows, painful unless the gaps below are filled. + +--- + +## What Alem actually built + +### The fingerprint + +Lives in `packages/typescript/ai-orchestration/src/engine/fingerprint.ts`. Computes a 64-bit FNV-1a hash (rendered as base36) covering: + +- The workflow's `name` +- `run.toString()` — the entire run function source as text +- `initialize.toString()` if present +- Each declared agent's name + `run.toString()` +- Nested workflows recursively (with cycle detection via `WeakSet`) + +Stored on the run record when it starts. On resume: + +- Compute current fingerprint +- Compare to stored +- If different → emit `RUN_ERROR { code: 'workflow_version_mismatch' }` and **refuse to drive the generator** + +The header comment is explicit: + +> "Source strings come from `Function.prototype.toString()` — production builds may minify, so the fingerprint is sensitive to whitespace and symbol renaming. That's the conservative choice (Temporal does the same): false-positive mismatches force a redeploy decision rather than silently corrupting an in-flight run." + +This is **the right default**. Refusing to drive a fresh generator through a log whose positional indices may no longer line up is far better than corrupting an in-flight workflow. Tanner's concern is valid; Alem's instinct is also right. 
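The strict-mode check described above can be sketched in a few lines. This is an illustration under stated assumptions, not the contents of `fingerprint.ts`: the names `fnv1a32`, `fingerprint`, and `decideResume` are hypothetical, and the hash is a 32-bit FNV-1a over UTF-16 code units to keep the example short, whereas the engine is described as using a 64-bit hash. Source text is passed in as plain strings here; in the engine it would come from `run.toString()`.

```typescript
// Hypothetical sketch; NOT the actual fingerprint.ts.
// Assumption: 32-bit FNV-1a over UTF-16 code units (the engine uses 64-bit).
function fnv1a32(text: string): string {
  let hash = 0x811c9dc5 // FNV-1a 32-bit offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i)
    // Multiply by the FNV prime 16777619 (2^24 + 2^8 + 2^7 + 2^4 + 2 + 1)
    // using shifts, then wrap the result to an unsigned 32-bit integer.
    hash =
      (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0
  }
  return hash.toString(36) // rendered as base36, as in the description above
}

interface FingerprintInput {
  name: string
  runSource: string // would be run.toString() in the engine
  initializeSource?: string // would be initialize.toString(), if present
}

function fingerprint(input: FingerprintInput): string {
  const parts = [input.name, input.runSource]
  if (input.initializeSource !== undefined) parts.push(input.initializeSource)
  // Join with a separator that cannot appear in a workflow name by accident.
  return fnv1a32(parts.join('\u0000'))
}

type ResumeDecision =
  | { ok: true }
  | { ok: false; code: 'workflow_version_mismatch' }

// Refuse to drive the generator when the stored fingerprint differs.
function decideResume(stored: string, current: string): ResumeDecision {
  return stored === current
    ? { ok: true }
    : { ok: false, code: 'workflow_version_mismatch' }
}

// The same generator, before and after a formatter pass.
const original = fingerprint({
  name: 'article-workflow',
  runSource: 'function*(){yield step("charge")}',
})
const reformatted = fingerprint({
  name: 'article-workflow',
  runSource: 'function* () {\n  yield step("charge")\n}',
})

console.log(decideResume(original, original))
console.log(decideResume(original, reformatted))
```

The two inputs describe the same generator, yet the second call reports a mismatch: any hash over raw source text treats a formatting change exactly like a logic change.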
+ +### The patched mode + +Declaring `patches: ['change-name', ...]` on a workflow switches it into patch-versioned fingerprint mode: + +- Fingerprint covers only `name + sorted patch list`, **not source text** +- Code-body changes no longer trigger `workflow_version_mismatch` +- User wraps changed code in `if (yield* patched('change-name')) { new } else { old }` +- `patched(name)` returns `false` for runs started before the patch was declared, `true` for runs started after +- The patches-subset check enforces: you can ADD patches across deploys but cannot REMOVE them while runs are in flight + +This is the proven Temporal `getVersion()` / `patched()` pattern, faithfully reproduced. + +### Cross-version registry + +`createWorkflowRegistry()` + `selectWorkflowVersion()` lets you route runs to the right workflow version based on a caller-supplied identifier. Useful when running multiple versions side-by-side, but requires: + +- Caller chooses + supplies the version at start time (`version: '2026-05-15'` in `defineWorkflow`) +- Both versions registered in memory simultaneously +- Routing logic in the host + +There is no automatic deploy-pipeline integration that holds old code in memory after a deploy. That orchestration is on the user. + +### Idempotency + +The engine passes `ctx.id` to every `step` function — a deterministic per-step ID intended for use as an idempotency key with external systems. The docs explicitly mention this. CAS conflict handling exists for the engine's own write path. Client-provided `runId` + `signalId` close the loop on the API surface. + +--- + +## Where it breaks — a failure-mode taxonomy + +Let me enumerate every realistic source change and what happens under Alem's current design. 
+ +| # | Source change | Strict mode | Patch mode | What you actually want | +| --- | ------------------------------------------------ | ------------------------------------------------------------- | --------------------------------------------------------------- | -------------------------------------------- | +| A | Prettier reformat | **Spurious refusal** — all in-flight runs killed | Survives | Survive | +| B | Minifier difference across builds | **Spurious refusal** | Survives | Survive | +| C | Add a `console.log` | **Spurious refusal** | Survives | Survive | +| D | Edit a code comment | **Spurious refusal** | Survives | Survive | +| E | Add yield in unreached branch | Correct refusal (safe) | **Silent corruption risk** if branch later fires | Detect + force `patched()` | +| F | Add yield in active path | Correct refusal (safe) | **Silent corruption** | Detect + force `patched()` | +| G | Rename a step's string ID | Correct refusal | Silent corruption (string is positional metadata, not identity) | Position-stable, alias-friendly | +| H | Remove a yield | Correct refusal | **Silent corruption** | Detect + force `patched()` | +| I | Reorder yields | Correct refusal | **Silent corruption** | Detect + force `patched()` | +| J | Change an agent's system prompt | **Spurious refusal** (agent.run.toString() changes) | Survives (excludes source) | Survive — recorded LLM output replays anyway | +| K | Change an agent's adapter (model swap) | **Spurious refusal** | Survives | Survive | +| L | Process restart, same code | Match, resume | Match, resume | Happy path ✓ | +| M | Rolling deploy: new + old workers serve same run | **Spurious refusal** when run routes to new worker mid-flight | Survives if both have patches declared | Pin run to its version's worker pool | +| N | Schema added optional field | Spurious refusal | Survives | Survive; new field absent on old runs | +| O | Schema added required field | Spurious refusal | Survives but new field undefined in old 
runs | Migration-aware | +| P | Throw different error message | **Spurious refusal** | Survives | Survive | +| Q | Inline a helper function | **Spurious refusal** | Survives if no new yields | Survive | + +The pattern: **strict mode is too coarse — most changes that don't affect yield structure still kill in-flight runs. Patch mode is too permissive — most changes that affect yield structure silently corrupt unless the developer remembered to wrap them in `patched()`.** + +Both regimes share the same root cause: **identity is positional, fingerprinting is textual.** Position drift causes corruption; text drift causes spurious refusal. Neither matches what we actually want, which is _structural_ identity that's resilient to formatting but precise to yield shape. + +--- + +## The missing layer + +Five concrete additions close the gaps. Each is independent; pick the ones that matter most. + +### 1. AST-based fingerprint instead of source-text + +Replace `Function.prototype.toString()` hashing with an AST walk that extracts: + +- Number, position, and _kind_ of yields in the run function (`step`, `sleep`, `waitForSignal`, `approve`, agent call, `patched`) +- The string literals that name yields (`step('charge', ...)` → identity `step:charge`) +- For agent calls: the _key_ in `agents.foo`, not the agent's source +- For nested control flow: just count yields per branch; don't hash branch bodies +- For agents: name + input/output schema shape, not the prompt or implementation + +Outcome: + +- Cases A, B, C, D, J, K, P, Q (whitespace, minifier, comments, prompts, model swaps, formatting) → **survive** +- Cases E, F, G, H, I (any structural yield change) → **still refuse**, with a _specific_ error message: "yield #4 was `step:charge` in the original run but is now `step:charge-card`" +- Cases N, O (schema changes) → diffable, can warn vs error per-field + +Implementation cost: ~300 LoC TypeScript AST walker. 
The TypeScript compiler API gives you everything you need; works at build time (preferred) or via `ts-morph` at runtime if needed. + +### 2. Build-time fingerprint generation + pinning + +Compute the fingerprint at build time, embed it in the bundle, ship it alongside the run record. Two wins: + +- Eliminates "runtime fingerprint differs from build to build" entirely (cross-build instability) +- Lets the deploy pipeline see "this build has fingerprint X" and decide whether to drain in-flight runs vs. take them over + +Provide `@tanstack/workflow-vite`, `-swc`, `-esbuild`, `-rolldown` plugins that emit a `workflow-manifest.json` listing every workflow's fingerprint + structural shape. The host reads this at startup and registers known fingerprints for resume. + +### 3. ESLint rule: enforce `patched()` for yield structure changes + +A lint rule that runs in patch-versioned mode and: + +- Tracks the previous build's structural yield manifest (committed to the repo, e.g. `.tanstack/workflow-manifest.json`) +- Detects added / removed / reordered yields +- Requires each change to be inside an `if (yield* patched('name')) { ... }` block +- Errors at lint time: "You added `yield* step('foo')` outside a `patched()` gate. In-flight runs will be corrupted. Either wrap it in `patched()` or accept that all in-flight runs of `article-workflow` will fail on resume." + +This catches the patch-mode footgun before it ships. Without it, patch mode is correctness-by-discipline; with it, the toolchain enforces the discipline. + +### 4. Automatic side-by-side version drain + +Today: when a fingerprint mismatches, the run errors. The user can manually run multiple versions via `selectWorkflowVersion` if they thought to set it up. + +Better default: the engine accepts a `previousVersions: [...]` array on `defineWorkflow` (or auto-detects them from a `previousManifests` directory): + +```typescript +defineWorkflow({ + name: 'article-workflow', + run: async function* (...) { ... 
}, // current + previousVersions: [ + { fingerprint: 'old-fp-1', run: oldRun1 }, + { fingerprint: 'old-fp-2', run: oldRun2 }, + ], +}) +``` + +The deploy pipeline (or the Vite plugin) can populate `previousVersions` from the last N committed manifests. On resume: + +- Look up the run's stored fingerprint +- If it matches current → drive current +- If it matches a previous version → drive that previous version's code +- If it matches nothing → error (the run is truly orphaned) + +In-flight runs continue on their original code; new runs use the latest. **No manual drain coordination required.** + +After N days (configurable per workflow) the old code is dropped from the registry and any remaining runs error out. The engine surfaces a metric: "5 runs are still on fingerprint old-fp-1, expiring in 3 days." + +### 5. Stuck-run inspector + +When a run _does_ end up orphaned (fingerprint not matched anywhere), instead of just emitting `RUN_ERROR`, the engine should: + +- Snapshot the partial state + step log into a separate "orphaned runs" store +- Expose an inspector API: list orphaned runs, show their last completed step, show the original input +- Provide three escape hatches: + - **Abandon** — mark the run as dead, emit cleanup events + - **Restart from scratch** — re-enqueue with the original input + - **Manual advance** — operator picks "the run is conceptually at step N; please continue from there as if step N had succeeded with this value" + +This is the human-in-the-loop safety net for when automation fails. + +--- + +## How the layered story should read in docs + +The library should be honest about the operational story per workflow duration tier: + +### Tier 1: Workflows under 1 hour + +> Ship code freely. In-flight runs at deploy time complete in minutes. Default strict mode is fine. + +### Tier 2: Workflows 1 hour to 24 hours + +> Default strict mode is fine. Use AST-based fingerprint (above) to avoid spurious refusals from formatting. 
Expect the occasional in-flight run to error on resume after a deploy that touches workflow source — accept that, or wait to deploy. + +### Tier 3: Workflows 1 to 7 days + +> Use patch-versioned mode + the ESLint rule. Plan code changes that touch workflow source through `patched()` gates. Use automatic side-by-side drain (`previousVersions`) so in-flight runs stay on their original code. + +### Tier 4: Workflows over 7 days + +> Use `selectWorkflowVersion` + explicit versioning + deploy-pipeline integration. Treat workflow code like a public API — versioned, with explicit migration paths. Plan for a year-long "long tail" of old-version runs after each significant change. + +### Tier 5: Workflows over 30 days + +> You're in Temporal territory. Either accept abandonment as a cost, or pre-plan every workflow change months in advance with `patched()` gates. The TanStack Workflow library can do this with discipline, but it's expensive operationally. + +The library should refuse to silently make any tier "just work." It should make the operational requirements _visible_ per tier. + +--- + +## What this means for the standalone library + +The bottom line update in [PRIOR_ART_AI_ORCHESTRATION.md](PRIOR_ART_AI_ORCHESTRATION.md) said "productize and extend an existing internal workflow library — eight months instead of fourteen." That's still true, but the **extending** is more substantial than the inventory implied: + +**Engine-level additions (the new repo should ship these):** + +1. AST-based fingerprint engine (replaces or augments source-text fingerprint) +2. Build-time fingerprint plugins (Vite / SWC / esbuild / Babel / Rolldown) +3. `previousVersions` auto-drain registry on `defineWorkflow` +4. ESLint plugin: `@tanstack/eslint-plugin-workflow` with `enforce-patched-on-yield-change` rule +5. Orphaned-run inspector + escape-hatch API +6. Documented operational tier guide + +**Adapter ecosystem (already in PRIOR_ART_AI_ORCHESTRATION.md):** + +7. 
Storage adapters: postgres / sqlite / d1 / durable-object / redis +8. Runtime adapters: cron / worker / do-alarm +9. Framework bindings: solid / vue / svelte +10. Start integration +11. Devtools + +The engine work is the harder half. The skew + resumption story is what differentiates a credible production durable-execution library from a clever prototype. Temporal earned its reputation by being unimpeachable on these axes. TanStack Workflow has to clear the same bar. + +**Good news:** Alem's design is correct on the defaults (refusal-first). The work above is additive, not corrective. It strengthens an already-sound foundation. + +**Honest news:** This work is real. Two to three months of engine engineering, on top of the adapter ecosystem build-out. Worth doing right; not worth shipping half-done. + +--- + +## Conversation to have with Alem + +In addition to the questions in [PRIOR_ART_AI_ORCHESTRATION.md](PRIOR_ART_AI_ORCHESTRATION.md): + +1. **What's the intended operational story for >24h workflows?** Is the AI use case mostly request-scoped (a single user session), in which case strict mode is enough? Or are there 7-day+ pipelines on his roadmap? +2. **Has he hit the whitespace-sensitivity issue in practice yet?** If the existing test suite exercises real deploys, he'd know whether spurious refusals are a problem. +3. **Open to AST-based fingerprinting?** The implementation cost is real but the operational win is large. Probably a multi-week project. +4. **What's missing from the patch-mode footguns list above?** He's lived in this code for 10 days; he knows the edges better than I do from a 30-minute read. +5. **Build-tool integration appetite.** Vite plugin for build-time fingerprinting + lint manifest export is the natural next step. Worth doing in `ai-orchestration` or in the new workflow repo? + +The strategic shape doesn't change. The new repo extracts + extends. The extending is just bigger and more important than the inventory alone suggested.