Skip to content

fix(invalidate/store): wire reconcileWrite into the live write path; BEGIN IMMEDIATE for read-modify-write#9

Open
JoshKappler wants to merge 1 commit into
coder-company:mainfrom
JoshKappler:fix-concurrent-writes
Open

fix(invalidate/store): wire reconcileWrite into the live write path; BEGIN IMMEDIATE for read-modify-write#9
JoshKappler wants to merge 1 commit into
coder-company:mainfrom
JoshKappler:fix-concurrent-writes

Conversation

@JoshKappler

Copy link
Copy Markdown

Two concurrency holes in durable writes, both invisible to a green suite because each needs a second writer.

1. The no-silent-LWW guard never runs in production. STATUS §14 advertises "resolveConflicts + reconcileWrite optimistic concurrency (never silent LWW)", but reconcileWrite has no live call sites. On the real path (learn() -> classifyRelation), two parallel sessions' default remembers (workspace scope, user-asserted, equal precedence) classify as refines: the newer fact supersedes the older. A SUPERSEDES edge gets recorded, but no dispute is surfaced, so the loser dies quietly. The parallel-conflict eval stayed green because section 5 called reconcileWrite directly instead of going through learn().

Fix: at equal precedence between two user-asserted facts, the invalidator routes the refines decision through reconcileWrite. A writer that passes base_seen_fact_id (the fact it read before writing, from recall or why) supersedes cleanly. A writer that never saw the value it contradicts gets conflicts: both facts disputed, CONFLICTS_WITH edge, conflict note at injection, resolve_conflict settles it. Deterministic parser facts are exempt; their evidence is commit-anchored, so re-extraction still refreshes values without disputes.

base_seen_fact_id is threaded through Runtime.learn, rememberFact, and the MCP remember tool (optional input; still exactly 8 tools, I8 untouched). Interactive writers do the read-then-write themselves: the CLI remember command and the TUI prompt resolve the current fact at write time (new Runtime.currentFactIdFor) and pass it as their base, so a human deliberately updating a value supersedes exactly as before, and the core-memory-lifecycle eval's remember -> supersede check stays green. The asymmetry is deliberate: the human at the terminal sees the value they're replacing; a parallel agent session usually doesn't.

Behavior change to flag: an MCP remember that contradicts an existing equal-precedence value without base_seen_fact_id now returns status: "disputed" instead of silently winning. That's the SPEC §14 contract; agents get the base id from recall/why. The eval's section 5 now drives real learn() races and asserts the new outcome mix.

2. Overlapping promotion sweeps lose promotions permanently. tx() is BEGIN deferred, and the sweep reads candidates then writes inside one deferred transaction. Two processes on one WAL DB (a SessionEnd hook racing the MCP daemon's checkpoint or promote) both snapshot; the first commits; the second's first write throws SQLITE_BUSY_SNAPSHOT, which busy_timeout does not retry. The hook swallows the error (I9), and since the sweep is scoped to the now-ended session, no later sweep revisits those candidates. The promotions are gone, not delayed.

Fix: txImmediate in store/db.ts (both drivers expose .immediate on wrapped transactions), used by the probation sweep, reviewFactForWorkspace, and the invalidator's apply path. The writer lock is taken at BEGIN, so concurrent read-modify-write serializes through busy_timeout and each sweep runs on a fresh snapshot.

New tests: test/invalidate/concurrent-writes.test.ts (dispute without base seen, clean supersede with it, the interactive currentFactIdFor flow, precedence and parser-refresh guards) and test/store/tx-immediate.test.ts (two connections on one file DB: reproduces SQLITE_BUSY_SNAPSHOT under deferred, proves the immediate tx commits while the concurrent writer fails fast and retryable, plus wiring checks that the sweep and apply run immediate). Counters 292 -> 301.

Verification (Windows 11, Node 22):

npx tsc --noEmit                  # clean
npx biome check src test          # clean
npx vitest run                    # 289/301; the 12 failures are the pre-existing
                                  # win32 set the Windows PR fixes
npx tsx src/cli.ts eval conflict  # 62/62, silentOverwrites 0, real learn() races

eval all can't run on Windows without the Windows PR's .cmd fix, so the full battery ran on a local merge of the two branches (counters 306): vitest 306/306, eval all 19/19 including the core-memory-lifecycle remember -> supersede check, bench within budget.

Note: all three PRs bump the README/STATUS test counters off the same base (292), so whichever merges second and third will conflict there. I'll rebase the counters as each one lands.

…BEGIN IMMEDIATE for read-modify-write

Two concurrency holes in durable writes, both invisible to a green suite
because each needs a second writer.

1. The advertised no-silent-LWW guard never ran in production. STATUS
§14 claims 'resolveConflicts + reconcileWrite optimistic concurrency
(never silent LWW)', but reconcileWrite had zero live call sites. On
the real path (learn -> classifyRelation), two parallel sessions'
default remembers land at workspace scope with equal precedence and
classify as refines: the newer fact supersedes the older. An edge is
recorded but no dispute is surfaced. The parallel-conflict eval stayed
green because its 'real Runtime concurrent-session stress' section
called reconcileWrite directly instead of going through learn().

- src/invalidate/invalidator.ts: at equal precedence between two
  user-asserted facts, route the refines decision through
  reconcileWrite. base_seen_fact_id matching the existing fact ->
  clean supersede; no base seen -> conflicts (both disputed,
  CONFLICTS_WITH edge, conflict note at injection, resolve_conflict
  settles it). Deterministic parser facts are exempt: their evidence
  is commit-anchored, so re-extraction still refreshes values.
- src/runtime.ts: learn() takes { baseSeenFactId }; RememberFactInput
  gains baseSeenFactId; new currentFactIdFor(subject, predicate,
  sessionId?) returns the active same-scope fact a remember would
  collide with.
- Interactive writers do read-then-write themselves: the CLI remember
  command and the TUI 'n' prompt resolve currentFactIdFor at write
  time and pass it as the base, so a deliberate human update
  supersedes exactly as before (the core-memory-lifecycle eval's
  remember->supersede check stays green). Agents get no such
  auto-base: MCP remember accepts an optional base_seen_fact_id
  (still exactly 8 tools; I8 untouched), and without it a
  contradicting equal-precedence write returns status 'disputed'
  instead of silently winning. That asymmetry is the point: the
  human at the terminal sees the value they're replacing; a parallel
  agent session usually doesn't.
- src/eval/suites/parallel-conflict.ts: section 5 now drives real
  learn() races (two Runtimes, one store) instead of calling
  reconcileWrite directly; runner is async. cli.ts awaits it;
  test/eval wrappers updated.
- test/runtime/forget.test.ts: the sequential-update case now passes
  the base it saw, matching the CLI's behavior.

2. Overlapping promotion sweeps lose promotions permanently. tx() is
BEGIN deferred; the sweep reads then writes in one deferred tx. Two
processes on one WAL DB (SessionEnd hook vs the MCP daemon's
checkpoint/promote) both snapshot, the first commits, the second's
first write throws SQLITE_BUSY_SNAPSHOT - busy_timeout does NOT retry
snapshot conflicts - and hooks swallow it (I9). The sweep is scoped to
the ended session, so no later sweep revisits those candidates.

- src/store/db.ts: txImmediate (both drivers expose .immediate on
  wrapped transactions). src/store/facts.repo.ts: transactionImmediate.
- src/promote/probation.ts (sweep + reviewFactForWorkspace) and the
  invalidator apply path now run BEGIN IMMEDIATE, so concurrent
  read-modify-write serializes at BEGIN via busy_timeout and each
  sweep runs on a fresh snapshot.

Tests: test/invalidate/concurrent-writes.test.ts (dispute without base
seen, clean supersede with it, interactive currentFactIdFor flow,
precedence + parser-refresh guards); test/store/tx-immediate.test.ts
(two connections, one file DB: reproduces SQLITE_BUSY_SNAPSHOT under
deferred, proves txImmediate commits while the concurrent writer fails
fast with retryable SQLITE_BUSY, wiring checks for sweep + apply).
Counters 292 -> 301. docs/STATUS.md §14 row + AGENT_USAGE.md remember
params updated.

Verified on Windows 11 (Node 22): npx tsc --noEmit clean; npx biome
check src test clean; npx vitest run 289/301 with only the 12
pre-existing win32 failures (fixed separately in the Windows PR);
npx tsx src/cli.ts eval conflict 62/62 PASS with silentOverwrites=0
across real learn() races. Full 'eval all' cannot run on Windows
without the Windows PR's .cmd fix; 19/19 verified on a local
integration merge of the two branches, including the
core-memory-lifecycle remember->supersede check.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant