Skip to content

feat: [ENG-2494] FileTaskHistoryStore: Stale recovery on read#562

Merged
bao-byterover merged 1 commit intoproj/persis-task-historyfrom
feat/ENG-2494
Apr 27, 2026
Merged

feat: [ENG-2494] FileTaskHistoryStore: Stale recovery on read#562
bao-byterover merged 1 commit intoproj/persis-task-historyfrom
feat/ENG-2494

Conversation

@bao-byterover
Copy link
Copy Markdown
Collaborator

No description provided.

@bao-byterover bao-byterover requested a review from leehpham as a code owner April 27, 2026 09:00
@bao-byterover bao-byterover merged commit 58e2ab3 into proj/persis-task-history Apr 27, 2026
3 checks passed
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 27, 2026

Claude finished @bao-byterover's task in 3m 24s —— View job


PR Review: feat: [ENG-2494] FileTaskHistoryStore: Stale recovery on read

  • Gather context
  • Read implementation file
  • Read test file
  • Analyze for correctness, security, and code quality
  • Post review feedback

Summary

Score: 4/5

Solid, well-scoped implementation of stale recovery. The design choices are sound: best-effort persistence (both writes catch their own failures), defensive idempotency in recoverViaTaskId (returns terminal projection without re-recovering), and sequential processing in the readIndexDedup recovery loop to avoid index interleaving. Test coverage is thorough and the Number.POSITIVE_INFINITY default threshold on legacy fixtures is a clever guard.

Two inline comments have been posted. Remaining notes below:


Issues

getById double Date.now() call (nitpick)
isStale(entry.status, entry.createdAt, Date.now()) and buildRecovered(entry, Date.now()) call Date.now() independently. Almost always harmless, but capturing const now = Date.now() once is more precise and consistent with how readIndexDedup does it. See inline comment.

Concurrent readIndexDedup can race with in-flight persistRecovery (issue)
During the await Promise.all(...) inside persistRecovery, Node.js yields control. A second concurrent readIndexDedup call could arrive, find this.indexCache === undefined, and begin its own index read — potentially before the recovery lines are flushed to disk. In the worst case, two recovery passes run simultaneously for the same stale entry. The idempotency guard in recoverViaTaskId (checking for terminal status in the data file) limits the blast radius, but it's worth a brief comment noting the known behavior. See inline comment.

File header comment still lists stale recovery as future work (nitpick)
Line 15 in the doc comment says: stale recovery (M2.04), and delete/clear (M2.05) land in subsequent tickets. — stale recovery is now done; the comment should be updated to reflect that.

! non-null assertions in tests (nitpick, CLAUDE.md)
fetched!.status, fetched!.error, result![0] in the stale recovery describe block use non-null assertions. Per the project's TypeScript guidelines, these should be narrowed with a guard or assert.ok(fetched) to let TypeScript narrow the type cleanly.


Positives

  • Atomic data write via writeAtomic for recovery — same safety guarantee as initial saves.
  • recoverViaTaskId defensive idempotency — if a prior partial-success recovery left the data file in a terminal state but the index line stale, it returns the terminal projection without re-persisting. Prevents unbounded recovery loops.
  • Test beforeEach disables stale recovery by default (staleThresholdMs: Number.POSITIVE_INFINITY) — smart isolation of legacy fixtures from the new threshold logic.
  • no-await-in-loop eslint disable is justified inline — sequential processing intent is clearly documented.
  • Best-effort semantics are explicit and tested — the write-failure test confirms in-memory recovery still returns correctly even when appendFile throws EACCES.

@bao-byterover bao-byterover deleted the feat/ENG-2494 branch April 27, 2026 09:01
return undefined
}

if (this.isStale(entry.status, entry.createdAt, Date.now())) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: Date.now() is called twice in quick succession — once for isStale and once for buildRecovered. In the (extremely unlikely) event they straddle a millisecond boundary, completedAt will be slightly later than the staleness check. Capturing once is cleaner:

Suggested change
if (this.isStale(entry.status, entry.createdAt, Date.now())) {
const now = Date.now()
if (this.isStale(entry.status, entry.createdAt, now)) {
const recovered = this.buildRecovered(entry, now)
await this.persistRecovery(recovered)
return recovered
}

)
}),
])
this.indexCache = undefined
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (concurrency edge case): this.indexCache = undefined is set after the Promise.all resolves, but during the two async writes, another concurrent call to readIndexDedup could arrive. Since the cache is already cleared by line 278 only after the awaits complete, the concurrent call would find this.indexCache still set to the in-progress-but-not-yet-committed-to-disk state.

Actually more subtle: during recoverViaTaskId → persistRecovery → await Promise.all(...), control can yield. A concurrent readIndexDedup call sees this.indexCache === undefined (cleared from a previous persistRecovery) and kicks off a fresh index read — which may not yet have the recovery line appended. This means two recovery passes could race.

Since the recovery is explicitly best-effort and the idempotency guard in recoverViaTaskId (line 349) handles partial-success data files safely, this is low-impact. But worth a brief comment noting the known concurrent-access behavior to help future readers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant