Skip to content

Split the codex parser hash into session and priority-turns domains#1422

Closed
ProspectOre wants to merge 8 commits into
steipete:mainfrom
ProspectOre:perf/codex-parser-hash-domains
Closed

Split the codex parser hash into session and priority-turns domains#1422
ProspectOre wants to merge 8 commits into
steipete:mainfrom
ProspectOre:perf/codex-parser-hash-domains

Conversation

@ProspectOre

Copy link
Copy Markdown
Contributor

Stacked on #1404 and #1421 — the diff includes their commits until they land; the change specific to this PR is the last commit. I'll rebase as the stack merges.

Problem

CodexParserHash is one hash over every non-Claude source in Vendored/CostUsage/, and it producer-keys the codex session cost cache. Any change to codex priority-turns parsing therefore invalidates the entire session cache and forces every user to re-parse their full ~/.codex/sessions corpus on update — even though session JSONL parsing didn't change a byte.

This isn't hypothetical: #1404 and #1421 each touch only *CodexPriority* sources, and each would trigger a full session re-parse for every user on release day. The 0.32.x "release-day lag wave" pattern (#1387) was exactly this class of invalidation.

Change

Split the generated hash into two domains:

  • CodexParserHash.sessions — non-Claude, non-priority-turns sources → producer-keys the codex session cost cache (codex:cu:p…)
  • CodexParserHash.priorityTurns*CodexPriority* sources → producer-keys the priority-turns memo artifact from Persist the codex priority-turns memo across launches #1421 (codex:pt:p…)

Correctness is preserved by machinery that already exists: when priority parsing changes, the priority artifact invalidates and re-derives, and the changed turn set flows through codexPriorityTurnKeys / changedPriorityTurnIDs, which already re-derives only the session files whose cached codexTurnIDs intersect the changed turns (cachedCodexFileNeedsPriorityRescan). Files untouched by priority changes keep their cache.

A session-parsing change still invalidates the session cache exactly as before.

Runtime Proof (mechanism, live script run)

Editing a priority-turns source and regenerating: the sessions hash is byte-identical, only the priority domain moves — and reverting restores it:

=== baseline (current sources) ===
    static let sessions = "91ab91ab2dd5246b"
    static let priorityTurns = "6d64628ee98b719a"
=== after editing a priority-turns source ===
    static let sessions = "91ab91ab2dd5246b"
    static let priorityTurns = "3db2184d58c628d5"
=== reverted ===
    static let sessions = "91ab91ab2dd5246b"
    static let priorityTurns = "6d64628ee98b719a"

Under the previous single-domain hash, that same edit changed the global value and would have producer-key-invalidated the session cost cache (observable in this repo's history: every CodexParserHash bump in #1404's commits was a forced full re-parse for all users).

Migration cost

One-time: this PR changes the session producer key format itself, so the release that ships it triggers one final full re-parse — after which priority-only parser changes stop causing them.

Validation

  • swift test --filter 'CostUsageCacheTests|CostUsageScannerCodexPriority' — all pass, including the updated domain-key test asserting codex:cu:p<sessions> / codex:pt:p<priorityTurns>.
  • Scripts/regenerate-codex-parser-hash.sh check — passes; check mode validates both domains.
  • make check — clean.

No CHANGELOG edit per current review guidance (release-owned).

ProspectOre and others added 7 commits June 10, 2026 20:50
makeCodexRefreshPlan re-ran a full-table double-LIKE query over the
codex CLI trace database on every refresh past the scan interval. The
logs table only appends (INTEGER PRIMARY KEY AUTOINCREMENT), so the
query result is now accumulated per database in process memory and only
rows appended since the last call are examined. The database shrinking
or being replaced (file identity change) and window expansion trigger a
full rescan; windows ending before today keep the original bounded
one-shot query so historical lookups never pay an open-ended scan.
…efreshes

Two overlapping refreshes could both read the same memo, scan independently,
and write back out of order, letting an older cursor replace newer accumulated
state. Stored state now only advances unless coverage expands or the file is
replaced. Also drop the release-owned CHANGELOG edit per review.
Whether a turn is priority is unknowable when its completion row is parsed,
so non-priority completions accumulate in completedModelsByTurnID for the
process lifetime. Evict the oldest completions beyond a fixed retention
limit, keeping memory constant while preserving completion-before-request
ordering and late model upgrades within the retention window.
The rowid-cursor memo keeps live refreshes incremental, but it is process-local:
every app launch and every CLI invocation pays one full trace-database scan
before incremental refreshes resume. Persist the memo to a versioned cost-usage
artifact (parser-hash producer key, atomic replace-on-write) and seed it once
per process through the monotonic store, so the first refresh after launch
resumes from the persisted cursor.
…ains

One hash over every non-Claude CostUsage source producer-keys the session
cost cache, so a priority-turns-only parser change forces every user to
re-parse their full sessions corpus on update. Hash the two domains
separately: session parsing keeps invalidating the session cache, while
priority-turns changes invalidate only the priority memo artifact and flow
through the existing changedPriorityTurnIDs targeted re-derivation.
@clawsweeper

clawsweeper Bot commented Jun 11, 2026

Copy link
Copy Markdown

Codex review: needs maintainer review before merge. Reviewed June 11, 2026, 3:24 AM ET / 07:24 UTC.

Summary
The branch adds incremental and persisted Codex priority-turn memoization from its prerequisite stack, then splits parser hashes so priority-only changes preserve the session cache while shared scanner changes invalidate both domains.

Reproducibility: not applicable. as a conventional failing bug; the invalidation behavior is reproducible through controlled priority-only, session, and shared-source edits followed by hash regeneration.

Review metrics: 2 noteworthy metrics.

  • Latest repair: 2 files, +16/-8. The response to the previous finding is narrow and directly makes priority-memo invalidation dependency-complete.
  • Current stacked surface: 8 commits, 8 files, +682/-46. Most of the reviewed merge surface belongs to two open prerequisite branches rather than the isolated hash split.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Rebase to the isolated hash-domain diff after the prerequisite PRs land or stabilize.

Risk before merge

Maintainer options:

  1. Rebase after prerequisites settle (recommended)
    Wait for both prerequisite PRs to land or stabilize, then rebase and review the isolated hash-domain migration against their final artifact API.
  2. Review and land the full stack
    Merge the current branch only if maintainers intentionally review and accept all bundled memo-state changes and the one-time cache rebuild as one unit.

Next step before merge

  • No automated repair is currently needed; maintainers should review the final rebased patch after its two open prerequisites settle.

Security
Cleared: No concrete security or supply-chain concern was found in the local source-hashing script, generated constants, cache producer keys, or tests.

Review details

Best possible solution:

Land or finalize the prerequisite memo changes, then rebase and merge the isolated dependency-complete hash split with its producer-key assertions and controlled source-edit proof intact.

Do we have a high-confidence way to reproduce the issue?

Not applicable as a conventional failing bug; the invalidation behavior is reproducible through controlled priority-only, session, and shared-source edits followed by hash regeneration.

Is this the best way to solve the issue?

Yes in design: the asymmetric priority superset is dependency-complete without a fragile hand-maintained list and preserves the expensive session cache for priority-only edits; final approval should follow an isolated rebase.

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against 9015e94901c4.

Label changes

Label changes:

  • add rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR provides after-fix live script output showing the expected hash movement for priority-only, session, and shared-helper edits, including restoration to baseline.
  • remove rating: 🦪 silver shellfish: Current PR rating is rating: 🐚 platinum hermit, so this older rating label is no longer current.
  • remove status: ⏳ waiting on author: Current PR status label is status: 👀 ready for maintainer look.

Label justifications:

  • P2: This is a bounded performance and cache-invalidation improvement with meaningful but non-emergency user impact.
  • merge-risk: 🚨 compatibility: The session producer-key format changes and intentionally invalidates existing session caches once during upgrade.
  • merge-risk: 🚨 session-state: The new domains determine when persisted parser-derived session and priority-turn state is reused or recomputed.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR provides after-fix live script output showing the expected hash movement for priority-only, session, and shared-helper edits, including restoration to baseline.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR provides after-fix live script output showing the expected hash movement for priority-only, session, and shared-helper edits, including restoration to baseline.
Evidence reviewed

What I checked:

Likely related people:

  • Peter Steinberger: Introduced the automatic Codex cost-cache invalidation and recently maintained the cache and targeted priority-rescan paths affected by this producer-key change. (role: feature owner and recent area contributor; confidence: high; commits: 5aded294c234, 69de57b85de7, 52d09da38bfe; files: Sources/CodexBarCore/Vendored/CostUsage/CostUsageCache.swift, Sources/CodexBarCore/Vendored/CostUsage/CostUsageScanner+CacheHelpers.swift, Scripts/regenerate-codex-parser-hash.sh)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added proof: sufficient Contributor real behavior proof is sufficient. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. P2 Normal priority bug or improvement with limited blast radius. merge-risk: 🚨 compatibility 🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades. merge-risk: 🚨 session-state 🚨 Merging this PR could lose, corrupt, stale, or mis-associate session or agent state. labels Jun 11, 2026
Priority parsing depends on shared scanner helpers (timestamps, day ranges,
the scanner core), so a filename-only partition could leave the persisted
memo's producer key unchanged across a shared-dependency change. Hash all
non-Claude sources into the priority domain: shared edits invalidate both
keys by construction, while priority-only edits still leave the expensive
session cache untouched.
@ProspectOre

Copy link
Copy Markdown
Contributor Author

Addressed the [P2] dependency-completeness finding in 1351fac6:

Fix: the priority hash now covers all non-Claude sources (superset), while the sessions hash stays the non-priority subset. This is dependency-complete by construction — no hand-maintained dependency list to drift: any shared-helper change (timestamps, day ranges, scanner core) necessarily moves the priority hash. The asymmetry is deliberate: the session cache is the expensive artifact to rebuild, while a priority memo rebuild is one bounded background scan, so the memo conservatively invalidates on any scanner change. The original win is intact — priority-only edits leave the session cache untouched.

Coverage for the three edit classes (live script runs, restored to baseline after each):

=== baseline ===
sessions="91ab91ab2dd5246b"  priorityTurns="a3e303f66e535cc6"
=== priority-only edit (CostUsageScanner+CodexPriority.swift) ===
sessions="91ab91ab2dd5246b"  priorityTurns="bab7fac12782e16b"   ← session cache survives
=== session-parsing edit (CostUsageScanner+CodexFastJSON.swift) ===
sessions="9067af907244a5f7"  priorityTurns="54a1b7c01bb19370"   ← both invalidate
=== shared-dependency edit (CostUsageScanner+Timestamp.swift) ===
sessions="07791ca9d8978ad1"  priorityTurns="551e3ea431b0d20e"   ← both invalidate (the finding's gap, now closed)
=== restored ===
sessions="91ab91ab2dd5246b"  priorityTurns="a3e303f66e535cc6"

On the rebase note: agreed — I'll rebase to the final small diff as soon as the prerequisite stack (#1404, #1421) lands.

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 11, 2026

Copy link
Copy Markdown

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels Jun 11, 2026
@steipete

Copy link
Copy Markdown
Owner

Superseded by the smaller measured fix in #1430, landed as dd8cf8b.

Exact-head profiling on the same 60,607-file real archive showed the priority SQLite scan was about 4% of the expired-refresh cost; repeated Foundation metadata/resource-identifier reads and cached-path validation were dominant. The landed replacement removes that cost with +124/-24 lines, no persisted cursor/memo state, and measured a 6.74s warm expired refresh versus 20.22-37.43s across the stacked candidates.

Thanks @ProspectOre for the investigation, implementation work, and benchmark evidence. Contributor credit is retained in the landed commit and changelog.

@steipete steipete closed this Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-risk: 🚨 compatibility 🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades. merge-risk: 🚨 session-state 🚨 Merging this PR could lose, corrupt, stale, or mis-associate session or agent state. P2 Normal priority bug or improvement with limited blast radius. proof: sufficient Contributor real behavior proof is sufficient. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants