Resolve codex priority turns incrementally per refresh #1404
@clawsweeper re-review

🦞🧹 I asked ClawSweeper to review this item again.

@clawsweeper re-run

🦞👀 Command router queued. I will update this comment with the next step.
Branch updated from 1b5f33f to 9688235.
Codex review: needs maintainer review before merge. Reviewed June 11, 2026, 6:47 AM ET / 10:47 UTC.
Summary
Reproducibility: not applicable as a PR review; the contributor supplied real timing proof against a large live
Review metrics: 2 noteworthy metrics.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security
Review details
Best possible solution: Land this only if maintainers intentionally want the additional warm priority-turn optimization after the merged file-stat fix, with fresh CI and release-risk acceptance for the parser-hash/cache behavior.
Do we have a high-confidence way to reproduce the issue? Not applicable as a PR review; the contributor supplied real timing proof against a large live
Is this the best way to solve the issue? Unclear; the PR is a focused implementation for the priority-turn subproblem, but current main already merged a lower-complexity broader cost-refresh optimization, so whether this remains the best solution is a maintainer sequencing decision.
AGENTS.md: found and applied where relevant.
Codex review notes: model internal, reasoning high; reviewed against dd8cf8b06ebb.
Label changes
Label justifications:
Evidence reviewed
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
How this review workflow works
Addressed the review findings in
@clawsweeper re-review

🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:

@clawsweeper re-review

🦞👀 Command router queued. I will update this comment with the next step. Re-review progress:
Branch updated from c526f59 to 054fbc0.
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:

Apologies for the extra re-review trigger: I requested it after the earlier review run failed, not realizing the branch was being actively pushed to at the same time. I'll hold off on further review requests here while the branch is moving.
Branch updated from dbeeb70 to 70e711e.
Branch updated from 70e711e to 6ef936f.
Co-authored-by: pickaxe <54486432+ProspectOre@users.noreply.github.com>
Branch updated from 6ef936f to 4de852a.
Maintainer rebase/fixup pushed at
Verification on the rebased head:
Fresh CI is required before merge.
Superseded by the smaller measured fix in #1430, landed as dd8cf8b. Exact-head profiling on the same 60,607-file real archive showed the priority SQLite scan was about 4% of the expired-refresh cost; repeated Foundation metadata/resource-identifier reads and cached-path validation were dominant. The landed replacement removes that cost with +124/-24 lines, no persisted cursor/memo state, and measured a 6.74s warm expired refresh versus 20.22-37.43s across the stacked candidates. Thanks @ProspectOre for the investigation, implementation work, and benchmark evidence. Contributor credit is retained in the landed commit and changelog.
Summary
Resolve codex priority-turn metadata incrementally per refresh: accumulate the trace-database query result in process memory and examine only rows appended since the last call, instead of re-running the full-table double-`LIKE` scan on every refresh past the scan interval.

Context
This is the dominant residual warm-refresh cost identified in #1392's follow-up measurements.
`makeCodexRefreshPlan` → `codexPriorityTurns` runs against the codex CLI's `logs_2.sqlite` on every refresh once `refreshMinIntervalSeconds` has elapsed. The leading-`%` `LIKE`s force SQLite to examine every row body in the window, so the cost keeps growing as the database grows. On this machine the database is 762 MB / 322k rows and the query costs tens of seconds per tick; #1392's reporter measured 26–51 s per CLI refresh on a larger archive, and `sample` puts essentially all of that time inside `codexPriorityTurns` → `sqlite3_step`.

Change
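The paragraph below describes a per-database memo with a rowid cursor. Here is a minimal Swift sketch of that idea. It is illustrative only: `PriorityTurn`, `PriorityTurnMemo`, and the `FetchTurns` closure are hypothetical names, not the PR's code. `FetchTurns` stands in for running the existing filters restricted to new rowids. Only two invalidation triggers are modeled: `max(rowid)` shrinking and the window moving earlier.

```swift
// Hypothetical types; the PR's real names and schema are not shown here.
struct PriorityTurn: Equatable {
    let rowID: Int64
    let timestamp: Double   // assumed unit: seconds, compared with the window start
    let turnID: String
}

/// Stand-in for the existing filters plus
/// `rowid > afterRowID AND rowid <= upToRowID AND ts >= windowStart`.
typealias FetchTurns = (_ afterRowID: Int64, _ upToRowID: Int64, _ windowStart: Double) -> [PriorityTurn]

/// One instance per database path (assumption: held in a dictionary keyed by path).
struct PriorityTurnMemo {
    private(set) var turns: [PriorityTurn] = []
    private(set) var cursor: Int64 = 0                  // highest rowid already examined
    private(set) var coverageStart: Double = .infinity  // earliest ts the memo covers
    private(set) var fullRescans = 0

    mutating func refresh(maxRowID: Int64, windowStart: Double, fetch: FetchTurns) -> [PriorityTurn] {
        // Rebuild if rowids went backwards (prune/replace) or the requested
        // window starts earlier than what has been accumulated.
        if maxRowID < cursor || windowStart < coverageStart {
            turns = []
            cursor = 0
            fullRescans += 1
        }
        // Examine only rows in (cursor, maxRowID]; the integer PK bounds the scan.
        turns.append(contentsOf: fetch(cursor, maxRowID, windowStart))
        cursor = maxRowID
        // Drop turns that have aged out of the advancing window.
        turns.removeAll { $0.timestamp < windowStart }
        coverageStart = windowStart
        return turns
    }
}

// Usage with an in-memory stand-in for the append-only logs table.
var logRows: [PriorityTurn] = (1...5).map {
    PriorityTurn(rowID: Int64($0), timestamp: Double($0) * 10, turnID: "t\($0)")
}
var rowsExamined = 0
let fetch: FetchTurns = { afterRowID, upToRowID, windowStart in
    let scanned = logRows.filter { $0.rowID > afterRowID && $0.rowID <= upToRowID }
    rowsExamined += scanned.count
    return scanned.filter { $0.timestamp >= windowStart }
}

var memo = PriorityTurnMemo()
let first = memo.refresh(maxRowID: 5, windowStart: 0, fetch: fetch)   // bootstrap: 5 rows examined
logRows.append(PriorityTurn(rowID: 6, timestamp: 60, turnID: "t6"))
let second = memo.refresh(maxRowID: 6, windowStart: 0, fetch: fetch)  // only row 6 examined
```

The design choice is that the memo never re-reads a rowid it has already seen unless an invalidation trigger fires. Steady-state work is therefore proportional to the rows appended since the last tick.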
The `logs` table is append-only with `INTEGER PRIMARY KEY AUTOINCREMENT`, so rowids are monotonic and never reused. A per-database in-process memo keeps the accumulated turns plus a rowid cursor; each refresh runs the same filters with `rowid > ?` (satisfied by the integer PK, so it touches only new rows).
`max(rowid)` shrinking (prune/replace), or the requested window expanding earlier than the accumulated coverage.

Validation
`swift test --filter CostUsageScannerCodexPriorityTests`: 11 tests (3 new: incremental append pickup incl. late completed-model upgrade, window-expansion rescan, database-replacement rescan).

`swift test --filter CostUsage`: 184 tests in 17 suites pass; suite wall time unchanged (the bounded historical path exists precisely because an open-ended first scan regressed a real-database test path 150× during development; that regression is fixed and the suite runs in ~4 s).

`make check`: 0 violations; `git diff --check` clean.

Runtime Proof
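A back-of-the-envelope Swift sketch of why the steady-state cost should collapse. The model is hypothetical and simplified. It assumes every tick re-reads the whole table, which ignores the 30-day window bound. It also assumes a constant append rate, which the PR does not state. Only the 322k row count comes from the description; the 500 rows per tick is made up for illustration.

```swift
/// Total rows a full leading-`%` LIKE scan examines over `ticks` refreshes.
/// Assumption: `appendedPerTick` rows land before each tick and every tick
/// re-reads the whole table (the real query is bounded by a time window).
func fullScanRowsExamined(initialRows: Int, appendedPerTick: Int, ticks: Int) -> Int {
    var total = 0
    var tableSize = initialRows
    for _ in 0..<ticks {
        tableSize += appendedPerTick
        total += tableSize          // every row body is inspected again
    }
    return total
}

/// Total rows a `rowid > cursor` scan examines over the same ticks: the first
/// tick reads everything present, later ticks read only the newly appended rows.
func incrementalRowsExamined(initialRows: Int, appendedPerTick: Int, ticks: Int) -> Int {
    ticks == 0 ? 0 : initialRows + appendedPerTick * ticks
}

// 322k rows is from the PR description; 500 rows per tick is a made-up rate.
let full = fullScanRowsExamined(initialRows: 322_000, appendedPerTick: 500, ticks: 10)
let incremental = incrementalRowsExamined(initialRows: 322_000, appendedPerTick: 500, ticks: 10)
print("full: \(full) rows, incremental: \(incremental) rows")
// prints "full: 3247500 rows, incremental: 327000 rows"
```

The full scan's total grows roughly as ticks × table size. The incremental total grows only by the appended rows, so after the one-time bootstrap each tick's cost tracks the append rate rather than the archive size.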
Timing harness (3 consecutive calls for the live 30-day window against the real 762 MB / 322k-row `logs_2.sqlite`, debug build, warm page cache; harness below, not committed). Result counts were identical in every run on both branches:

Steady-state per-tick cost drops from tens of seconds to ~1 ms, with byte-identical query filters and equal results. (Absolute first-call times vary with page-cache state; a cold-cache first call measured 79 s, also a one-time cost.)
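The PR's own harness is not included in this copy. The following is a generic, hypothetical Swift sketch of the same measurement shape: N back-to-back calls, with wall-clock time and result count recorded per call. It uses Foundation's `Date` for timing. `timeConsecutiveCalls` and the stand-in workload are illustrative names, not the uncommitted harness.

```swift
import Foundation

/// Hypothetical harness: run `body` `runs` times back to back and record the
/// wall-clock duration and result count of each call.
func timeConsecutiveCalls(runs: Int = 3, _ body: () -> Int) -> [(seconds: Double, count: Int)] {
    var samples: [(seconds: Double, count: Int)] = []
    for _ in 0..<runs {
        let start = Date()
        let count = body()
        samples.append((seconds: Date().timeIntervalSince(start), count: count))
    }
    return samples
}

// Stand-in workload; a real run would call the priority-turn lookup for the
// live 30-day window instead.
let samples = timeConsecutiveCalls {
    let multiplesOfSeven = (0..<100_000).filter { $0 % 7 == 0 }
    return multiplesOfSeven.count
}
for (index, sample) in samples.enumerated() {
    print("call \(index + 1): \(String(format: "%.3f", sample.seconds * 1000)) ms, \(sample.count) results")
}
// Mirrors the PR's check that result counts are identical across runs.
let countsIdentical = Set(samples.map { $0.count }).count == 1
```

Keeping timing and result counts in the same record makes it easy to confirm that a faster steady-state call did not come from returning fewer rows.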
Timing harness (drop into Tests/CodexBarTests to reproduce)
Honesty / scope notes
`ts` bound, so completed-model upgrades arriving after a turn's request row are applied even across ticks. This gives marginally more complete attribution than the per-call query and is identical for the standard advancing window.
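A hypothetical Swift sketch of the cross-tick behavior in this note. Accumulated turns persist in the memo, so a completed-model row that arrives in a later tick can still upgrade a turn first seen from its request row. `LogRow`, `TurnAttribution`, and the model strings are illustrative assumptions. The actual log schema and matching rules are not shown in the PR text.

```swift
// Hypothetical row kinds; the real codex log schema is not shown in the PR.
enum LogRow {
    case request(turnID: String, model: String)
    case completion(turnID: String, completedModel: String)
}

struct TurnAttribution {
    private(set) var modelByTurn: [String: String] = [:]

    /// Apply one batch of newly appended rows (one refresh tick).
    mutating func apply(_ rows: [LogRow]) {
        for row in rows {
            switch row {
            case let .request(turnID, model):
                // A re-read request row never downgrades an already upgraded turn.
                modelByTurn[turnID] = modelByTurn[turnID] ?? model
            case let .completion(turnID, completedModel):
                // Upgrade only turns already seen via their request row,
                // whether that row arrived this tick or in an earlier one.
                if modelByTurn[turnID] != nil {
                    modelByTurn[turnID] = completedModel
                }
            }
        }
    }
}

var attribution = TurnAttribution()
attribution.apply([.request(turnID: "turn-1", model: "requested-model")])            // tick 1
attribution.apply([.completion(turnID: "turn-1", completedModel: "completed-model"),  // tick 2
                   .completion(turnID: "orphan", completedModel: "completed-model")])
print(attribution.modelByTurn)
```

A per-call query bounded by `ts` only sees request and completion rows that fall inside the same call's range. State carried across ticks is what lets the tick-2 completion reach the tick-1 turn.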