Skip to content

Resolve codex priority turns incrementally per refresh#1404

Merged
steipete merged 6 commits into
steipete:mainfrom
ProspectOre:perf/codex-priority-turns-memo
Jun 11, 2026
Merged

Resolve codex priority turns incrementally per refresh#1404
steipete merged 6 commits into
steipete:mainfrom
ProspectOre:perf/codex-priority-turns-memo

Conversation

@ProspectOre

@ProspectOre ProspectOre commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Summary

Resolve codex priority-turn metadata incrementally per refresh: accumulate the trace-database query result in process memory and only examine rows appended since the last call, instead of re-running the full-table double-LIKE scan on every refresh past the scan interval.

Context

This is the dominant residual warm-refresh cost identified in #1392's follow-up measurements. makeCodexRefreshPlan → codexPriorityTurns runs

select ts, feedback_log_body from logs
where ts >= ? and ts < ?
  and (feedback_log_body like '%websocket request:%' or feedback_log_body like '%response.completed%')

against the codex CLI's logs_2.sqlite on every refresh once refreshMinIntervalSeconds has elapsed. The leading-% LIKEs force examining every row body in the window, so the cost grows with the database forever. On this machine the database is 762 MB / 322k rows and the query costs tens of seconds per tick; #1392's reporter measured 26–51 s per CLI refresh on a larger archive — sample puts essentially all of that time inside codexPriorityTurns → sqlite3_step.

Change

  • The logs table is append-only with INTEGER PRIMARY KEY AUTOINCREMENT, so rowids are monotonic and never reused. A per-database in-process memo keeps the accumulated turns plus a rowid cursor; each refresh runs the same filters with rowid > ? (satisfied by the integer PK, so it touches only new rows).
  • Full rescan triggers: database file identity change (inode), max(rowid) shrinking (prune/replace), or the requested window expanding earlier than the accumulated coverage.
  • Windows that end before today keep the original bounded one-shot query, so historical lookups never pay an open-ended scan and never populate the memo.
  • The memo is process-local only — no cache schema change. The scanner source hash regeneration is included (one-time cache rebuild on update, which runs serialized off the cooperative pool after Run cost-usage corpus scans off the Swift cooperative thread pool #1402).

Validation

  • swift test --filter CostUsageScannerCodexPriorityTests — 11 tests (3 new: incremental append pickup incl. late completed-model upgrade, window-expansion rescan, database-replacement rescan).
  • swift test --filter CostUsage — 184 tests in 17 suites pass, suite wall time unchanged (the bounded historical path exists precisely because an open-ended first scan regressed a real-database test path 150× during development; that regression is fixed and the suite runs in ~4 s).
  • make check — 0 violations; git diff --check clean.

Runtime Proof

Timing harness (3 consecutive calls for the live 30-day window against the real 762 MB / 322k-row logs_2.sqlite, debug build, warm page cache; harness below, not committed). Result counts were identical in every run on both branches:

# current main (cde92cfb): every call pays the full scan
TIMING_RESULT runs=[34.83 s, 22.38 s, 24.60 s] counts=[5, 5, 5]

# this branch: one full scan per process, then the rowid cursor
TIMING_RESULT runs=[8.51 s, 0.0015 s, 0.0006 s] counts=[5, 5, 5]

Steady-state per-tick cost drops from tens of seconds to ~1 ms, with byte-identical query filters and equal results. (Absolute first-call times vary with page cache state; a cold-cache first call measured 79 s, equally one-time.)

Timing harness (drop into Tests/CodexBarTests to reproduce)
import Foundation
import Testing
@testable import CodexBarCore

struct PriorityTurnsTimingHarness {
    @Test
    func `priority turns timing harness real db`() throws {
        let db = FileManager.default.homeDirectoryForCurrentUser
            .appendingPathComponent(".codex/logs_2.sqlite")
        try #require(FileManager.default.fileExists(atPath: db.path))
        let now = Date()
        let today = CostUsageScanner.CostUsageDayRange.dayKey(from: now)
        let since = CostUsageScanner.CostUsageDayRange.dayKey(
            from: Calendar.current.date(byAdding: .day, value: -29, to: now)!)
        let clock = ContinuousClock()
        var counts: [Int] = []
        var durations: [Duration] = []
        for _ in 0..<3 {
            let duration = clock.measure {
                counts.append(CostUsageScanner.codexPriorityTurns(
                    databaseURL: db,
                    sinceDayKey: since,
                    untilDayKey: today).count)
            }
            durations.append(duration)
        }
        print("TIMING_RESULT runs=\(durations) counts=\(counts)")
        #expect(counts[0] == counts[1])
        #expect(counts[1] == counts[2])
    }
}

Honesty / scope notes

  • The memo accumulates rows from the coverage start without an upper ts bound, so completed-model upgrades arriving after a turn's request row are applied even across ticks — marginally more complete attribution than the per-call query, identical for the standard advancing window.
  • The first scan per process (app launch / each CLI invocation) still pays the full query; the CLI therefore does not benefit. Persisting the memo would need a cache schema change and is left out deliberately.
  • The memo grows with the number of priority turns in the coverage window (small in practice; entries are turn-metadata only, no log bodies).

@ProspectOre

Copy link
Copy Markdown
Contributor Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 10, 2026

Copy link
Copy Markdown

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

@ProspectOre

Copy link
Copy Markdown
Contributor Author

@clawsweeper re-run

@clawsweeper

clawsweeper Bot commented Jun 10, 2026

Copy link
Copy Markdown

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

@clawsweeper

clawsweeper Bot commented Jun 11, 2026

Copy link
Copy Markdown

Codex review: needs maintainer review before merge. Reviewed June 11, 2026, 6:47 AM ET / 10:47 UTC.

Summary
The PR changes Codex priority-turn scanning to keep an in-process rowid cursor and accumulated metadata so warm refreshes scan appended trace rows instead of rescanning the SQLite log window.

Reproducibility: not applicable. as a PR review; the contributor supplied real timing proof against a large live logs_2.sqlite, and the source path for repeated priority-turn scans is clear from the diff and tests.

Review metrics: 2 noteworthy metrics.

  • Diff size: 4 files, +952/-47. The patch is a large stateful scanner change for a narrow performance path, so maintainer sequencing matters before merge.
  • Changed surfaces: 1 scanner, 1 test file, 1 generated hash, 1 changelog. The implementation, regression coverage, cache producer key, and release notes all change together.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Wait for fresh CI on head 4de852ad and explicit maintainer acceptance of the cache/parser-hash release behavior.

Risk before merge

  • [P1] Merging this patch changes the Codex parser hash and can force existing cost-usage cache invalidation or rebuild work on upgrade.
  • [P1] The new memo becomes live process state for priority-turn attribution; the edge cases are tested, but it still adds replacement, prune, overlap, and retention behavior that CI cannot fully validate against real long-running trace databases.
  • [P1] Current main already includes perf: reduce Codex cost refresh metadata work #1430 for the broader cost-refresh bottleneck, so maintainers need to decide whether the extra stateful priority-turn optimization is still worth the release risk.

Maintainer options:

  1. Merge After Fresh CI And Sequencing Sign-Off (recommended)
    Accept the stateful memo if maintainers want the extra warm-refresh win and are comfortable shipping the parser-hash cache rebuild behavior in this release train.
  2. Drop Or Defer The Memo Stack
    Close or pause this PR if perf: reduce Codex cost refresh metadata work #1430 is considered the intended low-complexity solution for the large-history refresh problem.

Next step before merge

  • [P2] The remaining action is maintainer sequencing and release-risk acceptance, not an automated code repair.

Security
Cleared: The diff touches local SQLite scanning, tests, a generated hash, and changelog text; I found no concrete security or supply-chain regression.

Review details

Best possible solution:

Land this only if maintainers intentionally want the additional warm priority-turn optimization after the merged file-stat fix, with fresh CI and release-risk acceptance for the parser-hash/cache behavior.

Do we have a high-confidence way to reproduce the issue?

Not applicable as a PR review; the contributor supplied real timing proof against a large live logs_2.sqlite, and the source path for repeated priority-turn scans is clear from the diff and tests.

Is this the best way to solve the issue?

Unclear; the PR is a focused implementation for the priority-turn subproblem, but current main already merged a lower-complexity broader cost-refresh optimization, so whether this remains the best solution is a maintainer sequencing decision.

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against dd8cf8b06ebb.

Label changes

Label justifications:

  • P2: This is a normal-priority performance improvement with meaningful but bounded user impact and no emergency signal.
  • merge-risk: 🚨 compatibility: The parser hash changes the cache producer key, which can affect existing users on upgrade.
  • merge-risk: 🚨 session-state: The PR introduces process-local accumulated attribution state for Codex priority turns.
  • merge-risk: 🚨 availability: The change targets long refresh stalls but still leaves first-scan and large-database behavior as release-risk surfaces.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body includes after-change live timing output from a real 762 MB Codex trace database showing equal result counts and warm refreshes dropping to millisecond-scale scans.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-change live timing output from a real 762 MB Codex trace database showing equal result counts and warm refreshes dropping to millisecond-scale scans.
Evidence reviewed

What I checked:

Likely related people:

  • Peter Steinberger: Authored current main's merged cost-refresh metadata optimization and the rebased PR-head commits touching the same scanner/cache path. (role: recent area contributor; confidence: high; commits: dd8cf8b06ebb, 4de852ad2b31, 2b86a120756e; files: Sources/CodexBarCore/Vendored/CostUsage/CostUsageScanner+CodexPriority.swift, Sources/CodexBarCore/Vendored/CostUsage/CostUsageScanner.swift, Sources/CodexBarCore/Generated/CodexParserHash.generated.swift)
  • ProspectOre: Credited as co-author on the current main optimization and the PR's main implementation commit, and supplied the PR proof plus follow-up fixes in discussion. (role: co-author / domain contributor; confidence: medium; commits: dd8cf8b06ebb, 2b86a120756e; files: Sources/CodexBarCore/Vendored/CostUsage/CostUsageScanner+CodexPriority.swift, Tests/CodexBarTests/CostUsageScannerCodexPriorityTests.swift)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added proof: sufficient Contributor real behavior proof is sufficient. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. P2 Normal priority bug or improvement with limited blast radius. merge-risk: 🚨 session-state 🚨 Merging this PR could lose, corrupt, stale, or mis-associate session or agent state. labels Jun 11, 2026
@ProspectOre

Copy link
Copy Markdown
Contributor Author

Addressed the review findings in edbca5e0:

  • [P2] Overlapping memo writeback: writeback now goes through storeCodexPriorityTurnsMemoIfNewer, which discards a snapshot when the stored state already dominates it (same file identity, coverage starting at least as early, cursor at least as far). A slower writer with an older cursor can no longer replace newer accumulated state; non-dominated writers overwrite and converge through the existing rescan checks on the next refresh.
  • Regression test: overlapping refresh writeback cannot replace newer memo state deterministically replays the race — a stale snapshot (older cursor, missing turn) is rejected, while a newer-cursor snapshot and an expanded-coverage snapshot still replace. 12/12 tests in CostUsageScannerCodexPriorityTests pass.
  • CHANGELOG: removed the release-owned edit.
  • Also fixed the Linux CLI build from the previous push (c8597c2b): the unconditional import os (for OSAllocatedUnfairLock, which only exists inside the canImport(SQLite3) blocks) is now guarded with canImport(os). Parser hash regenerated.

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 11, 2026

Copy link
Copy Markdown

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed proof: sufficient Contributor real behavior proof is sufficient. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels Jun 11, 2026
@ProspectOre

Copy link
Copy Markdown
Contributor Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 11, 2026

Copy link
Copy Markdown

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

Re-review progress:

@clawsweeper clawsweeper Bot added proof: sufficient Contributor real behavior proof is sufficient. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. merge-risk: 🚨 availability 🚨 Merging this PR could cause crashes, hangs, restart loops, stalls, or process outages. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. proof: sufficient Contributor real behavior proof is sufficient. labels Jun 11, 2026
@clawsweeper clawsweeper Bot added status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed proof: sufficient Contributor real behavior proof is sufficient. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels Jun 11, 2026
@steipete steipete force-pushed the perf/codex-priority-turns-memo branch from c526f59 to 054fbc0 Compare June 11, 2026 07:39
@clawsweeper

clawsweeper Bot commented Jun 11, 2026

Copy link
Copy Markdown

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@ProspectOre

Copy link
Copy Markdown
Contributor Author

Apologies for the extra re-review trigger — I requested it after the earlier review run failed, not realizing the branch was being actively pushed to at the same time. I'll hold off on further review requests here while the branch is moving.

@steipete steipete force-pushed the perf/codex-priority-turns-memo branch from dbeeb70 to 70e711e Compare June 11, 2026 08:56
@clawsweeper clawsweeper Bot added proof: sufficient Contributor real behavior proof is sufficient. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. merge-risk: 🚨 compatibility 🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels Jun 11, 2026
@steipete steipete force-pushed the perf/codex-priority-turns-memo branch from 70e711e to 6ef936f Compare June 11, 2026 09:31
@steipete steipete force-pushed the perf/codex-priority-turns-memo branch from 6ef936f to 4de852a Compare June 11, 2026 10:40
@steipete

Copy link
Copy Markdown
Owner

Maintainer rebase/fixup pushed at 4de852ad on top of current main (dd8cf8b0). The only source conflict was the generated parser hash after #1430; it was regenerated from the combined source.

Verification on the rebased head:

  • swift test --filter CostUsageScannerCodexPriorityTests — 19 tests passed.
  • swift test --filter CostUsage — 196 tests in 18 suites passed.
  • make check — parser hash current; SwiftFormat clean; SwiftLint 0 violations across 1,049 files.
  • git diff --check — clean.
  • Local $autoreview against origin/main — no actionable findings; patch correct at 0.78 confidence.

Fresh CI is required before merge.

@steipete

Copy link
Copy Markdown
Owner

Superseded by the smaller measured fix in #1430, landed as dd8cf8b.

Exact-head profiling on the same 60,607-file real archive showed the priority SQLite scan was about 4% of the expired-refresh cost; repeated Foundation metadata/resource-identifier reads and cached-path validation were dominant. The landed replacement removes that cost with +124/-24 lines, no persisted cursor/memo state, and measured a 6.74s warm expired refresh versus 20.22-37.43s across the stacked candidates.

Thanks @ProspectOre for the investigation, implementation work, and benchmark evidence. Contributor credit is retained in the landed commit and changelog.

@steipete steipete closed this Jun 11, 2026
@steipete steipete merged commit 71b93e5 into steipete:main Jun 11, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-risk: 🚨 availability 🚨 Merging this PR could cause crashes, hangs, restart loops, stalls, or process outages. merge-risk: 🚨 compatibility 🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades. merge-risk: 🚨 session-state 🚨 Merging this PR could lose, corrupt, stale, or mis-associate session or agent state. P2 Normal priority bug or improvement with limited blast radius. proof: sufficient Contributor real behavior proof is sufficient. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants