Resolve codex priority turns incrementally per refresh by ProspectOre · Pull Request #1404 · steipete/CodexBar

ProspectOre · 2026-06-10T21:09:43Z

Summary

Resolve codex priority-turn metadata incrementally per refresh: accumulate the trace-database query result in process memory and only examine rows appended since the last call, instead of re-running the full-table double-LIKE scan on every refresh past the scan interval.

Context

This is the dominant residual warm-refresh cost identified in #1392's follow-up measurements. makeCodexRefreshPlan → codexPriorityTurns runs

select ts, feedback_log_body from logs
where ts >= ? and ts < ?
  and (feedback_log_body like '%websocket request:%' or feedback_log_body like '%response.completed%')

against the codex CLI's logs_2.sqlite on every refresh once refreshMinIntervalSeconds has elapsed. The leading-% LIKEs force examining every row body in the window, so the cost grows without bound as the database grows. On this machine the database is 762 MB / 322k rows and the query costs tens of seconds per tick. #1392's reporter measured 26–51 s per CLI refresh on a larger archive, and the sample output puts essentially all of that time inside codexPriorityTurns → sqlite3_step.

Change

The logs table is append-only with INTEGER PRIMARY KEY AUTOINCREMENT, so rowids are monotonic and never reused. A per-database in-process memo keeps the accumulated turns plus a rowid cursor; each refresh runs the same filters with rowid > ? (satisfied by the integer PK, so it touches only new rows).
Full rescan triggers: database file identity change (inode), max(rowid) shrinking (prune/replace), or the requested window expanding earlier than the accumulated coverage.
Windows that end before today keep the original bounded one-shot query, so historical lookups never pay an open-ended scan and never populate the memo.
The memo is process-local only, with no cache schema change. The scanner source hash regeneration is included (a one-time cache rebuild on update, which runs serialized off the cooperative pool after #1402, "Run cost-usage corpus scans off the Swift cooperative thread pool").
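
For illustration, the cursor and rescan rules above can be modeled without SQLite. This is a minimal sketch, not the PR's code: LogRow, TurnRef, Memo, and refresh are hypothetical names, an in-memory array ordered by rowid stands in for the logs table, and fileIdentity stands in for the inode check.

```swift
import Foundation

// Hypothetical stand-ins for illustration; none of these names come from the PR.
struct LogRow {
    let rowid: Int64
    let ts: Int64
    let body: String
}

struct TurnRef: Equatable {
    let rowid: Int64
    let ts: Int64
}

struct Memo {
    var fileIdentity: UInt64   // stands in for the database inode
    var coverageStart: Int64   // earliest ts the accumulated turns cover
    var cursor: Int64          // highest rowid already examined
    var turns: [TurnRef]       // metadata only, no log bodies
}

func isPriorityRow(_ row: LogRow) -> Bool {
    row.body.contains("websocket request:") || row.body.contains("response.completed")
}

/// Returns the updated memo and how many rows this call had to examine.
/// `rows` must be ordered by ascending rowid, like an append-only table.
func refresh(memo: Memo?, rows: [LogRow], fileIdentity: UInt64, since: Int64) -> (memo: Memo, examined: Int) {
    let maxRowid = rows.last?.rowid ?? 0
    var next: Memo
    if let memo = memo,
       memo.fileIdentity == fileIdentity,   // same database file
       maxRowid >= memo.cursor,             // max(rowid) did not shrink
       since >= memo.coverageStart          // window did not expand earlier
    {
        next = memo
    } else {
        // A trigger fired (or there is no memo yet): rescan from scratch.
        next = Memo(fileIdentity: fileIdentity, coverageStart: since, cursor: 0, turns: [])
    }
    // Models "rowid > ?": only rows past the cursor count as examined.
    let fresh = rows.filter { $0.rowid > next.cursor }
    for row in fresh where row.ts >= next.coverageStart && isPriorityRow(row) {
        next.turns.append(TurnRef(rowid: row.rowid, ts: row.ts))
    }
    next.cursor = max(next.cursor, maxRowid)
    return (next, fresh.count)
}
```

In this model, a second call after one appended row examines only that row, while a changed fileIdentity or an earlier since forces the full pass again. The real query gets the same effect from the integer primary key rather than from filtering an array.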

Validation

swift test --filter CostUsageScannerCodexPriorityTests — 11 tests (3 new: incremental append pickup incl. late completed-model upgrade, window-expansion rescan, database-replacement rescan).
swift test --filter CostUsage — 184 tests in 17 suites pass, suite wall time unchanged (the bounded historical path exists precisely because an open-ended first scan regressed a real-database test path 150× during development; that regression is fixed and the suite runs in ~4 s).
make check — 0 violations; git diff --check clean.

Runtime Proof

Timing harness (3 consecutive calls for the live 30-day window against the real 762 MB / 322k-row logs_2.sqlite, debug build, warm page cache; harness below, not committed). Result counts were identical in every run on both branches:

# current main (cde92cfb): every call pays the full scan
TIMING_RESULT runs=[34.83 s, 22.38 s, 24.60 s] counts=[5, 5, 5]

# this branch: one full scan per process, then the rowid cursor
TIMING_RESULT runs=[8.51 s, 0.0015 s, 0.0006 s] counts=[5, 5, 5]

Steady-state per-tick cost drops from tens of seconds to ~1 ms, with byte-identical query filters and equal results. (Absolute first-call times vary with page cache state; a cold-cache first call measured 79 s, equally one-time.)

Timing harness (drop into Tests/CodexBarTests to reproduce)

import Foundation
import Testing
@testable import CodexBarCore

// Local-only timing harness: it reads the real Codex trace database and is not meant to be committed.
struct PriorityTurnsTimingHarness {
    @Test
    func `priority turns timing harness real db`() throws {
        let db = FileManager.default.homeDirectoryForCurrentUser
            .appendingPathComponent(".codex/logs_2.sqlite")
        // Stop here if this machine has no Codex trace database.
        try #require(FileManager.default.fileExists(atPath: db.path))
        // Live 30-day window: today plus the 29 preceding days.
        let now = Date()
        let today = CostUsageScanner.CostUsageDayRange.dayKey(from: now)
        let since = CostUsageScanner.CostUsageDayRange.dayKey(
            from: Calendar.current.date(byAdding: .day, value: -29, to: now)!)
        let clock = ContinuousClock()
        var counts: [Int] = []
        var durations: [Duration] = []
        // Three consecutive calls in one process, each timed separately.
        for _ in 0..<3 {
            let duration = clock.measure {
                counts.append(CostUsageScanner.codexPriorityTurns(
                    databaseURL: db,
                    sinceDayKey: since,
                    untilDayKey: today).count)
            }
            durations.append(duration)
        }
        print("TIMING_RESULT runs=\(durations) counts=\(counts)")
        // Every call must return the same number of priority turns.
        #expect(counts[0] == counts[1])
        #expect(counts[1] == counts[2])
    }
}

Honesty / scope notes

The memo accumulates rows from the coverage start without an upper ts bound, so completed-model upgrades arriving after a turn's request row are applied even across ticks — marginally more complete attribution than the per-call query, identical for the standard advancing window.
The first scan per process (app launch / each CLI invocation) still pays the full query; the CLI therefore does not benefit. Persisting the memo would need a cache schema change and is left out deliberately.
The memo grows with the number of priority turns in the coverage window (small in practice; entries are turn-metadata only, no log bodies).
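
The first note can be illustrated with a small model. This is a sketch under a loud assumption: it pretends that request and completion rows share a turn identifier, which this description does not state, and Event, TurnMeta, and apply are hypothetical names.

```swift
// Hypothetical model: the turn identifier linking rows is an assumption for illustration.
enum Event {
    case request(turnID: String)
    case completed(turnID: String, model: String)
}

struct TurnMeta: Equatable {
    var model: String?   // nil until a completion row names the model
}

/// Folds newly seen rows into the accumulated turns.
func apply(_ events: [Event], to turns: inout [String: TurnMeta]) {
    for event in events {
        switch event {
        case .request(let turnID):
            if turns[turnID] == nil { turns[turnID] = TurnMeta(model: nil) }
        case .completed(let turnID, let model):
            // Upgrade only turns already recorded from an earlier request row.
            turns[turnID]?.model = model
        }
    }
}

// Tick 1 sees only the request; tick 2 sees the late completion.
var memoTurns: [String: TurnMeta] = [:]
apply([.request(turnID: "t1")], to: &memoTurns)
apply([.completed(turnID: "t1", model: "example-model")], to: &memoTurns)
```

Because the accumulated dictionary survives between ticks, the completion seen on the second tick still upgrades the turn recorded on the first. A per-call query with an upper ts bound could miss it if the completion lands after that bound.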

ProspectOre · 2026-06-10T21:16:13Z

@clawsweeper re-review

clawsweeper · 2026-06-10T21:16:15Z

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

ProspectOre · 2026-06-10T21:52:37Z

@clawsweeper re-run

clawsweeper · 2026-06-10T21:52:39Z

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

clawsweeper · 2026-06-11T03:55:53Z

Codex review: needs maintainer review before merge. Reviewed June 11, 2026, 6:47 AM ET / 10:47 UTC.

Summary
The PR changes Codex priority-turn scanning to keep an in-process rowid cursor and accumulated metadata so warm refreshes scan appended trace rows instead of rescanning the SQLite log window.

Reproducibility: not applicable as a PR review; the contributor supplied real timing proof against a large live logs_2.sqlite, and the source path for repeated priority-turn scans is clear from the diff and tests.

Review metrics: 2 noteworthy metrics.

Diff size: 4 files, +952/-47. The patch is a large stateful scanner change for a narrow performance path, so maintainer sequencing matters before merge.
Changed surfaces: 1 scanner, 1 test file, 1 generated hash, 1 changelog. The implementation, regression coverage, cache producer key, and release notes all change together.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

Wait for fresh CI on head 4de852ad and explicit maintainer acceptance of the cache/parser-hash release behavior.

Risk before merge

[P1] Merging this patch changes the Codex parser hash and can force existing cost-usage cache invalidation or rebuild work on upgrade.
[P1] The new memo becomes live process state for priority-turn attribution; the edge cases are tested, but it still adds replacement, prune, overlap, and retention behavior that CI cannot fully validate against real long-running trace databases.
[P1] Current main already includes #1430 (perf: reduce Codex cost refresh metadata work) for the broader cost-refresh bottleneck, so maintainers need to decide whether the extra stateful priority-turn optimization is still worth the release risk.

Maintainer options:

Merge After Fresh CI And Sequencing Sign-Off (recommended): Accept the stateful memo if maintainers want the extra warm-refresh win and are comfortable shipping the parser-hash cache rebuild behavior in this release train.
Drop Or Defer The Memo Stack: Close or pause this PR if #1430 (perf: reduce Codex cost refresh metadata work) is considered the intended low-complexity solution for the large-history refresh problem.

Next step before merge

[P2] The remaining action is maintainer sequencing and release-risk acceptance, not an automated code repair.

Security
Cleared: The diff touches local SQLite scanning, tests, a generated hash, and changelog text; I found no concrete security or supply-chain regression.

Review details

Best possible solution:

Land this only if maintainers intentionally want the additional warm priority-turn optimization after the merged file-stat fix, with fresh CI and release-risk acceptance for the parser-hash/cache behavior.

Do we have a high-confidence way to reproduce the issue?

Not applicable as a PR review; the contributor supplied real timing proof against a large live logs_2.sqlite, and the source path for repeated priority-turn scans is clear from the diff and tests.

Is this the best way to solve the issue?

Unclear; the PR is a focused implementation for the priority-turn subproblem, but current main already merged a lower-complexity broader cost-refresh optimization, so whether this remains the best solution is a maintainer sequencing decision.

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against dd8cf8b06ebb.

Label changes

Label justifications:

P2: This is a normal-priority performance improvement with meaningful but bounded user impact and no emergency signal.
merge-risk: 🚨 compatibility: The parser hash changes the cache producer key, which can affect existing users on upgrade.
merge-risk: 🚨 session-state: The PR introduces process-local accumulated attribution state for Codex priority turns.
merge-risk: 🚨 availability: The change targets long refresh stalls but still leaves first-scan and large-database behavior as release-risk surfaces.
rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR.
proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-change live timing output from a real 762 MB Codex trace database showing equal result counts and warm refreshes dropping to millisecond-scale scans.

Evidence reviewed

What I checked:

Repository policy read: The target AGENTS.md was read fully; its focused-test and release-owned changelog guidance informed this review. (AGENTS.md:1, dd8cf8b06ebb)
PR diff surface: The PR head changes 4 files with 952 insertions and 47 deletions on top of current main. (4de852ad2b31)
Incremental memo implementation: The PR adds CodexPriorityTurnsMemoState, locked memo storage, rowid accumulation, replacement/prune checks, and bounded pending completion retention. (Sources/CodexBarCore/Vendored/CostUsage/CostUsageScanner+CodexPriority.swift:27, 4de852ad2b31)
Scanner query path: The accumulation plan uses the timestamp index for bounded cold scans and rowid cursor queries for warm scans. (Sources/CodexBarCore/Vendored/CostUsage/CostUsageScanner+CodexPriority.swift:495, 4de852ad2b31)
Focused regression coverage: The PR adds coverage for incremental append pickup, window expansion, database replacement, overlapping writeback, pruning, and bounded completion metadata. (Tests/CodexBarTests/CostUsageScannerCodexPriorityTests.swift:225, 4de852ad2b31)
Generated and release files: The PR updates the generated Codex parser hash and adds an unreleased changelog entry, which affects release/cache sequencing. (Sources/CodexBarCore/Generated/CodexParserHash.generated.swift:4, 4de852ad2b31)

Likely related people:

Peter Steinberger: Authored current main's merged cost-refresh metadata optimization and the rebased PR-head commits touching the same scanner/cache path. (role: recent area contributor; confidence: high; commits: dd8cf8b06ebb, 4de852ad2b31, 2b86a120756e; files: Sources/CodexBarCore/Vendored/CostUsage/CostUsageScanner+CodexPriority.swift, Sources/CodexBarCore/Vendored/CostUsage/CostUsageScanner.swift, Sources/CodexBarCore/Generated/CodexParserHash.generated.swift)
ProspectOre: Credited as co-author on the current main optimization and the PR's main implementation commit, and supplied the PR proof plus follow-up fixes in discussion. (role: co-author / domain contributor; confidence: medium; commits: dd8cf8b06ebb, 2b86a120756e; files: Sources/CodexBarCore/Vendored/CostUsage/CostUsageScanner+CodexPriority.swift, Tests/CodexBarTests/CostUsageScannerCodexPriorityTests.swift)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

ProspectOre · 2026-06-11T04:09:26Z

Addressed the review findings in edbca5e0:

[P2] Overlapping memo writeback: writeback now goes through storeCodexPriorityTurnsMemoIfNewer, which discards a snapshot when the stored state already dominates it (same file identity, coverage starting at least as early, cursor at least as far). A slower writer with an older cursor can no longer replace newer accumulated state; non-dominated writers overwrite and converge through the existing rescan checks on the next refresh.
Regression test: "overlapping refresh writeback cannot replace newer memo state" deterministically replays the race: a stale snapshot (older cursor, missing turn) is rejected, while a newer-cursor snapshot and an expanded-coverage snapshot still replace the stored state. 12/12 tests in CostUsageScannerCodexPriorityTests pass.
CHANGELOG: removed the release-owned edit.
Also fixed the Linux CLI build from the previous push (c8597c2b): the unconditional import os (needed for OSAllocatedUnfairLock, which is only used inside the canImport(SQLite3) blocks) is now guarded with canImport(os). Parser hash regenerated.
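
The dominance rule in the first item can be sketched as follows. Assumptions: MemoSnapshot, dominates, and MemoStore are hypothetical names, the real memo also carries the accumulated turns, and NSLock stands in for OSAllocatedUnfairLock to keep the sketch portable.

```swift
import Foundation

// Hypothetical reduction of the memo to the three fields the rule compares.
struct MemoSnapshot: Equatable {
    let fileIdentity: UInt64
    let coverageStart: Int64
    let cursor: Int64
}

/// True when `stored` already covers everything `candidate` would contribute:
/// same file, coverage starting at least as early, cursor at least as far.
func dominates(_ stored: MemoSnapshot, _ candidate: MemoSnapshot) -> Bool {
    stored.fileIdentity == candidate.fileIdentity
        && stored.coverageStart <= candidate.coverageStart
        && stored.cursor >= candidate.cursor
}

final class MemoStore {
    private let lock = NSLock()
    private var stored: MemoSnapshot?

    /// Stores the candidate unless the current state dominates it.
    /// Returns whether the candidate was written.
    @discardableResult
    func storeIfNewer(_ candidate: MemoSnapshot) -> Bool {
        lock.lock()
        defer { lock.unlock() }
        if let current = stored, dominates(current, candidate) {
            return false   // stale writer: keep the newer accumulated state
        }
        stored = candidate
        return true
    }

    var current: MemoSnapshot? {
        lock.lock()
        defer { lock.unlock() }
        return stored
    }
}
```

With this rule, a slower refresh holding an older cursor is dropped, while a snapshot with a further cursor, earlier coverage, or a different file identity still overwrites.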

@clawsweeper re-review

clawsweeper · 2026-06-11T04:09:28Z

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

State: Superseded
Detail: A newer re-review for this item started before this run finished, so GitHub cancelled this older run. Check the latest ClawSweeper run for the current result.
Run: https://github.com/openclaw/clawsweeper/actions/runs/27323829461
Updated: 2026-06-11T04:31:09.687Z

ProspectOre · 2026-06-11T04:16:39Z

@clawsweeper re-review

clawsweeper · 2026-06-11T04:16:42Z

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

Re-review progress:

State: Complete
Detail: The targeted re-review finished, the durable review comment was updated, and the synced verdict was routed.
Run: https://github.com/openclaw/clawsweeper/actions/runs/27323465843
Updated: 2026-06-11T04:21:14.810Z

clawsweeper · 2026-06-11T07:43:03Z

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

State: Failed
Detail: The targeted re-review did not finish cleanly. Check the workflow run for details.
Run: https://github.com/openclaw/clawsweeper/actions/runs/27331819611
Updated: 2026-06-11T07:46:22.707Z

ProspectOre · 2026-06-11T07:48:31Z

Apologies for the extra re-review trigger — I requested it after the earlier review run failed, not realizing the branch was being actively pushed to at the same time. I'll hold off on further review requests here while the branch is moving.

Co-authored-by: pickaxe <54486432+ProspectOre@users.noreply.github.com>

steipete · 2026-06-11T10:41:04Z

Maintainer rebase/fixup pushed at 4de852ad on top of current main (dd8cf8b0). The only source conflict was the generated parser hash after #1430; it was regenerated from the combined source.

Verification on the rebased head:

swift test --filter CostUsageScannerCodexPriorityTests — 19 tests passed.
swift test --filter CostUsage — 196 tests in 18 suites passed.
make check — parser hash current; SwiftFormat clean; SwiftLint 0 violations across 1,049 files.
git diff --check — clean.
Local $autoreview against origin/main — no actionable findings; patch correct at 0.78 confidence.

Fresh CI is required before merge.

steipete · 2026-06-11T11:04:53Z

Superseded by the smaller measured fix in #1430, landed as dd8cf8b.

Exact-head profiling on the same 60,607-file real archive showed the priority SQLite scan was about 4% of the expired-refresh cost; repeated Foundation metadata/resource-identifier reads and cached-path validation were dominant. The landed replacement removes that cost with +124/-24 lines, no persisted cursor/memo state, and measured a 6.74s warm expired refresh versus 20.22-37.43s across the stacked candidates.
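
As a rough check on that reasoning, Amdahl's law bounds what removing the priority scan alone could buy. This is a back-of-the-envelope sketch that takes the roughly 4% share above at face value; maxSpeedup is a hypothetical helper.

```swift
import Foundation

/// Upper bound on overall speedup when a fraction `p` of the runtime is removed entirely.
func maxSpeedup(removingFraction p: Double) -> Double {
    precondition(p >= 0 && p < 1, "fraction must be in [0, 1)")
    return 1.0 / (1.0 - p)
}

let bound = maxSpeedup(removingFraction: 0.04)
print(String(format: "%.3fx", bound))  // prints 1.042x
```

Under that assumption, even a free priority scan would shorten an expired refresh by only about 4%, so the larger win has to come from the other costs.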

Thanks @ProspectOre for the investigation, implementation work, and benchmark evidence. Contributor credit is retained in the landed commit and changelog.

ProspectOre mentioned this pull request Jun 10, 2026

Merge Icons mode causes system-wide input freezes / beachballs on macOS 26 — WindowServer event-buffer overflow evidence (not an in-process hang) #1399

Closed

ProspectOre force-pushed the perf/codex-priority-turns-memo branch from 1b5f33f to 9688235 Compare June 11, 2026 03:54

This was referenced Jun 11, 2026

Persist the codex priority-turns memo across launches #1421

Closed

Split the codex parser hash into session and priority-turns domains #1422

Closed

Add CI regression gates for cost-usage scan storms #1423

Closed

steipete force-pushed the perf/codex-priority-turns-memo branch from c526f59 to 054fbc0 Compare June 11, 2026 07:39

steipete force-pushed the perf/codex-priority-turns-memo branch from dbeeb70 to 70e711e Compare June 11, 2026 08:56

steipete force-pushed the perf/codex-priority-turns-memo branch from 70e711e to 6ef936f Compare June 11, 2026 09:31

steipete mentioned this pull request Jun 11, 2026

perf: reduce Codex cost refresh metadata work #1430

Merged

steipete and others added 6 commits June 11, 2026 11:37

fix: speed up Codex priority history scans

2b86a12

Co-authored-by: pickaxe <54486432+ProspectOre@users.noreply.github.com>

perf: amortize Codex priority memo eviction

745348a

perf: use Codex timestamp index for cold scans

5cbd20e

fix: bind Codex cold scan from one plan

2b42cf7

perf: avoid indexed unbounded Codex scans

1a0c2c3

chore: regenerate Codex parser hash after rebase

4de852a

steipete force-pushed the perf/codex-priority-turns-memo branch from 6ef936f to 4de852a Compare June 11, 2026 10:40

steipete closed this Jun 11, 2026

steipete merged commit 71b93e5 into steipete:main Jun 11, 2026
4 checks passed
