[codex] cache state DB runtime for repeated lookups#15583

Open
charley-oai wants to merge 1 commit into main from
codex/cache-state-db-lookups

@charley-oai charley-oai commented Mar 24, 2026

What changed

  • cache the core state DB runtime for ad hoc lookup helpers like get_state_db() and open_if_present() so repeated cold metadata/path lookups can reuse an existing StateRuntime
  • keep state_db::init() creating a fresh runtime for session-owned work instead of funneling everything through one shared SQLite pool
  • allow no-provider open_if_present() callers to initialize and cache an ad hoc runtime so rollout-path fallback and DB read-repair still work
  • treat empty-provider cached runtimes as reusable for later ad hoc lookups with a configured provider
  • prune dead cached runtimes on insert so one-off sqlite homes do not accumulate stale cache keys for the process lifetime
  • add focused tests covering cached lookup reuse, no-provider initialization, distinct-runtime behavior for init(), and stale-cache pruning

Why

Cold thread metadata lookups were repeatedly calling StateRuntime::init(), which opens both state_5.sqlite and logs_1.sqlite. On machines with a very large logs_1.sqlite, that made cold thread/read and thread/resume paths pay repeated logs DB open/migration costs even when they only needed thread metadata or rollout path lookup.

The first version of this change cached too broadly. Each StateRuntime uses a SQLite pool capped at 5 connections, so caching state_db::init() as well would have risked routing session-owned state DB work from multiple live sessions through one shared 5-connection pool. This PR keeps the caching benefit for ad hoc lookup-style callers while preserving separate runtimes for session-owned paths.
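The split between cached ad hoc lookups and fresh session-owned runtimes can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `StateRuntime` struct here is a stand-in that only records its home path, and `lookup_cached` and `init_fresh` are hypothetical names standing in for the ad hoc helpers and `state_db::init()` respectively.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Arc, Mutex, OnceLock, Weak};

/// Hypothetical stand-in for the real `StateRuntime`; only the home path is modeled.
pub struct StateRuntime {
    pub sqlite_home: PathBuf,
}

impl StateRuntime {
    fn open(sqlite_home: &Path) -> Arc<Self> {
        // The real runtime would open its SQLite databases here.
        Arc::new(Self {
            sqlite_home: sqlite_home.to_path_buf(),
        })
    }
}

/// Process-wide cache of weak handles, keyed by sqlite home (assumed layout).
fn cache() -> &'static Mutex<HashMap<PathBuf, Weak<StateRuntime>>> {
    static CACHE: OnceLock<Mutex<HashMap<PathBuf, Weak<StateRuntime>>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Session-owned path: always a fresh runtime; the cache is neither read nor written.
pub fn init_fresh(sqlite_home: &Path) -> Arc<StateRuntime> {
    StateRuntime::open(sqlite_home)
}

/// Ad hoc lookup path: reuse a live cached runtime, otherwise open one and remember it.
pub fn lookup_cached(sqlite_home: &Path) -> Arc<StateRuntime> {
    let mut map = cache().lock().unwrap();
    if let Some(runtime) = map.get(sqlite_home).and_then(Weak::upgrade) {
        return runtime;
    }
    let runtime = StateRuntime::open(sqlite_home);
    map.insert(sqlite_home.to_path_buf(), Arc::downgrade(&runtime));
    runtime
}
```

In this sketch, two `lookup_cached` calls for the same home return the same `Arc` while a caller still holds it, whereas every `init_fresh` call yields a distinct runtime, so session-owned work never contends for the pool behind a cached runtime.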

The cache stores Weak<StateRuntime> values keyed by sqlite_home. Without pruning, dead entries can accumulate if many distinct homes are initialized once and never revisited. Cleaning them up on insert keeps the cache bounded in those cases.
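Prune-on-insert for a map of weak handles might look like the sketch below. `WeakCache` and its methods are hypothetical names chosen for illustration, and the generic parameter `T` stands in for `StateRuntime`; the actual cache in this PR may be structured differently.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Arc, Weak};

/// Hypothetical cache of weak runtime handles keyed by sqlite home.
pub struct WeakCache<T> {
    entries: HashMap<PathBuf, Weak<T>>,
}

impl<T> WeakCache<T> {
    pub fn new() -> Self {
        Self {
            entries: HashMap::new(),
        }
    }

    /// Return the cached value for `home` if some owner still keeps it alive.
    pub fn get(&self, home: &Path) -> Option<Arc<T>> {
        self.entries.get(home).and_then(Weak::upgrade)
    }

    /// Drop every entry whose value has already been freed, then record the new one.
    pub fn insert(&mut self, home: &Path, value: &Arc<T>) {
        self.entries.retain(|_, weak| weak.strong_count() > 0);
        self.entries.insert(home.to_path_buf(), Arc::downgrade(value));
    }

    /// Number of keys currently stored, live or dead.
    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

With this shape, inserting runtimes for many one-off homes leaves at most the newest dead key behind, because each insert first sweeps out entries whose `Arc` has been dropped. The sweep is linear in the number of cached keys, which is a reasonable trade-off only if the cache stays small.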

Impact

  • reduces repeated SQLite runtime initialization on cold lookup paths
  • preserves filesystem fallback and read-repair when find_thread_path_by_id_str() reaches the no-provider open_if_present() path
  • keeps session-owned state DB work on distinct runtimes instead of concentrating it on one shared 5-connection pool
  • avoids unbounded growth of dead cache keys across many distinct sqlite homes
  • keeps existing backfill-complete gating for regular state DB reads

Validation

  • cargo test -p codex-core rollout::tests::find_thread_path_falls_back_when_db_path_is_stale
  • cargo test -p codex-core rollout::tests::find_thread_path_repairs_missing_db_row_after_filesystem_fallback
  • cargo test -p codex-core state_db::tests
  • just fmt
  • just argument-comment-lint
  • a full cargo test -p codex-core run still shows unrelated pre-existing failures in other areas; the rollout regressions covered by the tests above are fixed

Notes

  • I did not run the full workspace test suite because repo guidance says to ask before doing that for core changes.

@charley-oai charley-oai force-pushed the codex/cache-state-db-lookups branch from 3bd848f to 0d3e0fb Compare March 24, 2026 00:39
@charley-oai charley-oai marked this pull request as ready for review March 24, 2026 00:44
@charley-oai
Collaborator Author

@codex review

Contributor

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0d3e0fb47b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Reuse a cached StateRuntime for ad hoc core state DB lookups, but keep state_db::init() on its own runtime so session-owned work does not share a single SQLite pool. Allow no-provider open_if_present() callers to initialize and cache an ad hoc runtime so rollout path fallback and DB read-repair still work, and treat empty-provider cached runtimes as reusable for later ad hoc lookups. Prune dead cached runtimes on insert so one-off sqlite homes do not accumulate stale cache keys for the process lifetime.

Co-authored-by: Codex <noreply@openai.com>
@charley-oai charley-oai force-pushed the codex/cache-state-db-lookups branch from c6e4a50 to de6fb67 Compare March 24, 2026 02:44
@charley-oai
Collaborator Author

@codex review

@chatgpt-codex-connector
Contributor

Codex Review: Didn't find any major issues. Bravo.

