feat: incremental AST cache to eliminate redundant AstEncoder work #59

Merged
ParzivalHack merged 1 commit into ParzivalHack:main from daniplatform:caching-AST
Jun 4, 2026

@daniplatform (Collaborator) commented Jun 4, 2026

Summary

Adds an incremental AST cache that eliminates redundant AstEncoder work: the pure-Python json.dumps(ast_tree, cls=AstEncoder) step, which is O(N) in the number of AST nodes and dominates parsing cost (ast.parse itself is implemented in C and is negligible by comparison).

Three-level hierarchy:

  • L1, in-memory mtime guard: zero work on a hit within a single process run.
  • L2, disk content-hash guard: no parse/encode across runs when the file content is unchanged.
  • L3, chunk-aware: per-function / per-class subtree reuse when a file only partially changes.
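
As a rough illustration of how the first two levels could fit together, here is a minimal sketch. It is not the code in ast_cache.py: the class name TwoLevelAstCache, the encode callback, and the encode_calls counter are all hypothetical, and L2 is modelled as a plain dict rather than files on disk. L3 is left out.

```python
import ast
import hashlib
import json
import os
from typing import Callable, Dict, Tuple


class TwoLevelAstCache:
    """Hypothetical sketch of an mtime guard (L1) in front of a content-hash guard (L2)."""

    def __init__(self, encode: Callable[[ast.AST], str]) -> None:
        self._encode = encode
        # L1: path -> (mtime_ns, ast_json); valid only inside this process.
        self._l1: Dict[str, Tuple[int, str]] = {}
        # L2: sha256(content) -> ast_json; a dict here, on disk in a real cache.
        self._l2: Dict[str, str] = {}
        self.encode_calls = 0  # counts how often the slow path actually runs

    def get(self, path: str) -> str:
        mtime_ns = os.stat(path).st_mtime_ns
        hit = self._l1.get(path)
        if hit is not None and hit[0] == mtime_ns:
            return hit[1]  # L1 hit: the file is not even read

        with open(path, "rb") as fh:
            source = fh.read()
        digest = hashlib.sha256(source).hexdigest()

        ast_json = self._l2.get(digest)
        if ast_json is None:
            # Miss on both levels: parse and encode (the expensive step).
            self.encode_calls += 1
            ast_json = self._encode(ast.parse(source, filename=path))
            self._l2[digest] = ast_json

        self._l1[path] = (mtime_ns, ast_json)
        return ast_json


# Stand-in encoder for the sketch; the real project uses AstEncoder.
cache = TwoLevelAstCache(lambda tree: json.dumps(ast.dump(tree)))
```

With this shape, touching a file without changing its bytes costs one read and one hash, while an untouched file costs only an os.stat call.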

Changes

  • src/pyspector/ast_cache.py: cache implementation. Persistence is JSON + base64 + zlib, deliberately not pickle (unpickling can execute arbitrary code, which is unsafe for cache files living in an untrusted repo directory). Atomic writes, LRU eviction, and graceful degradation to L1-only when the cache dir can't be created.
  • src/pyspector/_ast_encode.py: shared AST → JSON encoder, extracted as the single source of truth for the schema consumed by the Rust core. This eliminates encoder drift between cli.py and the cache.
  • src/pyspector/cli.py: wires the cache into get_python_file_asts via a new optional cache parameter, and re-exports AstEncoder from _ast_encode (so from pyspector.cli import AstEncoder keeps working). The cache is bypassed when enable_syntax_warnings is True, preserving that diagnostic (the cache suppresses SyntaxWarning internally).
  • tests/unit/ast_cache_test.py: 51 unit tests (all green).
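
For readers unfamiliar with the persistence approach described for ast_cache.py, the sketch below shows one way a JSON + base64 + zlib record with an atomic write might look. The function names dump_entry and load_entry and the record layout ({"hash", "payload"}) are assumptions made for illustration and are not taken from the PR.

```python
import base64
import json
import os
import tempfile
import zlib
from typing import Optional


def dump_entry(path: str, content_hash: str, ast_json: str) -> None:
    """Write one cache record atomically (hypothetical record layout)."""
    payload = base64.b64encode(zlib.compress(ast_json.encode("utf-8")))
    record = {"hash": content_hash, "payload": payload.decode("ascii")}
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory, then rename over the target,
    # so a reader never observes a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(record, fh)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_entry(path: str, content_hash: str) -> Optional[str]:
    """Return the cached AST JSON, or None on any mismatch or corruption."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            record = json.load(fh)  # plain data only; nothing is executed
        if record.get("hash") != content_hash:
            return None
        raw = base64.b64decode(record["payload"])
        return zlib.decompress(raw).decode("utf-8")
    except (OSError, ValueError, KeyError, AttributeError, TypeError, zlib.error):
        # A damaged or tampered file is treated as a cache miss.
        return None
```

Treating every load failure as a miss keeps the cache optional: the caller can always fall back to parsing and encoding the file directly.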

Compatibility

This branch was rebased onto the latest main, so it integrates cleanly on top of the recent CLI work (--stats, --debug, exclude pre-pass, syntax-warning handling, absolute file_path). All of those upstream behaviours are preserved; the cache only adds an optional fast path. The output AST JSON is byte-for-byte identical to the non-cached path.

Testing

  • tests/unit/ast_cache_test.py: 51 passed.
  • AST JSON produced via the cache verified identical to the direct ast.parse + AstEncoder path.
  • Pre-existing failures unrelated to this change (stale compiled _rust_core vs. new rules-TOML schema, missing bs4, and an existing absolute-vs-relative file_path assertion in test_get_asts.py) reproduce identically on main and are out of scope here.
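
The identity check mentioned in the second bullet could be expressed along these lines. This is only a sketch under stated assumptions: SketchEncoder is a stand-in for AstEncoder (whose real schema is not shown here), and encode_via_cache is a hypothetical in-memory round trip through zlib and base64, not the project's cache API.

```python
import ast
import base64
import hashlib
import json
import zlib
from typing import Dict


class SketchEncoder(json.JSONEncoder):
    """Stand-in encoder; the real AstEncoder schema may differ."""

    def default(self, o):
        if isinstance(o, ast.AST):
            return {"node_type": type(o).__name__, **dict(ast.iter_fields(o))}
        return repr(o)  # e.g. bytes, complex, or Ellipsis constants


def encode_direct(source: str) -> str:
    """The non-cached path: parse, then encode."""
    return json.dumps(ast.parse(source), cls=SketchEncoder)


def encode_via_cache(source: str, store: Dict[str, bytes]) -> str:
    """Cached path: store compressed JSON keyed by content hash, then decode."""
    key = hashlib.sha256(source.encode("utf-8")).hexdigest()
    if key not in store:
        encoded = encode_direct(source).encode("utf-8")
        store[key] = base64.b64encode(zlib.compress(encoded))
    return zlib.decompress(base64.b64decode(store[key])).decode("utf-8")


source = "def f(x):\n    return x + 1\n"
store: Dict[str, bytes] = {}
cold = encode_via_cache(source, store)  # miss: encodes and stores
warm = encode_via_cache(source, store)  # hit: decodes the stored bytes
assert cold == warm == encode_direct(source)
```

Comparing the final strings rather than parsed objects is what makes the check byte-for-byte: any difference in key order or whitespace would fail the assertion.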

Three-level (L1 in-memory mtime / L2 disk content-hash / L3 chunk-aware)
incremental AST cache that skips the pure-Python json.dumps(AstEncoder)
bottleneck across runs and on partial file changes.

- src/pyspector/ast_cache.py: cache implementation (JSON+base64 persistence,
  no pickle/code-exec on load)
- src/pyspector/_ast_encode.py: shared AST->JSON encoder (single source of
  truth, eliminates encoder drift between cli.py and the cache)
- src/pyspector/cli.py: wire the cache into get_python_file_asts
- tests/unit/ast_cache_test.py: unit tests

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@ParzivalHack added the enhancement and Test labels Jun 4, 2026

@ParzivalHack (Owner) left a comment

Hey @daniplatform, great idea on reducing the workload of AstEncoder, and in my tests this PR also increases PySpector's average scanning speed by 41.90%, so really a great addition. I require no edits. Merging :D

@ParzivalHack ParzivalHack merged commit fce70f6 into ParzivalHack:main Jun 4, 2026
1 of 3 checks passed