[SM] Automatically prune cache entries older than 30 days#17585
bonigarcia left a comment
The current implementation uses the modification time of the cached assets to select which ones to prune. But there are cases where an asset should stay in the cache even if its modification time is older than 30 days: this applies to any driver that is still in use. For example, geckodriver has a very low release frequency (the last release was in February 2025).
I have something different in mind for this feature. I think the best solution is to add a field to the SM metadata (se-metadata.json) that is updated whenever an asset (driver or browser) is used (e.g., last_used or something similar). SM reads the metadata file each time a driver/browser is resolved. At that moment, in addition to updating the existing info for the resolved driver/browser and its new last_used, SM checks the rest of the assets (drivers/browsers) and prunes those whose last_used is older than the current date minus CACHE_TTL_DAYS.
I forgot about how infrequent geckodriver releases are... I will update accordingly.
Selenium Manager now removes driver/browser version directories from the cache (~/.cache/selenium) that have not been modified in over 30 days. The prune runs on every invocation after clear-cache/clear-metadata, so the cache size is bounded without requiring manual intervention.
Fixes #17196
Replace the file-mtime pruning approach with a metadata-driven one, as suggested by @bonigarcia. Each Driver and Browser metadata entry now carries a `last_used` Unix timestamp (defaulting to now for entries read from older metadata files). The timestamp is refreshed whenever SM resolves a driver or browser from the cache. Pruning now reads se-metadata.json, removes entries whose `last_used` is older than CACHE_TTL_DAYS (30), deletes the matching version directories from disk, and writes the updated metadata back. This correctly handles infrequently-released drivers like geckodriver that should not be pruned just because they have an old mtime.
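The selection step described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `AssetEntry` struct, its field names other than `last_used`, and the `partition_stale` helper are hypothetical, and reading or writing se-metadata.json and deleting directories are left out.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical stand-in for a metadata entry; only `last_used` is named in the PR.
#[derive(Debug, Clone, PartialEq)]
pub struct AssetEntry {
    pub name: String,
    pub version: String,
    /// Seconds since the Unix epoch when the asset was last resolved from the cache.
    pub last_used: u64,
}

/// Retention window in days, matching the CACHE_TTL_DAYS value mentioned above.
pub const CACHE_TTL_DAYS: u64 = 30;
const SECONDS_PER_DAY: u64 = 24 * 60 * 60;

/// Current time as seconds since the Unix epoch (0 if the clock is before the epoch).
pub fn now_unix_secs() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap_or(Duration::ZERO)
        .as_secs()
}

/// Splits entries into (kept, stale). An entry is stale when its `last_used`
/// is older than `now` minus CACHE_TTL_DAYS.
pub fn partition_stale(
    entries: Vec<AssetEntry>,
    now: u64,
) -> (Vec<AssetEntry>, Vec<AssetEntry>) {
    let cutoff = now.saturating_sub(CACHE_TTL_DAYS * SECONDS_PER_DAY);
    entries.into_iter().partition(|entry| entry.last_used >= cutoff)
}
```

Under this sketch, the caller would delete the version directory of each stale entry and write only the kept entries back to the metadata file. An asset with an old mtime but a recent `last_used` stays in the kept list.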
The previous approach stored last_used on the Driver/Browser structs,
which are subject to TTL filtering in get_metadata(). After 1 hour the
TTL entry expires; the next write_metadata call (triggered by any
driver/browser lookup) purges it from the file, discarding last_used
with it. As a result prune_old_cache_entries could never find old
entries to remove.
Fix: track usage in a new cached_assets section of se-metadata.json
that is never touched by the TTL retain logic. update_cached_asset
upserts a {asset_name, asset_version, last_used} record whenever a
driver or browser binary is served from the local cache. Pruning reads
only cached_assets, so it remains correct even after the short-lived
TTL entries have been flushed by unrelated SM invocations.
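A rough sketch of the upsert described above, for illustration only. The record fields (asset_name, asset_version, last_used) and the function name update_cached_asset come from the commit message; the `CachedAsset` struct, the use of a plain `Vec`, and the function signature are assumptions, and the real code presumably also handles serialization to se-metadata.json.

```rust
/// Assumed shape of one record in the cached_assets section.
#[derive(Debug, Clone, PartialEq)]
pub struct CachedAsset {
    pub asset_name: String,
    pub asset_version: String,
    /// Seconds since the Unix epoch when the asset was last served from the cache.
    pub last_used: u64,
}

/// Upsert: refresh `last_used` if a record for (name, version) exists,
/// otherwise append a new record. The signature is hypothetical.
pub fn update_cached_asset(
    assets: &mut Vec<CachedAsset>,
    name: &str,
    version: &str,
    now: u64,
) {
    if let Some(existing) = assets
        .iter_mut()
        .find(|a| a.asset_name == name && a.asset_version == version)
    {
        existing.last_used = now;
        return;
    }
    assets.push(CachedAsset {
        asset_name: name.to_string(),
        asset_version: version.to_string(),
        last_used: now,
    });
}
```

Because these records live in their own list, a filter that drops expired TTL entries from the driver and browser lists never sees them, which is the property the fix relies on.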
After deleting a stale version directory (e.g. chromedriver/linux/x86_64/120.0/), walk upward and remove any ancestor directories that are now empty, stopping at the cache root. This avoids leaving behind empty OS/arch scaffolding after a driver or browser is pruned.
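The upward walk could look something like the sketch below. The function name `remove_empty_ancestors` and its signature are hypothetical; it assumes the version directory has already been deleted and uses only standard library calls.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// After `deleted_dir` has been removed, walks upward from its parent and
/// removes each ancestor that is now empty. The cache root itself is never removed.
pub fn remove_empty_ancestors(deleted_dir: &Path, cache_root: &Path) -> io::Result<()> {
    let mut current = deleted_dir.parent();
    while let Some(dir) = current {
        // Stop at the cache root, or if the path is not inside the cache at all.
        if dir == cache_root || !dir.starts_with(cache_root) {
            break;
        }
        // A directory that still has at least one entry ends the walk.
        if fs::read_dir(dir)?.next().is_some() {
            break;
        }
        fs::remove_dir(dir)?;
        current = dir.parent();
    }
    Ok(())
}
```

Using `fs::remove_dir` rather than `fs::remove_dir_all` is a deliberate choice in this sketch: it fails on a non-empty directory, so a race with another process that writes into the directory cannot cause data loss.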
a89f7b8 to d7988b3
Are the docs up to date with this feature?
I created a PR in the Selenium doc about it: SeleniumHQ/seleniumhq.github.io#2673