Cross-load packed weight cache reuse for XNNPACK (#19988)
doggeral wants to merge 1 commit into
@doggeral has exported this pull request. If you are a Meta employee, you can view the originating Diff in D106717093.
digantdesai left a comment:
Review automatically exported from Phabricator review in Meta.
Summary:
Add cross-load reuse + multi-PTE safety to the file-backed packed weight cache (D106673663). The first PTE in a session calls `save_packed_index()` to append a trailer; subsequent process launches mmap the file and pre-populate `name_to_packed_data_metadata_` so `look_up()` hits for every saved weight and `xnn_create_runtime` skips packing entirely.

## Cache file format

```
[packed data regions]   (written by reserve_space)
[index entries]         (written by save_packed_index)
  each: name_len(4B) | name(N) | file_offset(8B) | data_size(8B)
[footer: 20 bytes]
  index_start(8B) | entry_count(4B) | magic "XPWC"(4B) | version(4B)
```

## Lifecycle invariants

- `cache_loaded_` gate: `load_packed_cache()` runs at most once per process per path. Subsequent PTE inits for the same path reopen the write fd without re-reading the trailer.
- `from_load` flag: persistent entries (loaded from trailer or promoted on save) skip `delete_packed_data` cleanup. This keeps the mmap region and metadata alive across PTE unload/reload, so the next init hits the cache instead of repacking. Without this, every PTE destroy/recreate cycle appended a fresh copy to the file (~450 MB per cycle).
- No-op save short-circuit: `save_packed_index` returns early when no new `reserve_space` happened since the last save, avoiding the mtime churn that previously made the cache file look modified on every model load.

## Multi-PTE behavior

- Multiple PTEs (or methods that don't share weights) in the same model load share one cache file. Each PTE's `reserve_space` extends the file; `finalize_for_runtime` msyncs only newly added regions; `save_packed_index` writes one trailer covering all PTEs at the end of the load.
- Sibling PTEs that opt out of the mmap path (caller passes empty `packed_cache_path`) early-return from `initialize_for_runtime` and fall through to heap allocation, without touching the singleton's PLLM state.
- Cross-model coexistence relies on caller-side discipline: only models that opt in set a non-empty cache path. Setting different non-empty paths concurrently is not supported by this singleton design.

## Caller change

`XNNPACKBackend::init` always calls `set_packed_cache_path` (with empty string for non-opted-in PTEs). This keeps the singleton path in sync with the current PTE instead of inheriting a sibling's path.

## Test Plan

```
buck2 test fbcode//executorch/backends/xnnpack/test:test_xnn_weights_cache  # 5 pass
buck2 build fbsource//xplat/executorch/backends/xnnpack:xnnpack_backendApple
buck2 build fbsource//xplat/executorch/backends/xnnpack:xnnpack_backend
buck2 build fbcode//executorch/backends/xnnpack:xnnpack_backend
```

On device (iOS Stella build, PLLM + Llama3 runner):

- Cold start: load `(1184 entries)` from cache, `reserve_mmap=0` for cached weights
- Cache file size stable at ~593 MB across PLLM unload/reload cycles
- `app_peak ~700 MB` (vs ~2.5 GB pre-fix)
- `compressed ~100 MB` (vs ~1.7 GB pre-fix)

Reviewed By: GregoryComer

Differential Revision: D106717093
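For readers who want to see the trailer layout from the "Cache file format" section in concrete terms, here is a minimal sketch of writing and validating it against an in-memory byte buffer. It is not the PR's code. `IndexEntry`, `append_trailer`, `read_trailer`, and the `version` argument are hypothetical names, and the sketch assumes host byte order for the integer fields, which the summary does not specify. Only the field widths, field order, 20-byte footer, and `"XPWC"` magic come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory form of one index entry (name_len is implied by name.size()).
struct IndexEntry {
  std::string name;
  uint64_t file_offset;
  uint64_t data_size;
};

constexpr size_t kFooterSize = 20;  // 8 + 4 + 4 + 4, per the format description
constexpr char kMagic[4] = {'X', 'P', 'W', 'C'};

// Appends the raw bytes of a fixed-width integer (host byte order; an assumption).
template <typename T>
void put(std::vector<uint8_t>& buf, T v) {
  const auto* p = reinterpret_cast<const uint8_t*>(&v);
  buf.insert(buf.end(), p, p + sizeof(T));
}

// Reads a fixed-width integer at `pos`, advancing it; fails if out of bounds.
template <typename T>
bool get(const std::vector<uint8_t>& buf, size_t& pos, T& out) {
  if (pos > buf.size() || buf.size() - pos < sizeof(T)) return false;
  std::memcpy(&out, buf.data() + pos, sizeof(T));
  pos += sizeof(T);
  return true;
}

// Appends index entries plus footer after the packed data already in `file`.
void append_trailer(std::vector<uint8_t>& file,
                    const std::vector<IndexEntry>& entries, uint32_t version) {
  const uint64_t index_start = file.size();
  for (const auto& e : entries) {
    assert(e.name.size() <= UINT32_MAX);
    put<uint32_t>(file, static_cast<uint32_t>(e.name.size()));
    file.insert(file.end(), e.name.begin(), e.name.end());
    put<uint64_t>(file, e.file_offset);
    put<uint64_t>(file, e.data_size);
  }
  put<uint64_t>(file, index_start);
  put<uint32_t>(file, static_cast<uint32_t>(entries.size()));
  file.insert(file.end(), kMagic, kMagic + 4);
  put<uint32_t>(file, version);
}

// Validates the footer and returns the index, or nullopt if anything is off.
std::optional<std::vector<IndexEntry>> read_trailer(
    const std::vector<uint8_t>& file, uint32_t expected_version) {
  if (file.size() < kFooterSize) return std::nullopt;
  const size_t footer_start = file.size() - kFooterSize;
  size_t pos = footer_start;
  uint64_t index_start = 0;
  uint32_t count = 0, version = 0;
  get(file, pos, index_start);
  get(file, pos, count);
  const bool magic_ok = std::memcmp(file.data() + pos, kMagic, 4) == 0;
  pos += 4;
  get(file, pos, version);
  if (!magic_ok || version != expected_version || index_start > footer_start)
    return std::nullopt;

  std::vector<IndexEntry> entries;
  pos = static_cast<size_t>(index_start);
  for (uint32_t i = 0; i < count; ++i) {
    IndexEntry e;
    uint32_t name_len = 0;
    if (!get(file, pos, name_len)) return std::nullopt;
    if (pos > footer_start || footer_start - pos < name_len) return std::nullopt;
    e.name.assign(reinterpret_cast<const char*>(file.data() + pos), name_len);
    pos += name_len;
    if (!get(file, pos, e.file_offset) || !get(file, pos, e.data_size))
      return std::nullopt;
    if (pos > footer_start) return std::nullopt;  // entry ran into the footer
    // Packed data must lie entirely before the index (overflow-safe check).
    if (e.data_size > index_start || e.file_offset > index_start - e.data_size)
      return std::nullopt;
    entries.push_back(std::move(e));
  }
  return entries;
}
```

Putting the footer last means a reader can locate the index with a single fixed-size read from the end of the file, and a file with no trailer (or one truncated mid-write) fails the magic check and can be treated as a cold cache.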