Cross-load packed weight cache reuse for XNNPACK (#19988) by doggeral · Pull Request #19988 · pytorch/executorch

doggeral · 2026-06-03T21:47:05Z

Summary:

Add cross-load reuse + multi-PTE safety to the file-backed packed weight cache (D106673663). The first PTE in a session calls save_packed_index() to append a trailer; subsequent process launches mmap the file and pre-populate name_to_packed_data_metadata_ so look_up() hits for every saved weight and xnn_create_runtime skips packing entirely.

Cache file format

[packed data regions]                          (written by reserve_space)
[index entries]                                (written by save_packed_index)
  each: name_len(4B) | name(N) | file_offset(8B) | data_size(8B)
[footer: 20 bytes]
  index_start(8B) | entry_count(4B) | magic "XPWC"(4B) | version(4B)
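
For reviewers who want to inspect a cache file by hand, the layout above can be sketched as a small writer/reader pair. This is an illustration only, not the code in this PR: little-endian byte order, unpadded fields, and UTF-8 names are assumptions (the description does not state them), and `build_cache` / `read_index` are hypothetical helper names.

```python
import struct

MAGIC = b"XPWC"
# Assumption: little-endian, no padding. The PR description does not state byte order.
ENTRY_FIXED = struct.Struct("<QQ")   # file_offset(8B) | data_size(8B)
FOOTER = struct.Struct("<QI4sI")     # index_start(8B) | entry_count(4B) | magic(4B) | version(4B)


def build_cache(blobs, version=1):
    """Serialize {name: packed_bytes} as data regions, index entries, then footer."""
    out = bytearray()
    index = []
    for name, data in blobs.items():
        index.append((name.encode("utf-8"), len(out), len(data)))
        out += data                                  # packed data region
    index_start = len(out)
    for name, offset, size in index:
        out += struct.pack("<I", len(name)) + name   # name_len(4B) | name(N)
        out += ENTRY_FIXED.pack(offset, size)
    out += FOOTER.pack(index_start, len(index), MAGIC, version)
    return bytes(out)


def read_index(buf):
    """Return {name: (file_offset, data_size)}, or None when no valid trailer exists."""
    if len(buf) < FOOTER.size:
        return None
    footer_pos = len(buf) - FOOTER.size
    index_start, count, magic, _version = FOOTER.unpack_from(buf, footer_pos)
    if magic != MAGIC or index_start > footer_pos:
        return None
    entries, pos = {}, index_start
    for _ in range(count):
        (name_len,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        name = buf[pos:pos + name_len].decode("utf-8")
        pos += name_len
        offset, size = ENTRY_FIXED.unpack_from(buf, pos)
        pos += ENTRY_FIXED.size
        entries[name] = (offset, size)
    return entries
```

Reading the footer from the end of the file first means a file with no trailer (for example, one left behind by an interrupted first load) is rejected by the magic check before any index bytes are interpreted.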

Lifecycle invariants

cache_loaded_ gate: load_packed_cache() runs at most once per process per path. Subsequent PTE inits for the same path reopen the write fd without re-reading the trailer.
from_load flag: persistent entries (loaded from trailer or promoted on save) skip delete_packed_data cleanup. This keeps the mmap region and metadata alive across PTE unload/reload, so the next init hits the cache instead of repacking. Without this, every PTE destroy/recreate cycle appended a fresh copy to the file (~450 MB per cycle).
No-op save short-circuit: save_packed_index returns early when no new reserve_space happened since the last save, avoiding the mtime churn that previously made the cache file look modified on every model load.
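
The three invariants above can be modeled with a toy in-memory class. This is a sketch under stated assumptions, not the actual weights cache implementation: `PackedCacheModel`, the `dirty` flag, and the `trailer_writes` counter are hypothetical names introduced for illustration, and the file is simulated with plain dictionaries.

```python
class PackedCacheModel:
    """Toy model of the lifecycle invariants; not the real XNNPACK weights cache."""

    def __init__(self):
        self.file_entries = {}     # stands in for the on-disk trailer: name -> size
        self.file_size = 0         # bytes of packed data appended so far
        self.metadata = {}         # name -> {"size": int, "from_load": bool}
        self.cache_loaded = False  # gate: the trailer is read at most once
        self.dirty = False         # hypothetical flag: set by reserve_space
        self.trailer_writes = 0

    def load_packed_cache(self):
        if self.cache_loaded:
            return False           # later PTE inits skip re-reading the trailer
        for name, size in self.file_entries.items():
            self.metadata[name] = {"size": size, "from_load": True}
        self.cache_loaded = True
        return True

    def look_up(self, name):
        return name in self.metadata

    def reserve_space(self, name, size):
        self.metadata[name] = {"size": size, "from_load": False}
        self.file_size += size
        self.dirty = True

    def save_packed_index(self):
        if not self.dirty:
            return False           # no-op save: the file is left untouched
        for name, meta in self.metadata.items():
            meta["from_load"] = True          # promote entries on save
            self.file_entries[name] = meta["size"]
        self.dirty = False
        self.trailer_writes += 1
        return True

    def delete_packed_data(self, names):
        for name in names:
            meta = self.metadata.get(name)
            if meta is not None and not meta["from_load"]:
                del self.metadata[name]       # only non-persistent entries are dropped
```

Running three init/save/destroy cycles against this model with a single 450-unit weight leaves `file_size` at 450 and `trailer_writes` at 1, which mirrors the intended behavior: the second and third cycles hit `look_up` and never reach `reserve_space`.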

Multi-PTE behavior

Multiple PTEs (or methods that don't share weights) in the same model load share one cache file. Each PTE's reserve_space extends the file; finalize_for_runtime msyncs only newly added regions; save_packed_index writes one trailer covering all PTEs at the end of the load.
Sibling PTEs that opt out of the mmap path (caller passes empty packed_cache_path) early-return from initialize_for_runtime and fall through to heap allocation, without touching the singleton's PLLM state.
Cross-model coexistence relies on caller-side discipline: only models that opt in set a non-empty cache path. Setting different non-empty paths concurrently is not supported by this singleton design.
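
A rough model of the multi-PTE flow described above, assuming a single shared cache object per model load. `SharedCacheFile` and `init_pte` are hypothetical names; msync is represented only by the byte range it would cover, and no real file is touched.

```python
class SharedCacheFile:
    """Toy model of one cache file shared by several PTEs in a model load."""

    def __init__(self):
        self.size = 0              # current end of the packed data regions
        self.synced_upto = 0       # bytes below this offset count as already synced
        self.index = {}            # name -> (offset, size)
        self.trailers_written = 0

    def reserve_space(self, name, nbytes):
        offset = self.size
        self.index[name] = (offset, nbytes)
        self.size += nbytes        # each reservation extends the file
        return offset

    def finalize_for_runtime(self):
        # Report only the region added since the previous finalize.
        newly_synced = (self.synced_upto, self.size)
        self.synced_upto = self.size
        return newly_synced

    def save_packed_index(self):
        self.trailers_written += 1
        return dict(self.index)    # one trailer covering every PTE's entries


def init_pte(cache, packed_cache_path, weights):
    """Return 'heap' for an opted-out PTE, else the byte range synced for this PTE."""
    if not packed_cache_path:
        return "heap"              # early return: shared cache state is untouched
    for name, nbytes in weights.items():
        cache.reserve_space(name, nbytes)
    return cache.finalize_for_runtime()
```

In this model, two opted-in PTEs with 100 and 50 bytes of weights sync the ranges `(0, 100)` and `(100, 150)` respectively, an opted-out sibling in between changes nothing, and a single `save_packed_index` call at the end records both PTEs' entries.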

Caller change

XNNPACKBackend::init always calls set_packed_cache_path (with empty string for non-opted-in PTEs). This keeps the singleton path in sync with the current PTE instead of inheriting a sibling's path.
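
The effect of the caller change can be illustrated with a minimal stand-in for the singleton. `CacheSingleton` and `backend_init` are hypothetical, and the `always_set=False` branch is an assumption about the previous behavior (setting the path only when it is non-empty), inferred from the description rather than taken from the diff.

```python
class CacheSingleton:
    """Hypothetical stand-in for the process-wide weights cache."""

    def __init__(self):
        self.packed_cache_path = ""

    def set_packed_cache_path(self, path):
        self.packed_cache_path = path

    def uses_mmap(self):
        return bool(self.packed_cache_path)


def backend_init(singleton, pte_cache_path, always_set=True):
    # New behavior: set the path unconditionally, even when it is empty.
    # Assumed old behavior (always_set=False): set it only when non-empty.
    if always_set or pte_cache_path:
        singleton.set_packed_cache_path(pte_cache_path)
    return singleton.uses_mmap()
```

With `always_set=True`, a non-opted-in PTE initialized after an opted-in sibling resets the path to empty and takes the heap path. With the assumed old behavior, the same PTE would inherit the sibling's path and take the mmap path unintentionally.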

Test Plan

buck2 test fbcode//executorch/backends/xnnpack/test:test_xnn_weights_cache  # 5 pass
buck2 build fbsource//xplat/executorch/backends/xnnpack:xnnpack_backendApple
buck2 build fbsource//xplat/executorch/backends/xnnpack:xnnpack_backend
buck2 build fbcode//executorch/backends/xnnpack:xnnpack_backend

On device (iOS Stella build, PLLM + Llama3 runner):

Cold start: load (1184 entries) from cache, reserve_mmap=0 for cached weights
Cache file size stable at ~593 MB across PLLM unload/reload cycles
app_peak ~700 MB (vs ~2.5 GB pre-fix)
compressed ~100 MB (vs ~1.7 GB pre-fix)

Reviewed By: GregoryComer

Differential Revision: D106717093

pytorch-bot · 2026-06-03T21:47:10Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19988

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

[ROCm] MI350 CI jobs will have longer queue times due to CI migration

❌ 5 New Failures, 1 Pending

As of commit f6f3ee8 with merge base 185bd09:

NEW FAILURES - The following jobs have failed:

Cadence Build & Test / hifi-build / hifi4 (gh)
Input required and not supplied: aws-region
Cadence Build & Test / vision-build / vision (gh)
Input required and not supplied: aws-region
Lint / lintrunner (gh)
>>> Lint for backends/xnnpack/test/runtime/test_xnn_weights_cache.cpp:
pull / test-qnn-models-linux (dl3) / linux-job (gh)
RuntimeError: Command docker exec -t f19df4b77893adfee648094605af790a648ffd4a37502847c695465fea5c3dca /exec failed with exit code 92
pull / unittest-editable / windows / windows-job (gh)
examples/models/test/test_export.py::ExportTest::test_efficient_sam_export_to_executorch

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync · 2026-06-03T21:47:13Z

@doggeral has exported this pull request. If you are a Meta employee, you can view the originating Diff in D106717093.

github-actions · 2026-06-03T21:47:57Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

digantdesai

Review automatically exported from Phabricator review in Meta.

comments have been addressed.


digantdesai

Review automatically exported from Phabricator review in Meta.

addressed comments already

doggeral requested a review from digantdesai as a code owner June 3, 2026 21:47

meta-cla Bot added the CLA Signed label Jun 3, 2026

meta-codesync Bot added fb-exported meta-exported labels Jun 3, 2026

digantdesai previously requested changes Jun 4, 2026

doggeral force-pushed the export-D106717093 branch from 5b4a23c to 2ef5950 Compare June 11, 2026 20:31

doggeral requested review from GregoryComer and digantdesai June 12, 2026 20:19

GregoryComer approved these changes Jun 12, 2026

doggeral force-pushed the export-D106717093 branch from 2ef5950 to f6f3ee8 Compare June 12, 2026 21:45

doggeral had a problem deploying to cadence June 12, 2026 21:46 — with GitHub Actions Failure

meta-codesync Bot changed the title ~~Cross-load packed weight cache reuse for XNNPACK~~ Cross-load packed weight cache reuse for XNNPACK (#19988) Jun 12, 2026

doggeral force-pushed the export-D106717093 branch from f6f3ee8 to 072586f Compare June 12, 2026 23:26

doggeral had a problem deploying to cadence June 12, 2026 23:26 — with GitHub Actions Failure

digantdesai previously requested changes Jun 12, 2026

doggeral wants to merge 1 commit into pytorch:main from doggeral:export-D106717093
