
fix(exporters): write VERIFICATION.md after weight fusion, not before #240

Open
rylinjames wants to merge 1 commit into main from fix/verification-hash-after-fusion

@rylinjames

Collaborator

Summary

  • Bug: inside export_monolithic, the per-family exporter (e.g. export_smolvla_monolithic, line 661) wrote VERIFICATION.md via _write_tether_config → write_verification_report. Only after that exporter returned did export_monolithic run fuse_weights (lines 1540-1541), which atomically overwrites model.onnx with structurally different bytes. Result: every monolithic export shipped a VERIFICATION.md containing pre-fusion SHA-256 hashes, while tether validate recomputed hashes from the post-fusion files, so the cert mismatched on every run.
  • Fix: In export_monolithic, after the fuse_weights block, call write_verification_report(Path(output_dir), parity=None) a second time so the stale report is overwritten with hashes of the final post-fusion files. Wrapped in the same defensive try/except pattern used nearby. The original call inside _write_tether_config is preserved — the split-path exporters depend on it.
  • Tests: Two new cheap unit tests (no model downloads, no torch, no lerobot):
    • test_hash_freshness_after_file_mutation: creates a temp export dir with a fake model.onnx, calls write_verification_report, mutates the file, calls again, asserts the recorded SHA-256 changed and exactly matches the independently computed hash of the new bytes.
    • test_export_monolithic_calls_write_verification_after_fusion: monkeypatches the family exporter, fuse_weights, and write_verification_report to record call order; asserts write_verification_report is called at least once AFTER fuse_weights.
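The ordering bug and the fix can be reproduced in miniature. This is a sketch, not the project's code: `write_verification_report`, `fuse_weights`, `export_family`, and `report_matches_disk` are hypothetical stand-ins, and the one-line-per-file report format is an assumption (the real VERIFICATION.md layout is not shown in this PR). It only illustrates why a hash recorded before an atomic rewrite goes stale, and how a second report write after fusion restores agreement.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's current bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_verification_report(output_dir: Path, parity=None) -> None:
    """Hypothetical stand-in: one 'name sha256' line per .onnx file."""
    lines = [f"{p.name} {sha256_of(p)}" for p in sorted(output_dir.glob("*.onnx"))]
    (output_dir / "VERIFICATION.md").write_text("\n".join(lines) + "\n")


def fuse_weights(output_dir: Path) -> None:
    """Hypothetical stand-in: atomically replace model.onnx with new bytes."""
    target = output_dir / "model.onnx"
    fused = target.read_bytes() + b"<fused>"
    fd, tmp = tempfile.mkstemp(dir=output_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(fused)
    os.replace(tmp, target)  # atomic rename over the original file


def export_family(output_dir: Path) -> None:
    """Stand-in for a per-family exporter: writes the model, then the report."""
    (output_dir / "model.onnx").write_bytes(b"unfused-graph")
    write_verification_report(output_dir, parity=None)  # pre-fusion hashes


def export_monolithic(output_dir: Path, refresh_after_fusion: bool = True) -> None:
    export_family(output_dir)
    fuse_weights(output_dir)  # model.onnx changes after the report was written
    if refresh_after_fusion:
        try:
            write_verification_report(output_dir, parity=None)  # post-fusion hashes
        except Exception as exc:  # a report failure must never abort the export
            print(f"warning: could not refresh VERIFICATION.md: {exc}")


def report_matches_disk(output_dir: Path) -> bool:
    """Validator-like check: recompute each hash and compare with the report."""
    for line in (output_dir / "VERIFICATION.md").read_text().splitlines():
        name, recorded = line.split()
        if sha256_of(output_dir / name) != recorded:
            return False
    return True


with tempfile.TemporaryDirectory() as d:
    out = Path(d)
    export_monolithic(out, refresh_after_fusion=False)
    print(report_matches_disk(out))  # False: report holds the pre-fusion hash
    export_monolithic(out, refresh_after_fusion=True)
    print(report_matches_disk(out))  # True: report refreshed after fusion
```

Keeping the first write and adding a refresh, rather than moving the only call, matches the reason given above: the split-path exporters still rely on the report written inside `_write_tether_config`.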

Ordering confirmed

  • write_verification_report first called: monolithic.py:836 (inside _write_tether_config, invoked by family exporters at e.g. line 661)
  • fuse_weights called: monolithic.py:1541 (in export_monolithic, after family exporter returns)
  • New post-fusion refresh: monolithic.py:1554
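The call-order check behind test_export_monolithic_calls_write_verification_after_fusion can be approximated with `unittest.mock.patch.object`. This is an assumption-laden sketch: the real test monkeypatches monolithic.py itself, whereas here a `SimpleNamespace` named `mod`, a simplified `export_monolithic`, and the helpers `record_call_order` and `report_written_after_fusion` are hypothetical stand-ins so the example runs on its own.

```python
from pathlib import Path
from types import SimpleNamespace
from unittest import mock

# Hypothetical stand-in for the monolithic.py module; the real test patches
# attributes on the actual module instead of on this namespace.
mod = SimpleNamespace(
    export_family=lambda output_dir: None,
    fuse_weights=lambda output_dir: None,
    write_verification_report=lambda output_dir, parity=None: None,
)


def export_monolithic(output_dir: str) -> None:
    """Simplified pipeline with the post-fusion refresh described in the PR."""
    mod.export_family(output_dir)  # writes the first (pre-fusion) report
    mod.fuse_weights(Path(output_dir))
    try:
        mod.write_verification_report(Path(output_dir), parity=None)
    except Exception as exc:  # a report failure must never abort the export
        print(f"warning: could not refresh VERIFICATION.md: {exc}")


def record_call_order(output_dir: str) -> list:
    """Patch the three collaborators so each call appends its name to `calls`."""
    calls = []

    def fake_family(output_dir):
        calls.append("family_exporter")
        # Mirrors _write_tether_config writing the report inside the exporter.
        mod.write_verification_report(Path(output_dir), parity=None)

    def fake_fuse(output_dir):
        calls.append("fuse_weights")

    def fake_report(output_dir, parity=None):
        calls.append("write_verification_report")

    with mock.patch.object(mod, "export_family", fake_family), \
         mock.patch.object(mod, "fuse_weights", fake_fuse), \
         mock.patch.object(mod, "write_verification_report", fake_report):
        export_monolithic(output_dir)
    return calls


def report_written_after_fusion(calls: list) -> bool:
    """True if write_verification_report appears anywhere after fuse_weights."""
    fuse_at = calls.index("fuse_weights")
    return "write_verification_report" in calls[fuse_at + 1:]


calls = record_call_order("export_out")
print(calls)
# ['family_exporter', 'write_verification_report', 'fuse_weights', 'write_verification_report']
assert report_written_after_fusion(calls)
```

Asserting "at least once after `fuse_weights`", rather than an exact call count, lets the test tolerate the preserved pre-fusion write inside the family exporter.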

Test plan

  • pytest tests/test_verification_report.py — 7/7 pass (including new test_hash_freshness_after_file_mutation)
  • pytest tests/test_export_monolithic_model_type_fallback.py — 18/18 pass (including new test_export_monolithic_calls_write_verification_after_fusion)
  • py_compile on monolithic.py — clean
  • ruff check on touched files — clean (pre-existing F841 at line 505 is in main, not introduced here)
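A self-contained approximation of test_hash_freshness_after_file_mutation follows. The markdown-table report format, the `recorded_sha256` parser, and the stand-in `write_verification_report` are assumptions made for illustration; the real test exercises the project's own function. The test takes a `tmp_path` argument so pytest can inject its fixture, and the `__main__` block runs it without pytest.

```python
import hashlib
import re
import tempfile
from pathlib import Path


def write_verification_report(output_dir: Path, parity=None) -> None:
    """Hypothetical stand-in: a markdown table of SHA-256 hashes per file."""
    rows = ["| File | SHA-256 |", "|---|---|"]
    for p in sorted(output_dir.iterdir()):
        if p.is_file() and p.name != "VERIFICATION.md":
            rows.append(f"| {p.name} | {hashlib.sha256(p.read_bytes()).hexdigest()} |")
    (output_dir / "VERIFICATION.md").write_text("\n".join(rows) + "\n")


def recorded_sha256(output_dir: Path, name: str) -> str:
    """Pull the hash recorded for `name` out of VERIFICATION.md."""
    text = (output_dir / "VERIFICATION.md").read_text()
    pattern = rf"^\| {re.escape(name)} \| ([0-9a-f]{{64}}) \|$"
    match = re.search(pattern, text, re.MULTILINE)
    if match is None:
        raise KeyError(name)
    return match.group(1)


def test_hash_freshness_after_file_mutation(tmp_path: Path) -> None:
    model = tmp_path / "model.onnx"
    model.write_bytes(b"pre-fusion bytes")
    write_verification_report(tmp_path, parity=None)
    before = recorded_sha256(tmp_path, "model.onnx")

    model.write_bytes(b"post-fusion bytes")  # simulate fusion rewriting the file
    write_verification_report(tmp_path, parity=None)
    after = recorded_sha256(tmp_path, "model.onnx")

    # The report must track the new bytes, not the stale ones.
    assert after != before
    assert after == hashlib.sha256(b"post-fusion bytes").hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        test_hash_freshness_after_file_mutation(Path(d))
        print("ok")
```

Comparing against an independently computed digest, not just "the hash changed", catches a report that is rewritten but still hashes the wrong bytes.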

🤖 Generated with Claude Code

- Root cause: the per-family exporter invoked by `export_monolithic`
  (e.g. `export_smolvla_monolithic`, line 661) called `_write_tether_config`,
  which calls `write_verification_report` and SHA-256-hashes every file.
  Only after that exporter returned did `export_monolithic` run
  `fuse_weights` (lines 1540-1541), which atomically overwrites `model.onnx`
  with structurally different bytes. Result: VERIFICATION.md recorded
  pre-fusion hashes; `tether validate` recomputed post-fusion hashes →
  cert mismatch on every monolithic export.

- Fix: after the `fuse_weights` block in `export_monolithic`, call
  `write_verification_report(Path(output_dir), parity=None)` again so the
  stale report is overwritten with hashes of the now-final post-fusion
  files. Wrapped in the same try/except pattern used nearby so a report
  failure can never abort the export. The original call inside
  `_write_tether_config` is intentionally preserved because other exporters
  (split path `export_smolvla`, etc.) depend on it.

- Tests added:
  - `test_hash_freshness_after_file_mutation` in test_verification_report.py:
    creates a temp dir with a fake model.onnx, calls write_verification_report,
    mutates the file, calls again, asserts the recorded SHA-256 changed and
    matches the independently computed hash of the new bytes.
  - `test_export_monolithic_calls_write_verification_after_fusion` in
    test_export_monolithic_model_type_fallback.py: monkeypatches the family
    exporter, fuse_weights, and write_verification_report to track call order;
    asserts write_verification_report is called at least once AFTER fuse_weights.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>