fix: cast initial_basis to long in 07_iterative_nqs_dci scripts (fixes #40)#41
fix: cast initial_basis to long in 07_iterative_nqs_dci scripts (fixes #40)#41thc1006 wants to merge 1 commit intoQuantumNoLab:mainfrom
Conversation
`FlowGuidedKrylovPipeline.extract_and_select_basis()` returns a float32
tensor (cloned from the NF trainer's `accumulated_basis`, which lives in
float for gradient tracking). But `run_hi_nqs_sqd(initial_basis=...)` and
`run_hi_nqs_skqd(initial_basis=...)` strictly validate that
`initial_basis` is integer/bool dtype (binary occupation vectors) and
raise ValueError on float.
This caused both `iter_nqs_dci_sqd.py` and `iter_nqs_dci_krylov_classical.py`
to fail end-to-end with:
ValueError: initial_basis must be integer or bool dtype
(binary occupations), got torch.float32
at the `run_hi_nqs_sqd`/`run_hi_nqs_skqd` call site.
Interestingly, the parallel scripts in group 08 (iter_nqs_dci_pt2_*) work
because their basis is passed through `expand_basis_via_connections()`
first, which happens to cast internally. The 007 scripts do not have that
intermediate step, so they hit the bug raw.
Fix: add `.long()` cast to the `basis` result in the two affected scripts,
with a short comment explaining why. Both scripts now reach chemical
accuracy on H2:
07_iterative_nqs_dci/iter_nqs_dci_sqd.py h2 cpu 11.9s 0.0 mHa
07_iterative_nqs_dci/iter_nqs_dci_krylov_classical h2 cpu 9.3s 0.0 mHa
Scope:
- Surgical script-level fix (2 files, 2-line changes each including
comment). Does not change library-level API contracts.
- The 3rd script in the group (`iter_nqs_dci_krylov_quantum.py`) does not
use `initial_basis` at all — unaffected.
- Related architectural issue (float32 basis at API boundary with int-only
validator) could eventually be fixed by either (a) making
`extract_and_select_basis` return long, or (b) making
`validate_initial_basis` lenient for float {0,1} values. Out of scope
for this minimal fix.
Discovered via end-to-end smoke test on `refactor/pipeline-catalog` branch
(PR QuantumNoLab#39). Verified that the failure is pre-existing on main before writing
this fix.
There was a problem hiding this comment.
Pull request overview
Fixes an end-to-end runtime failure in the 07_iterative_nqs_dci experiment scripts by ensuring the NF/DCI-selected basis is cast to an integer dtype before being passed as initial_basis to HI+NQS runners that strictly validate integer/bool occupations.
Changes:
- Cast
pipeline.extract_and_select_basis()output totorch.longiniter_nqs_dci_sqd.pybefore callingrun_hi_nqs_sqd(initial_basis=...). - Cast
pipeline.extract_and_select_basis()output totorch.longiniter_nqs_dci_krylov_classical.pybefore callingrun_hi_nqs_skqd(initial_basis=...). - Add short inline comments documenting the dtype mismatch and why the cast is required.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| experiments/pipelines/07_iterative_nqs_dci/iter_nqs_dci_sqd.py | Cast extracted basis to long so run_hi_nqs_sqd(initial_basis=...) passes dtype validation. |
| experiments/pipelines/07_iterative_nqs_dci/iter_nqs_dci_krylov_classical.py | Cast extracted basis to long so run_hi_nqs_skqd(initial_basis=...) passes dtype validation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. Verified:
🤖 Generated with Claude Code |
|
@copilot resolve the merge conflicts in this pull request make sure follow rela best practice. |
Summary
Minimal surgical fix for #40: add
.long()cast to thebasistensor before passing it asinitial_basisin the two affected 07_iterative_nqs_dci scripts.Closes #40.
Affected files (2)
experiments/pipelines/07_iterative_nqs_dci/iter_nqs_dci_sqd.pyexperiments/pipelines/07_iterative_nqs_dci/iter_nqs_dci_krylov_classical.pyDiff summary
Each file gets one extra
.long()call + a 4-line explanatory comment:Total: 10 insertions, 2 deletions across 2 files. No library changes, no API changes.
Why this scope
See issue #40 for the full root-cause analysis. Short version:
extract_and_select_basis()returns float32 (cloned from NF trainer's accumulated_basis), but the downstreamrun_hi_nqs_sqd/run_hi_nqs_skqdstrictly validate integer/bool dtype. 008's sister scripts work becauseexpand_basis_via_connections()happens to cast internally; 007 doesn't have that intermediate step.Two cleaner long-term architectural fixes (cast at the library source, or relax the validator) are noted in the issue as out-of-scope. This PR is the minimum unblocking patch.
Test plan
Manual end-to-end verification on H2/CPU:
ValueError: initial_basis must be integer or bool dtype ... got torch.float32iter_nqs_dci_sqd.py h2 --device cpu→ Error 0.0 mHa, 11.88 siter_nqs_dci_krylov_classical.py h2 --device cpu→ Error 0.0 mHa, 9.32 sruff check experiments/pipelines/07_iterative_nqs_dci/— cleanruff format --check— cleanextract_and_select_basisor lenient validator)Relationship to PR #39
PR #39 (
refactor/pipeline-catalog) renames07_iterative_nqs_dci/→007_iterative_nqs_dci/viagit mv. This PR modifies file content (not paths) in07_iterative_nqs_dci/. The two PRs touch disjoint aspects of the same files, so git's rename detection should handle the rebase cleanly regardless of merge order.Recommended merge order:
Either order works.