Improvements to show, and new show from disc#4911
Conversation
| def evict(self) -> None: | ||
| """Release the loaded CTerm from memory.""" | ||
| self._cterm = None | ||
|
|
There was a problem hiding this comment.
I don't actually see where this is used, how does this work? I assume somehow this gets called whenever we're done printing a given node?
There was a problem hiding this comment.
I intended to use it in pretty_segments, but missed it. I will address.
| def __eq__(self, other: object) -> bool: | ||
| if isinstance(other, LazyNode): | ||
| return self.id == other.id | ||
| # Also compare with real KCFG.Node | ||
| return hasattr(other, 'id') and self.id == other.id |
There was a problem hiding this comment.
Would this be equivalent anyway?
| def __eq__(self, other: object) -> bool: | |
| if isinstance(other, LazyNode): | |
| return self.id == other.id | |
| # Also compare with real KCFG.Node | |
| return hasattr(other, 'id') and self.id == other.id | |
| def __eq__(self, other: object) -> bool: | |
| return hasattr(other, 'id') and self.id == other.id |
Or perhaps the tighter:
| def __eq__(self, other: object) -> bool: | |
| if isinstance(other, LazyNode): | |
| return self.id == other.id | |
| # Also compare with real KCFG.Node | |
| return hasattr(other, 'id') and self.id == other.id | |
| def __eq__(self, other: object) -> bool: | |
| if isinstance(other, LazyNode): | |
| return self.id == other.id | |
| # Also compare with real KCFG.Node | |
| return isinstance(other, KCFG.Node) and self.id == other.id |
There was a problem hiding this comment.
I like the second one. I think it captures the intention (only LazyNode or Node) better.
|
Does it make sense to just have lazy-loading be the default anyway, and build it into the main KCFG class? |
|
@claude can you review this PR? And check if |
@ehildenb I think that there is a great reason to build it into the main KCFG class. One of the large proofs I was running failed while it was running (not after completion, at the printing stage). While writing this, I found that this is due to the nodes are never released from memory while a proof is running (even though the nodes' json files are created while the proof runs). Having a LazyNode would help in that, by enabling evicting nodes' CTerms and reloading them if ever needed. I did not do this, as it was a much larger change, affecting not only the printing. I am happy to adapt this PR to do this. |
|
I think it's a good idea overall, but we should probably check in with @tothtamas28 about it. I would prefer not to have two different classes with one duck-typing the other, if we can get the benefit by just making lazy loading the default, that would be great. Tamas should be back next week to provide his opinion. |
|
I should be able to just edit the Node class, no additional class. |
|
If we have the laziness built-in by default, do we need the |
|
I'm also wondering now if this is a good idea. We have to store the Currently, a |
I think no, we would not need the stubs then. I removed one so far, for some of the current changes. |
Right, I did need to add |
|
@ehildenb The issue is that, with the lazy approach, in order to find a node's |
Memory-efficient proof display: streaming show and lazy loading
Motivation
Displaying large proofs with
kmir showcaused out-of-memory failures. A completed 1411-node proof required 23.8 GB of memory just to print its output, after the proof itself had already finished successfully.Investigation
The initial hypothesis was that the memory cost came from joining the entire output into a single string before printing (
'\n'.join(lines)). Streaming the output line-by-line (via a generator) eliminated this duplication, but peak memory remained at 23.8 GB. The real cost was loading the entire proof into memory, parsing all node CTerms and constraint data from JSON into Python K-term objects, with ~6× memory overhead from Python object representation.Changes
This PR introduces two improvements:
1. Generator-based
show_iterKCFGShow.show_iterandAPRProofShow.show_iteryield lines one at a time instead of collecting them into a list. The existingshowmethods are reimplemented aslist(self.show_iter(...))to avoid code duplication. Tests confirm same output.2. Lazy-loading
show_iter_from_diskAPRProofShow.show_iter_from_diskdisplays a proof without loading the full proof into memory:LazyNode— duck-typesKCFG.Node. Stores node ID, attrs, and a file path. Loads the CTerm fromnodes/{id}.jsononly when.ctermis accessed for printing.LazyCSubst— duck-typesCSubst. Holds the raw JSON dict and defers parsing into K-term objects until.constraintsor.substis accessed.APRProofStub— duck-typesAPRProof. Holds proof metadata (init, target, terminal, bounded, refutations) fromproof.jsonand answers proof-level queries (is_init,is_target,is_terminal, etc.) without loading the KCFG.These stubs are passed into the real
KCFGdata structures (edges, splits, covers, ndbranches accept them via duck typing) and the existingpretty_segmentstraversal works unchanged, with no reimplementation or duplication of the tree-drawing logic.Loading is integrated into the existing code path:
KCFG.from_dictaccepts alazyparameter that controls whether nodes are created asLazyNodestubs and covers useLazyCSubstwrappers.KCFGStore.read_cfg_data_lazyreadskcfg.jsonwithout loading node JSON files.KCFG.read_cfg_data_lazyprovides the entry point, combining the two.Results
Tested on a 1411-node p-token proof (
burn_multisig_n1):show_iter(full load)show_iter_from_disk(lazy)74% memory reduction, 13.5× faster, identical output.
Testing
show_iter_from_diskoutput is compared againstshowoutput in three existingtest_imp.pytest functions (all tests that write proofs to disk). Both default and full-printer modes are exercised.APRProofNodePrinter(proof-level attrs: init, target, terminal).