Skip to content

Improvements to show, and new show from disc#4911

Draft
mariaKt wants to merge 10 commits intodevelopfrom
mk/show-from-disk-2
Draft

Improvements to show, and new show from disc#4911
mariaKt wants to merge 10 commits intodevelopfrom
mk/show-from-disk-2

Conversation

@mariaKt
Copy link
Copy Markdown
Collaborator

@mariaKt mariaKt commented Apr 20, 2026

Memory-efficient proof display: streaming show and lazy loading

Motivation

Displaying large proofs with kmir show caused out-of-memory failures. A completed 1411-node proof required 23.8 GB of memory just to print its output, after the proof itself had already finished successfully.

Investigation

The initial hypothesis was that the memory cost came from joining the entire output into a single string before printing ('\n'.join(lines)). Streaming the output line-by-line (via a generator) eliminated this duplication, but peak memory remained at 23.8 GB. The real cost was loading the entire proof into memory, parsing all node CTerms and constraint data from JSON into Python K-term objects, with ~6× memory overhead from Python object representation.

Changes

This PR introduces two improvements:

1. Generator-based show_iter

KCFGShow.show_iter and APRProofShow.show_iter yield lines one at a time instead of collecting them into a list. The existing show methods are reimplemented as list(self.show_iter(...)) to avoid code duplication. Tests confirm same output.

2. Lazy-loading show_iter_from_disk

APRProofShow.show_iter_from_disk displays a proof without loading the full proof into memory:

  • LazyNode — duck-types KCFG.Node. Stores node ID, attrs, and a file path. Loads the CTerm from nodes/{id}.json only when .cterm is accessed for printing.
  • LazyCSubst — duck-types CSubst. Holds the raw JSON dict and defers parsing into K-term objects until .constraints or .subst is accessed.
  • APRProofStub — duck-types APRProof. Holds proof metadata (init, target, terminal, bounded, refutations) from proof.json and answers proof-level queries (is_init, is_target, is_terminal, etc.) without loading the KCFG.

These stubs are passed into the real KCFG data structures (edges, splits, covers, ndbranches accept them via duck typing) and the existing pretty_segments traversal works unchanged, with no reimplementation or duplication of the tree-drawing logic.

Loading is integrated into the existing code path: KCFG.from_dict accepts a lazy parameter that controls whether nodes are created as LazyNode stubs and covers use LazyCSubst wrappers. KCFGStore.read_cfg_data_lazy reads kcfg.json without loading node JSON files. KCFG.read_cfg_data_lazy provides the entry point, combining the two.

Results

Tested on a 1411-node p-token proof (burn_multisig_n1):

show_iter (full load) show_iter_from_disk (lazy)
Peak memory 23.8 GB 6.3 GB
Time 4263s (~71 min) 315s (~5 min)
Output lines 10,221 10,221

74% memory reduction, 13.5× faster, identical output.

Testing

  • show_iter_from_disk output is compared against show output in three existing test_imp.py test functions (all tests that write proofs to disk). Both default and full-printer modes are exercised.
  • Manually verified output match on external proofs with APRProofNodePrinter (proof-level attrs: init, target, terminal).
  • All 98 existing pyk show-related tests pass.

@mariaKt mariaKt marked this pull request as ready for review April 21, 2026 17:37
Comment thread pyk/src/pyk/kcfg/lazy.py Outdated
Comment on lines +44 to +47
def evict(self) -> None:
"""Release the loaded CTerm from memory."""
self._cterm = None

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't actually see where this is used, how does this work? I assume somehow this gets called whenever we're done printing a given node?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intended to use it in pretty_segments, but missed it. I will address.

Comment thread pyk/src/pyk/kcfg/lazy.py Outdated
Comment on lines +48 to +52
def __eq__(self, other: object) -> bool:
if isinstance(other, LazyNode):
return self.id == other.id
# Also compare with real KCFG.Node
return hasattr(other, 'id') and self.id == other.id
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be equivalent anyway?

Suggested change
def __eq__(self, other: object) -> bool:
if isinstance(other, LazyNode):
return self.id == other.id
# Also compare with real KCFG.Node
return hasattr(other, 'id') and self.id == other.id
def __eq__(self, other: object) -> bool:
return hasattr(other, 'id') and self.id == other.id

Or perhaps the tighter:

Suggested change
def __eq__(self, other: object) -> bool:
if isinstance(other, LazyNode):
return self.id == other.id
# Also compare with real KCFG.Node
return hasattr(other, 'id') and self.id == other.id
def __eq__(self, other: object) -> bool:
if isinstance(other, LazyNode):
return self.id == other.id
# Also compare with real KCFG.Node
return isinstance(other, KCFG.Node) and self.id == other.id

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the second one. I think it captures the intention (only LazyNode or Node) better.

@ehildenb
Copy link
Copy Markdown
Member

ehildenb commented Apr 22, 2026

Does it make sense to just have lazy-loading be the default anyway, and build it into the main KCFG class?

@ehildenb
Copy link
Copy Markdown
Member

@claude can you review this PR? And check if evict is actually used, and whether it makes sense to just have lazy loading be the default for KCFGs?

@mariaKt
Copy link
Copy Markdown
Collaborator Author

mariaKt commented Apr 22, 2026

Does it make sense to just have lazy-loading be the default anyway, and build it into the main KCFG class?

@ehildenb I think that there is a great reason to build it into the main KCFG class. One of the large proofs I was running failed while it was running (not after completion, at the printing stage). While writing this, I found that this is due to the nodes are never released from memory while a proof is running (even though the nodes' json files are created while the proof runs). Having a LazyNode would help in that, by enabling evicting nodes' CTerms and reloading them if ever needed.

I did not do this, as it was a much larger change, affecting not only the printing. I am happy to adapt this PR to do this.

@ehildenb
Copy link
Copy Markdown
Member

I think it's a good idea overall, but we should probably check in with @tothtamas28 about it. I would prefer not to have two different classes with one duck-typing the other, if we can get the benefit by just making lazy loading the default, that would be great.

Tamas should be back next week to provide his opinion.

@mariaKt
Copy link
Copy Markdown
Collaborator Author

mariaKt commented Apr 22, 2026

I should be able to just edit the Node class, no additional class.

@mariaKt mariaKt marked this pull request as draft April 22, 2026 21:03
@ehildenb
Copy link
Copy Markdown
Member

If we have the laziness built-in by default, do we need the APRProofStub classes for example? Or do we get the benefit even with the main classes?

@ehildenb
Copy link
Copy Markdown
Member

I'm also wondering now if this is a good idea. We have to store the _node_path directly on the Node object, which is not how the data-model was done before. Before, we had clear separation between the storage on disk and the data representation itself. I think it should be fine, but for example is the KCFGStore class correctly handling these _node_path now too?

Currently, a KCFG already has a KCFGStore, so it already has the paths needed for loading/unloading from disk, can we reuse that infrastructure somehow when doing the lazy node loading?

@mariaKt
Copy link
Copy Markdown
Collaborator Author

mariaKt commented Apr 23, 2026

If we have the laziness built-in by default, do we need the APRProofStub classes for example? Or do we get the benefit even with the main classes?

I think no, we would not need the stubs then. I removed one so far, for some of the current changes.

@mariaKt
Copy link
Copy Markdown
Collaborator Author

mariaKt commented Apr 23, 2026

I'm also wondering now if this is a good idea. We have to store the _node_path directly on the Node object, which is not how the data-model was done before. Before, we had clear separation between the storage on disk and the data representation itself. I think it should be fine, but for example is the KCFGStore class correctly handling these _node_path now too?

Currently, a KCFG already has a KCFGStore, so it already has the paths needed for loading/unloading from disk, can we reuse that infrastructure somehow when doing the lazy node loading?

Right, I did need to add node_path to Node. I edited the KCFGStore (read_cfg_data and write_cfg_data) to propagate these paths, but I don't think this is what you might mean by "correctly handling these _node_path". I can take a closer look at the KCFGStore and see how to reuse that information instead, thanks for pointing it out.

@mariaKt
Copy link
Copy Markdown
Collaborator Author

mariaKt commented Apr 23, 2026

@ehildenb The issue is that, with the lazy approach, in order to find a node's CTerm I need to
potentially read the node's json file. If not an explicit path, I need to be able to refer to the store, or the KCFG,
so I definitely need some kind of access in that direction. claude also suggested a callback, which I hadn't considered. Is there an option from these, or another that you prefer?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants