Improvements to show, and new show from disc by mariaKt · Pull Request #4911 · runtimeverification/k

mariaKt · 2026-04-20T23:36:13Z

Memory-efficient proof display: streaming show and lazy loading

Motivation

Displaying large proofs with kmir show caused out-of-memory failures. A completed 1411-node proof required 23.8 GB of memory just to print its output, after the proof itself had already finished successfully.

Investigation

The initial hypothesis was that the memory cost came from joining the entire output into a single string before printing ('\n'.join(lines)). Streaming the output line-by-line (via a generator) eliminated this duplication, but peak memory remained at 23.8 GB. The real cost was loading the entire proof into memory, parsing all node CTerms and constraint data from JSON into Python K-term objects, with ~6× memory overhead from Python object representation.

Changes

This PR introduces two improvements:

1. Generator-based show_iter

KCFGShow.show_iter and APRProofShow.show_iter yield lines one at a time instead of collecting them into a list. The existing show methods are reimplemented as list(self.show_iter(...)) to avoid code duplication. Tests confirm same output.

2. Lazy-loading show_iter_from_disk

APRProofShow.show_iter_from_disk displays a proof without loading the full proof into memory:

LazyNode — duck-types KCFG.Node. Stores node ID, attrs, and a file path. Loads the CTerm from nodes/{id}.json only when .cterm is accessed for printing.
LazyCSubst — duck-types CSubst. Holds the raw JSON dict and defers parsing into K-term objects until .constraints or .subst is accessed.
APRProofStub — duck-types APRProof. Holds proof metadata (init, target, terminal, bounded, refutations) from proof.json and answers proof-level queries (is_init, is_target, is_terminal, etc.) without loading the KCFG.

These stubs are passed into the real KCFG data structures (edges, splits, covers, ndbranches accept them via duck typing) and the existing pretty_segments traversal works unchanged, with no reimplementation or duplication of the tree-drawing logic.

Loading is integrated into the existing code path: KCFG.from_dict accepts a lazy parameter that controls whether nodes are created as LazyNode stubs and covers use LazyCSubst wrappers. KCFGStore.read_cfg_data_lazy reads kcfg.json without loading node JSON files. KCFG.read_cfg_data_lazy provides the entry point, combining the two.

Results

Tested on a 1411-node p-token proof (burn_multisig_n1):

	`show_iter` (full load)	`show_iter_from_disk` (lazy)
Peak memory	23.8 GB	6.3 GB
Time	4263s (~71 min)	315s (~5 min)
Output lines	10,221	10,221

74% memory reduction, 13.5× faster, identical output.

Testing

show_iter_from_disk output is compared against show output in three existing test_imp.py test functions (all tests that write proofs to disk). Both default and full-printer modes are exercised.
Manually verified output match on external proofs with APRProofNodePrinter (proof-level attrs: init, target, terminal).
All 98 existing pyk show-related tests pass.

ehildenb · 2026-04-22T00:53:18Z

+    def evict(self) -> None:
+        """Release the loaded CTerm from memory."""
+        self._cterm = None
+


I don't actually see where this is used, how does this work? I assume somehow this gets called whenever we're done printing a given node?

I intended to use it in pretty_segments, but missed it. I will address.

ehildenb · 2026-04-22T02:25:20Z

+    def __eq__(self, other: object) -> bool:
+        if isinstance(other, LazyNode):
+            return self.id == other.id
+        # Also compare with real KCFG.Node
+        return hasattr(other, 'id') and self.id == other.id


Would this be equivalent anyway?

Suggested change

def __eq__(self, other: object) -> bool:

if isinstance(other, LazyNode):

return self.id == other.id

# Also compare with real KCFG.Node

return hasattr(other, 'id') and self.id == other.id

def __eq__(self, other: object) -> bool:

return hasattr(other, 'id') and self.id == other.id

Or perhaps the tighter:

Suggested change

def __eq__(self, other: object) -> bool:

if isinstance(other, LazyNode):

return self.id == other.id

# Also compare with real KCFG.Node

return hasattr(other, 'id') and self.id == other.id

def __eq__(self, other: object) -> bool:

if isinstance(other, LazyNode):

return self.id == other.id

# Also compare with real KCFG.Node

return isinstance(other, KCFG.Node) and self.id == other.id

I like the second one. I think it captures the intention (only LazyNode or Node) better.

ehildenb · 2026-04-22T02:26:05Z

Does it make sense to just have lazy-loading be the default anyway, and build it into the main KCFG class?

ehildenb · 2026-04-22T02:37:58Z

@claude can you review this PR? And check if evict is actually used, and whether it makes sense to just have lazy loading be the default for KCFGs?

mariaKt · 2026-04-22T20:05:12Z

Does it make sense to just have lazy-loading be the default anyway, and build it into the main KCFG class?

@ehildenb I think that there is a great reason to build it into the main KCFG class. One of the large proofs I was running failed while it was running (not after completion, at the printing stage). While writing this, I found that this is due to the nodes are never released from memory while a proof is running (even though the nodes' json files are created while the proof runs). Having a LazyNode would help in that, by enabling evicting nodes' CTerms and reloading them if ever needed.

I did not do this, as it was a much larger change, affecting not only the printing. I am happy to adapt this PR to do this.

ehildenb · 2026-04-22T20:45:13Z

I think it's a good idea overall, but we should probably check in with @tothtamas28 about it. I would prefer not to have two different classes with one duck-typing the other, if we can get the benefit by just making lazy loading the default, that would be great.

Tamas should be back next week to provide his opinion.

mariaKt · 2026-04-22T20:55:35Z

I should be able to just edit the Node class, no additional class.

ehildenb · 2026-04-23T00:07:06Z

If we have the laziness built-in by default, do we need the APRProofStub classes for example? Or do we get the benefit even with the main classes?

ehildenb · 2026-04-23T00:09:24Z

I'm also wondering now if this is a good idea. We have to store the _node_path directly on the Node object, which is not how the data-model was done before. Before, we had clear separation between the storage on disk and the data representation itself. I think it should be fine, but for example is the KCFGStore class correctly handling these _node_path now too?

Currently, a KCFG already has a KCFGStore, so it already has the paths needed for loading/unloading from disk, can we reuse that infrastructure somehow when doing the lazy node loading?

mariaKt · 2026-04-23T00:12:09Z

If we have the laziness built-in by default, do we need the APRProofStub classes for example? Or do we get the benefit even with the main classes?

I think no, we would not need the stubs then. I removed one so far, for some of the current changes.

mariaKt · 2026-04-23T00:21:04Z

I'm also wondering now if this is a good idea. We have to store the _node_path directly on the Node object, which is not how the data-model was done before. Before, we had clear separation between the storage on disk and the data representation itself. I think it should be fine, but for example is the KCFGStore class correctly handling these _node_path now too?

Currently, a KCFG already has a KCFGStore, so it already has the paths needed for loading/unloading from disk, can we reuse that infrastructure somehow when doing the lazy node loading?

Right, I did need to add node_path to Node. I edited the KCFGStore (read_cfg_data and write_cfg_data) to propagate these paths, but I don't think this is what you might mean by "correctly handling these _node_path". I can take a closer look at the KCFGStore and see how to reuse that information instead, thanks for pointing it out.

mariaKt · 2026-04-23T00:38:05Z

@ehildenb The issue is that, with the lazy approach, in order to find a node's CTerm I need to
potentially read the node's json file. If not an explicit path, I need to be able to refer to the store, or the KCFG,
so I definitely need some kind of access in that direction. claude also suggested a callback, which I hadn't considered. Is there an option from these, or another that you prefer?

mariaKt added 6 commits April 17, 2026 17:00

Added lazy (generator based) show_iter. show unchanged for now.

c008426

show now use show_iter (materialize generator)

cc8cb7e

stubs for kcfg needed types

0556f90

Testing for show_iter_from_disc

34fc00f

Removed duplication, instead edit from_dict with lazy param

0275f6a

Code quality

57ddca4

mariaKt marked this pull request as ready for review April 21, 2026 17:37

mariaKt requested review from ehildenb, palinatolmach and tothtamas28 April 21, 2026 17:37

ehildenb reviewed Apr 22, 2026

View reviewed changes

mariaKt marked this pull request as draft April 22, 2026 21:03

mariaKt added 4 commits April 22, 2026 16:51

LazyNode (dyck-typing Node) removed, integrated into Node

44827c3

Write node path to node in write_cfg_data

de0582c

Allow evicting nodes duting a proof (for new proofs)

17e384e

Allow evicting nodes during a proof (for resumed proofs)

8acd76e

Conversation

mariaKt commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Memory-efficient proof display: streaming show and lazy loading

Motivation

Investigation

Changes

Results

Testing

Uh oh!

ehildenb Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

mariaKt Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

ehildenb Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

mariaKt Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

ehildenb commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ehildenb commented Apr 22, 2026

Uh oh!

mariaKt commented Apr 22, 2026

Uh oh!

ehildenb commented Apr 22, 2026

Uh oh!

mariaKt commented Apr 22, 2026

Uh oh!

ehildenb commented Apr 23, 2026

Uh oh!

ehildenb commented Apr 23, 2026

Uh oh!

mariaKt commented Apr 23, 2026

Uh oh!

mariaKt commented Apr 23, 2026

Uh oh!

mariaKt commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mariaKt commented Apr 20, 2026 •

edited

Loading

ehildenb commented Apr 22, 2026 •

edited

Loading

mariaKt commented Apr 23, 2026 •

edited

Loading