Performance (1/5): Matching can now partially evaluate taclets by unp1 · Pull Request #3831 · KeYProject/key

unp1 · 2026-06-15T04:21:26Z

Disclaimer: Framework architecture and design by me, coding by AI

This is the first PR of a performance oriented PR series for 3.1

📖 Developer docs: Performance Optimizations (3.1) — the compiled matcher

This PR generalizes our currently interpreter based framework with a more compiler like one. The matching logic is specialised for each taclet avoiding expensive navigation via a cursor.

The "compiled" version is amenable for the normal JVM hotspot to be optimized. No separate compilation to bytecode at the moment (usefulness has to be evaluated)

Features:

no longer separate matching logic for programs and terms, uses our unified ncore-SyntaxElement navigation
still one single source of truth: one has (as long as one want not to step outside the framework) only to provide
add one new instruction (if matching should be extended for a new construct) using a builder based framework. This add interpreter and compiler support at the same time.
Default is now the compiled matcher. The legacy interpreter is opt-out (Settings → Feature Flags: MATCHER_INTERPRETER, or -Dkey.matcher.interpreter=true; reload the proof to apply) and remains the automatic fallback for patterns the compiler does not handle
Compilation at the moment not to bytecode. We compile each taclet to a partially evaluated and unrolled respresentation, then the normal JVM (which runs KeY) can run and optimize it. Whether a separate compile step to actual bytecode is benificial has to be evaluated. But the introduced interface support this scenario for future PRs
Generalizes to ncore for other backends as drop-in replacement (work in progress)
Extensively tested (differential testing with 4.8M comparison points, generates byte-identical proofs for our runAllFunProof collection.

Note: This does not provide a performance boost by itself. But is the necessary foundation for the next PRs that tackle other part of the match -> evaluate -> apply pipeline.

Isolated find-matcher benchmark: interpreter vs. partial evaluated

Matching time over KeY's FOL taclet base (bundled TPTP PUZ/SYN/SET problems),
measured on the unified match-plan framework. Both back-ends are built by
JavaMatchPlanBuilder; speedup = interpreter ÷ compiled.

Problem	Find-taclets	Corpus	main interp (ms)	framework interp (ms)	framework compiled (ms)	compiled vs main	fw-interp vs main
PUZ001p1	32	287	8.08	7.99	0.98	8.24×	1.01×
PUZ001p1-eq	32	155	4.66	4.67	0.71	6.54×	1.00×
PUZ031p1	32	259	7.39	7.34	0.93	7.97×	1.01×
PUZ047p1	32	289	8.24	8.22	1.00	8.27×	1.00×
SYN002m1.007.008	32	78	2.26	2.26	0.32	7.02×	1.00×
SYN007p1.014	32	102	2.88	2.88	0.44	6.47×	1.00×
SYN036p2	32	78	2.28	2.33	0.33	6.83×	0.98×
SYN548p1	32	108	3.26	3.23	0.41	7.87×	1.01×
SYN550p1	32	102	3.03	3.01	0.40	7.50×	1.00×
SET027p3	32	81	2.43	2.40	0.33	7.34×	1.01×
SET044p1	32	69	2.11	2.06	0.29	7.20×	1.02×
SET045p1	32	69	2.44	2.07	0.29	8.34×	1.18×
SET062p3	32	87	2.92	2.87	0.35	8.26×	1.02×
SET063p3	32	110	3.19	3.11	0.42	7.55×	1.02×
Total	14	—	55.16	54.46	7.23	7.63×	1.01×

Second run (stability): main 54.36 ms · framework interp 53.47 ms · framework compiled 7.06 ms
→ 7.70× compiled-vs-main, 1.02× framework-interp-vs-main.

Attention: Speed-up is for matching in isolation not end-to-end as other parts of the pipeline restrict at the moment.

Plan

Polishing and completig this PR
Provide more extensive performance data
Developer documentation with explanations how to add new matching instructions
Follow-Up PRs for improved performance
Evaluate if an explicit compilation to bytecode is beneficial

Type of pull request

New feature (non-breaking change which adds functionality)

Ensuring quality

extensive testing incl. differential testing, new tests for matching, all our normal test run through, byte identical proofs for our runAllProofs collections.

Additional information and contact(s)

This PR has been done with AI tooling support.

The contributions within this pull request are licensed under GPLv2 (only) for inclusion in KeY.

The Java program inside a modality was matched by a single monolithic MatchProgramInstruction delegating to ProgramElement.match. Make the generic program part (ordinary statements/expressions: class + exact arity + child recursion, and non-list program schema variables) matchable by the same instruction VM the rest of the find pattern uses: - MatchProgramElementInstruction (class + exact arity, generic over SyntaxElement) and MatchSubProgramInstruction (runs a sub-program over the modality's program via its own cursor). - The generator converts such programs into a VMInstruction sub-program; anything it does not handle falls back to MatchProgramInstruction. - Seams introduced here: MatchProgram (the match-program abstraction implemented by VMProgramInterpreter, and later by the compiled matcher) and ProgramChildrenMatcher (for matching a run of program children). Gated behind -Dkey.matcher.programInstructions (read at matcher construction so it can be toggled by reloading; default off → unchanged monolithic path); behaviour-preserving when on. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The .. ... pattern of symbolic-execution taclets (ContextStatementBlock) is matched specially: variable-length prefix descent to the active statement, inner execution context, active-statement matching, and prefix/suffix position bookkeeping. Convert phase (3) -- the active statements -- to VM instructions while keeping the intricate phases (1)(2)(4) in place: - ContextStatementBlock.match gains a phase-(3) seam taking a ProgramChildrenMatcher; the default still uses matchChildren, but a supplied matcher (a VM sub-program here) can drive the active-statement matching instead. - MatchContextStatementBlockInstruction wires a context block to that seam. - VMProgramInterpreter.matchChildrenFrom runs a sub-program over a run of source children from a child offset (the active statements). - The generator emits the context-block instruction for a top-level context block, falling back when an active statement is not convertible. Same -Dkey.matcher.programInstructions gate; behaviour-preserving when on.

CompiledMatchProgram is a second MatchProgram backend that navigates the term and Java-program structure directly (term.op()/sub(i) and SyntaxElement.getChild), avoiding the PoolSyntaxElementCursor entirely. It compiles essentially the whole find-taclet base: ordinary operators and schema variables, bound variables (quantifiers/substitutions), modalities with their program (generic programs and context blocks), parametric function instances and elementary updates; program elements with their own match (value literals, type refs, loops) and variable-arity children (list SVs #slist) are reused cursor-free by delegating to their own match. VMTacletMatcher selects the compiled find-matcher when -Dkey.matcher.compiled is set (read at construction, so toggling it and reloading switches matchers; default off → the interpreter, which stays the source of truth). Behaviour-preserving; ~1.2-1.7x on matcher-bound proving. CompiledMatchProgramTest checks the compiled matcher against explicit expectations (propositional, function and bound-variable patterns, success and failure).

testRAP (generated runAllProofs regression suite, in-process ProveTest per fork) now runs on up to 10 parallel JVMs (-PrapForks=N), with a configurable per-fork heap (-PrapHeap), and forwards the compiled-matcher switch (-Pmatcher.compiled=true / -Dkey.matcher.compiled) to the proof runs so the regression suite can exercise the compiled matcher.

…RE MERGE] Development-only verification/measurement, not intended for the PR: - ProgramMatchDifferentialTest: every find-taclet matched by the interpreter oracle vs the converted/compiled matchers over a real-proof term corpus, asserting identical results (success/failure + instantiations). - CompiledMatchProgramBenchmark / ContextMatchBenchmark: isolated interpreter-vs-compiled matching-time measurements.

A new language-agnostic module holding the match-plan IR from which both find-matcher back-ends (the VMProgramInterpreter and the cursor-free compiled matcher) are derived from a single description, so a construct described once drives both. - MatchPlan: the IR node (emitInstructions for the interpreter, compile for the compiled matcher); OperatorPlan / SchemaVarPlan cover the term skeleton. - MatchHead: the operator-specific check (no subterm recursion); GenericOperatorHead handles ordinary operators. - BinderMatcher / ProgramMatchHook: the two cross-language SPIs (bound variables and the modality program AST), kept abstract here so other ncore-based provers (Rust, Solidity) can reuse the framework. The module depends only on key.ncore / key.ncore.calculus / key.util (no Java-DL types); key.core gains a dependency on it.

The single Java-DL dispatch (JavaMatchPlanBuilder.buildPlan) builds a MatchPlan for a find pattern; both back-ends are then derived from it. It covers the FOL term skeleton, elementary updates, parametric function instances and modalities, falling back to the legacy matchers only for term labels. - Heads ElementaryUpdateHead / ParametricFunctionHead / ModalityHead carry the operator-specific interpreter + compiled fragments, lifted verbatim from the hand-written generator and compiled matcher (so behaviour is preserved). - JavaBinderMatcher / JavaProgramMatchHook implement the two SPIs (bound-variable binding/renaming; the JavaBlock / ContextStatementBlock program matching). - CompiledMatchProgram.compiledProgramMatcher is extracted from compileModality (now keyed on the JavaBlock) so the compiled matcher and the program hook share one program-matching implementation; buildProgramInstruction is made package-visible for the hook's interpreter side. - JavaMatchPlanBuilder also exposes the production facades interpreterProgram / compiledProgram (framework-built, with legacy fallback for term labels).

The find and assumes matchers are now built via JavaMatchPlanBuilder (interpreterProgram / compiledProgram) instead of calling the two hand-written dispatches directly, making the match-plan IR the single source of truth in production. The facades fall back to the legacy generator / compiled matcher for the constructs the framework does not build yet (term labels), so behaviour is unchanged. The key.matcher.compiled / key.matcher.programInstructions flags keep their meaning.

…ROP BEFORE MERGE] Extends the dev-only differential test and micro-benchmark (added in the "matcher differential test + isolated benchmarks" drop commit) to also exercise the match-plan framework alongside the hand-written matchers: - ProgramMatchDifferentialTest builds the plan and verifies its interpreter (with program-instruction conversion both off and on, since production reads that flag) and its compiled matcher against the legacy oracle (24.8M comparisons). - CompiledMatchProgramBenchmark times the framework-built matchers next to the hand-written ones for both back-ends (the no-overhead check). Like the commit it extends, this is dropped before merge.

…e source) With both back-ends now derived from the match-plan framework, the two hand-written per-construct dispatches are redundant and are removed, leaving the framework as the single source of truth for find-matching: - delete CompiledMatchProgram's term-level dispatch (compile / compileStep / compileCore / compile{ElementaryUpdate,ParametricFunction,Modality}); the heads already carry that logic. Its reused Java-program helpers (compiledProgramMatcher, compileProgram, delegateToMatch, compileActiveStatements) move to a small JavaProgramCompiler used by the program hook. - delete SyntaxElementMatchProgramGenerator's createProgram dispatch; only the program-instruction conversion helpers (buildProgramInstruction & co), which the hook reuses, remain. - migrate term labels into the framework (JavaMatchPlanBuilder.LabelPlan, reusing the matchTermLabelSV instruction) so buildPlan is total; the facades no longer fall back -- an unsupported pattern raises a clear error naming the missing head (no current taclet hits this; the whole standard base is covered). - retarget CompiledMatchProgramTest to the framework facade. Net: ~390 fewer lines of production matcher code, no duplicated dispatch. The interpreter/compiled engines and the program helpers are unchanged.

…only benchmark The differential test and the context benchmark depend on the hand-written matchers as an independent oracle, which no longer exist in this branch. They are retained on a separate development branch (with the reference interpreter) as the regression net, and removed here. CompiledMatchProgramBenchmark is retargeted to compare the framework's interpreter vs its compiled matcher (no oracle).

Selecting the compiled find-matcher previously needed -Dkey.matcher.compiled, which is awkward to pass through Gradle when trying it out. Expose it as a KeY feature flag (Settings -> Feature Flags) too: - VMTacletMatcher.COMPILED_MATCHER_FEATURE ("MATCHER_COMPILED"); the matcher is selected when the system property OR the feature flag is set. The property is kept for headless / CI (testRAP). - SettingsManager references the flag so it is registered and shown in the feature-settings panel on a fresh start, before any proof is loaded. The flag is read per taclet at construction, so it applies to newly loaded proofs (or after reloading the current one); the panel shows a "reload required" notice (restartRequired = true). No on-the-fly switch of an open proof's matchers.

The cursor-free compiled find-matcher is now selected by default; the legacy interpreter becomes an opt-out via -Dkey.matcher.interpreter or the MATCHER_INTERPRETER feature flag (and remains the automatic fallback for patterns the compiler does not handle). Differential testing established the two back-ends are byte-identical, so this only changes which one runs, not any proof.

unp1 self-assigned this Jun 15, 2026

unp1 added this to the v3.1.0 milestone Jun 15, 2026

unp1 changed the title ~~Performance: Matching can now compile taclets~~ Performance: Matching can now partially evaluate taclets Jun 15, 2026

unp1 added the 🚀 Performance label Jun 15, 2026

wadoon self-requested a review June 15, 2026 08:28

wadoon approved these changes Jun 15, 2026

View reviewed changes

unp1 and others added 12 commits June 17, 2026 00:13

unp1 force-pushed the matcherFramework branch from 7a71e2f to f263517 Compare June 16, 2026 22:16

unp1 changed the title ~~Performance: Matching can now partially evaluate taclets~~ Performance (1/5): Matching can now partially evaluate taclets Jun 17, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance (1/5): Matching can now partially evaluate taclets#3831

Performance (1/5): Matching can now partially evaluate taclets#3831
unp1 wants to merge 13 commits into
mainfrom
matcherFramework

unp1 commented Jun 15, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

unp1 commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Isolated find-matcher benchmark: interpreter vs. partial evaluated

Plan

Type of pull request

Ensuring quality

Additional information and contact(s)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

unp1 commented Jun 15, 2026 •

edited

Loading