feat(core): Add CoreSMT verification pipeline with incremental solver and diagnosis#475
feat(core): Add CoreSMT verification pipeline with incremental solver and diagnosis#475MikaelMayer wants to merge 165 commits intomainfrom
Conversation
…or Func - Added ToFormat for generic Func with proper constraints - Added [ToFormat T.IDMeta] to Factory.lean section variables - Removed unnecessary ToFormat instances from test files and Program.lean - Removed custom Env.format function (now uses default ToFormat) - Function bodies now display properly instead of showing <body>
- Resolved conflict in Factory.lean (Factory_wf moved to FactoryWF.lean in main) - Applied rotate_left fix to FactoryWF.lean
…rotate_left fix to FactoryWF
- Modified testFuncDeclSymbolic to show functions capture variables by reference - Function declared with n=10, then n mutated to 20, function uses n=20 at call time - Proof obligation correctly shows result should be 25 (5+20), not 15 (5+10) - Reverted Env.lean to main (custom scope formatting not needed)
…claration time - Functions now capture variable values at declaration time, not by reference - Free variables in function body are substituted with their current values from environment - Test demonstrates: n=10 at declaration, n=20 after mutation, function uses n=10 - Proof obligation correctly shows result is 15 (5+10), not 25 (5+20)
…duplication - Move generic Func structure to Strata/DL/Util/Func.lean - Add PureFunc to Imperative, removing Lambda->Imperative dependency - Fix funcDecl type checking to add function to type context - Remove duplicate renameLhs/substFvar from ProcedureInlining - Extract captureFreevars helper in StatementEval - Refine getVars to exclude formal parameters for funcDecl - Add type checking test for funcDecl
- Add WFfuncDeclProp structure in WF.lean checking input parameter uniqueness - Add LFunc.type_inputs_nodup theorem in Factory.lean - Add Function.typeCheck_inputs_nodup theorem in FunctionType.lean - Add listMap_keys_map_snd helper lemma in StatementWF.lean - Replace sorry in funcDecl case with complete proof
…flags Restore the pattern from the old verifier: createInteractiveSolver selects appropriate flags based on the solver name (cvc5 vs z3) rather than hardcoding cvc5-specific flags.
The lean-action cache key does not include .st grammar files. When LaurelGrammar.st changes, the cached .olean files are stale causing spurious test failures. Remove them before building to force a rebuild.
The lean-action cache key only includes lean-toolchain and lake-manifest.json.
When .st grammar files change (e.g. LaurelGrammar.st), the cached .olean files
become stale causing spurious test failures.
Fix: manage the lake cache ourselves with use-github-cache: false, adding
hashFiles('**/*.st') to the cache key so grammar changes invalidate the cache.
…t processFuncDecl - Unify proveCheck and coverCheck into a single runCheck function parameterized by PropertyType, eliminating ~40 lines of duplication - Extract processFuncDecl helper to reduce processStatement length - File reduced from 280 to 238 lines
TranslationTests: - Filter declare-datatype, push/pop, check-sat from output - Use check-sat-assuming instead of push/assert/check-sat/pop - Buffer solver returns unsat to prevent diagnosis from running - Update all test expectations VerifierTests: - Show full obligation as first diagnosis line - Show path conditions for full obligation - Show sub-expressions when there are multiple failures - Update all test expectations
Keep push/pop and check-sat in translation test output - they are meaningful CoreSMT output. Only filter set-logic, set-option, and declare-datatype (prelude). Update test expectations accordingly.
The declare-datatype Option was added to the solver prelude but CoreSMT is new and has no existing users that depend on it. Remove it from initializeSolver, reset, and programToSMT. Also remove the declare-datatype filter from TranslationTests since it is no longer emitted.
| info: test: ✗ counterexample found | ||
| (0,61): check 8 == 8 && f(5) == 7 | ||
| └─ (0,67): could not prove 8 == 8 && f(5) == 7 | ||
| └─ (0,67): could not prove 8 == 8 |
There was a problem hiding this comment.
8 == 8 is a path condition, it's not something we can't prove. What happened?
There was a problem hiding this comment.
Fixed. The diagnosis now correctly identifies proved vs refuted sub-expressions:
- Sub-expressions that are proved (not(expr) is unsat) are filtered out
- Sub-expressions that are refuted (expr is unsat) show as "it is impossible"
- Sub-expressions that are unknown show as "could not prove"
- Sub-expressions identical to the full obligation are not duplicated
So 8 == 8 is now correctly filtered (proved), and 1 + 2 == 4 correctly shows as "it is impossible".
Previously, diagnosis would report 'could not prove 8 == 8' even though 8 == 8 is trivially true. The fix: - For assert checks: skip sub-expressions that are proved (not(expr) is unsat) - For assert checks: mark sub-expressions as refuted if expr is unsat - For reach checks: mark sub-expressions as refuted if expr is unsat - Skip sub-expressions identical to the full obligation (no duplication) This matches the behavior of the old B3 verifier's diagnosis.
Each sub-expression failure now shows its path condition, which includes both the conjunction-split context (proved left conjuncts) and the state path condition (assume statements). For example, 'f(5) == 7' in 'check 8 == 8 && f(5) == 7' now shows '8 == 8' as a path condition, since 8 == 8 was proved and used as context when diagnosing f(5) == 7.
| info: test_assert_helps: ✗ unknown | ||
| (0,103): assert f(5) > 1 | ||
| └─ (0,103): could not prove f(5) > 1 | ||
| (0,103): check f(5) > 1 |
There was a problem hiding this comment.
It should still be assert here. It's the assert that fails, not the check.
There was a problem hiding this comment.
Fixed. The B3 statement kind (check/assert/reach) is now stored in the Core metadata during ToCore conversion, and the verifier output uses it to correctly label failures. assert f(5) > 1 now shows as "assert" not "check".
Store the B3 statement kind ('check'/'assert') in Core metadata so
the verifier output correctly labels assert failures as 'assert'
rather than 'check'.
| let xsArray ← translateDeclList bindings xsa | ||
| -- Note: the indices in the following are placeholders | ||
| let newBoundVars := List.toArray (xsArray.mapIdx (fun i _ => LExpr.bvar () i)) | ||
| let newBoundVars := List.toArray (xsArray.mapIdx (fun i _ => LExpr.bvar Strata.SourceRange.none i)) |
There was a problem hiding this comment.
In this whole file we should actually have no more Strata.SourceRange.none as we ought to get the sourceRange from the metadata of the DDM. We don't need to do it for python for now but for core it'll be essential.
There was a problem hiding this comment.
Fixed. All Strata.SourceRange.none in Translate.lean are now replaced with the actual source range from the DDM Arg/Op annotation at each call site: arg.ann for expression nodes, op.args[N]!.ann for binding helpers, xsa.ann/bodya.ann for quantifiers, etc.
There was a problem hiding this comment.
Fixed. All Strata.SourceRange.none in Translate.lean are replaced with the actual DDM source range from the Arg/Op annotation at each call site (arg.ann, op.args[N]!.ann, xsa.ann, bodya.ann, etc.). The Arg.ann field is already Strata.SourceRange — the same type as CoreExprMetadata — so no conversion is needed.
There was a problem hiding this comment.
Fixed. All Strata.SourceRange.none in Translate.lean replaced with actual DDM source ranges: arg.ann for expression nodes, op.args[N]!.ann for binding helpers, xsa.ann/bodya.ann for quantifiers. Arg.ann is already Strata.SourceRange so no conversion needed.
…nslate.lean - Restore the three #guard_msgs tests in Identifiers.lean that were removed when CoreExprMetadata changed from Unit to SourceRange. Update expectations to show Strata.SourceRange.none instead of (). - Replace all Strata.SourceRange.none in Translate.lean with the actual source range from the DDM Arg/Op annotation at each call site. This ensures Core expressions carry proper source location info.
- SourceRange.lean: Add #guard_msgs tests for the Repr instance showing that none displays as () and non-none shows struct fields - ASTtoCST.lean: The fvar initializer case is relevant to this PR since CoreExprMetadata changed from Unit to SourceRange, requiring the fvar constructor to take a SourceRange argument - CoreToCBMC.lean: Comment already present explaining SourceRange.none
Fix Core.defaultSolver reference in SolverInterface.lean after the SimpleAPI refactor moved defaultSolver to Core.Options.
…Types - SourceRange Repr always shows () to keep debug output readable (source location info is available via SourceRange.format) - LExpr.eraseTypes now also erases metadata to default, so alpha equivalence checks in ProcedureInlining tests are not affected by source range differences between programs
Problem
Strata Core's existing verifier generates all VCs upfront via symbolic execution, encoding each to a separate SMT file. This prevents incremental verification where solver state is maintained across statements.
Additionally,
Core.ExpressionMetadatawasUnit, discarding source location information and making it impossible to report accurate positions in verification failures.Solution
Core Expression Metadata: Unit → SourceRange
Core.ExpressionMetadatais changed fromUnittoStrata.SourceRangeso that:All code constructing Core expressions with
()now usesSourceRange.noneor a proper range. TheSyntaxMonomacro (eb[...]) gains adefaultMetadatafield onMkLExprParams(defaults toUnit.unitfor generic params, overridden toSourceRange.nonefor Core).TermType.toSMTStringis extracted as a pure function to avoid duplication.CoreSMT Verification Pipeline
An
SMTSolverInterfacewith push/pop support enables incremental verification. A diagnosis engine splits failing conjunctions and checks each conjunct individually to identify which assertions cannot be proved or which covers are refuted.B3 Verifier Migration
The B3 verifier now uses the CoreSMT pipeline: B3 expressions are converted to Core (preserving source locations), verified via CoreSMT, and diagnosed failures are converted back to B3 for display.
Changes
Core.ExpressionMetadata:Unit→Strata.SourceRangeSyntaxMonomacro:defaultMetadatafield for type-appropriate metadataTermType.toSMTString: extracted pure function toTermType.leanisCoreSMTpredicate,SMTSolverInterface, diagnosis engine,VCResult.diagnosisC_Simp/Verify.lean:csimpMetaToCoreconversion (Unit → SourceRange)Testing
All existing tests pass with the new pipeline.