GOTO backend: Core-to-GOTO translation, CBMC pipeline tests, and CI #289

Open
tautschnig wants to merge 8 commits into main from tautschnig/ToCProverGOTO-Stmt

tautschnig commented Dec 22, 2025

Description of changes:

Core-to-GOTO translation

Translate Strata Core programs to CProver GOTO binary format for CBMC
verification. Covers all Imperative statement types, Core commands,
procedure contracts, calls, axioms, datatypes, and source locations.

ToCProverGOTO.lean:

  • Handle block, ite, loop, exit, funcDecl statements
  • Emit loop invariants (#spec_loop_invariant) and measures (#spec_decreases)
  • Detect unresolved exit statements (targeting nonexistent labels) and abort
  • Extract helpers: emitGoto, emitCondGoto, emitLabel, patchGotoTargets
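
The exit handling listed above (emit an unconditional GOTO, remember it, and patch its target once the enclosing labelled block ends) can be pictured with a small Python model. This is only an illustrative sketch, not the Lean implementation: the `Instr` and `Transform` classes and the `emit_exit`, `end_block`, and `finish` helpers are hypothetical names, and the choice of a trailing `SKIP` as the landing point is an assumption. Only the ideas of pending exits, target patching, and aborting on unresolved labels come from the description.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Instr:
    kind: str                      # e.g. "GOTO", "SKIP", "ASSERT"
    target: Optional[int] = None   # index of the instruction a GOTO jumps to


@dataclass
class Transform:
    instrs: list = field(default_factory=list)
    # Each entry is (index of the GOTO in instrs, label it wants to leave).
    pending_exits: list = field(default_factory=list)

    def emit(self, kind: str) -> int:
        self.instrs.append(Instr(kind))
        return len(self.instrs) - 1

    def emit_exit(self, label: str) -> None:
        # The target is unknown until the enclosing block is fully emitted.
        self.pending_exits.append((self.emit("GOTO"), label))

    def end_block(self, label: str) -> None:
        # Assumed landing point: a SKIP placed right after the block body.
        end_loc = self.emit("SKIP")
        still_pending = []
        for idx, wanted in self.pending_exits:
            if wanted == label:
                self.instrs[idx].target = end_loc   # patch the jump
            else:
                still_pending.append((idx, wanted))
        self.pending_exits = still_pending

    def finish(self) -> list:
        # An exit whose label never closed targets a nonexistent block.
        if self.pending_exits:
            labels = sorted({label for _, label in self.pending_exits})
            raise ValueError(f"unresolved exit targets: {labels}")
        return self.instrs
```

In this model, an `exit` inside a block labelled `outer` stays in `pending_exits` until `end_block("outer")` runs; calling `finish()` while any entry is left raises an error, mirroring the "detect and abort" behaviour described for unresolved exits.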

LambdaToCProverGOTO.lean (new, in Strata/Backends/CBMC/GOTO/):

  • Map all arithmetic, comparison, boolean, bitvector, real, string,
    and regex operators to GOTO equivalents
  • Signed BV operations (SDiv, SMod, SLt, SLe, SGt, SGe): cast operands
    to signedbv via typecast so CBMC interprets them correctly
  • Euclidean integer division/modulo (Int.Div, Int.SafeDiv, Int.Mod,
    Int.SafeMod): encode as compound expressions built from truncating
    div/mod with correction terms
  • BV hex encoding for all widths (was 32-bit only)
  • Support BV extract, old(expr), quantifiers, ternary, Map.const
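
To make the Euclidean division bullet above concrete, here is a minimal Python sketch of how Euclidean quotient and remainder can be derived from truncating (round-toward-zero) division plus a correction term. The exact expression emitted by LambdaToCProverGOTO.lean is not shown in this PR description, so the function names and the particular form of the correction are assumptions for illustration only. Division by zero is not handled here.

```python
def trunc_divmod(a: int, b: int) -> tuple:
    """Truncating division and remainder (round toward zero), as in C."""
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return q, a - b * q


def euclid_divmod(a: int, b: int) -> tuple:
    """Euclidean division: the remainder is always in [0, |b|)."""
    q, r = trunc_divmod(a, b)
    if r < 0:
        # Correction term: shift the remainder into range and
        # move the quotient one step away from zero.
        if b > 0:
            q, r = q - 1, r + b
        else:
            q, r = q + 1, r - b
    return q, r
```

For example, `trunc_divmod(-7, 2)` gives `(-3, -1)`, while `euclid_divmod(-7, 2)` gives `(-4, 1)`; in both cases `a == b * q + r` holds, but only the Euclidean result has a non-negative remainder.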

CoreToCProverGOTO.lean (new, in Strata/Backends/CBMC/GOTO/):

  • End-to-end Core program to GOTO translation
  • Call LHS type lookup from program context (not hardcoded)

InstToJson.lean:

  • Extend JSON serialization for GOTO programs with function entries
  • Deduplicate symbol collection and operator JSON generation

StrataMain.lean:

  • Add coreAnalyzeToGoto, laurelAnalyzeToGoto, pyAnalyzeToGoto commands
  • Translate procedure calls to FUNCTION_CALL instructions at any nesting
  • Lift local funcDecl to top-level GOTO functions
  • Emit contracts, axioms, distinct decls, global variables
  • Propagate source locations from metadata to GOTO instructions

Code reorganization

Production GOTO translation code moved from StrataTest/ to
Strata/Backends/CBMC/GOTO/ (LambdaToCProverGOTO.lean,
CoreToCProverGOTO.lean). Test files in StrataTest/ now import from
the production modules and contain only test code.

Tests and CI

  • Unit tests for expression, type, and statement translation
  • E2E tests for the Core-to-GOTO contracts pipeline (49 test cases)
  • Laurel and Python CBMC pipeline test suites with property-level
    expected output matching (CBMC properties checked by line number)
  • CI workflow (cbmc.yml) builds CBMC from source with string support,
    regex, and bounds-check patches; runs all CBMC test suites
  • Laurel pipeline uses --z3 for SMT-based string reasoning

Documentation

  • CoreToGOTO_Gaps.md: translation coverage, soundness principles,
    operator semantics decisions, and remaining open gaps

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@tautschnig tautschnig force-pushed the tautschnig/ToCProverGOTO-Stmt branch from b7a830b to 7b8e91f Compare December 22, 2025 18:02

atomb commented Jan 6, 2026

One general comment on this: I have plans to add unstructured CFGs in Strata (started in #202), and it would probably make sense in the long run to have a pipeline that does Strata Stmt -> Strata CFG -> GOTO instruction CFG. I'd paused work on #202 because it wasn't clear what we'd use it for right now, but I could finish it up and merge it if you think it'd be useful for this PR.

tautschnig (Contributor Author) commented

> One general comment on this: I have plans to add unstructured CFGs in Strata (started in #202), and it would probably make sense in the long run to have a pipeline that does Strata Stmt -> Strata CFG -> GOTO instruction CFG. I'd paused work on #202 because it wasn't clear what we'd use it for right now, but I could finish it up and merge it if you think it'd be useful for this PR.

There'll certainly be interactions between your PR and this one, but I'm happy for these to be worked on in either order: if #202 goes in first, this PR will be updated, else #202 should likely include changes to GOTO instruction support (which I'm then happy to contribute myself).

@tautschnig tautschnig marked this pull request as ready for review January 7, 2026 11:16
@tautschnig tautschnig requested a review from atomb as a code owner January 7, 2026 11:16
Copilot AI review requested due to automatic review settings January 7, 2026 11:16
@tautschnig tautschnig requested review from a team and aqjune-aws as code owners January 7, 2026 11:16

Copilot AI left a comment

Pull request overview

This PR extends the transformation functionality from imperative commands to GOTO instructions by adding support for all statement types (.block, .ite, .loop, and .goto), not just the previously-supported .cmd statements.

Key Changes:

  • Implemented mutually recursive functions Stmt.toGotoInstructions and Block.toGotoInstructions to handle all statement constructors
  • Added comprehensive test coverage with 10 test cases covering basic, nested, and edge-case scenarios

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| Strata/DL/Imperative/ToCProverGOTO.lean | Adds mutually recursive transformation functions for statements and blocks, handling control flow constructs (blocks, conditionals, loops, gotos) with proper label generation and GOTO instruction patching |
| StrataTest/Backends/CBMC/ToCProverGOTO.lean | Adds 10 comprehensive test cases covering all new statement types, including basic transformations, nested control flow, empty branches/bodies, and assertions/assumptions within control structures |

@shigoel shigoel enabled auto-merge January 7, 2026 15:31

atomb commented Jan 7, 2026

> One general comment on this: I have plans to add unstructured CFGs in Strata (started in #202), and it would probably make sense in the long run to have a pipeline that does Strata Stmt -> Strata CFG -> GOTO instruction CFG. I'd paused work on #202 because it wasn't clear what we'd use it for right now, but I could finish it up and merge it if you think it'd be useful for this PR.

> There'll certainly be interactions between your PR and this one, but I'm happy for these to be worked on in either order: if #202 goes in first, this PR will be updated, else #202 should likely include changes to GOTO instruction support (which I'm then happy to contribute myself).

I mostly just wanted to make sure we're both aware of each other's work. Since this PR seems just about ready to go, and #202 still needs some tests which I won't have a chance to add right away, let's go ahead and merge this one and update #202 later.

@tautschnig tautschnig marked this pull request as draft February 19, 2026 10:42
auto-merge was automatically disabled February 19, 2026 10:42

Pull request was converted to draft

@tautschnig tautschnig self-assigned this Feb 19, 2026
@tautschnig tautschnig force-pushed the tautschnig/ToCProverGOTO-Stmt branch 6 times, most recently from f74b7a5 to af9b21d Compare February 24, 2026 22:26
@tautschnig tautschnig force-pushed the tautschnig/ToCProverGOTO-Stmt branch from af9b21d to 32db60d Compare February 26, 2026 08:56
@tautschnig tautschnig marked this pull request as ready for review February 26, 2026 09:27
@tautschnig tautschnig requested a review from Copilot February 26, 2026 09:27
@tautschnig tautschnig changed the title from "Extend Cmds.toGotoTransform to handle all Stmt types" to "GOTO backend: extend Stmt translation and add CBMC pipeline tests" Feb 26, 2026
skipped=0
errors=0

for ion_file in "$TESTS_DIR"/*.py.ion; do
Contributor:

I'd expect .py or .python.st.ion

tautschnig (Contributor Author):

What's wrong with .py.ion?


atomb commented Feb 26, 2026

I think #472 is basically ready to merge, and it replaces .goto with .exit, which would require a few changes to this PR (though from a quick look it seems like they wouldn't be very large).

| some fm =>
let pos := fm.toPosition fr.range.start
(pos.line, pos.column)
| none => (0, 0)
Contributor:

Do you think there's a different default for when the fileMap isn't available that'd be sensible here? I think we've run into this elsewhere too.

tautschnig (Contributor Author):

It's now using the byte offset as a rough indicator, but I'm not sure what else can be done when no information is available.

@tautschnig tautschnig force-pushed the tautschnig/ToCProverGOTO-Stmt branch from 9425663 to c7c51e0 Compare February 27, 2026 10:34
tautschnig (Contributor Author) commented

> I think #472 is basically ready to merge, and it replaces .goto with .exit, which would require a few changes to this PR (though from a quick look it seems like they wouldn't be very large).

Those changes are now included.

@tautschnig tautschnig force-pushed the tautschnig/ToCProverGOTO-Stmt branch from c7c51e0 to 8729ce8 Compare February 27, 2026 13:36
@tautschnig tautschnig changed the title from "GOTO backend: extend Stmt translation and add CBMC pipeline tests" to "GOTO backend: Core-to-GOTO translation, CBMC pipeline tests, and CI" Feb 27, 2026

Copilot AI left a comment

Pull request overview

Copilot reviewed 48 out of 49 changed files in this pull request and generated no new comments.


  let decl_inst :=
  { type := .DECL, locationNum := trans.nextLoc,
- sourceLoc := { SourceLocation.nil with function := functionName },
+ sourceLoc := srcLoc,
Contributor:

Can't you pass 1D source locations to the back-end? In the common case, a FileMap will not be available at this point, so you cannot generate the 2D locations.

tautschnig (Contributor Author), Feb 27, 2026:

The current code already falls back to 1D byte offsets when no FileMap is available. All current callers (coreAnalyzeToGoto, laurelAnalyzeToGoto, pyAnalyzeToGoto) do provide a FileMap built from the source file.
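
The fallback described in this reply can be illustrated with a rough Python sketch. It is not the Lean code in metadataToSourceLoc: the function names, the dictionary shape, and the line/column conventions (1-based lines, 0-based byte columns) are assumptions made for the example. The point is only that a 2D position needs the source bytes to locate line starts, whereas the 1D byte offset is always available.

```python
from bisect import bisect_right
from typing import Optional


def line_starts(source: bytes) -> list:
    """Byte offsets at which each line of the source begins."""
    starts = [0]
    for i, byte in enumerate(source):
        if byte == 0x0A:          # newline
            starts.append(i + 1)
    return starts


def source_loc(offset: int, source: Optional[bytes] = None) -> dict:
    """Return a 2D location when the source is known, else only the offset."""
    if source is None:
        # No file contents available: keep the 1D byte offset only.
        return {"line": None, "column": None, "offset": offset}
    starts = line_starts(source)
    line_idx = bisect_right(starts, offset) - 1
    return {
        "line": line_idx + 1,                  # 1-based line (assumed)
        "column": offset - starts[line_idx],   # 0-based byte column (assumed)
        "offset": offset,
    }
```

Because the offset is carried through in both cases, a consumer that does have the source file could recompute the line and column later from the offset alone.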

keyboardDrummer (Contributor), Feb 27, 2026:

> All current callers (coreAnalyzeToGoto, laurelAnalyzeToGoto, pyAnalyzeToGoto) do provide a FileMap built from the source file.

Front-ends generally will not pass a FileMap to Strata. They will pass Ion without a FileMap. The FileMap might not even be available on the machine where Strata is executing.

> The current code already falls back to 1D byte offsets when no FileMap is available.

Shouldn't that be the default? Does it enable reconstructing 2D locations near the front of Strata?

@tautschnig tautschnig force-pushed the tautschnig/ToCProverGOTO-Stmt branch 2 times, most recently from b2178a8 to 2de1f04 Compare February 27, 2026 15:42
Translate Strata Core programs to CProver GOTO binary format for CBMC
verification. Covers all Imperative statement types, Core commands,
procedure contracts, calls, axioms, datatypes, and source locations.

ToCProverGOTO.lean:
- Handle block, ite, loop, exit, funcDecl statements
- Emit loop invariants (#spec_loop_invariant) and measures (#spec_decreases)
- Extract helpers: emitGoto, emitCondGoto, emitLabel, patchGotoTargets

Expr.lean:
- Map all arithmetic, comparison, boolean, bitvector, real, string,
  and regex operators to GOTO equivalents
- Support BV extract, old(expr), quantifiers, ternary

InstToJson.lean:
- Extend JSON serialization for GOTO programs with function entries
- Deduplicate symbol collection and operator JSON generation

StrataMain.lean:
- Add coreAnalyzeToGoto, laurelAnalyzeToGoto, pyAnalyzeToGoto commands
- Translate procedure calls to FUNCTION_CALL instructions at any nesting
- Lift local funcDecl to top-level GOTO functions
- Emit contracts, axioms, distinct decls, global variables
- Propagate source locations from metadata to GOTO instructions

Move test files to StrataTest/Backends/CBMC/GOTO/ subdirectory.
Add tests for all statement types, operator mappings, procedure
translation, funcDecl lifting, and source location propagation.

Test the full pipeline: strata → process_json.py → symtab2gb → goto-cc
→ goto-instrument --dfcc → cbmc. Covers contracts, assertions, ensures,
loops, calls, nested calls, and call-inside-loop/if patterns.

CI: build CBMC from source with string/regex support.
Laurel: shell scripts and test programs for Laurel-to-CBMC pipeline.
Python: shell scripts and test programs for Python-to-CBMC pipeline,
  with generation step for .py.ion files from .py sources.

Track implemented features, open gaps (exit statement, unhandled types/
expressions, Map.const), DFCC integer limitation, and DDM parser issue #490.

- exit statement: emit unconditional GOTO, patch target when enclosing
  block ends. Track pending exits in GotoTransform.pendingExits.
- Map.const: map to GOTO array_of unary expression (constant-valued array).
- modifies clause: look up actual variable types from program declarations
  instead of hardcoding Integer, fixing DFCC 'no definite size' errors
  for programs using bounded types.
…ions

- Remove committed .py.ion.core.st files (generated pipeline intermediates,
  not human-readable). Add *.py.ion and *.py.ion.core.st to .gitignore.
- Clarify metadataToSourceLoc comment: document that all current callers
  provide a FileMap, and the byte-offset fallback is for library reuse.
@tautschnig tautschnig force-pushed the tautschnig/ToCProverGOTO-Stmt branch 2 times, most recently from 780e316 to 8a771b6 Compare February 27, 2026 19:07
CBMC's SSA renaming and simplifier transform quantifier bound variables
into non-symbol expressions, violating the quantifier_exprt invariant.
Add cbmc-quantifier-simplify.patch to skip bound variables during
renaming and simplification.

Restore axiom emission in Python pipelines (previously stripped as
workaround). Mark test_missing_models, test_precondition_verification,
and test_strings as SKIP due to a separate CBMC SMT2 convert_type crash
on struct_tag types.
@tautschnig tautschnig force-pushed the tautschnig/ToCProverGOTO-Stmt branch from 8a771b6 to 9924b36 Compare February 27, 2026 20:36

6 participants