List.sort done in in-place array sort by talsewell · Pull Request #1349 · CakeML/cakeml

talsewell · 2026-03-04T06:43:35Z

This is an implementation of the remaining idea of issue #979 .

It introduces a list-of-heaps sort in basis/pure/heap_list_sortScript.sml, and a monadic variant of it that suffix-encodes the heaps into a single array in basis/monadic/heap_list_sort_monadicScript.sml. The latter theory includes a proof of equivalence, that the monadic computation produces the same results as the pure one.

The monadic definitions are then translated (by the monadic translator) in ListProg, and the translation of List.sort uses them. However the constant being translated is just the pure heap-list-sort, so it should EVAL OK.

Various monadic translator fixes, mostly related to type polymorphism, are included in these commits.

Yet to be discovered: Does anything else break in the build? Does this sort function actually perform OK?

The monadic translator already includes code to substitute types if the supplied definition has a type variable for the state or exception type, e.g. if the relevant function didn't make use of the state or exception value so it was left fully polymorphic. Update this to also substitute type variables and print a message if the relevant types match but are not identical, e.g. if the state type has a polymorphic parameter that is not instantiated to the same thing.

The existing monadic-translator interface code does a bit of guessing of various constant names, and needs to set the translator use_full_type_names setting to false to make the naming predictable enough. This makes the monadic translator incompatible with various translation contexts. Some of this code can be replaced by use of get_type_inv and other query functions to fetch the relevant constants. This seems to allow the monadic translator to run with use_full_type_names present, and doesn't seem to break anything.

Handle a case where the updator for a polymorphic state field is too general. This is a bit of a HOL4 specialty, where a record has a type variable (say 'a) that appears in only one field, and an updator for that field is allowed to change the type of the overall datatype by replacing the field. There was already code for handling this in the monad translator, it just seemed to miss a case. I found it by adding some more error diagnostic code along the way, and I might leave it in.

The idea is to write a sort function with a O(n * log(n)) complexity, which exposes a pure functional interface but which uses a mutable array to actually do the rearrangement. This pure file presents a heap-sort algorithm, which the mutable array modelled as a function. Correctness is verified. The next step is to translate or present some CakeML code that implements this algorithm with a mutable array.

Convert the pure functional heap-sort representation to a monadic representation compatible with the monadic translator (based on some examples). Attempt to translate to CakeML AST, but blocked on various errors.

It seems that the cause of recent issues is that the monadic translator is incredibly fragile when it comes to type parameters if the state type needs to be polymorphic. Tweak things to check that the type variable is consistent. This code seems to work with "tvar = ``: 'state``" and "tvar = ``: 'a``" but it does not seem to work with "tvar = ``: 'el``", which is all a bit mysterious.

Switch out the standard heap-sort, in which the heap points in the opposite direction, for a list-of-heaps sort in which power-of-two-minus-one sized heaps point in the correct direction within the array. This has the advantage that an already sorted array is confirmed sorted without moving anything and in linear time. It also "agrees" with the functional model. Problem though, the mapping of the multiple heaps into the array gets painful to reason about. Saving before trying a different approach.

I gave up on setting up a separation-style approach and just verified everything using TAKE/DROP. Some of the proofs are a bit of a mess.

Attempt to port the proof of simulation of the 979 heap-list-sort from a "direct" version on the monadic semantics to a postcondition/hoare style approach. The big upgrade would be if we could do something separation-ish about the array, since each worker works on a "focal chunk" of the array leaving the rest unchanged, and it would be a huge upgrade if the disjointness of these could be made more implicit.

After figuring out some issues with polymorphism, present a mechanism that sort of works involving disjoint array chunk membership. It works, and the most involved proof is now much faster and much more step-by-step. The other proofs are getting clunkier though. Sigh. Time to try something else.

Finally, an approach I'm happy with. Committing then will clean this up.

After a lot of false steps, the process seems to work. The monadic constants are defined and verified in the monadic sub-theory. The real monadic translation is set up in ListProgScript, together with a bit of a hack for rewriting the accessor constants from the sub-theory to the ones the translator setup builds. The final "sort" function is derived from that, and can be trivially proven to be the pure version (and thus a sorting function) in ListProof.

Along the way, we seem to have made the monadic translator resistant to type variables with names other than 'a or 'state.

Change around the translation. Prove the "monadic-sort = functional-sort" theorem, and use its symmetric version as the definition to translate the functional algorithm. The point is that the translation and EVAL calculations will now both do the correct thing with the one constant.

In the monadic translation, the state type represents all the stateful stuff for the purposes of the monadic model. It doesn't directly exist as any type in the CakeML translation, instead its fields are placed into reference and array objects states and the FFI oracle and whatnot. A previous patch was confused about this. This patch restores the previous behaviour, including the pretty awful re-parsing of strings to fetch constants and overloads that were defined before, though puts a little defensive programming around some of it. Hopefully fixes the breakage of the IO-based monadic translation examples.

Switch out the "sort" symbol of mllist to call the new heap-list-sort with all the same key logical theorems exported. This should cause the Candle kernel etc to call on the new sort and translate correctly.

Oops. The linter really doesn't like that.

Follow-up to the previous adjustment, we need to make sure that mllist$sort is translated in ListProg, not just something equal to it, so everything is pointing at the one sort instance.

The bignum proof needs to know that "sort R [x] = [x]" for some reason. This was done by supplying its definitional theorems. It's now derived from PERM instead, which is likely to be more stable. It also occurs to me that EVAL might be fairly safe.

This includes a variant of one of the sort workers that keeps a "fuel" parameter to make termination obvious, which in turn means that translation to CV can be done by auto-methods. I also tried a more direct approach, and it was horrible, so I think the silly fuel parameter is being kept.

The Candle proofs include a pass across the translated CakeML AST to check for problematic effects. The array-based sort complicates this, so instead, restore the previous mergesort translation, call it from Candle, and tweak some proofs to refer to the correct one.

Mostly duplicating the change made to the "standard" theories.

A merge-sort implementation that first scans for increasing/decreasing runs, and merges them trying to merge even-size runs along roughly the design of "timsort". May replace the standard mergesort.

This proof is still a bit involved, but less so than the heap version. It probably helps that the data representation, in which the list chunks are all placed in order, has less moving parts than the flattened-heap variant.

Exercise the same routine as for heap-list-sort, translating the sort function to CakeML AST in a separate theory to the monadic development, including a hack that re-defines the monadic elements.

The heap-list-sort development was done on the assumption that it would become the basis List.sort function, which turns out to be premature. For now, move both in-array sorts (heap-list and merge-run) to be examples. If the merge-run sort turns out to be useful it may be moved back in.

talsewell mentioned this pull request Mar 5, 2026

Monadic translator lib fixes #1346

Closed

talsewell added 29 commits March 16, 2026 13:37

monadic translator: more use of Syntax not parsers

7d07f46

Convert heap-sort to monad, but fail to translate

b07995b

Convert the pure functional heap-sort representation to a monadic representation compatible with the monadic translator (based on some examples). Attempt to translate to CakeML AST, but blocked on various errors.

Verify and translate monadic versions

22a6a45

I gave up on setting up a separation-style approach and just verified everything using TAKE/DROP. Some of the proofs are a bit of a mess.

Clean up experiments and previous versions

a271e51

A better proof

eef109a

Finally, an approach I'm happy with. Committing then will clean this up.

Reduce to the better proof

6a9750b

Prove remaining cheats

90de656

Start cleaning up "pure" part

0586ee8

Move monadic defs and equiv to basis/monadic

6100ef1

list sort: use a more sensible variable type

9bba1cc

Along the way, we seem to have made the monadic translator resistant to type variables with names other than 'a or 'state.

Add basis/monadic to build-sequence

3875bfb

mllist: use heap-list-sort

e97f619

Switch out the "sort" symbol of mllist to call the new heap-list-sort with all the same key logical theorems exported. This should cause the Candle kernel etc to call on the new sort and translate correctly.

Rename Triviality->Theorem[local]

c1f464a

Oops. The linter really doesn't like that.

heap-list-sort: adjust translation steps

208d1d7

Follow-up to the previous adjustment, we need to make sure that mllist$sort is translated in ListProg, not just something equal to it, so everything is pointing at the one sort instance.

cleanup: remove previous version

60e3ef2

Commit README.md update in basis

309e4ee

talsewell force-pushed the array-heap-sort-cleanup branch from 77517e0 to 309e4ee Compare March 16, 2026 04:35

talsewell added 8 commits March 19, 2026 23:46

candle: same sort change in "overloading"

5387eac

Mostly duplicating the change made to the "standard" theories.

run-finding merge-sort verification

200f4e8

A merge-sort implementation that first scans for increasing/decreasing runs, and merges them trying to merge even-size runs along roughly the design of "timsort". May replace the standard mergesort.

monadic (array) variant of sort and equiv proof

9fd617e

This proof is still a bit involved, but less so than the heap version. It probably helps that the data representation, in which the list chunks are all placed in order, has less moving parts than the flattened-heap variant.

merge_run sort: post-facto translation

c40491e

Exercise the same routine as for heap-list-sort, translating the sort function to CakeML AST in a separate theory to the monadic development, including a hack that re-defines the monadic elements.

heap_list_sort: matching translation file

28d2ece

also detach heap_list_sort from ListProg

5381a66

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

List.sort done in in-place array sort#1349

List.sort done in in-place array sort#1349
talsewell wants to merge 37 commits intoCakeML:masterfrom
talsewell:array-heap-sort-cleanup

talsewell commented Mar 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

talsewell commented Mar 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant