List.sort done in in-place array sort#1349
Open
talsewell wants to merge 37 commits intoCakeML:masterfrom
Open
Conversation
The monadic translator already includes code to substitute types if the supplied definition has a type variable for the state or exception type, e.g. if the relevant function didn't make use of the state or exception value so it was left fully polymorphic. Update this to also substitute type variables and print a message if the relevant types match but are not identical, e.g. if the state type has a polymorphic parameter that is not instantiated to the same thing.
The existing monadic-translator interface code does a bit of guessing of various constant names, and needs to set the translator use_full_type_names setting to false to make the naming predictable enough. This makes the monadic translator incompatible with various translation contexts. Some of this code can be replaced by use of get_type_inv and other query functions to fetch the relevant constants. This seems to allow the monadic translator to run with use_full_type_names present, and doesn't seem to break anything.
Handle a case where the updator for a polymorphic state field is too general. This is a bit of a HOL4 specialty, where a record has a type variable (say 'a) that appears in only one field, and an updator for that field is allowed to change the type of the overall datatype by replacing the field. There was already code for handling this in the monad translator, it just seemed to miss a case. I found it by adding some more error diagnostic code along the way, and I might leave it in.
The idea is to write a sort function with a O(n * log(n)) complexity, which exposes a pure functional interface but which uses a mutable array to actually do the rearrangement. This pure file presents a heap-sort algorithm, which the mutable array modelled as a function. Correctness is verified. The next step is to translate or present some CakeML code that implements this algorithm with a mutable array.
Convert the pure functional heap-sort representation to a monadic representation compatible with the monadic translator (based on some examples). Attempt to translate to CakeML AST, but blocked on various errors.
It seems that the cause of recent issues is that the monadic translator is incredibly fragile when it comes to type parameters if the state type needs to be polymorphic. Tweak things to check that the type variable is consistent. This code seems to work with "tvar = ``: 'state``" and "tvar = ``: 'a``" but it does not seem to work with "tvar = ``: 'el``", which is all a bit mysterious.
Switch out the standard heap-sort, in which the heap points in the opposite direction, for a list-of-heaps sort in which power-of-two-minus-one sized heaps point in the correct direction within the array. This has the advantage that an already sorted array is confirmed sorted without moving anything and in linear time. It also "agrees" with the functional model. Problem though, the mapping of the multiple heaps into the array gets painful to reason about. Saving before trying a different approach.
I gave up on setting up a separation-style approach and just verified everything using TAKE/DROP. Some of the proofs are a bit of a mess.
Attempt to port the proof of simulation of the 979 heap-list-sort from a "direct" version on the monadic semantics to a postcondition/hoare style approach. The big upgrade would be if we could do something separation-ish about the array, since each worker works on a "focal chunk" of the array leaving the rest unchanged, and it would be a huge upgrade if the disjointness of these could be made more implicit.
After figuring out some issues with polymorphism, present a mechanism that sort of works involving disjoint array chunk membership. It works, and the most involved proof is now much faster and much more step-by-step. The other proofs are getting clunkier though. Sigh. Time to try something else.
Finally, an approach I'm happy with. Committing then will clean this up.
After a lot of false steps, the process seems to work. The monadic constants are defined and verified in the monadic sub-theory. The real monadic translation is set up in ListProgScript, together with a bit of a hack for rewriting the accessor constants from the sub-theory to the ones the translator setup builds. The final "sort" function is derived from that, and can be trivially proven to be the pure version (and thus a sorting function) in ListProof.
Along the way, we seem to have made the monadic translator resistant to type variables with names other than 'a or 'state.
Change around the translation. Prove the "monadic-sort = functional-sort" theorem, and use its symmetric version as the definition to translate the functional algorithm. The point is that the translation and EVAL calculations will now both do the correct thing with the one constant.
In the monadic translation, the state type represents all the stateful stuff for the purposes of the monadic model. It doesn't directly exist as any type in the CakeML translation, instead its fields are placed into reference and array objects states and the FFI oracle and whatnot. A previous patch was confused about this. This patch restores the previous behaviour, including the pretty awful re-parsing of strings to fetch constants and overloads that were defined before, though puts a little defensive programming around some of it. Hopefully fixes the breakage of the IO-based monadic translation examples.
Switch out the "sort" symbol of mllist to call the new heap-list-sort with all the same key logical theorems exported. This should cause the Candle kernel etc to call on the new sort and translate correctly.
Oops. The linter really doesn't like that.
Follow-up to the previous adjustment, we need to make sure that mllist$sort is translated in ListProg, not just something equal to it, so everything is pointing at the one sort instance.
The bignum proof needs to know that "sort R [x] = [x]" for some reason. This was done by supplying its definitional theorems. It's now derived from PERM instead, which is likely to be more stable. It also occurs to me that EVAL might be fairly safe.
This includes a variant of one of the sort workers that keeps a "fuel" parameter to make termination obvious, which in turn means that translation to CV can be done by auto-methods. I also tried a more direct approach, and it was horrible, so I think the silly fuel parameter is being kept.
77517e0 to
309e4ee
Compare
The Candle proofs include a pass across the translated CakeML AST to check for problematic effects. The array-based sort complicates this, so instead, restore the previous mergesort translation, call it from Candle, and tweak some proofs to refer to the correct one.
Mostly duplicating the change made to the "standard" theories.
A merge-sort implementation that first scans for increasing/decreasing runs, and merges them trying to merge even-size runs along roughly the design of "timsort". May replace the standard mergesort.
This proof is still a bit involved, but less so than the heap version. It probably helps that the data representation, in which the list chunks are all placed in order, has less moving parts than the flattened-heap variant.
Exercise the same routine as for heap-list-sort, translating the sort function to CakeML AST in a separate theory to the monadic development, including a hack that re-defines the monadic elements.
The heap-list-sort development was done on the assumption that it would become the basis List.sort function, which turns out to be premature. For now, move both in-array sorts (heap-list and merge-run) to be examples. If the merge-run sort turns out to be useful it may be moved back in.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is an implementation of the remaining idea of issue #979 .
It introduces a list-of-heaps sort in
basis/pure/heap_list_sortScript.sml, and a monadic variant of it that suffix-encodes the heaps into a single array inbasis/monadic/heap_list_sort_monadicScript.sml. The latter theory includes a proof of equivalence, that the monadic computation produces the same results as the pure one.The monadic definitions are then translated (by the monadic translator) in ListProg, and the translation of List.sort uses them. However the constant being translated is just the pure heap-list-sort, so it should EVAL OK.
Various monadic translator fixes, mostly related to type polymorphism, are included in these commits.
Yet to be discovered: Does anything else break in the build? Does this sort function actually perform OK?