The Functional Essence of Imperative Binary Search Trees

Algorithms on restructuring binary search trees are typically presented in imperative pseudocode. Understandably so, as their performance relies on in-place execution, rather than the repeated allocation of fresh nodes in memory. Unfortunately, these imperative algorithms are notoriously difficult to verify, as their loop invariants must relate the unfinished tree fragments being rebalanced. This paper presents several novel functional algorithms for accessing and inserting elements in a restructuring binary search tree that are as fast as their imperative counterparts; yet the correctness of these functional algorithms is established using a simple inductive argument. For each data structure (move-to-root trees, splay trees, and zip trees), this paper describes both a bottom-up algorithm using zippers and a top-down algorithm using a novel first-class constructor context primitive. The functional and imperative algorithms are equivalent: we mechanise the proofs establishing this in the Coq proof assistant using the Iris framework. This yields a first fully verified implementation of well-known algorithms on binary search trees with performance on par with the fastest implementations in C.


INTRODUCTION
In his book on purely functional data structures, Okasaki [1999b] presents several implementations of binary search trees. The inductive nature of these purely functional algorithms makes them amenable to reasoning and verification. A typical exercise in verification courses asks for a formal proof that insertions or deletions preserve the binary search tree properties [Appel 2018]. It is even possible to synthesize such implementations automatically [Albarghouthi et al. 2013; Polikarpova et al. 2016] or mechanically compute their amortized time complexity [Leutgeb et al. 2022; Schoenmakers 1993]. Unfortunately, the absolute performance of these functional algorithms is often worse than that of the imperative implementations published in papers and textbooks. While the functional algorithms have the same semantic behaviour as their imperative counterparts, their operational behaviour differs substantially. The functional algorithms suffer from asymptotically worse heap allocation (due to copying immutable data) and stack usage (due to non-tail calls).
This paper presents novel functional algorithms on binary search trees with absolute performance on par with the best known imperative implementations. Enabled by recent advancements, including fully in-place functional programming [Lorenzen et al. 2023b] and Perceus reference counting [Reinking, Xie et al. 2021; Ullrich and de Moura 2019], our algorithms perform in-place updates whenever possible, without sacrificing their purely functional nature. Additionally, our algorithms execute in constant stack space: they are defined in a tail-recursive manner, storing incomplete trees as heap-allocated one-hole contexts. Each of these algorithms is proven to be equivalent to the established functional version using a simple inductive argument. Yet these algorithms have a close operational correspondence to the published imperative code: we show how the one-hole contexts are an essential part of the loop invariant necessary to verify their imperative counterparts.
A one-hole context is typically represented by a zipper [Huet 1997], storing the path from the hole back up to the root. Our zipper-based algorithms navigate through the tree, accumulating a zipper of unvisited subtrees along the way. Upon encountering the key we were looking for, the zipper is unrolled to reconstruct a complete binary search tree. For example, in our implementation of the 'zig-zag' case for splay trees (Section 5), the zipper we have accumulated, accz, stores the path back to the root, where each constructor NodeR/NodeL records whether the traversal down the tree went right or left. Our algorithm is readily calculated from the standard definition by means of a defunctionalized CPS transformation [Danvy and Nielsen 2001; Reynolds 1972]. At the same time, it corresponds exactly to Figure 3 of Sleator and Tarjan [1985]. It is the essence of an imperative bottom-up algorithm, implemented with pointer reversal [Schorr and Waite 1967].
However, zippers are not the only way to represent one-hole contexts. To give a functional account of imperative top-down algorithms, zippers are an asymptotically worse choice of data structure, much in the same way that lists are a poor implementation of queues. To address this, we introduce a novel language feature, first-class constructor contexts, that safely encapsulates the required mutation behind a purely functional interface. These contexts follow a design proposed by Minamide [1998], but we are the first to give an implementation that requires neither a linear type system nor reference counting. Using these constructor contexts, we can implement the corresponding zig-zag step of top-down splay trees as follows:

fip fun splay( t, k, accl, accr )
  match t
    Node(ayzb,x,c) ->
      if x > k then match ayzb
        Node(a,y,zb) ->
          if y < k then
            splay( zb, k, accl ++ ctx Node(a,y,_), accr ++ ctx Node(_,x,c) )
          ...
Here we traverse the tree and accumulate two constructor contexts, accl and accr. Once again, this code can be calculated from the direct recursive definition, in this case using a tail recursion modulo context transformation [Leijen and Lorenzen 2023]. At the same time, it corresponds exactly to Figure 11 of Sleator and Tarjan [1985], which describes the imperative top-down algorithm that accesses a given key whilst restructuring the tree in place.
We illustrate our techniques by studying three particular implementations of restructuring binary search trees: move-to-root trees [Allen and Munro 1978; Stephenson 1980] (Section 2), self-adjusting splay trees [Sleator and Tarjan 1985] (Section 5), and the more recently published randomized zip trees.


MOVE-TO-ROOT TREES

To introduce our techniques, we consider move-to-root trees, independently described by Allen and Munro [1978] and Stephenson [1980]: binary search trees where accessing a particular key moves it to the tree's root. These trees are rarely used in practice since they can become unbalanced, but their simplicity makes them well suited to illustrate our key ideas.

A Recursive Functional Algorithm
All our examples in this paper are written in the Koka language [Leijen 2014] (v3.1.1). In Koka, we can declare a datatype for binary trees as:

type tree
  Node( left : tree, key : key, right : tree )
  Leaf

We use an abstract type key for the keys stored in the tree, but this is usually instantiated to be an int. The main operation on binary trees is the insert function, which takes a tree and a key as its arguments. If the key is not yet in the tree, the insert function inserts it; otherwise no elements are inserted or deleted. We can elegantly express move-to-root tree insertion inductively in a pure functional way using direct recursion, where we ensure in each branch that the inserted key always ends up at the root of the resulting tree:

fun insert( t : tree, k : key ) : tree
  match t
    Node(l,x,r) ->
      if x < k then match insert(r,k)
        Node(s,y,b) -> Node( Node(l,x,s), y, b )
      elif x > k then match insert(l,k)
        Node(s,y,b) -> Node( s, y, Node(b,x,r) )
      else Node(l,x,r)
    Leaf -> Node(Leaf,k,Leaf)

This version performs a single traversal over the input tree. It is straightforward to verify various correctness properties of this function using a proof assistant such as Coq [2017]. For instance, we have proven that (1) whenever t is a binary search tree, so is insert(t,k); (2) every key in t also occurs in insert(t,k); and (3) the key stored at the root of insert(t,k) is equal to k.
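As a sanity check, the recursive move-to-root insertion can be transcribed into Python. This is our own transcription (not the paper's code): a tree is either `None` (playing the role of `Leaf`) or a tuple `(left, key, right)`.

```python
# Move-to-root insertion, transcribed from the recursive definition.
# A tree is either None (Leaf) or a tuple (left, key, right).

def insert(t, k):
    if t is None:
        return (None, k, None)       # fresh singleton: k ends up at the root
    l, x, r = t
    if x < k:
        s, y, b = insert(r, k)       # k is now at the root of the right result
        return ((l, x, s), y, b)
    elif x > k:
        s, y, b = insert(l, k)       # k is now at the root of the left result
        return (s, y, (b, x, r))
    else:
        return t                     # key already present: tree unchanged
```

Repeated insertions keep the binary-search-tree ordering while always leaving the last inserted (or found) key at the root.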
Even though this recursive functional definition is simple enough, it does have its drawbacks. Firstly, it is not tail recursive and can use stack space linear in the size of the tree in the worst case. Depending on the implementation, its execution may also allocate many nodes in the process, leading to (much) worse performance compared to the imperative algorithms. We now proceed to derive efficient fully in-place bottom-up and top-down variants that remedy both these issues.

Bottom-Up Move-To-Root Using Zippers
A bottom-up algorithm first traverses down the tree to the insertion point and then restructures the tree on the way back up. This can be derived by a standard defunctionalized CPS transformation of the direct recursive definition of the insert function [Danvy and Nielsen 2001; Reynolds 1972]. Doing so uncovers a familiar functional data structure, the zipper (or 'one-hole context') [Huet 1997; McBride 2001]:

type zipper
  NodeR( left : tree, key : key, up : zipper )
  NodeL( up : zipper, key : key, right : tree )
  Done

Note that the zipper stores the traversed path down in reverse, such that we can rebuild a new tree bottom-up in constant space in a single traversal:

fip fun rebuild( z : zipper, t : tree ) : tree
  match z
    Done -> t
    NodeR(l,x,up) -> match t  // we came from the right
      Node(s,y,b) -> rebuild( up, Node( Node(l,x,s), y, b ) )
    NodeL(up,x,r) -> match t  // we came from the left
      Node(s,y,b) -> rebuild( up, Node( s, y, Node(b,x,r) ) )

The rebuild function repeatedly moves up through the zipper, reassembling the original tree. This function may be executed fully in-place, as indicated by the fip keyword, a point we discuss more extensively in the next section. Using these definitions, we now specify the complete tail-recursive bottom-up insert function:

fip(1) fun down-bu( t : tree, k : key, z : zipper ) : tree
  match t
    Node(l,x,r) -> if x < k then down-bu( r, k, NodeR(l,x,z) )
                   elif x > k then down-bu( l, k, NodeL(z,x,r) )
                   else rebuild( z, Node(l,x,r) )
    Leaf -> rebuild( z, Node(Leaf,k,Leaf) )

fip(1) fun insert-bu( t : tree, k : key ) : tree
  down-bu( t, k, Done )

The insert-bu function is both fast and correct. It is tail recursive, and the fip property ensures no unnecessary new memory is allocated. In this case, the fip(1) annotation allows allocation of a single constructor when the inserted key is not yet present; otherwise no memory is allocated or freed. This addresses the two performance issues associated with the direct recursive definition we saw previously. To prove the insert-bu function is correct, we prove the following theorem relating the two algorithms:

Theorem 1. (Correctness of bottom-up move-to-root insertion) The recursive and the bottom-up algorithms coincide: for all trees t, keys k, and zippers z, we have down-bu(t, k, z) ≡ rebuild(z, insert(t, k)).

Proof. The proof proceeds by induction on the tree t. The base case, when t is a leaf, is trivial. If the tree is non-empty, we distinguish three cases, depending on whether the key x stored at t is less than, greater than, or equal to k. We cover the first case (the others are similar), where we need to show:

down-bu(r, k, NodeR(l,x,z)) ≡ rebuild(z, insert(Node(l,x,r), k))

which follows immediately from our induction hypothesis.
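The bottom-up pass can be mimicked in Python. This is a sketch with our own naming (not the paper's code): a tree is `None` or a tuple `(left, key, right)`, and the zipper is a plain list of frames ordered from the stopping point back up to the root.

```python
# Bottom-up move-to-root insertion with an explicit zipper.
# Frames: ("R", l, x) means we went right past Node(l, x, _);
#         ("L", x, r) means we went left past Node(_, x, r).

def rebuild(z, t):
    # Reassemble the full tree; the root of t stays at the overall root.
    for frame in z:
        s, y, b = t
        if frame[0] == "R":
            _, l, x = frame
            t = ((l, x, s), y, b)
        else:
            _, x, r = frame
            t = (s, y, (b, x, r))
    return t

def insert_bu(t, k):
    z = []
    while t is not None:
        l, x, r = t
        if x < k:
            z.insert(0, ("R", l, x)); t = r   # newest frame is innermost
        elif x > k:
            z.insert(0, ("L", x, r)); t = l
        else:
            return rebuild(z, t)              # key found: unroll the zipper
    return rebuild(z, (None, k, None))        # key absent: one fresh node
```

The loop plays the role of the tail-recursive descent; `rebuild` unrolls the zipper exactly as in the `NodeR`/`NodeL` cases above.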
An obvious corollary of this theorem is that the recursive version calculated at the beginning of this section coincides with the tail-recursive bottom-up insert function presented here: for all trees t and keys k, we have insert-bu(t,k) ≡ insert(t,k).

Intermezzo: First-Class Constructor Contexts
A top-down algorithm traverses a structure down in a single pass and directly returns the result structure when reaching the final position. Unfortunately, in a purely functional language it is not always possible to express such algorithms directly. Consider the map function, for example: its direct recursive definition is not tail recursive, and the usual remedy is to pass along an accumulator of the mapped elements. Yet the resulting algorithm is no longer top-down, as eventually we need to traverse back through the accumulator to reverse it in O(n) time. Similarly, using a function (or difference list [Clark and Tärnlund 1977; Hughes 1986]) as the accumulator requires a final application of the functional accumulator, essentially traversing back up through the composed Cons operations.
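Both shortcomings are easy to see in a small Python sketch (our own transcription; a cons list is `None` or a pair `(head, tail)`):

```python
# Two classic ways to map over a cons list, neither truly top-down.
# A list is None (Nil) or a pair (head, tail).

def map_rec(f, xs):
    # Direct recursion: simple, but not tail recursive (O(n) stack).
    return None if xs is None else (f(xs[0]), map_rec(f, xs[1]))

def map_acc(f, xs):
    # Tail recursive with an accumulator, but the accumulator comes out
    # reversed, so we must traverse it once more at the end.
    acc = None
    while xs is not None:
        acc = (f(xs[0]), acc)
        xs = xs[1]
    out = None
    while acc is not None:        # the extra O(n) reversal pass
        out = (acc[0], out)
        acc = acc[1]
    return out
```

The second loop of `map_acc` is exactly the "traverse back through the accumulator" cost that constructor contexts eliminate.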
To express true top-down algorithms, we introduce the concept of first-class constructor contexts. This abstraction can safely encapsulate the limited form of mutation necessary to define top-down algorithms, while still having a purely functional interface. We define a constructor context as a sequence of constructor applications that ends in a single hole. We can describe such contexts using the following grammar:

ctx ::= □ | C^k( v_1, ..., ctx, ..., v_k )

where we use v for values, and C^k for a constructor that takes k arguments. In Koka, the keyword ctx starts a constructor context and the hole □ is written as an underscore _. For example, we can write a list constructor context as ctx Cons(1,_) or a binary tree constructor context as ctx Node(Node(Leaf,1,Leaf),2,_).
Using constructor contexts, we are now able to define a true top-down functional version of map using an accumulating constructor context:

fun map( xs : list<a>, f : a -> b, acc : ctx<list<b>> ) : list<b>
  match xs
    Cons(x,xx) -> map( xx, f, acc ++ ctx Cons( f(x), _ ) )
    Nil -> acc ++. Nil

Here, ++ composes two contexts and ++. plugs the hole of a context with a final value. The map function now uses a single tail-recursive traversal down the list and returns the final list directly (in constant time) when reaching the end of the list.
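The constant-time mechanics behind such a context can be prototyped in Python with mutable cons cells. This is a sketch with our own class and method names (`comp` for ++, `apply` for ++.); cons cells are mutable two-element lists `[head, tail]` and `None` is Nil.

```python
# A one-hole list context in the style of `ctx Cons(x, _)`: `res` is the
# first cons cell built so far and `hole` is the last cell, whose tail
# is still unfilled.

class Ctx:
    def __init__(self):
        self.res = None
        self.hole = None

    def comp(self, x):            # acc ++ ctx Cons(x, _), in O(1)
        cell = [x, None]
        if self.hole is None:
            self.res = cell       # the context was empty: x becomes the top
        else:
            self.hole[1] = cell   # write the new cell into the old hole
        self.hole = cell
        return self

    def apply(self, tail):        # acc ++. tail, in O(1)
        if self.hole is None:
            return tail
        self.hole[1] = tail
        return self.res

def map_td(f, xs):
    # A true top-down map: one pass, the result is returned in O(1).
    acc = Ctx()
    while xs is not None:
        acc.comp(f(xs[0]))
        xs = xs[1]
    return acc.apply(None)
```

The mutation is confined to cells the context itself allocated, which is why the interface can remain purely functional for linearly used contexts.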

Top-Down Move-To-Root Using Accumulating Constructor Contexts
Using accumulating constructor contexts, we now also define a true top-down functional version of move-to-root insertion. The very first recursive version of insert matches on the result of a recursive call, adding a constructor to either the left or right subtree:

... match insert(r,k)
      Node(s,y,b) -> Node( Node(l,x,s), y, b )

We can make this tail-recursive by passing two accumulating constructor contexts, one for each of the left and right subtrees of the root. For example, in the case outlined above, we extend the context we have accumulated so far with an additional ctx Node(l,x,_). In the base cases, we then use the accumulated contexts to construct the final tree:

fip(1) fun down-td( t : tree, k : key, accl : ctx<tree>, accr : ctx<tree> ) : tree
  match t
    Node(l,x,r) -> if x < k then down-td( r, k, accl ++ ctx Node(l,x,_), accr )
                   elif x > k then down-td( l, k, accl, accr ++ ctx Node(_,x,r) )
                   else Node( accl ++. l, x, accr ++. r )
    Leaf -> Node( accl ++. Leaf, k, accr ++. Leaf )

fip(1) fun insert-td( t : tree, k : key ) : tree
  down-td( t, k, ctx _, ctx _ )

Theorem 2. (Correctness of top-down move-to-root insertion) For all trees t, keys k, and contexts accl and accr, we have:

down-td(t, k, accl, accr) ≡ val Node(l',x',r') = insert(t,k) in Node( accl ++. l', x', accr ++. r' )

Proof. The proof proceeds by induction on the tree t. The base case, when t is a leaf, is trivial. If the tree is non-empty, we distinguish three cases, depending on whether the key x stored at t is less than, greater than, or equal to k. For the first case, we need to show:

down-td(r, k, accl ++ ctx Node(l,x,_), accr) ≡ val Node(l',x',r') = insert(r,k) in Node( (accl ++ ctx Node(l,x,_)) ++. l', x', accr ++. r' )

This follows from our induction hypothesis for the right subtree, where we use the (dist) law. The other cases are similar. A corollary of this result is that the recursive insertion coincides with the top-down insertion algorithm: for all trees t and keys k, insert-td(t,k) ≡ insert(t,k).
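The same two-context discipline can be simulated in Python with mutable nodes. This is a sketch with our own helper names (`comp` for ++, `apply` for ++.); nodes are mutable lists `[left, key, right]` and a context is a `(root, hole)` pair where `hole = (node, field)` marks where the next subtree will be plugged in.

```python
# Top-down move-to-root insertion with two accumulated one-hole tree
# contexts; (None, None) is the empty context.

def comp(acc, node, field):       # extend acc with a one-node context
    root, hole = acc
    if hole is None:
        return node, (node, field)
    hnode, hfield = hole
    hnode[hfield] = node          # write the new node into the old hole, O(1)
    return root, (node, field)

def apply(acc, sub):              # plug the hole with a final subtree
    root, hole = acc
    if hole is None:
        return sub
    hnode, hfield = hole
    hnode[hfield] = sub
    return root

def insert_td(t, k):
    accl = accr = (None, None)
    while t is not None:
        l, x, r = t
        if x < k:
            accl = comp(accl, [l, x, None], 2)   # hole at the right child
            t = r
        elif x > k:
            accr = comp(accr, [None, x, r], 0)   # hole at the left child
            t = l
        else:
            return [apply(accl, l), k, apply(accr, r)]
    return [apply(accl, None), k, apply(accr, None)]
```

A single downward pass splits the tree into the "smaller" and "bigger" parts, each grown top-down through its context hole.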

MAKING IT FAST: FULLY IN-PLACE AND CONSTRUCTOR CONTEXTS
The previous section has shown three different implementations of the same algorithm: a direct recursive definition; an efficient bottom-up implementation using a zipper; and an efficient top-down implementation using first-class constructor contexts. To achieve performance that is competitive with the imperative algorithms, we need two fundamental techniques: fully in-place functions (fip) and efficient constructor context operations.

Fully In-Place Functional Programming
Throughout this paper, we write purely functional programs, but the goal is always to derive fully in-place, or fip, functions that can be compiled to efficient code. This section highlights the key principles underlying this recent paradigm of fully in-place functional programming [Lorenzen et al. 2023b]. Consider, for example, the function that swaps the left and right subtrees:

fip fun swap( t : tree ) : tree
  match t
    Leaf -> Leaf
    Node(l,x,r) -> Node(r,x,l)

Lorenzen et al. [2023b] define a linear calculus characterizing fip programs and prove that any program in the fip fragment can be compiled to code that does not use any (de)allocations and uses constant stack space: it can be executed fully in-place. The fip keyword asserts that a given function is in this linear fragment. The Koka compiler statically checks that each fip function does not duplicate or discard its arguments; when a function is erroneously marked as fip, the Koka compiler statically reports a warning.
Intuitively, we can check whether a function is fip by ensuring that, in each branch, the constructors matched on use the same memory layout as the constructors "allocated" on the right-hand side of the function definition, thereby ensuring every heap cell is reused. The reuse analysis allows constructors from different datatypes to reuse the same memory cells, illustrated by the following case from the earlier down-bu function:

Node(l,x,r) -> if x < k then down-bu( r, k, NodeR(l,x,z) ) ...

More formally, in each branch of a case expression, the constructor that is matched provides us with a reuse credit of a certain size k, written as ⋄k (similar to the space credits of Aspinall et al. [2008]). These reuse credits are consumed when space of that size is required: in down-bu, the NodeR(l,x,z) consumes the reuse credit ⋄3 obtained by matching on the Node constructor. Constructors without arguments, like Nil, True, or Leaf, and primitive types like integers, are called atoms and require no allocation. Furthermore, value types like tuples are always unboxed and passed in registers or on the stack.
Nevertheless, it is only safe to reuse these memory locations if the original parameters are owned and unique at runtime! Inside fip functions the linear use of owned parameters is guaranteed, but when fip functions are called from a non-fip context, the arguments may be shared. Consider the following example:

fun mirror( t : tree, k : key ) : tree
  Node( t, k, swap(t) )

Here the tree t is shared. Any in-place update on t would be unsound and change the meaning of this program. To ensure fip functions are executed safely, Koka uses precise reference counting [Reinking, Xie et al. 2021; Ullrich and de Moura 2019] to determine dynamically whether or not arguments can be reused in-place. In particular, for a function like swap, the generated code becomes:

fip fun swap( t : tree ) : tree
  match t
    Leaf -> Leaf
    Node(l,x,r) ->
      val p = if unique(t) then &t
              else { dup(l); dup(r); decref(t); alloc(3) }
      in Node@p(r,x,l)

That is, if t has a unique reference count of 1 at runtime, the allocated space is reused. Otherwise, t is shared: the reference counts are adjusted and a fresh heap cell is allocated. The fip annotation in Koka only guarantees that no (de)allocation occurs if the parameters are unique at runtime. This may be viewed as a weakness (we do not guarantee statically that a function will actually be executed in-place), but it offers greater flexibility, as we can use fip functions in both modes. In particular, for the tree algorithms in this paper, we not only get the efficient in-place updating behaviour for unique trees, but we can also use them persistently, where any shared (sub)trees are copied as needed.
The fip check provides a strong guarantee: constant stack usage and no (de)allocation at all.Throughout this paper, we also use the fip(n) variant which allows a function to allocate at most n constructors.This is useful for tree insertion algorithms, as we may need to allocate a constant amount of memory for the single node storing the new key.
Improving Bottom-Up. The swap function may seem trivial, but consider the following slight variation that rotates a binary tree, moving subtrees from the left to the right:

fip fun rotate-right( t : tree ) : tree
  match t
    Node(Node(ll,lx,lr),x,r) -> Node(ll,lx,Node(lr,x,r))

It is easy to check that this function is fully in-place. As fip functions can safely call other fip functions, we can rewrite our rebuild function as follows:

fip fun rebuild( z : zipper, t : tree ) : tree
  match z
    Done -> t
    NodeL(up,x,r) -> rebuild( up, rotate-right(Node(t,x,r)) )
    NodeR(l,x,up) -> rebuild( up, rotate-left(Node(l,x,t)) )

This now corresponds closely to the published algorithm of Allen and Munro [1978], which also uses a bottom-up traversal with left and right rotations. We formalise the precise relation between our bottom-up insert-bu function and the published algorithms shortly (Section 4), but first turn our attention to the top-down version of the same algorithm.
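The rotation-based formulation is easy to check against the zipper cases in a Python transcription (our own sketch; trees are `(left, key, right)` tuples and zipper frames run from the stop point up to the root, as in the earlier bottom-up sketch):

```python
# Rotations and the rotation-based rebuild.
# Frames: ("L", x, r): we came from the left, so reassemble Node(t, x, r);
#         ("R", l, x): we came from the right, so reassemble Node(l, x, t).

def rotate_right(t):
    (ll, lx, lr), x, r = t
    return (ll, lx, (lr, x, r))

def rotate_left(t):
    l, x, (rl, rx, rr) = t
    return ((l, x, rl), rx, rr)

def rebuild(z, t):
    for frame in z:
        if frame[0] == "L":
            _, x, r = frame
            t = rotate_right((t, x, r))
        else:
            _, l, x = frame
            t = rotate_left((l, x, t))
    return t
```

Unfolding either branch shows it produces exactly the tree built by the corresponding `NodeL`/`NodeR` case of the earlier `rebuild`.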

Efficient First-Class Constructor Contexts
As mentioned previously, constructor contexts can be implemented using functions, but such an implementation is unnecessarily inefficient. Minamide [1998] describes a linear hole calculus for constructor contexts. In Minamide's system, a context has an affine type and it is safe to update the hole in-place. A context is represented by a Minamide tuple, written as {x, h}, where the first element x points to the top of the data structure, and the second element h points directly to the hole inside that structure. Composition and application can now directly update the hole in-place and run in constant time. Unfortunately, it is not easy to extend an existing language with Minamide's system, as it requires an affine type system for contexts (and also uses linear derivations and evaluation under lambda for contexts). In particular, this is problematic for some of the proofs in this paper that rely on referential transparency and do not rule out sharing or duplication of contexts.
Context Paths. There is a way, though, to obtain efficient in-place mutating context operations without requiring affine types. The key is the use of runtime context paths, which store the path from the root to the hole, first described by Leijen and Lorenzen [2023]. Their use of context paths is internal and not exposed to the user, but we can use a similar runtime mechanism to implement our first-class constructor contexts.
In essence, we compile constructor contexts to a runtime representation storing the context path down from the top to the hole in the data structure. To enable this, we use extra bits in the header of each object, where we store the index of the child that leads to the (single) hole in the structure. Koka re-uses an 8-bit field for this purpose that is normally used for stackless freeing.
The context path indices can be constructed in constant time when compiling constructor contexts. Writing C_i for the constructor C decorated with child index i, we compile a constructor context into a Minamide tuple, where &x.i denotes the address of field i in x. At runtime, a constructor allocation of C typically initializes the header fields, including the tag. Adding in the context path index yields a single constant, eliminating any overhead associated with this representation. For example, the Koka compiler compiles a context like ctx Node(Node(Node(Leaf,1,Leaf),2,_),5,Leaf) internally into a representation where each constructor along the context path is annotated with a child index (1 and 3).
With these context paths, we can follow the path from the top of a context to the hole in that structure at runtime, and thus we are able to copy the linear context path dynamically when required. When we compose or apply a context, we can now copy shared contexts only when needed. In a language with precise reference counts (like Koka or Lean), we copy a context at runtime along its context path whenever it is not unique.
We can also support this in languages without precise reference counts. In particular, we can use a special distinguished value for the runtime hole □ that is never used by any other object. A substitution now first checks the value at the hole: if it is the □ value, the hole is substituted for the first time and we simply overwrite the hole in-place (in constant time). However, any subsequent substitution on the same context will find some other object instead of □. At this point, we first dynamically copy the context path (in linear time) and then update the copy in-place.
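The hole-marker scheme can be prototyped in Python. This is a sketch under our own names: `HOLE` stands in for the distinguished runtime hole value, nodes are mutable lists `[left, key, right]`, and `path` holds the child indices from the root of the context to the hole.

```python
# Applying a possibly shared context: the first application overwrites the
# hole in place; later applications copy the nodes on the context path first.

HOLE = object()   # distinguished value never stored in an ordinary field

class Ctx:
    def __init__(self, root, path):
        self.root = root          # top node of the context
        self.path = path          # child indices leading from root to the hole

    def _hole(self, root):
        node = root
        for i in self.path[:-1]:
            node = node[i]
        return node, self.path[-1]

    def apply(self, sub):         # the ++. operation
        node, i = self._hole(self.root)
        if node[i] is HOLE:
            node[i] = sub         # first application: O(1), in place
            return self.root
        # hole already overwritten: copy only the context path, then fill it
        root = [*self.root]
        node = root
        for i in self.path[:-1]:
            node[i] = [*node[i]]
            node = node[i]
        node[self.path[-1]] = sub
        return root
```

Only the nodes on the context path are ever duplicated; everything off the path stays shared between the results, matching the behaviour pictured in Figure 1.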
If the contexts happen to be used linearly, then all operations execute in constant time, just as in Minamide's approach; but we now have full functional semantics, and any subsequent substitutions on the same context work correctly (though they take time linear in the length of the context path).

Fig. 1. Applying a shared context, where the context path is denoted with a bold edge. In the second update the nodes along the context path are copied.

Figure 1 illustrates a more complex example of a shared tree context that is applied to two separate nodes. In the illustration, the runtime context path is denoted by bold edges. The intermediate state is interesting: it is a valid tree, but part of the tree is also shared with the remaining context, where the hole now points to a regular node. When that context is applied, only the nodes along the context path (nodes 5 and 2) are copied, while all other nodes stay shared (in this case, only node 1).
However, in the context composition operation c1 ++ c2, we need an extra check in order to avoid cycles: we check whether c2 has an already overwritten hole, or whether the hole in c2 is at the same address as the hole in c1. In either case, c2 is copied along its context path. Effectively, both checks ensure that the newly returned context always ends in a single fresh hole. If we compose a context with itself, the result evaluates to [1,1,2], where the check copies the appended c. In Appendix D of the tech report, we give an implementation of these constructor contexts in C and prove, using a heap semantics, that these checks are sufficient for avoiding cycles.

FUNCTIONAL AND IMPERATIVE MOVE-TO-ROOT COINCIDE
In Section 2, we derived two functional implementations from our recursive specification. How can we relate these to the imperative move-to-root algorithms published by Stephenson [1980] and Allen and Munro [1978]? As we will see, the imperative algorithms rely heavily on pointer manipulation: it is not at all obvious that they are correct or even represent the 'same' program.
These published algorithms are usually written in imperative pseudocode. To reason about them, we formalize each algorithm precisely in Iris [Jung et al. 2018], a framework for (higher-order concurrent) separation logic [Reynolds 2002] implemented as a library in the Coq proof assistant [2017]. In the style of Frumin et al. [2019] and Bedrock2 [Erbsen et al. 2021; Pit-Claudel et al. 2022], we have defined an embedded language, called AddressC, building on the standard HeapLang language supported by Iris [2022].
The AddressC language is embedded into Coq, where we use extensive Notation to have the embedded code resemble a low-level C-style language that can closely match the typical pseudocode in published algorithms. Eventually, AddressC is desugared into a standard HeapLang value representing the low-level control flow and heap operations on which the proofs operate.
While our language builds on HeapLang, we take special care to precisely model while loops and the (untyped) low-level structure of memory. For example, we model a tree as:

Fixpoint is_tree (t : tree) (v : val) : iProp :=
  match t with
  | Leaf => ⌜v = NULL⌝
  | Node l x r => ∃ (p : loc) l' r',
      ⌜v = #p⌝ ∗ p ↦∗ [ l'; #x; r' ] ∗ is_tree l l' ∗ is_tree r r'
  end.

This states that a Leaf is represented by a null address (the constant NULL). Representing a non-empty tree, Node(l,x,r), requires some address #p pointing to a heap cell of 3 fields, containing an address for the left tree (l'), the key #x, and an address for the right tree (r'). The separating conjunction, ∗, ensures that the tree is indeed inductively defined and that there are no cycles. For the bottom-up algorithms we additionally need to model zippers, which requires us to distinguish between NodeL and NodeR. To do so, we include an additional tag field in the heap cells, as p ↦∗ [ #1; l'; #x; r' ].
Variables typically denote memory locations and, just as in C, we use & to take the address of a location and * to dereference an address. We can also dereference at an offset, writing node[2] to dereference the second field of the node address. We usually define notation for constant offsets, so we can write node->right instead of node[2] to get the value of the right child.

Proving Stephenson's Top-Down Algorithm
Stephenson [1980] presents an imperative top-down insertion algorithm for move-to-root trees in pseudocode. Figure 2 shows both Stephenson's top-down algorithm as published and our formal AddressC implementation. We can see that our formal AddressC implementation corresponds to the published algorithm almost line by line. Using Iris, we can now formally relate the functional algorithm and the AddressC implementation:

Theorem 3. (Stephenson's imperative top-down move-to-root algorithm is correct)

The pre-condition requires that the argument address tv points to a valid in-memory tree corresponding to t, and the post-condition establishes that the result address v points to a valid in-memory tree that corresponds to mtr_insert_td t k. The proof goes through because we can directly relate the loop invariant of this algorithm to the recursive calls of mtr_insert_td. As we will see in the next section, this is only possible because constructor contexts precisely capture the top-down behaviour of Stephenson's algorithm. It would be much harder to relate the AddressC code to our original recursive definition. As is often the case in verification, finding the right formulation of our theorem is vital: this proof would not be possible without constructor contexts.

Representing Constructor Contexts
Stephenson's algorithm uses intricate pointer manipulation and even goto statements that make it non-trivial to verify formally. The key insight is that Stephenson builds the smaller and bigger trees using the left_hook and right_hook variables. For example, for the case where the key in the current node is larger than the argument key (name), the current node address is written to *right_hook, which is then itself updated to point to the left child of the current node (right_hook = &(node->left)). Afterwards, the current node is advanced to the left child. This corresponds to the functional version, where the current node is written into the right context (accr) and the hole is set to its left child:

down( l, k, accl, accr ++ ctx Node(_,x,r) )

At this point, though, we face all kinds of problems in the formal setting. Not only have we overwritten the value that right_hook was previously pointing to, but we have also introduced aliasing, where both the left child of the current bigger tree and node point to the same location. The bigger tree is not even a valid constructor context, as its left child is a "dangling" pointer (that will eventually be overwritten). Yet we can still prove these pointer manipulations correct by relating them to the constructor contexts used in our functional algorithm. First, we implement our functional algorithms in Coq using a slow but purely functional representation of constructor contexts, where append and composition take time linear in the depth of the first context. Such a context can be modelled similarly to a zipper, but with the pointers going from the root to the hole. We can then define an is_ctx z p h predicate that represents the context z in heap memory, with root pointer p and hole pointer h. However, we take care to ensure that the predicate does not take ownership of the hole h. This is different from the usual presentation [Charguéraud 2016] and allows us to change the value of the hole without inspecting the constructor context (temporarily allowing a dangling pointer). For example, we can now prove the following lemma:

Lemma ctx_of_ctx (z1 : ctx) (z2 : ctx0) (zv1 : loc) (hv1 : loc) (zv2 : loc) (hv2 : loc) :
  is_ctx z1 zv1 hv1 ∗ hv1 ↦ #zv2 ∗ is_ctx0 z2 zv2 hv2 -∗
  is_ctx (comp z1 z2) zv1 hv2.

This states that if the hole points to another context, then together they form the composed context. This is the key lemma that enables checking the individual cases of the algorithms. The main difficulty of this proof lies in specifying the loop invariants for the while loop. The first formula passed to the wp_while_true tactic gives the condition once the loop terminates, and the second formula gives the invariant for subsequent iterations.

Loop Invariants
wp_while_true "H" (∃ l (x : Z) r left_dummy_v right_dummy_v (p : loc) left_dangling_v right_dangling_v,
  root ↦ #p ∗ p ↦∗ [ left_dangling_v; #x; right_dangling_v ] ∗
  left_dummy ↦ left_dummy_v ∗ is_tree l left_dummy_v ∗
  right_dummy ↦ right_dummy_v ∗ is_tree r right_dummy_v ∗
  ⌜mtr_insert_td t k = Node l x r⌝)%I

This first formula mirrors the return value of the functional code, which returns two trees l and r. It specifies that left_dummy and right_dummy point to those trees, that root points to an allocation containing x, and that this final result Node l x r is equal to the result of the functional code mtr_insert_td t k. Our second formula, the invariant for subsequent iterations, mirrors the recursive calls of the functional code, which calls itself in tail position on a tree t' and two contexts accl, accr. The invariant specifies that left_dummy and right_dummy point to these contexts and that left_hook and right_hook point to their holes, while node points to the subtree and name stays constant. Finally, it asserts ownership of the root location and asserts that the functional values accl, accr, and t' correspond to a loop iteration of the functional code mtr_down_td.
With these invariants, the remaining proof of the top-down algorithm is highly automated: we resolve all obligations associated with assignments on the heap or constructor contexts automatically using Diaframe's iSteps tactic [Mulder et al. 2022, 2023], which performs proof search guided by custom hints (such as our lemma ctx_of_ctx):

  - iDecompose "H". iSteps.
  (* The invariant holds when entering the loop *)
  Qed.
The brevity of the proof, despite the intricate nature of Stephenson's algorithm, provides further evidence that these definitions capture the essence of top-down algorithms.

Proving Allen and Munro's Bottom-Up Algorithm
While Allen and Munro [1978] do not present imperative pseudo-code, we can define an imperative version of their algorithm in AddressC. They introduce a "simple exchange" (corresponding to what is now called a rotation) and describe their algorithm as:

  [..] perform a sequence of simple exchanges on the retrieved record so that it is moved to the root. [..] By carefully using the coding trick of "reversing the direction of the pointers" in performing the search, only two or three extra storage locations are required.

We can directly implement the mentioned pointer-reversal technique [Schorr and Waite 1967] in AddressC (see Appendix E.1 in the tech report). The code corresponds closely to the functional bottom-up version we derived earlier. In particular, just as constructor contexts captured the top-down behavior in Stephenson's algorithm, a zipper captures the structure of in-place pointer reversal. Even though the use of pointer reversal is complex from a formal perspective, we can use the functional zippers to relate the functional and imperative versions and make the proofs go through.

Theorem 4. (Allen and Munro's imperative bottom-up move-to-root algorithm is correct)

  Lemma heap_rebuild_correct (z : zipper) (t : root_tree) (zipper tree : loc) (zv tv : val) :
    {{{ zipper ↦ zv ∗ is_zipper z zv ∗ tree ↦ tv ∗ is_root_tree t tv }}}
      heap_rebuild #zipper #tree
    {{{ v, RET v; is_root_tree (move_to_root z t) v }}}.
  Lemma heap_mtr_insert_bu_correct (i : Z) (tv : val) (t : tree) :

The precondition of heap_rebuild_correct requires that the argument addresses, zipper and tree, point to a zipper z and a non-leaf binary tree t. The postcondition guarantees that after execution, the returned memory location v denotes the non-leaf binary tree arising from our functional algorithm, move_to_root. Similarly, heap_mtr_insert_bu_correct specifies that, given an arbitrary binary tree t with heap representation tv, the imperative version returns a tree corresponding to mtr_insert_bu i t.
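As a pure sketch of the bottom-up structure that the zipper captures (in Python over tuple-encoded trees, not the paper's AddressC or pointer-reversal code; the names `down`, `rebuild_to_root`, and `access` are our own), the search phase accumulates path frames and the rebuild phase rotates the accessed node up one level per frame:

```python
# tree = None | (left, key, right); a zipper frame records how we went down.
def down(t, k, z):
    # Search for k, accumulating the path in the zipper z.
    while t is not None and t[1] != k:
        l, x, r = t
        if k < x:
            z.append(('L', x, r))   # went left: remember parent key and right subtree
            t = l
        else:
            z.append(('R', l, x))   # went right: remember left subtree and parent key
            t = r
    return t, z

def rebuild_to_root(z, t):
    # Unwind the zipper, rotating the found node up one level per frame.
    l, x, r = t
    for frame in reversed(z):
        if frame[0] == 'L':
            _, px, pr = frame
            r = (r, px, pr)         # the parent becomes our right child
        else:
            _, pl, px = frame
            l = (pl, px, l)         # the parent becomes our left child
    return (l, x, r)

def access(t, k):
    found, z = down(t, k, [])
    return rebuild_to_root(z, found) if found is not None else t
```

Compiled with reuse, the frames pushed by `down` correspond to the reversed pointers on the search path.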

SPLAY TREES
Having looked at move-to-root trees, we can apply the same techniques to their improved sibling: splay trees [Sleator and Tarjan 1985]. Move-to-root trees only move the accessed element to the root of the tree; they do not rebalance the rest of the tree. As such, the tree can degrade to a list in the case of ordered accesses. Splay trees, on the other hand, are self-adjusting: accessing an element also restructures the path to the root to become more balanced. Sleator and Tarjan [1985] identify six different kinds of tree rotations that are required: zig, zigzig, zigzag, and their mirrored counterparts. Is it possible to derive all of these rotations?

The Essence of Splay Tree Rebalancing
To understand the essence of splay trees, we look again at the definition of move-to-root trees, since splay trees satisfy the exact same requirements. First, we unroll the recursion in move-to-root tree insertion once:

  fun insert( t : tree, k : key )
    match t
      Node(l,x,r) ->
        if x < k then
          match r
            Node(rl,rx,rr) ->
              if rx < k then
                match insert(rr,k)
                  Node(s,y,b) -> Node( Node(l,x,Node(rl,rx,s)), y, b)      // (A)
              elif rx > k then
                match insert(rl,k)
                  Node(s,y,b) -> Node( Node(l,x,s), y, Node(b,rx,rr))
              ...

Now we gain insight into why move-to-root trees can easily become unbalanced: when we move twice to the right (and, dually, twice to the left for the x > k case), as in the branch labelled (A), we create a short unbalanced part with two right-leaning nodes:

  Node(l,x,Node(rl,rx,rr)) -> Node(l,x,Node(rl,rx,s))    // (A)

A splay tree instead rotates those nodes, resulting in a more balanced result:

  Node(l,x,Node(rl,rx,rr)) -> Node(Node(l,x,rl),rx,s)

This is the essence of splay tree restructuring, and it captures the key difference between move-to-root trees and splay trees. Using this insight, we obtain a recursive insert function (Appendix B in the tech report) that closely mirrors the version presented by Okasaki [1999b] (Sec. 5.4).
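The Okasaki-style recursive insert alluded to above can be transliterated into Python as follows (a sketch over tuple-encoded trees; the paper's version is in Koka and appears in Appendix B of the tech report). The two zig-zig cases perform exactly the rotation shown above, while the zig-zag cases descend without rotating:

```python
# tree = None | (left, key, right)
def partition(pivot, t):
    # Okasaki-style splay partition: split t into (elements <= pivot,
    # elements > pivot), rotating two-in-a-row descents as we go.
    if t is None:
        return None, None
    l, x, r = t
    if x <= pivot:
        if r is None:
            return t, None
        rl, rx, rr = r
        if rx <= pivot:
            small, big = partition(pivot, rr)       # zig-zig: rotate left
            return ((l, x, rl), rx, small), big
        else:
            small, big = partition(pivot, rl)       # zig-zag
            return (l, x, small), (big, rx, rr)
    else:
        if l is None:
            return None, t
        ll, lx, lr = l
        if lx <= pivot:
            small, big = partition(pivot, lr)       # zig-zag
            return (ll, lx, small), (big, x, r)
        else:
            small, big = partition(pivot, ll)       # zig-zig: rotate right
            return small, (big, lx, (lr, x, r))

def insert(t, k):
    small, big = partition(k, t)
    return (small, k, big)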
Even though the essence of splay tree restructuring has just two cases for rebalancing, the rebuild function now shows the usual six rebalancing cases that are common in the splay tree literature: zig, zigzig, zigzag, and their mirrored counterparts.
We call this the fused algorithm; it corresponds directly to Sleator and Tarjan's published bottom-up algorithm. However, even though it splays correctly, it turns out not to be equivalent to the recursive or top-down algorithms. For example, if we start from a right-unbalanced tree with nodes 1 to 4 and insert node 4, we get different results for each of the various algorithms. That the imperative top-down and bottom-up splay algorithms are not equivalent was not widely known [Lucas 2004], but when trying to prove equivalence of the functional specifications it becomes immediately apparent that down-bu-fused(t,k,z) ≡ rebuild(z, insert(t,k)) does not hold.

Functional and Imperative Splay Trees Coincide
Using our AddressC embedded language in Iris, we formalized the published top-down and bottom-up algorithms by Sleator and Tarjan [1985], where we use pointer reversal for the bottom-up version. Once again, there is a line-by-line correspondence between the published pseudocode and the formal AddressC implementations that we have written (Appendix E.2 and E.3 in the tech report). Using the same techniques as for move-to-root trees, we have formally established that the functional implementations accurately model the published imperative algorithms:

Theorem 6. (Sleator and Tarjan's imperative top-down splay algorithm is correct)

  Lemma heap_splay_insert_td_correct (k : Z) (tv : val) (t : tree) :

Theorem 7. (Sleator and Tarjan's imperative bottom-up splay algorithm is correct)

  Lemma heap_splay_insert_bu_correct (k : Z) (tv : val) (t : tree) :

It is worth repeating that the proofs of these theorems are direct, requiring no additional lemmas. This is possible because the functional implementations precisely capture the iterative behaviour of their imperative counterparts through constructor contexts and zippers. Furthermore, these results are novel: to the best of our knowledge there is no prior formal correctness proof of these algorithms.

ZIP TREES
In recent work, Tarjan et al. [2021] introduce zip trees which can be seen as the functional equivalent of skip lists [Pugh 1990].A zip tree is a binary search tree where every node also has a rank.
  alias key  = int
  alias rank = int

  type ztree
    Leaf
    Node(rank : rank, left : ztree, key : key, right : ztree)

We choose node ranks independently from a geometric distribution, where the rank of a node is a non-negative integer k with probability 1/2^(k+1). Besides being a binary search tree for the keys, the tree is now also max-heap-ordered with respect to the ranks, with ties broken in favor of smaller keys. We define is-higher-rank as:
  fip fun is-higher-rank( ^(r1,k1) : (rank,key), ^(r2,k2) : (rank,key) ) : bool
    r1 > r2 || (r1 == r2 && k1 < k2)

Any parent node is always is-higher-rank than its children. Since a zip tree is also a binary search tree, we can also see that the rank of a parent is always greater than the rank of its left child, and greater than or equal to the rank of its right child. Interestingly, the shape of a zip tree is fully determined by just the rank/key pairs in the tree, independent of the insertion order. See Figure 3 for an example of two valid zip trees. Intuitively, given the geometric distribution of ranks, the shape of a tree naturally tends to be well balanced, with twice as many nodes at each lower rank. This means that the zip tree operations never need to do any explicit rebalancing, simplifying their implementation compared to the usual balanced tree algorithms.

The rank can be chosen independently at random but, in order to combine search and insertion, we can also derive the rank pseudo-randomly from the key. To insert an element into a zip tree, we first calculate the rank of the node. We can then traverse down until we find the insertion point, as it is fully determined by the rank and key:

  pub fun insert( t : ztree, k : key ) : ztree
    down( t, rank-of(k), k )

  fun down( t : ztree, rank : rank, k : key ) : ztree
    match t
      Node(rnk,l,x,r) | is-higher-rank( (rnk,x), (rank,k) )   // go down while node is higher rank

Once we reach the insertion point, where we are of higher rank than the current tree t, we unzip the tree t into two trees: one with elements smaller than k and one with bigger elements:

  fun unzip( t : ztree, k : key ) : (ztree,ztree)
    match t
      Node(rnk,l,x,r) ->

Figure 3 illustrates inserting a node into a tree and the resulting unzip operation. Since the shape of a zip tree is always fixed by its rank/key pairs, deletion is the inverse of insertion, which zips the child trees back together. Like before, it is straightforward to formally prove that our specification of insert maintains the expected properties of a zip tree.
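A complete recursive sketch of this insertion scheme in Python (tuples `(rank, left, key, right)` for nodes, `None` for Leaf; the hash-based geometric `rank_of` is our own illustrative choice, not the paper's definition) looks as follows:

```python
import hashlib

# ztree = None | (rank, left, key, right)

def rank_of(key):
    # Geometric rank derived pseudo-randomly from the key: count the
    # low-order 1-bits of a hash, so rank k has probability 1/2^(k+1).
    h = int.from_bytes(hashlib.sha256(str(key).encode()).digest()[:8], 'big')
    rank = 0
    while h & 1:
        rank += 1
        h >>= 1
    return rank

def is_higher_rank(r1, k1, r2, k2):
    # Ties between equal ranks are broken in favour of the smaller key.
    return r1 > r2 or (r1 == r2 and k1 < k2)

def unzip(t, k):
    # Split t into (smaller, bigger) around k, dropping any node equal to k.
    if t is None:
        return None, None
    rnk, l, x, r = t
    if x < k:
        s, b = unzip(r, k)
        return (rnk, l, x, s), b
    elif x > k:
        s, b = unzip(l, k)
        return s, (rnk, b, x, r)
    else:
        return l, r

def insert(t, k):
    rank = rank_of(k)
    def down(t):
        if t is not None:
            rnk, l, x, r = t
            if is_higher_rank(rnk, x, rank, k):
                # keep descending while the current node has higher rank
                return (rnk, down(l), x, r) if k < x else (rnk, l, x, down(r))
        s, b = unzip(t, k)
        return (rank, s, k, b)
    return down(t)
```

Because the shape is determined by the rank/key pairs alone, reinserting an existing key leaves the tree unchanged up to the dropped duplicate.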

Bottom-Up Zip Trees
For the bottom-up algorithm, we first define a zipper for the zip tree datatype to keep track of the path down the tree:

  type zipper
    NodeR(rk : rank, l : ztree, x : key, up : zipper)
    NodeL(rk : rank, up : zipper, x : key, r : ztree)
    Done

  fip fun rebuild( z : zipper, t : ztree ) : ztree
    match z
      NodeR(rk,l,x,up) -> rebuild(up, Node(rk,l,x,t))
      NodeL(rk,up,x,r) -> rebuild(up, Node(rk,t,x,r))
      Done             -> t

The down and unzip functions take the zipper(s) as accumulating arguments, where we again ensure we never unzip trees with the key present. We can optimize this a bit further: for the down-bu function, the zipper along the search path always rebuilds the exact same path, since no restructuring takes place, unlike the rebuilding for move-to-root or splay trees. It can therefore be more efficient to use a constructor context for down-bu instead, as this can rebuild the tree in constant time.
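The constant-time rebuild enabled by a constructor context can be sketched in Python with mutable nodes (an illustrative model, not the paper's Koka `ctx` primitive; the class and method names are our own). Extending the context and plugging the hole are both O(1) because only the tracked hole slot is written:

```python
class Node:
    def __init__(self, left, key, right):
        self.left, self.key, self.right = left, key, right

class Ctx:
    # A first-class constructor context sketch: a tree under construction
    # with exactly one hole, tracked as a (node, attribute) pair so that
    # extending the context and plugging the hole are both O(1).
    def __init__(self):
        self.dummy = Node(None, None, None)   # header; its left slot is the hole
        self.hole = (self.dummy, 'left')

    def extend(self, node, hole_side):
        # Like ctx-append: plug `node` into the current hole; its
        # `hole_side` child slot becomes the new hole.
        holder, side = self.hole
        setattr(holder, side, node)
        self.hole = (node, hole_side)
        return self

    def apply(self, t):
        # Like ctx application (++.): plug the final subtree into the
        # hole and return the finished tree.
        holder, side = self.hole
        setattr(holder, side, t)
        return self.dummy.left

# Rebuild an unchanged search path in O(1) per step:
acc = Ctx()
acc.extend(Node(None, 5, None), 'right')   # went right at key 5
acc.extend(Node(None, 7, None), 'left')    # then left at key 7
tree = acc.apply(Node(None, 6, None))      # plug the new subtree into the hole
```

This mirrors why down-bu with a constructor context avoids the linear-time unwinding that the zipper-based rebuild performs.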
For the optimized bottom-up version, the correctness theorem is as before:

Theorem 9. (Correctness of bottom-up zip tree insertion)

  down-bu(t,k,acc) ≡ acc ++. insert(t,k)
  with unzip-bu(t,k,zs,zb) ≡ val (s,b) = unzip(t,k) in (rebuild(zs,s), rebuild(zb,b))

As with move-to-root and splay trees, we can again prove the correctness of imperative zip insertion. But this time we can go even further: in Appendix C in the tech report, we present a new imperative insertion algorithm that derives directly from the functional version and is simpler, yet as efficient as the imperative algorithm by Tarjan et al. [2021].

BENCHMARKS
Figure 4 shows benchmark results for the various derived algorithms in this paper. We compare Koka against the best known iterative C implementations. For the bottom-up algorithms, we also benchmark ML and Haskell implementations that are direct translations of the bottom-up Koka versions. We ran the benchmarks on Ubuntu 22.04.2 using an AMD 7950X at 4.5GHz. We used Koka v2.4.2 (-O2); the C implementations were compiled with Clang 14 (-O3 -DNDEBUG), ML with OCaml 4.13.1 (ocamlopt -O2), and Haskell with GHC 8.8.4 (-O2). Each bar represents the median performance over 10 runs, with error bars for the standard error. Each benchmark performs 10M insertions starting with an empty tree, using a pseudo-random sequence of keys between 0 and 100 000. Initially the tree is populated quickly up to 100 000 elements, followed by many insertions where the element already exists. We tested all top-down (-td) and bottom-up (-bu) versions of move-to-root trees (mtr), splay trees (splay), and zip trees (zip). Figure 4 also includes tests for red-black trees (rb), but we disregard those for the moment.
If we look at the performance relative to the median performance of Koka in Figure 4, we see that our purely functional fip derived versions always outperform C for move-to-root, splay, and zip trees! How is that even possible? The Koka code in particular must perform more operations:

• Koka has automatic memory management, and thus everything is reference counted. The generated code also includes branches to handle potentially thread-shared structures (which require atomic reference count operations).
• Koka uses arbitrary precision integers (int) for keys, and all comparisons and arithmetic operations include branches for the case where big integer arithmetic is required.
• Context composition and application are reference counted to handle sharing, and always check for empty contexts. In the C code, empty context checks are unnecessary due to stack allocation.
• Koka reuses memory when possible, but the trees can always be used persistently as well, and insertion can also handle shared trees where the spine to the insertion point is copied.

One factor why Koka still outperforms C is that Koka is tightly integrated with the optimized mimalloc allocator [Leijen et al. 2019]. To gain better insight into the actual overhead of the above features in our functional code, we also include "equalized" C: here we link the C programs with mimalloc as well (overriding malloc and free), and we include an unused header word in the top-down algorithms to ensure an equal amount of memory is allocated. This is the third bar in Figure 4. Even compared to equalized C, our functional versions still perform remarkably well, being at most 15% slower for top-down move-to-root trees, and only 6% slower for top-down zip trees. This is surprising, given the additional safety guarantees Koka provides. Many of these checks are cache-local and use just a few instructions in the fast path (e.g. is-unique). We conjecture that on modern hardware small fast-path branches with cache-local accesses can be quite cheap: due to speculation with many parallel compute units, the actual performance bottlenecks may be elsewhere, such as a dependency on an uncached memory read.
Even with equalized C, our functional versions are still substantially faster on the bottom-up move-to-root and splay trees. This is due to a difference in implementation: our derived functional versions use zippers, which are compiled essentially to use in-place pointer reversal at runtime. The C implementations, in contrast, use parent pointers, which is the usual way of traversing back up in the bottom-up algorithms. However, for move-to-root and splay trees the constant restructuring is now more expensive, since the parent pointers must also be adjusted for each rotation. This cost is much less pronounced for zip trees, for which considerably less restructuring takes place, and so the performance difference is correspondingly smaller. As an experiment, we also implemented a pointer-reversal version of Allen and Munro's move-to-root bottom-up algorithm, using the lowest pointer bit to distinguish left from right paths. In that case, the equalized C code performs about 14% better than our functional version.
The top-down zip tree algorithm in C uses Tarjan et al.'s algorithm. We also tested our derived algorithm, and the simpler version that does not have the inner repeat-until iterations (and may thus perform extra pointer assignments, as shown in Appendix C). For our benchmark, though, we could not measure any significant difference in execution times between these versions.
Red-Black Trees. Figure 4 also contains results for bottom-up [Guibas and Sedgewick 1978; Lorenzen and Leijen 2022; Okasaki 1999a] and top-down [Tarjan 1985; Weiss 2013] red-black tree algorithms. It is beyond the scope of this paper to discuss these in detail, but we can apply the same techniques shown in this paper to implement the bottom-up version using defunctionalized CPS and zippers, and the top-down version with constructor contexts. The top-down C version is based on the GNU library tree search implementation, which encodes the node color in the least significant pointer bit [Schmidt 1997], while the bottom-up one implements the algorithm described by Cormen et al. [2022]. The relative performance of Koka versus (equalized) C is still good, but less impressive than for the other data structures: about 25% slower for the top-down algorithm and almost 50% slower for the bottom-up version. This shows that there is still room for further improvements in our compilation techniques.
Each variant performs poorly for a different reason though. We believe the functional version of bottom-up red-black trees is slower because the C versions can use early bailout: on the way back up, as soon as a parent is no longer red, the C version can immediately return the root pointer. The functional version, however, needs to unwind the zipper completely to reconstruct the tree. There seems to be no obvious way to implement such an optimization on the functional side: we would need some concept of parent pointers to achieve similar behaviour. For the top-down version the reason for the poorer performance is less clear, but we believe it is due to the need to keep track of extra context. Top-down red-black tree rebalancing requires access to the parent and grandparent of the current node for its rebalancing operations. In C we can just keep two extra pointers around on the traversal down. In the functional version, though, we need two derivative node constructors for the parent and grandparent, together with the accumulating constructor context, moving the grandparent into the constructor context on each iteration. We imagine that a potential path to improving this situation is to allow a limited form of pattern matching on constructor contexts.

RELATED WORK
We discuss work related to the studied algorithms in the main text. Here, we present an overview of the work related to the employed techniques.
Data structures with a hole. Zippers [Huet 1997] are the canonical functional representation of data structures with a hole. They can be defined generically [Hinze et al. 2004; McBride 2001, 2008], but also arise syntactically as the defunctionalization of the closures generated by a CPS-conversion [Danvy and Nielsen 2001]. While they have long been known to be the functional equivalent of back-pointers [Huet 2003; Munch-Maccagnoni and Douence 2019], only recently has this insight been exploited to actually compile them to pointer-reversing code [Lorenzen et al. 2023b].
In contrast, constructor contexts, as studied in this work, have received far less attention. One reason for this may be that previous implementations required type systems to ensure safety: Minamide [1998] describes a linear type system for efficient one-hole contexts, while destination-passing style [Bour et al. 2021; Pottier and Protzenko 2013] requires linear or ownership types. Huet [2003] also discusses top-down structures with a hole Ω, but he does not make an explicit connection to top-down algorithms or present an efficient implementation.
Some top-down algorithms can also be expressed using either laziness [Wadler 1984] or tail recursion modulo cons (TRMC) [Bour et al. 2021; Friedman and Wise 1975; Leijen and Lorenzen 2023; Risch 1973]. However, both techniques require the programmer to provide an expression up-front which determines the value eventually filling the hole. This makes it more cumbersome, and sometimes impossible, to express top-down algorithms with these techniques. Laziness additionally carries a performance overhead due to the creation of intermediate thunks. Conversely, TRMC can be implemented manually with first-class constructor contexts: Leijen and Lorenzen [2023] introduce the context transformation, which generalises Danvy and Nielsen's [2001] approach to constructor contexts.
Compilation of functional programs. A crucial step of our compilation is to reuse [Lorenzen and Leijen 2022; Schulte and Grieskamp 1992; Ullrich and de Moura 2019] old heap cells for new ones. This can be performed automatically in languages with precise reference counting [Reinking, Xie et al. 2021; Ullrich and de Moura 2019], but could also be implemented manually in languages with uniqueness types [Barendsen and Smetsers 1996]. However, in order to achieve a fully in-place algorithm, we also need to be sure that certain values (such as tuples) are not allocated on the heap. Lorenzen et al. [2023b] propose a calculus for such functions which ensures that the functions presented here do not have spurious allocations.
In this work, we study compilation as a refinement [Appel 2016], which allows us to connect the functional implementation to published imperative code. Modulo the exact choice of variable names and helper functions, it is possible to compile functional code directly to published imperative code. Hofmann [2000] first proposed such a scheme, and Gudjónsson and Winsborough [1999] present an optimization to avoid updating the hole of the context in cases where it already contains the right value, just as in the published implementation of zip tree insertion.
Verification of imperative algorithms. Insertion and deletion algorithms for binary search trees have been verified countless times: there is a large literature on functional implementations [Nipkow et al. 2020, 2021] as well as destructive implementations [Armborst and Huisman 2021; Pek et al. 2014; Stefanescu et al. 2016; Zhan 2018]. However, these algorithms are typically based on recursive code and thus do not deal with the issues discussed in this paper. Surprisingly, there seems to be far less literature on verifying idiomatic, imperative code as it appears in algorithm papers. Schellhorn et al. [2022] and Dross and Moy [2017] formalize the textbook insertion and deletion of red-black trees, but due to the use of inline invariants their code does not resemble the original implementation. Lammich [2020] formalizes an array-based implementation of pattern-defeating quicksort in the Boost C++ library. Enea et al. [2015] prove insertion algorithms for AVL trees and red-black trees in C correct by deriving a representation for the already-traversed segment. They do not consider a functional version and thus have to perform a proof search.
Formalizing constructor contexts. Following Charguéraud [2016], we define an inductive representation of one-hole data structures. In the work of Enea et al. [2015], these segments also hold additional invariants, but this is not necessary if one only wants to relate the segments to their functional counterparts. Cao et al. [2019] formalize an idiomatic, non-balancing insertion into binary trees and point out that a constructor context can also be represented in separation logic using a magic wand, thereby implementing constructor contexts directly as their interface.
Tuerk [2010] demonstrates a method for proving the correctness of while-loops that recurse on an argument by using simple pre- and post-conditions; this may be powerful enough to prove bottom-up algorithms, as well as those top-down algorithms that arise from a functional version via TRMC.

CONCLUSION
This work bridges the gap between imperative algorithms and purely functional programs. The key techniques used to guarantee performant functional implementations (deriving tail-recursive fip functions from their directly recursive counterparts) are in no way restricted to binary search trees. We fully expect them to be widely applicable to other algorithms and data structures. Furthermore, this will enable us to adopt a wide variety of techniques designed specifically for functional languages, ranging from program synthesis [Albarghouthi et al. 2013; Polikarpova et al. 2016] to automatic amortized complexity analysis [Leutgeb et al. 2022; Schoenmakers 1993], in a novel setting where they could not be applied easily until now.

Fig. 2 .
Fig. 2. The move-to-root top-down algorithm formalized in AddressC on the left, versus a screenshot of Stephenson's published algorithm on the right (© Springer Nature)

Fig. 3 .
Fig. 3. Inserting a node with key 15 and rank 3 into a zip tree (with ranks shown as single digits in blue). Once the insertion point is found (as the right child of node 12), the tree at node 17 is unzipped along the key 15, and the resulting trees become the left and right child of the inserted node. Deletion is the inverse, where the children are zipped instead.
Proc. ACM Program. Lang., Vol. 8, No. PLDI, Article 168. Publication date: June 2024.
Fig. 4. Benchmarks on Ubuntu 22.04.2 (AMD 7950X, 4.5GHz) comparing the relative performance of C, ML, and Haskell against Koka for move-to-root (mtr), splay trees (splay), and zip trees (zip), for both top-down (td) and bottom-up (bu) variants. Each benchmark performs the same sequence of 10M pseudo-random insertions between 0 and 100 000, starting with an empty tree.