Rooting for Efficiency: Mechanised Reasoning about Array-Based Trees in Separation Logic

Array-based encodings of tree structures are often preferable to linked or abstract data type-based representations for efficiency reasons. Compared to the more traditional encodings, array-based trees do not immediately offer convenient induction principles, and the programs that manipulate them often implement traversals non-recursively, requiring complex loop invariants for their correctness proofs. In this work, we provide a set of definitions, lemmas, and reasoning principles that streamline proofs about array-based trees and programs that work with them. We showcase our proof techniques via a series of small but characteristic examples, culminating with a large case study: verification of a C implementation of a recently published tree clock data structure in a Separation Logic embedded into Coq.


Introduction
There are hardly any other basic data structures in Computer Science as ubiquitous, versatile, and beloved as trees.
Used to represent virtually any kind of data with hierarchical ordering, trees admit a simple encoding in both functional programming languages, as algebraic data types (ADT), and in imperative ones, as pointer-based linked structures without internal sharing. Most tree-manipulating programs can be expressed as recursive traversals, whose control flow mimics the shape of the underlying data structure. Thanks to this fact, reasoning about tree manipulations can be conducted via simple induction principles, which made such computations popular topics in studies on program derivation [13,29], transformation [4,35], and verification [2,10].
In this work, we focus on a less well-studied way to represent trees in heap-manipulating programs written in imperative languages such as C: as arrays. Developers choose an array-based encoding of trees for efficiency reasons: it allows constant-time random access to node data, requires less memory to store, and enjoys better cache locality than pointer-based representations. From the perspective of formal reasoning about tree-manipulating programs, the array representation comes with a number of new challenges. First of all, tree traversals implemented by means of addressing elements in an array via integer indices do not immediately yield familiar induction principles. Furthermore, programs working with array-based tree representations often exploit the fact that children of a node are arranged into a contiguous array segment: this enables efficient non-recursive traversals, but complicates verification due to the need to devise complex loop invariants. Finally, while arrays are conceptually similar to pointers, they require slightly more delicate reasoning in common formalisms, such as Separation Logic (SL) [28,31], as proof obligations involving their element indices require one to keep track of the indices' numeric properties to avoid, in particular, out-of-bounds errors.
The motivation for this work came out of our effort of verifying in Coq a C implementation of Tree Clocks [22], an intricate imperative data structure that implements a state-of-the-art version of logical clocks via an array-based tree and extensively takes advantage of the array encoding for the sake of efficiency. SL-based reasoning about complex linked structures with "deep" internal sharing and unstructured aliasing has been addressed, to some extent, in the past [24,36]. However, we found those developments inapplicable for our goal due to their focus on a more general setup targeting graph-like structures and, therefore, imposing extra proof overhead when tackling tree-specific proof obligations.
To document the lessons we learned from our verification experience, in this paper, we articulate a number of challenges we faced while verifying several characteristic procedures that manipulate array-based trees. We then describe reasoning principles and auxiliary definitions encoded in Coq that came in handy when constructing such proofs. In particular, we argue that there are two kinds of representation predicates in Separation Logic for array-based trees (i.e., the array view ones and the tree view ones), each having its specialised utility in the proof; we also present lemmas for smoothly "switching" between the dual views. Another emphasis of our demonstration is the formulation of an extensible loop invariant for non-recursive tree traversals through a small collection of neat functions defined on the mathematical model of array-based trees. Finally, we sketch the key points of our mechanised correctness proof of the tree clock data structure that has been implemented in C and verified using the Verified Software Toolchain (VST) framework [1], which embeds Separation Logic into Coq.
In summary, this work makes the following contributions:
• A collection of small case studies illustrating the key challenges of reasoning about imperative C programs that manipulate array-based trees (Sec. 2);
• Arboreta: a library of predicates and accompanying higher-order lemmas, as well as a set of principles that streamline the common reasoning patterns for such programs (Sec. 3);
• A mechanically verified implementation of Mathur et al.'s array-based tree clock data structure in C [22] (Sec. 4).
The code repository for this paper is publicly available at https://github.com/verse-lab/arboreta.

Overview and Key Ideas
In this work, we reason primarily about rooted labelled trees (RLTs), and specifically those implemented using arrays. An RLT is a tree in which each node, besides the data it holds, is given a unique identifier (label), usually a natural number. These identifiers allow for node indexing and thereby enable random-access data retrieval from the "carrier" array of the tree. In an imperative programming language like C, a common approach to implementing RLTs is to use an array of structs, where each struct contains the information associated with a node, and use array indices as node identifiers. Fig. 1a shows one possible encoding in C: the tree is represented by the array tree, in which tree[i] stores the node with identifier i (hereafter referred to as node i). Each node also stores some data in its val field, and its relations to other nodes in the RLT are represented by the fields par, sib, and fch that store the identifiers of the node's parent, right sibling, and first (left-most) child, respectively. Fig. 1b gives a visual depiction of such a tree. As indicated in Fig. 1b, this structure allows the representation of RLTs of any arity.
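The encoding described above can be sketched in C as follows. This is a minimal illustration rather than the code of Fig. 1a: the field names val, par, sib, and fch follow the text, while the NONE marker, the concrete example tree, and the helper count_children are assumptions introduced here.

```c
#include <assert.h>

#define NONE (-1) /* assumed marker for "no such node" */

struct node {
    int val; /* data stored in the node */
    int par; /* identifier of the parent, NONE for the root */
    int sib; /* identifier of the right sibling, NONE for a last child */
    int fch; /* identifier of the first (left-most) child, NONE for a leaf */
};

/* Hypothetical example: node 0 is the root with children 1 and 2,
 * and node 1 has the single child 3. tree[i] stores node i. */
static const struct node example_tree[4] = {
    {10, NONE, NONE, 1},
    {20, 0, 2, 3},
    {30, 0, NONE, NONE},
    {40, 1, NONE, NONE},
};

/* Count the children of node x by following fch once and then sib links.
 * Any arity is representable because siblings form a chain. */
static int count_children(const struct node *tree, int x) {
    int n = 0;
    for (int c = tree[x].fch; c != NONE; c = tree[c].sib) {
        assert(tree[c].par == x); /* every child points back to its parent */
        n++;
    }
    return n;
}
```

Following fch and then the sib chain enumerates the children of a node from first to last, which is the access pattern the later traversal examples rely on.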
In the rest of this section, we provide a series of illustrative examples of programs that manipulate RLTs encoded as shown in Fig. 1a, articulating the challenges of structuring formal verification of such programs.
Formalising RLTs. Fig. 2 shows a natural encoding of an RLT in Coq as an inductive algebraic data type: an RLT is constructed with the Node constructor, which carries the node identifier id, the node information val and the list chn of children. Even though the definition of the tree data type offers no constructor for an empty tree, leaves (i.e., nodes without children) can be modelled as instances of the Node constructor with empty chn. To ensure that the nodes in a tree tr all have distinct identifiers, we use an extrinsic predicate NoDupId(tr), whose presentation is postponed till Sec. 3.1.
While this fairly standard inductive definition is sufficient for modelling the tree structure and stating and proving mathematical properties of purely-functional RLTs, it is not immediately suitable for specifying and verifying imperative programs that manipulate array-based RLTs: the logical tree structure is not directly related to the physical one in memory.To bridge this gap, we employ a standard technique of defining a representation predicate in Separation Logic [6,7,34] that relates the algebraic definition of RLTs in Coq to the memory layout of the RLT in C.
The representation predicate builds on the standard predicate arr(p, ℓ) for arrays (available in many existing Coq embeddings of SL [1,8]), which states that the address p in memory is the base pointer of an array whose elements are the elements of the (mathematical) list ℓ. Concretely, we make ℓ into a list of payloads, where each payload is the functional model of a single instance of the node structure shown in Fig. 1a.
Unlike the C definition in Fig. 1a, in the Coq encoding, the child subtrees of a node are represented as a finite list. The challenge in relating the C encoding, which connects a node to its parent, children, and sibling by virtue of the indices represented by integer fields, with the Coq encoding from Fig. 2 lies in recovering the list of nodes from the inductive tree definition, and specifically in constructing the par, sib, and fch fields to express the tree structure. To this end, we define a recursive projection predicate treelist_proj that relates a list of trees to a list of payloads. As per its definition, the projection predicate takes as input a list of trees, rather than a single tree, in order to match the structure of the Coq definition with respect to children. For a tree Node(id, val, chn) at the head of the list, followed by the remaining trees trs, the predicate asserts that its payload is the id-th element of ℓ (denoted ℓ[id]), with the appropriate parent. The right sibling and first child are obtained by retrieving the identifier of the heading node (hid) in the trs and chn lists, respectively. The remainder of the payload list ℓ is constrained by two recursive applications of the predicate, treelist_proj(chn, id, ℓ) and treelist_proj(trs, par, ℓ). The indices of the nodes in the payload list ℓ are the same as their identifiers in the definition from Fig. 2, but treelist_proj could assign identifiers arbitrarily, as long as they are distinct.

Specifying Computations with Array-Based RLTs
Let us now illustrate the representation predicate by specifying and verifying a couple of RLT-manipulating programs. Specification (2) states that, assuming the node with the identifier x holds the value v in the algebraic representation of the RLT tr and the array-based representation of tr is available to get_val, the result ret of the call will be exactly v. To establish this specification, we first need to prove an important property of find_val, namely, that if find_val(x, tr) = Some(v), then for any ℓ that satisfies tree_proj(tr, ℓ), ℓ[x] contains v. This fact can be proven by induction on the algebraic tree tr via the following induction principle, which states that a property holds of every RLT if it holds of each node whenever it holds of all of that node's children (see footnote 1):
  Lemma tree_ind' (P: tree -> Prop)
    (Hindstep : forall (id v: nat) (chn: list tree),
      (forall ch, In ch chn -> P ch) -> P (Node id v chn)):
    forall (tr: tree), P tr.
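For reference, a random-access getter in the spirit of get_val might look as follows in C. This is only a sketch: the explicit tree and n parameters, the bounds assertion, and the example data are assumptions made here to keep the block self-contained, and the paper's own get_val may be declared differently.

```c
#include <assert.h>

#define NONE (-1) /* assumed marker for "no such node" */

struct node { int val; int par; int sib; int fch; };

/* Hypothetical example: node 0 is the root with children 1 and 2. */
static const struct node small_tree[3] = {
    {7, NONE, NONE, 1},
    {4, 0, 2, NONE},
    {9, 0, NONE, NONE},
};

/* Constant-time read of the value stored in node x.
 * The assertion mirrors the index side condition that an SL proof
 * has to discharge to rule out out-of-bounds accesses. */
static int get_val(const struct node *tree, int n, int x) {
    assert(0 <= x && x < n);
    return tree[x].val;
}
```

The body is a single array read, which is why the array view of the tree is the convenient one for specifying it.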
Specification (2) can then be proven using standard Separation Logic rules. First, tree_rep_arr(tree, tr) gives us the logical list ℓ, which we pass to the find_val property, obtaining the fact that ℓ[x] contains v. Next, from arr(tree, ℓ) we know that tree[x] stores the equivalent of the payload ℓ[x]. Finally, from the above, we conclude that tree[x].val returns ℓ[x].val. The key enabler of this proof is the representation predicate that links the algebraic tree with the array-based one. Random access operations similar to get_val are paramount in implementations that manipulate array-based RLTs, and we make use of such specifications in our chief case study with tree clocks outlined in Sec. 4.
Footnote 1. Coq automatically derives a weaker induction principle for the tree from Fig. 2, hence we define tree_ind' and prove it as a standalone theorem.
Example 2: Structure-Changing Tree Traversals. Realistic programs that manipulate RLTs can be quite complex, and may perform (imperative, non-recursive) traversals interspersed with structure-changing operations (see footnote 2). Fig. 4a shows an example of such a program, which (i) traverses the tree stored in tree2, (ii) copies the val of each traversed node x to the node with the same identifier (the "updated node") in the tree stored in tree1, and (iii) moves the first child of the updated node to be the first child of the node root1. Fig. 4b shows an example execution of copyval_and_move. In the resulting tree, the value of node 3 has been updated to 7, and node 3's first child in tree1, i.e., node 4, has been moved to become a child node of node 1, which is the root of tree1. Similarly, node 5's value has been updated to 4, and node 5's first child, i.e., node 6, has been attached to node 1. Finally, the value of node 1 is updated to 2.
This example is a simplified version of the manipulations that take place in tree clocks, and we dedicate the next two subsections to zooming in on its two challenging aspects: changing the tree structure and performing its non-recursive traversal.
Footnote 2. For ease of presentation, all non-recursive traversals in this paper visit child nodes from last to first. Regular traversals can be handled analogously.

Structure-Changing Tree Operations
Before we look into copyval_and_move, let us first verify its subprocedure move_first_child, shown in Fig. 5.
This procedure makes the first child of node src become the first child of node dst, while keeping the rest of the tree's structure unchanged. To give its specification, we can (a) write one or more recursive functions in Coq exhibiting the same behaviour as the imperative program (e.g., one for popping a child of a node and another for prepending a child), and then (b) state that the logical tree obtained after applying these functions corresponds to the tree in the postcondition of specification (3), where tr′ may be similar in form to prepend_child(pop_child(tr, src), dst). We give one possible definition of prepend_child later in Sec. 3.2.4.
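The behaviour described above can be sketched in C as a handful of field updates. This is an illustration and not the code of Fig. 5: the explicit tree parameter, the NONE marker, and the early return for a childless src are assumptions made here, and the sketch presumes that dst does not lie inside the subtree being moved.

```c
#include <assert.h>

#define NONE (-1) /* assumed marker for "no such node" */

struct node { int val; int par; int sib; int fch; };

/* Detach the first child of src and prepend it to the children of dst.
 * Only the fields of src, dst, and the moved child are written;
 * every other payload in the array stays untouched. */
static void move_first_child(struct node *tree, int src, int dst) {
    int c = tree[src].fch;
    if (c == NONE) {
        return; /* assumption: nothing to move is treated as a no-op */
    }
    tree[src].fch = tree[c].sib; /* pop c from the children of src */
    tree[c].par = dst;           /* re-parent c */
    tree[c].sib = tree[dst].fch; /* old first child of dst becomes c's sibling */
    tree[dst].fch = c;           /* c is now the first child of dst */
}
```

Only three payloads change, yet a proof in the array view still has to argue that all remaining payloads are consistent with the new logical tree; this is the overhead the paragraph below refers to.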
To prove specification (3), first observe that this program only involves reads and writes on the fields of the array tree. We can unfold tree_rep_arr in the precondition to obtain the initial payload list ℓ by existential elimination. Similarly, in the postcondition, the resulting tree is represented by another payload list ℓ′ such that arr(tree, ℓ′) holds, and we know that ℓ′ is obtained from ℓ by applying a sequence of transformations. It then suffices to show that tree_proj(tr′, ℓ′) holds. Unfortunately, this proof obligation can be very cumbersome, since we need to show that the payloads not touched by move_first_child have remained unchanged in the process. We can prove this by induction, but it is tedious, and the proof would need to be repeated anew for every different structure-changing operation.
Key Idea: Dual Views. What we would really want are some localised reasoning principles that would allow us to prove tree_proj(tr′, ℓ′) by only requiring reasoning about the modified part of the payloads and the tree. However, this is hard to achieve with the current representation predicate.
Prior work on Separation Logic-based verification has shown that one can get exactly this workflow by defining an alternative representation predicate that reflects tree structure via suitably placed usages of the separating conjunction in its definition [6,7]. We will refer to such a predicate, which allows one to reason about the memory manipulations inductively and employ proper localised reasoning rules (e.g., Frame), as a tree view predicate. In Sec. 3.2, we will show how to define such a tree view predicate, which facilitates verifying structure-changing RLT operations. At the same time, we will keep using the previously-defined array view predicate to deal with random access operations, and will be switching between the two views as needed to use the one best suited for the verification task at hand.

Non-Recursive Tree Traversals
Preorder or postorder tree traversals are easy to implement recursively in both functional and imperative styles. Since RLT nodes have unique identifiers, we can also implement non-recursive traversals using a stack. This avoids recursion overhead (e.g., allocation of stack frames), and is therefore preferred in scenarios where efficiency is key.
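A stack-based traversal of this kind can be sketched in C as follows, here computing the maximum value in the tree. The function names, the caller-provided stack buffer, and the NONE marker are assumptions of this sketch; it is not one of the paper's figures.

```c
#include <assert.h>

#define NONE (-1) /* assumed marker for "no such node" */

struct node { int val; int par; int sib; int fch; };

/* Push the children of x, first to last, so that the last child ends up
 * on top of the stack and children are visited from last to first. */
static void push_children(const struct node *tree, int x,
                          int *stack, int *top, int cap) {
    for (int c = tree[x].fch; c != NONE; c = tree[c].sib) {
        assert(*top < cap); /* each node is pushed at most once */
        stack[(*top)++] = c;
    }
}

/* Non-recursive traversal: the root is visited before the loop starts,
 * and its children are on the stack when the loop is entered. */
static int max_val_iter(const struct node *tree, int root,
                        int *stack, int cap) {
    int top = 0;
    int maxi = tree[root].val;
    push_children(tree, root, stack, &top, cap);
    while (top > 0) {
        int x = stack[--top]; /* the stack top node of this iteration */
        if (tree[x].val > maxi) {
            maxi = tree[x].val;
        }
        push_children(tree, x, stack, &top, cap);
    }
    return maxi;
}
```

A buffer with as many slots as the tree has nodes always suffices, since no node is pushed twice. The loop makes the two ingredients of the invariant visible: the nodes already popped and the nodes currently on the stack.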
The difficulty with certifying such traversals lies in stating the loop invariant. To illustrate this, let us temporarily turn our attention from the loop-based implementation of copyval_and_move to a simpler program in Fig. 6 that performs a recursive preorder traversal on array-based RLTs. The program computes the maximum value val in the tree, and we can straightforwardly define the functional model of this imperative program in Coq as a function max_val. Since max_val_rec performs recursion nearly identically to max_val, a feasible SL specification for max_val_rec is:
  {tree_rep′_arr(tree, tr, par)} max_val_rec(id_of(tr)) {ret. tree_rep′_arr(tree, tr, par) * ⌈ret = max_val(tr)⌉}    (4)
where id_of(tr) returns the identifier of tr's root, and
  tree_rep′_arr(tree, tr, par) ≜ ∃ℓ, ⌈treelist_proj(tr :: nil, par, ℓ)⌉ * arr(tree, ℓ).
We have to use the generalised representation predicate above because par is not −1 in the recursive call.
We can now establish specification (4) by (a) showing that tree_rep′_arr(tree, Node(id, val, chn), par) entails that the predicate tree_rep′_arr(tree, ch, id) holds for all ch ∈ chn, and (b) using "the value of maxi is the maximum value amongst tr's root and some prefix of its children" as a loop invariant.
Although, as we have seen, a recursive traversal similar to max_val_rec is straightforward to specify and prove, the situation is quite different for copyval_and_move. Specifically, it is unclear how to specify the loop invariant for its outer while-loop. There are two aspects we need to capture in that loop invariant: (a) the contents of the stack and (b) the visited nodes (i.e., the nodes that have already been popped from the stack in previous iterations). The two must be consistent (e.g., a visited node should not appear in the stack) and maintained synchronously (e.g., once a node is visited in the current iteration, its children nodes should be pushed onto the stack). The difficulty lies in both characterising these two aspects and describing how they evolve in a traversal. In some cases, it might suffice to define a ghost state for the visited nodes, but this does not generalise for arbitrary traversals.
Key Idea: Tree Splitting. To verify non-recursive traversals, we need a precise characterisation of the tree structure that has been already visited. This can be achieved by "splitting" the set of array-hosted nodes of the tree into those already visited and those that yet remain to be processed. Luckily, the spatial structure of the array-based RLTs makes it possible to define such invariants by instantiating a common "template" that constrains the two subsets of the nodes. We detail this technique in Sec. 3.3, providing a description of the helper lemmas that facilitate such proofs.

Arboreta: Proofs about Array-Based Trees
In this section, we elucidate the design of Arboreta, a Coq library with a set of proof principles that facilitate mechanised reasoning about array-based tree manipulations in Separation Logic. We introduce its fundamental components in Sec. 3.1. In Sections 3.2 and 3.3, we develop and apply specialised reasoning principles to solve the challenges outlined previously in Sections 2.2 and 2.3, respectively.

Reasoning Principles for Rooted Trees
Core to Arboreta is Arboreta-P, a small collection of useful definitions and lemmas for pure reasoning about rooted (unlabelled and labelled) trees. The library provides a definition of generic rooted trees, parametrised by a type parameter A, which is the type of node data, shown in Fig. 7a. For convenience, we keep using tree as the type name of trees.
Expansion. One of the core definitions is the expansion function, which, when applied to an algebraic tree tr, returns a list containing all the subtrees of tr. Fig. 7b shows its definition, and three derived definitions, for the size of a tree, the subtree relation, and the list of node data. Since a bijection exists between nodes and subtrees, expansion serves as a building block for other definitions, e.g., NoDupId.
Fig. 7. Main Coq definitions from Arboreta-P. (f) Definition of tree prefix and one of its properties.
Node identifiers. As the type of node data is parameterised, we also parameterise the type of node identifiers into B, as shown in Fig. 7c. id_of_data extracts the node identifier from a node's data, and id_of extracts it from a node. NoDupId is defined in terms of expansion and id_of.
Finding by identifier. When identifiers come equipped with decidable equality (e.g., natural numbers), Arboreta-P provides a function find_node(x, tr) for finding the subtree of tr whose root has identifier x, as shown in Fig. 7d.
Node coordinate. We can assign a unique coordinate to each node in the tree. A coordinate or position describes how to find that node from the root in a top-down manner, and can be represented in Coq as a list of natural numbers storing a child index to visit at each level. We define a locate function to access a node by coordinate (the definition is trivial and elided), and a locate_update function to replace the subtree at a given coordinate, as shown in Fig. 7e.
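To make the notion of a coordinate concrete, the sketch below resolves a coordinate against the array encoding of Sec. 2 in C. It is an illustration only: Arboreta-P defines locate on the algebraic tree in Coq, whereas this array-level counterpart, its signature, the NONE marker, and the 0-based child indices are assumptions made here.

```c
#include <assert.h>

#define NONE (-1) /* assumed marker for "no such node" */

struct node { int val; int par; int sib; int fch; };

/* Hypothetical example: node 0 is the root with children 1 and 2,
 * and node 1 has the single child 3. */
static const struct node coord_tree[4] = {
    {10, NONE, NONE, 1},
    {20, 0, 2, 3},
    {30, 0, NONE, NONE},
    {40, 1, NONE, NONE},
};

/* Follow the coordinate pos[0..len-1] from root: at each level, pos[d]
 * is the 0-based index of the child to descend into.
 * Returns the identifier of the node reached, or NONE if the
 * coordinate does not denote a node. */
static int locate(const struct node *tree, int root,
                  const int *pos, int len) {
    int x = root;
    for (int d = 0; d < len; d++) {
        assert(pos[d] >= 0);
        int c = tree[x].fch;
        for (int k = 0; k < pos[d] && c != NONE; k++) {
            c = tree[c].sib; /* step to the next sibling */
        }
        if (c == NONE) {
            return NONE; /* fewer than pos[d] + 1 children at this level */
        }
        x = c;
    }
    return x;
}
```

The empty coordinate denotes the root, and each further entry selects one child, so a coordinate of length k identifies a node at depth k.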
Tree prefix. We say that a rooted tree tr₁ is a prefix of tr₂ if tr₁ is a subgraph of tr₂ and both trees have the same root. This definition is encoded by the inductive predicate from Fig. 7f, along with a custom induction principle (not shown). With such an abstract definition, we can prove properties of prefixes in Arboreta-P, and then prove that the trees returned by the (possibly complicated) functions we want to verify are prefixes. For instance, we use the prefix_data_is_subset property in the tree clock case study (cf. Sec. 4).

Dual Views: From Array to Tree and Back Again
Recall that in Sec. 2.2, we encountered the need to reason about trees in a structure-aware fashion, and envisioned an inductively defined tree view predicate to be used as an alternative to the array view, in order to reason with ease about structure-changing operations. In this section, we (a) derive the tree view predicate from the array view one, (b) exploit the tree view to apply local reasoning principles, and finally (c) shift back to the array view. In other words, we work through the (general) way to apply the rule of consequence, and showcase it on move_first_child from Fig. 5.

3.2.1 From Array View to Tree View. As a reminder, the representation predicate of an array is typically defined as a collection of contiguous memory blocks (definition ArrDef), where τ is the type of the payload and sizeof(τ) denotes how much space the payload occupies in memory. Since the separating conjunction * is commutative and associative, we can reorganise the memory blocks in the array according to the tree's structure, and inductively define the tree view predicate tree_rep_tree along that structure.
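For orientation, an array predicate of the kind recalled above is commonly written as an iterated separating conjunction over the element cells. The formula below is a generic textbook-style rendering under assumed notation (base pointer p, payload type τ, points-to arrow ↦); it is a sketch and not necessarily the exact form of ArrDef used in the development.

```latex
\[
  \mathsf{arr}(p, \ell) \;\triangleq\;
  \mathop{\ast}_{0 \le i < |\ell|}
  \bigl(p + i \cdot \mathsf{sizeof}(\tau)\bigr) \mapsto \ell[i]
\]
```

Because each cell appears as a separate conjunct, the cells can be regrouped in any order, which is what makes a tree-shaped regrouping possible.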

Local Reasoning with Tree View. Let us define the predicate tree_rep′_tree in a way similar to tree_rep′_arr (4). Using the properties of the magic wand [6], we can reflect the modifications to the functional tree made by locate_update (Fig. 7e) onto the heap state, expressing this by the following entailment, which can be proved by induction on pos:
  … tree_rep′_tree( , locate_update(tr, pos, sub′), par, sib)    (WandFrameUpdate)
The entailment in WandFrameUpdate might look a bit intimidating, but the only thing it does is instantiate the analogue of the "modus ponens" rule for ∗/−∗, "pulling out" the tree_rep′_tree( , sub′, par′, sib′) assertion. Note that WandFrameUpdate allows "pulling out" only a single subtree at a time, which is nevertheless sufficient for our verification task about tree clocks. We will discuss this limitation in Sec. 5.

3.2.3 From Tree View to Array View. Thus far, we have defined the tree view predicate and shown how to obtain it from the array view predicate. However, in practice, the array view predicate is independently useful for reasoning about read-only operations, especially random array accesses. This utility may stem from specialised support provided by verification tools for handling array operations, for instance. To this end, a natural question could be whether we are able to switch between the array view and the tree view on demand so as to enjoy the best parts of both views.
The answer is affirmative. From the definition ArrDef, we can prove the following entailment, which "reconstructs" an array from contiguous memory blocks, by induction on . Here, is a natural number and the type of is And since the tree view predicate can also be regarded as a bunch of memory blocks, we can prove by induction on tr that it can be "shattered" into memory blocks whose content is specified by a function from identifiers to payloads. In this case, we can switch "seamlessly" between the two views using ArrToTree and TreeToArr.

Dual Views in Action. The ability to switch between the array and tree views allows for relatively straightforward verification of move_first_child from Sec. 2.2 against specification (3). To do so, we first express the local modifications via locate_update. For example, prepending a child ch to a specific node of tr can be implemented as follows, when given the coordinate pos of that node inside tr:
  prepend_child(tr, ch) ≜ locate_update(tr, pos, Node(a, ch :: chn))
where we let locate(tr, pos) be Some(Node(a, chn)). We can implement popping the first child analogously. After those instantiations, we can then switch from the array view in the precondition to the tree view via ArrToTree, apply WandFrameUpdate to reason about the affected local part, and finally recover the original array view via TreeToArr.

Loop Invariants for Non-Recursive Traversals
As the last component of Arboreta, we present an approach for stating loop invariants to verify non-recursive tree traversals similar to that of copyval_and_move from Fig. 4a. Recall that the solution hinted at the end of Sec. 2.3 was to explicitly characterise the "visited part" of the tree in the loop invariant, relating it to the contents of the explicit stack.
In such traversals, at the beginning of each iteration of the outer while-loop, the visited part is likened to the "right half" of the tree obtained by "splitting" the original tree along the path from the root down to the node that will be processed in the current iteration (i.e., the node at the top of the stack; represented by x in Fig. 4a, for example), which we refer to as the stack top node hereafter. After completing the iteration, the stack top node will belong to the visited part in the loop invariant. This process is depicted in Fig. 8 with regard to the "working" node 3. As we will soon see, a visited part must be a prefix of the original tree (in the sense of the definition from Fig. 7f), so we will refer to it as the visited prefix. In addition, we name the visited prefix at the beginning/end of the iteration as the pre/post-iteration visited prefix, respectively.
In the remainder of this subsection, we will walk through the intuition of formalising the idea of splitting the tree along the path, showing how it leads to an extensible invariant definition for non-recursive traversals, and demonstrating the utility of that definition to verify the program from Fig. 4a.

Vertical Tree Splitting. Fig. 9a provides the definition of a function for splitting the tree vertically with respect to a given node, which is represented by its position. The function vsplit(full, tr, pos) returns the part "on the right" after splitting tr with respect to the node at pos, whose children will be included in the result iff full is true (see the first branch of match pos with in the definition of vsplit).
We can now use vsplit to define the post-iteration visited prefix for a traversal loop. As an example, let us take tr to be the tree in the leftmost part of Fig. 8. Then the coordinate of the node 3 in tr is pos₃ = [1], and we can check that the result of vsplit(false, tr, pos₃) is the post-iteration visited prefix shown in the rightmost part of Fig. 8. Even though the function vsplit is defined to produce the post-iteration visited prefix, it can also be used to obtain the pre-iteration visited prefix: the pre-iteration visited prefix is exactly the post-iteration visited prefix after "subtracting" the stack top node. It turns out that this definition of the pre-iteration visited prefix coincides with the result obtained by vsplit using the coordinate of the right sibling (or the parent, if the right sibling does not exist) of the stack top node. Moreover, the coordinate of the right sibling of a node can be calculated from the node's own coordinate via the function rsibpos in Fig. 9b. We summarise the formal definitions of pre/post-iteration visited prefixes in Fig. 9b, from which we know that they are indeed tree prefixes by the lemma vsplit_is_prefix.
Fig. 9. (b) Definitions of pre/post-iteration visited prefixes. When computing the pre-iteration visited prefix, the value of full depends on the existence of the right sibling of the node at pos. Note that if the right sibling does not exist, then by vsplit_norsib, the computed result will be equal to the post-iteration visited prefix applied to the coordinate of the parent node, namely (removelast pos).

Retrieving Stack Contents.
Perhaps surprisingly, by understanding the nature of the traversal, it becomes possible to fully retrieve the contents of the assisting stack from the algebraic tree description of the stack top node. The definition of a function worklist doing exactly that is given in Fig. 11. The function returns a list of subtrees (the second component of its return value) in addition to the coordinates of their roots (the first component of its return value). By applying worklist on the coordinate of the stack top node, the contents of the assisting stack can then be recovered from the root identifiers of the returned list of subtrees. The list of coordinates will be used later in Sec. 3.3.3.

3.3.3 The Cornerstone Loop Invariant. At last, we come to present a recipe for stating loop invariants to verify non-recursive traversals using the cornerstone invariant, which is defined as an inductive Coq predicate in Fig. 12 and is parameterised by the tree tr being traversed. It relates the assisting stack (captured in its first index of type list B) and the pre-iteration visited prefix (i.e., its second index of type tree). In order to use the cornerstone invariant, one has to assume that the root has been visited before the loop begins and that its children nodes are in the assisting stack at the start of the loop. In practice, this is almost always the case, and it is the case for all the operations of tree clocks (cf. Sec. 4).
During a loop iteration where the assisting stack is not empty, according to the TInv_intermediate case in Fig. 12, the stack top node (i.e., the root of sub) will be popped, and its children nodes will be pushed onto the stack before the iteration ends. The lemma in Fig. 12 serves to re-establish the invariant at the end of the iteration.
A notable feature of this pure loop invariant is its extensibility: it characterises only the core components of non-recursive traversal, thereby granting users the flexibility to use it in conjunction with other invariants. In particular, one can use this loop invariant as the cornerstone of a "larger" loop invariant specific to a concrete non-recursive traversal. At a high level, any such larger loop invariant can be constructed using an invariant template in which tr is the tree being traversed. We phrase the template using the array view predicate, since the structure of the original tree tr is usually not changed during traversal, and the contents of tr are retrieved via random accesses.
Fig. 12. An extensible loop invariant for non-recursive traversals. In the TInv_terminate case, the assisting stack is empty and the traversal terminates. The TInv_intermediate case snapshots the stack content and the pre-iteration visited prefix given the coordinate pos of the stack top node.
The proof of its SL specification with regard to a functional reference implementation can be completed by instantiating the invariant template accordingly.

Verifying the Tree Clock Data Structure
This section showcases the definitions and techniques of Arboreta working in tandem for verifying a large case study: an executable C implementation of the tree clock structure [22].

Tree Clocks: A Primer
Dynamic analysis is the de facto preferred method for detecting concurrency bugs such as data races in multi-threaded programs. Dynamic data race detectors such as ThreadSanitizer [33] and FastTrack [12] observe events of an execution during runtime, infer the happens-before (HB) partial order [18] between them, and report conflicting events unordered by HB to be in a data race. These detectors play a vital role in revealing data races which may lead to critical failures in large software systems and have been extensively applied in industrial settings. Instead of explicitly constructing the HB partial order as a graph of events, such tools leverage time-stamping to implicitly infer the HB partial order. Analyses such as data race detectors maintain logical clock data structures to compute the timestamp of each event accurately, potentially performing clock operations at every event. A timestamp is a mapping from the identifiers of threads to their respective (local) clocks (represented by natural numbers), and the HB relation between events can be recovered by comparing their timestamps. The tree clock [22], a recently proposed logical clock variant, achieves optimal asymptotic complexity in performing clock operations by exploiting the hierarchical structure of trees in a novel way.
A logical clock can be regarded as an abstract data type which exposes its maintained timestamp and usually supports two clock operations: join and copy. When implemented in an imperative language, the timestamp is typically mutable, and the two operations work by modifying the timestamp stored in one of the two operands (i.e., they are in-place operations). For an instance C of a logical clock, we use C•val to denote the timestamp that C points to. Let C_1 and C_2 be two instances of a logical clock initially pointing to timestamps T_1 and T_2 respectively. The in-place join operation C_1•join(C_2) should update C_1 so that C_1•val = T_1 ⊔ T_2 (LogicalJoin), where ⊔ denotes the logical join over timestamps: (T_1 ⊔ T_2)(t) = max(T_1(t), T_2(t)) for every thread identifier t. Likewise, the copy operation C_1•copy(C_2) should update C_1 so that C_1•val = T_2 after the operation is performed.
The vector clock [11,23] is the traditional logical clock data structure; it represents a timestamp using an array, indexed by identifiers of threads. Join and copy operations for the vector clock data structure take Θ(k) time, where k is the number of threads in the execution. In the context of data race detection, for executions with many events, this can result in prohibitive slowdowns. On the other hand, the tree clock internally organises the timestamp hierarchically as a tree, where nodes in the tree correspond to threads. Join and copy operations on tree clocks are implemented through tree traversals, and the tree's hierarchical structure allows for pruning of the traversal, which is the key to its optimal time complexity.
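To make the linear cost of the vector clock operations concrete, the following is a minimal sketch in C, not taken from any of the cited implementations. The type `vclock`, the constant `NUM_THREADS`, and the function names are hypothetical; `vclock_leq` assumes the usual pointwise order on timestamps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vector clock: one local clock per thread, indexed by thread id. */
#define NUM_THREADS 4

typedef struct { unsigned clk[NUM_THREADS]; } vclock;

/* In-place join: dst becomes the pointwise maximum of dst and src.
   Every entry is inspected, hence the cost linear in the number of threads. */
static void vclock_join(vclock *dst, const vclock *src) {
    assert(dst != NULL && src != NULL);
    for (size_t t = 0; t < NUM_THREADS; t++)
        if (src->clk[t] > dst->clk[t])
            dst->clk[t] = src->clk[t];
}

/* In-place copy: dst takes over the timestamp of src, again entry by entry. */
static void vclock_copy(vclock *dst, const vclock *src) {
    assert(dst != NULL && src != NULL);
    for (size_t t = 0; t < NUM_THREADS; t++)
        dst->clk[t] = src->clk[t];
}

/* Pointwise comparison of timestamps (assumed order): returns 1 iff
   every entry of a is no greater than the corresponding entry of b. */
static int vclock_leq(const vclock *a, const vclock *b) {
    assert(a != NULL && b != NULL);
    for (size_t t = 0; t < NUM_THREADS; t++)
        if (a->clk[t] > b->clk[t])
            return 0;
    return 1;
}
```

All three loops touch each of the `NUM_THREADS` entries regardless of how many of them actually change, which is the overhead that the pruned traversals of tree clocks aim to avoid.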
Formally defined, a tree clock is a tuple TC = (tr, ThrMap), where tr is a tree such that every node in the tree is a tuple u = (t, c, ac, p, ch, rs), where thread identifier t, clock c, and attached clock ac are scalar fields, while parent p, head child ch, and right sibling rs are pointer fields to other nodes in tr. Every node except tr's root has an attached clock; for tr's root, its attached clock is undefined and thus marked as ⊥. Intuitively, the thread at the root of tr "owns" the tree clock instance. Node u_1 is a child of u_2 if the root thread was "made aware of" the thread u_2•t (using a message/synchronisation operation) via u_1•t, and the attached clock u_2•ac is the clock of u_2•t when this message arrived at u_1•t. Therefore, for each node u, the attached clocks of its children should be no more than u•c. Moreover, its children are arranged in decreasing order of their attached clocks, which facilitates pruning during traversal. ThrMap(t) identifies the unique node in tr with identifier t and serves as the timestamp. We provide the visual representation of two exemplary tree clock instances TC_1 and TC_2 in Fig. 15.
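The two ordering conditions on attached clocks can be illustrated with a small C sketch. It is an assumption-laden illustration rather than the verified code: the struct `tc_node`, the sentinels `NIL` and `NO_ACLK`, and the function `tc_children_ordered` are hypothetical names, array indices stand in for the pointer fields, every array slot is assumed to hold a node, and "decreasing order" is read here as non-increasing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NIL (-1)      /* hypothetical "no node" index */
#define NO_ACLK (-1)  /* stands in for the undefined attached clock of the root */

/* One tree clock node; indices into the node array replace pointers. */
struct tc_node {
    int tid;   /* thread identifier */
    int clk;   /* clock */
    int aclk;  /* attached clock */
    int par;   /* parent index */
    int fch;   /* head child index */
    int sib;   /* right sibling index */
};

/* Checks, for every node u among tr[0..n-1], that its children point back
   to u, that no child's attached clock exceeds u's clock, and that the
   attached clocks do not increase along the sibling chain. */
static bool tc_children_ordered(const struct tc_node *tr, int n) {
    assert(tr != NULL && n >= 0);
    for (int u = 0; u < n; u++) {
        int bound = tr[u].clk;  /* upper bound for the next child's aclk */
        int steps = 0;
        for (int ch = tr[u].fch; ch != NIL; ch = tr[ch].sib) {
            if (++steps > n) return false;       /* sibling chain is cyclic */
            if (ch < 0 || ch >= n) return false; /* index out of bounds */
            if (tr[ch].par != u) return false;
            if (tr[ch].aclk > bound) return false;
            bound = tr[ch].aclk;
        }
    }
    return true;
}
```

Because siblings are sorted by attached clock, a traversal that is only interested in children with a sufficiently large attached clock can stop scanning a sibling chain at the first child that falls below its threshold.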
In this paper, we focus on formalising and verifying the join operation of array-based tree clocks. For efficiency in an intensive application like data race detection, the tree clock data structure is implemented using arrays instead of explicit pointers. In this case, the join of a tree clock TC_1 into TC_2 (TC_2•join(TC_1)) is performed by traversing TC_1 and updating the corresponding nodes in TC_2, in which the traversal is loop-based with the assistance of a stack. We show a snippet (in C) of the join operation on array-based tree clocks in Fig. 13.
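The general shape of such a loop-based, stack-assisted traversal can be sketched as follows. This is not the code of Fig. 13: it only visits nodes and records their indices, using the `struct node` layout of the RLT encoding (fields `val`, `par`, `sib`, `fch`). The bound `N`, the sentinel `NIL`, and the helper names `push_children` and `traverse` are assumptions made for this illustration.

```c
#include <assert.h>

#define N 8       /* hypothetical capacity of the node array */
#define NIL (-1)  /* hypothetical "no node" index */

struct node { int val, par, sib, fch; };

/* Pushes every child of node u onto the stack by following the sibling
   chain; returns the new stack height. */
static int push_children(const struct node *tree, int u, int *stack, int top) {
    for (int ch = tree[u].fch; ch != NIL; ch = tree[ch].sib) {
        assert(top < N);
        stack[top++] = ch;
    }
    return top;
}

/* Visits all nodes reachable from root without recursion, writing their
   indices to order (capacity N) in visiting order; returns the count. */
static int traverse(const struct node *tree, int root, int *order) {
    int stack[N];
    int top = 0, visited = 0;
    order[visited++] = root;  /* the root is visited before the loop */
    top = push_children(tree, root, stack, top);
    while (top > 0) {
        int u = stack[--top]; /* pop the stack top node */
        assert(visited < N);
        order[visited++] = u;
        top = push_children(tree, u, stack, top);
    }
    return visited;
}
```

The sketch follows the same discipline that the cornerstone invariant assumes: the root is handled before the loop, its children start out on the stack, and each iteration pops one node and pushes that node's children. In this particular sketch the last child pushed is the first one visited.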

Tree Clocks in Coq, Functionally
We start our tree clock mechanisation by developing its reference implementation in Gallina, the pure functional programming language of Coq. We will then use it in Hoare-style specifications to connect the reference implementation with the C code, following the relatively conventional two-layer paradigm for verifying imperative programs [3].

Datatypes.
In Fig. 14a, we model the tree part of the tree clock as the generic functional RLT in Sec. 3.1 with the type parameter A instantiated to be the following record type and the type parameter B instantiated to be thread. The id_of_data is thereby set to be the tid getter of the record. Here thread is also a type parameter and will be instantiated as natural numbers in proving the imperative join operation. Note that compared with the axiomatic definition of a node (i.e., defining a node as a tuple), the data held by a tree clock node does not contain the pointer fields: they are implicitly captured in the functional RLT structure.
We model ThrMap via the function find_node introduced in Sec. 3.1, and the timestamp from a tree clock (i.e., the mapping from the identifier of a thread to its clock value) can be further expressed as the getClock function in Fig. 14b.

(c) Functional version of the tree clock join operation.

Operations.
In the original tree clock presentation, the join operation builds upon three auxiliary operations: getUpdatedNodesJoin, detachNodes, and attachNodes. All of them are modelled as recursive functions on tree clocks and presented in Fig. 14c. To streamline the proof of the imperative join, we model the core part of join into corejoin, with join, the actual functional join operation, being its wrapper (cf. Fig. 14c). While the imperative join TC_2•join(TC_1) modifies TC_2 into another tree clock TC′_2, the corresponding functional join, i.e. join(TC_2, TC_1), just returns TC′_2. Fig. 15 shows a concrete example of execution of join(TC_2, TC_1).

Fig. 15. Illustrative example of joining TC_1 into TC_2. Each node is annotated with its thread identifier, its clock and attached clock (e.g., for node 2 on TC_1 its clock is 20 and its attached clock is 9). The portion encircled by the blue dashed line is the portion of TC_1 returned by getUpdatedNodesJoin(TC_2, TC_1) (denoted as pf) such that a thread identifier t appears in pf if and only if getClock(TC_2, t) < getClock(TC_1, t). Moreover, pf is a prefix of TC_1. detachNodes(pf, TC_2) partitions TC_2 into a "pivot" tree (encircled by the green dashed line) and a list of trees (denoted as forest, here containing the trees encircled by the red dashed lines), such that (1) for every tree in forest, its root thread identifier appears in pf and no other thread identifier appearing in pf also appears in that tree; (2) the pivot tree is a prefix of TC_2 and does not contain any thread identifier appearing in pf. After that, attachNodes(forest, pf) "attaches" all the trees in forest to pf by matching the root thread identifiers in forest; the resulting tree is the one surrounded by red and blue dashed lines in TC′_2. Finally, the result of the join, TC′_2, is given by making the result of attachNodes be the first child of the root of the pivot tree.

Predicates and Properties.
We define several extrinsic predicates to ensure well-formedness of functional tree clocks. Specifically, the predicate valid(TC) is a conjunction of NoDupId(TC) and the conditions in Sec. 4.1 that a tree clock should satisfy. We also define the binary predicate respect(TC_1, TC_2) to be the conjunction of the direct monotonicity and indirect monotonicity, which are required in the following proofs. Due to space limits, we omit its concrete statement and refer interested readers to its provenance [22].
To guarantee that the functional tree clock implements the interfaces of a logical clock correctly, we need to prove that LogicalJoin holds for the functional join operation. With the timestamp model from Fig. 14b, this can be phrased as: The proof of this fact closely depends on the property of getUpdatedNodesJoin(TC_2, TC_1) and the lemma from Fig. 7f.

Verifying the C Implementation
As the second step of our verification task, we ascribe a Hoare-style specification, phrased in terms of functional tree clock manipulations from Sec. 4.2, to the C implementation from Fig. 13 and prove that the specification holds.

The C Implementation.
The imperative tree clock joining (cf. Fig. 13) is implemented as a non-recursive tree traversal with its control flow similar to that of copyval_and_move from Fig. 4a. Notably, the imperative implementation is nothing like its functional reference counterpart from Sec. 4.2. In the functional join, the join is done step-by-step: we first obtain the prefix getUpdatedNodesJoin(TC_2, TC_1) in one step, then accomplish all the subtree detaching in another step, and finally do all the attaching. In the imperative version, however, we detach and attach a single subtree in each iteration during the non-recursive traversal of that prefix; the join is done only after finishing the traversal. That is, the traversal_invariant predicate from Fig. 12 is instantiated with getUpdatedNodesJoin(TC_2, TC_1) as an argument. This is because during the imperative traversal, the tree prefix getUpdatedNodesJoin(TC_2, TC_1) is not fully visited, hence the invariant needs to "compensate" for this to match the functional reference implementation.
With the main complexity of the invariant factored out as per the core proof principles of Arboreta, the rest of the proof posed little conceptual challenge. For example, the subprocedure push_child is a structure-changing operation, which can be modelled in the same way as has been demonstrated for prepend_child in Sec. 3.2.4. The proof is similar for another subprocedure detach_from_neighbors (called at line 16 of Fig. 13) that removes a child.
The last thing to note is the switch between the array view and the tree view. When verifying the code in Fig. 13, we keep using the array view when "stepping through" lines 10-14 and dealing with the random accesses. We then switch to the tree view to handle the structure-changing subprocedures. At line 23, we go back to the array view, since get_upd_nodes_join_chn (a function used for traversing child nodes) also performs various random accesses.

Evaluation and Benchmarks
To demonstrate the practical relevance of our verified C implementation of array-based tree clocks, we incorporated it into an HB-based dynamic data race detector.
We evaluate the performance of our array-based implementation against the naïve but easier-to-mechanise pointer-based tree clocks. For a controlled evaluation, our data race detector performs analysis on offline traces logged a priori; this ensures that both implementations work on the same trace (for each benchmark program). Conceptually, the race detector maintains a fixed number of tree clocks, processes events one by one, updates them at each event, and checks for data races at every access event. Our benchmark suite is derived from prior work [22] and includes 146 traces from different concurrent C/C++ as well as Java applications. For each trace, we measured the time taken by the race detection analysis, using both the naïve pointer-based and our verified array-based tree clock implementations. Our evaluation was conducted on a 64-bit Red Hat Enterprise Linux 8.4 machine with a single CPU core and 256GB RAM.
In Tab. 2, we summarise the results of our evaluation. The table aggregates the trace logs into 5 groups, dividing the entire set into roughly equal sets based on their lengths (number of events). Column 1 and Column 2 represent the range of trace lengths and the number of traces in each group respectively. Column 4 (resp. Column 5) reports the (geometric) mean of the time taken by the analysis that uses the pointer-based (resp. verified array-based) tree clock implementation. Column 6 reports the (geometric) mean of the resulting speedup in each group; the speedup for a given benchmark is measured as the ratio of the time taken by the pointer-based implementation to that taken by the array-based implementation. Overall, the array-based implementation offers a 35% speedup, thanks to the efficiency offered by random access in arrays. More importantly, now this fast implementation comes with a formal correctness proof!

Related Work
Our work contributes to a large body of research on mechanically verified heap-based data structures and algorithms.
Tree manipulations in Separation Logic. Verifying recursive traversals of heap-based trees in SL (mechanised or not) is considered a standard exercise and is featured in a number of papers and teaching materials [8,9,27,30]. Reasoning about arrays in SL is also a well-studied area, in which many problems can be reduced to reasoning about lists [15].
To the best of our knowledge, relatively few works are concerned with array-based tree representations. Barrière specifies B+ trees in VST by providing a representation predicate that facilitates proofs about traversals via heap induction, but is less convenient for reasoning about random accesses, which the array view in our approach allows [5].
As described in Sec. 3.2.2, our tree view enables the use of localised reasoning rules such as WandFrameUpdate, which, however, is restricted to scenarios involving the manipulation of only one subtree at a time. Advanced techniques have been proposed to address the more complex cases involving multiple subtrees [7] and we plan to integrate them into Arboreta in the future.
Reasoning about graphs in Separation Logic frequently requires defining representation predicates similar to our tree_rep [24,34,36]. Even though such predicates facilitate certain kinds of inductive reasoning [16], they impose additional proof obligations related to non-interference between recursive calls that can be caused by deep intrinsic sharing; such obligations would not be necessary for trees.
Our approach to formulating loop invariants for non-recursive traversals is reminiscent of Charguéraud's specialised rules for inductive reasoning about loops [8, §6], but is tailored for array-based trees and the respective representation predicates. We do not exclude the possibility that such loop invariants can be automatically derived from induction hypotheses of equivalent recursive traversals, and we are planning to investigate this research question in the future.
Reasoning about logical clocks. Tree clocks are a particular instance of logical clocks [18]: a family of data structures that are frequently used as a mechanism for reasoning about causality of events in concurrent and distributed systems. Mechanised implementations of a simpler version of logical clocks, namely vector clocks [11,23], are featured in several existing efforts on verified algorithms for dynamic data race detection [21,32,37]. Verified vector clocks are also an important component of certified implementations of Conflict-free Replicated Data Types (CRDTs) [14,19,20,26].
We are not aware of any verified implementations of tree clocks, but we believe it should be possible to use our implementation from Sec. 4 as a verified drop-in replacement for vector clocks in some of those systems. The reason we could not do so immediately is the mismatch between the logical foundations of our proofs and the existing implementations of data race detectors and CRDTs. For example, most of the existing mechanised CRDT implementations [14,26] are verified in Iris [17], whose proofs are therefore not directly composable with ours in VST. Mansky et al.'s verified version of the FastTrack algorithm [12,21] features a C implementation partially verified in VST. Unfortunately, its proof relies on bespoke specifications of logical clock operations, making it difficult to plug in our implementation "as-is". We leave these exercises in proof composition to future work.

Conclusion and Future Work
In this work, we have presented a principled methodology for structuring proofs about manipulations with array-based trees in Separation Logic (SL). We implemented the main components of our approach in a Coq library called Arboreta and showcased them on a large case study, verifying an array-based tree structure used in real-world data race detectors. While our current implementation is tied to the VST framework as a Coq embedding of Separation Logic, we believe the key ideas of our work can be transferred to other SL embeddings, such as CFML [8], HTT [25], and Iris [17], in a relatively straightforward way. Furthermore, our pure reasoning principles concerning RLTs, such as those delineated in Arboreta-P (Sec. 3.1) and those associated with non-recursive tree traversals (Sec. 3.3), should be adaptable to other theorem provers based on higher-order logic. In the future, we are planning to extend our case studies to other array-based tree structures, such as AVL and B+ trees. We are also planning to integrate our verified implementation of tree clocks into the verified data race detector by Mansky et al. [21].

Data Availability
The software artefact with a snapshot of the Coq and C developments accompanying this paper is available online [38]. It contains the source code of Arboreta, the tree clock case study, and the harness to reproduce the experimental results with the data race detector described in Sec. 4.5.
struct node { int val, par, sib, fch; };
struct node tree[N];
(a) RLT encoding in C. Correspondence between a tree node and the struct fields.
Example of executing copyval_and_move.
Fig. 10 depicts a sequence of steps showing how the non-recursive traversal progresses. Each subfigure snapshots the state at the start of the corresponding loop iteration. The portion encircled by the red dashed line indicates the pre-iteration visited prefix, and a node surrounded by the blue dashed line denotes the stack top node with pos being its coordinate. Readers are invited to validate the definitions from Fig. 9b by using the pos and full in each subfigure.

Fig. 11. Retrieving a worklist given a tree tr and a node position pos in it. The auxiliary function seq(n, m) returns a list of natural numbers [n; n+1; ...; m-1].

Table 2. Evaluation: array-based v. pointer-based tree clocks

Proof Effort
The quantitative data about our verification efforts is given in Tab. 1. The implementation of Arboreta includes pure facts about array-based trees (Sec. 3.1) and separation logic facts about dual views (Sec. 3.2) and amounts to 2.7 kLOC of Coq. The formalisation of tree clocks includes the functional reference specification (Sec. 4.2) as well as specs and proofs for C code (Sec. 4.3), totalling 5.2 kLOC of Coq. The size of the verified C codebase, not included in the statistics in Tab. 1, is around 150 lines of code.