Verifying correctness of persistent concurrent data structures: a sound and complete method

Non-volatile memory (NVM), aka persistent memory, is a new memory paradigm that preserves its contents even after power loss. The expected ubiquity of NVM has stimulated interest in the design of persistent concurrent data structures, together with associated notions of correctness. In this paper, we present a formal proof technique for durable linearizability, which is a correctness criterion that extends linearizability to handle crashes and recovery in the context of NVM. Our proofs are based on refinement of Input/Output automata (IOA) representations of concurrent data structures. To this end, we develop a generic procedure for transforming any standard sequential data structure into a durable specification and prove that this transformation is both sound and complete. Since the durable specification only exhibits durably linearizable behaviours, it serves as the abstract specification in our refinement proof. We exemplify our technique on a recently proposed persistent-memory queue that builds on Michael and Scott's lock-free queue. To support the proofs, we describe an automated translation procedure from code to IOA and a thread-local proof technique for verifying correctness of invariants.


Introduction
Recent technological advances indicate that future architectures will employ some form of non-volatile memory (NVM) that retains its contents after a system crash (e.g., power outage). NVM is intended to be used as an intermediate layer between traditional volatile memory (VM) and secondary storage and has the potential to vastly improve system speed and stability. Software that uses NVM has the potential to be more robust; in case of a crash, a system state before the crash may be recovered using contents from NVM, as opposed to being restarted from secondary storage. However, because the same data is stored in both a volatile and non-volatile manner, and because NVM is updated at a slower rate than VM, recovery to a consistent state may not always be possible. This is particularly true for concurrent systems, where coping with NVM requires introduction of additional synchronisation instructions into a program.
Recently, researchers have developed persistent extensions to existing concurrent objects (e.g., concurrent data structures or transactional memory). This work has been accompanied by extensions to known notions of consistency, such as linearizability and opacity, that cope with crashes and subsequent recovery.
This paper examines correctness and formal verification of the recently developed persistent queue by Friedman et al. [FHMP18], against the (also) recently developed notion of durable linearizability [IMS16]. Friedman et al.'s queue extends the well-known Michael-Scott queue [MS96], whereas durable linearizability extends the standard notion of linearizability [HW90] so that completed operations are guaranteed to survive a system crash.
Our verification follows a well-established methodology: (1) we develop an operational model of durable linearizability that is parameterised by a generic sequential object (e.g., a queue data structure with enqueue and dequeue operations), (2) we prove that this operational model is both sound and complete, and (3) we establish a series of refinements between the operational model and the concrete implementation. The final (and most complex) of these steps, which establishes that the implementation refines the operational model, is fully mechanised in the KIV theorem prover [EPS+15]. It is important to note that the operational model is generic, and for any particular verification one only needs to establish step (3) in order to show that a particular algorithm is durably linearizable.
Ours is the first approach to address formal verification of persistent data structures. We consider the development of our sound operational characterisation of durable linearizability and the refinement proofs, including mechanisation in KIV, to be the main contributions of this paper. The mechanisation may be accessed from [KIV20].
We present Friedman et al.'s queue in Section 2, durable linearizability in Section 3 and an operational characterisation of durable linearizability via Input/Output automata (IOA) in Section 4. Section 5 then gives an overview of our proof approach. It requires the translation of programs as given in a programming language into IOA, which we describe further in Section 6. For proving refinement, we need to show invariants on the data structure, which we carry out by thread-local proof techniques detailed in Section 7. Finally, Section 8 explains how we apply these generic proof concepts on the example of the persistent queue. This article is an extended version of [DDD+19], adding the technique for translating code to IOA and providing a compositional technique for invariant verification, which our refinement approach heavily relies on.

A persistent queue
The persistent queue of Friedman et al. [FHMP18] is an extension of the Michael-Scott queue (MSQ) [MS96] to cope with NVM (see Algorithms 1 and 2). The MSQ uses a linked list of nodes with global head and tail pointers. The first node is a sentinel that simplifies handling of empty queues. The MSQ is initialised by allocating a dummy node with a null next pointer, then setting the global head and tail pointers to this dummy node.
The enqueue operation creates a new node that is inserted at the end of the linked list. The insertion is performed using an atomic compare-and-swap (CAS) instruction that atomically updates the next pointer of the last node, provided this next pointer has not changed since it was read at the beginning of the enqueue operation. The CAS returns true if it succeeds and false otherwise. Immediately after a new node is inserted, the tail pointer is lagging one node behind the true tail of the queue, and hence must be updated to point to the last node in a separate step.
The dequeue operation returns empty if the head and tail pointer both point to the sentinel node and the tail is not lagging. If the queue is not empty, the dequeue reads the value of the node immediately after the sentinel and atomically swings the head pointer to this next node, provided it has not changed. Thereby, the next node becomes the new sentinel node of the queue.
A key feature of MSQ is a helping mechanism, where a different thread from the original enqueue may advance the tail pointer if it is lagging. In the case of a dequeue, this only occurs if head and tail pointers are equal, but the queue is not empty.
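To make these mechanics concrete, the following is a small sequential Python model of the MSQ (a sketch for illustration only, not the paper's concurrent C implementation; the `cas` helper and all names are ours). It shows the CAS-based insertion, the lagging tail, and the helping step:

```python
class Node:
    def __init__(self, value=None):
        self.value = value
        self.next = None

def cas(obj, field, expected, new):
    # Sequential stand-in for an atomic compare-and-swap instruction.
    if getattr(obj, field) is expected:
        setattr(obj, field, new)
        return True
    return False

class MSQueue:
    def __init__(self):
        sentinel = Node()            # dummy node simplifies the empty-queue case
        self.head = sentinel
        self.tail = sentinel

    def enqueue(self, value):
        node = Node(value)
        while True:
            last = self.tail
            nxt = last.next
            if nxt is None:
                if cas(last, 'next', None, node):   # link the new node
                    cas(self, 'tail', last, node)   # swing tail in a separate step
                    return
            else:
                cas(self, 'tail', last, nxt)        # help: advance a lagging tail

    def dequeue(self):
        while True:
            first, last, nxt = self.head, self.tail, self.head.next
            if first is last:
                if nxt is None:
                    return None                     # empty queue
                cas(self, 'tail', last, nxt)        # tail is lagging: help
            else:
                value = nxt.value
                if cas(self, 'head', first, nxt):   # nxt becomes the new sentinel
                    return value
```

In the real algorithm, the CAS instructions make the linking, tail-swinging and head-swinging steps atomic under interleaving; the helping branches are exactly those described above.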

Friedman et al. [FHMP18] adapt MSQ to a system comprising both VM and NVM. In such systems, computations take place in VM as normal, but data is periodically flushed to NVM by the system. Such flushes may happen at any time while the algorithm executes, and there is no specific (e.g., FIFO) order in which writes in VM are flushed to NVM. In addition to system-controlled flushes, a programmer may introduce explicit flush events that transfer data from VM to NVM. Only data in NVM persists after a crash (e.g., power loss). A persistent data structure must enable recovery from such an event, as opposed to a full system restart. In doing this, it must ensure some notion of consistency in the presence of crashes and a subsequent recovery operation. Following Friedman et al. [FHMP18], the notion of consistency we use is durable linearizability (see Section 3).
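This interplay of volatile and persistent memory can be sketched as a toy model (Python, all names ours): writes land in VM, an explicit flush persists one location, the system may flush any subset of locations at any time and in no particular order, and a crash discards everything not yet in NVM:

```python
import random

class Memory:
    """Toy model of volatile (VM) plus non-volatile (NVM) memory: writes land in
    VM, and contents only survive a crash once they have been flushed to NVM."""
    def __init__(self):
        self.vm, self.nvm = {}, {}

    def write(self, loc, value):
        self.vm[loc] = value                 # immediately visible, but volatile

    def flush(self, loc):
        if loc in self.vm:
            self.nvm[loc] = self.vm[loc]     # explicit flush of one location

    def system_flush(self):
        # The system may persist any subset of locations, in no particular order.
        for loc in list(self.vm):
            if random.random() < 0.5:
                self.nvm[loc] = self.vm[loc]

    def crash(self):
        self.vm = dict(self.nvm)             # only persisted data survives

mem = Memory()
mem.write('x', 1)
mem.write('y', 2)
mem.flush('x')                               # x is guaranteed to persist; y is not
mem.crash()
```

After the crash, `x` is recovered from NVM while `y` is lost, which is precisely why the algorithm must place explicit flushes before any step whose effect other threads (or the recovery procedure) may depend on.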
The persistent queue uses the same underlying data structure as MSQ (see Algorithm 1), but nodes contain an additional field, deqID (initialised to −1), which stores the ID of the thread that removed the node from the queue. (The deqID field is further explained as part of the dequeue operation below.) In addition to the head and tail pointers, it uses an array of pointers, RVals, with one index for each thread, containing either null (the initial value) or a pointer to a cell, which itself contains either empty (signifying that the thread last saw an empty queue) or a value (the value that was last dequeued). Unlike MSQ, the persistent dequeue operation does not return a value; instead, the returned value for tid is stored in the cell pointed to by RVals[tid].

Persistent enqueue. The basic structure (see Algorithm 2) is the same as the enqueue of MSQ. In addition, to ensure that the linked list data structure is recoverable after a crash, nodes and next pointers have to be persisted after being modified in VM. This is achieved by using three flush operations in lines 3, 10 and 14. The first ensures that the node is persisted before it is inserted into the queue; the second and third ensure that the next pointer of a lagging tail pointer is persisted before the tail is advanced. Note that updates to tail do not need to be explicitly flushed, because tail can be recomputed during recovery by traversing the persistent list.

Persistent dequeue. The basic structure of the dequeue operation also resembles the dequeue of MSQ. In addition, it uses the variables RVals and deqID to guarantee durable linearizability. RVals is an array of pointers to cells that are used to store the value returned by each dequeue. A dequeue creates a new cell at Line 2, then flushes it at Line 3. The pointer to this cell is stored in RVals at Line 4, and this pointer is made persistent at Line 5.
The deqID field is used to logically mark nodes that are dequeued, which occurs at the successful CAS at Line 20. This logical dequeue is made persistent by flushing the deqID at Line 21. After a node has been logically dequeued, the dequeued value is stored in the cell pointed to by RVals[tid] (see Line 22), where tid is the thread ID of the dequeuing thread. This dequeued value is made persistent at Line 23. A dequeue by thread tid stores empty in RVals[tid] if the queue is empty at Line 13, and this value is made persistent at Line 14.
The persistent dequeue operation employs an additional helping mechanism to ensure that these new fields are made persistent in the correct order. In particular, a node that has been logically dequeued in VM must be made persistent before another dequeue is allowed to succeed. Therefore, if a thread recognises that deqID is not −1 at Line 20, it helps the other thread by flushing the deqID field, writing the dequeued value into the cell pointed to by RVals[nxt→tid], flushing this cell, and finally advancing the head pointer. As an example, consider the state of the queue depicted in Figure 1. The head pointer head (both in persistent and volatile memory) points to the sentinel node at the start. The queue currently contains elements a, d and b. Assume that a thread 2 has already started dequeuing and has successfully executed Line 20 (thereby placing its ID into the deqID field of the first node). This field has, however, not been made persistent yet. Now another thread (say, 5) starts dequeuing. Its CAS at Line 20 fails (it sees that the deqID entry is no longer −1). Thus it starts helping: it flushes the deqID field of the first node (now containing 2), updates thread 2's entry in the return-value array and makes it persistent. Only then can thread 5 resume its own dequeuing operation with the next iteration of the while loop.
Note that the helping thread may be delayed between the read at Line 27 and the write at Line 30, and the original thread tid may begin a new dequeue operation in this interval. In this case, since tid allocates a fresh cell at Line 2, the helping thread's write at Line 30 will harmlessly modify a previous cell.
Recovery. After a crash, and prior to resuming normal operation, persistent data structures must perform a recovery operation that restores the state of the data structure in VM from NVM. The recovery procedure proposed by Friedman et al. is multithreaded (and complex), so we elide its details here (an interested reader may consult [FHMP18]). Instead, we provide a simpler single-threaded recovery operation (see Algorithm 4), which we describe in more detail in Section 6.1.

Durable linearizability
We now define durable linearizability [IMS16], a central correctness condition for persistent concurrent data structures. Like linearizability, durable linearizability is defined over histories recording the invocation and response events of operations executed on the concurrent data structure. Unlike linearizability, durably linearizable histories include crash events.
Formally, we let Σ be the set of operations. For a queue, Σ = {Enq, Deq}. A history is a sequence of events, each of which is either (a) an invocation of an operation op by a thread t ∈ T with input values v, written inv_t(op, v), (b) a response of op in thread t with value v, written res_t(op, v), or (c) a system-wide crash c.
Given a history h, we let ops(h) denote h restricted to non-crash events, and h|_t denote h restricted to (non-crash) events of thread t ∈ T. The crash events partition a history into h = h_0 c_1 h_1 c_2 ... h_{n−1} c_n h_n, such that n is the number of crash events in h, c_i is the i-th crash event and ops(h_i) = h_i (i.e., h_i contains no crash events). We call the subhistory h_i the i-th era of h (i.e., the maximal crash-free subsequences of h are its eras). For a history h and events e_1, e_2, we write e_1 <_h e_2 whenever h = h_0 e_1 h_1 e_2 h_2.
A history h is said to be sequential iff every invocation event (except if it is the last event in h) is immediately followed by its corresponding response event; it is well formed if and only if (a) h|_t is sequential for every thread t and (b) each thread ID appears in at most one era. Any invocation that is not followed by its response event is called a pending invocation. We consider well-formed histories only. A history h defines a happens-before ordering on the events occurring in h by letting e_1 ≺_h e_2 iff e_1 <_h e_2, e_1 is a response and e_2 an invocation event. Linearizability (and durable linearizability) requires a notion of a legal history, which we define using a sequential object.
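These definitions translate directly into executable checks. The following Python sketch (the event encoding is ours) computes ops(h), the eras of a history, and the happens-before relation:

```python
# Events are ('inv', t, op, v), ('res', t, op, v), or the marker 'crash'.
def ops(h):
    """h restricted to its non-crash events."""
    return [e for e in h if e != 'crash']

def eras(h):
    """The maximal crash-free subhistories h_0, ..., h_n of h."""
    result, current = [], []
    for e in h:
        if e == 'crash':
            result.append(current)
            current = []
        else:
            current.append(e)
    return result + [current]

def happens_before(h, e1, e2):
    """e1 precedes e2 in h, e1 is a response and e2 an invocation."""
    return h.index(e1) < h.index(e2) and e1[0] == 'res' and e2[0] == 'inv'

# A history with one crash, hence two eras.
h = [('inv', 1, 'Enq', 'a'), ('res', 1, 'Enq', None), 'crash',
     ('inv', 2, 'Deq', None), ('res', 2, 'Deq', 'a')]
```

Note that a crash with n crash events always yields n+1 eras, some of which may be empty.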

Definition 3.1 (Sequential object)
A sequential object over a base type Val is a 5-tuple (Σ, S, s_0, in, ρ) where

• Σ is an alphabet of operations, S is a set of states and s_0 ∈ S the initial state,
• in : Σ → N is an input function telling us the number of inputs an operation op ∈ Σ takes, and
• ρ is the transition function, mapping a state, an operation and its input values to a successor state and an output value.

We assume outputs of operations to consist of a single value, which is possibly the symbol empty, or no value, denoted ⊥. In the following we let v = v_1 v_2 ... v_n denote a string of n elements and write #v to denote its length n. We write inv_t(op, v) for an invocation of the operation op with n = #v inputs by thread t and let Inv be the set of all such invocations. Similarly, we let Res be the set of all responses.
The legal histories of a sequential object S = (Σ, S, s_0, in, ρ) are the sequential histories whose operations can be generated step by step by ρ, starting in the initial state s_0. As a running example (Example 3.1), consider the queue object Q whose states are sequences of values with initial state ε, where ε is the empty sequence and • is used for sequence concatenation. For Q, the history h below is sequential and legal:

h = inv_1(Enq, a), res_1(Enq, ⊥), inv_2(Deq, ε), res_2(Deq, a)

whereas the history h • inv_3(Deq, ε), res_3(Deq, b) is sequential but not legal.
For the definition of durable linearizability some more notation is needed. We write h ≡ h' if h|_t = h'|_t for all threads t. We let compl(h) (the completion) be the set of histories that can be obtained from h by appending (some) missing responses at the end, and use trunc(h) to remove pending invocations from a history h (or from each history in a set).

Definition 3.2 (Linearizability [HW90])
A (crash-free) history h is linearizable if there is some h' ∈ trunc(compl(h)) and some legal sequential history h_S such that (i) h' ≡ h_S and (ii) ∀ e_1, e_2 ∈ h' : e_1 ≺_{h'} e_2 ⇒ e_1 ≺_{h_S} e_2.
For durable linearizability, this definition is now simply lifted to histories with crashes.

Definition 3.3 (Durable linearizability [IMS16])
A history h is durably linearizable if it is well formed and ops(h) is linearizable.
Informally, durable linearizability guarantees that even after a crash the state of the concurrent object remains consistent with the abstract specification. This means that the effect of any operations that completed before a crash are preserved after the crash. The effect of operations that did not complete before a crash may or may not be preserved. For example, the concurrent history hc with ops(hc) = inv_1(Enq, a), inv_3(Deq, ε), res_1(Enq, ⊥), inv_2(Deq, ε), res_2(Deq, a) is durably linearizable, since ops(hc) is linearizable with respect to the history h in Example 3.1. On the other hand, the history inv_1(Enq, a), inv_3(Enq, b), res_1(Enq, ⊥), c, inv_2(Deq, ε), res_2(Deq, empty) is not durably linearizable, since the effect of the completed operation Enq(a) is not preserved after the crash.
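For small histories, Definition 3.3 can be checked by brute force: remove the crash events, truncate pending invocations (a simplification of trunc(compl(·)) that happens to suffice for the two examples above), and search for a legal FIFO reordering that preserves real-time order. The Python sketch below (ours, exponential and illustrative only; crash positions in the example histories are chosen for illustration) does exactly this:

```python
from itertools import permutations

def legal_queue(seq_hist):
    """Legality of a sequential history against a FIFO queue specification."""
    q = []
    for (op, val) in seq_hist:
        if op == 'Enq':
            q.append(val)
        elif val != (q.pop(0) if q else 'empty'):  # Deq must return head or 'empty'
            return False
    return True

def durably_linearizable(h):
    """h is a list of 'crash' markers and ('inv', t, op, v) / ('res', t, op, v) events.
    Pending invocations are truncated (a simplification of trunc(compl(.)))."""
    events = [e for e in h if e != 'crash']        # ops(h)
    calls = []                                     # (inv index, res index, op, value)
    for i, e in enumerate(events):
        if e[0] == 'inv':
            t, op, arg = e[1], e[2], e[3]
            j = next((k for k in range(i + 1, len(events))
                      if events[k][:2] == ('res', t)), None)
            if j is None:
                continue                           # truncate pending invocation
            calls.append((i, j, op, arg if op == 'Enq' else events[j][3]))
    for perm in permutations(calls):               # try every linearization order
        ok = all(not (a[1] < b[0]) or perm.index(a) < perm.index(b)
                 for a in calls for b in calls if a is not b)
        if ok and legal_queue([(c[2], c[3]) for c in perm]):
            return True
    return False

# The two histories discussed above, with illustrative crash positions.
h1 = [('inv', 1, 'Enq', 'a'), ('inv', 3, 'Deq', None), ('res', 1, 'Enq', None),
      'crash', ('inv', 2, 'Deq', None), ('res', 2, 'Deq', 'a')]
h2 = [('inv', 1, 'Enq', 'a'), ('inv', 3, 'Enq', 'b'), ('res', 1, 'Enq', None),
      'crash', ('inv', 2, 'Deq', None), ('res', 2, 'Deq', 'empty')]
```

The checker confirms that h1 is durably linearizable while h2 is not: in h2, the dequeue starts after Enq(a) has completed, so no legal reordering can return empty.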
Our methodology for proving durable linearizability does not use Definition 3.3 directly; instead it uses the following characterisation, which defines the set of all durably linearizable histories for a sequential object.
We let Lin(S) be the set of histories linearizable wrt. the legal histories of a sequential object S, and define DurLin(S) as the set of all well-formed histories h with ops(h) ∈ Lin(S). For a given concurrent durable data structure implementing a sequential object S, proving its correctness thus amounts to showing that all histories of the implementation are in DurLin(S). To this end, for a given S, we develop an operational model DurAut(S) whose behaviours generate DurLin(S). We then use a standard refinement approach to show that the implementation model is a refinement of DurAut(S). This is enough to guarantee that the original implementation is durably linearizable.

An operational model for durable linearizability
The operational model for durable linearizability is formalised in terms of an Input/Output automaton (IOA) [LT87]. This framework is often used for proving linearizability via refinement [DD15], and here we intend to similarly employ it for proving durable linearizability.
Definition 4.1 An Input/Output automaton (IOA) is a labeled transition system A with
• a set of states states(A),
• a set of start states start(A) ⊆ states(A),
• a set of actions acts(A), and
• a step (or transition) relation step(A) ⊆ states(A) × acts(A) × states(A) (so that the actions label the steps).
The set acts(A) is partitioned into internal actions, internal(A), and external actions, external(A). The internal actions represent events of the system that are not visible to the environment, whereas the external actions represent the automaton's interactions with its environment.
An execution of an IOA A is a sequence σ = s_0 a_1 s_1 a_2 s_2 a_3 ... of alternating states and actions such that s_0 ∈ start(A) and for each i, (s_i, a_{i+1}, s_{i+1}) ∈ step(A). A reachable state of A is a state appearing in an execution of A. We let reach(A) denote the set of reachable states of A. An invariant of A is any superset of the reachable states of A (equivalently, any predicate satisfied by all reachable states of A). A trace of A is any sequence of external actions obtained by projecting any execution of A onto the external actions. The set of traces of A, traces(A), represents A's externally visible behaviour. If every trace of an automaton C is also a trace of an automaton A, then we say that C implements or refines A.
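These notions can be rendered as a bounded-depth Python sketch (ours): an IOA is a start-state set, a step relation and an external-action set; traces are projections of executions onto external actions, and refinement is checked as trace inclusion up to a depth bound. (The real proofs use simulations, not enumeration; this is only to make the definitions concrete.)

```python
def traces(ioa, depth):
    """All traces (external projections of executions) up to a length bound.
    ioa: dict with 'start' (states), 'step' (list of (pre, action, post) triples)
    and 'external' (set of external actions)."""
    result = set()
    frontier = [(s, ()) for s in ioa['start']]
    for _ in range(depth):
        result |= {tr for (_, tr) in frontier}
        frontier = [(post, tr + ((a,) if a in ioa['external'] else ()))
                    for (s, tr) in frontier
                    for (pre, a, post) in ioa['step'] if pre == s]
    result |= {tr for (_, tr) in frontier}
    return result

def refines(c, a, depth=5):
    """C refines A iff traces(C) is a subset of traces(A); checked up to a bound."""
    return traces(c, depth) <= traces(a, depth)

# Toy automata: A toggles on/off forever; C can only switch on once.
A = {'start': [0], 'external': {'on', 'off'},
     'step': [(0, 'on', 1), (1, 'off', 0)]}
C = {'start': [0], 'external': {'on', 'off'},
     'step': [(0, 'on', 1)]}
```

Here C refines A (every behaviour of C is a behaviour of A) but not vice versa, since A exhibits the trace ('on', 'off') that C cannot produce.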
One advantage of using IOA (and related operational models) is that we can prove refinement using the method of simulations. To prove refinement in our case study, we will later employ forward simulation; the definition of forward simulation we use is adapted from that of Lynch and Vaandrager [LV95]. Forward simulation is sound in the sense that if there is a forward simulation between A and C, then C refines A.
It is well known that forward simulation is not complete for proving refinement. We also require a notion of simulation known as backward simulation [LV95]. In this paper, we use backward simulation to prove refinement between the abstract specification (see Fig. 2) and an intermediate automaton (see Fig. 3). Such techniques are well documented in the literature (see [DGLM04, DD15]) and their details are largely uninteresting for the purposes of our case study. We therefore elide the definition of backward simulation in this paper.
For an arbitrary sequential object S, we next construct a durable automaton DurAut(S) (see Fig. 2) whose traces are histories in DurLin(S) only. This automaton can serve as a specification automaton in a refinement proof. The state of this automaton incorporates the state s of the sequential object S, plus for every thread t ∈ T:
• a program counter fixing whether the thread is still idle, is ready to be started, is crashed (i.e., has been active during a crash), or is currently executing an operation,
• possible input values of the thread's operations and a possible output value.
The step relation of the automaton is, as usual, given in the form of pre- and postconditions of actions. For every operation op in the sequential object, the automaton has actions inv(op), do(op) and res(op), where do(op) corresponds to execution of the abstract operation op, potentially changing the state of the sequential object. We use inv_t(op, v) and res_t(op, v) for inv(op)_t(v) and res(op)_t(v), respectively.
Note that a thread may only invoke an operation if it is ready. We furthermore have a dedicated crash action, which may be executed at any time and sets all active threads to crashed. To ensure that crashed threads are confined to a single era, we use a separate action run that enables idle threads to become ready. While inv(op), res(op) and crash are external actions, run and do(op) are internal.
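The following Python sketch (ours, simplified) renders DurAut(S) for the queue: per-thread program counters, the run/inv/do/res actions with their preconditions as assertions, and a crash action that marks all non-idle threads as crashed while the abstract queue state survives:

```python
class DurAutQueue:
    """Simplified model of DurAut(S) instantiated with a queue (names ours).
    pc: 'idle' --run--> 'ready' --inv--> ('invoked', op) --do--> ('done', op)
    --res--> 'ready'; crash marks every non-idle thread as crashed, while the
    abstract state s survives (completed operations persist across crashes)."""
    def __init__(self, threads):
        self.s = []                              # state of the sequential object
        self.pc = {t: 'idle' for t in threads}
        self.val = {t: None for t in threads}    # operation inputs
        self.out = {t: None for t in threads}    # operation outputs

    def run(self, t):                            # internal: thread becomes ready
        assert self.pc[t] == 'idle'
        self.pc[t] = 'ready'

    def inv(self, t, op, v=None):                # external: invocation
        assert self.pc[t] == 'ready'
        self.pc[t], self.val[t] = ('invoked', op), v

    def do(self, t):                             # internal: abstract op executes
        assert self.pc[t][0] == 'invoked'
        op = self.pc[t][1]
        if op == 'Enq':
            self.s.append(self.val[t])
            self.out[t] = None
        else:                                    # Deq
            self.out[t] = self.s.pop(0) if self.s else 'empty'
        self.pc[t] = ('done', op)

    def res(self, t):                            # external: response
        assert self.pc[t][0] == 'done'
        op = self.pc[t][1]
        self.pc[t] = 'ready'
        return (op, self.out[t])

    def crash(self):                             # external: system-wide crash
        for t in self.pc:
            if self.pc[t] != 'idle':
                self.pc[t] = 'crashed'
```

Because crashed threads can never become ready again, each thread ID is confined to a single era, matching the well-formedness condition on histories.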
The theorem below ensures that the traces of the durable automaton are exactly the durably linearizable histories of S.

Theorem 4.1 traces(DurAut(S)) = DurLin(S).

Proof. We start by showing that traces(DurAut(S)) ⊆ DurLin(S). Let σ = cs_0 a_1 cs_1 ... a_n cs_n be an execution of DurAut(S) and let cs_i.s, cs_i.out etc. be the components of state cs_i. Let tr be the trace of σ. We construct the history h by making the following changes to tr (in this order).
Completion: For every a_i that is a do action do_t(op) in σ without matching res_t(op), we add res_t(op, v) with v = cs_i.out(t) to the end of tr. Truncation: We remove all inv_t(op, v) without matching response.
Next, we need to construct a legal sequential history h_S such that ops(h) ≡ h_S. Let i_1, ..., i_k be the indices of σ such that a_{i_j} is a do action do_t(op). Then ρ(cs_{i_j−1}.s, op, v) = (cs_{i_j}.s, cs_{i_j}.out(t)) by definition of the durable automaton. We set w_{i_j} = inv_t(op, v) • res_t(op, cs_{i_j}.out(t)).
We let h_S = w_{i_1} ... w_{i_k}; then h_S ∈ legal(S). Now assume e_1 ≺_h e_2. By definition, e_1 = res_t(op, v) and e_2 = inv_{t'}(op', v') for some t, t' ∈ T. Then e_1 has not been added to the trace tr by completion, since responses are added at the end. By construction of the durable automaton, threads execute inv, do and res actions in this order only. Hence the execution σ contains an action do_t(op) prior to e_1 and an action do_{t'}(op') following e_2. Hence e_1 ≺_{h_S} e_2.
We next proceed with showing that traces(DurAut(S)) ⊇ DurLin(S). Let h be a history of DurLin(S), i.e., ops(h) ∈ Lin(S). By the definition of linearizability there is hence a history h' ∈ trunc(compl(ops(h))) and some h_s which is a legal, sequential history of S such that h' ≡ h_s and for all e_1, e_2 ∈ h': e_1 ≺_{h'} e_2 implies e_1 ≺_{h_s} e_2.
Let h_s = w_0 ... w_m and let s_0 → ... → s_m be the corresponding transitions in the sequential object S. We inductively construct an execution σ = st_0 a_1 st_1 a_2 ... of the durable automaton such that traces(σ) = h. This construction maintains a prefix σ_i of σ, remainder histories h_i (a suffix of h) and h_{s,i} (a suffix of h_s), and a set crashed_i ⊆ T. We furthermore maintain several relations among these variables; in particular, t is in crashed_i if there exists a prefix ĥ such that h = ĥ • crash • h_i and t has invoked an operation in ĥ. Initially, σ_0 := st_0 with st_0 ∈ start(A) (which is unique), crashed_0 := ∅, h_0 := h and h_{s,0} := h_s. The execution σ_i = st_0 a_1 ... a_i st_i constructed so far is now extended in the following way, according to the next event in the history h_i.
• Next event is an invocation by thread t: Since each thread appears at most once in one era, t ∉ crashed_i. By well-formedness of histories, st_i.pc(t) ∈ {idle, ready}. If st_i.pc(t) = idle, we first extend σ_i by run_t st_{i+1} (with st_{i+1} according to the transition relation of the automaton) and update the bookkeeping variables accordingly.
• Next event is a crash: here, we again need to consider two cases. In one case, we extend σ_i by do_{t'}(op') st_{i+1} (st_{i+1} according to the transition relation of the automaton), and set h_{i+1} to h_i, crashed_{i+1} to crashed_i, and h_{s,i+1} to h'. We can do so because st_i.pc(t') = inv(op') (by well-formedness of the history and linearizability preserving real-time order), st_i.val(t') = u and ρ(st_i.s, op', val(t')) = (st_{i+1}.s, u') (by linearizability preserving thread operations and the definition of sequential object).
In the other case, we extend σ_i by two actions: do_t(op) st_{i+1} and res_t(op, v) st_{i+2} (st_{i+1}, st_{i+2} according to the transition relation of the automaton), set h_{i+1} and h_{i+2} to h_i, crashed_{i+1} and crashed_{i+2} to crashed_i, and h_{s,i+1} to h'. By construction, traces(σ) = h.

Proof approach
We verify durable linearizability by proving refinement between the implementation model and DurAut(Q) (the queue Q of Example 3.1) using the IOA formalism introduced in Section 4. To this end, a number of further steps are required, which we outline here before going into more detail in later sections.

Selection of simulation type (this section). Refinement on IOA can be shown by forward or backward simulations [LV95], which are both sound (and in combination complete) for showing refinement. For our case study, we need to determine the appropriate simulation type.

Translation of implementation (Section 6). Concurrent algorithms are typically given directly in a programming language, which for the durable queue is C. Hence, we need to translate C programs to IOA. In this, we in particular need to take care of the fact that we have volatile as well as persistent memory.

Proofs of invariants (Section 7). Simulation proofs often first require the definition and proof of invariants on the implementation. These invariants describe constraints on the reachable states of the concurrent data structure. We aim at thread-local proofs of invariants.
We will illustrate all of these steps on our case study of the persistent queue. Before going into more detail about these steps, we first take a look at the simulation question. Forward and backward simulations allow a step-by-step comparison between the operations of the implementation and the abstract specification using an abstraction relation. Forward simulations are often easier to show than backward simulations. However, as for the MSQ [DGLM04, DD15], we require a backward simulation for the persistent queue here. To split the complexity of proving, we in addition introduce an intermediate automaton. For the proof, we establish a backward simulation between the intermediate automaton and DurAut(Q), as well as a forward simulation between the implementation of the persistent queue and the intermediate automaton. The intermediate automaton resolves non-determinism at the abstract level in the same way as in existing proofs of MSQ. Since refinement is transitive and guarantees trace inclusion, these two simulation proofs together are sufficient to show that the persistent queue is durably linearizable.

In Fig. 3 we present the intermediate automaton IDQ, which is used as the intermediate level in the overall refinement proof. The automaton IDQ is similar to the durable automaton for the queue datatype, DurAut(Q) (see Fig. 2, instantiated for the queue from Example 3.1). Note that we have specified one generic invocation operation inv_t(op, v) for both Enq and Deq. As with DurAut(Q), it has variables pc, val and out, which play the same role, and its variable q instantiates the state s. Furthermore, all its actions except for checkEmp are also actions of DurAut(Q), and have essentially the same effect. For IDQ we have the following property. The additional features of IDQ exist to model a behaviour where a dequeuing thread first observes that the queue is empty, and later decides to return empty, at a point when the queue may no longer be empty. The observation is modelled by a checkEmp_t action, which records in the obsEmp(t) variable the fact that the queue was empty during the execution of t's dequeue operation. In this automaton, it is possible for a thread t to execute a do_t(Deq) transition and set the output value to empty whenever obsEmp(t) has been set to true. We note that the queue may not actually be empty when this transition takes place, but this does not affect soundness of the proof method, since obsEmp(t) being set to true indicates that the queue has been empty at some point during the operation's execution. Further details of this technique, in the context of linearizability, may be found in [DGLM04, DD15, DSW11].
This theorem already establishes the first half of our refinement proof, showing that the intermediate automaton refines DurAut(Q). We next look at the other half, proving that the implementation of the persistent queue refines the intermediate automaton.

Translating C programs to input/output automata
This section describes the translation of the implementation to IO automata. Once we have an IOA, we can establish a refinement relationship between the implementation IOA and the intermediate automaton using proof techniques for IOA refinement. The implementation IOA describes the states, actions and in particular the step relation step ⊆ State × Action × State. The notation used for specifying IOAs is the notation used in the theorem prover KIV, i.e., the step relation is given via predicate logic axioms.
In earlier case studies, we have done such translations manually [SDW14, DSW11, DDD+16]. For programs with only a few steps this is not too much work, but for longer programs the translation of atomic program steps (the programs here have ca. 50 atomic steps) to axioms of predicate logic becomes error prone. Therefore the translation is now done in two steps. First, the C programs are translated manually to concurrent programs as used by KIV's program logic. The control structures of the C programs can be used almost one-to-one; however, care must be taken about the granularity of atomic steps and how data structures are specified in the theorem prover. In particular, the volatile and persistent heap that is implicit in the C programs must be explicitly specified. We describe this translation in Section 6.1. The result for the dequeue operation is shown in Algorithm 3. The algorithm includes annotations used by the refinement proof, shown in gray, which will be explained later.
The second step is a comprehensive framework which automatically translates such programs, as well as specifications of atomic steps like flushes and crash, or the ones given for the intermediate automaton of Fig. 3, to predicate logic axioms which define the transition relation of an IOA. Section 6.2 defines the resulting structure of the State and Action types, and Section 6.3 breaks down the definition of the full transition relation into predicate logic axioms.
Together with an automated generation of thread-local proof obligations from assertions for program points, given in Section 7, the framework allows one to focus on the verification of the refinement itself.

The KIV model for C programs
The KIV model for the C algorithm shown in Algorithm 3 has to axiomatise data types for all types used in C. The main effort is needed to translate the implicit heap structure used in C into an explicit data structure.
One natural way to represent a heap is as a (partial) map from locations to values. The main difficulty in using this technique is that such a partial map has a single fixed range type, whereas the heap we are modelling stores various different types, such as references and boolean flags. This problem is well known, and various solutions have been proposed (e.g., see [TK05]). We use a simple technique based on representing heaps as partial maps, but we use several such maps in a way that enables us to exploit the KIV type system as much as possible.
Our KIV model uses a type Ref to represent references to heap allocated objects and a type Node to represent the nodes themselves. The model has a state variable vs : Ref ⇸ Node containing the volatile memory version of the queue nodes, and a corresponding partial map ps : Ref ⇸ Node containing the persistent memory version. Retrieving the value stored under a reference r allocated in heap vs is denoted as vs[r]. Every shared variable in our model except Tail follows this pattern: we have one variable storing the value currently in volatile memory, and one variable storing the value currently in persistent memory. The remaining variables are as follows:
• A variable vhead : Ref containing the volatile value of head, and its persistent counterpart phead : Ref. In contrast to C, the KIV model does not store these values under a fixed address in the heap.
• A function vRetRefs : Tid → Ref and its persistent counterpart pRetRefs : Tid → Ref. These variables represent the return-value array, which maps each thread to a cell on the heap for storing the thread's return value.
• A map vrvHp : Ref ⇸ OptValue and its persistent counterpart prvHp. These variables represent the return-value heap cells themselves. Each value in the type OptValue is either empty (when the cell has not yet been initialised) or Val(v) for some value v (when the cell has been set to contain v).
• Our model has a variable vtail : Ref representing the volatile value of Tail, with no persistent counterpart. This exception is because the queue algorithm never explicitly flushes Tail, and does not depend on the value of Tail being preserved across crashes. As we shall see, vtail is set to an arbitrary value during crash events, and an appropriate value is then inferred for vtail during recovery.
Thus, the parts of the heap that contain queue nodes, return values, global variables, and the array containing references to return values are all stored as different variables.This strategy allows our accesses to the heap to be properly typed by simply typed lambda calculus, which is KIV's type system.
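The split-heap state described above can be sketched in Python (an illustrative rendering only: the dataclass names mirror the KIV variables from the text, but the concrete field types and the use of int for Ref are assumptions, not part of the KIV model):

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    # Mirrors the Node tuple from the text: value, dequeuer id (-1 = none), next reference.
    val: int
    deqID: int = -1
    next: Optional[int] = None   # references modelled as ints here

@dataclass
class GlobalState:
    # Each heap-allocated entity gets its own typed partial map (a dict here),
    # in a volatile and a persistent copy -- except vtail, which is volatile only.
    vs: Dict[int, Node] = field(default_factory=dict)       # volatile node heap
    ps: Dict[int, Node] = field(default_factory=dict)       # persistent node heap
    vhead: Optional[int] = None
    phead: Optional[int] = None
    vtail: Optional[int] = None                             # no persistent counterpart
    vRetRefs: Dict[int, int] = field(default_factory=dict)  # thread id -> return-value cell
    pRetRefs: Dict[int, int] = field(default_factory=dict)
    vrvHp: Dict[int, Optional[int]] = field(default_factory=dict)  # return-value cells (None = empty)
    prvHp: Dict[int, Optional[int]] = field(default_factory=dict)
```

Splitting the heap into several typed maps, as here, is what lets every access be checked by a simple type system instead of a single untyped heap.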
Recall that both threads and the NVM system are able to flush values from the volatile heap to persistent memory. Conceptually, these flushes are uniform: the value of any given location is flushed, irrespective of its type. However, because we use distinct variables to represent entities stored on the heap, we must define a flush action for each such pair. Thus,
• We model flushes of Head by performing the assignment phead := vhead.
• We model flushes of entries in the return-value array with the assignment pRetRefs(t) := vRetRefs(t), where t is the ID of the thread whose entry is flushed.
• We model flushes of a return value stored at address r by performing the assignment prvHp[r] := vrvHp[r], where r is the reference to the return value that is flushed.
The code has explicit flushes as part of the enqueue and dequeue programs, as indicated by the comments in Algorithm 3. All flushes are additionally possible as steps of the operating system, which can happen independently of any code the individual threads currently execute. These are additionally modeled as global steps in the KIV specification, which are added to the step relation of the IOA.
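The per-pair flush actions can be sketched as follows (a minimal Python rendering; the state is reduced to plain dicts and the function names are ours, not KIV's):

```python
# State as plain dicts: one volatile and one persistent entry per shared variable.
def make_state():
    return {"vhead": None, "phead": None,
            "vRetRefs": {}, "pRetRefs": {},
            "vrvHp": {}, "prvHp": {}}

def flush_head(gs):
    # flush of Head: phead := vhead
    gs["phead"] = gs["vhead"]

def flush_ret_ref(gs, t):
    # flush thread t's entry of the return-value array: pRetRefs(t) := vRetRefs(t)
    gs["pRetRefs"][t] = gs["vRetRefs"][t]

def flush_ret_val(gs, r):
    # flush the return value at address r: prvHp[r] := vrvHp[r]
    gs["prvHp"][r] = gs["vrvHp"][r]
```

Each function copies exactly one location from the volatile to the persistent copy, matching the one-flush-action-per-pair structure described above.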

Modelling of crash and recovery.
Recall that after a crash and restart, we assume NVM systems copy the persistent store back into volatile memory and run a queue-specific recovery procedure before starting any program threads. We model this three-phase crash-copy-recover process as follows (Algorithm 4 on page 13 presents the formal KIV description). In the first phase, we model the effect of the crash itself. First, all the shared variables are set to arbitrary values. Second, all threads that are not idle are moved to a crashed state, precisely as in the canonical automaton. Third, the local variables of each thread are set to nondeterministically chosen values (using the notation :∈).
In the next phase, our model copies the persistent variable of each persistent/volatile pair into its volatile counterpart. For example, the crash action performs the assignments vhead := phead and vs := ps.
In the final (recovery) phase, our model executes a recovery procedure to bring the queue into a consistent state (see Algorithm 4).We assume that the NVM system runs the recovery procedure before starting any program threads.Therefore, no program thread using the queue executes any action between the crash event and the completion of the recovery procedure.For this reason, we execute the entire crash-copy-recovery process as a single atomic step, i.e., all statements of Algorithm 4 conceptually occur in one step (no concurrent activity possible).
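The three phases can be sketched as one atomic Python function (a simplified sketch of Algorithm 4, under our assumptions: nodes are dicts with val/deqID/next fields, "marked" means deqID ≠ −1, and thread states are dicts with a pc field):

```python
import random

def crash_copy_recover(gs, threads):
    """One atomic crash-copy-recover step (sketch of Algorithm 4)."""
    # Phase 1 (crash): volatile state becomes arbitrary, non-idle threads crash,
    # local variables get nondeterministically chosen values (the :∈ notation).
    gs["vs"], gs["vhead"], gs["vtail"] = {}, None, None
    for t in threads.values():
        if t["pc"] != "idle":
            t["pc"] = "crashed"
            t["locals"] = {"junk": random.random()}
    # Phase 2 (copy): restore volatile memory from the persistent store.
    gs["vs"] = {r: dict(nd) for r, nd in gs["ps"].items()}
    gs["vhead"] = gs["phead"]
    # Phase 3 (recovery): advance vhead past marked (dequeued) nodes ...
    h = gs["vhead"]
    while gs["vs"][h]["next"] is not None and gs["vs"][gs["vs"][h]["next"]]["deqID"] != -1:
        h = gs["vs"][h]["next"]
    gs["vhead"] = h
    # ... then follow next pointers until a null reference to restore vtail.
    tl = h
    while gs["vs"][tl]["next"] is not None:
        tl = gs["vs"][tl]["next"]
    gs["vtail"] = tl
```

Because the whole function runs as one step, no interleaving with program threads is possible, matching the atomicity assumption in the text.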

Allocation of references.
Our model must support the allocation of new references. To perform an allocation, a thread nondeterministically chooses a reference not already in the domain of the appropriate partial map, and then updates the map so that the new reference is mapped to some initial value specified in the code. Note that the queue algorithm never explicitly deallocates a reference, relying on garbage collection or a similar mechanism. Accordingly, we do not model explicit deallocations, and thus the domain of each partial map increases monotonically. To ensure that an allocation is always possible, we require that the Ref type is infinite, while the domain of each map is finite in any reachable state.
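The allocation step can be sketched as follows (a Python sketch under our assumptions: Ref is modelled as int, and the nondeterministic choice is resolved by taking the least fresh reference):

```python
def allocate(heap, init_value):
    """Allocate a fresh reference: choose a reference outside the (finite)
    domain of the partial map and bind it to init_value. Since Ref is
    infinite (int here) and the domain is finite, this always succeeds."""
    r = 0
    while r in heap:   # any reference outside dom(heap) would do
        r += 1
    heap[r] = init_value
    return r
```

Since there is no deallocation, repeated calls only ever grow the domain, mirroring the monotonicity property stated above.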
KIV model. The full specification, which contains all the programs and the axioms that are generated from them, can be found online at [KIV20]. Here, we only give some more information on the notation used in Algorithm 3. The first items address the data types used; the rest explains the control structures.
• KIV prefers a direct specification of finite maps as a (non-free) datatype which is constructed from the empty map ∅ by adding or overriding key-value pairs with an update operator vs[r := nd]. The constructor yields a map where reference r has been updated or allocated with value nd.
• The check that a reference r is allocated in vs, i.e. in the domain of the partial function vs, is written as r ∈ vs. Reading the node stored at reference r ∈ vs is written as vs[r].
• Reading a reference r from memory is always done from the combined memory (ps ⊕ vs). The result of (ps ⊕ vs)[r] is vs[r] when r ∈ vs, otherwise ps[r].
• Nodes nd of type Node are specified as a typed tuple with three fields of type Value, Tid ∪ {−1}, and Ref. The fields are selected (and overwritten by assignments) using selectors nd.val, nd.next and nd.deqID. Flushing the next field of the node stored at r is written as the assignment ps[r].next := vs[r].next, and similarly for the other two fields.
• The dequeue program starts with program counter value ready, which indicates that the thread is not running a program. It also returns to this label at the end of the code. The program has no input parameters (before the semicolon) and one output parameter optval, which can either be Some(v), where v is of type Value, or empty.
• Assignments separated by comma are executed as one atomic, parallel assignment.
• Allocation in line D2 is done by choosing a reference ref (the star indicates that the choice is executed atomically with the next assignments) that is not yet allocated and by storing some random value oval under this reference. Allocation must be done atomically in both the persistent and the volatile memory, to avoid inconsistencies.
• The "if CAS" of the C algorithm has been broken up into two atomic steps at lines D19 and D20 using a local variable success. The first step executes the CAS itself, expanded to a conditional. Again, the if* indicates that the (compare) test is executed together with the assignment in both branches as one atomic step. The second step executes the conditional around the CAS using the result success. The local variable is additionally used as the test for when to leave the main while loop. This avoids having a return in the middle of the code. Several more statements have also been split to ensure atomicity; e.g., line D25 reads the thread id of the other thread into a local variable before writing it.
• A few else skip statements have been added, since the language currently does not support conditionals without an else branch.
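The combined memory read (ps ⊕ vs) mentioned above can be sketched in one line of Python (the dict-based rendering is ours):

```python
def combined_read(ps, vs, r):
    # (ps ⊕ vs)[r]: the volatile entry wins when r is allocated in vs,
    # otherwise the persistent entry is used
    return vs[r] if r in vs else ps[r]
```

This captures the override semantics of ⊕: the volatile heap shadows the persistent one wherever both are defined.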

States and actions of the automaton
The main work in automatically translating the programs of the case study to an IOA is to generate axioms for the step relation step ⊆ State × Action × State of the IOA from the individual atomic steps of the given enqueue and dequeue algorithms in Fig. 2. The KIV specification in Algorithm 3 (in contrast to Alg. 4) defines a non-atomic program. In non-atomic KIV programs, every statement bringing the program counter from one label to the next forms one atomic step. The labels are defined as an enumeration type PC of program counter values. The enumeration includes constants for threads being at some atomic step within a program, but also idle for a thread that has not started, ready for a thread that has started but is in between calls of enqueue and dequeue, and crashed.
In the IO automaton such atomic steps are (most of the time) translated to internal τ actions. However, some such steps correspond to persistence points, i.e., the points in time when the effect of an operation (like the dequeue) becomes visible to other threads. The identification of such points is crucial for the later refinement proof (see Section 8), and the IOA actions of these steps have to be non-τ actions. KIV allows one to specify the IOA action associated with a statement in the KIV model using with-clauses. An example of this can be seen in line D14 of Algorithm 3, which specifies the action of the statement to be the do_t(Deq) action. A more complex example of a with-clause is discussed later.
Besides such atomic steps of non-atomic programs, the translation to an IOA also has to consider two other types of steps: (1) atomic steps of one thread, like starting a thread (moving from label idle to ready), as well as all the steps of IDQ shown in Fig. 3 that are labeled with a thread t, and (2) atomic global steps not executed by a specific thread, like system flushes and the crash-recovery. The next section explains the translation of all such steps to a step relation; this section fixes how the types State and Action are defined. The states of the automaton are constructed as tuples mkstate(gs, lsf, pcf) with three components.
• The global state gs (of type GS) is a tuple of all the global variables used. In the case study this state contains, e.g., the vhead variable and the persistent and volatile heaps ps and vs. We use gs.vhead, gs.ps as selectors ("getters" in Java) for components. The global state also contains the auxiliary variables used for verification.
• The second component lsf collects all the local variables used by the threads. Formally it is a function lsf : Tid → LS, where a tuple ls : LS collects all the local variables used in programs. These include parameters (such as the input value to enqueue) as well as all local variables like first and nxt used by the algorithms.
• The thread id t of the thread itself is always stored in local variable tid of each local state. This component can be read, but not assigned, resulting in the invariant lsf(t).tid = t. We use the notation lsf(t := ls) to denote the function lsf, where lsf(t) has been modified to be ls.
• The third component pcf specifies the program counter for each thread. This component could be stored as a component of the local state. However, since it is frequently accessed, we store it separately as a function pcf : Tid → PC, where PC is the enumeration type of all the available program counter labels.
The type Action of actions of the automaton is the disjoint union of elements that are generated by default as follows:
• As required for proving linearizability, invoking and returning steps of non-atomic programs op (here: op ∈ {Enq, Deq}) have invoking and returning actions inv_t(op, args), res_t(op, vals), where args are the inputs of the operations (if any), and vals are the outputs.
• Deterministic steps of non-atomic programs are assigned the default action τ.
• Nondeterministic steps are assigned an action that fixes the nondeterministic choice. The programs of our case study have two nondeterministic steps: enqueue allocates an arbitrary reference node ∈ Ref outside the heap, and dequeue allocates a new reference ref to a return value at line D2 in Algorithm 3. The latter gets the action chooseD2(ref, oval), where ref is the chosen reference and oval is the random initialization value. This allows a deterministic computation of the next state when the action is given.
• Atomic programs op, like starting a thread t, define an action call_t(op, args, vals); global steps such as a flush define call(op, args, vals) (with no subscript t on call).
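The disjoint union that makes up Action can be sketched with frozen dataclasses (an illustrative Python rendering; the concrete field types are simplifications):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Inv:        # inv_t(op, args): invocation of op by thread t
    t: int
    op: str
    args: Tuple = ()

@dataclass(frozen=True)
class Res:        # res_t(op, vals): return of op by thread t
    t: int
    op: str
    vals: Tuple = ()

@dataclass(frozen=True)
class Tau:        # τ: internal action of deterministic program steps
    pass

@dataclass(frozen=True)
class ChooseD2:   # fixes the nondeterministic allocation at line D2
    ref: int
    oval: Optional[int]

@dataclass(frozen=True)
class Call:       # call_t(op, ...) for atomic thread steps; t = None for global steps
    op: str
    t: Optional[int] = None
```

Making each variant carry the data of its nondeterministic choice (as ChooseD2 does) is what allows the next state to be computed deterministically once the action is given.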

The step relation of the automaton
This section gives the axioms defining the step relation step ⊆ State × Action × State of the automaton, which is automatically generated from the programs. To efficiently prove that individual steps of the programs satisfy certain properties, it is necessary to give a definition which ensures that most axioms and proof obligations can be stated locally to one thread. This is done by using a precondition predicate pre together with three step functions for (1) global steps gstepf, (2) local steps lstepf and (3) the change of the program counter pcstepf, combined into a predicate lstep ⊆ GS × LS × PC × Action × GS × LS × PC defined as

lstep(gs, ls, pc, a, gs′, ls′, pc′) ↔ pre(gs, ls, pc, a) ∧ gstepf(gs, ls, pc, a) = gs′ ∧ lstepf(gs, ls, pc, a) = ls′ ∧ pcstepf(gs, ls, pc, a) = pc′ (4)

The axioms as well as the proof obligations given later use the convention that free variables are implicitly universally quantified.
The reduction to the three functions works for all the steps of non-atomic programs and for all atomic steps of threads which execute a single (parallel) assignment, e.g. those of IDQ.
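Definition (4) can be sketched as executable Python (the toy step below, a counter increment, is our own example; pre/gstepf/lstepf/pcstepf are passed in as functions):

```python
def lstep(pre, gstepf, lstepf, pcstepf, gs, ls, pc, a, gs2, ls2, pc2):
    """Definition (4): a local step relates pre- and post-state iff the
    precondition holds and each step function deterministically yields
    the corresponding post-state component."""
    return (pre(gs, ls, pc, a)
            and gstepf(gs, ls, pc, a) == gs2
            and lstepf(gs, ls, pc, a) == ls2
            and pcstepf(gs, ls, pc, a) == pc2)

# A toy step: at label "L1" a tau action increments a global counter.
pre_     = lambda gs, ls, pc, a: pc == "L1" and a == "tau"
gstepf_  = lambda gs, ls, pc, a: gs + 1
lstepf_  = lambda gs, ls, pc, a: ls
pcstepf_ = lambda gs, ls, pc, a: "L2"
```

The point of the decomposition is that each of the three functions depends only on the stepping thread's view of the state, which is what makes the proof obligations thread-local.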
Global steps like flushes and the crash are added as additional disjuncts to the formula defining the step relation (3).We demonstrate the translation with the crash, which is the most complex such step, as it is nondeterministic and modifies all local states.All other steps are deterministic and modify the global state only.Thus, they can be simplified to a predicate logic formula similar to the translation of atomic steps within a program given below.
The crash and the subsequent recovery are modeled as the program given in Algorithm 4. This program modifies gs, lsf and pcf. It first randomises all local variables (except the thread id of each thread) and the volatile global state (vhead, vtail, vs, vRetRefs and vrvHp), and sets pcf(t) to crashed for all threads that have already started (are no longer in state idle). Then the recovery program is executed. This first restores the volatile heaps from the persistent ones. Then it restores the volatile head vhead by traversing the queue starting with phead until an unmarked cell is found. Finally, it restores vtail by further traversing the queue until a null reference is found. Together this gives the program in Algorithm 4.
It is easy to prove that this program always terminates; its step relation is added disjunctively to the previous definition (3) of the step predicate as one atomic step.
step(mkstate(gs, lsf, pcf), a, mkstate(gs′, lsf′, pcf′)) ↔ (a = crash ∧ ⟨Crash()⟩(mkstate(gs, lsf, pcf), mkstate(gs′, lsf′, pcf′))) ∨ … (3)

where crash is the action for a crash taken from the IDQ automaton, and program Crash() is given in Algorithm 4. The formula ⟨p⟩(s, s′) states that "there is a terminating run of the program p which starts in s and ends in state s′" in Dynamic Logic.
It finally remains to translate individual steps to axioms defining pre, lstepf, gstepf and pcstepf. The axioms are given for each program counter value pc1, pc2, . . . separately. We only give the axioms for the steps resulting from the translation of line D21. Line D21 is a persistence point for the dequeue operation (as marked by the with-clause) and hence sometimes becomes a non-τ action in the IO automaton. This is conditional on the current state of the queue: if the node to be dequeued still has its deqID set to −1, the node can be dequeued and hence the action is do_t(Deq). If this is not the case, the action is τ (because no dequeue operation takes effect at this statement). This is specified in the precondition predicate. The global step function describes the fact that step D21 flushes the deqID field of the node referenced by the thread's local variable nxt from volatile to persistent memory. The notation employed for gstepf is KIV's notation for updating partial maps. The local state is left unchanged, and the program counter of the thread is moved from D21 to D22 by this step.
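The D21 translation just described can be sketched in Python (a sketch under our assumptions: actions are encoded as tuples, nodes as dicts; the real axioms are KIV predicate logic, not code):

```python
def pre_D21(gs, ls, a):
    """Precondition at D21: the step carries the action do_t(Deq) exactly when
    the persistent deqID of the node referenced by nxt is still -1 (so the
    flush is the persistence point); otherwise the step is an internal tau."""
    pending = gs["ps"][ls["nxt"]]["deqID"] == -1
    expected = ("do", ls["tid"], "Deq") if pending else "tau"
    return a == expected

def gstepf_D21(gs, ls):
    # Flush the deqID field from volatile to persistent memory:
    # ps[nxt].deqID := vs[nxt].deqID. Local state is unchanged; the program
    # counter move D21 -> D22 (pcstepf) is not modelled here.
    gs["ps"][ls["nxt"]] = dict(gs["ps"][ls["nxt"]], deqID=gs["vs"][ls["nxt"]]["deqID"])
    return gs
```

Note how the same program statement yields a non-τ action only on its first effective execution; once the field is persisted, re-executing the flush is an invisible τ step.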

Thread-local proofs of assertions
A key ingredient for forward simulation proofs is the formalisation of an invariant that restricts the reachable states of the IOA of the implementation.Such an invariant typically comprises various assertions that are required to hold at different control points of the algorithm.
In previous work [DDD+16], we have developed a thread-local proof method for establishing a global invariant for an IOA that has a structure consisting of mainly thread-local (program) steps. The proof method adapts traditional rely-guarantee approaches [Jon83, dRdBH+01] to our setting. For each case study, one can show that these proof obligations together guarantee a complex invariant; however, the definitions of the required predicates and associated lemmas are often tedious, and the proofs often time consuming. In this section, we define proof obligations that are automatically generated from the assertions given at program points. These enable one to generate both the proof obligations and the definitions of the invariants automatically. The proof system takes the following as input.
• For every label pck ∈ PC, an assertion ϕk that must hold for a thread whenever the thread is at the control point labelled pck. Within KIV, ϕk is encoded as a comment at label pck. Such assertions range over global variables and the local variables of the thread in which the assertion appears. Recall that our program may contain special labels such as idle and ready; explicit assertions can also be given at such labels. Quite often, an assertion may range over a set of program locations (defined by program counter values). Explicitly given ranged assertions are conjoined to every ϕk where pck is in the range. For these assertions, three types of proof obligations are generated.
• For every step from label pci to pcj with action a:
step-i-j: ϕi(gs, ls) ∧ GInv(gs) ∧ pre(gs, ls, pci, a) → ϕj(gstepf(gs, ls, pci, a), lstepf(gs, ls, pci, a)) ∧ GInv(gstepf(gs, ls, pci, a))
• For every step from label pci:
rely-i: ϕi(gs, ls) ∧ GInv(gs) ∧ pre(gs, ls, pci, a) ∧ ls.tid ≠ t → Rely(gs, t, gstepf(gs, ls, pci, a))
• For every label pci:
stable-i: ϕi(gs, ls) ∧ GInv(gs) ∧ Rely(gs, t, gs′) → ϕi(gs′, ls)
The first proof obligation (step-i-j) guarantees that each thread-local step establishes the thread-local assertion at the next control point (pcj) and preserves the global invariant. The other two proof obligations ensure that steps of other threads do not invalidate assertions. This is split into showing that all such steps are rely steps (rely-i), and that all assertions are stable with respect to the rely (stable-i).
In the generated proof obligations, two simplifications are possible. First, since many steps do not update the global state (gstepf(gs, ls, pci, a) = gs), the rely proof obligations can be dropped provided the Rely predicate is reflexive. Second, if ϕi and ϕj, for i ≠ j, are syntactically the same formula, stable-i and stable-j are the same proof obligation, so only one of them is generated.
Moreover, for proofs of persistent memory programs, the steps corresponding to a flush change the global state only, so their proof obligations are simpler. For these, the step-i-j proof obligation only requires one to prove that GInv(gs) is preserved, while the rely-i proof obligation requires the change to satisfy the Rely for all threads (since a flush is executed by the system). Therefore such steps preserve all assertions of all threads by the stable-i conditions, and no extra stability proof obligation is required for the flushing step itself.
Within KIV, the proof obligations are defined such that the predicate Inv(gs, lsf, pcf) is an invariant, where Inv(gs, lsf, pcf) ↔ GInv(gs) ∧ ∀ t. LInv(gs, lsf(t), pcf(t)). The definition uses a local invariant LInv(gs, ls, pc) for each thread, which is generated from the assertions and is a very large formula for the case study (several pages of text). Defining and maintaining this huge formula manually has been a severe problem in earlier case studies. Note, however, that for a specific label pci, the formula LInv(gs, ls, pci) reduces to the assertion ϕi(gs, ls) given at that label, which is usually between two and five lines of text.
It remains to give proof obligations for the global steps. For the crash,
step-crash: GInv(gs) ∧ LInv(gs, ls, pc) → wp(Crash, GInv(gs) ∧ LInv(gs, ls, pc))
has to be proven using the weakest-precondition calculus. The other global steps are the flushes. These execute a single assignment that modifies global state only, so for these the wp-formula simplifies to predicate logic. As an example, for the flush of vhead with the assignment phead := vhead, the postcondition of the implication is GInv(gs(.phead := vhead)) ∧ LInv(gs(.phead := vhead), ls, pc).
The following theorem shows the soundness of this thread-local proof technique, i.e., it shows that the above described proof obligations are sufficient for proving invariants.
Theorem 7.1 If step-i-j, rely-i, stable-i and step-crash hold, and Inv(gs, lsf, pcf) holds in initial states, then Inv is an invariant of the automaton generated from the program, i.e., it holds in all reachable states.
Proof. The proof proceeds by showing that step-i-j, rely-i and stable-i are sufficient to establish the standard decomposition lemmas step-lemma, rely-lemma, stable-lemma and otherstep-lemma. The first three lemmas follow from the proof obligations, by expanding the definition of lstep and by a case split over all possible pci. Lemma otherstep-lemma states that the local invariant LInv of thread lsq.tid is stable when thread ls.tid executes a step. It combines the previous three lemmas: step-lemma and rely-lemma allow one to infer GInv(gs′) and Rely(gs, t, gs′) from the preconditions of the implication, enabling stable-lemma (with lsq and pcq instantiating ls and pc), which gives the postcondition.
The proof of the theorem has to first split away the global steps from the definition of step.
• For global steps, the proof follows directly from lemmas like step-crash by instantiating ls and pc with any lsf(t) and pcf(t) that results from expanding the definition of Inv.
• Otherwise there is a thread t such that the step is an lstep. By the definition of lstep, it has to be shown that Inv(gs′, lsf(t := ls′), pcf(t := pc′)) holds after the step is taken, when Inv(gs, lsf, pcf) and pre(gs, lsf(t), pcf(t), a) hold, where for some action a:
gs′ = gstepf(gs, lsf(t), pcf(t), a)
ls′ = lstepf(gs, lsf(t), pcf(t), a)
pc′ = pcstepf(gs, lsf(t), pcf(t), a)
Unfolding Inv, we have to prove that GInv(gs′) and LInv(gs′, ls′, pc′) hold, and that LInv(gs′, lsf(t′), pcf(t′)) is true for every t′ ≠ t. The first two properties follow from step-lemma. Finally, otherstep-lemma gives the desired conclusion, by instantiating lsq and pcq with lsf(t′) and pcf(t′) respectively.

Verification of the persistent queue
In this section, we apply our methodology to the persistent queue. In Section 8.1 we give an overview of the key properties needed for our proof. Section 8.2 describes the auxiliary variables that we use to track ownership of queue nodes. Section 8.3 shows how persistence points are specified and explains the auxiliary (greyed out) code in Algorithm 3. In Section 8.4, we define our forward simulation relation. Section 8.5 briefly describes invariants and properties of the queue that are necessary for the proof, but not critical to the main ideas. Finally, Section 8.6 gives a summary of the effort needed to verify the case study.

Key properties of the queue data structure
There are several key properties that the persistent queue must maintain in order to ensure correctness. These properties are formalised as conjuncts of the global invariant of our proof. Consider Fig. 4, which represents a state of the queue data structure. The first part of this structure, the old reference list, contains references to nodes that have been logically dequeued from the queue. The second part, the queue reference list, contains references to nodes that have been enqueued but not yet dequeued. To track these lists, we introduce two auxiliary variables: orl for the old references and qrl for the queue references, which are disjoint lists over the Ref type. The expression orl + qrl is the concatenation of the two lists, and we use the expression qrl.last to mean the last element of qrl when qrl is nonempty.
One problem that we must solve is to constrain the set of fields of nodes in the persistent queue that may be in a volatile state: that is, the unpersisted fields for which ps and vs disagree. Our invariants require that the val and next fields of each node referenced by an element of orl + qrl are both nonvolatile, so that vs and ps agree on these fields, with the possible exception of the next field of the last node in qrl. The enqueue operation ensures these fields are nonvolatile before a node is enqueued. We further require that every deqID field of every node is nonvolatile, except possibly the deqID of the first element qrl.head of qrl. We explain these exceptions below.
The references in the queues are ordered such that for any two references r, r′ that are adjacent and in that order in orl + qrl, we have ps[r].next = r′. Intuitively, the nodes referenced by elements of qrl contain the values that are currently in the queue and, accordingly, we require that every such node has a persistent deqID of −1, indicating that it has not been dequeued. We require that vhead be either the last or second-to-last element of orl, and that phead is always an element of orl. A node is considered to be in the current queue state if it can be reached from phead by following next fields, and its deqID field in persistent memory is −1. It is these properties that guarantee that the recovery procedure is always able to find the logical head of the queue by traversing next pointers from phead. Fig. 4 illustrates a state satisfying our invariant where phead ≠ vhead.
Nodes move from qrl to orl during dequeue operations. The first step in this process occurs when a dequeuing thread successfully enters its id into the deqID field of some node in line D19 of Algorithm 3. It can be shown using the local assertions of the dequeue operation that this node is always the first node of qrl, and that this node is referenced by ps[vhead].next. At this point, the effect of the CAS has not been persisted, and the deqID field of ps[vhead].next is thus volatile. In the original queue, the linearization point that removes the element from the queue is executed with the successful CAS. This is not possible here, as a crash after the CAS would yield a persistent queue where the element d is still in the queue, "undoing" the linearization. To have a one-to-one match with the abstract queue, we therefore leave orl and qrl unchanged on a successful CAS. The next step in the process is that this field is flushed, either by the dequeuing thread at line D21, by a helping thread, or by the NVM system's flush action. This flush is the persistence point of the dequeue operation. Persistence points are analogous to linearization points in conventional linearizability [DD15]. Roughly speaking, the linearization point of an operation is the point when the effect of the operation becomes visible to all threads. This is also true of persistence points, with the additional requirement that once a persistence point has been reached, the effect of the operation must still be visible after a crash. Thus, an update to volatile memory that might be a linearization point in the conventional setting will typically not be a persistence point. Rather, the persistence point is often the moment when this update is flushed to the persistent store.
Line D21 moves qrl.head to the end of orl. The corresponding updates to qrl and orl in the auxiliary code of line D21 are explained in detail in Section 8.3. For the queue depicted in Fig. 4, the queue immediately after the volatile deqID of node d is flushed is as follows.
The abstract queue corresponding to this queue is e, f. Note that in the queue above, the vhead pointer is now lagging (i.e., it is the second-to-last element orl.butlast.last of orl) and must be updated to point to the new sentinel node d. This final step of the process must occur before another successful CAS of a dequeue is possible. Therefore a lagging vhead implies that the first element of qrl must still have deqID field −1 in the volatile store.
The process by which a node is added to qrl is simpler.Recall that the newly allocated node of an enqueue operation is added to the queue data structure by the CAS at line 9 in Algorithm 2. The persistence point of the enqueue operation is the point when the effect of this CAS is flushed (no later than line 10) and it is at this moment that the reference to the enqueued node is added to qrl .Consider again the queue in Fig. 4. The queue immediately after the next pointer of f becomes persistent is as follows.
Note that this transformation must be performed before moving vtail; otherwise the nodes after g could be lost upon a system crash. In the queue above, vtail is lagging and hence must be updated before a new node can be enqueued. As soon as the next pointer of f becomes persistent, the node g is considered to be part of the queue, i.e., the abstract queue corresponding to the queue above is d, e, f, g.
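The shape invariants of this subsection can be checked on a concrete state with a small Python function (a sketch under our assumptions: nodes are dicts with val/deqID/next fields and references are ints; the real invariant is a KIV formula):

```python
def check_queue_invariant(ps, vs, orl, qrl, phead, vhead):
    """Check the key shape invariants of Section 8.1 on a concrete state."""
    refs = orl + qrl
    # adjacent references in orl + qrl are linked by persistent next pointers
    assert all(ps[a]["next"] == b for a, b in zip(refs, refs[1:]))
    # every queued node has persistent deqID -1 (not yet dequeued)
    assert all(ps[r]["deqID"] == -1 for r in qrl)
    # val/next are nonvolatile, except possibly the next field of qrl's last node
    for r in refs:
        assert vs[r]["val"] == ps[r]["val"]
        if not qrl or r != qrl[-1]:
            assert vs[r]["next"] == ps[r]["next"]
    # phead lies in orl; vhead is orl's last or second-to-last element
    assert phead in orl and vhead in orl[-2:]
    return True
```

Running such a check on hand-built states is a cheap sanity test of the invariant before attempting the mechanised proof.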

Ownership-based properties
The invariants and rely conditions we use must be able to track exclusive read/write access of different shared memory components.To this end, we employ an ownership-based mechanism, which describes how threads are accessing different shared resources.
In our queue example, the shared resources are the node references, and the owners are threads that are executing the queue methods. In particular, we maintain an auxiliary variable owns : Ref → Ownership that maps each node reference to a value representing the ownership status of the node. We distinguish between two forms of ownership: strong and weak. Both forms guarantee that whenever a thread owns a node reference, the fields of the node are not changed by other threads. Strong and weak ownership, however, have different stability properties. Strong ownership is stable: whenever a thread t strongly owns a reference, after any step of any other thread, thread t still strongly owns the reference in the post state. We model this property using a rely condition in Section 8.5. Weak ownership, on the other hand, is not stable, and may be removed by other threads. Note that during execution, a thread may transition from strongly owning a reference to weakly owning that reference. A thread may also give up ownership altogether.
Ownership and enqueues.Each enqueue operation is responsible for ensuring that key invariants of the queue nodes are satisfied when it adds its new node.To do this, we must ensure that newly allocated nodes are disjoint from the nodes of the queue.
When a thread t allocates a new node, node, at the start of enqueue, we set owns(node) := Strong(t) to reflect the fact that t owns node. At that point no other thread can modify node, i.e., this is sufficient to guarantee that distinct threads modify distinct nodes, and thus that each thread can rely on the next and val fields of its own node not changing in volatile memory. After a successful CAS at line 9, the ownership of node by thread t changes to Weak(t). Now t itself or another thread is able to execute a flush, which adds node to the persistent queue. When this flush occurs, we set owns(node) := None.
Our invariant requires that each reference r in orl + qrl satisfies owns(r) = None, reflecting the fact that the set of nodes in the queue is disjoint from the set of nodes being initialised by some enqueue operation. To achieve this, it is sufficient to set owns(r) = None when r is added to qrl, which occurs when the next field of vs[qrl.last] is flushed at a point when r = vs[qrl.last].next. Now, because every reference in orl + qrl has no owner but every modification of a next or val field occurs when the modifying thread has strong ownership, we can show that these modifications never change these fields within the queue data structure.
As we describe in Section 8.3, the persistence point of an enqueue operation is the moment when the reference to the operation's newly allocated node is flushed to the persistent store, which occurs after the CAS at line 9. We would like to use the ownership state of this reference to determine the id of the thread that enqueued the node. Thus we cannot set owns(node) = None after the CAS at line 9, where node is the local variable containing a reference to the new node. On the other hand, we cannot allow owns(node) = Strong(t), because then t must "lose" this ownership when the field is flushed, contradicting the stability of strong ownership.
We resolve this dilemma by using weak ownership, setting owns(node) = Weak(t) at the CAS at line 9. Our invariant requires that if the next field of vs[qrl.last] is in a volatile state, then owns(vs[qrl.last].next) = Weak(t) for some thread t that has not yet passed its persistence point. As described in the next section, we then use this value to construct the abstract action to simulate at this step.
Helping and the return value array. As described in Section 2, the persistent queue employs a helping mechanism to allow dequeuing threads to complete another thread's concurrent dequeue operation (see lines 27 to 32). The key feature of the helping mechanism is that a helping thread writes the value from the dequeued node into the other thread's current return-value object, and flushes that object to the persistent store. The challenge presented by this part of the algorithm is in dealing with this thread-to-thread synchronisation in the context of a thread-local proof.
In our formal model, we specify that the dequeue procedure returns the local variable optval, and this persistent queue is durably linearizable. However, in order to enable the program to recover from a crash, the persistent queue ensures that this value is persistently available in prvHp[vRetRefs(t)]. To reflect this in our KIV development, we prove that the assertion optval = prvHp[pRetRefs(t)] holds at the last line D34 of the dequeue code. This is one of the trickiest parts of the verification and requires a new auxiliary variable and new invariants, as we now explain.
We introduce a new auxiliary variable rvOwns, which maps each return-value object reference to a value describing the logical state of the reference. We update this auxiliary function as follows:
• When a thread t allocates a new return-value object r (line D2), rvOwns(r) is set to Pending(t).
• When a thread t claims a node to be dequeued (by successful execution of the CAS at line D19), rvOwns(vRetRefs(t)) is set to Claimed(t, v), where v is the value stored in the val field of the claimed node.
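These two update points admit a minimal executable reading, with Pending and Claimed encoded as tagged tuples (the encoding, and the helper names `alloc_retval` and `claim_node`, are ours, not KIV's):

```python
def pending(t):
    return ("Pending", t)

def claimed(t, v):
    return ("Claimed", t, v)

rvOwns = {}      # return-value object reference -> logical state
vRetRefs = {}    # thread id -> its current return-value reference

def alloc_retval(t, r):
    # line D2: thread t allocates a new return-value object r
    rvOwns[r] = pending(t)
    vRetRefs[t] = r

def claim_node(t, node_val):
    # successful CAS at line D19: t claims a node whose val field is node_val
    rvOwns[vRetRefs[t]] = claimed(t, node_val)

alloc_retval(1, "r1")
assert rvOwns["r1"] == ("Pending", 1)
claim_node(1, "a")
assert rvOwns["r1"] == ("Claimed", 1, "a")
```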
It is useful to think of this technique as an extension of the ownership technique that we use to manage queue nodes. We think of the thread argument of each rvOwns value as the owner of the reference. Only steps of the owner thread are capable of changing an rvOwns(r) value, and we encode this property in our rely condition. Furthermore, given this rely property, it is straightforward to show the following return-value ownership property: for all threads t, if vRetRefs(t) is not null, then t is the thread id of rvOwns(vRetRefs(t)).
We now describe how the rvOwns variable is designed to support the persistent queue's helping mechanism. After a thread t executes the CAS at line D19, installing its id in the deqID field of the dequeued node nd, it is possible for another thread t′ to attempt to help t by writing the dequeued value into the return-value object of t. When this happens, there is a race between the write of t at line D22 and the helping write of t′ at line D29, and the correctness of the helping mechanism depends on t′ writing the correct value. Recall that rvOwns(vRetRefs(t)) is set to Claimed(t, vs[nd].val) when t successfully executes the CAS at line D19. We use the value in this Claimed state to ensure that t′ writes the correct value into the return-value object when it executes the helping write at line D29. To achieve this, the precondition of the helping write of t′ implies

rvOwns(addr) = Claimed(othert, vs[nd].val)   (5)

where othert is a local variable of t′ which records the id of t in this scenario. We explain the value of addr shortly. Our rely relation is defined so that for any allocated reference r, if rvOwns(r) is in the Claimed state, then the value stored at that reference does not change, and the val field of a node never changes. These two properties are sufficient to ensure the stability of (5). Note that (5) is sufficient to prove that our helping mechanism is correct, so long as addr points to the appropriate return-value object. That is, for every thread t′′, if addr = vRetRefs(t′′) then t′′ = t; this follows from the return-value ownership property described above.
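The role of precondition (5) can be sketched as follows: a helper may only perform the write at D29 when the Claimed state names the thread it is helping, and the claimed value then determines exactly what is written. The dictionary encoding and the function name `helping_write` are illustrative:

```python
# State after thread 1 claimed a node with value "a" via the CAS at D19:
rvOwns = {"r1": ("Claimed", 1, "a")}
vRetRefs = {1: "r1"}
prvHp = {}   # persistent heap of return-value objects

def helping_write(othert, addr):
    # precondition (5): rvOwns(addr) = Claimed(othert, v); the claimed value
    # v determines what the helper writes and flushes at line D29
    tag, owner, v = rvOwns[addr]
    assert tag == "Claimed" and owner == othert
    prvHp[addr] = v

helping_write(othert=1, addr=vRetRefs[1])
assert prvHp["r1"] == "a"
```

The rely condition guarantees that the Claimed entry cannot change between the helper evaluating the precondition and performing the write, which is what makes (5) stable.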
This concludes our overview of how we manage the persistent queue's helping mechanism. The reader is referred to the KIV development for the full treatment.

Identification of persistence points
Finding persistence points is similar to standard linearizability, where proofs proceed via identification of linearization points [DD15]. However, in durable linearizability, persistence points are typically statements (flush events) that cause the operation under consideration to become durable. Thus these statements must be simulated by the abstract do operation. Note that persistence points must occur after an operation has taken effect in NVM, but before the operation returns.
In MSQ, both the enqueue and the dequeue operation linearize upon successful execution of the CAS at line 9 of the enqueue and line 24 of the dequeue of the C code in Algorithm 2. However, in the persistent queue, these volatile memory actions cannot be the persistence points of these operations, since their effects can be lost by an immediately subsequent crash. Rather, in the persistent queue, the persistence point is the first operation that flushes the effect of this CAS.
These flushes may occur by the same thread in the line following the CAS, by another thread helping, or due to a system-controlled flush. Despite the actual persistence point being any of these possibilities, it is still possible to prove forward simulation with respect to the do(Enq) operation of the intermediate automaton IDQ.
To enable an elegant refinement proof, we use a feature of KIV that allows us to overwrite the default internal action τ of program steps given in Section 6.2 with a custom action. We overwrite the default action with the corresponding action of IDQ given in Fig. 3, thus allowing the forward simulation to have a one-to-one match between actions. Technically, the KIV code of Algorithm 3 has with clauses after line numbers that specify the custom action.
For dequeue operations that return a value (rather than empty), the persistence point is the flush at line D21, if the deqID field has not yet been flushed already. Therefore, the action is set to the action do_t(Deq) of the intermediate automaton IDQ when the persistent deqID field ps[nxt].deqID is still −1 (recall that (ϕ ⊃ t1; t2) evaluates to t1 when ϕ is true, and to t2 otherwise). The dequeue operation can also help another thread othert to execute the persistence point do_othert(Deq) of its dequeue operation in line D28 under the same condition. Finally, the persistence point may also be a global flush of vs[r].deqID to ps[r].deqID, when r is qrl.head, the first element of the current queue references. In all cases the auxiliary variables qrl and orl that store the current and old reference list are updated by moving the head of qrl to orl.
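The conditional action (ϕ ⊃ t1; t2) attached to the flush at D21 can be read operationally as a small sketch; the tuple encoding of actions and the function name are ours:

```python
TAU = "tau"  # default internal action

def flush_deqid_action(t, ps, nxt):
    # (ps[nxt].deqID = -1  ⊃  do_t(Deq) ; tau): the flush at D21 is the
    # persistence point only if the persistent deqID field is still -1
    return ("do", t, "Deq") if ps[nxt]["deqID"] == -1 else TAU

ps = {"n2": {"deqID": -1}}
assert flush_deqid_action(7, ps, "n2") == ("do", 7, "Deq")

ps["n2"]["deqID"] = 7  # already persisted: a later flush is just internal
assert flush_deqid_action(7, ps, "n2") == TAU
```

The same conditional shape governs the helping flush at D28 and the global flush of a deqID field.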
The enqueue persistence points are similar. A successful CAS at line 9 appends a new node to the queue, but this CAS is not persisted until a subsequent flush, which is the enqueue operation's persistence point. For the C code of Algorithm 2, the flush may occur in line 10 of the same thread, in line 14 executed by another thread, or with a global flush of a next field. An additional persistence point, where the action do_t(Enq) is executed, is line D16 in the dequeue algorithm shown in Algorithm 3. Compared to dequeue, an additional problem is to determine the thread t that executes the persistence point, since the thread id is not stored in a field. The relevant thread is always the one that allocated the node that is now enqueued. The next field points to this node, but it is not possible to determine the thread without inspecting all the references stored in the local node variables, which would contradict thread-local reasoning. Instead, we use the ownership of the reference in the field being flushed to determine the identity of the relevant thread. For a node referred to by a reference nd that has been added to the queue but not yet flushed, our invariant guarantees that owns(nd) = Weak(t), where t is the id of the thread that allocated and added the node. We use this id to determine the action do_t(Enq) to perform at the abstract level when the flush occurs.
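Recovering the enqueuing thread from weak ownership at a flush can be sketched as follows (the encodings mirror the earlier ownership sketch and are illustrative only):

```python
def enqueue_flush_action(owns, nd):
    # invariant: a node appended but not yet flushed satisfies
    # owns(nd) = Weak(t), where t allocated and appended the node
    kind, t = owns[nd]
    assert kind == "Weak"
    return ("do", t, "Enq")  # abstract action do_t(Enq)

owns = {"n3": ("Weak", 4)}  # thread 4 appended n3 with its CAS at line 9
assert enqueue_flush_action(owns, "n3") == ("do", 4, "Enq")
```

This is the reason weak ownership cannot be dropped at the CAS: the thread id must still be recoverable when the flush, possibly executed by a different thread, performs the persistence point.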
Line D16 of Algorithm 3 provides an example of such a flush occurring. At line D16, the next reference of the cell that local variable last points to is flushed; therefore the with clause has action do_owns(nxt)(Enq) if this reference is the one at the end of qrl. Note that it is possible that the reference has been flushed already, and even that further nodes have been enqueued already. The check last = (orl + qrl).last, together with the assertion nxt = vs[last].next ∧ nxt ≠ null which holds at D16, ensures that the reference has not been flushed already, since all steps that execute the persistence point add the enqueued reference to the end of qrl, as done by the auxiliary code at D16.
Finally, the case of a dequeue that finds an empty queue and returns the value empty has to be handled. The verification of the empty dequeue follows a similar pattern to the verification of the empty dequeue of MSQ. The persistence point is conditional on the future execution of the operation; thus we refer to the persistence point as a potential persistence point (this is similar to the concept of potential linearization points [DSW11, DGLM04, DD15]). The empty dequeue potentially takes effect at line D9, when the value loaded for nxt is null, but this decision is not resolved until later, when the test at line D12 succeeds. Using the intermediate automaton (Fig. 3) allows the proof to proceed via forward simulation, like earlier proofs of linearizability [DD15, DGLM04], by executing action checkEmp_t when the loaded reference nxt is null, and by executing do_t(Deq) when execution reaches line D14.

Forward simulation
The abstraction relation that is used in the forward simulation relates the state of the automaton generated from the programs to the intermediate automaton in Fig. 3. It can be split into a mapping between the global states, and two mappings absls and abspc between the local state and the label for each thread t:

abs(mkstate(gs, lsf, pcf), mkastate(q, alsf, apcf)) ↔
    content(gs.qrl, gs.ps) = q
    ∧ ∀ t. abspc(gs, lsf(t), pcf(t)) = apcf(t) ∧ absls(gs, lsf(t), pcf(t), alsf(t))

The global state of the intermediate automaton is the abstract queue q, so the global mapping just extracts the content of qrl by reading each value field in the persistent store with a function content(qrl, ps).
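A minimal model of the global part of the relation, assuming content simply reads the val fields along the queue references in the persistent store (the treatment of the dummy node and of deqID fields is elided; the dict encoding is ours):

```python
def content(qrl, ps):
    # read each val field along the current queue references in the
    # persistent store
    return [ps[r]["val"] for r in qrl]

ps = {"n1": {"val": "a"}, "n2": {"val": "b"}}
qrl = ["n1", "n2"]

def abs_global(gs_qrl, gs_ps, q):
    # global part of abs: content(gs.qrl, gs.ps) = q
    return content(gs_qrl, gs_ps) == q

assert abs_global(qrl, ps, ["a", "b"])
assert not abs_global(qrl, ps, ["a"])
```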
Relating the program labels to the abstract ones for one thread t is done with a function abspc. For the three labels idle, ready and crashed that are common to both automata, the function is the identity.
All program labels in the dequeue program that are before the persistence points are mapped to the abstract label inv(Deq), while those after the persistence points are mapped to ret(Deq). In most cases the decision depends on the concrete program counter only. Labels that are both before the successful CAS at line D19 in Algorithm 3 and before the persistence point of an empty dequeue at D14 are mapped to inv(Deq), while lines after D14 and after the flush at D21 are mapped to ret(Deq). There are three lines in the code where the concrete label alone is not sufficient: at lines D20 and D21, the CAS must have been successful and the next reference must have already been flushed for the thread to be in state ret(Deq); the label D5 at the start of the while loop is mapped to ret(Deq) if its test is already false. The enqueue program is similar; it just does not have to consider the special case of returning empty.
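The shape of abspc for dequeue can be illustrated as follows. The program-counter sets and extra conditions are a simplification of the prose (the D5 while-test case is omitted), so this is a sketch of the structure rather than the actual KIV function:

```python
def abspc_deq(pc, cas_success=False, next_flushed=False):
    # at D20/D21 the concrete label alone is not sufficient: the CAS must
    # have succeeded and the next reference must already be flushed
    if pc in ("D20", "D21"):
        return "ret(Deq)" if cas_success and next_flushed else "inv(Deq)"
    before = {"D5", "D9", "D12", "D14", "D19"}  # before the persistence points
    return "inv(Deq)" if pc in before else "ret(Deq)"

assert abspc_deq("D9") == "inv(Deq)"
assert abspc_deq("D20") == "inv(Deq)"
assert abspc_deq("D20", cas_success=True, next_flushed=True) == "ret(Deq)"
assert abspc_deq("D34") == "ret(Deq)"
```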
It remains to define a relation absls between the concrete local state ls and the abstract local state als of a thread. This relation ensures that the input values for enqueue that were set on invoke on both levels are identical before the persistence points, and that the output values of dequeue after the persistence point are identical too. For the abstract level the output value is out(t); for the concrete level there are three cases. After a successful CAS of dequeue, the output value is first stored in vs[nxt].val. After setting the return value at line D22, it is in vrvHp[vRetRefs(t)]. The relation therefore needs to have the global state and the local program counter as parameters.
Finally, the local value obsEmp(t) of the abstract level is required to be true after a reference has been loaded into the local nxt variable at D9, when action checkEmp_t is executed. It must stay true until the persistence point at line D14.

Supporting and rely properties
We have outlined the main points in our proof, but as is typical for verifications of this type, we require a number of supporting properties to complete the proof. The reference lists orl and qrl satisfy the following additional invariants:
1. All nodes in the old reference list must have been marked, i.e. have a deqID field different from −1 in both ps and vs, indicating that they have been dequeued.
2. All nodes in the queue reference list must have a deqID field value −1 in ps.
3. Only the first node in qrl may have a deqID field value other than −1 in vs. This results from a dequeue that has executed its CAS, but not yet flushed.
4. vhead is either the last element of orl, or it is lagging and is the element prior to the last. In the latter case, however, the head of qrl cannot be marked in vs. This ensures that a lagging head is "repaired" first before another dequeue can happen.
5. The vtail pointer points to (orl + qrl).last, or is lagging too, pointing to (orl + qrl).butlast.last. In the latter case, ps[(orl + qrl).last].next must be null (no red reference at the end of the queue). Again, this ensures that a lagging tail is repaired first, before another enqueue has a chance at a successful CAS.
6. In any case, phead is never reachable from vhead and vhead is never reachable from vtail.
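Invariants 1–3 admit a direct executable reading (stores modelled as dicts; −1 encodes an unmarked deqID; the checker is a sketch, not the KIV predicate):

```python
def check_marking(orl, qrl, vs, ps):
    # (1) all old nodes are marked in both stores
    inv1 = all(vs[r]["deqID"] != -1 and ps[r]["deqID"] != -1 for r in orl)
    # (2) all queue nodes are unmarked in the persistent store
    inv2 = all(ps[r]["deqID"] == -1 for r in qrl)
    # (3) only the first queue node may be marked in the volatile store
    inv3 = all(vs[r]["deqID"] == -1 for r in qrl[1:])
    return inv1 and inv2 and inv3

vs = {"o1": {"deqID": 2}, "q1": {"deqID": 5}, "q2": {"deqID": -1}}
ps = {"o1": {"deqID": 2}, "q1": {"deqID": -1}, "q2": {"deqID": -1}}
# q1 has been claimed by thread 5 in vs, but the CAS is not yet flushed to ps
assert check_marking(["o1"], ["q1", "q2"], vs, ps)
```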
To maintain these invariants, there are a number of rely properties that enable thread-local reasoning. They ensure that steps of threads other than t will not destroy relevant properties:
1. The pointers phead, vhead and vtail always move forward. Their new value is always reachable from the old value by following the chain of references in orl + qrl.
2. The lists qrl and orl also move forward: the lists orl and orl + qrl before the steps of other threads are always a prefix of the lists after steps of other threads. This ensures that a local variable like first, which is initially set to vhead ∈ orl, still points to a reference in orl.
3. For both vs and ps, the deqID field of the cells in orl + qrl is always unchanged, or changes from unmarked to marked.
4. If for all references r in a suffix of orl + qrl the deqID field vs[r].deqID is different from t, then this is preserved by steps of other threads. This property is crucial to make sure that a thread helping with a dequeue will never compute its own thread id t as the value of othert at line D25. Note that this property holds for the volatile store only, since another thread helping t in a dequeue may flush a reference with persistent deqID field being t.
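Rely property 2, the prefix preservation of the reference lists, can be captured as a small sketch (list encoding is ours):

```python
def is_prefix(xs, ys):
    return ys[:len(xs)] == xs

def rely_lists(orl, qrl, orl2, qrl2):
    # steps of other threads may only extend orl and orl + qrl
    return is_prefix(orl, orl2) and is_prefix(orl + qrl, orl2 + qrl2)

# another thread dequeues the head of qrl, moving it to orl: allowed
assert rely_lists(["o1"], ["q1", "q2"], ["o1", "q1"], ["q2"])
# removing an element from orl would violate the rely
assert not rely_lists(["o1"], ["q1"], [], ["q1"])
```

This is exactly what makes a local snapshot such as first stable: a reference that was in orl stays in orl after any step of another thread.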
Again, the reader is referred to the KIV development for the full treatment.

Mechanization in KIV
The refinement has been mechanically proven in the interactive theorem prover KIV [EPS+15], which has been used extensively in the verification of concurrent data structures (e.g., [SDW14, DSW11]). See [KIV20] for the full KIV proofs and the encodings. The proof consists of two parts. In the first part, an invariant Inv(cs) was proved for the automaton C generated from the programs in Section 5, and the proof obligations resulting from it were discharged.
For the second part, a mechanised theory for IOA that is available in KIV has been instantiated. This theory formalises much of the theory given in [LV95]. In particular, a theorem that forward simulation implies refinement is proven there. Instantiating this theorem gives the standard proof obligations of a forward simulation.
For our case study, the specialised proof obligations of Definition 4.2 are used, where the concrete automaton C is the one generated from the programs, and A is IDQ. Reachability of concrete states (cs ∈ reach(C)) is approximated with the invariant Inv(cs). No invariant is needed for IDQ. Since we have made sure that even internal actions like do_t(op) or checkEmp_t are the same in both automata, the proof obligation for internal step correspondence proves the first disjunct for steps with action τ, and the second disjunct with the abstract action chosen to be equal to the concrete action otherwise.
The main complexity of the proof is in proving the invariant with the thread-local proof technique of Section 7. Getting all the assertions and the rely conditions correct, and strong enough to imply critical lemmas about persistence points and return values, required a lot of iterations. The final proofs for this part require ca. 4300 user interactions. Proving the step correspondence for forward simulation is less complex and took only a few iterations (which mainly led to a few additional requirements for the invariant). The final simulation proofs have ca. 1200 user interactions.

Conclusions and related work
The recent development of NVM has been accompanied by persistent versions of well-known concurrent constructs, including concurrent objects [FHMP18, CAL18], synchronisation primitives [HPM+18, PKMH18] and transactional memory [JNCV18]. Developing concurrent objects implemented for NVM presents a similar challenge to weak memory, in the sense that there are multiple levels of memory to consider. Moreover, caches and registers are volatile, while cache flush instructions allow reordering with store instructions in accordance with the memory model of the system (e.g., [RV18]). Correctness in the presence of crashes and recovery can be affected by the order in which elements are persisted, which necessitates the use of programmer-controlled flush operations, increasing complexity. This paper has focussed on a persistent queue [FHMP18], verified against the recently developed notion of durable linearizability [IMS16], extending our prior work [DDD+19].
Research on verification and validation under persistent memory is still in its infancy. Recent works have presented techniques for testing persistent memory applications [LWZ+19, OBLL16], including works on testing linearizability under persistence [CCL+19]. Denny et al. [DLV16] have considered static analysis techniques for verifying correctness of language extensions to C, including extensions that support atomic transactions. However, these works do not cover full verification, or durable linearizability. A model checking approach to verifying the Friedman et al. queue (i.e., the same case study as our paper) has also been developed [IU18]. On the one hand, the authors address the TSO memory model, but on the other hand, the verification is incomplete. Although no errors were found, the authors state: "This result cannot guarantee correctness of the queue because of many reasons. For example, it was bounded model checking, user threads were assumed to work in accordance with a fixed scenario, the checked property did not cover the entire durable linearisability, and the model was unclear to be implemented correctly." Developing simulation-based techniques to address correctness of concurrent data structures in the persistent-TSO model [RWNV20, RV18] remains future work.
There are several works on full verification of (sequential) file systems that are tolerant to certain types of system crashes, e.g., using ASM refinement [EPSR16], crash-aware Hoare logic [CZC+15] and SMT-based reasoning [SBTW16]. More recently, verification techniques for concurrent systems with crashes have been developed [CTKZ19]. These techniques are currently used to verify high-level client applications, as opposed to fine-grained concurrent data structures, and hence have not been applied to verify durable linearizability specifically. In other work, we have considered correctness of software transactional memory algorithms under persistent memory semantics via a new condition, durable opacity [BDD+20], that extends opacity [GK10] to NVM in the same way that durable linearizability extends linearizability.
Verification of durable linearizability for fine-grained persistent memory algorithms is inherently more complex than linearizability in the standard setting [HW90, DD15]. Since an operation only takes effect after a flush event, helping is inevitably required to bring the data structure into a consistent state and for an operation to take effect. For proofs by refinement, these additional helping steps have to be considered in the simulation proof. This ultimately complicates the invariants used, since helping is performed by another thread or by the system. Moreover, since the state of the data structure can be "lagging" immediately after helping is performed, precisely formalising the underlying helping mechanism further complicates the invariant.
In summary, on the theoretical front we have introduced an operational characterisation of durable linearizability that can serve as the basis for mechanisable proofs of correctness. On the practical side, we have presented a general technique for performing such proofs using the KIV proof assistant. To use this technique on a particular algorithm, one first models the algorithm's shared variables and defines the actions that flush them, using a simple pattern. Then one writes the algorithm in KIV's programming language and annotates the program with persistence points (using KIV's with syntax). Then one applies KIV's automatic translation tool to obtain an automaton modelling the program with its flushes and crashes. Finally, one uses KIV's thread-local proof support to verify appropriate invariants and a simulation relation. We have exemplified this technique using the persistent queue as a case study, and completed the verification in the interactive prover KIV.
This paper extends the results of our earlier version [DDD+19] in three ways. First, the automated translation from KIV programs to automata that can model flushes and crashes is novel. Second, in previous work we only showed soundness of the durable automaton, whereas here we also demonstrate completeness. Third, we present a much more detailed discussion of the proof than previously.

Fig. 1. State of the queue with contents a, d, b; volatile data represented using shading; dequeue of thread 2 currently running