Scenario-Based Proofs for Concurrent Objects

Concurrent objects form the foundation of many applications that exploit multicore architectures, and their importance has led to informal correctness arguments as well as formal proof systems. Correctness arguments (as found in the distributed computing literature) give intuitive descriptions of a few canonical executions or "scenarios", often each with only a few threads, yet it remains unknown whether these intuitive arguments have a formal grounding and extend to arbitrary interleavings over unboundedly many threads. We present a novel proof technique for concurrent objects, based around identifying a small set of scenarios (representative, canonical interleavings), formalized as the commutativity quotient of a concurrent object. We next give an expression language for defining abstractions of the quotient in the form of regular or context-free languages that enable simple proofs of linearizability. These quotient expressions organize unbounded interleavings into a form more amenable to reasoning and make explicit the relationship between implementation-level contention/interference and ADT-level transitions. We evaluate our work on numerous non-trivial concurrent objects from the literature (including the Michael-Scott queue, Elimination stack, SLS reservation queue, RDCSS and Herlihy-Wing queue). We show that quotients capture the diverse features/complexities of these algorithms, can be used even when linearization points are not straightforward, correspond to original authors' correctness arguments, and provide some new scenario-based arguments. Finally, we show that discovery of some objects' quotients reduces to two-thread reasoning and give an implementation that can derive candidate quotient expressions from source code.


INTRODUCTION
Efficient multithreaded programs typically rely on optimized implementations of common abstract data types (ADTs) like stacks, queues, and sets, whose operations execute in parallel to maximize efficiency. Synchronization between operations must be minimized to increase throughput [Herlihy and Shavit 2008]. Yet this minimal amount of synchronization must also be adequate to ensure that operations behave as if they were executed atomically, so that client programs can rely on their (sequential) ADT specification; this de facto correctness criterion is known as linearizability [Herlihy and Wing 1990]. These opposing requirements, along with the general challenge in reasoning about interleavings, make concurrent data structures a ripe source of insidious programming errors.
Algorithm designers (e.g., researchers defining new concurrent objects) argue about correctness by considering some number of "scenarios", i.e., interesting ways of interleaving steps of different operations, and showing, for instance, that each one satisfies some suitable invariant (which is not necessarily inductive). For example, a scenario of the Michael and Scott [1996] queue is described as: many threads concurrently reading, one enqueuer thread taking a specific read path, finding a tail pointer to be outdated, and then succeeding at a compare-and-swap (CAS) operation, causing others to fail their compare-and-swap (paraphrasing from Herlihy and Shavit [2008]). Such scenario descriptions are powerful because they describe unboundedly many threads and often generalize to cover many executions that are equivalent due to commutative re-orderings. Consequently, informal correctness arguments need only consider a few representative scenarios. Furthermore, another critical benefit of scenario-based reasoning is that scenarios are more readily explainable to software developers, who need not have a background in formal logic.
Despite the intuitive benefit of these operational, scenario-based proofs (which continue to be widely used in the concurrent algorithms literature), it remains unknown whether they have a formal grounding. This has led to cases where objects thought to be linearizable [Detlefs et al. 2000] were later determined to contain bugs in unconsidered scenarios [Doherty et al. 2004].
1.1 Formalizing Scenarios with Quotients
In this paper, we show that operational, scenario-based correctness arguments can be formally grounded. To that end, we propose a new proof methodology that is based on formal arguments while keeping the intuition of scenario-based reasoning. This methodology relies on a reduction to reasoning about a subset of representative interleavings (i.e., a formal version of informal scenarios), which cover the whole space of interleavings modulo repeatedly swapping adjacent commutative steps. The latter corresponds to the standard equivalence up to commutativity between the executions of an object (e.g., Mazurkiewicz traces [Mazurkiewicz 1986]).
Reductions based on commutativity arguments have been formalized in previous work, e.g., Lipton's reduction theory [Lipton 1975], QED [Elmas et al. 2009], and CIVL [Hawblitzel et al. 2015], and they generally focus on identifying atomic sections, i.e., sequences of statements in a single thread that can be assumed to execute without interruption (without sacrificing completeness). Relying on atomic sections for reducing the space of interleavings has its limitations, especially in the context of concurrent objects. These objects rely on intricate algorithms where almost every step is an access to the shared memory that does not commute with respect to other steps.
Our reduction argument reasons about a quotient of the set of object executions, which is a subset of executions that contains a representative from each equivalence class. In general, an execution of an object interleaves an unbounded number of invocations to the object's methods, each from a different thread. (Typically, it can be assumed w.l.o.g. that each thread performs a single invocation in an execution.) These executions can be seen as a word over an infinite alphabet, each symbol of the alphabet representing a statement in the code and the thread executing that statement. (Such a sequence will be called a trace in the formalization we give later in the paper.) We show that when abstracting away thread ids from executions, carefully chosen quotients become regular or context-free languages. This is not true for every quotient, since representatives of equivalence classes can be chosen in an adversarial manner to make the language more complex.
The principal benefit of quotients is that reasoning about correctness can be done by considering only a few representative execution interleavings, yet those conclusions generalize to all executions. For some kinds of concurrent object implementations (defined later), deriving representative traces can be reduced via induction to two-thread reasoning.
Proofs with program logics. Our work is inspired by the success of many prior works on proofs for concurrent objects based on program logics such as Owicki and Gries [1976], Rely/Guarantee [Jones 1983], Concurrent separation logic [Brookes 2007; O'Hearn 2007], RGSep [Vafeiadis and Parkinson 2007], Deny-Guarantee [Dodds et al. 2009], Views [Dinsdale-Young et al. 2013], Iris [Jung et al. 2015, 2018] and interactive proof tools for such logics.
The goal of this paper is orthogonal and focuses on finding a formal grounding for the operational, scenario-based correctness arguments present in the algorithms literature. To this end, our methodology is based on taking representative interleaved traces upfront and using commutativity-based equivalence classes for modularity/generalization rather than exploiting the program structure and invariants for modularity/generalization. Achieving this alternative reasoning strategy nonetheless requires careful formalization of what is meant by "representative traces", as well as how those classes of traces can be expressed abstractly, which we outline below. Our results show that (i) scenario-based reasoning can be done formally through quotients, (ii) quotients can be given for a variety of concurrent objects with subtle differences including non-fixed linearization points, (iii) quotients improve the correctness arguments from the literature, and (iv) for some cases, quotients (which represent interleavings of unboundedly many threads) can be automatically discovered through a reduction to two-thread reasoning.
1.2 Example: Scenario-Based Proofs of the Michael-Scott Queue
For the sake of concreteness, we now show how quotients make concurrent reasoning simpler, using the canonical Michael-Scott Queue (MSQ) as an example. Ultimately the theory and algorithms in this paper lead to an implementation that is able to automatically derive the representation discussed below, from the object's source code. The MSQ is implemented as a linked list, with head and tail pointers and a sentinel head node, as depicted to the left below.
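[Figure: two diagrams of the MSQ linked list with head and tail pointers. Left: a CAS links a new node at the end of the list. Right: a CAS advances the tail.]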
An enqueue (enq) operation, such as in the diagram above, repeatedly attempts to enqueue a new element by using an atomic compare-and-swap (CAS) operation on the tail element's next pointer, replacing null with the address of the new node. It is possible that this CAS operation will fail due to a concurrent enqueuer (of which there can be unboundedly many). Nonetheless, due to the CAS, one enqueuer will succeed. At this point, although the element is linked, it is not logically in the queue because the tail pointer is lagging. The enqueuer will thus perform a second CAS operation, as shown on the diagram above to the right, to advance tail to point to the new node. To ensure progress, concurrent enqueuers will also check to see if the tail lags and, if so, attempt to advance the tail before they attempt to enqueue their elements (i.e., helping). A dequeue (deq) operation repeatedly attempts to advance the head pointer to make the node after the current sentinel the new sentinel node, but also has to check that the queue is non-empty and that other threads have not recently dequeued. (To achieve all of these cases, deq must begin by reading the head pointer, the tail pointer and head's next pointer and validating to see which case applies.)
To verify the correctness of objects like the MSQ, one has to consider all of the ways in which concurrent invocations of unboundedly many methods could interleave. One strategy to tackle this problem has been through the aforementioned program logics such as rely-guarantee where, roughly, one defines state-based invariants and then shows they are preserved and threads don't interfere with other threads' actions. Nevertheless, the correctness arguments laid out by algorithm designers (e.g., in the distributed computing community) typically are organized in a more operational manner and instead focus on discussing various "scenarios."
Consider the following excerpt from The Art of Multiprocessor Programming [Herlihy and Shavit 2008] regarding the MSQ:
An enqueuer creates a new node, reads tail, and finds the node that appears to be last. To verify that node is indeed last, it checks whether that node has a successor. If so, the thread attempts to append the new node with CAS. (A CAS is required because other threads may be trying the same thing.) [Assume that] the CAS succeeds.
Such sentences describe scenarios that involve unboundedly many threads executing some portion of their programs. They are chosen to highlight tricky situations and describe why those situations are still acceptable. The above example can be thought of as the sequence:
(1) Unboundedly many threads are reading the data structure.
(2) There is one distinguished thread.
(3) The distinguished thread reads the tail and the tail's next pointer.
(4) The distinguished thread finds that tail's next is null.
(5) The distinguished thread atomically updates tail's next to point to its new node.
(6) The other (unboundedly many) threads fail their CASes on tail's next and restart.
This scenario has a particular shape: unboundedly many threads read, then a single thread performs a write, then the remaining threads react to that write. This is a common setup in many non-blocking concurrent algorithms and a useful pattern (although, in general, we will describe scenarios beyond those of this shape). One might think of it as a regular expression, denoted next, over thread-indexed actions, in which the restarting threads range over the (unbounded) set of all threads excluding the distinguished thread. The expression next states that unboundedly many threads (including the distinguished thread) perform only read-path actions, then the distinguished thread succeeds its CAS, then those unboundedly many threads restart. This expression is more powerful than it may first appear. There are a few important considerations:
• Conciseness. The entirety of MSQ's concurrent execution behaviors can be represented with this and only two other similarly concise representative interleavings, along with four even simpler read-only interleavings. Expressions tail and head are similarly defined and represent advancing the tail pointer and the head pointer (due to a dequeuer), respectively.
• Unbounded. With these concise descriptions, the interleavings between an unbounded number of enqueuers and dequeuers can be seen as an unbounded alternation (next + tail + head)*. (Below we will further refine this approximation with stateful automata.)
This starred-union description does not include all possible ways of interleaving steps of enqueuers, e.g., it does not include interleavings where a thread restarts after two successful CASs since it last read the shared memory. It includes just a subset of representatives that we call a quotient, which is succinct enough to correspond to the designer's intuition and large enough to cover the whole space of interleavings modulo repeatedly swapping adjacent commutative steps (i.e., the standard equivalence up to commutativity between executions known as Mazurkiewicz traces [Mazurkiewicz 1986]). For instance, an interleaving where a thread restarts after two successful CASs (since it last read the shared memory) is equivalent to one where the restart step is reordered to the left to occur immediately after the first CAS. This is because the restarting condition is fulfilled after this first CAS as well and the restart step does not perform any writes.
The MSQ falls into a special class of objects for which quotients can be expressed in this inductive way, as a sequence of what we call "layers" (above, next, tail and head are layers) wherein only a single shared memory write action occurs per layer, and all other actions are thread-local/read-only (perhaps restarting due to a failed CAS). Consequently, it is possible via induction to reduce reasoning to a collection of two-threaded arguments (one writer, one reader). While quotients and their abstractions are a much broader class, layers are nonetheless an important subclass since they apply to many lock-free implementations and can be automated, as discussed below.

1.3 Challenges and Contributions
We now identify several challenges toward enabling scenario-based reasoning and discuss how we address them in this paper.
1. Concurrent Object Quotients. How can scenario-based reasoning be done formally? (Sec. 3) We show that scenario-based reasoning can be made formal through a methodology wherein reasoning about all executions of a concurrent object is reduced to reasoning only about a smaller set of representative interleavings. At the technical core is the definition of an object's execution quotient, which collapses executions that are equivalent up to swapping commutative adjacent actions. A quotient is parameterized by this equivalence relation and has both a minimality constraint (no two executions are equivalent) and a completeness constraint (all executions are equivalent to some execution in the quotient). We prove that linearizability of the quotient is sufficient to show linearizability of the object. The upshot is that concurrent object correctness is now accomplished via reasoning about a collection of scenarios (as in typical informal proofs).
2. Expressing Quotients. How can a quotient set be described? (Sec. 4) A next question is how to finitely express a quotient, which can have unboundedly many interleavings. In Sec. 4, we introduce a quotient expression language that permits a mixture of regular expressions (e.g., Kleene-star iterations of subexpressions) and context-free grammars (e.g., unbounded but balanced subexpressions). We then give an interpretation/semantics for these expressions that maintains the minimality condition: there will only be one interleaving (with threads organized in a canonical order) for each of the unboundedly many unrollings. The MSQ expression (next + tail + head)* above provides an intuition for the quotient expression for the MSQ. (Technically, the actions are paths and the *-iterations within the next, tail, and head subexpressions are replaced with a context-free form of iteration.)
As we will show later, quotients and their abstractions are expressive and can capture canonical concurrent objects as well as more complicated ones such as the Herlihy and Wing [1990] queue and the elimination stack of Hendler et al. [2004], each having different kinds of non-fixed linearization points. These are notoriously hard cases for today's proof methodologies. We note that, while the idea of reasoning about execution quotients is generic, identifying precise limits for the applicability of the particular class of quotient expressions is hard in general. This is similar to using abstract domains in the context of static analysis: it is hard to determine precisely the class of programs for which interval or polyhedra abstractions are effective.
3. Layer Quotient Expressions and Automata. (Sec. 5) In addition to broad expressivity, are there classes of objects whose quotients have a simpler structure? To increase accessibility and automation, we next describe certain kinds of quotient expressions for which reasoning can actually be reduced, via induction, to two-thread reasoning. Specifically, for objects whose implementation can be written as a collection of (possibly restarting) read-only/local paths and paths that have only a single atomic read-write, we define layer quotients to more conveniently and inductively capture the quotient. Although this does not apply to all objects, it does apply to canonical examples such as the MSQ, Treiber's Stack, and even the Scherer III et al. [2006] synchronous reservation queue. For these objects, executions can be decomposed into a sequence of layers, each described by context-free quotient expressions of the form $(r_1 + r'_1 + \ldots)^n \cdot w \cdot (r_2 + r'_2 + \ldots)^n$, where $r_1 \cdot r_2$ is a read-only path through the method implementation (possibly restarting), and $w$ is a path with a successful atomic read-write. The exponents in both expressions indicate the unbounded replication of local paths ($n$ is not fixed; it ensures prefix/suffix balancing). Then an overall quotient expression can be made from regular compositions of these context-free layers, leading to an inductive argument. Furthermore, each layer can be discovered with two-thread reasoning: considering how each write, treated atomically, impacts each other read-only/local path. We describe how layer expressions can be conveniently represented as finite-state automata (and further below also used for automation). The layer automaton for the Michael-Scott Queue is shown in Fig. 1. We will discuss it in detail in Sec. 6.1 but, roughly, the states track whether the queue is empty and whether the tail is lagging. The layer-labeled edges define the local/read-only (unbold) control-flow paths and how they are impacted by the write path (bold). There are also read-only layers, which we will describe later.
4. Evaluation: Verifying Concurrent Objects. (Sec. 6) We consider a broad range of concurrent objects including Treiber's stack [Treiber 1986], the Michael and Scott [1996] queue, the Scherer III et al. [2006] synchronous reservation queue, the Herlihy and Wing [1990] queue, the Hendler et al. [2004] elimination stack, and the Restricted Double-Compare Single-Swap (RDCSS) [Harris et al. 2002]. Each object has its own subtleties, including complications like multiple CAS steps and non-fixed linearization points. For each object we (i) show that its behavior and linearizability can be captured through a quotient and (ii) revisit the object's authors' correctness arguments. We find that quotients capture those intuitive scenarios and make scenarios explicit and comprehensive.
5. Generating Candidate Quotient Expressions. (Sec. 7) Automating quotient-based proofs of concurrent objects is a rather large question (perhaps warranting new forms of induction) which we mostly leave to future work. Nonetheless, we present an algorithm and prototype implementation, Cion, for generating candidate quotient expressions directly from a concurrent object's source code. We manually confirmed that these expressions are sound abstractions of those objects' quotients. We applied Cion to layer-compatible objects such as Treiber's Stack and the Michael-Scott Queue, finding that candidate layer expressions can be discovered in a few minutes.
For lack of space, some detail has been omitted and is available in the extended version [Enea et al. 2023] of this paper. Our implementation Cion is available on GitHub, along with benchmark sources.

PRELIMINARIES
Running example: A simple concurrent counter. Fig. 2 lists a concurrent counter with methods for incrementing and decrementing. Both methods of the counter return the value of the counter before modifying it, and the counter is decremented only if it is strictly positive.
Each method consists of a retry-loop that reads the shared variable ctr representing the counter and tries to update it using a Compare-And-Swap (CAS). A CAS atomically tests whether ctr equals the second argument and, if this is the case, assigns the value specified by the third argument. If the test fails, then the CAS has no effect. The return value of CAS represents the truth value of the equality test. If the CAS is unsuccessful, i.e., it returns false, then the method retries the same steps in another iteration. The executions of the concurrent counter are interleavings of an arbitrary number of increment or decrement invocations from an arbitrary number of threads. Each invocation executes a number of retry-loop iterations until reaching the return. An iteration corresponds to a control-flow path that starts at the beginning of the loop and ends with a return or goes back to the beginning. For instance, the increment method consists of two possible iterations: (1) c = ctr; CAS(ctr, c, c+1); return c, and (2) c = ctr; assume(ctr != c). Iteration #1 is called successful because it contains a successful CAS, and the unsuccessful CAS in iteration #2 is written as an assume that blocks if the condition is not satisfied.
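Since Fig. 2 is not reproduced here, the following is a minimal Python sketch of a counter with the behavior described above. It is an illustration under stated assumptions rather than the code of Fig. 2: the class name `Counter`, the helper `run_increments`, and the use of a lock to emulate an atomic CAS are all hypothetical choices (Python has no built-in CAS on integers).

```python
import threading


class Counter:
    """Sketch of the concurrent counter; CAS is emulated with a lock (assumption)."""

    def __init__(self):
        self._ctr = 0
        self._lock = threading.Lock()  # used only to make _cas atomic

    def _read(self):
        return self._ctr

    def _cas(self, expected, new):
        # Atomically: if ctr == expected then ctr := new; report whether the test held.
        with self._lock:
            if self._ctr == expected:
                self._ctr = new
                return True
            return False

    def increment(self):
        while True:                   # retry-loop
            c = self._read()          # c = ctr
            if self._cas(c, c + 1):   # successful iteration
                return c              # value before the update
            # unsuccessful CAS: retry with a fresh read

    def decrement(self):
        while True:
            c = self._read()
            if c == 0:                # decrement only if strictly positive
                return 0
            if self._cas(c, c - 1):
                return c


def run_increments(n_threads):
    """Run n_threads concurrent increments; return the counter and sorted results."""
    counter = Counter()
    results = []
    results_lock = threading.Lock()

    def worker():
        value = counter.increment()
        with results_lock:
            results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter, sorted(results)
```

Whatever the interleaving, each increment returns a distinct value, which is the behavior a linearizable counter must exhibit.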
An invocation can execute more iterations if ctr is modified by another thread in between reading it at line 3 or 10 and executing the CAS at line 4 or 13, respectively. Fig. 3 pictures an execution with 3 increments that execute between 1 and 3 retry-loop iterations. The first iteration of threads 2 and 3 contains unsuccessful CASs because thread 1 executed a successful CAS and modified ctr, and these invocations must retry, i.e., execute more iterations. Note that there are unboundedly many such executions and, even with bounded threads, exponentially many interleavings.
Concurrent Object Syntax. We model concurrent objects using Kleene Algebra with Tests [Kozen 1997] (KAT). Intuitively, a KAT represents the code of an object method using regular expressions over symbols that represent conditionals (tests) or statements (actions).
Definition 2.1. [Kleene Algebra with Tests] A KAT K is a two-sorted structure $(\Sigma, B, +, \cdot, {}^*, \overline{\phantom{b}}, 0, 1)$, where $(\Sigma, +, \cdot, {}^*, 0, 1)$ is a Kleene algebra, $(B, +, \cdot, \overline{\phantom{b}}, 0, 1)$ is a Boolean algebra, and the latter is a subalgebra of the former. There are two sets of symbols: A for primitive actions, and B for primitive tests. The grammar of boolean test expressions is
$b, c ::= p \in \mathsf{B} \mid b \cdot c \mid b + c \mid \overline{b} \mid 0 \mid 1$
and the grammar of KAT expressions is
$k, l ::= a \in \mathsf{A} \mid b \mid k \cdot l \mid k + l \mid k^*$
The primitive actions and tests used in examples in this paper will be along the lines of A = {x := y, x.f := y, ...} and B = {x = y, x.f = y, x = null, x.f = null, ...}.
Atomic read-write (ARW). We conservatively extend KAT with a syntactic notation $\langle\langle b \cdot a \rangle\rangle$, used to indicate a condition $b$ and action $a$, between which no other actions can be interleaved. Apart from restricting interleaving (defined below), this does not impact the semantics, so it can be represented with two special symbols "$\langle\langle$" and "$\rangle\rangle$" whose semantics are the identity relation. For example, a compare-and-swap cas(x,v,v') can be represented as $\langle\langle [x = v] \cdot x := v' \rangle\rangle \cdot k + \overline{[x = v]} \cdot k'$, where $[x = v]$ is a primitive test and the assignment is a primitive action. Overline indicates negation, as in KAT notation. $k$ is the code to be executed when cas succeeds and $k'$ when it fails.
Methods of a concurrent object. We define a method signature $m(\vec{x})/\vec{r}$ with a vector of arguments $\vec{x}$ and return values $\vec{r}$ (often a singleton $r$). For a vector $\vec{x}$, $x_i$ denotes its $i$-th component. An implementation of a method is a KAT expression $k$, whose actions may refer to argument values, e.g., x := args. A concurrent object is a set of methods $O = \{ m_1(\vec{x}_1)/\vec{r}_1 : k_1, \ldots \}$, associating signatures with implementations. The set of method names in an object $O$ is denoted by Meth($O$).
Example 2.2. The counter from Sec. 2 is formalized as a concurrent object with one KAT expression per method. In the expression for increment, the outer * corresponds to the while (true) loop in the method, while the inner + corresponds to the two branches of the conditional. The KAT expression represents every control-flow path of increment which goes a number of times through the assignment c := ctr and the "false" branch of the conditional before succeeding the atomic read-write and returning (other sequences represented by this regular expression, e.g., those iterating multiple times through the atomic read-write and return, will be excluded when defining the semantics).

Concurrent Object Semantics.
A full semantics for these concurrent objects is given in the extended version [Enea et al. 2023]. In brief, the semantics involves local states $\sigma_l \in \Sigma_l$, shared states $\sigma_g \in \Sigma_g$, and a nondeterministic thread-local transition relation $\sigma_l, \sigma_g, k \downarrow_{\ell} \sigma_l', \sigma_g', k'$, which optionally involves a label $\ell$ ($k$ and $k'$ are KAT expressions representing code to be executed). These labels are taken from the set of possible labels $L \subseteq A \cup B \cup \mathit{call}\ m(\vec{v}) \cup \mathit{ret}(\vec{v}) \cup \langle\langle b \cdot a \rangle\rangle$, which includes primitive actions, primitive tests, call actions, return actions, or ARWs. (We here write $\mathit{call}\ m(\vec{v})$ with free variables to refer to the set of all call actions, and similarly for returns and ARWs.) Next, a configuration $c = (\sigma_g, \theta)$, where $\theta : T \rightharpoonup (\Sigma_l \times (K \cup \{\bot\}))$, comprises a shared state $\sigma_g \in \Sigma_g$ and a mapping from each active thread to its local state and current code. We use $T$ to denote the set of thread ids, which is equipped with a total order $<$. Configurations of an object transition according to the relation $\Rightarrow \subseteq C \times (T \times L) \times C$, labeled with a thread id and a label.
An object $O$ is acted on by a finite environment $E : T \to \mathrm{Meth}(O) \times \vec{\mathrm{Val}}$, specifying which threads invoke which methods, with which argument values. Val denotes a set of values and $\vec{\mathrm{Val}}$ denotes the set of tuples of values. We assume that object methods cannot access thread identifiers (which is true for concurrent objects defined in the literature) and therefore, each invocation is assumed to be executed by a different thread. An execution $\rho$ of $O$ in the environment $E$ is a sequence of labeled transitions between configurations $c_0 \Rightarrow \ldots \Rightarrow c_n$ that starts in the initial configuration $c_0$ w.r.t. $E$ and ends in configuration $c_n$. A configuration $c = (\sigma_g, \theta)$ is final iff $\theta(t) = (\sigma_l, \bot)$, for some $\sigma_l$, for all $t \in \mathrm{dom}(\theta)$. An execution is completed if it ends in a final configuration.
$O \otimes E$ denotes the set of completed executions of $O$ in the environment $E$. A trace $\tau \in \mathit{Traces}$ is a sequence of $T \times L$ pairs, i.e., thread-indexed labels $t_0 : \ell_0, \ldots, t_n : \ell_n$. The trace of an execution $\rho$, denoted $\tau_\rho$, is the projection of the thread-indexed labels out of the transitions in the execution.
The semantics $[\![O]\!]$ of a concurrent object $O$ is defined as the set of traces under all possible environments (i.e., for any number of threads invoking any methods with any inputs). Formally, $[\![O]\!] = \{ \tau_\rho \mid \rho \in O \otimes E, \text{ for some environment } E \}$.
Linearizability. For an object $O$, an operation symbol (or operation for short) $o = m(\vec{u})/\vec{v}$ represents an invocation of a method $m \in \mathrm{Meth}(O)$ with signature $m(\vec{x})/\vec{r}$, where $\vec{u}$ is a vector of values for the corresponding arguments $\vec{x}$, and $\vec{v}$ is a vector of values for the corresponding returns $\vec{r}$. A sequential specification $S$ for an object $O$ is a set of sequences over operation symbols. For instance, the sequential specification for the counter object includes sequences of increments and decrements corresponding to executions where each invocation executes in isolation, e.g., inc()/0 · inc()/1 · inc()/2. A trace $\tau$ of an object $O$ is linearizable w.r.t. a specification $S$ if there exists a (linearization-point) mapping $lp(\tau) : T \to \mathbb{N}$, where the label at position (index) $lp(\tau)(t)$ in $\tau$ is considered to be the so-called linearization point of $t$'s invocation, and which must satisfy the following: (1) the position $lp(\tau)(t)$ is after $t$'s invocation label and before $t$'s return, and (2) the (linearization) sequence $lin(\tau, lp)$ of operation symbols $m(\vec{u})/\vec{v}$, where the $i$-th symbol represents the invocation of the $i$-th thread w.r.t. the positions $lp(\tau)(t)$, belongs to $S$.
For example, Fig. 3 pictures a trace which is linearizable w.r.t. the counter specification described above because there exists a linearization-point mapping which associates each thread $t$ with the position of the $t$-th successful CAS. The linearization inc()/0 · inc()/1 · inc()/2 induced by this mapping is admitted by the specification.
For simplicity, we omit invocation labels from traces and consider the first instruction in an invocation to play the same role. An object $O$ is linearizable w.r.t. a specification $S$ if all traces in $[\![O]\!]$ are linearizable w.r.t. $S$.
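The two linearizability conditions can be illustrated with a small Python sketch for the counter. This is an illustration under assumptions, not part of the formal development: traces are encoded as lists of `(thread, label)` pairs with the hypothetical labels `"read"`, `"cas_ok"`, `"cas_fail"`, and `"ret"`; the function names are invented; condition (1) is checked as "between the thread's first and last step", following the convention that the first instruction plays the role of the invocation; ties between linearization positions are broken by thread id.

```python
def counter_spec_accepts(ops):
    """Sequential counter spec: each operation returns the value before it runs."""
    value = 0
    for method, ret in ops:
        if ret != value:
            return False
        if method == "inc":
            value += 1
        elif method == "dec":
            value = max(value - 1, 0)  # decrement only if strictly positive
        else:
            return False
    return True


def is_linearizable_with(trace, ops, lp):
    """Check a given linearization-point mapping lp (thread -> trace index).

    trace: list of (thread, label); ops: thread -> (method, return value).
    """
    for t, pos in lp.items():
        positions = [i for i, (u, _) in enumerate(trace) if u == t]
        if not positions or not (positions[0] <= pos <= positions[-1]):
            return False  # condition (1) violated
    order = sorted(lp, key=lambda t: (lp[t], t))
    return counter_spec_accepts([ops[t] for t in order])  # condition (2)


# A trace with three increments, shaped like the execution described for Fig. 3.
FIG3_LIKE_TRACE = [
    (1, "read"), (2, "read"), (3, "read"),
    (1, "cas_ok"), (1, "ret"), (2, "cas_fail"), (3, "cas_fail"),
    (2, "read"), (3, "read"),
    (2, "cas_ok"), (2, "ret"), (3, "cas_fail"),
    (3, "read"), (3, "cas_ok"), (3, "ret"),
]
OPS = {1: ("inc", 0), 2: ("inc", 1), 3: ("inc", 2)}
# Linearization points: the position of each thread's successful CAS.
LP = {t: i for i, (t, label) in enumerate(FIG3_LIKE_TRACE) if label == "cas_ok"}
```

With these inputs, `is_linearizable_with(FIG3_LIKE_TRACE, OPS, LP)` accepts the mapping, while a mapping that places a linearization point outside a thread's own span of steps is rejected.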

OBJECT QUOTIENTS
To formalize scenarios, we introduce the concept of a quotient of an object which is a subset of its traces that represents every other trace modulo reordering of commutative steps or renaming thread ids.For an expert reader, the quotient is a partial order reduction [Godefroid 1996] composed with a symmetry reduction [Clarke et al. 1998] of its set of traces.In general, an object may admit multiple quotients, but as we show later, there exist quotients which can be finitely-represented using regular expressions or extensions thereof.We interpret scenarios as components (sub-expressions) of these finite representations.
Two executions $\rho_1$ and $\rho_2$ are equivalent up to commutativity, denoted as $\rho_1 \equiv \rho_2$, if $\rho_2$ can be obtained from $\rho_1$ (or vice-versa) by repeatedly swapping adjacent commutative steps. An execution $\rho_2$ is obtained from $\rho_1$ through one swap of adjacent commutative steps, denoted as $\rho_1 \equiv_1 \rho_2$, if $\rho_2$ is obtained from $\rho_1$ by re-ordering two adjacent steps labeled by $t : \ell$ and $t' : \ell'$. When there exist executions $\rho_1$ and $\rho_2$ as above, we say that the re-ordered labels $\ell$ and $\ell'$ are possibly commutative.
Definition 3.1. The equivalence relation $\equiv\ \subseteq E \times E$ between executions is the least reflexive-transitive relation that includes $\equiv_1$.
The relation $\equiv$ is extended to traces as expected: $\tau_1 \equiv \tau_2$ if $\tau_1$ and $\tau_2$ are traces of executions $\rho_1$ and $\rho_2$, respectively, and $\rho_1 \equiv \rho_2$.
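For small traces, equivalence up to commutativity can be explored by brute force. The Python sketch below searches over all traces reachable by swapping adjacent steps of different threads. It makes a loud simplifying assumption: commutativity is given by a static predicate on labels (`read_only_independent`, a hypothetical and deliberately conservative choice), whereas the relation defined above is on executions and depends on the configurations involved. The labels and function names are likewise illustrative.

```python
from collections import deque

READ_ONLY = {"read", "cas_fail", "ret"}


def read_only_independent(label_a, label_b):
    """Conservative static stand-in: only steps that never write may be swapped."""
    return label_a in READ_ONLY and label_b in READ_ONLY


def equivalent_up_to_swaps(trace1, trace2, independent=read_only_independent):
    """Breadth-first search over swaps of adjacent independent steps."""
    start, goal = tuple(trace1), tuple(trace2)
    if sorted(start) != sorted(goal):
        return False  # different multisets of steps can never be equivalent
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            return True
        for i in range(len(current) - 1):
            (t, a), (u, b) = current[i], current[i + 1]
            # Steps of the same thread are never re-ordered.
            if t != u and independent(a, b):
                swapped = current[:i] + ((u, b), (t, a)) + current[i + 2:]
                if swapped not in seen:
                    seen.add(swapped)
                    queue.append(swapped)
    return False
```

The search is exponential in the worst case, so it is only meant to make the definition concrete on traces of a handful of steps.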
For example, the Counter executions below are equivalent up to commutativity (related by $\equiv_1$): assuming that ctr > 0 at configuration $c_1$ (recall that $\overline{[c = ctr]}$ represents an unsuccessful CAS).
Definition 3.2. Two traces $\tau_1$ and $\tau_2$ are equivalent up to thread renaming, denoted as $\tau_1 \simeq \tau_2$, if there is a bijection $\alpha$ between thread ids in $\tau_1$ and $\tau_2$, resp., s.t. $\tau_2$ is the trace obtained from $\tau_1$ by replacing every thread id label $t$ with $\alpha(t)$.
For example, a trace over threads 1, 2, 3 and the trace obtained from it by renaming these threads to 4, 5, 6 are equivalent up to thread renaming. We define a quotient of an object as a subset of its traces that is complete in the sense that it represents every other trace up to commutative reorderings or thread renaming, and that is optimal in the sense that it does not contain two traces that are equivalent up to commutativity. Optimality does not include equivalence up to thread renaming (symmetry reduction) because the finite representations we define later abstract away thread ids.
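Equivalence up to thread renaming (Def. 3.2) can be decided by renaming thread ids in order of first occurrence and comparing the results: two traces are related by a bijection on thread ids exactly when these normal forms coincide. The following Python sketch shows this; the function names and the `(thread, label)` encoding of traces are illustrative assumptions.

```python
def rename_canonically(trace):
    """Rename thread ids to 1, 2, 3, ... in order of first occurrence."""
    mapping = {}
    renamed = []
    for thread, label in trace:
        if thread not in mapping:
            mapping[thread] = len(mapping) + 1
        renamed.append((mapping[thread], label))
    return renamed


def equivalent_up_to_renaming(trace1, trace2):
    """True iff some bijection on thread ids maps trace1 onto trace2."""
    return rename_canonically(trace1) == rename_canonically(trace2)
```

For instance, a trace over threads 1, 2, 3 and the same trace over threads 4, 5, 6 share one normal form.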
Definition 3.3 (Quotient).A quotient of object is a set of traces ⟨⌊ ⌋⟩ ⊆ such that: Note that an object admits multiple quotients since representatives of equivalence classes w.r.t.≡ can be chosen arbitrarily.
For a quotient ⟨⌊ ⌋⟩, a set Swaps of pairs of possibly-commutative labels (in L × L) is called ⟨⌊ ⌋⟩-sufficient if all the swaps needed to establish ′ ≡ ′′ in Def.3.3 are between pairs of labels in Swaps.
Example 3.4 (Quotient and representative/canonical traces for the Counter).The trace of three increment-only threads from Fig. 3 represents many other traces of the Counter modulo commutative reorderings or thread renaming.It can be thought of as a sequence of three canonical phases, depicted with stacked parallelograms as follows: Each phase above groups together the retry-loop iterations that interact with each other: a single successful CAS instruction causes the other attempts to fail.For instance, it represents another trace where the first "cas fails" step occurs after the second successful CAS: This "late" CAS failure would also fail if moved to the left as shown above.Similarly, it also represents traces where the action 2 = 0 is swapped with 3 = 0 and even 1 = 0, or traces where thread ids change from 1, 2, 3 to 4, 5, 6 for instance.
One can define a quotient ⟨⌊ ⌋⟩ of Counter which includes representative traces of this form.The representative traces only differ in the number of incrementers/decrementers and the order in which they succeed their CASs.⟨⌊ ⌋⟩ will contain similar canonical traces for, say, an environment with 4 incrementers, 2 decrementers acting in the sequence ; ; ; ; ; (wherein the second does nothing).See Example 4.3 for a more precise description.
Preserving Linearizability Through Commutative Reorderings.Our goal is to reduce the problem of proving linearizability for all traces of an object to proving linearizability only for traces in a quotient.Therefore, given two traces and ′ that are equivalent up to commutativity ( ≡ ′ ), where for instance, would be part of a quotient, an important question is whether the linearizability of implies the linearizability of ′ .We show that this holds provided that the reordering allowed by the equivalence ≡ is consistent with a commutativity relation between operations in the specification.Given a specification , two operations 1 and 2 are -commutative for every 1 , 2 sequences of operations.Given a set of pairs of labels Swaps ⊆ L×L, a linearization point mapping ( ) of a trace is robust against Swaps-reorderings if for every two threads 1 and 2 , if the linearization points of 1 and 2 form a pair in Swaps, then the operations of 1 and 2 are -commutative.
Theorem 3.5.Let ≡ ′ be two equivalent traces, such that ′ is obtained from by swapping pairs of labels in some set Swaps.If is linearizable w.r.t.some specification via a linearization point mapping ( ) that is robust against Swaps-reorderings, then ′ is linearizable w.r.t. .The above holds by defining ( ′ ) by ( ′ )( ) = the index in ′ of the label ( )( ), for every .Theorem 3.5 implies that proving linearizability for an object reduces to proving linearizability only for the traces in a quotient ⟨⌊ ⌋⟩ of , provided that the used linearization point mappings are robust against Swaps-reorderings for a set Swaps which is ⟨⌊ ⌋⟩-sufficient (thread renaming does not affect this reduction because specifications are agnostic to thread ids).
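The construction used for Theorem 3.5 (each thread keeps the same label as its linearization point, and only the index of that label changes) can be sketched as follows. The encoding is an assumption: a trace is a list of distinct `(thread, label)` steps, and a linearization point mapping is a dictionary from thread ids to indices in the trace.

```python
def transport_lp_mapping(trace, reordered, lp):
    """Given `lp` (thread -> index of its linearization point in `trace`),
    return the mapping for `reordered`, a permutation of the same steps."""
    if len(set(trace)) != len(trace):
        raise ValueError("steps must be pairwise distinct")
    if sorted(trace) != sorted(reordered):
        raise ValueError("reordered must contain exactly the steps of trace")
    position = {step: index for index, step in enumerate(reordered)}
    return {thread: position[trace[index]] for thread, index in lp.items()}


def linearization_order(lp):
    """Threads sorted by the position of their linearization point."""
    return [thread for thread, _ in sorted(lp.items(), key=lambda kv: kv[1])]
```

In the usage below, two decrements on a zero counter are reordered; their linearization order flips, which is harmless only because the two operations commute in the specification. This is the situation that the robustness condition is meant to cover.

```python
trace = [(1, "read0"), (1, "ret0"), (2, "read0"), (2, "ret0")]
reordered = [(2, "read0"), (1, "read0"), (1, "ret0"), (2, "ret0")]
new_lp = transport_lp_mapping(trace, reordered, {1: 0, 2: 2})
```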

FINITE ABSTRACT REPRESENTATIONS OF QUOTIENTS
We define finite representations of sets of traces, quotients in particular, which resemble regular expressions and which denote context-free languages over a finite alphabet. The finite alphabet is obtained by projecting out thread ids from labels in a trace. As we show in the evaluation section, scenarios in previous informal proofs of many concurrent objects correspond to components of these expressions, and linearization points can be identified directly within such expressions.
Let Abs be the set of expressions expr defined by the following grammar are finite sequences of labels, and for every application of the production rule 1 • expr • 2 , is a fresh variable not occurring in expr (this ensures context-free abstractions).Therefore, for every expression in Abs, a variable is used exactly twice.Such expressions have a natural interpretation as context-free languages by interpreting * , +, and • as the Kleene star, union, and concatenation in regular expressions, and interpreting every 1 • expr • 2 as sequences 1 , . . ., 1 • expr • 2 , . . ., 2 where the number of 1 repetitions on the left of expr's interpretation, denoted as expr , equals the number of 2 repetitions on the right.
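The matched-repetition construct, where the label sequence on the left is repeated exactly as many times as the one on the right, can be illustrated by enumerating its words up to a bound. This is a sketch under assumptions: words are tuples of labels, the repetition count is allowed to be zero, and the function name is hypothetical.

```python
def matched_words(prefix, inner_words, suffix, max_n):
    """Enumerate prefix^n . w . suffix^n for every w in inner_words
    and every repetition count 0 <= n <= max_n."""
    words = set()
    for n in range(max_n + 1):
        for inner in inner_words:
            words.add(tuple(prefix) * n + tuple(inner) + tuple(suffix) * n)
    return words


# Counter-style example: n threads read, one thread reads and succeeds
# its CAS, and the same n threads then fail their CAS.
layer = matched_words(("read",), {("read", "cas_ok")}, ("cas_fail",), max_n=3)
```

The set contains balanced words only; a word with two reads before the successful CAS but no later failure is excluded, which is what makes the language context-free rather than regular.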
We define an interpretation expr of expressions expr as sets of traces, which differs from the above only in the interpretation of , * , and 1 • expr • 2 , for finite sequences of labels , 1 , 2 .

[Fig. 4: quotient expression for the Counter.] For readability we present it as four subexpressions called "layers" whose composition with regular expression operators (concatenation, union, star) is represented using an automaton (all states are accepting). The full formal definition of an example layer, from the quotient expression grammar, is given in Example 5.3. In this figure, for conciseness, we subscript the primitives to indicate whether they were from increment-vs-decrement. Layer 1 represents decrements acting alone and finding the counter to be 0, Layer 2 corresponds to the first successful increment, and Layer 3 and Layer 4 represent successful increments and decrements. For Layers 2-4, some number of threads begin to read, then a single different thread performs its complete write path, and then all threads fail their CAS instructions. Technically, Layer 2 is a specialization of Layer 3, by letting = 0. However, treating them as separate layers provides a more refined representation.
Linearizability.Each layer corresponds to linearizing a single effectful invocation, i.e., an increment invocation or a decrement invocation when the counter is non-zero, or an arbitrary number of read-only invocations, i.e., decrement invocations when the counter is zero.

LAYERS: AN INDUCTIVE QUOTIENT LANGUAGE
We show that, for a broad class of objects, we can provide a subclass of quotient abstraction expressions, which we call layer expressions, that, via an inductive argument, reduce reasoning about all executions (and about linearizability) to two threads. This applies to numerous canonical examples such as the Treiber Stack, the Michael-Scott Queue, a linked-list Set, and even the SLS Reservation Queue. For illustrative purposes, we will continue to use the concurrent Counter, whose quotient can also be expressed with layers. Many lock-free objects rely on a form of optimistic concurrency control where an operation repeatedly reads the shared-memory state in order to prepare an update that reflects the specification and tries to apply a possible update using an atomic read-write. The condition of the atomic read-write checks for possible interference from other threads since reading the shared-memory state.
The executions of such objects can be seen as sequences of what we call "layers," each one being a triple consisting of (i) many threads all performing commutative local (e.g., read) actions, (ii) a single non-commutative atomic read-write ARW on the shared state, and (iii) those same initial threads reacting to the ARW with more local commutative actions. For example, incrementing the counter involves a successful cas operation on the shared variable, which forces the other threads, whose reads are now stale, down a failure/restart path. In fact, with this layer language one can consider an arbitrary number of control-flow paths executed by an arbitrary number of threads where at most one can contain an atomic read-write. In the remainder of this section we discuss this in detail and then discuss automated discovery of layers in Sec. 7.
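The three-part shape of a layer can be made concrete with a sequential simulation of one increment layer on the Counter. This is an illustrative sketch, not the paper's formal semantics: `SharedCounter`, `run_increment_layer`, and the label names are assumptions, the interleaving is fixed by the loop order rather than produced by real threads, and return actions are omitted.

```python
class SharedCounter:
    """Shared state with a compare-and-swap, simulated sequentially."""

    def __init__(self, value=0):
        self.value = value

    def cas(self, expected, new):
        if self.value == expected:
            self.value = new
            return True
        return False


def run_increment_layer(counter, num_readers):
    """Simulate one layer: (i) readers read, (ii) a different thread performs
    its complete write path, (iii) the readers react with their CAS."""
    trace, snapshots = [], {}
    for thread in range(1, num_readers + 1):          # (i) local reads
        snapshots[thread] = counter.value
        trace.append((thread, "read"))
    writer = num_readers + 1                           # (ii) the single ARW
    seen = counter.value
    trace.append((writer, "read"))
    ok = counter.cas(seen, seen + 1)
    trace.append((writer, "cas_ok" if ok else "cas_fail"))
    for thread in range(1, num_readers + 1):          # (iii) reactions
        ok = counter.cas(snapshots[thread], snapshots[thread] + 1)
        trace.append((thread, "cas_ok" if ok else "cas_fail"))
    return trace
```

Running the layer with two readers on a counter at 0 leaves the counter at 1: the writer succeeds and both readers fail, because their snapshots were taken before the write.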

Local-vs-Write Paths
For an implementation call ( ì ) • ∈ K of a method ( ì )/ì , a full (control-flow) path of is a KAT expression such that ≤ and contains only primitive actions, tests or ARWs, composed together with • ( contains no + or * constructor).In a representation with control-flow graphs of 's code, corresponds to a path from the entry point to the exit point.A path is any contiguous subsequence ′ of a full path , i.e., there exists (possibly empty) 1 and 2 such that = 1 • ′ • 2 .The set of paths of method is denoted by Π( ), and as a straightforward extension, the set of paths of an object defined by a set of methods with 1 ≤ ≤ is defined as Π( ) = 1≤ ≤ Π( ).Π ( ) denotes the subset of full paths in Π( ).
A primitive action is called local when it cannot affect actions or tests executed by another thread (atomic read-writes included), e.g., it represents a read of a shared variable or it reads/writes a memory region that has been allocated but not yet connected to a shared data structure (this region is still owned by the thread).Formally, let : (Σ × Σ ) → (Σ × Σ ) and : (Σ × Σ ) → {true, false} denote the functions defining the semantics of actions ∈ A and tests ∈ B. Then, an action ∈ A is local iff for every ( ′ , ′ ) = ( , ) and every ∈ A ∪ B that occurs in some method implementation, ( ′′ , ) = ( ′′ , ′ ), for every local state ′′ .A path is called local if it contains only local actions, and a write path, otherwise.Given a KAT expression ′ that represents a path, we use first ( ′ ) and last ( ′ ) to denote the first and the last action or test in ′ , respectively.
Example 5.1.Returning to the counter object , the full paths are as follows: The first two paths are from and the last three are from .Paths without ARWs consist of only local actions, that may read global ctr, but they do not mutate any global variables.
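For orientation, the five full paths can be read off a conventional CAS-based counter. The sketch below is an assumption about the shape of the implementation, reconstructed from the write path (c:=ctr) • ⟨⟨[c=ctr] • ctr:=c+1⟩⟩ • ret(c) and the "decrement returning 0" behavior described in the text; the class and method names are hypothetical, and a lock is used only to emulate an atomic CAS in Python.

```python
import threading


class Counter:
    def __init__(self):
        self._ctr = 0
        self._lock = threading.Lock()  # only emulates an atomic CAS

    def _cas(self, expected, new):
        with self._lock:
            if self._ctr == expected:
                self._ctr = new
                return True
            return False

    def increment(self):
        while True:
            c = self._ctr                # local action: c := ctr
            if self._cas(c, c + 1):      # ARW: <[c = ctr] . ctr := c + 1>
                return c                 # path 1: successful increment
            # path 2: failed CAS, a local path that restarts the loop

    def decrement(self):
        while True:
            c = self._ctr                # local action: c := ctr
            if c == 0:
                return 0                 # path 3: read-only, counter is zero
            if self._cas(c, c - 1):      # ARW: <[c = ctr] . ctr := c - 1>
                return c                 # path 4: successful decrement
            # path 5: failed CAS, a local path that restarts the loop
```

Only paths 1 and 4 contain an atomic read-write; the other three are local in the sense used in this section.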

The Language of Layers
We now define layer expressions and discuss how they represent an object's quotient.

Definition 5.2 (Basic Layer Expressions). A basic layer expression has one of two forms:
• local layer: ( ) * where is a local path in Π( ).
• write layer: ) and the prefix and suffix are each repeated times, (3) last ( ← − ) and first ( − → ) do not commute with respect to the ARW in .
The first type, local layers, represent unboundedly many threads executing a local path .Since each instance of the path is local, they all commute with each other, so the interpretation puts them into a single, canonical order which follows the increasing order between their thread ids (by the interpretation of * in quotient expressions; see Def. 4.1).
The second type, write layers, represents an interleaving where threads execute read-only prefix ← − of paths (in a canonical, serial order), then a different thread executes a non-local path , and then corresponding suffixes − → occur, finishing their iteration reacting to the write of .Again, the interpretation of a write layer associates these KAT action labels with increasing thread ids.Prefixes and suffixes of local paths can be assumed to execute serially as in the first type of layer.The non-commutativity constraint ensures that such an interleaving is "meaningful", i.e., it is not equivalent to one in which complete paths are executed serially.
A layer expression is a collection of basic layer expressions, combined in a regular way via •, +, or * (defined in Sec. 4). That is, a layer expression represents complete traces as sequences of layers.
Example 5.3.The expression given in Fig. 4 representing a quotient of the Counter is a layer expression.It combines a single read-only layer with other three write layers.One layer pertains to the increment write path, along with the local paths that fail their CAS attempts.Here, we consider full paths.This basic expression is: This layer interleaves the write path between prefixes/suffixes of the two local paths.We subscript the primitives to indicate whether they were from increment-vs-decrement.The first and last actions/tests do not commute with the interleaved writer's ARW.
Support of a Layer.The support of a basic layer expression , denoted by supp( ), is defined as a set of KAT expressions where a single prefix/suffix local path is concretized to a single occurrence, and interleaved with the write path.Intuitively, the support of a write layer characterizes all of the pair-wise interference by representing interleavings of two paths executed by different threads.
Definition 5.4.For basic layer expression , supp( ) is defined as: Example 5.5.For Layer 3 in Fig. 4  The paths Π( ) of a basic layer expression are defined from its support: (1) if is a local layer, then Π( ) = supp( ), and (2) if is a write layer, then { , The paths Π(expr) of a layer expression expr is obtained as the union of Π( ) for every basic layer expression in expr.

Proof Methodology with Two-Thread Reasoning
Recall that layer expressions represent languages of traces so we now ask whether a given expression is an abstraction of an object's quotient (Def.4.2).That is: whether each execution of an object is equivalent to some execution ′ ≡ , where the trace of ′ is in the interpretation of the expression.
Interestingly, this can be done by considering only two threads at a time, since local paths do not affect the feasibility of a trace.Therefore, it is sufficient to focus on interleavings between a single local or write path (on a first thread) and a sequence ì of (possibly different) write paths (on a second thread), and show that they can be reordered as a sequence of layers, i.e., executes in isolation if it is a write path, and interleaved with at most one other write path in ì , otherwise (it is a local path).Applying such a reordering for each path while ignoring other local paths makes it possible to group paths into layers.The reordering must preserve a stronger notion of equivalence defined as follows: two executions and ′ are strongly equivalent if they are ≡-equivalent, they start and resp., end in the same configuration, and they go through the same sequence of shared states modulo stuttering.This notion of equivalence guarantees that any local path enabled in the context of an arbitrary interleaving between and ì remains enabled in the context of an interleaving where for instance, executes in isolation.A more detailed proof for the following theorem is given in the extended version [Enea et al. 2023].
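One ingredient of strong equivalence, going through the same sequence of shared states modulo stuttering, is easy to check mechanically. The sketch below covers only that ingredient (not ≡-equivalence or the start and end configurations), and assumes each execution is summarized as the list of shared states it visits; the function names are hypothetical.

```python
from itertools import groupby


def destutter(states):
    """Collapse consecutive repetitions of the same shared state."""
    return [state for state, _ in groupby(states)]


def same_shared_states_modulo_stuttering(states_a, states_b):
    return destutter(states_a) == destutter(states_b)
```

For example, the counter-value sequences 0, 0, 0, 1, 1 and 0, 0, 1, 1, 1 agree modulo stuttering, whereas 0, 1, 0 and 0, 1 do not: the former passes through an extra change of the shared state.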
Theorem 5.6.Let be an object defined by a set of methods with implementations call ( ì ) • ∈ K.A layer expression expr = ( 1 + . . .+ ) * is an abstraction of a quotient of if • the layers cover all statements in the implementation: Π(expr) ⊆ Π( ) and for each primitive action, test or ARW in for some , there exists a path in Π(expr) which contains , • for every path ∈ Π(expr) and every execution of starting in a reachable configuration that represents 5 an interleaving || ì , where ì is a sequence of write paths in Π(expr), -Write Path Condition (WPC): if is a write path, there is an exec.′ of s.t.′ is strongly equivalent to , and ′ represents a write path sequence ì 1 • • ì 2 where ì = ì 1 • ì 2 , -Local Path Condition (LPC): if is a local path, there exists an execution ′ of such that ′ is strongly equivalent to and * ′ represents a path sequence ì 1 • • ì 2 where ì = ì 1 • ì 2 ( executes in isolation) and is the support of a local layer , 1 ≤ ≤ , or * a sequence and is a write path ( interleaves with a single write path ), and 1 • • 2 ∈ supp( ) for some write layer , 1 ≤ ≤ .
Example 5.7 (Counter layers via two-thread reasoning). We now proceed to show that the starred union of the basic layer expressions defined in Fig. 4 is an abstraction of a quotient. Concerning WPC, a write path is of the form (c:=ctr) • ⟨ ⟨[c=ctr] • ctr:=c+1⟩ ⟩ • ret(c). Such paths can be reordered to execute in isolation because the ARW is enabled only if the counter did not change its value since the read, and therefore, the read c:=ctr can be reordered after any step of another thread that may occur until the ARW. Also, the return action is local and can be reordered to occur immediately after the ARW. LPC holds because any "late" CAS failure (that occurs after more than one successful CAS) would also fail if moved to the left (as explained in Example 3.4).

5 An execution represents an interleaving | | ì if it interleaves two sequences of steps labeled with symbols in and ì , respectively (in the same order). An execution represents a path sequence ì when it is a sequence of steps labeled with symbols in ì (in the same order).
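The LPC argument of Example 5.7 can be sanity-checked by brute force on a bounded model. The sketch assumes that write paths have already been made atomic (as WPC allows) and models them as +1 or -1 steps on the counter; a reader takes a snapshot, some successful writes follow, and the reader's CAS fails exactly when the counter differs from the snapshot. The function names and bounds are illustrative.

```python
from itertools import product


def apply_writes(ctr, deltas):
    """Apply successful increments (+1) and decrements (-1) in order.
    Returns None if the sequence is infeasible (decrement on zero)."""
    for delta in deltas:
        if delta == -1 and ctr == 0:
            return None  # such a decrement takes the read-only path instead
        ctr += delta
    return ctr


def late_failures_can_move_left(max_start, max_writes):
    """Check: whenever a CAS fails after k >= 1 writes, it also fails when
    placed immediately after the first of those writes."""
    for start in range(max_start + 1):
        for k in range(1, max_writes + 1):
            for deltas in product((+1, -1), repeat=k):
                late = apply_writes(start, deltas)
                if late is None:
                    continue
                fails_late = late != start
                fails_early = apply_writes(start, deltas[:1]) != start
                if fails_late and not fails_early:
                    return False
    return True
```

The check succeeds for every bound, since a single successful write always changes the counter. Note that an increment followed by a decrement restores the snapshot value, so a "late" CAS after both would succeed; such a step is a write path, not a failure path, and falls under WPC instead.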

Automaton Representation of Layer Quotients
A layer expression comprised simply of a starred union of basic layer expressions is not always appealing since some layers are not enabled from some configurations. For instance, as shown in Figure 4 for the Counter, the read-only "decrement returning 0" layer cannot occur after one successful increment layer. (In formal notation, layer 0 of in Example 5.3 is enabled only when ctr is 0.) In other words, the starred union composition of layers can be refined further to enforce certain orders in which layers can occur, by taking into account reachability.
We therefore describe a more convenient representation as a layer automaton, in which the automaton states represent abstractions (sets) of concrete configurations in executions (as defined in Sec. 2) and the transitions are labeled by basic layer expressions. Another example of such an automaton was seen for the Michael-Scott queue in Fig. 1 in Sec. 1. Briefly, the control states correspond to the configurations of the objects (e.g., whether the MSQ is empty, tail is lagged, etc.), and the transitions are labeled by basic layer expressions (e.g., the "Dequeue Succeed" layer from Fig. 1, in which one thread succeeds in a CAS on the head pointer and other threads fail their CAS). These layer automata are a convenient representation of the quotient and, as shown in Sec. 7, we can derive candidate layer quotients represented as layer automata automatically from source code.

Definition 5.8 (Layer automaton). Given an object , a layer automaton is a tuple
where Q is a finite set of states representing abstractions (sets) of configurations of , Q 0 ⊆ Q is the set of initial states, and ⊆ Q × 2 Λ × Q is a set of transitions labeled with basic layer expressions (elements of Λ) with the constraint that an edge − → ′ can only be one of two types: (1) Unique self-loop: = 1 • • • is a sequence of ≥ 1 local layers, ′ = , and there are no other self-loops ′ − → .
(2) Single write layer edges: = is a single write layer.
The interpretation of the automaton, denoted by A , as a layer expression is defined as expected, except that the label of a self-loop is not starred.For instance, the interpretation of an automaton consisting of a single state and self-loop − → is defined as instead of * .Theorem 5.9.Given an object and a layer automaton A = (Q, Q 0 , Λ, ), the layer expression A is an abstraction of a quotient of if • the starred union of the basic layer expressions labeling transitions of A is an abstraction of a quotient of (Theorem 5.6), • every initial configuration of is represented by some abstract state in Q 0 , and every reachable configuration is represented by some abstract state in Q, • for every layer in A , if there exists an execution representing from a reachable configuration to a configuration ′ , then A contains a transition ′ − → where is an abstraction of and ′ is an abstraction of ′ .
The automaton in Fig. 1 is a layer automaton for the MSQ (see Section 6.1 for more details).
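A layer automaton can be represented as a small data structure, which makes the two edge constraints of Def. 5.8 and the enabledness of layers checkable. The sketch below uses the Counter rather than the MSQ. Several things are assumptions: the two abstract states (`zero`, `positive`), the reading of "unique self-loop" as at most one local self-loop per state, and all class and function names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    source: str
    layers: tuple  # names of basic layer expressions labeling the edge
    target: str
    kind: str      # "local" or "write"


def check_layer_automaton(transitions):
    """Check the edge constraints: local edges are self-loops (at most one
    per state), write edges carry exactly one write layer."""
    states_with_local_loop = set()
    for tr in transitions:
        if tr.kind == "local":
            if tr.source != tr.target or len(tr.layers) < 1:
                return False
            if tr.source in states_with_local_loop:
                return False
            states_with_local_loop.add(tr.source)
        elif tr.kind == "write":
            if len(tr.layers) != 1:
                return False
        else:
            return False
    return True


def accepts(transitions, initial_states, layer_names):
    """Nondeterministic run over a sequence of layer names; every state is
    accepting, so a run is accepted if it never gets stuck."""
    current = set(initial_states)
    for name in layer_names:
        current = {tr.target for tr in transitions
                   if tr.source in current and name in tr.layers}
    return bool(current)


COUNTER_AUTOMATON = [
    Transition("zero", ("Layer1",), "zero", "local"),          # dec returns 0
    Transition("zero", ("Layer2",), "positive", "write"),      # first increment
    Transition("positive", ("Layer3",), "positive", "write"),  # increment
    Transition("positive", ("Layer4",), "positive", "write"),  # decrement
    Transition("positive", ("Layer4",), "zero", "write"),      # decrement to 0
]
```

With this encoding, the sequence Layer2, Layer1 is rejected (the "decrement returning 0" layer is not enabled right after a successful increment), while Layer2, Layer4, Layer1 is accepted.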
Corollary 5.10 (to Thm. 3.5). If a layer expression expr is an abstraction of a quotient and there is a linearization point mapping for every trace in expr that is robust against re-ordering, then the object is linearizable.

EVALUATION: VERIFYING CONCURRENT OBJECTS
As discussed in Sec. 1, our goal is to provide a formal foundation for the scenario-based linearizability correctness arguments found in the distributed computing literature. To evaluate whether quotients serve that purpose, we examined several diverse and challenging concurrent objects, listed below.

Concurrent Object            Quotient    Features
Atomic counter               Sec. 2      simple cas loop
Michael and Scott [1996]

For each object, we (i) determine whether quotients can be used for verification and (ii) revisit the scenario-based correctness arguments given by the object's authors and compare those arguments to the quotient. We discuss the quotients of most in this section (with bold Sec 6._ in the Quotient column); further detail can be found in the appendix of the extended version [Enea et al. 2023].
Results summary. As we show, all of the above algorithms can be captured with quotient expressions. These expressions (i) capture the diverse features/complexities of these algorithms (per the Features column), (ii) provide a succinct, formal foundation for the scenario-based arguments used by those objects' authors, (iii) organize unbounded interleavings into a form more amenable to reasoning, (iv) make explicit the relationship between implementation-level contention/interference and ADT-level transitions, and (v) provide a scenario proof for HWQ, which did not have scenario arguments.

The Michael/Scott Queue
Recall the implementation of MSQ, stored as a linked list from global pointers Q.head and Q.tail, and manipulated as follows. (Some local variable definitions omitted for lack of space.) [...] necessary. As we will see in Sec. 6.2, the SLS queue performs this tail (and head) advancing directly in the enqueue/dequeue method implementation.
Quotient. The layer automaton that abstracts a quotient of MSQ, mentioned briefly in Sec. 1, is shown in Fig. 1. The automaton states track whether Q.tail=Q.head and whether Q.tail->next is null, in rounded dark boxes. Edges are labeled with layers (discussed below), defined to the right in Fig. 1. The write operations in those layers induce the automaton state changes as shown by the various edges between automaton states. For example, the Dequeue Succeed layer can move from automaton state 2 to 1 .
The three layers of the MSQ characterize three forms of interference. The Dequeue Succeed layer occurs when a dequeue thread successfully advances the Q.head pointer, causing concurrent dequeue CAS attempts to fail, as well as dequeue threads checking on Line 5 whether Q.head has changed. (We abbreviate local paths using line numbers rather than KAT expressions.) The Advancer Succeed layer occurs when an advancer moves forward the Q.tail pointer, causing concurrent advancer CAS attempts to fail, and causing concurrent enq threads to find Q.tail changed on Line 6. The Enqueue Succeed layer occurs when an enq thread successfully advances the Q.tail pointer, causing concurrent enq threads to fail.
Naturally, some edges are not enabled. For example, there is no edge from 1 to 2 , because the latter is not reachable from the former via a single write path/layer. Also, while there are outbound edges from 1 , there is no layer involving a deq write operation (since the queue is empty). Some non-local layers self-loop, such as the Dequeue Succeed layer self-loop at 4 . There are also four local layers that self-loop. These involve local paths that return (e.g., Read Only Layer 1 where deq returns because the queue is empty) or paths that loop while waiting (e.g., Read Only Layer 3 where enq awaits the advancer thread).
Theorem 6.1.The above layer automaton is an abstraction of a quotient for Michael-Scott Queue.
Proof: By the methodology of Theorem 5.6.
The WPC condition requires that all write paths (that include successful CASs) can be reordered to execute in isolation. This is a direct consequence of the semantics of a successful CAS, which checks that the value did not change since the last read of the written location. The deq successful CAS on Q.head ensures that Q.head did not change since it was read at Line 3, which also means that its next pointer did not change (this pointer is written only once in enq() for every node in the list). Therefore, all actions on the deq path that includes the successful CAS can be reordered to execute together at the place of reading Q.tail. Similarly, the enq successful CAS ensures that the actions between Line 5 and Line 8 can be reordered to occur together. Then, since the value of Q.tail could not have changed without Q.tail->next first having been changed, Lines 2-4 can also be reordered to occur together with the rest of the actions on this path. The case of the adv write path is similar.
The LPC condition follows from the fact that CAS operations always change the value, so it is always possible to move a late "failing" CAS to the left so that it occurs after the first successful CAS following the previous reads in the same iteration. ■
Theorem 6.2. The Michael-Scott Queue is linearizable.
Proof: We show that the traces in the quotient are linearizable via a linearization-point mapping which is robust against reorderings. Given a trace in the quotient (represented by the automaton in Fig. 1), the linearization points are the successful CAS operations in the {Dequeue,Advancer} Succeed layers (also in bold in Fig. 1 [...] points of dequeues returning some enqueued value and enqueues, respectively, and Line 7 is the linearization point of a dequeue returning empty. The validity of these linearization points can be proved by induction on the number of layers. The induction hypothesis will relate the last configuration of the quotient execution with a queue ADT state that is the sequence of elements reachable from Q.head. For instance, the successful CAS in the Dequeue Succeed layer will remove the first element in such a sequence, which by the induction hypothesis is the oldest element in the queue. By the proof of the quotient's completeness (Theorem 6.1), successful CAS operations are never reordered. The only linearization point labels that can be reordered are those corresponding to Line 7 in deq() for a dequeue returning empty. It is easy to see that dequeues returning empty commute in the queue specification, which implies that the above linearization-point mapping is robust against a set of reorderings which is sufficient for this quotient. ■
Comparison with the Authors' Proof. We evaluated the quotient by comparing it with the correctness arguments from Herlihy and Shavit [2008]. For lack of space, the following table gives example elements of the correctness argument/proof from Herlihy and Shavit [2008], and identifies where they occur in the quotient proof (see [Enea et al. 2023] for more details).

Herlihy and Shavit [2008]: "If this method returns a value, then its linearization point occurs when it completes a successful [CAS] call at Line 38, and otherwise it is linearized at Line 33."
Quotient proof: The successful CAS in the Dequeue Succeed Layer or Read-Only Layer 1.

The layer quotient and, especially, the layer automaton help make the Herlihy and Shavit [2008] proof more explicit, without sacrificing the organization of the proof, for a few reasons. First, all of the important ADT states are explicitly identified. Second, it can be determined, from each of them, which layers are enabled as well as the target ADT states that are reached after each such layer transition. This ensures that all cases are considered. Finally, linearization points are explicit in the layer quotient, occurring once with each layer transition.

The SLS Synchronous Reservation Queue
The Scherer III et al. [2006] (SLS) queue builds on MSQ, but has some complications: queue operations are synchronous (blocking), a single invocation can involve multiple sequentially composed write paths that necessitate different layers, and linearization points must account for dequeuers arriving before their corresponding enqueuer.
Implementation. Like MSQ, SLS has paths that read the head or tail pointer and subsequent pointers, perform read validations and then attempt a CAS. Also like MSQ, enqueuers arriving at an empty list (or list of items) attempt to append item nodes (and then try to advance the tail pointer). Dequeuers arriving at a list of items attempt to swap item node contents for null (and then try to advance the head pointer).
SLS then has some further complexities. Dequeuers arriving at an empty list (or list of reservation nodes) attempt to append reservation nodes (and attempt to advance tail). Enqueuers arriving at a list of reservations attempt to fulfill those reservations by swapping null for an item (and attempt to advance head). The list never contains both items and reservations; when the list becomes empty it can then transition from an item list to a reservation list (or vice-versa). Finally, SLS is synchronous: dequeuers with reservations block until those reservations have been fulfilled and enqueuers with items block until those items have been consumed. (For the sake of comprehensiveness, the implementation is in the extended version [Enea et al. 2023], but it is not necessary for a general understanding.) As noted, unlike MSQ where paths have at most 1 write operation, a single SLS invocation can perform multiple write operations (e.g., a dequeue path inserting a reservation, advancing tail, awaiting fulfillment, advancing head). Despite its conceptual simplicity, the implementation is non-trivial, with many restart paths when validations or CAS operations fail.
Quotient. The quotient expression for the SLS queue is depicted as a layer automaton in Fig. 5. In the upper portion, the automaton states differentiate between whether the queue is empty or whether the queue consists of reservations (left hand region) or of items (right hand region). In each of those regions, it is relevant whether the head pointer is stale or not, as well as whether the tail pointer is stale or not. When the queue is a list of reservations, the head or tail could be stale (hence four states), and similarly when the queue is a list of items.
The basic layers of the quotient expression are defined at the bottom of Fig. 5.The black circles (e.g., DE:CAS ℓ /t ) represent a write path in which a Dequeuer or Enqueuer has successfully performed a CAS at some program location ℓ.Along with the write path, we simply summarize the number of competing read-only paths, which are star-iterated.Two layers are enq/deq-agnostic: advancing the tail pointer in TA and advancing the head pointer (and "reaping" the head node) in HR.These helping operations happen in many places in the code, with corresponding read-only "_f" failure paths.Enqueue can either append an item node (Eapp) when in the RHS states of the automaton or else swap an item into a reservation node (Eswap) in the LHS.These layers have a single CAS operation (e.g., E:CAS 5 /t ) along with read-only paths where concurrent competing threads fail.The dequeue layers Dapp and Dswap are similar.
Finally, these (context-free) basic layer expressions are connected into an overall expression, represented here as an automaton or (below) as a star-/plus-/or-combination of layer expressions.
Theorem 6.3. The SLS queue is linearizable.
Proof: We associate linearization points with layers: Dswap is an LP for dequeue, Eapp is an LP for enqueue, and Eswap is an LP for a combination of an enqueue followed by a dequeue. Next, we project the linearization points out of the quotient to obtain simply ( • ) * • ( * + * ). Combining this with a lemma that this expression is an abstraction of the quotient, we obtain that all executions in the quotient meet the sequential spec. of a queue. This linearization point mapping is also robust because successful CASs (linearization points) do not have to be swapped in order to prove the completeness of the quotient. (Detail in the extended version [Enea et al. 2023].) ■
Comparison with the Authors' Proof. We evaluated the SLS quotient expression by revisiting the authors' proof in Scherer III et al. [2006]. Line numbers in the authors' quotes below refer to a reproduction of the source code given in the extended version [Enea et al. 2023]. For lack of space, some discussion of the authors' quotes can be found in the extended version [Enea et al. 2023].
The authors split the enqueue operation into two linearization points: a "reservation linearization point" and a later "follow up linearization point," so that synchronous, blocking enqueue implementations are a single reservation LP and then repeated follow-up LPs (as if the client is repeatedly checking whether the operation has completed).
[Regarding enqueue,] the reservation linearization point for this code path occurs at line [...] when we successfully insert our offering into the queue - Scherer III et al. [2006]
This prose describes a scenario, (i) identifying an alleged linearization point at E:cas 3 /t , involving a specific change to shared memory (a CAS on the tail's next pointer), and (ii) identifying the important ADT state transition (inserting an offer node into the queue). This scenario is formalized by the Eapp layer in the quotient expression. The successful CAS E:cas 3 /t in Eapp is the linearization point, with competing concurrent threads abstracted away by the starred fail path expression, and the state transition is given in the automaton as the downward Eapp-labeled arcs in the right-hand region of the automaton. The scenario and LP for dequeue on a list of reservation nodes is symmetric, and represented in the quotient expression as layer Dapp involving D:cas 3 /t and competing fail path.
The quotient expression makes the interaction between LPs and ADT states more explicit (e.g., through -marked layers) and comprehensive (e.g., the authors do not discuss the 9 different automaton ADT states and which transitions are possible from each). The quotient expression can be seen as an abstract view of an implementation of the sequential specification. This prose again indicates important mutations (e.g., swapping the node's contents pointer), ADT state changes (e.g., supplying data) and that the head dummy node needs to be advanced. These memory mutations and state changes are explicit in the quotient expression. For example, Eswap performs a memory CAS and makes an ADT state transition. The staleness of the head is also captured directly in the ADT states and the HR layers' transitions. The authors' prose also discusses failure paths (see [Enea et al. 2023]) and retry, which are also captured in the layer definitions.
Summary. The layer quotient expression/automaton provides a succinct formal foundation for the correctness arguments of Scherer III et al. [2006], capturing the authors' discussions of LPs, ADTs, impacts of writes, CAS contention, etc.

The Hendler et al. Elimination Stack
The Elimination Stack of Hendler et al. [2004] is difficult because the linearization point of some invocation can happen in another (threads can awake to find they were linearized earlier) and it uses a submodule: Treiber's stack [Treiber 1986].
We first show the quotient of Treiber's stack, and then build elimination on top. Since Treiber's stack is simple, we explain only the basics here, with more detail in the extended version [Enea et al. 2023]. The implementation of push prepares a new node and then attempts a CAS to swing the top pointer, while pop attempts to advance the top pointer and return the removed node's value. The quotient for Treiber's stack is shown in the upper right of Fig. 6 and is similar to the counter, but with ADT states tracking emptiness (rather than non-zeroness) and CAS contention on the top pointer (rather than the counter cell). There is one read-only layer for a pop on an empty stack, and the other layers involve one successful CAS with failed competing CAS attempts. See [Enea et al. 2023] for more detail, as well as a lemma proving that this layer automaton is an abstraction of the quotient.
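As an illustration only, the push and pop retry loops described above can be sketched in Python. This is a minimal sketch, not the code analyzed in the paper: the class names `AtomicRef`, `Node` and `TreiberStack` are hypothetical, and a lock inside `AtomicRef.cas` merely stands in for a hardware compare-and-swap.

```python
import threading


class AtomicRef:
    """Hypothetical CAS cell; the lock only emulates an atomic compare-and-swap."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def cas(self, expected, new):
        # Succeeds only if the cell still holds the object that was read earlier.
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


class Node:
    def __init__(self, value, next_node):
        self.value = value
        self.next = next_node


class TreiberStack:
    def __init__(self):
        self.top = AtomicRef(None)

    def push(self, value):
        while True:
            old = self.top.get()                 # read path
            if self.top.cas(old, Node(value, old)):
                return                           # write path: the one successful CAS
            # a failed CAS is a competing read-only path; retry

    def pop(self):
        while True:
            old = self.top.get()
            if old is None:
                return None                      # read-only path: pop on an empty stack
            if self.top.cas(old, old.next):
                return old.value
```

In this sketch, every loop iteration that ends in a failed `cas` corresponds to one of the competing failed attempts that a layer groups around a single successful CAS.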
The Elimination Stack, listed in Fig. 6(a), augments Treiber's stack with a protocol for "colliding" push and pop invocations so that the push passes its input directly to the pop without affecting the underlying data structure. An invocation starts this protocol after performing a loop iteration in Treiber's stack and failing (due to contention on top). The protocol uses two arrays: (1) a location array indexed by thread ids where a push or pop invocation publishes a descriptor tuple (op,id,input) with fields op for the type of invocation (push or pop), id for the id of the invoking thread, and input for the input of a push operation, and (2) a collision array indexed by arbitrary integers which stores ids of threads announcing their availability to collide.
Each invocation starts by publishing its descriptor in the location array (line 3). Then, it reads a random cell of the collision array while also trying to publish its id at the same index using a CAS (lines 4-6). If it reads a non-NULL thread id, then it tries to collide with that thread. A successful collision requires 2 successful CASs on the location cells of the two threads (we require CASs because other threads may compete to collide with one of these two threads): the initiator of the collision needs to clear its cell (line 10) and modify the cell of the other thread (line 11) to pass its input if the other thread is a pop. The first CAS failing means that a third thread successfully collided with the initiator and the initiator can simply return (lines 15-17). Failing the second CAS leads to a restart (line 13). Succeeding the second CAS means there has been a successful collision and the thread returns, returning null for a push and otherwise using the descriptor to obtain the popped value (line 11). If the invocation reads a NULL thread id from collision, then it tries to clear its cell before restarting (line 19). If it fails, then as in the previous case, a collision happened with a third thread and the current thread can simply return (lines 20-22).
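The two-CAS collision step can be made concrete with a small single-threaded replay. The following is a hedged sketch rather than the code of Fig. 6(a): `Desc`, `Cell`, `try_collide` and `finish_passive` are hypothetical names, the non-atomic `Cell.cas` is only adequate because the replay runs on one thread, and the check that the two operations have opposite types is an assumption made for the illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)          # identity comparison, as with pointers to descriptors
class Desc:
    op: str                   # "push" or "pop"
    id: int                   # id of the invoking thread
    input: Optional[object] = None


class Cell:
    """Hypothetical location cell; cas is NOT atomic, this is a one-thread replay."""

    def __init__(self):
        self.value = None

    def cas(self, expected, new):
        if self.value is expected:
            self.value = new
            return True
        return False


def finish_passive(location, me):
    """Passive side: another thread already collided with `me`."""
    other = location[me.id].value       # for a pop: the pusher's descriptor
    location[me.id].value = None
    return other.input if me.op == "pop" else None


def try_collide(location, me, him_id):
    """Active side; returns ("done" | "passive" | "restart", result)."""
    q = location[him_id].value
    if q is None or q.op == me.op:      # assumption: only opposite operations collide
        return ("restart", None)
    if not location[me.id].cas(me, None):
        # First CAS failed: a third thread collided with us, so we just return.
        return ("passive", finish_passive(location, me))
    # Second CAS: a push hands over its descriptor, a pop clears the pusher's cell.
    if location[him_id].cas(q, me if me.op == "push" else None):
        return ("done", None if me.op == "push" else q.input)
    return ("restart", None)            # second CAS failed: restart
```

For example, if a pop with id 0 and a push with id 1 (input 42) have both published their descriptors, `try_collide(location, push, 0)` performs both CASs, and the pop later fails to clear its own cell and obtains 42 through `finish_passive`.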
Quotient. We use the automaton in the lower right of Fig. 6 to describe a sound abstraction of the quotient. Layers of Treiber's stack interleave with layers of the collision protocol (some components are not exactly layers as in Definition 5.2, but quite similar). Executions in the quotient serialize collisions and proceed as follows: (1) some number of threads publish their descriptor and choose a cell in the collision array, (2) some number of threads publish their id in the collision array (there may be more than one such thread; note the self-loop on the "Publish collision intent" state), (3) some number of threads succeed in the CAS to clear their location cell but only one also succeeds in the CAS on the location cell of some arbitrary but fixed thread him and returns, and (4) the thread him returns after possibly passing the tests at line 7 or 9. (Note that, for succinctness, we have combined push/pop into the same method, which also makes the automaton succinct. The code and corresponding automaton could also have been written in a more verbose way where the bottommost layer is replaced with two layers: (1) a layer where a push's successful CAS takes with it a corresponding pop, and (2) a layer where a pop's successful CAS takes with it a corresponding push. For succinctness, we have combined those layers using the "push/pop" notation.) We emphasize that collisions happen in a serial order, i.e., at any point there is exactly one thread that succeeds on both CASs required for a collision and immediately after the collided thread returns (publishing descriptors or collision intent interleaves arbitrarily with collisions).
Theorem 6.4. The Elimination Stack is linearizable.
Proof: Follows from the fact that the above expression is an abstraction of the quotient (see Enea et al. [2023]), with the bold actions in the layers being the LPs. ■
Comparison with the Authors' Proof. A proof is given by Hendler et al. [2004] in that paper's Section 5. It is a lengthy proof so, for lack of space, the full review is in the extended version [Enea et al. 2023] and summarized here. Overall, the correctness argument requires numerous lemmas in the Hendler et al. [2004] proof, mostly focused on establishing a bijection between the active thread and its correspondingly collided passive thread. The authors lay out a few definitions, which are also captured by the quotient. For example, the authors' prose includes:
[A] colliding operation op is active if it executes a successful CAS in lines C2 or C7. We say that a colliding operation is passive if op fails in the CAS of line S10 or S19. [underlines added] - Hendler et al. [2004]
Above, the authors' intuitive concept of "active" is captured by the paths in a layer that succeed their CAS, denoted in bold in the quotient automaton above. Likewise for "passive" and CAS failure. As mentioned above, the active thread is captured as the bold thread that succeeds its CAS in the bottommost layer; the passive thread is the thread that finds itself collided with in the layers on arcs exiting the bottommost layer.
we show that push and pop operations are paired correctly during collisions. Lemma 5.7. Every passive collider collides with exactly one active collider.
In the bold action of the bottommost layer, a single push or pop succeeds, colliding with another operation of the opposite type, and passing the element from the push to the pop. The authors' LPs are given for "active" threads as the time when the second CAS succeeds, and linearization points for "passive" threads are "the time of linearization of the matching active-collider operation, and the push colliding-operation is linearized before the pop colliding-operation." The linearization points in the quotient correspond to the bold successful CAS in the bottommost layer in the quotient automaton (this linearizes both a push and a pop). Importantly, every run of the quotient automaton gives a serial linearization order that is a repetition of pairs of active/passive threads. All other executions are equivalent to one such serialized run, up to commutativity.
In summary, as detailed in the extended version [Enea et al. 2023], the quotient naturally and succinctly captures the key concept of the Elimination stack: that a single successful CAS of one type of operation is the LP for that operation as well as the corresponding matched operation.The quotient captures "active" versus "passive" threads (in the automaton layers/states/transitions), as well as this bijection through the runs of the automaton: every run in the automaton contains some number of active/passive pairs and provides a representative serialization order (in each pair the push is serialized before the pop).Linearization points and other logistics of threads preparing/completing are similarly captured by the quotient automaton.

The Harris et al. Restricted Double-Compare Single-Swap (RDCSS)
RDCSS [Harris et al. 2002] is a restricted version of a double-word CAS which modifies a so-called data address provided that this address and another so-called control address have some given expected values (the tests and the write happen atomically). RDCSS attempts a standard CAS on the data address to change the old value into a pointer to a descriptor structure that stores the inputs of the operation. This fails if the data address does not have the expected value. A second standard CAS on the data address is used to write the new value if the control address has the expected value, or the old value otherwise. Faster threads can help complete the operations of slower threads using the information stored in the descriptor.
The traces in the quotient of RDCSS interleave successful attempts at modifying the data address with unsuccessful ones. A successful attempt consists of a thread succeeding the first CAS combined with competing threads that fail, followed by another thread succeeding the second CAS (this can be different from the first one in the case of helping) combined with other threads that fail. An unsuccessful attempt may contain just a thread failing the first CAS, or it can contain two successful CASs like a successful attempt (when the data address has the expected value but the control address does not). Proving linearizability of quotient traces is straightforward because they make explicit the "evolution" of a data address, oscillating between storing values and descriptors, and which CAS is enabled depending on the value of the control address. See Enea et al. [2023] for more details.
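The oscillation of the data address between values and descriptors can be replayed with a small sketch. It is an illustrative, single-threaded model under stated assumptions, not the algorithm of Harris et al. [2002] verbatim: `Descriptor`, `cas`, `complete` and `rdcss` are hypothetical names, memory is a plain dictionary, the `cas` helper is not atomic, and returning the value found at the data address is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass(eq=False)              # descriptors compare by identity, like pointers
class Descriptor:
    ctrl_addr: str
    expected_ctrl: object
    data_addr: str
    old: object
    new: object


def cas(mem, addr, expected, new):
    """Non-atomic stand-in for CAS (one-thread replay); returns the value found."""
    found = mem[addr]
    if found == expected:
        mem[addr] = new
    return found


def complete(mem, d):
    # Second CAS: replace the descriptor by the new value if the control address
    # has the expected value, and by the old value otherwise.
    value = d.new if mem[d.ctrl_addr] == d.expected_ctrl else d.old
    cas(mem, d.data_addr, d, value)


def rdcss(mem, d):
    while True:
        found = cas(mem, d.data_addr, d.old, d)   # first CAS: install the descriptor
        if isinstance(found, Descriptor):
            complete(mem, found)                  # help a slower operation, then retry
            continue
        if found == d.old:
            complete(mem, d)
        return found
```

With `mem = {"ctrl": 1, "data": 10}`, a call whose expected control value is 1 changes `data` with two successful CASs, a call with a mismatching control value also performs two successful CASs but restores the old value, and a call with a wrong expected data value fails its first CAS and leaves memory unchanged.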

The Herlihy-Wing Queue
The quotients of some data structures cannot be represented using layer automata. The Herlihy-Wing Queue [Herlihy and Wing 1990] is one such example and it is notorious for linearization points that depend on the future and that cannot be associated to fixed statements, see e.g. Schellhorn et al. [2012]. The queue is implemented as an array of slots for items, with a shared variable back that indicates the last possibly non-empty slot. An enq atomically reads and increments back and then later stores a value at that location. A deq repeatedly scans the array looking for the first non-empty slot in a doubly-nested loop. We show that the Herlihy-Wing queue quotient can be abstracted by an expression (deqF * • (enqI) + • enqW * • deqT * ) * , where deqF captures dequeue scans that need to restart, deqT captures scans that succeed, enqI reads/increments back and enqW writes to the slot. For lack of space, a detailed discussion about how this expression abstracts the quotient is given in the extended version [Enea et al. 2023]. Importantly, linearization points in executions represented by this expression are fixed, drastically simplifying reasoning from the general case where they are non-fixed.
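To make the shape of this expression tangible, the following sketch replays a queue step by step, records one letter per action, and checks membership in the regular language with Python's `re` module. It is only an illustration under assumptions: `HWQueue`, `enq_incr`, `enq_write`, `deq_scan` and `in_quotient` are hypothetical names, the replay is single-threaded, each call to `deq_scan` performs one scan of the array, and the one-letter encoding (F, I, W, T for deqF, enqI, enqW, deqT) is chosen here for convenience.

```python
import re

# (deqF* . enqI+ . enqW* . deqT*)* with one letter per action.
QUOTIENT = re.compile(r"(F*I+W*T*)*")


def in_quotient(trace):
    return QUOTIENT.fullmatch("".join(trace)) is not None


class HWQueue:
    def __init__(self, capacity=16):
        self.items = [None] * capacity
        self.back = 0
        self.trace = []

    def enq_incr(self):
        """enqI: read and increment back (atomic in the original algorithm)."""
        slot = self.back
        self.back += 1
        self.trace.append("I")
        return slot

    def enq_write(self, slot, value):
        """enqW: store the value in the reserved slot."""
        self.items[slot] = value
        self.trace.append("W")

    def deq_scan(self):
        """One scan: returns a value (deqT) or None when it must restart (deqF)."""
        for i in range(self.back):
            value, self.items[i] = self.items[i], None   # swap the slot with empty
            if value is not None:
                self.trace.append("T")
                return value
        self.trace.append("F")
        return None
```

A replay in which two enqueuers increment back, both write, and two dequeuers then succeed yields the trace IIWWTT, which the check accepts. A trace such as IIWTW is rejected by the check; whether such a trace is equivalent to an accepted one up to commutativity is a separate argument that this sketch does not attempt.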
Theorem 6.5. The Herlihy-Wing Queue is linearizable. (See Enea et al. [2023].)
Comparison with the Authors' Proof. Herlihy and Wing [1990] give intuitions of scenarios:
Enq execution occurs in two steps, which may be interleaved with steps of other concurrent operations: an array slot is reserved by atomically incrementing back, and the new item is stored in items. - Sec 4.1 of Herlihy and Wing [1990]
This describes a scenario with unboundedly many threads, though it is not yet an argument for why that scenario is correct. This scenario appears in the quotient as the fact that enqI and enqW are distinct. To cope with non-fixed LPs (in this and other objects), the authors introduce a proof methodology based on tracking all possible linearizations that could happen in the future. This general methodology complicates the proof. The quotient, by contrast, allows one to consider scenarios along the lines of "one or more enqueuers increment back, possibly some of them write to the array, and then some dequeuers succeed," following the quotient's regular expression. In summary, the quotient here provides the first scenario-based proof of correctness, through representative executions that allow the linearization order to be fixed, with all other executions being equivalent to one such representative execution up to commutativity.

GENERATING CANDIDATE QUOTIENT EXPRESSIONS
In Sec. 6 we showed quotients can be defined for a wide range of concurrent objects, including notoriously difficult ones. We leave the (rather large) question of automated quotient proofs for the general case as future work. Here we take a first step, asking: Can candidate quotient expressions be generated algorithmically?
This section answers this question with an algorithm, implementation and experiments showing that, from the source code of concurrent data structures such as Treiber's stack and the MSQ, candidate quotient expressions (equivalent to those in Sec. 6) can be automatically discovered. We manually confirmed that these generated candidates are indeed sound abstractions of the quotient, a process that can also be automated (perhaps through new forms of induction) in future work.

Computing Layer Automata
Given a set of layers L_1, ..., L_n whose starred union is an abstraction of an object quotient (cf. Theorem 5.6), a layer automaton satisfying Theorem 5.9 can be computed automatically. The algorithm consists of the following steps:
(1) States: Compute the automaton abstract states as boolean conjunctions of the weakest preconditions (and their negations) of traces in the support of a layer L_i with 1 ≤ i ≤ n. We assume that the initial state can be determined from the object spec.
(2) Edges: Whenever a state q implies the precondition of a write layer L with write path w, compute every post-state q′ that can hold, and add an edge from q to q′ labeled with L. This can be encoded as an assertion violation in a program that assumes q, executes w, and asserts the negation of q′.
(3) Self-Loops: For every state q, collect every local layer that is enabled from q and create a single self-loop consisting of a concatenation of all these layers.
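The three steps can be illustrated on a toy counter. This is a rough sketch under strong simplifying assumptions and not the Cion implementation: the layers in `LAYERS` are hypothetical, each is reduced to a precondition and an effect on the counter value, and a bounded enumeration of counter values (`SAMPLES`) stands in for the reachability queries that a solver would answer.

```python
# Hypothetical layers: name -> (precondition, effect); effect None marks a local layer.
LAYERS = {
    "inc":      (lambda c: True,   lambda c: c + 1),
    "dec":      (lambda c: c > 0,  lambda c: c - 1),
    "dec_zero": (lambda c: c == 0, None),
}
SAMPLES = range(0, 6)   # bounded enumeration standing in for solver queries


def abstract(c):
    """Abstract state: the truth value of every layer precondition, i.e. a
    conjunction of preconditions and negated preconditions."""
    return tuple(pre(c) for pre, _ in LAYERS.values())


def layer_automaton(initial=0):
    # (1) States: the satisfiable combinations of preconditions.
    states = {abstract(c) for c in SAMPLES}
    edges, loops = set(), {}
    for q in states:
        members = [c for c in SAMPLES if abstract(c) == q]
        # (3) Self-loops: the local layers enabled in q.
        loops[q] = [name for i, (name, (_, eff)) in enumerate(LAYERS.items())
                    if eff is None and q[i]]
        # (2) Edges: for each enabled write layer, every post-state that can hold.
        for i, (name, (_, eff)) in enumerate(LAYERS.items()):
            if eff is not None and q[i]:
                for c in members:
                    edges.add((q, name, abstract(eff(c))))
    return abstract(initial), states, edges, loops
```

On this toy input the sketch yields two abstract states (counter zero and counter positive), a `dec_zero` self-loop on the zero state, and `inc`/`dec` edges between and within the states. A real tool would decide each candidate edge with a reachability query rather than by enumeration.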

Implementation and Experiments
We built a proof-of-concept implementation of our algorithm, called Cion, in ∼1,000 lines of OCaml code, using CIL and Ultimate [Heizmann et al. 2018]. Cion is publicly available. We applied Cion to some of the Sec. 6 objects that were amenable to layers. Experiments were run on Ubuntu 22.04 within a Parallels VM on a MacBook Pro M2 with 32GB RAM. Benchmarks are available in the Cion repository. We used Ultimate v0.2.1 (54a68f4) as a reachability solver, with its default configuration.
The results are summarized in Table 1.For each benchmark, we report the number of automaton States | Q |, the number of local Paths # and number of write paths # .We then report the number of Transitions | | in the automata constructed by Cion and the number of Layers, as well as the wall-clock Time in seconds, and the number of Queries made to the solver (Ultimate).The results show that Cion is able to efficiently generate candidate layer automata for some important and challenging concurrent objects.

RELATED WORK
Linearizability proofs. Program logics for compositional reasoning about concurrent programs and data structures have been studied extensively, as mentioned in Sec. 1.1. Improving on the classical Owicki and Gries [1976] and Rely-Guarantee [Jones 1983] logics, numerous extensions of Concurrent Separation Logic [Bornat et al. 2005; Brookes 2004; O'Hearn 2004; Parkinson et al. 2007] have been proposed in order to reason compositionally about different instances of fine-grained concurrency, e.g., [da Rocha Pinto et al. 2014; Dragoi et al. 2013; Jung et al. 2018, 2020; Krishna et al. 2018; Ley-Wild and Nanevski 2013; Nanevski et al. 2019; Raad et al. 2015; Sergey et al. 2015; Turon et al. 2013; Vafeiadis 2008, 2009]. We build on the success of such program logics toward improving the confidence in the correctness of concurrent objects. In the current paper we alternatively focus on the scenario-based reasoning found in the distributed computing literature, and have aimed to capture those scenarios as formally-defined representative executions. In future work it could be interesting to combine the benefits of program logics with those of quotients. Other more distantly related works include: Berdine et al. [2008], Vafeiadis [2010], Bouajjani et al. [2013], Chakraborty et al. [2015], Zhu et al. [2015], and Abdulla et al. [2016].
Reduction. The reduction theory of Lipton [1975] introduced the concept of movers to define a program transformation that creates atomic blocks of code. QED [Elmas et al. 2009] expanded Lipton's theory by introducing iterated application of reduction and abstraction over gated atomic actions. CIVL [Hawblitzel et al. 2015] builds upon the foundation of QED, adding invariant reasoning and refinement layers [Kragl and Qadeer 2018; Kragl et al. 2018]. Reasoning via simplifying program transformations has also been adopted in the context of mechanized proofs, e.g., [Chajed et al. 2018]. Inductive sequentialization [Kragl et al. 2020] builds upon this prior work, and introduces a new scheme for reasoning inductively over unbounded concurrent executions. The main focus of these works is to define generic proof rules to prove soundness of such program transformations, whose application does however require carefully-crafted artifacts such as abstractions of program code or invariants. Our work takes a different approach and tries to distill common syntactic patterns of concurrent objects into a simpler reduction argument. Our reduction is not a form of program transformation since quotient executions are interleavings of actions in the implementation.

CONCLUSION
We have shown that scenario-based reasoning about concurrent objects has a formal grounding, answering an open question. The key insight is the concept of a quotient, defined so that it admits only representative traces and all other traces are merely equivalent to one of those representatives, up to commutativity. We then gave a language for finitely expressing abstractions of those quotients (as regular or context-free languages) and an inductive and automata-theoretical way of describing them. Our results show that quotients provide a succinct formal foundation for scenario-based reasoning, are capable of capturing a wide range of tricky objects, enhance original authors' correctness arguments, and that discovery of candidate quotient expressions can be automated. In the future, we will explore further mechanization and other application domains.

Fig. 3. The steps of an execution with three increment-only threads whose actions are aligned horizontally. For readability, we rename the local variable c in thread i to c_i. The curved blue arrows depict data-flow dependencies between reads/writes of ctr.

(See definition of Layer 4 to the right.)
((c:=ctr)_inc)^n • ((c:=ctr)_dec)^m • (c:=ctr) • ⟨⟨[c=ctr] • ctr:=c-1⟩⟩ • ret(c) • ([c=ctr]_dec)^m • ([c=ctr]_inc)^n
Fig. 4. An expression representing a quotient of the Counter. For readability we present it as four subexpressions called "layers" whose composition with regular expression operators (concatenation, union, star) is represented using an automaton (all states are accepting). The full formal definition of an example layer, from the quotient expression grammar, is given in Example 5.3. In this figure, for conciseness, we subscript the primitives to indicate whether they were from increment-vs-decrement.
Layer 1 represents decrements acting alone and finding the counter to be 0, Layer 2 corresponds to the first successful increment, and Layer 3 and Layer 4 represent successful increments and decrements. For Layers 2-4, some number of threads begin to read, then a single different thread performs its complete write path, and then all threads fail their CAS instructions. Technically, Layer 2 is a specialization of Layer 3, by letting = 0. However, treating them as separate layers provides a more refined representation.

Fig. 5. Layer automaton for the synchronous SLS queue. Layers' acronyms and their definitions are given in the lower half of the figure. For conciseness, layer definitions do not split the prefix/suffix of the read paths.
The other case occurs when the queue consists of reservations (requests for data), and is depicted [to the right]. In this case, after originally reading the head node (step A), we read its successor (line [...]/step B) and verify consistency (line [...]). Then, we attempt to supply our data to the head-most reservation (line [...]/C). If this succeeds, we dequeue the former dummy node ([...]/D) and return
(a) Elimination Stack source code (b) Stack Quotients

Fig. 6. Elimination Stack.
Proc. ACM Program. Lang., Vol. 8, No. OOPSLA1, Article 140. Publication date: April 2024.

Table 1. Evaluation of Cion discovering candidate layers from source code.