Commutativity Simplifies Proofs of Parameterized Programs

Commutativity has proven to be a powerful tool in reasoning about concurrent programs. Recent work has shown that a commutativity-based reduction of a program may admit simpler proofs than the program itself. The framework of lexicographical program reductions was introduced to formalize a broad class of reductions which accommodate sequential (thread-local) reasoning as well as synchronous programs. Approaches based on this framework, however, were fundamentally limited to program models with a fixed/bounded number of threads. In this paper, we show that it is possible to define an effective parametric family of program reductions that can be used to find simple proofs for parameterized programs, i.e., for programs with an unbounded number of threads. We show that reductions are indeed useful for the simplification of proofs for parameterized programs, in a sense that can be made precise: A reduction of a parameterized program may admit a proof which uses fewer or less sophisticated ghost variables. The reduction may therefore be within reach of an automated verification technique, even when the original parameterized program is not. As our first technical contribution, we introduce a notion of reductions for parameterized programs such that the reduction R of a parameterized program P is again a parameterized program (the thread template of R is obtained by source-to-source transformation of the thread template of P). Consequently, existing techniques for the verification of parameterized programs can be directly applied to R instead of P. Our second technical contribution is that we define an appropriate family of pairwise preference orders which can be effectively used as a parameter to produce different lexicographical reductions. To determine whether this theoretical foundation amounts to a usable solution in practice, we have implemented the approach, based on a recently proposed framework for parameterized program verification. 
The results of our preliminary experiments on a representative set of examples are encouraging.


INTRODUCTION
The framework of trace theory (formulated by Mazurkiewicz in 1987) formalizes equivalence relations for concurrent program runs based on a commutativity relation over the set of atomic steps taken by individual program threads. Two program statements of different threads commute if the order in which we execute them is irrelevant to the outcome of the execution. Two program runs are equivalent up to commutativity if one can be obtained from the other through successive swaps of adjacent commuting program steps. For any program P, we call a program R a reduction of P if and only if R includes at least one representative from each (commutativity) equivalence class of behaviours in P. Recent work [Farzan 2023; Farzan et al. 2022; Farzan and Vandikas 2019, 2020] has shown that some reductions of a program admit simpler proofs than the program itself. More specific versions of this observation had already been made in the literature on concurrent and distributed program verification. In particular, it is exploited in the verification of distributed programs by favouring the verification of synchronous (or almost synchronous) programs in place of asynchronous programs, with the rationale that the synchronous program admits a simpler proof [Genest et al. 2007; Kragl and Qadeer 2018; von Gleissenthall et al. 2019].
The common thread in all these contexts is that there is often a lot of redundancy in the set of behaviours of a concurrent program, and removing redundant behaviours with complicated proofs in favour of those with simpler proofs simplifies the entire reasoning task. The choice of a program reduction, then, is a choice of which representatives from equivalence classes of program behaviours stay and which ones go. Traditionally, people have opted for canonical choices: those that maximize sequential (local) reasoning in the case of concurrent programs [Elmas et al. 2009; Kragl and Qadeer 2018], or those that get as close as possible to a synchronous program [Genest et al. 2007; von Gleissenthall et al. 2019] for distributed protocols. Each such framework thus makes an a priori assumption about a particular type of reduction. In recent work, however, a family of parametric lexicographical program reductions [Farzan et al. 2022; Farzan and Vandikas 2019, 2020] was introduced that formalizes a broad (infinite) class of reductions that includes both canonical choices. The idea is that different program verification tasks may respond best to different strategies for picking representatives. By taking a lexicographic order as a parameter to a reduction that chooses the (lexicographically) least representative of each equivalence class, one controls the composition of the reduction.
These frameworks, however, were fundamentally built on the assumption that the alphabet of program actions is finite, and therefore, they can only be applied to program models with a fixed/bounded number of threads. This brings us to the central research question in this paper: "For programs with unboundedly many threads, is it possible to define an effective parametric family of program reductions that can be exploited for finding simple proofs?" This paper presents an affirmative answer to this question for parameterized concurrent programs. A parameterized program P stands for an infinite family of programs P(n), n ∈ N. Each program P(n) arises from taking a number n of threads, where n is not bounded. Each thread runs an instance of the same given thread template. This is without loss of generality, since well-known encoding tricks [Hoenicke et al. 2017] accommodate the use of multiple thread templates.
It is well understood, even outside the realm of algorithmic verification, that modular reasoning techniques for parameterized programs (e.g., Owicki-Gries for parameterized programs [Nieto 2001]) are only complete in the presence of the full power of history variables. Therefore, program proofs may require highly nontrivial ghost variables, which are notoriously hard to compute and reason about automatically. In contrast, in the fixed thread case, the canonical choice of program counters is always available and mainly becomes a time/complexity issue for verification algorithms. This paper argues that reductions can help simplify proofs of parameterized concurrent programs, in a sense that can be made precise based on the ghost variables required for the proof. We make the observation that a reduction of a parameterized program may admit a proof which uses fewer or less sophisticated ghost variables and may therefore have a higher chance of being within the reach of an automated verification technique.
Fig. 1. Template for P± (precondition {x = 0}; locations ℓ0 and ℓ1; statements x:=x+1 and x:=x-1; assertion x!=0).
As a simple example to make this observation concrete, consider the parameterized program P±, given by the thread template in Fig. 1. The goal is to prove the property that whenever a thread is in location ℓ1, the global variable x is non-zero, assuming x is initially 0. It can be shown that there does not exist a proof (formally, a proof in the form of an Ashcroft invariant) if one does not introduce a ghost variable [Hoenicke et al. 2017]. Intuitively, the proof needs to keep track of the number of threads that have already executed their increment but not yet the matching decrement. Now consider the reduction where the threads are executed sequentially one after the other (sequential composition). The reduction is sound because all statements of two different threads commute (since we do not model the specification assert x!=0 as a statement, we are not concerned with its commutativity). The proof for the reduction does not need any ghost variables. We will use this example later as a running example (see Section 4).
, Vol. 1, No. 1, Article . Publication date: November 2023.
Our first technical contribution is a notion of a reduction for a parameterized program. The reduction R of the parameterized program P can be viewed as a family of lexicographical reductions. This means that R stands for an infinite family of programs R(n), n ∈ N, where for each n, R(n) is a (lexicographical) reduction of P(n). Crucially, the infinite family can be finitely represented. In fact, the reduction R is again a parameterized program, and the thread template of R is obtained by source-to-source transformation of the thread template of P. The key benefit of this observation is that existing techniques for verification of parameterized programs can now be directly applied to R instead of P.
Reductions that favour program behaviours with long sequential blocks, like the sequential composition for the example in Fig. 1, can be generated using lexicographical reductions based on thread orders, i.e., when statements of each thread are grouped together and ordered with respect to statements of other threads according to their thread identifiers. In Section 2, we present an example that demonstrates why, in the context of parameterized program verification, other reductions, like lockstep reductions, may be essential if proof simplification is the desired outcome.
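To illustrate how a thread order induces a sequential reduction, the following Python sketch computes the lexicographically least representative of a trace's equivalence class. It is a minimal illustration under stated assumptions, not the construction of this paper: indexed statements are modelled as (statement, thread) pairs, the hypothetical relation `commute_different_threads` declares exactly the statements of different threads to commute (as in the example of Fig. 1), and the function name `lex_least` is ours.

```python
def commute_different_threads(a, b):
    """Assumed commutativity: statements of different threads commute."""
    return a[1] != b[1]

def lex_least(trace, commute, key):
    """Lexicographically least trace equivalent to `trace`.

    At each step, pick the smallest statement (w.r.t. `key`) among those
    that commute with everything still in front of them, i.e. those that
    can be swapped to the front of the remaining trace.
    """
    rest, out = list(trace), []
    while rest:
        movable = [p for p in range(len(rest))
                   if all(commute(rest[q], rest[p]) for q in range(p))]
        best = min(movable, key=lambda p: key(rest[p]))
        out.append(rest.pop(best))
    return tuple(out)

# Thread order: compare statements by thread identifier only.
interleaved = (("x:=x+1", 2), ("x:=x+1", 1), ("x:=x-1", 2), ("x:=x-1", 1))
sequential = lex_least(interleaved, commute_different_threads,
                       key=lambda st: st[1])
print(sequential)
```

With the thread identifier as the key, the representative groups all statements of thread 1 before those of thread 2, which is the sequential composition discussed above.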
Our second technical contribution is that we define an appropriate family of orders, called pairwise preference orders, that can be effectively used as a parameter to produce many different lexicographical reductions of the same program given the same commutativity relation (including the above-mentioned lockstep reduction). This generalizes similar results from the literature on how reductions for a fixed number of threads are generated parametric on order relations [Farzan et al. 2022; Farzan and Vandikas 2019, 2020]. We show that, as in the case of thread orders, reductions of a parameterized program P parametric on pairwise preference orders can also be finitely represented as a parameterized program R, with the same correspondence between P(n) and R(n) for all n.
The two technical contributions outlined so far put forward an algorithmic path for verifying parameterized concurrent programs using a broad family of reductions. To determine whether this amounts to a usable solution in practice, we selected the proof method based on thread-modular proofs at many levels [Hoenicke et al. 2017] to instantiate and evaluate this solution. The proof method encodes the existence of a proof of a specific form (an Ashcroft invariant with a number k of universal quantifiers over thread IDs) for an input parameterized program P as a satisfiability problem of a set of constraints in a specific form (CHC, for Constrained Horn Clauses). To use the proof method for verifying a reduction of the input parameterized program, we apply the proof method to our proposed parameterized reduction, i.e., to the parameterized program R.
We implemented the construction of the parameterized program R and the constraint generation according to Hoenicke et al. [2017]. We evaluated the approach on a set of 19 parameterized programs taken from the literature, by discharging the generated constraints with several off-the-shelf CHC solvers. The results are very encouraging: The implementation succeeded in verifying the reductions of 14 programs, only 4 of which can be verified without the use of reductions.
It is noteworthy that our proposal for parameterized reductions (and therefore, the corresponding set of CHC constraints) has the desired property that any Ashcroft invariant of the original program is also a valid invariant for the reduced program. The converse does not hold; i.e., the reduction R may admit an Ashcroft invariant that is not a valid invariant of the original program P, and a proof in the form of an Ashcroft invariant may not exist for P even though it does for R.
The property of the conservative extension of the validity of an Ashcroft invariant from P to R does not, however, mean that we are (in practice) able to compute a proof in the form of an Ashcroft invariant for R whenever we are able to compute one for P. In fact, the parameterized program R uses a set of additional variables as the means of encoding the reduction. It is thus natural to wonder whether the task of the CHC solver could somehow become harder because it has to deal with constraints over a larger set of variables, and, if so, whether anything can be done to alleviate this issue. We investigate this question systematically in Section 6 and propose an alternative encoding with fewer variables. This new encoding is an orthogonal contribution of this paper. It is inspired by the idea of symmetry reduction [Clarke et al. 1998]. Intuitively, in the encoding based on Hoenicke et al. [2017], the solver is forced to prove the correctness of symmetry-equivalent classes of reductions. In Section 6.2, we demonstrate how the CHC encoding can be modified so that this redundancy is eliminated.
To conclude, this paper proposes a way of incorporating commutativity-based reductions into, in principle, any existing parameterized verification methodology. In particular, it makes the following contributions:
• We observe that reductions simplify proofs of parameterized programs in a precise sense: Proofs of reductions require less complex ghost state than the proofs of original programs; this can manifest as the need for less complicated information to be recorded in ghost variables, or that simply fewer ghost variables are needed overall (Section 2).
• The theoretical formulation of a parameterized reduction in two parts:
(1) We formulate a lexicographical reduction of a parameterized program and show that it can be finitely represented, namely again as a parameterized program (Section 4).
(2) We propose an appropriate notion of preference orders for the parameterized context and show that the construction of a lexicographical reduction from a parameterized program can be made parametric on the preference order (Section 5).
• We give an improved formulation of the search problem for an Ashcroft invariant, by breaking some inherent but redundant symmetries in the search space and the corresponding solution space without affecting soundness or completeness of the methodology (Section 6.2).

MOTIVATING EXAMPLE
We demonstrate the benefits of commutativity for proof simplification using the parameterized program P_notify shown in Fig. 2. This program models a distributed system, in which one thread (called notifier) generates data through some computation (lines 6-9), and broadcasts it to an unbounded number of listener threads (lines 11-13). The threads communicate via a message queue, which is here modeled via an infinite queue array along with an integer current pointing to the head of the queue (specifically, to the first invalid entry). Each listener thread joins the conversation by setting its thread-local idx variable to the value of current. The listener then continuously waits for new data to appear in the queue (line 22). When data has arrived, it reads the message from the queue (lines 23-24). In the next step, the listener checks the integrity of the received message. In particular, it checks that the received value is greater than the previous message (lines 26-28). Showing correctness of this program is non-trivial; even with ghost variables, a proof is challenging. An unbounded amount of time may pass between the moment when a message is sent by the notifier thread, and when the last listener receives it. Thus, for certain traces, one must keep track of the idx variables of unboundedly many listener threads, not just a finite subset of them. There exists, however, a subset of traces for which the correctness argument is much simpler. Namely, consider those traces where every message sent by the notifier is immediately received and checked by all listeners that have already joined the conversation (i.e., all listeners that will ever receive the message). Let us call these traces synchronous. In synchronous traces, the difficulty of reasoning about an unbounded number of messages already sent but not yet received by some listener completely disappears. At any point, there is at most one such message, and consequently, the proof has to reason only about one message.
Of course, synchronous traces make up only a small fragment of the many interleavings of the program. To show correctness of the program, we must establish that every trace is correct. Here, commutativity comes to the rescue: We observe that for many statements of the program, the order in which they are executed does not affect the outcome. We say that such statements commute with each other. We exploit this observation by repeatedly swapping commuting statements, and thereby reorder any arbitrary trace of the program to an equivalent synchronous trace. Through a meta-argument (i.e., the soundness theorem of our approach), we establish that any trace that is equivalent to a correct synchronous trace must itself be correct. Thus, it suffices for a proof to show correctness of synchronous traces, in order to conclude that the program is correct.
Consider for instance the statements last:=data (line 9) and prev:=msg (line 28). Executing these statements in either order yields the same result, i.e., the statements commute with each other. Similarly, we can argue that all statements of the notifier thread commute with the statement prev:=msg. Therefore, we consider traces to be equivalent if they differ only in the position of prev:=msg relative to adjacent statements of the notifier thread. These equivalences allow us to reorder entire iterations of the notifier thread, i.e., the computation and broadcast of new data, with respect to the statement prev:=msg. We proceed similarly with respect to the other statements of the listener thread, as well as for the statements of two different listener threads.
For some of these other statements, we must consider broader notions of commutativity. As an example, we cannot generally claim that the order in which the statements queue[current]:=data and msg:=queue[idx] are executed does not affect the outcome. Specifically, if we have current = idx, the order is in fact crucial. However, observe that the program ensures that, whenever the statement msg:=queue[idx] is executed, it actually holds that idx < current. In such contexts, the order in which the statements are executed is indeed irrelevant. Hence we can say that the statements commute within this particular program.
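The context dependence of this commutativity can be checked on concrete states. The following Python sketch is only an illustration: the dictionary-based state layout, the default value 0 for unwritten queue entries, and the names `send`, `receive` and `commute_in` are assumptions made for this example and are not taken from Fig. 2.

```python
import copy

def send(state):
    """Notifier statement queue[current] := data (hypothetical state layout)."""
    s = copy.deepcopy(state)
    s["queue"][s["current"]] = s["data"]
    return s

def receive(state):
    """Listener statement msg := queue[idx]."""
    s = copy.deepcopy(state)
    s["msg"] = s["queue"].get(s["idx"], 0)  # assumption: unwritten entries read as 0
    return s

def commute_in(state):
    """True if both execution orders yield the same state when started in `state`."""
    return receive(send(state)) == send(receive(state))

base = {"queue": {0: 7}, "current": 1, "data": 9, "idx": 0, "msg": 0}
print(commute_in(base))                # context idx < current
print(commute_in({**base, "idx": 1}))  # context idx == current
```

In the first context both orders agree, whereas for idx = current the listener reads either the old or the freshly written entry, depending on the order.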
The essential insight of commutativity reasoning is this: It suffices for a proof to cover a so-called reduction of a program, i.e., a subset of traces such that each program trace is equivalent to a trace in the reduction. In our example, the reduction is formed by the set of synchronous traces. By soundness of commutativity, we can conclude that, if the reduction is proven correct, the entire program must be correct. In this manner, our approach can verify the program P_notify by giving a proof for synchronous traces. As discussed, a proof for the set of synchronous traces is much simpler than a proof for all traces, as it does not require complex ghost state or quantified invariants.
As another example where commutativity simplifies the proof, let us consider the program P±, with the thread template shown in Fig. 3. The program has a global variable x, which is initially 0. The program uses a constant K for which we assume a fixed value. Each thread repeatedly checks if the current value of x is less than K, and if so, increments x. It asserts that x is non-zero, eventually decrements x again, and begins the loop anew.
Fig. 3. Template for P±.
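For a fixed number of threads and a fixed constant, the behaviour just described can be explored exhaustively. The Python sketch below is a hedged illustration, not part of the approach of this paper: since the figure is not reproduced here, it assumes a template with two locations, named "l0" and "l1", an atomic guarded increment leading from "l0" to "l1", a decrement leading back, and the assertion attached to "l1". The function names are hypothetical.

```python
def reachable(n, K):
    """Explicit-state exploration of the instance with n threads.

    A configuration is a pair (locations, x); the thread template is an
    assumption of this sketch (see the text above).
    """
    init = (("l0",) * n, 0)
    seen, stack = {init}, [init]
    while stack:
        locs, x = stack.pop()
        for i in range(n):
            if locs[i] == "l0" and x < K:   # assume x < K; x := x+1
                nxt = (locs[:i] + ("l1",) + locs[i + 1:], x + 1)
            elif locs[i] == "l1":           # x := x-1
                nxt = (locs[:i] + ("l0",) + locs[i + 1:], x - 1)
            else:
                continue                    # guard blocks, no step
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def assertion_holds(n, K):
    """x != 0 in every reachable configuration with some thread at l1."""
    return all(x != 0 for locs, x in reachable(n, K) if "l1" in locs)

print(assertion_holds(4, 2))
```

Such an enumeration only covers one instance at a time; it does not replace a proof for all numbers of threads.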
This program is similar to the example discussed in the introduction, yet due to the guard using the constant K, the proof is in some sense simpler: The value of a ghost variable counting the number of threads in location ℓ1 can never exceed K. Thus, we can alternatively consider the local state of K other threads as ghost state. Specifically, if a thread is in location ℓ1, and some number m (with 0 ≤ m ≤ K) of the K other threads are also in location ℓ1, we know that x ≥ m + 1, and therefore, decrementing x does not violate the assert statement in any thread: Either we have m > 0, in which case x is still positive after the decrement, or m = 0, in which case none of the threads is in location ℓ1.
It has been shown that for any value of K, a proof does indeed need to consider at least K additional threads as ghost state (and thus overall consider K + 1 threads at a time) in order to show correctness of this program [Hoenicke et al. 2017]. However, commutativity simplifies the required ghost state.
Let us investigate the commutativity in P±. Two statements x:=x-1 and x:=x-1 of different threads commute, as do two statements assume x < K; x:=x+1 and assume x < K; x:=x+1 of different threads. For the statements x:=x-1 and assume x < K; x:=x+1, the order of execution may indeed matter. But whenever it is possible to execute assume x < K; x:=x+1 followed by x:=x-1, it is also possible to execute x:=x-1 followed by assume x < K; x:=x+1 with the same effect (x is not modified), i.e., the latter sequence allows a strict superset of executions. Thus we can verify traces containing the sequence x:=x-1 followed by assume x < K; x:=x+1, and conclude that traces containing the sequence assume x < K; x:=x+1 followed by x:=x-1 are also correct.
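This one-directional form of commutativity can be checked on concrete values of x. The following Python sketch is a minimal illustration under assumptions of ours: the constant is fixed to the hypothetical value K = 3, a blocked assume statement is modelled by returning None, and the function names are not taken from the paper.

```python
K = 3  # hypothetical fixed value of the constant

def guarded_inc(x):
    """assume x < K; x := x+1. Returns None if the assume blocks."""
    return x + 1 if x < K else None

def dec(x):
    """x := x-1 (always executable)."""
    return x - 1

def run(seq, x):
    """Execute a sequence of statements; None means the sequence is blocked."""
    for st in seq:
        if x is None:
            return None
        x = st(x)
    return x

def semi_commutes(values):
    """Whenever inc;dec is executable, dec;inc is executable with the same result."""
    for x in values:
        first = run([guarded_inc, dec], x)
        second = run([dec, guarded_inc], x)
        if first is not None and first != second:
            return False
    return True

print(semi_commutes(range(-5, 10)))
print(run([guarded_inc, dec], K), run([dec, guarded_inc], K))
```

For x = K, only the order that decrements first is executable, which is why the inclusion of executions is strict.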
Analogously to the example in Fig. 1, we exploit this commutativity (or semi-commutativity) to reorder any trace of the program such that all statements of a thread are executed in a single block. For the resulting reduction of the program, it is sufficient to consider the local state of a single additional thread as ghost state, rather than K threads. If a thread is in ℓ1, and the other thread (which serves as ghost state) is also in ℓ1, we know that x ≥ 2, so x remains positive after a decrement. If the "ghost thread" is not in ℓ1, neither thread executes the assert statement. Commutativity has again simplified the ghost state required to prove correctness of the program.

PARAMETERIZED CONCURRENT PROGRAMS
A parameterized program P is given by its thread template (a control flow graph) and a set of thread-local variables, i.e., P = ⟨Loc, Δ, ℓ_init, Var_local⟩ with a finite set of locations Loc, a finite transition relation Δ ⊆ Loc × Stmt × Loc (where Stmt is the set of atomic program statements), an initial location ℓ_init ∈ Loc, and a set of thread-local variables Var_local. Any variable not in Var_local is considered global. We denote the set of global variables as Var_global.
The enabled statements enabled(ℓ) of a location ℓ are the statements st such that ⟨ℓ, st, ℓ′⟩ ∈ Δ for some ℓ′. We assume that the only case in which enabled(ℓ) contains more than one statement is the case of a branch (or loop head), and thus enabled(ℓ) = {assume c, assume ¬c} for some branching condition (or loop guard) c. This assumption is only required for the minimality of our reduction (Proposition 4.10); the soundness of our approach does not rely on it.
A parameterized program describes a family of programs. For each number of threads n ∈ N, the instance of the program with n threads is denoted by P(n). The variables of the program instance P(n) consist of the global variables, as well as indexed local variables x_i for each i ∈ {1, . . ., n} and x ∈ Var_local. The program instance P(n) uses indexed statements st:i, where st ∈ Stmt is a statement as it appears in the thread template, and the thread index i ∈ {1, . . ., n} indicates which thread executes the statement.
Traces. A thread template defines a language L over the alphabet Stmt, consisting of all sequences of statements that label any path from the initial location (regardless of which location is reached in the end).
The language of an instance P(n) of the parameterized program P is a language of traces, i.e., sequences of indexed statements. For the language L defined by the thread template of P, let L[i] be the language L where every statement st has been replaced by the indexed statement st:i. The program instance P(n) then defines the language of all traces allowed by the control flow of P:
$$L[1] \parallel L[2] \parallel \dots \parallel L[n]$$
where ∥ denotes the shuffle operation on languages.
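For finite sets of traces, the shuffle of indexed languages can be computed directly. The following Python sketch is illustrative only: it represents indexed statements as (statement, thread) pairs, uses a finite, prefix-closed fragment of the template language of Fig. 1 as a hypothetical input, and the function names are ours.

```python
def index(trace, i):
    """Replace every statement st by the indexed statement (st, i)."""
    return tuple((st, i) for st in trace)

def shuffle(u, v):
    """All interleavings of two traces, preserving the order within each."""
    if not u:
        return {v}
    if not v:
        return {u}
    return ({(u[0],) + w for w in shuffle(u[1:], v)} |
            {(v[0],) + w for w in shuffle(u, v[1:])})

def shuffle_languages(langs):
    """Shuffle product of finitely many finite languages (sets of tuples)."""
    result = {()}
    for lang in langs:
        result = {w for r in result for t in lang for w in shuffle(r, t)}
    return result

def instance_language(template_lang, n):
    """Traces of the instance with n threads, for a finite template language."""
    return shuffle_languages([{index(t, i) for t in template_lang}
                              for i in range(1, n + 1)])

# Finite fragment of the template language: at most one loop iteration.
fragment = {(), ("x:=x+1",), ("x:=x+1", "x:=x-1")}
print(len(instance_language(fragment, 2)))
```

The actual template language is infinite whenever the template contains a loop, so this enumeration only covers a bounded fragment.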
Semantics. We assume that each statement st ∈ Stmt has an associated semantics ⟦st⟧, given by a binary input/output relation between valuations of the program variables. In particular, the semantics of assignment statements x := e and assume statements assume c is as one would expect.
We extend this semantics to indexed statements. Executing the indexed statement st:i may modify the global variables as well as the indexed local variables x_i, but leaves local variables of other threads unmodified. Formally, we define the semantics of an indexed statement as follows:
$$\llbracket st{:}i \rrbracket = \{\, (\nu_1, \nu_2) \mid (\nu_1|_i, \nu_2|_i) \in \llbracket st \rrbracket \text{ and } \nu_1(x_j) = \nu_2(x_j) \text{ for all } x \in Var_{local},\ j \neq i \,\}$$
where ν1, ν2 are valuations of the variables of P(n), and ν|_i is the unique valuation of the program variables such that ν|_i(x) = ν(x_i) for local variables x and ν|_i(y) = ν(y) for global variables y.
Based on these semantics of atomic statements, we define the semantics of each program instance. A configuration of P(n) is a pair ⟨ℓ⃗, ν⟩, where ℓ⃗ = ⟨ℓ_1, . . ., ℓ_n⟩ ∈ Loc^n denotes the control locations of the running threads, and ν is a valuation of the variables of the program instance P(n). We say that the configuration ⟨ℓ⃗, ν⟩ is initial if ℓ⃗ = ⟨ℓ_init, . . ., ℓ_init⟩. Let ⟨ℓ⃗, ν⟩ be a configuration, such that ⟨ℓ_i, st, ℓ′_i⟩ ∈ Δ is a transition of the thread template, and such that there is a successor valuation ν′ with (ν, ν′) ∈ ⟦st:i⟧. From this configuration, the program can execute st:i. Thread i moves to control location ℓ′_i, whereas all other threads remain at the same location (ℓ′_j = ℓ_j for all j ≠ i). For a trace τ, we write ⟨ℓ⃗, ν⟩ →_τ ⟨ℓ⃗′, ν′⟩ if there exists a corresponding sequence of configurations (called an execution) leading from ⟨ℓ⃗, ν⟩ to ⟨ℓ⃗′, ν′⟩.
Synchronous Statements. Our approach uses a particular kind of statements, so-called synchronous statements [Hoenicke et al. 2017]. A thread can execute a synchronous statement to (atomically) update the local variables for all (unboundedly many) other threads. Synchronous statements have the form for j ≠ i : x_j := e, where j and i are symbolic indices representing the thread whose variables are updated (j) and the thread that executes the statement (i). The updated variable x must be a local variable (x ∈ Var_local). The expression e may refer to global variables, as well as local variables indexed by i or j. Additionally, we allow e to refer to special variables pc_i and pc_j, which represent the current control locations of thread i resp. j.
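The effect of a synchronous statement on a concrete instance can be sketched as follows. This Python fragment is a hedged illustration of the informal description above: the state representation (dictionaries for local variables and control locations), the function name `exec_sync`, and the variable name `seen` are assumptions of ours, not notation from Hoenicke et al. [2017].

```python
def exec_sync(local_vars, glob, pcs, i, var, expr):
    """Thread i atomically executes: for j != i, var_j := expr.

    `expr` receives the global valuation, the local variables of the
    executing thread i, those of the updated thread j, and both control
    locations. All reads use the state before the update.
    """
    updated = {j: dict(vs) for j, vs in local_vars.items()}
    for j in local_vars:
        if j != i:
            updated[j][var] = expr(glob, local_vars[i], local_vars[j],
                                   pcs[i], pcs[j])
    return updated

# Hypothetical example: every other thread adds the global x to its local 'seen'.
local_vars = {1: {"seen": 0}, 2: {"seen": 1}, 3: {"seen": 2}}
pcs = {1: "l0", 2: "l1", 3: "l0"}
after = exec_sync(local_vars, {"x": 5}, pcs, 2, "seen",
                  lambda g, li, lj, pc_i, pc_j: lj["seen"] + g["x"])
print(after)
```

The local variables of the executing thread are left unchanged, and the update applies to all other threads of the instance, however many there are.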

Correctness and Proofs.
A specification for a parameterized program P consists of a precondition pre, and a partial map assert from program locations to formulae over the program variables. Both the precondition pre and an assertion assert(ℓ) may refer to global and local variables. The program P satisfies the specification ⟨pre, assert⟩ if for all numbers of threads n, the following holds: For every execution ⟨ℓ⃗^(1), ν_1⟩, . . ., ⟨ℓ⃗^(m), ν_m⟩ of the program instance P(n) that starts in an initial configuration, such that ν_1|_j ⊨ pre for all j ∈ {1, . . ., n}, and for every thread j such that assert(ℓ^(m)_j) is defined, we have ν_m|_j ⊨ assert(ℓ^(m)_j). In the remainder of the paper, we always assume that a parameterized program is accompanied by a specification ⟨pre, assert⟩. For examples, we annotate the specification in the thread template (as in Fig. 1). We simply say that P is correct if P satisfies this specification.
As an aside, our approach can be extended to more general notions of (safety) specifications, e.g., a set of error states given by a generator set as in [Hoenicke et al. 2017]. Such specifications allow, for instance, a direct encoding of mutual exclusion. However, since this is orthogonal to our contributions, we focus here on the simpler notion of specification as defined above.
In Section 6, as well as in several examples, we consider a particular notion of proofs for parameterized programs: Ashcroft invariants. An Ashcroft invariant is a formula of the form
$$\forall i_1, \dots, i_k .\ \Big( \bigwedge_{1 \le r < s \le k} i_r \neq i_s \Big) \rightarrow \varphi$$
where φ is a quantifier-free formula, whose variables range over the global program variables, indexed local variables x_{i_r} (for x ∈ Var_local, r ∈ {1, . . ., k}) and variables pc_{i_r} (for r ∈ {1, . . ., k}) representing the current control location of thread i_r. The quantified variables symbolically represent k threads of the program. The premise ⋀_{1≤r<s≤k} i_r ≠ i_s expresses the fact that i_1, . . ., i_k indeed refer to k distinct threads. Thus, the conclusion φ expresses a relation between the global variables, as well as the locations and local variables of any subset of k distinct threads of the program. We call the number of quantified variables k the width of the Ashcroft invariant.
An Ashcroft invariant is inductive for the parameterized program P if it is an inductive invariant for every instance P(n), assuming the precondition pre initially holds for every thread. Since we only consider inductive Ashcroft invariants, we omit the adjective from now on.
Finally, let us define what it means for an Ashcroft invariant to prove correctness of a parameterized program P. We say that an Ashcroft invariant ∀ i_1, . . ., i_k . (⋀_{r≠s} i_r ≠ i_s) → φ is safe if it is inductive, and for every location ℓ where assert(ℓ) is defined, the following entailment holds: the invariant, together with the fact that a thread is at location ℓ, entails the assertion assert(ℓ) for that thread. If a safe Ashcroft invariant for a program P and a specification ⟨pre, assert⟩ exists, then P satisfies the specification ⟨pre, assert⟩. However, the converse is not true.
Other Program Models. The model of parameterized programs is a natural model for certain classes of concurrent programs, e.g., GPU code and distributed protocols. More generally, most classes of concurrent programs can be encoded as parameterized programs. Hence our theoretical results can be expected to hold for a wide class of concurrent programs. In practical terms, such encodings may present a challenge for verification algorithms. For example, for structured parallel programs with sophisticated dependence graphs implemented using fork/join, the best practice would not be to encode the program in this model and try to verify it with our verification algorithm. The main burden in these cases is that the inductive invariant for the program may have to recover part or all of the structure lost from the original model, and this can be unreasonable to expect from an automated invariant generator. Smaller extensions of the model, such as allowing a finite number of different thread templates, as in Fig. 2, are more straightforward and are indeed supported by our implementation.

REDUCTIONS OF PARAMETERIZED PROGRAMS
In this section, we discuss commutativity-based reductions. We introduce the underlying formalism, which has previously been used for fixed-thread programs, and discuss how it generalizes to parameterized programs. Then we present our first key contribution: a finite representation of an infinite family of commutativity-based reductions.
To begin, let us quickly summarize the basics of commutativity theory. The most fundamental notion is a commutativity relation between statements. Specifically, in this work we say that two (indexed) statements st_1:i and st_2:j (with i ≠ j) commute, denoted st_1:i ↷↷ st_2:j, if executing them in either order yields the same semantics, i.e., ⟦st_1:i st_2:j⟧ = ⟦st_2:j st_1:i⟧. We discuss broader notions of sound commutativity in Section 7.
The commutativity relation over statements defines an equivalence relation on traces. We say that two traces τ1 and τ2 are equivalent if τ2 can be derived from τ1 by repeatedly swapping adjacent commuting statements. Note that, by repeated application of the definition of commutativity, equivalent traces have the same semantics. Consequently, it suffices to show that one trace satisfies a specification in order to conclude that all equivalent traces are correct as well.
Motivated by this observation, one can introduce the concept of a reduction. A set of traces L′ is a reduction of another set of traces L if L′ ⊆ L, and for each trace in L there exists an equivalent trace in L′. It follows that if we prove that all traces in a reduction L′ are correct, we can soundly conclude that all traces in the set L are correct. Specifically, we are interested in reductions of the language of traces given by a program instance P(n) for a fixed number of threads n.
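For finite sets of traces, both the equivalence of traces and the reduction property can be checked by exhaustive search. The Python sketch below is a minimal illustration under stated assumptions: indexed statements are (statement, thread) pairs, the hypothetical relation `commute` lets exactly the statements of different threads commute, and the function names are ours.

```python
from collections import deque

def commute(a, b):
    """Assumed commutativity relation: statements of different threads commute."""
    return a[1] != b[1]

def equivalence_class(trace, commute=commute):
    """All traces reachable by repeatedly swapping adjacent commuting statements."""
    start = tuple(trace)
    seen, queue = {start}, deque([start])
    while queue:
        t = queue.popleft()
        for k in range(len(t) - 1):
            if commute(t[k], t[k + 1]):
                s = t[:k] + (t[k + 1], t[k]) + t[k + 2:]
                if s not in seen:
                    seen.add(s)
                    queue.append(s)
    return seen

def is_reduction(reduced, full, commute=commute):
    """Subset of `full` containing an equivalent trace for each trace of `full`."""
    reduced, full = set(reduced), set(full)
    return reduced <= full and all(
        equivalence_class(t, commute) & reduced for t in full)

sequential = (("x:=x+1", 1), ("x:=x-1", 1), ("x:=x+1", 2), ("x:=x-1", 2))
all_orders = equivalence_class(sequential)
print(len(all_orders), is_reduction({sequential}, all_orders))
```

Here a single sequential trace serves as the representative of all interleavings of the two threads.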

A Family of Reductions
It has been shown that commutativity-based reduction can lead to simpler proofs for concurrent programs with a fixed number of threads. In particular, the proof for a (suitably chosen) reduction of a program may be within reach of algorithmic verification, whereas a proof for the entire program may not.
Example 4.1. Let us consider the program $P_\pm$ as discussed in the introduction, with the template shown in Fig. 1. For any fixed number of threads $n$, the instance $P_\pm(n)$ is correct. In this case, the proof for the (unreduced) program is comparatively simple: The instance $P_\pm(n)$ can be proven correct with the assertions $x \geq 0$, $x \geq 1$, ..., up to $x \geq n$. Note however that the proof size, i.e., the required number of assertions, grows with the number of threads. Since the increment and decrement of x commute, as do two increments resp. two decrements, we can apply commutativity to simplify the proof. We define, for each number of threads $n$, a reduction $R_\pm(n)$: a set of traces that contains, for each equivalence class of traces in $P_\pm(n)$, the representative trace in which each thread executes all its statements in a single block. Thus $R_\pm(n)$ can be written as $R_\pm(n) = \tau_1 \tau_2 \ldots \tau_n$, where $\tau_i = (\texttt{x:=x+1}{:}i\ \ \texttt{x:=x-1}{:}i)^{*}\,(\varepsilon + \texttt{x:=x+1}{:}i)$. In traces of this reduction, the value of x reaches a value $\geq 2$ only if the last statement executed by some thread $i$ is an increment without a matching decrement. In this case, x never falls below 2 again, as every future decrement is preceded by a matching increment. Consequently, the resulting reduction $R_\pm(n)$ can be proven correct with only the assertions $x \geq 0$, $x \geq 1$, $x \geq 2$, for any number of threads $n$.
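The argument of Example 4.1 can be replayed on concrete traces. The following sketch tracks the strongest lower bound on x that is expressible with the three assertions x ≥ 0, x ≥ 1, x ≥ 2. It assumes that the property to be proven is that x never drops below 0 (the template of Fig. 1 is not reproduced here), and the helper names `check`, `block` and `reduced_traces` are hypothetical.

```python
from itertools import product

CAP = 2  # only the assertions x >= 0, x >= 1, x >= 2 are available

def check(trace):
    """Track the strongest provable lower bound on x, capped at CAP.
    Returns False if x >= 0 can no longer be established."""
    bound = 0
    for stmt, _thread in trace:
        bound = min(bound + 1, CAP) if stmt == "inc" else bound - 1
        if bound < 0:
            return False
    return True

def block(i, iterations, trailing_inc):
    """One block (inc:i dec:i)^iterations, optionally followed by inc:i."""
    t = [("inc", i), ("dec", i)] * iterations
    return t + [("inc", i)] if trailing_inc else t

def reduced_traces(n, max_iterations):
    """Traces of the form tau_1 ... tau_n with bounded loop iterations."""
    choices = list(product(range(max_iterations + 1), [False, True]))
    for combo in product(choices, repeat=n):
        yield [s for i, (m, inc) in enumerate(combo, start=1)
               for s in block(i, m, inc)]

# An interleaving outside the reduction: three increments, then three decrements.
interleaved = [("inc", 1), ("inc", 2), ("inc", 3),
               ("dec", 1), ("dec", 2), ("dec", 3)]
```

Every bounded trace of the reduction passes the check, while the interleaved trace does not. The failure does not mean that the interleaved trace violates the property; it only shows that the three capped assertions are too weak to annotate it.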
In this work, we are concerned with proof simplification for parameterized concurrent programs, with an unbounded number of threads. Thus, we are searching for one uniform proof that proves a program P correct for all numbers of threads $n$. A key insight is that commutativity can similarly lead to proof simplification in this setting.
Specifically, suppose that for each $n$, we have proven correctness of a reduction $R(n)$ of the program instance $P(n)$ with $n$ threads. Then, by soundness of commutativity for a fixed number of threads, we can conclude that each $P(n)$ is correct, i.e., the parameterized program P is correct. Furthermore, if the proofs of reductions for different $n$ have a similar structure, we can hope to find one uniform, finite proof for the parameterized program P.
Example 4.2 (continued from Example 4.1). Let us consider again the program $P_\pm$, and the claim that each reduction $R_\pm(n)$ can be proven correct with the assertions $x \geq 0$, $x \geq 1$ and $x \geq 2$. Specifically, each trace in the reduction $R_\pm(n)$ can be given a correctness proof (an annotation of the trace) using the following Hoare triples, instantiated for all $i \in \{1, \ldots, n\}$:
The proof simplification is significant: Without reduction, a proof of the program $P_\pm$ requires a ghost variable that counts the number of threads that have incremented but not yet decremented x.
Up to this point, the basis for our considerations has been an infinite family of reductions $(R(n))_{n \in \mathbb{N}}$. In order to arrive at an effective proof method for parameterized programs, one crucial step is missing: We need a way to effectively construct a finite representation of this family.

Parameterized Reductions
The key insight behind our first contribution is this:
Observation 4.3. For every parameterized program P, there exists an infinite family of reductions $(R(n))_{n \in \mathbb{N}}$ such that the entire family can again be represented as a parameterized program.
Representing a family of reductions as a parameterized program enables us to reuse the many mature existing methods for verification of parameterized programs, and to combine them with commutativity-based reduction.
For a fixed $n$, a finite automaton recognizing a reduction $R(n)$ can be constructed using the concept of sleep sets [Farzan et al. 2022]: In addition to the control locations of the threads, the sleep set automaton tracks a set of (indexed) program statements, the eponymous sleep set. In each state, the sleep set automaton prevents transitions labeled by statements in the state's sleep set. After each transition, the sleep set is updated, i.e., statements are removed and added depending on their commutativity with the statement labeling the transition. Consider the illustration of an automaton for $R(2)$ in Fig. 4. Initially, the sleep set is empty. When traversing the edge labeled $\mathit{st}_2{:}2$, we add $\mathit{st}_1{:}1$ to the sleep set, because it has a smaller thread index than $\mathit{st}_2{:}2$, and because we assume in this example that $\mathit{st}_1{:}1$ and $\mathit{st}_2{:}2$ commute. In the next state, as $\mathit{st}_1{:}1$ is in the sleep set, we prune the corresponding edge. Any trace that would be accepted via this edge is equivalent to a trace where $\mathit{st}_2{:}2$ and $\mathit{st}_1{:}1$ are swapped, and this trace is already accepted by a run via the left-most $\mathit{st}_1{:}1$-transition.
If we assume that $\mathit{st}_1{:}1$ also commutes with $\mathit{st}_3{:}2$, we can keep $\mathit{st}_1{:}1$ in the sleep set after traversing $\mathit{st}_3{:}2$, and again prune the corresponding edge in the right-most state. If on the other hand $\mathit{st}_1{:}1$ and $\mathit{st}_3{:}2$ did not commute, we would instead remove $\mathit{st}_1{:}1$ from the sleep set and preserve the transition. The sleep set technique as explained here can be applied for any fixed number of threads $n$. However, each $n$ yields a different language, and this approach does not lead to a uniform representation for the family of reductions. The key insight which enables such a uniform finite representation is the observation that, for correctness, we are only interested in feasible traces. Thus, we can encode the family of reductions through an instrumentation of the original program's thread template. For each thread instance $i$, we add a boolean variable $\mathit{sleep}_i$, which keeps track of whether thread $i$ (resp. its currently enabled statements) is in the sleep set. Consequently, when $\mathit{sleep}_i$ is true, thread $i$ must not make a move. In other words, any trace where thread $i$ makes a move while $\mathit{sleep}_i$ is true must be infeasible. By shifting from an explicit mechanism (computing sleep sets, and removing edges from an automaton) to a symbolic approach, we thus arrive at a uniform finite representation of the family of reductions.
This instrumentation deviates slightly from the explanation above. Instead of tracking statements in the sleep set, we track the threads that would execute these statements. I.e., the variable $\mathit{sleep}_i$ of a thread $i$ is true if the thread's next enabled statements $\mathit{enabled}(\ell_i)$ are in the sleep set. (In case multiple statements are enabled, i.e., at a branch or loop head whose enabled statements assume a condition and its negation, either both statements are in the sleep set, or neither is.) This shift from tracking individual letters in the sleep set to tracking the threads pays off in terms of the complexity added to the state space of the instrumented program: We need only a single boolean variable, rather than one variable for each statement that appears in the thread template.
We define a formula which expresses that thread $j$ is in a control location whose enabled statements commute with a given statement $\mathit{st}{:}i$ executed by a different thread $i$.
The commutativity test is used in the instrumentation of statements.
Definition 4.5 (Instrumented Statements). Let st be a statement. We define the instrumented statement of st as the atomically executed block of statements
An instrumented statement first checks if its thread is in the sleep set, and if so, blocks. Otherwise, i.e., if the statement is allowed to execute, the instrumentation performs the update of the sleep set through a synchronized statement [Hoenicke et al. 2017] that modifies the sleep variables of all (unboundedly many) other threads $j$. Finally, the original statement st executes.
Fig. 5. Template for the sleep-instrumented program $P^{\mathit{sleep}}_\pm$.
Recall that here, the symbols $i$ and $j$ are part of the syntax of synchronized statements rather than logical variables: The symbol $i$ represents the thread executing the statement, and $j$ represents any other thread.
Note that the instrumentation refers to a thread-local integer variable id. We add such a (nondeterministically initialized) ID variable to the thread template to serve as a tie-break. If two threads can move, but allowing both to execute statements would result in equivalent traces, we must identify which thread should go first. We assume that all thread IDs are pairwise distinct. Thus, these thread IDs allow us to distinguish the thread instances and to decide: If the enabled statements of two threads commute, the thread with a smaller ID moves first. Figure 5 shows the thread template for the sleep-instrumented program $P^{\mathit{sleep}}_\pm$ corresponding to the program $P_\pm$ shown in Fig. 1. Since all statements of $P_\pm$ commute, the commutativity tests $\mathit{comm}(j, \texttt{x:=x+1}{:}i)$ and $\mathit{comm}(j, \texttt{x:=x-1}{:}i)$ both resolve to the formula $\mathit{pc}_j = \ell_0 \lor \mathit{pc}_j = \ell_1$.
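The effect of the sleep instrumentation can be explored on small instances. The sketch below is an illustration under stated assumptions, not the paper's definition: the thread template is assumed to alternate between an increment at ℓ0 and a decrement at ℓ1, the sleep update is assumed to be "thread j sleeps afterwards if it was asleep or has a smaller ID, and its enabled statements commute with the executed statement", and the parameter `commute` is a hypothetical stand-in for the commutativity test.

```python
def enabled(pc):
    # Assumed template: l0 --x:=x+1--> l1 --x:=x-1--> l0
    return "inc" if pc == 0 else "dec"

def traces(n, max_len, instrumented, commute=lambda pc_j, st: True):
    """Enumerate all traces up to max_len, with or without sleep guards."""
    ids = list(range(n))  # pairwise distinct thread IDs
    found = []

    def explore(pcs, sleep, trace):
        found.append(tuple(trace))
        if len(trace) == max_len:
            return
        for i in range(n):
            if instrumented and sleep[i]:
                continue  # guard: a sleeping thread blocks
            st = enabled(pcs[i])
            # Assumed update of the sleep variables of all other threads j.
            new_sleep = tuple(
                sleep[j] if j == i
                else (sleep[j] or ids[j] < ids[i]) and commute(pcs[j], st)
                for j in range(n))
            new_pcs = pcs[:i] + (1 - pcs[i],) + pcs[i + 1:]
            explore(new_pcs, new_sleep, trace + [(st, i)])

    explore((0,) * n, (False,) * n, [])
    return found

def projections(trace, n):
    """Per-thread subsequences; under full commutativity they identify the class."""
    return tuple(tuple(s for s, t in trace if t == i) for i in range(n))
```

For two threads and traces of length at most 4, the uninstrumented program has 31 traces in 15 equivalence classes, and the instrumented variant retains exactly one trace per class, in line with the minimality claim for the fully commutative case.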
The sleep-instrumented program P sleep serves as uniform finite representation for the family of reductions R () ∈N .To formalize this relationship, let  be the inverse of , i.e., a mapping between statements such that  ( (st)) = st.We extend this mapping to indexed statements, traces, and sets of traces in the natural way.The following key result formally expresses that the sleep-instrumented program describes a family of reductions R () ∈N of the original program, modulo feasibility.Theorem 4.8 (Reduction).Let Feas be the set of all feasible traces.For each number of threads , the set of traces  (P sleep () ∩ Feas) is a reduction of P () ∩ Feas, i.e., the feasible traces of P ().
We consider only feasible traces, since the reduction works based on the guards (¬sleep) added to each transition. This is necessary to describe the reduction as a parameterized program: Only when we fix the number of threads $n$ can the sleep guards and updates be evaluated.
Theorem 4.9 (Soundness). The sleep-instrumented program $P^{\mathit{sleep}}$ is correct iff P is correct.
The reduction achieved by the instrumentation is minimal: We retain only one representative per equivalence class, and hence a strict subset cannot be a reduction. This means that we do not unnecessarily burden the verification with the proof of redundant traces; the instrumentation fully realizes the benefit of commutativity.
Proposition 4.10 (Minimality). For every feasible trace $\tau$ of P, the sleep-instrumented program $P^{\mathit{sleep}}$ has exactly one feasible trace $\tau'$ such that the trace of P obtained from $\tau'$ by undoing the instrumentation is equivalent to $\tau$.
As demonstrated in Section 2, there exist programs such that no proof of the program without non-trivial ghost state exists, but where some reduction of the program has a simple proof. We investigate this phenomenon for the sleep-instrumented program. To make this precise, we fix Ashcroft invariants as our notion of proof, and consider a simple example.
Example 4.11. Consider again the program $P_\pm$, with the template shown in Fig. 1. There does not exist a safe Ashcroft invariant of any width for the program $P_\pm$ [Hoenicke et al. 2017]. Intuitively, the invariant would have to express the information that the value of the global variable x is always greater than or equal to the number of threads that have executed the increment but not yet the decrement. Ashcroft invariants cannot express this information (for a formal argument, see [Hoenicke et al. 2017]).
Figure 6 shows a safe Ashcroft invariant (of width 2) for the sleep-instrumented program $P^{\mathit{sleep}}_\pm$ (shown in Fig. 5). This invariant uses the fact that, in the traces of the reduction, if the value of x exceeds 2, it never falls below 2 again. Any trace of the original program is equivalent to a trace in the reduction, since increments and decrements commute and can be arbitrarily reordered.
Let us examine some traces of P ± sleep (2) to see how the Ashcroft invariant proves the correctness of traces in R ± (2) as well as outside R ± (2) by using the variables sleep  and sleep  .We assume that id 1 < id 2 .Given a trace of P ± sleep (2), we instantiate  := 1 and  := 2 in the Ashcroft invariant, insert concrete values for pc 1 and pc 2 , and simplify the formula to get an inductive annotation of the trace.For instance, the trace x:=x+1: 1 x:=x+1: 2 x:=x-1: 1 of P ± (2) is not included in the reduction R ± (2).We get the following annotation for the corresponding instrumented trace: {x ≥ 0}  (x:=x+1): Consider in particular the last Hoare triple.Since sleep 1 holds, and  (x:=x-1): 1 assumes ¬sleep 1 , the last statement cannot be executed, i.e., the trace is infeasible (hence, any postcondition holds afterwards).By contrast, consider the annotated trace corresponding to x:=x+1: 1 x:=x+1: Note again the last Hoare triple.The assumption ¬sleep 2 by the statement  (x:=x-1): 2 together with the precondition ensures that x ≥ 2, and thus x ≥ 1 still holds after the decrement.The final postcondition x ≥ 1 however does not prevent us from extending the trace with the statement  (x:=x-1): 1 , yielding the postcondition x ≥ 0. While the resulting trace would not correspond to a trace in the reduction R ± (2), the Ashcroft invariant does not prove its infeasibility.Instead, it simply proves that the trace satisfies the specification.
Even in cases where a safe Ashcroft invariant of some width $k$ exists, sleep instrumentation can simplify the proof. Specifically, sleep instrumentation can reduce the minimum width of a safe Ashcroft invariant.
Example 4.12.Consider the program P ± with the template shown in Fig. 3, where x is a global integer variable.Given a fixed value for the constant , a safe Ashcroft invariant for this program must have at least width  + 1 [Hoenicke et al. 2017].However, the Ashcroft invariant shown in Fig. 6 has width 2, and is safe for the sleep-instrumented program P ± sleep , for every value of .
The following theorem states that, if we already have a proof (i.e., a safe Ashcroft invariant) for the original program P, there also exists a safe Ashcroft invariant for the sleep-instrumented program. Hence, more (and by Example 4.11, strictly more) programs can be proven correct with our instrumentation than without. Additionally, the theorem shows that a proof of the sleep-instrumented program $P^{\mathit{sleep}}$ need never be more complicated than a proof of the original program P; and in cases such as Example 4.12, it may be strictly simpler.
Theorem 4.13 (Conservative Extension). Every safe Ashcroft invariant for a program P is a safe Ashcroft invariant for the corresponding sleep-instrumented program $P^{\mathit{sleep}}$.

REDUCTIONS BEYOND SEQUENTIAL COMPOSITION
Up to this point, we have considered a very restricted class of reductions based on thread ordering: A thread $i$ could only change $\mathit{sleep}_j$ from false to true if $\mathit{id}_j < \mathit{id}_i$. If all statements of different threads commute, the resulting reduction is the sequential composition of threads: As soon as a thread with a higher ID takes a step, all threads with lower ID are "put to sleep" and never awakened again. If we do not have total commutativity, the reduction overapproximates sequential composition.
Recall the program $P_{\mathit{notify}}$ discussed in Section 2 (as shown in Fig. 2). For this program it is crucial to align the "sends" (i.e., writes to the queue array) in the notifier thread with all the "receives" (i.e., reads from the queue array) in the listener threads in order to find a simple proof. The (approximation of) sequential composition would not provide sufficient opportunity for simplification. Therefore, in this section, we widen our view to consider the larger class of lexicographical reductions [Farzan et al. 2022], which have been shown to be practically useful reductions for program verification.
Previous work [Farzan et al. 2022] uses preference orders to describe different reductions of fixed-thread programs. A preference order is a total order over program traces (or, more generally, words over some alphabet). It can be used to define a reduction as follows:
Definition 5.1 (Definition 4.2 in [Farzan et al. 2022]). Let $L$ be a language over an alphabet $\Sigma$, and let $\preceq$ be a total order over $\Sigma^*$. The reduction of $L$ induced by $\preceq$ is denoted $\mathit{red}_{\preceq}(L)$ and contains, for each equivalence class, only the minimal trace wrt. the preference order.
In this work, we focus on the class of positional lexicographic preference orders [Farzan et al. 2022]. Positional lexicographic preference orders are a generalization of lexicographic orders over program traces, where the underlying order on statements may differ depending on the current program locations of all threads. The reductions induced by positional lexicographic preference orders are called lexicographical reductions.
We extend the concept of (positional lexicographic) preference orders to parameterized programs.(we omit the superscript ).
A parameterized preference order is thus given by the choice of the underlying ordering of threads (all statements of the same thread are ordered the same). As the threads move to different control locations, the ordering of threads assigned by a parameterized preference order may change. Thus the reduction may differ significantly from the sequential composition of threads.
Definition 5.3 (Pairwise Preference Order).Let  ⊆ Loc 2 be a total and transitive relation.The pairwise preference order induced by  is the parameterized preference order (≼  ) ∈N such that Thus, the ordering of threads ,  wrt. a pairwise preference order only depends on the control locations ℓ  , ℓ  of  and  in the tuple ì ℓ.The locations of other threads do not play a role.If the pair ⟨ℓ  , ℓ  ⟩ is in , and the reversed pair ⟨ℓ  , ℓ  ⟩ is not in , we prefer thread .If both the pairs ⟨ℓ  , ℓ  ⟩ and ⟨ℓ  , ℓ  ⟩ are in , we prefer the thread with a smaller ID.By totality of  (i.e., ⟨ℓ, ℓ ′ ⟩ ∈  ∨ ⟨ℓ ′ , ℓ⟩ ∈  for all ℓ, ℓ ′ ∈ Loc), one of two compared threads must always be preferable over the other.
Proof. We have to show that for every $n$ and $\vec{\ell} \in \mathit{Loc}^n$, the induced relation $\preceq_{\vec{\ell}}$ is a total order over the set of thread indices $\{1, \ldots, n\}$.
Reflexivity. Follows from totality, which is shown below. □
Section 4 considers the special case in which the underlying relation is all of $\mathit{Loc}^2$, such that we have $i \preceq_{\vec{\ell}} j \iff \mathit{id}_i \leq \mathit{id}_j$. The class of pairwise preference orders also includes other interesting orders.
Example 5.5 (Lockstep Order).For each ℓ ∈ Loc, let  (ℓ) be the minimum length of a path (in the thread template) from the initial location to ℓ.We define the transitive and total relation  = { ⟨ℓ, ℓ ′ ⟩ |  (ℓ) ≤  (ℓ ′ ) }.The induced pairwise preference order mimics lock-step execution: Whenever a thread  has "fallen behind" a thread  (i.e.,  (ℓ  ) <  (ℓ  )), thread  is preferred over thread  and is thus allowed to "catch up".When the locations of both threads have the same depth, i.e., ⟨ℓ  , ℓ  ⟩ ∈  and ⟨ℓ  , ℓ  ⟩ ∈ , the thread with the smaller ID is preferred and takes the next step.
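The preference rule of a pairwise preference order and the lockstep relation of Example 5.5 can be sketched as follows. The names `depths`, `lockstep_relation` and `prefers`, as well as the three-location template, are hypothetical and chosen for illustration only.

```python
from collections import deque

def depths(edges, initial):
    """Minimum path length from the initial location to each location (BFS)."""
    depth = {initial: 0}
    queue = deque([initial])
    while queue:
        loc = queue.popleft()
        for src, dst in edges:
            if src == loc and dst not in depth:
                depth[dst] = depth[loc] + 1
                queue.append(dst)
    return depth

def lockstep_relation(edges, initial):
    """All pairs (a, b) of locations where a is at most as deep as b."""
    d = depths(edges, initial)
    return {(a, b) for a in d for b in d if d[a] <= d[b]}

def prefers(i, j, locs, ids, rel):
    """True if thread i is preferred over thread j in the given configuration."""
    forward = (locs[i], locs[j]) in rel
    backward = (locs[j], locs[i]) in rel
    if forward and backward:
        return ids[i] < ids[j]  # tie: the smaller ID wins
    return forward

# Hypothetical template with a loop l0 -> l1 -> l2 -> l0.
edges = [("l0", "l1"), ("l1", "l2"), ("l2", "l0")]
rel = lockstep_relation(edges, "l0")
```

In a configuration where thread 0 is at l2 and thread 1 is still at l0, thread 1 has fallen behind and is preferred; when both threads are at the same depth, the smaller ID decides.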
Let us once again consider the program $P_{\mathit{notify}}$. For this program, the reduction which admits the simple proof discussed in Section 2 is induced by lockstep order.
The construction of our instrumented program P sleep can be generalized to arbitrary pairwise preference orders.To this end, and for the remainder of the paper, let  ⊆ Loc 2 be a total and transitive relation.In order to represent the reduction wrt.any pairwise preference order again as a parameterized program, we define: The preference test pref (, ) evaluates to true if, in the current program configuration ⟨ ì ℓ, ⟩, our pairwise preference order prefers the statements of thread  over the statements of thread , i.e., if  ⪯ ì ℓ .We modify the instrumentation of statements (Definition 4.5) to use the preference test.Definition 5.7 (Instrumented Statement with Preference Test).Let st be a statement.We define the instrumented statement  (st) as the atomically executed block of statements Our results in Section 4 (Theorems 4.8, 4.9 and 4.13 and Proposition 4.10) still hold for the modified instrumentation, and for every pairwise preference order.

FINDING ASHCROFT INVARIANTS FOR A REDUCTION
We apply the approach of thread-modular verification at many levels [Hoenicke et al. 2017] to find proofs of parameterized programs, in the form of Ashcroft invariants. We show how this approach can be applied to the sleep-instrumented program $P^{\mathit{sleep}}$ to verify a reduction of a parameterized program.

Thread-Modular Verification of Reductions
In thread-modular verification at many levels [Hoenicke et al. 2017], the existence of a safe Ashcroft invariant of some fixed width $k$ for a program P is encoded through a constrained Horn clause (CHC) system. This CHC system, which we denote $\mathrm{TM}(P, k)$, uses a single uninterpreted predicate symbol Inv. Its first parameter represents the global variables of the program. The remaining parameters represent, for each of $k$ different thread instances $i$, the current control location $\mathit{pc}_i$ and the thread-local variables.
We can apply an off-the-shelf CHC solver to check satisfiability of this CHC system. If the system is unsatisfiable, there does not exist a safe Ashcroft invariant of width $k$. However, this does not mean that the program is incorrect. It might simply be that every safe Ashcroft invariant has a width larger than $k$, or that there does not exist a safe Ashcroft invariant of any width, yet the program is still correct. If on the other hand the CHC system is satisfiable, we can construct an Ashcroft invariant from a solution.
Lemma 6.1 (Lemmas 1 and 3 in [Hoenicke et al. 2017]). If $\Phi_{\mathit{Inv}}$ is a solution of $\mathrm{TM}(P, k)$, then the formula $\forall i_1, \ldots, i_k.$
is a safe Ashcroft invariant (of width $k$) for the program P.
We apply the same methodology to the sleep-instrumented program $P^{\mathit{sleep}}$, yielding the CHC system $\mathrm{TM}(P^{\mathit{sleep}}, k)$. Since $P^{\mathit{sleep}}$ is again a parameterized program, no conceptual changes are required. In particular, the thread-modular CHC encoding supports the synchronized statements used by our instrumentation to update the sleep variables of all (unboundedly many) other threads [Hoenicke et al. 2017]. Figure 7 shows the resulting CHC encoding for $\mathrm{TM}(P^{\mathit{sleep}}, k)$. We call the encoding $\mathrm{TM}(P^{\mathit{sleep}}, k)$ the symbolic-sleep encoding, to distinguish it from the explicit-sleep encoding introduced in Section 6.2.
Intuitively, the clauses describe an invariant predicate Inv that must hold for any subset of $k$ distinct threads (mirroring the structure of Ashcroft invariants). The clause Initial establishes that the invariant holds initially. For any $k$ threads $i_1, \ldots, i_k$, the Inductivity clauses demand that the invariant must be preserved if any of the threads $i_1, \ldots, i_k$ makes a step, whereas Non-Interference imposes that the invariant is preserved if another thread (denoted ★) makes a step. Replacing any of the threads $i_j$ by the thread ★ yields another set of $k$ distinct threads, so we may assume that the invariant Inv holds for any of these sets. This yields the additional premises in the Non-Interference clause. Finally, Safety ensures that a solution to the CHC system describes a safe Ashcroft invariant.
Fig. 7. CHC encoding $\mathrm{TM}(P^{\mathit{sleep}}, k)$, consisting of the clauses Initial, Inductivity (for each edge $\langle \ell, \mathit{st}, \ell' \rangle \in \Delta$ and each $i \in \{1, \ldots, k\}$), Non-Interference (for each edge $\langle \ell, \mathit{st}, \ell' \rangle \in \Delta$), and Safety (for each $i \in \{1, \ldots, k\}$ and $\ell \in \mathit{Loc}$ where $\mathit{assert}(\ell)$ is defined).
Proposition 6.2. If the CHC system $\mathrm{TM}(P^{\mathit{sleep}}, k)$ is satisfiable, the program P is correct.
Proof. Follows from Lemma 6.1 and Theorem 4.9. □
In analogy to Theorem 4.13, the symbolic-sleep CHC encoding $\mathrm{TM}(P^{\mathit{sleep}}, k)$ behaves conservatively wrt. the encoding $\mathrm{TM}(P, k)$ of the original program.
Observation 6.3. Any solution to $\mathrm{TM}(P, k)$ is also a solution to $\mathrm{TM}(P^{\mathit{sleep}}, k)$. Moreover, there are cases where $\mathrm{TM}(P, k)$ has no solution, but $\mathrm{TM}(P^{\mathit{sleep}}, k)$ does.

Breaking Symmetry with the Explicit-Sleep Encoding
Despite Observation 6.3, it is not clear that a CHC solver will be faster to find a solution when applied to the symbolic-sleep encoding TM(P sleep , ) than when applied to the encoding TM(P, ).In order to gain a systematic understanding of how easy or difficult it is for a CHC solver to find a solution, let us introduce the notions of search and solution space.Definition 6.4 (Search and Solution Space).Let C be a CHC system over a single predicate symbol  ( 1 , . . .,   ) of arity .The search space Search(C) of possible solutions to C is the set of all first-order formulae Φ  ( 1 , . . .,   ) whose free variables lie in {  1 , . . .,   }.
The solution space Sol(C) denotes the subset of the search space Search(C) containing exactly all those predicates Φ  ( 1 , . . .,   ) that satisfy the given CHC system C.
A larger solution space means that a solver is more likely to find a satisfying solution to a CHC system, whereas a larger search space is indicative of potential additional effort to rule out other predicates. In particular, while sleep instrumentation does somewhat increase the search space (it introduces new variables), it leads to a significantly and qualitatively larger solution space: Most importantly, for some programs, $\mathit{Sol}(\mathrm{TM}(P, k))$ is empty while $\mathit{Sol}(\mathrm{TM}(P^{\mathit{sleep}}, k))$ is not.
Our evaluation (Section 8) shows that for some programs which can be proven without reduction, we observe a notable overhead for the instrumented version, due to the increased search space. To minimize this overhead, we further improve upon the CHC system $\mathrm{TM}(P^{\mathit{sleep}}, k)$ by decreasing the search space. The improvement is quantitative, i.e., it does not increase the expressivity of the approach, but rather serves to allow CHC solvers to find a solution faster.
We observe that solutions to the symbolic-sleep encoding $\mathrm{TM}(P^{\mathit{sleep}}, k)$ typically include redundant information due to symmetry: Since the ordering of threads expressed by the thread IDs is nondeterministic, solutions often need to make case distinctions covering all possible orderings. Intuitively, we force the solver to prove correctness of a symmetry-equivalence class of reductions. To illustrate this, consider again the Ashcroft invariant in Fig. 6. In the last two conjuncts, the invariant makes a case distinction over the ordering of thread IDs. While this is a small example, and the Ashcroft invariant in Fig. 6 is still relatively simple, in general (for larger programs, and larger $k$), such case distinctions can result in a factorial (in $k$) number of symmetric conjuncts.
In order to avoid paying this additional cost, we take advantage of the symmetry between threads. Symmetry reductions [Clarke et al. 1998] have been widely used for parameterized systems to reduce the search space of analyses. The idea behind symmetry reductions is closely connected to our observations: Instead of naïvely enumerating all possible cases of a nondeterministically chosen order, and recovering the same (or rather, symmetric) results for each case, one focuses on a single fixed order.
In our case, we fix the ordering of the $k$ threads considered by the CHC predicate symbol Inv such that we always have $\mathit{id}_1 < \ldots < \mathit{id}_k$. This allows us to simplify the CHC system. In particular, it allows us to resolve the comparisons of thread IDs in the preference test (Definition 5.6) statically.
Definition 6.5 (Explicit-sleep Preference Test). For a thread $i \in \{1, \ldots, k\}$, a second thread $j \in \{1, \ldots, k, \star\}$ and a position $j'$ of the second thread in the ordering of thread IDs, we define the explicit-sleep preference test as the formula
Recall that the underlying relation over $\mathit{Loc}^2$ is total and transitive and induces a pairwise preference order (Definition 5.3). We pass two parameters $j, j'$ for the second thread, in order to account for the arbitrary ordering (represented by $j'$) of the interfering thread (represented by $j = \star$) wrt. the $k$ other threads (represented by $i$). I.e., we could have $\mathit{id}_\star < \mathit{id}_1$ (i.e., $j' = 1$), or $\mathit{id}_1 < \mathit{id}_\star < \mathit{id}_2$ (i.e., $j' = 2$), ..., or $\mathit{id}_k < \mathit{id}_\star$ (i.e., $j' = k + 1$). In order to cover all cases, we introduce one non-interference clause for each of these $k + 1$ possible orderings. Thanks to this explicit case distinction, all comparisons between thread IDs are resolved statically. Consequently, we eliminate the thread IDs from the CHC system completely.
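The difference between the two case distinctions can be made concrete with a small counting sketch. The function names are hypothetical; the sketch only enumerates orderings and does not construct any CHC clauses.

```python
from itertools import permutations

def interfering_positions(k):
    """Orderings of the k ID-sorted threads 1..k with the interfering
    thread '*' inserted at each possible position."""
    threads = list(range(1, k + 1))
    return [threads[:p] + ["*"] + threads[p:] for p in range(k + 1)]

def symbolic_orderings(k):
    """All ID orderings of k threads that a symmetric solution may
    have to distinguish when IDs are left nondeterministic."""
    return list(permutations(range(1, k + 1)))
```

For k = 4, fixing the order of the tracked threads leaves 5 positions for the interfering thread, compared with 24 permutations of thread IDs in the symmetric setting.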
Figure 8 shows the resulting explicit-sleep CHC encoding $\mathrm{TM}^{\mathit{sleep}}(P, k)$, for a program P and width $k$. Note that the explicit-sleep encoding receives the original program P as input; the sleep instrumentation is performed as part of the encoding. Nevertheless, we semantically connect this encoding to the sleep-instrumented program $P^{\mathit{sleep}}$, in a manner analogous to Lemma 6.1.
Proposition 6.6 (Explicit-Sleep Soundness). Let $\Psi_{\mathit{Inv}}$ be a solution to $\mathrm{TM}^{\mathit{sleep}}(P, k)$. Then $\forall i_1, \ldots, i_k.\ \mathit{id}_{i_1} < \ldots$
is a safe Ashcroft invariant (of width $k$) for $P^{\mathit{sleep}}$.
Corollary 6.7. If the explicit-sleep encoding $\mathrm{TM}^{\mathit{sleep}}(P, k)$ is satisfiable, the program P is correct.
Proof. Follows from Proposition 6.6 and Theorem 4.9. □
The following proposition states that in a certain sense, the symbolic-sleep encoding and the explicit-sleep encoding are equivalent. Consequently, the explicit-sleep encoding still encodes the existence of an Ashcroft invariant of width $k$ for the sleep-instrumented program.
Proposition 6.8 (Equisatisfiability). The explicit-sleep encoding $\mathrm{TM}^{\mathit{sleep}}(P, k)$ is satisfiable iff the symbolic-sleep encoding $\mathrm{TM}(P^{\mathit{sleep}}, k)$ is satisfiable.
Proof idea.If Φ Inv is a solution for the symbolic-sleep encoding, then is a solution for the explicit-sleep encoding.If Ψ Inv is a solution for the explicit-sleep encoding, then is a solution for the symbolic-sleep encoding, where S  denotes the set of all permutations over the set {1, . . .,  }. □ The factorial explosion inherent in the case distinction over all permutations of threads is precisely the cost we seek to avoid through the explicit-sleep encoding.Because the explicit-sleep encoding does not use variables for the thread IDs, the search space Search(TM sleep (P, )) for the explicit-sleep encoding is a strict subset of the search space Search(TM(P sleep , )) for the symbolic-sleep encoding.The above proposition clarifies that we neither lose expressivity, nor do we gain qualitative proof simplification, i.e., the solution space Sol(TM(P sleep , )) is empty if and only if the solution space Sol(TM sleep (P, )) is empty.Beyond that, the solution spaces are difficult to compare, as solutions range over different sets of variables.However, symmetry reduction has been shown to be practically beneficial in many settings [Clarke et al. 1998].And indeed, Section 8 confirms empirically that the explicit-sleep encoding has significant practical benefit over the symbolic-sleep encoding when using state-of-the-art CHC solvers.

Inductive Invariants of Reduction Families
Proposition 4.10 states that the sleep-instrumented program represents a family of minimal reductions: Every equivalence class of traces is represented by a single trace in the reduction; if that representative is removed, the remaining set of traces is no longer a reduction. The intention is to not burden the verification with the proof of any redundant traces.
However, this "minimality" refers to the family of infinite-state programs $(P^{\mathit{sleep}}(n))_{n \in \mathbb{N}}$. When we fix a notion of finite proofs for the parameterized program $P^{\mathit{sleep}}$, we commit to a particular expressive power for describing this infinite family of programs. It is not clear a priori that a certain kind of proof is expressive enough to fully benefit from the minimality of the reduction. And in fact, if we consider Ashcroft invariants, we observe that the expressiveness Ashcroft invariants gain through sleep instrumentation depends crucially on the invariants' width. Specifically, for Ashcroft invariants of width 1, no expressivity is gained through the reduction.
Proposition 6.9 (Collapse at width 1). Suppose there exists an Ashcroft invariant of width 1 for the sleep-instrumented program $P^{\mathit{sleep}}$. Then there also exists an Ashcroft invariant of width 1 for the original program P.
Intuitively, the additional expressive power through sleep instrumentation can only be harnessed through relational assertions, i.e., assertions that relate the local variables (including program counter and sleep variables) of different threads. An Ashcroft invariant of width 1 does not include such relational assertions. It cannot even distinguish two threads. Hence, the Ashcroft invariant can either claim that all threads are asleep (which is unsound, as there is always at least one thread awake), or that none of the threads are asleep (i.e., there is no reduction).
By contrast, we have seen that for Ashcroft invariants of width 2 (and consequently, any higher width), we gain expressivity through sleep instrumentation. However, the fact that such invariants can benefit from reduction does not imply that they can precisely capture the infinite family of minimal reductions R () ∈N corresponding to a sleep-instrumented program P sleep . An Ashcroft invariant may simply capture overapproximations of the minimal reductions, which nevertheless allow for significant (qualitative) proof simplification (as in Example 4.11). Indeed we observe:
However, in the program P ( + 1), we can reach a configuration with the control locations ⟨ℓ 1 , . . ., ℓ 1 , ℓ 2 ⟩, such that the sleep set is empty. In particular, if the last step is the execution of st 2 : + 1, this statement does not commute with the enabled statements of all other threads, thus the sleep set is emptied. We instantiate the Ashcroft invariant ∀ 1 , . . .,   . such that it considers the first  threads ({ 1 , . . .,   } = {1, . . .,  }) in increasing order of IDs (id  1 < . . . < id   ). Equation (9) prescribes that the sleep variables sleep  1 , . . ., sleep   −1 are true, when indeed for this configuration, they are all false. Thus the configuration, while reachable, does not satisfy the Ashcroft invariant. □
Note that the key obstacle to precisely capturing the reduction in the above proof was the non-commutativity of statements st 2 : and st 2 : . We observe:
Observation 6.11. If all statements of different threads commute, an Ashcroft invariant of width 2 can capture a tight overapproximation of the reduction inherent in P sleep .
Explanation. The following Ashcroft invariant precisely captures the control flow: In other words, as soon as a thread  takes a step, all threads with smaller IDs are put to sleep and never awakened again. This is satisfied by all reachable configurations of P sleep , and the Ashcroft invariant is precise: While it may include unreachable configurations ⟨ℓ⃗, ⟩ of P sleep , such configurations are either (i) unreachable due to data constraints (not due to the reduction), or (ii) there is a reachable configuration ⟨ℓ⃗,  ′ ⟩ with the same control locations and variable values, except that  may assign more sleep variables to ⊥. The latter case is not problematic, however, because ⟨ℓ⃗, ⟩ and ⟨ℓ⃗,  ′ ⟩ satisfy the same invariants over program variables, and all executions possible from ⟨ℓ⃗, ⟩ are also possible from ⟨ℓ⃗,  ′ ⟩. □
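The control-flow rule in the explanation above can be illustrated with a small counting experiment. The following is a minimal sketch, not the paper's instrumentation: it assumes that all statements of different threads commute, ignores data entirely, and uses a hypothetical helper `count_traces` that counts only complete executions (dead-end prefixes in which a sleeping thread still has steps left are not counted).

```python
def count_traces(steps, reduced):
    """Count complete executions of len(steps) threads, where thread i takes steps[i] steps.

    With reduced=True, a thread that takes a step puts all threads with
    smaller IDs to sleep for good (the rule stated in the explanation).
    """
    def go(remaining, max_id_so_far):
        if all(r == 0 for r in remaining):
            return 1  # every thread has terminated
        total = 0
        for tid, r in enumerate(remaining):
            if r == 0:
                continue  # thread tid has no statement left
            if reduced and tid < max_id_so_far:
                continue  # thread tid is asleep and never wakes up
            rest = remaining[:tid] + (r - 1,) + remaining[tid + 1:]
            total += go(rest, max(max_id_so_far, tid))
        return total

    return go(tuple(steps), -1)


if __name__ == "__main__":
    print(count_traces([2, 2, 2], reduced=False))
    print(count_traces([2, 2, 2], reduced=True))
```

Under these assumptions, the only complete execution that survives is the one in which threads run to completion in increasing order of IDs, which matches the intuition that each equivalence class keeps a single representative.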

BROADER NOTIONS OF SOUND COMMUTATIVITY
We have so far focused on one particular notion of commutativity (see Section 4): Executing commuting statements in either order must yield the same semantics. The framework of commutativity theory, however, admits more general notions of commutativity, from which verification can benefit. Specifically, we extend our approach along two lines:
Contextual Commutativity. The position of statements inside a program, and in an execution, provides a rich context which can benefit commutativity. Consider for instance the statements queue[current]:=data (line 12) and msg:=queue[idx] (line 23) from the program P notify shown in Fig. 2. These statements do not, in general, commute; in the case that idx = current, executing the statements in different orders yields different semantics.
However, it is clear from the code of P notify that, in every state where these statements are enabled, it holds that idx < current. In such contexts, the order of execution does not matter. Hence, we say that the statements commute in the context idx < current (or, more broadly, in the context idx ≠ current).
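The claim that the two statements commute exactly when the context rules out idx = current can be checked by brute force on a small state space. The sketch below is illustrative only: the state layout `(queue, current, idx, msg)`, the helper names, the fixed value `data=7`, and the bounded domain are assumptions made for this example, not part of P notify.

```python
from itertools import product

def write_queue(state, data=7):
    """queue[current] := data (hypothetical encoding of the notifier's write)."""
    queue, current, idx, msg = state
    new_queue = list(queue)
    new_queue[current] = data
    return (tuple(new_queue), current, idx, msg)

def read_queue(state):
    """msg := queue[idx] (hypothetical encoding of the listener's read)."""
    queue, current, idx, msg = state
    return (queue, current, idx, queue[idx])

def commute_in(context, size=3):
    """Check that both execution orders agree on every small state satisfying context."""
    for queue in product(range(2), repeat=size):
        for current, idx in product(range(size), repeat=2):
            state = (queue, current, idx, 0)
            if not context(state):
                continue
            if read_queue(write_queue(state)) != write_queue(read_queue(state)):
                return False
    return True

if __name__ == "__main__":
    print(commute_in(lambda s: s[2] < s[1]))   # context idx < current
    print(commute_in(lambda s: s[2] != s[1]))  # context idx != current
    print(commute_in(lambda s: True))          # no context
```

A bounded check like this is of course no proof; it only makes the role of the context concrete.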
Semi-Commutativity. Commutativity as in Section 4 defines a symmetric relation: If st 1 : commutes with st 2 : , then st 2 :  commutes with st 1 :. The semantics of both execution orders are equal, and consequently we can swap the statements in either direction to get an equivalent trace. Let us entertain a non-symmetric variation. Consider for instance the statements current:=current+1 (line 13) and assume idx < current (line 22) from the program P notify shown in Fig. 2. These statements do not commute. Specifically, the execution of assume idx < current: current:=current+1: blocks if we have idx = current, whereas the execution of current:=current+1: assume idx < current: does not. Generally, executing the increment of current first allows a strict superset of executions. Thus, it is sound to eliminate a trace in which the increment happens after the assume statement in favor of a trace with the opposite order, but the reverse is not true.
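The asymmetry between the two execution orders can be made concrete with a small executable model. This is a sketch under stated assumptions: states are pairs `(current, idx)` over a small bounded domain, a blocked `assume` is modeled by returning `None`, and `semi_commutes(first, second, ...)` checks the reading suggested by the text, namely that every execution of `first; second` is also possible as `second; first` with the same result.

```python
def increment(state):
    """current := current + 1"""
    current, idx = state
    return (current + 1, idx)

def assume_lt(state):
    """assume idx < current; returns None if the statement blocks."""
    current, idx = state
    return state if idx < current else None

def run(statements, state):
    """Execute the statements in order; None means the execution blocked."""
    for st in statements:
        if state is None:
            return None
        state = st(state)
    return state

def semi_commutes(first, second, states):
    """Every successful run of first;second must be matched by second;first."""
    for s in states:
        result = run([first, second], s)
        if result is not None and run([second, first], s) != result:
            return False
    return True

# Hypothetical bounded state space used only for this illustration.
STATES = [(current, idx) for current in range(4) for idx in range(4)]

if __name__ == "__main__":
    print(semi_commutes(assume_lt, increment, STATES))
    print(semi_commutes(increment, assume_lt, STATES))
```

The first check succeeds and the second fails on states with idx = current, which is exactly why only traces with the increment after the assume may be eliminated.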
Without these generalized notions of commutativity, the program P notify would not admit a simple proof. We thus extend our approach.
Definition 7.1 (Contextual Semi-Commutativity). Let  be a formula over global program variables and local variables indexed by  or . Statements st 1 : and st 2 :  semi-commute in the context , denoted st 1 : ↷  st 2 : , if for all states ,  ′ such that  satisfies , the following implication holds:
The general framework of commutativity theory is adapted accordingly. In place of an equivalence relation, we now consider a preorder over traces (i.e., we lose symmetry). Specifically, we say that a trace  1 is covered by a trace  2 if  2 can be derived from  1 by a sequence of swaps of adjacent statements, where for every swap from a trace  ′ (st 1 :) (st 2 : )  ′′ to a trace  ′ (st 2 : ) (st 1 :)  ′′ , we must have st 1 : ↷  st 2 :  for some  that always holds after the execution of the prefix  ′ . A reduction of a language of traces  is then a subset  ′ where for every trace  ∈ , there exists some trace  ′ ∈  ′ such that  is covered by  ′ .
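The covering preorder described above can be explored mechanically for short traces. The following sketch is a simplification: it ignores contexts (the semi-commutativity test is a plain predicate on pairs of indexed statements), represents an indexed statement as a hypothetical pair `(name, thread_id)`, and computes all traces that cover a given trace by breadth-first search over admissible adjacent swaps.

```python
from collections import deque

def covering_traces(trace, semi_commutes):
    """Return all traces derivable from trace by swapping adjacent statements
    (a, b) of different threads into (b, a) whenever semi_commutes(a, b) holds."""
    seen = {trace}
    queue = deque([trace])
    while queue:
        t = queue.popleft()
        for k in range(len(t) - 1):
            a, b = t[k], t[k + 1]
            if a[1] != b[1] and semi_commutes(a, b):
                swapped = t[:k] + (b, a) + t[k + 2:]
                if swapped not in seen:
                    seen.add(swapped)
                    queue.append(swapped)
    return seen

def example_relation(a, b):
    """Assumed relation for illustration: only 'assume' may move behind 'inc'."""
    return (a[0], b[0]) == ("assume", "inc")

if __name__ == "__main__":
    t1 = (("assume", 2), ("inc", 1))
    t2 = (("inc", 1), ("assume", 2))
    print(covering_traces(t1, example_relation))
    print(covering_traces(t2, example_relation))
```

In this example the trace with the assume first is covered by both traces, while the trace with the increment first is covered only by itself, so the relation is a preorder but not an equivalence.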
We modify the sleep instrumentation to account for contextual semi-commutativity by redefining the commutativity test. To this end, we assume the existence of a mapping from indexed statements st 1 :, st 2 :  to commutativity conditions  comm (st 1 :, st 2 : ), i.e., formulae over global variables as well as local variables of threads  and , such that st 1 : ↷  comm (st 1 :,st 2 :  ) st 2 : . At this point it is crucial that in the instrumentation  (st) of a statement st, the update of the sleep variables, including the evaluation of the contextual semi-commutativity test, is performed before the original statement st. Otherwise the instrumentation would not faithfully reflect contextual semi-commutativity and might become unsound. In the implementation of our approach (see Section 8), we generate commutativity conditions  comm (st 1 :, st 2 : ) by encoding semi-commutativity as a first-order logic formula and applying an abduction algorithm to find sufficient conditions to guarantee it.
The modified sleep set instrumentation with contextual semi-commutativity tests still represents a reduction (Theorem 4.8) and satisfies soundness (Theorem 4.9) as well as conservative extension (Theorem 4.13). Furthermore, the CHC encodings introduced in Section 6 can be used with the contextual semi-commutativity test in place of the commutativity test, and remain sound. However, the represented lexicographical reductions are not necessarily minimal [Farzan and Vandikas 2020], i.e., Proposition 4.10 does not hold. This is because the covering relation is not symmetric. There may exist traces  1 ,  2 such that  1 is not covered by any lexicographically smaller trace,  2 covers  1 , and  2 is only covered by itself. Then both  1 and  2 appear in the lexicographical reduction, yet including  2 would suffice.

EVALUATION
As a proof of concept, we have developed a tool that integrates reduction in parameterized verification. In particular, we implemented the different CHC encodings for the existence of an Ashcroft invariant for the sleep-instrumented program P sleep , as discussed in Section 6. Our tool reads Boogie [Leino 2008] programs, generates the CHC clauses, and executes different CHC solvers to check if the CHC system is satisfiable. In particular, we used the state-of-the-art CHC solvers Eldarica (github.com/uuverifiers/eldarica), Golem (verify.inf.usi.ch/Golem) and Z3/Spacer (github.com/Z3Prover/z3). We evaluated the tool on a number of parameterized programs from the literature as well as custom benchmarks. The purpose of this evaluation is to answer the following questions:
Q1: Can the modular approach of (1) encoding reductions through sleep instrumentation and (2) subsequently verifying the resulting parameterized program work in practice?
Q2: Can we observe a practical benefit of the symmetry-aware explicit-sleep CHC encoding in comparison to the default symbolic-sleep encoding?
We executed the benchmarks on a Debian 10.10 machine with an AMD Ryzen Threadripper 3970X 32-Core Processor using the BenchExec benchmarking tool [Beyer et al. 2019]. Each verification run was given a timeout of 30 min and a memory limit of 15 GB. Our suite of 19 benchmarks comprises a number of variations (inc-b?dec-*) of the program P ± (see Fig. 1), where a variable is incremented and decremented by each thread and compared with 0; we also included variants where a nondeterministic value is added to and subtracted from the variable (add-sub-*). Several examples (namely lock, ticket, and mutex-*) are taken from [Hoenicke et al. 2017]; the mutex-* examples correspond to the program P ± (see Fig. 3). As more complex programs, we included the bluetooth example in the form presented in [Farzan et al. 2014], the example presented in Section 2 (notify-listeners), the thread-pooling example from [Farzan et al. 2015], a program in which each thread computes the same sum of array elements (equalsum-ghost), and a custom example involving communication via queues (line-queue).
Table 1 shows the benchmark results. The reported CPU time encompasses both the time required to generate the CHC clauses (typically quite small) and the time required by the fastest successful CHC solver, if any solver is successful.
Regarding Q1, we observe that the approach (in the explicit-sleep configuration) is able to verify 14 out of 19 benchmarks. In particular, we successfully show correctness of non-trivial benchmarks such as bluetooth and notify-listeners. Without reductions, these programs do not have a safe Ashcroft invariant; a proof would require complex ghost state and/or quantified invariants.
At the same time, even for the most successful configuration (explicit-sleep), three state-of-the-art CHC solvers are unable to solve 5 of our benchmarks. Beyond the possibility of general improvements in CHC solving, a possible way to improve the situation may be to guide the solvers to specifically take advantage of the reduction. This could be beneficial in two scenarios: First, for programs which do not have an Ashcroft invariant without reduction, one could prevent the solver from considering solutions that ignore the instrumentation. Second, one could try to prevent the solver from considering solutions that use the sleep variables in "exotic" ways unsuitable to express reduction. The second case could also reduce the overhead from the instrumentation for programs where an Ashcroft invariant exists without reduction. As an example, consider the program inc-dec-eq0-locked-assert, which has an Ashcroft invariant even without reduction.
Here, the CHC solvers take significantly longer to find a solution when the instrumentation is present. For the program ticket, the solvers even time out, although the program can be proven without reduction.
Regarding Q2, the evaluation data clearly shows the performance advantage of the explicit-sleep encoding. With this encoding, our tool is able to verify 14 programs, compared to only 11 programs with the symbolic-sleep encoding. Notice in particular that the complex program notify-listeners is only proved correct by the explicit-sleep encoding. Furthermore, for programs solved by both the symbolic-sleep and explicit-sleep encoding, the explicit-sleep encoding can lead to significant speedup, up to a factor of 10x in the most extreme case (add-sub-positive-nondet). Despite the increased number of clauses, we do not observe any overhead for the explicit-sleep encoding.

RELATED WORK
There is a huge body of work on verification of parameterized programs. It is noteworthy that this paper does not put forward a new (algorithmic) framework for verifying parameterized programs, but rather suggests a generic way of incorporating commutativity into any existing framework. As such, we only briefly survey a few techniques, to justify why we chose a particular one as the framework for our proof-of-concept application.

Parameterized Program Verification
In invisible invariants [Arons et al. 2001; Pnueli et al. 2001], a candidate for an Ashcroft invariant is constructed by first computing the set of reachable states of the instance of the program with  threads, and then generalizing the concrete thread identifiers in the reachable states. The candidate (a universally quantified formula with  variables over thread identifiers) is then verified using a syntactic cutoff theorem. This approach, like other heuristic searches [Emmi et al. 2010] for Ashcroft invariants, does not have a guarantee of completeness. Therefore, such approaches suffer from the problem that, if they fail, one does not know whether there is no proof with  quantifiers or whether the heuristic did not find it. This is why we opted to build our reduction framework on top of thread-modular proofs [Hoenicke et al. 2017], which come with the guarantee of finding Ashcroft invariants when one exists (modulo incompleteness of the Horn clause solver) or proving that no Ashcroft invariant exists. This allows for a more principled comparison of the power of the framework in proving the original program or a lexicographical reduction of it.
In [Farzan et al. 2014; Kaiser et al. 2014], counting proofs are constructed automatically. This can be viewed as a partial solution to the problem of discovering the required ghost state automatically; partial, in the sense that only ghost counters can be discovered. Such techniques are complementary to the proposal in this paper; the simpler the proof, the more likely that a combination of these techniques can succeed in discovering it automatically. Grebenshchikov et al. [2012], Hojjat et al. [2014], Gurfinkel et al. [2016], and Monniaux and Gonnord [2016] study Horn constraints for -thread-modular proofs, closely related to the framework we chose to demonstrate our approach [Hoenicke et al. 2017].

Commutativity for Proof Simplification
There has been extensive work in incorporating commutativity into verification of concurrent programs. One big cluster of such work appears under the title of partial order reduction (POR) [Abdulla et al. 2014; Flanagan and Godefroid 2005; Godefroid 1996; Kahlon et al. 2009], and much of this work is concerned with finite-state systems or executions of bounded length.
In the context of proofs of infinite-state programs, the focus of commutativity reasoning in algorithmic verification so far has been on programs with a bounded number of threads [Chu and Jaffar 2014; Farzan et al. 2022; Farzan and Vandikas 2019, 2020; Wachter et al. 2013]. Popeea et al. [2014] integrate the theory of Lipton's movers [Lipton 1975] with compositional proofs in the style of Owicki and Gries, to verify programs with a bounded number of threads. The approach is described as a complex Horn clause system that combines compositional reasoning, the determination of mover annotations (i.e., commutativity checks) and the search for reducible blocks.
In interactive proofs [Elmas et al. 2009; Kragl and Qadeer 2018], commutativity reasoning based on the principle of Lipton's movers has been incorporated in a way applicable to programs with a bounded number of threads as well as programs with an unbounded number of threads, despite not explicitly using the modeling formalism of parameterized programs. Essentially, the input program is alternatingly reduced and further abstracted. Each abstraction step may allow more statements to commute, which enables further reduction. Since being a mover can be viewed as a local property of an atomic program step, the size of the environment (finite vs unbounded number of threads) makes no difference in how larger atomic blocks are formed out of smaller ones by reasoning about movers, and thus a successfully verified program is correct for any number of threads. Flanagan and Freund [2020] also apply mover reasoning to simplify verification of programs with an unbounded number of threads. Data structures are annotated with synchronization specifications that indicate mover types (i.e., semi-commutativity) of read and write accesses to the data structure. Users specify a reduction of a concurrent program by manually instrumenting the program with yield points indicating where interleaving with other threads may occur in the reduction. The verifier then checks if this instrumentation is indeed sound, i.e., encodes a reduction of the program.
As discussed in [Farzan and Vandikas 2020], however, the kinds of program reductions that result from Lipton's movers are not comparable with those that are produced as lexicographical reductions of (binary) commutativity relations.Besides, the locality advantages of movers disappear in the context where the goal is anything but large block reasoning: for example, a lockstep reduction.Such reductions are by definition not local to a single thread/process.
In inductive sequentialization [Kragl et al. 2020], a vaguely similar philosophy about proof simplification is used: Rather than reason about arbitrarily complicated executions of a distributed protocol, one can reason about their equivalence to simpler ones and as such only give a proof of correctness for the simpler ones. It is important to note that the notion of equivalence employed is not the simple syntactic one (based on commutativity) used in this paper. As such, even the reasoning about such equivalences may involve the use of invariants, and other proof-type constructs. The final product is a proof of refinement between the complex and the simple protocols, and the ingredients of the proof are provided by a user.

CONCLUSION AND FUTURE WORK
This paper proposes a methodology for incorporating commutativity reasoning into algorithmic verification of parameterized programs. We put forward the thesis that this is a worthwhile cause, because commutativity-based reductions can simplify the proofs of these programs in a precise sense: a possible substantial complexity reduction in the nature of the ghost state required for the proof. The solution was devised with an eye on practical concerns, in the sense that rather than devising a whole new algorithmic framework, one should be able to use existing frameworks for parameterized program verification with little effort.
Our investigation of this problem has led us to several new research questions that would be interesting to explore in the future. Our results from Section 6.3 highlight the fact that Ashcroft invariants, as a standard family of global invariants for parameterized programs, lack the expressive power to encode optimal reductions for the entire family of programs represented by the parameterized program for an arbitrary commutativity relation. It would be interesting to investigate whether this lack of expressivity is shared by other ways of giving a finitely-representable proof to a parameterized program, for instance proof spaces [Farzan et al. 2015].
Classical trace theory, which studies commutativity in a principled way, relies on a finite alphabet of program actions. For parameterized programs, one needs an infinite (indexed) alphabet of actions to model the program behaviour faithfully. Most of the work on program reductions relies on a classic result from trace theory that says "the set of lexicographical representatives of a regular and (commutativity) closed language is regular". The notion of regularity for indexed alphabets is less standard, and can be defined based on a number of data automata like register, nominal, or predicate automata. It will be interesting to investigate if an analogous result for these automata exists and whether it can suggest fundamentally different ways of incorporating commutativity in verification of parameterized programs.
example) can be accommodated via the instrumentation of the thread template with sleep variables, that the instrumentation yields again a parameterized program, and that this parameterized program can be handled by the proof method based on thread modularity at many levels [Hoenicke et al. 2017].

Fig. 2. The program P notify . The variables current (an integer) and queue (an integer array) are global, all other variables are local. An instance P notify () consists of a single notifier thread and  listener threads.

Fig. 4. Illustration of the sleep set mechanism. Sleep sets (blue) are updated after every edge, and lead to removal of transitions.

Fig. 6. Safe Ashcroft invariant for P ± sleep
Example 4.11 (continued from Example 4.7). Consider again the program P ± , with the template shown in Fig. 1. There does not exist a safe Ashcroft invariant of any width for the program P ± [Hoenicke et al. 2017]. Intuitively, the invariant would have to express the information that the value of the global variable  is always greater than or equal to the number of threads that have executed the increment but not yet the decrement. Ashcroft invariants cannot express this information (for a formal argument, see [Hoenicke et al. 2017]). Figure 6 shows a safe Ashcroft invariant (of width 2) for the sleep-instrumented program P ± sleep .
Definition 5.2 (Parameterized Preference Order). A parameterized preference order is a family of functions (≼  ) ∈N , where ≼  : Loc  → TO {1,...,} maps -tuples of locations ℓ⃗ to total orders ≼  ℓ⃗ over thread indices {1, . . ., }. For ℓ⃗ ∈ Loc  , we write ≼ ℓ⃗ instead of ≼  ℓ⃗ .
Definition 5.6 (Preference Test). The preference test for the pairwise preference order induced by the total and transitive relation  ⊆ Loc 2 is defined as the following formula: pref (, ) :≡ ⟨  ,   ⟩ ∈  ∧ (⟨  ,   ⟩ ∈  → id  ≤ id  )
Vol. 1, No. 1, Article . Publication date: November 2023.

Fig. 7. Symbolic-sleep CHC encoding TM(P sleep , ) for the existence of a safe Ashcroft invariant of width  for the sleep-instrumented program P sleep . Differences from the CHC encoding TM(P, ) for the unreduced program are highlighted in red.

Fig. 8. Explicit-sleep CHC encoding TM sleep (P, ) for the existence of a safe Ashcroft invariant of width  for the sleep-instrumented program P sleep . The encoding does not include the id variables. Further differences from the symbolic-sleep encoding are highlighted in red.

Observation 6.10. There exist programs for which no Ashcroft invariant of any width precisely captures the reachable configurations of the reduction.
An Ashcroft invariant of width  would have to capture that, when the program P () reaches the control locations ⟨ℓ 1 , . . ., ℓ 1 ⟩, all threads except one are in the sleep set. The last thread to take a step must have had the maximal thread ID, otherwise it would have been added to the sleep set earlier. But then, in the last step, the next enabled statement of every other thread (st 2 ) commutes with the executed statement st 1 , thus all threads with a lower thread ID are added to the sleep set. Thus, if an Ashcroft invariant ∀ 1 , . . .,   . precisely captures the reduction, we must have

Table 1. Benchmark results. sat indicates that an Ashcroft invariant (of width ) was found, unsat indicates that a CHC solver proved that no such Ashcroft invariant exists, "TO" indicates a timeout (> 30 min).