Compositional Semantics for Shared-Variable Concurrency

We revisit the fundamental problem of defining a compositional semantics for a concurrent programming language under sequentially consistent memory with the aim of equating the denotations of pieces of code if and only if these pieces induce the same behavior under all program contexts. While the denotational semantics presented by Brookes [Information and Computation 127, 2 (1996)] has been considered a definitive solution, we observe that Brookes's full abstraction result crucially relies on the availability of an impractical whole-memory atomic read-modify-write instruction. In contrast, we consider a language with standard primitives, which apply to a single variable. For that language, we propose an alternative denotational semantics based on traces that track program write actions together with the writes expected from the environment, and equipped with several closure operators to achieve necessary abstraction. We establish the adequacy of the semantics, and demonstrate full abstraction for the case that the analyzed code segment is loop-free. Furthermore, we show that by including a whole-memory atomic read in the language, one obtains full abstraction for programs with loops. To gain confidence, our results are fully mechanized in Coq.


INTRODUCTION
Denotational semantics aims to define the meaning of a piece of code independently of the context under which it is executed. Generally speaking, such a semantics assigns a denotation ⟦c⟧ to every command c of a given programming language in a way that satisfies the following desiderata:

Compositionality: The denotation of a command should be determined from the denotations of the command's immediate constituents. For instance, assuming a sequential composition operator ";", we require that ⟦c₁ ; c₂⟧ is a function of ⟦c₁⟧ and ⟦c₂⟧.

Adequacy: Assuming a given operational semantics, the denotations should equate only commands that behave the same operationally when plugged into an arbitrary program context. When denotations are partially ordered, we also want the semantics to admit a directional version of adequacy that targets contextual refinement under the operational semantics instead of contextual equivalence. For instance, assuming that denotations are sets, as is the case in our development, (directional) adequacy ensures that ⟦c₁⟧ ⊆ ⟦c₂⟧ implies that for every program context C[−], every behavior of C[c₁] under the operational semantics is also a behavior of C[c₂]. This makes denotations beneficial in supporting modular reasoning about the operational semantics, which by itself is only able to capture complete closed programs. In particular, an adequate denotational semantics can be used for formally justifying local program transformations, as performed by optimizing compilers. Indeed, adopting contextual refinement as the correctness criterion for program transformations, adequacy allows one to derive the correctness of a local transformation src ⇝ tgt from ⟦tgt⟧ ⊆ ⟦src⟧.

Full abstraction: Ideally, a denotational semantics should equate all pairs of commands that are contextually equivalent under the given operational semantics. A directional version requires that ⟦c₁⟧ ⊈ ⟦c₂⟧ implies that for some program context C[−], some behavior of C[c₁] under the operational semantics is not a behavior of C[c₂]. Conceptually, full abstraction, together with compositionality and adequacy, means that ⟦·⟧ is indeed a precise compositional counterpart of the given operational semantics. A fully abstract denotational semantics provides a complete reasoning principle for correctness of local program transformations. Full abstraction is sometimes considered the holy grail of denotational semantics, and it is typically very difficult to obtain [Cardone 2021].
In this paper we consider concurrent programs that employ shared variables for inter-thread synchronization, governed by a non-deterministic scheduler that cannot be controlled by the program. For this domain, developing a compositional, adequate, and fully abstract semantics is highly challenging. Indeed, the standard approach for (non-deterministic) sequential programs, which models programs as transformations from an initial state to a set of final states, fails to provide a compositional semantics for parallelism, since the state transformation induced by a parallel composition c₁ ∥ c₂ cannot be determined from those of c₁ and c₂. One needs more detailed structures to capture the behaviors of c₁ and c₂, but being too concrete risks losing full abstraction.
This problem was addressed by Brookes [1996] (see there also a discussion of earlier attempts). In Brookes's approach, the semantics ⟦c⟧ of a command c is given by a set of sequences of transitions from memory to memory, assuming arbitrary environment interference between transitions. For example, a sequence of the form ⟨σ₁, σ₁′⟩, ⟨σ₂, σ₂′⟩ consisting of two transitions represents the case that c did some steps to transform σ₁ to σ₁′; then the environment did some steps transforming σ₁′ to σ₂; and then c continued its execution from σ₂ and terminated in σ₂′. Brookes showed how these sequences can be derived from a given command by first deriving a concrete set of sequences, and then closing it under two closure operators, called mumble and stutter. In particular, ⟦c₁ ∥ c₂⟧ is obtained by considering all interleavings of sequences of ⟦c₁⟧ with sequences of ⟦c₂⟧, and closing the resulting set under the two closure operators. Brookes demonstrated compositionality, adequacy, and full abstraction for this semantics.
However, the programming language assumed in [Brookes 1996] employs a command of the form (await b then c) that implements a "conditional critical region": it blocks the execution as long as b is unmet, and then in a single atomic step it verifies that b holds and fully runs c. Since b and c may involve multiple variables, this construct can implement arbitrary atomic (finite) memory-to-memory transformations (e.g., [x1 ↦ 0, ..., x100 ↦ 0] to [x1 ↦ 1, ..., x100 ↦ 100]), which requires all other components to be suspended and is unrealistic in practical concurrency. Removing (or restricting) await does not harm compositionality or adequacy, but Brookes's full-abstraction proof relies on await instructions for building a concurrent context C[−] that precisely mimics the environment transitions in a given sequence. In fact, the starting point for the current work is our observation that there are commands c₁ and c₂ that behave the same when plugged in program contexts without await, but can be differentiated by contexts that use await (see Examples 3.7, 4.6 and 5.1 below). Therefore, Brookes's semantics is too concrete for a language without await.

Proc. ACM Program. Lang., Vol. 8, No. PLDI, Article 169. Publication date: June 2024.
The main contribution of this work is a novel denotational semantics that addresses this problem. We propose two models:
• A "concrete semantics" in which denotations track the write operations performed by the command interleaved with environment writes. For example, the sequence W(x, 1), W(x, 2), W(y, 1), in which the middle action is an environment write, represents the case that c writes 1 to x, expects the environment to write 2 to x, and then writes 1 to y. The concrete semantics is compositional and adequate, but it is not fully abstract. Nevertheless, in contrast to [Brookes 1996], this semantics reflects the property of the operational semantics that each transition updates at most one variable, and, again in contrast to [Brookes 1996], it does not record read operations in denotations. As a result, the concrete semantics suffices for validating a wide variety of contextual refinements (see Fig. 4 below).
• An "abstract semantics" obtained by closing concrete denotations under four rewrite rules, each of which mimics a certain operational simulation argument allowing one to hide and introduce component writes from the concrete trace. We show that the abstract semantics is also compositional and adequate, whereas full abstraction holds up to some level:
  - Full abstraction fully holds if we have a "snapshot" instruction that blocks the execution until some condition is met (a restriction of await to instances of the form await b then skip).
  - Without "snapshot", we establish a restricted version of full abstraction: if c₂ is loop-free and c₂ ⇝ c₁, then ⟦c₁⟧ ⊆ ⟦c₂⟧.

Thus, the abstract semantics is always sound for validating local program transformations src ⇝ tgt, and it provides a complete reasoning principle when src is loop-free. When src has loops, we provide a (rather complicated) counterexample for full abstraction of our abstract semantics (see Example 4.28 below).
Instead of await instructions, the language we assume employs standard read-modify-write (RMW) constructs that perform an atomic update of a single variable at a time. A natural question is whether, like await, RMWs allow concurrent contexts to distinguish between commands that are indistinguishable for contexts consisting solely of reads and writes. We answer this question affirmatively by demonstrating such cases (see Example 5.1 below). Moreover, we show that by strengthening one of the four rewrite rules used to define the abstract semantics, we obtain a denotational semantics that enables compositional reasoning about program transformations under the assumption that the context cannot perform RMW operations.
Finally, we note certain limitations of the current work (see also §6). All of them raise interesting questions for future work, for which our approach may constitute a starting point.
• We assume that the underlying memory ensures sequential consistency (SC, for short) [Lamport 1979], the strongest memory model, with a simple operational semantics based on interleaving concurrent manipulations of a standard variables-to-values mapping.
• Our notion of behavior under the operational semantics is based on partial correctness, that is: we only consider terminating executions as inducing program behaviors. Accordingly, contextual refinement ensures that the target program preserves safety properties of the source, but it is termination-insensitive: a diverging program refines every program. Since a compositional characterization of partial correctness is already challenging, we leave the question of termination to future work. This is in line with multiple previous works that consider only terminating executions [Liang et al. 2012, 2014; Turon and Wand 2011]. Nevertheless, Brookes [1996, §10] includes an extension to termination-sensitive refinement, using infinite sequences and assuming certain fairness conditions on the operational semantics.
• Our programming language is a first-order language. Fully abstract semantics for higher-order languages have proved elusive [Cardone 2021], but we hope that our model can be useful for a higher-order language with a full abstraction guarantee that applies to its first-order fragment.
Outline. The rest of this paper is structured as follows. In §2 we present the syntax and operational semantics of the language studied in this paper. In §3 we present the concrete denotational semantics, establish its compositionality and adequacy, and demonstrate various transformations it validates (Fig. 4). In §4 we present the abstract denotational semantics, establish its compositionality (§4.1), adequacy (§4.2), and (restricted as discussed above) full abstraction (§4.3 and §4.4), and demonstrate transformations validated by the abstract semantics but not by the concrete one (Fig. 5). In §5 we present the modification of the abstract semantics under the assumption that the context does not perform RMWs. Finally, in §6 we discuss related and future work.
Artifact. Our results are fully mechanized in Coq, and the proof scripts are available at https://doi.org/10.5281/zenodo.10925596.

SYNTAX, OPERATIONAL SEMANTICS, AND CONTEXTUAL REFINEMENT
In this section we present the syntax of the studied programming language, its operational semantics, and the notion of contextual refinement w.r.t. that semantics.
Syntax. We assume a set Var ≜ {x, y, z, ...} of shared variables, ranged over by x, y; a set LVar ≜ {a, b, c, ...} of local variables, ranged over by a, b; and a set Val ≜ {0, 1, 2, ...} of values, ranged over by v. We define a state to be a function σ ∈ State ≜ Var → Val. In some examples below, we use s0 ≜ λx. 0, the state that maps every shared variable to 0, as the initial state.
Figure 1 presents the grammar for expressions, let expressions, commands, and contexts. Expressions are defined standardly and are composed of values (v) and local variables (a). Let expressions are used on the right-hand side of let bindings, and include standard expressions, shared variables, and RMW primitives. The latter are used to atomically execute a read from memory followed by a write to memory. We consider two kinds of RMWs, whose intuitive semantics is as follows:
• Exchange (XCHG) loads from a shared variable and sets it to a given argument.
• Fetch-And-Add (FAA) increments a shared variable by a given argument.

Both instructions return the value they read, that is, the value held before the modification was performed.
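The intuitive reading of the two RMW primitives can be sketched as pure functions on states. This is only an illustration under our own assumptions: the names `xchg` and `faa`, and the encoding of a state as a Python dictionary, are hypothetical and not taken from the paper's Fig. 2.

```python
from typing import Dict, Tuple

State = Dict[str, int]  # assumed encoding: shared variable name -> value


def xchg(state: State, var: str, new_value: int) -> Tuple[int, State]:
    """Atomically set `var` to `new_value`; return the value read beforehand."""
    old = state[var]
    return old, {**state, var: new_value}


def faa(state: State, var: str, delta: int) -> Tuple[int, State]:
    """Atomically add `delta` to `var`; return the value read beforehand."""
    old = state[var]
    return old, {**state, var: old + delta}


s0 = {"x": 0, "y": 0}
read1, s1 = faa(s0, "x", 2)    # reads 0, leaves x = 2
read2, s2 = xchg(s1, "x", 7)   # reads 2, leaves x = 7
```

Each function returns the value read together with the updated state, so the read and the write happen in what the caller sees as a single step, and only one shared variable is touched.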
Commands are mostly customary for a (first-order) imperative parallel language, with several choices that may deserve attention:
• Parallel composition, "∥", is a first-class construct that can be employed arbitrarily deep inside other commands, rather than the top-level parallel composition that is sometimes assumed when studying semantics of parallel languages.
• We include non-deterministic choices: between commands (c₁ ⊕ c₂), of stored values (x := *), and as a loop termination condition (while * do c).
• Less standardly, we use functional-style let bindings for assigning values to local variables. This allows us to restrict the scope of these variables inside a command, in a way that a parallel context cannot change or directly observe. Loops use global variables in the termination condition.
• A non-standard snapshot(σ) command is used to block the execution until the memory is in state σ (see operational semantics below).
In examples, we also use (if e then c) for (if e then c else skip), and employ syntactic sugar incorporating loads inside expressions, such as x := y for let a = y in x := a, and assume(x = v) for let a = x in assume(a = v). We denote by fv(e) (respectively, fv(c)) the set of local variables that occur free in an expression e (command c), and call an expression e (command c) closed if fv(e) = ∅ (fv(c) = ∅). We write c{v/a} for the command obtained from c by substituting the free occurrences of a by v.
Finally, Fig. 1 specifies "contexts", which are defined standardly as commands with one "hole". We write C[c] for the command obtained by "plugging in" the command c in C, that is: substituting the unique − in C by c.
Operational Semantics. We assume that closed expressions are evaluated to values using a function ⟦•⟧ in a standard way. The operational semantics of commands is given in Fig. 2 as a "small-step" transition relation between configurations, which are tuples of the form ⟨c, σ⟩, where c is a command and σ ∈ State. For a uniform definition of let bindings, it uses a "helper" relation which defines how let expressions are evaluated to values and affect the state.
The operational semantics is mostly standard. We use syntactic substitution to handle let bindings, so that steps execute only on closed commands. Parallelism is captured by arbitrary interleaving of component steps, with non-preemptive scheduling, in the sense that there are no explicit language constructs for controlling the scheduler. The shared memory follows the SC model, where each read reads the latest written value recorded in the state. To assist later definitions, the transitions are labeled with a write label of the form W(x, v) (with x ∈ Var and v ∈ Val) or with an ε label when no write is performed. We often omit the label from the transition, writing ⟨c, σ⟩ → ⟨c′, σ′⟩ to mean that ⟨c, σ⟩ −l→ ⟨c′, σ′⟩ for some label l. Note that no steps are associated with skip or with assume(e) when ⟦e⟧ = 0. Intuitively, a configuration of the form ⟨skip, σ⟩ is a valid final configuration ("a value"). We write ⟨c, σ⟩ ↓ σ′ when ⟨c, σ⟩ reaches ⟨skip, σ′⟩ in zero or more steps.

Contextual Refinement. Contextual refinement under the operational semantics is identified with soundness of local program transformations, which is defined as follows:

Definition 2.1. A transformation from a command src to a command tgt is sound, denoted by src ⇝ tgt, if ⟨C[tgt], σ⟩ ↓ σ′ implies ⟨C[src], σ⟩ ↓ σ′ for every context C such that C[src] and C[tgt] are closed. We write c₁ ↭ c₂ when both c₁ ⇝ c₂ and c₂ ⇝ c₁ hold.
Example 2.2. For c₁ = x := x + 1 and c₂ = let a = FAA(x, 1) in skip, we have c₁ ⇝ c₂ but c₂ ̸⇝ c₁. For the former, we can execute a load followed by a store in one atomic step to simulate the effect of FAA. (The denotational semantics below provides a formal account.) For the latter, with C = − ∥ x := 1 ;

The following transitivity and congruence properties are easy to establish:

Lemma 2.3. If c₁ ⇝ c₂ and c₂ ⇝ c₃, then c₁ ⇝ c₃.
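The asymmetry in Example 2.2 can be checked by brute force on a small instance. The sketch below enumerates all interleavings of atomic steps under SC. It is an illustration only: the encoding of commands as lists of Python step functions is our own, and the distinguishing context used here, − ∥ FAA(x, 1), is a hypothetical choice that need not be the context used in the paper.

```python
from typing import Callable, Dict, List, Set, Tuple

State = Dict[str, int]                # shared memory
Regs = Dict[str, int]                 # thread-local variables
Step = Callable[[State, Regs], None]  # one atomic step, mutates its arguments


def load_x(state: State, regs: Regs) -> None:
    regs["a"] = state["x"]


def store_a_plus_1(state: State, regs: Regs) -> None:
    state["x"] = regs["a"] + 1


def faa_x_1(state: State, regs: Regs) -> None:
    regs["a"] = state["x"]
    state["x"] += 1


C1: List[Step] = [load_x, store_a_plus_1]  # x := x + 1 (load, then store)
C2: List[Step] = [faa_x_1]                 # let a = FAA(x, 1) in skip
CONTEXT_THREAD: List[Step] = [faa_x_1]     # assumed context: - || FAA(x, 1)


def final_values_of_x(threads: List[List[Step]], init_x: int = 0) -> Set[int]:
    """Collect the final value of x over every interleaving of the threads."""
    results: Set[int] = set()

    def run(pcs: Tuple[int, ...], state: State, regs: List[Regs]) -> None:
        if all(pc == len(t) for pc, t in zip(pcs, threads)):
            results.add(state["x"])
            return
        for i, thread in enumerate(threads):
            if pcs[i] < len(thread):
                next_state = dict(state)
                next_regs = [dict(r) for r in regs]
                thread[pcs[i]](next_state, next_regs[i])
                run(pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:], next_state, next_regs)

    run(tuple(0 for _ in threads), {"x": init_x}, [{} for _ in threads])
    return results


behaviors_c1 = final_values_of_x([C1, CONTEXT_THREAD])
behaviors_c2 = final_values_of_x([C2, CONTEXT_THREAD])
```

Under this context, every behavior of the FAA version is also a behavior of the load/store version, which is consistent with c₁ ⇝ c₂. The load/store version can additionally lose the context's increment and finish with x = 1, a behavior the FAA version lacks, which is consistent with c₂ ̸⇝ c₁.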
Putting snapshot aside, in the above operational semantics, every instruction involves at most one shared variable, which allows us to easily prove the following property:

and c is snapshot-free, then there exists some x ∈ Var such that for

This is in contrast with [Brookes 1996], which has "await" instructions of the form await b then c, where b is a boolean condition, which may read from several shared variables, and c is a finite sequence of assignments that read and write to shared variables.
To formulate await in our terminology, we use extended expressions, ranged over by E, that consist of values as well as shared variables (e.g., E = x + y + 9). A standard function ⟦E⟧ evaluates E at state σ. Then, the operational semantics of await is formalized as follows:

In one atomic step, the system performs the loads from memory necessary to evaluate E and (conditionally) executes multiple assignments involving any number of additional loads and stores.
While being instrumental in the full abstraction proof, await is not standardly available in real-world shared-memory concurrent programming. Indeed, to implement await, one has to block all other concurrent processes from accessing any of the variables that are read/written in the await instruction. (Note that it does not suffice to only block concurrent awaits; we also need to block primitive loads and stores.) Instead, programming languages and multicore architectures provide atomic instructions that atomically manipulate a single address, including loads, stores, and RMWs. Locks, transactional libraries, concurrent objects, and other synchronization mechanisms are implemented on top of these basic instructions. Such implementations necessarily involve races, that is, cases in which two different threads are concurrently accessing the same variable, and at least one of them is writing. Our focus is on concurrent implementations at this level of abstraction.
As we show in §4.3 and §4.4, we are only able to develop a fully abstract denotational semantics when the source program is loop-free. With loops, our proposed denotational semantics is adequate but not fully abstract. To get full abstraction with loops, we use snapshot, which, like await, we consider to be unrealistic. The snapshot command uses only the "condition part" of the await, and can be thought of as the restriction of await to the form await b then skip. (Since every program uses only finitely many variables, the state used in snapshot can always be translated into an extended expression.) Thus, our results provide a full abstraction statement similar to [Brookes 1996] but without the full power of await. The (pretended) implementation of snapshot has to block all other concurrent processes from writing to shared variables, but unlike await, reads can proceed concurrently.
Finally, we note that for a single variable, we also have assume(x = v) behaving like Brookes's await x = v then skip, generating "no behavior" if the condition (on a single variable) is not met. We use assume commands in the full abstraction proof, but since we only consider terminating behaviors in this work, it is also possible to use busy-loops that wait until x = v.

CONCRETE DENOTATIONAL SEMANTICS
In this section we present the "concrete" denotational semantics and establish its compositionality and adequacy. The main ingredient for this semantics is our notion of a trace, which consists of an initial memory state, an initial store, which assigns values to local variables, and a chronicle, which is a sequence of actions performed by the command along with those expected by the concurrent context. Next, we formally define these objects and the required operations on them.
Notation 3.1 (Sequences). For a finite alphabet Σ, we denote by Σ* the set of all (finite) sequences over Σ. We use ε to denote the empty sequence. We write s₁ • s₂ for the concatenation of sequences s₁ and s₂, which is lifted to concatenation of sets of sequences in the obvious way. We identify symbols with sequences of length 1 or their singletons when needed (e.g., when concatenating a single symbol with a set of sequences).
Stores. A store is a function ρ ∈ Store ≜ LVar → Val. Stores are extended to expressions in the standard way. We also lift stores to let expressions by applying them inside (e.g., ρ(FAA(x, e)) = FAA(x, ρ(e))). In some examples below, we use ρ₀ ≜ λa. 0, the store that maps every local variable to 0, as the initial store.
Actions. An action is either a component write of the form W(x, v) with x ∈ Var and v ∈ Val, or an environment write of the form W(x, v) with x ∈ Var and v ∈ Val. We write Act, CmpW, and EnvW for the set of all actions, component writes, and environment writes (respectively).
Chronicles. A chronicle is a finite sequence of actions. We denote by Chro the set of all chronicles, by CmpChro the set of all chronicles consisting solely of component writes, and by EnvChro the set of all chronicles consisting solely of environment writes. A chronicle ξ induces a function from states to states, recursively defined by: ε(σ) ≜ σ and (W(x, v) • ξ)(σ) ≜ ξ(σ[x ↦ v]), where W(x, v) may be either a component write or an environment write.

A trace τ is a triple ⟨σ, ρ, ξ⟩ consisting of a state, a store, and a chronicle. We refer to the three components as the initial state (σ), the initial store (ρ), and the chronicle (ξ) of τ, and to the state ξ(σ) as the final state of the trace τ.
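The way a chronicle acts on states can be sketched in a few lines. This is a minimal illustration under our own assumptions: actions are encoded as hypothetical (kind, variable, value) tuples with kind "cmp" or "env", and both kinds of writes are taken to update the state in the same way.

```python
from typing import Dict, List, Tuple

State = Dict[str, int]
Action = Tuple[str, str, int]  # assumed encoding: ("cmp" | "env", variable, value)
Chronicle = List[Action]


def apply_chronicle(chronicle: Chronicle, state: State) -> State:
    """Apply the writes of a chronicle, left to right, to a copy of `state`."""
    result = dict(state)
    for _kind, var, val in chronicle:
        result[var] = val
    return result


# The introduction's example: the component writes 1 to x, the environment
# writes 2 to x, and then the component writes 1 to y.
example = [("cmp", "x", 1), ("env", "x", 2), ("cmp", "y", 1)]
final_state = apply_chronicle(example, {"x": 0, "y": 0})
```

The final state of a trace is then simply the result of applying its chronicle to its initial state; the store plays no role in this computation.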
Sequential Composition of Traces. The sequential composition of τ = ⟨σ, ρ, ξ⟩ and τ′ = ⟨ξ(σ), ρ, ξ′⟩, denoted by τ ; τ′, is the trace ⟨σ, ρ, ξ • ξ′⟩. When the final state of τ does not coincide with the initial state of τ′, or the two traces do not have the same initial store, τ ; τ′ is undefined.
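Parallel Composition of Traces. Parallel composition is defined for actions, chronicles, and traces:
(1) The dual of an action α, denoted by ᾱ, turns a component write of v to x into the environment write of v to x, and vice versa. Two actions α and α′ are parallelly composable if either α = ᾱ′ or α = α′ ∈ EnvW. In that case, their parallel composition, denoted by α ∥ α′, is the component write of v to x when α = ᾱ′, and the environment write α itself when α = α′ ∈ EnvW.
(2) Chronicles are composed pointwise, action by action. If some α ∥ α′ is undefined or the chronicles are not of the same length, then ξ ∥ ξ′ is undefined.
(3) Parallel composition of traces is obtained by lifting the composition of chronicles.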
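The pointwise matching of actions can be sketched as follows. This is our reading of the definition, not the paper's formal statement: the (kind, variable, value) encoding and the function names are hypothetical, and `None` stands for "undefined".

```python
from typing import List, Optional, Tuple

Action = Tuple[str, str, int]  # assumed encoding: ("cmp" | "env", variable, value)


def dual(action: Action) -> Action:
    """Swap the roles of component and environment for the same write."""
    kind, var, val = action
    return ("env" if kind == "cmp" else "cmp", var, val)


def par_action(a: Action, b: Action) -> Optional[Action]:
    """Compose two actions, or return None when they are not composable."""
    if a == dual(b):
        # One side performs the write that the other side expects.
        return ("cmp", a[1], a[2])
    if a == b and a[0] == "env":
        # Both sides expect the same write from the outer environment.
        return a
    return None


def par_chronicle(xs: List[Action], ys: List[Action]) -> Optional[List[Action]]:
    """Compose two chronicles pointwise; None if lengths or actions mismatch."""
    if len(xs) != len(ys):
        return None
    composed = []
    for a, b in zip(xs, ys):
        c = par_action(a, b)
        if c is None:
            return None
        composed.append(c)
    return composed


left = [("cmp", "x", 1), ("env", "y", 2), ("env", "z", 3)]
right = [("env", "x", 1), ("cmp", "y", 2), ("env", "z", 3)]
both = par_chronicle(left, right)
```

In the composed chronicle, the writes to x and y become component writes of the parallel composition, while the write to z is still expected from the outer environment.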
From Commands to Traces. Figure 3 presents an inductive definition of the concrete semantics, which is a function ⌊•⌋ that maps commands to sets of traces. The skip command does not perform any component writes and tolerates any environment interference, thus it is associated with traces with arbitrary environment chronicles (rule skip). A store instruction generates a component write, and allows arbitrary environment interference before and after (rule store). The value to be stored is determined according to the initial store. (Unlike the operational semantics, this semantics assigns meaning to open programs as well.) Let bindings (rule let) start with environment interference ξ and then possibly generate a component write W(x, v) following their operational semantics (reusing the first part of Fig. 2). Note that the memory visible to the let expression is the one obtained by applying ξ on the initial state. In turn, the continuation is given by starting from a modified state and store. The resulting chronicle is the concatenation of ξ, a label l ∈ {W(x, v), ε}, and a chronicle of the continuation. Here we use the transition labels from the operational semantics as component actions or the empty chronicle.

Sequential composition of commands is handled by sequential composition of traces (rule seq). The denotation of parallel composition uses a (partial) operation for parallel composition of traces (rule par). Intuitively speaking, a component action on one side has to match the environment action expected from the other side, and together they form a component action for their external environment. In addition, if both sides expect the same environment action, then that action is also expected from the external environment of the parallel composition. The concrete semantics of the other language constructs follows similar ideas, aiming to match their operational semantics. As expected, for loops, the definition is recursive.
The concrete denotations admit some invariants, which are useful in our proofs. In particular, they are closed over environment actions before and after that command's effects:

This invariant is explicitly enforced in some rules (e.g., skip, store), whereas other rules close only over the prefix (e.g., let) or do not close at all (e.g., seq) since they inherit the closure from their parts.
Compositionality. From the definition of the semantics, it is easy to see that the concrete semantics is compositional. More formally, the following property is proved by standard induction on contexts (with an inner induction on the derivation of τ ∈ ⌊while _ do c⌋ for loops):

Lemma 3.4. If ⌊c₁⌋ ⊆ ⌊c₂⌋, then ⌊C[c₁]⌋ ⊆ ⌊C[c₂]⌋ for every context C.

As a corollary, we obtain the compositionality of ⌊•⌋: for every command c whose immediate sub-commands are c₁, ..., cₙ, we have that ⌊c⌋ is a function of ⌊c₁⌋, ..., ⌊cₙ⌋. To see this, consider for instance the case of c = c₁ ∥ c₂, and suppose that ⌊c₁⌋ = ⌊c₁′⌋ and ⌊c₂⌋ = ⌊c₂′⌋. By Lemma 3.4 applied (in both directions) to the context C = − ∥ c₂ and the commands c₁, c₁′, we obtain ⌊c₁ ∥ c₂⌋ = ⌊c₁′ ∥ c₂⌋. Then, again by Lemma 3.4 applied to the context C = c₁′ ∥ − and the commands c₂, c₂′, we obtain ⌊c₁′ ∥ c₂⌋ = ⌊c₁′ ∥ c₂′⌋, and hence ⌊c₁ ∥ c₂⌋ = ⌊c₁′ ∥ c₂′⌋.

The next lemma provides the key for the adequacy of the concrete semantics.
Proof. For the proof we inductively define an auxiliary relation =⇒ between configurations, labeled with a chronicle ξ, which represents an operational execution interrupted with the environment writes along ξ (akin to the "state trace" behaviors of Brookes [1996]):

Then, the claim of the lemma is a direct corollary of the following two claims. First, when ξ is a component chronicle, =⇒ trivially coincides with the operational semantics:

Claim 3.5.1:

Second, ⌊c⌋ lies in tight correspondence with =⇒:

Claim 3.5.2: Let c be a command, and let a₁, ..., aₙ be an enumeration of fv(c). Then, ⟨σ, ρ,

The proof of each direction in this claim proceeds by induction on c, where the interesting cases follow from the fact that =⇒ is compatible with sequential and parallel compositions. More concretely, for the left-to-right direction, we prove that:

For the converse, we prove:

Adequacy of the concrete semantics is now a corollary:

Proof. Suppose that ⌊tgt⌋ ⊆ ⌊src⌋. Let C be a context such that C[src] and C[tgt] are closed, and suppose that ⟨C[tgt], σ⟩ ↓ σ′. By Lemma 3.5, we have ⟨σ, ρ, ξ⟩ ∈ ⌊C[tgt]⌋ for some store ρ and component chronicle ξ ∈ CmpChro such that ξ(σ) = σ′. By Lemma 3.4, ⌊C[tgt]⌋ ⊆ ⌊C[src]⌋, so the same trace belongs to ⌊C[src]⌋, and by Lemma 3.5 again we obtain ⟨C[src], σ⟩ ↓ σ′.

Figure 4 presents examples of program transformations that are validated by the concrete semantics. Among RMWs, we only list transformations involving FAA, but similar transformations can be shown for XCHG and CAS (and some are included in our Coq development).
Many of the transformations in Fig. 4 are structural transformations revealing the algebraic properties of the language operators. In particular, generalized sequencing reduces parallel composition to sequential composition. Indeed, by introducing and eliminating skip instructions, using generalized sequencing we obtain that c₁ ∥ c₂ ⇝ c₁ ; c₂ for every c₁ and c₂. This transformation is typically considered counterproductive for performance (although it saves the time it takes to spawn a thread), but it shows the expected monotonicity property of the operational semantics, which does not hold under some weak memory models [Lahav and Vafeiadis 2016].
Example 3.7. The concrete semantics captures some refinements that are invalid in [Brookes 1996]. Indeed, every command in the language we study changes at most one shared variable. This is reflected in traces since every action in them mentions one variable. For instance, using the concrete semantics we can show that c₁ ; c₂ ; c₃ ⇝ c₁ ; c₃ for:

In Brookes's setting, this refinement fails to hold. For example, the condition in c₂ will never be satisfied in the context − ∥ await true then (x := 2 ; y := 2).

ABSTRACT SEMANTICS
The semantics above fully tracks the sequence of writes performed by a command. There are, however, contextual refinements in which writes are eliminated or introduced. The "abstract semantics" presented in this section supports such refinements. The main idea is to close the concrete sets of traces under certain rewrite rules that hide or introduce actions that can be safely assumed to be unobservable by the concurrent environment. Then, it may be the case that some traces in ⌊tgt⌋ are not in ⌊src⌋, but they are in the closure of ⌊src⌋ under these rewrites.
The main technical challenge lies in identifying these rewrite rules and proving the required properties for this semantics. In §4.1, we establish the compositionality property, which, unlike the case of the concrete semantics, is not a direct corollary of the definition, and requires a new argument. Then, in §4.2, we show how adequacy of the abstract semantics follows from its compositionality and Lemma 3.5 about the concrete semantics. In §4.3, we show that the set of rules is "complete" by establishing full abstraction. In §4.4, we consider full abstraction in the absence of snapshot.
Notation 4.1 (Rewrite Rules and Closures). A rewrite rule r is a binary relation on syntactic objects. We use the notation x −r→ y to mean that ⟨x, y⟩ ∈ r. For a set R of rewrite rules, we write x −R→ y if x −r→ y for some r ∈ R. A set S is closed under R if y ∈ S whenever x −R→ y for some x ∈ S. Assuming some universal set A, the closure of S under R, denoted by S^R, is defined as the smallest subset of A that contains S and is closed under R.
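For a finite starting set whose reachable objects are also finitely many, the closure of Notation 4.1 can be computed with a standard worklist loop. The sketch below is only an illustration under that finiteness assumption (the denotations in this paper are in general infinite sets); the function `closure` and the toy rule `drop_doubled` are hypothetical names, not part of the paper's development.

```python
from typing import Callable, Hashable, Iterable, List, Set

# A rule maps an object to the objects it can be rewritten to in one step.
Rule = Callable[[Hashable], Iterable[Hashable]]


def closure(seed: Iterable[Hashable], rules: List[Rule]) -> Set[Hashable]:
    """Smallest set containing `seed` that is closed under every rule."""
    closed = set(seed)
    worklist = list(closed)
    while worklist:
        item = worklist.pop()
        for rule in rules:
            for rewritten in rule(item):
                if rewritten not in closed:
                    closed.add(rewritten)
                    worklist.append(rewritten)
    return closed


def drop_doubled(word: str) -> Iterable[str]:
    """Toy rule: delete one of two equal adjacent characters."""
    for i in range(len(word) - 1):
        if word[i] == word[i + 1]:
            yield word[:i] + word[i + 1:]


closed_words = closure({"aabb"}, [drop_doubled])
```

Every element is processed once, so the loop terminates whenever only finitely many objects are reachable from the seed by rewriting.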
The following general propositions are useful in the sequel.

We define the abstract semantics using four rewrite rules. The rules aim to match operational arguments for cases where it is possible to eliminate a redundant idempotent write, eliminate several writes that cancel each other, or introduce an invisible write. The examples following the definition provide the intuition behind each rewrite rule.
Definition 4.4. The abstract denotation of a command c, denoted by ⟦c⟧, is defined by ⟦c⟧ ≜ ⌊c⌋^R, where R consists of the following rewrite rules on traces:

3 To assist the reader, we highlight the commands eliminated by a transformation.
Example 4.5 (Rule coalesce).Rule coalesce permits to combine consecutive component writes into one "atomic block".The condition ensures that the effect of the formed block is the same as the effect of a single write.As an example, let src = let a = y in (y := 1 ; x := 1 ; y := a).Intuitively speaking, we can always execute it atomically, without letting the environment to interfere in between the first load and the final store.In that case, it behaves like tgt = x := 1.Such reasoning is impossible in the concrete semantics, but the rule coalesce of the abstract semantics is allowing us exactly that, thus justifying src ⇝ tgt .Indeed, to show that ⌊ tgt ⌋ ⊆ ⟦ src ⟧ observe that every trace in ⌊ tgt ⌋ has the form ⟨ , , 1 • W(x, 1) • 2 ⟩.We can start from a corresponding trace in ⟦ src ⟧ of the form ⟨ , , 1 • W(y, 1) • W(x, 1) • W(y, 1 ( )(y)) • 2 ⟩ and rewrite by coalesce with 1 = 1 , 1 = W(y, 1), 2 = W(y, 1 ( )(y)), and 2 = 2 to obtain .Example 4.6 (Rule coalesce).Rule coalesce allows one to "attach" component actions to an environment action, provided that the composed block has the same effect as the single environment action.To see this in action, let src = let a = y in y := 3 ; if x ≠ 2 then (if x = 2 then y := a).The two if conditions are satisfied only if the concurrent environment changes x from non-zero value to zero.In this case, we can encompass that environment store of x with the load from y and the store of 3 to y just before, and the store of the previous value of y just after, and in this case src behaves like tgt = if x ≠ 2 then (if x = 2 then skip else y := 3) else y := 3. 
The rule coalesce of the abstract semantics is needed for that, thus justifying src ⇝ tgt . Indeed, to show that ⌊ tgt ⌋ ⊆ ⟦ src ⟧ observe that the traces in ⌊ tgt ⌋ are either of the form ⟨ , , Traces of the latter form are directly in ⌊ src ⌋. For a trace of the first form, we start from a corresponding trace in ⌊ src ⌋ of the form ⟨ , , 1 • W(y, 3) • W(x, 2) • W(y, 1 ( )(y)) • 2 ⟩ and rewrite by coalesce with 1 = 1 , 1 = W(y, 3), 2 = W(y, 1 ( )(y)), and 2 = 2 to obtain . As in coalesce, the condition of coalesce ensures that the formed atomic block affects the memory exactly as the single environment store. We note that in Brookes's setting, the transformation src ⇝ tgt fails to hold. For the context = − ∥ await (x ≠ 2 ∧ y ≠ 3) then x := 2, starting from a state with x ↦ 3, y ↦ 2, only [ tgt ] terminates in a state with x ↦ 2, y ↦ 2.
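The informal condition behind Examples 4.5 and 4.6, namely that the formed block affects memory exactly as a single write, can be checked mechanically on small memories. The following Python sketch is only an illustration: memories are modeled as dictionaries, writes as (variable, value) pairs, and the names apply_writes and same_effect are hypothetical helpers, not part of the formal development. It does not reproduce the formal side-conditions of the rewrite rules.

```python
def apply_writes(mem, writes):
    """Return the memory obtained by applying the writes in order (mem is not mutated)."""
    out = dict(mem)
    for var, val in writes:
        out[var] = val
    return out


def same_effect(mem, block, single):
    """True if running the whole block from mem ends in the same memory as the single write."""
    return apply_writes(mem, block) == apply_writes(mem, [single])


# Example 4.5: W(y,1) . W(x,1) . W(y, old y) acts like the single component write W(x,1).
# Example 4.6: W(y,3) . W(x,2) . W(y, old y) acts like the single environment write W(x,2).
for x0 in range(4):
    for y0 in range(4):
        mem = {"x": x0, "y": y0}
        assert same_effect(mem, [("y", 1), ("x", 1), ("y", y0)], ("x", 1))
        assert same_effect(mem, [("y", 3), ("x", 2), ("y", y0)], ("x", 2))

# Without restoring y, the block is distinguishable from the single write.
assert not same_effect({"x": 0, "y": 7}, [("y", 1), ("x", 1)], ("x", 1))
```

The check is purely state-based: it says nothing about which interleavings are possible, only that a block executed without interference is indistinguishable, in its net effect, from the single write it is coalesced into.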
Example 4.7 (Second side-condition of coalesce). Attaching component actions to an environment write W( , ) may fail if these actions modify and the environment write is due to an RMW on . This is the reason for the condition 1 ( 1 ( ))( ) = 1 ( )( ) in coalesce. For example, using x instead of y in the commands in Example 4.6, without this condition, we would obtain: However, starting from x ↦ 0, in parallel to FAA(x, 2), only the target can terminate with x ↦ 2.
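The claim at the end of Example 4.7 can be replayed by brute force. The sketch below is an assumption-laden illustration: the two commands are our reading of "using x instead of y in the commands in Example 4.6", every memory access is taken to be one atomic step, and the helper final_values (a hypothetical name) enumerates all positions at which a single concurrent FAA(x, 2) may fire.

```python
def src(mem):
    """let a = x in x := 3 ; if x != 2 then (if x = 2 then x := a); one access per step."""
    a = mem["x"]; yield
    mem["x"] = 3; yield
    t = mem["x"]; yield
    if t != 2:
        t = mem["x"]; yield
        if t == 2:
            mem["x"] = a; yield


def tgt(mem):
    """if x != 2 then (if x = 2 then skip else x := 3) else x := 3; one access per step."""
    t = mem["x"]; yield
    if t != 2:
        t = mem["x"]; yield
        if t != 2:
            mem["x"] = 3; yield
    else:
        mem["x"] = 3; yield


def final_values(thread, init=0, delta=2, max_steps=6):
    """Final values of x over all placements of one atomic FAA(x, delta) among the thread's steps."""
    finals = set()
    for faa_at in range(max_steps + 1):
        mem = {"x": init}
        run = thread(mem)
        steps, fired = 0, False
        while True:
            if steps == faa_at and not fired:
                mem["x"] += delta          # the environment's fetch-and-add
                fired = True
            try:
                next(run)                  # one atomic step of the component
            except StopIteration:
                break
            steps += 1
        if not fired:
            mem["x"] += delta              # FAA runs after the component has finished
        finals.add(mem["x"])
    return finals


print(sorted(final_values(src)))   # [3, 5]
print(sorted(final_values(tgt)))   # [2, 3, 5]
```

Under these assumptions the source never terminates with x = 2 when started from x = 0, while the target does, which matches the observation that motivates the second side-condition.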
Example 4.8 (Rule del-red). Executing src = let a = x in x := a atomically is invisible to the concurrent environment, behaving like skip. In the concrete semantics, we cannot prove src ⇝ skip since all chronicles of ⌊ src ⌋ have one component write, whereas those of ⌊skip⌋ have none. The rule del-red is needed here. Indeed, to show that ⌊skip⌋ ⊆ ⟦ src ⟧, we start with an arbitrary trace in ⌊skip⌋, which must have the form ⟨ , , ⟩. Then, a corresponding trace in ⌊ src ⌋ of the form ⟨ , , • W(x, ( )(x))⟩ can be rewritten to by del-red with 1 = and 2 = .
Example 4.9 (Rule add-red). In the operational semantics, fetch-and-add by 0 is equivalent to a read. In particular, for src = let a = x in y := a and tgt = let a = FAA(x, 0) in y := a, we have src ⇝ tgt . This cannot be shown by the concrete semantics since chronicles of src have only one component write, while those of tgt have two such writes. To show that ⌊ tgt ⌋ ⊆ ⟦ src ⟧, we start with an arbitrary trace in ⌊ tgt ⌋, which must have the form ⟨ , ,
Example 4.10. Rule add-red is also necessary for a language without RMWs. For:
src = let a = y in (y := 1 ; (if x ≠ 0 then x := 0) ; y := a)
tgt = assume(x = 0) ; x := 0
we have src ⇝ tgt , but ⌊ tgt ⌋ ⊆ ⟦ src ⟧ cannot be established without add-red.
Remark 4.11. In the presence of coalesce and add-red, the rule del-red can be strengthened: Indeed, we can rewrite as follows using an arbitrary ∈ Var, and then apply del-red:

Compositionality
We establish the compositionality of ⟦ ⟧. First, to handle sequential composition, we observe that the rules of R can be applied inside sequential composition of traces: Proposition 4.12. The following hold for every r ∈ R: From this property, we obtain the following proposition, which solves the case of sequential composition in the compositionality proof.
We prove the first claim and the second proof is symmetric. Suppose that ⌊ 1 ⌋ ⊆ ⟦ ′ 1 ⟧. Let ∈ ⌊ 1 ; 2 ⌋. By definition, we have = 1 ; 2 for some 1 ∈ ⌊ 1 ⌋ and 2 ∈ ⌊ 2 ⌋. Our assumption entails that In particular, ′ 1 ; 2 is defined, and thus by definition we have ′ 1 ; 2 ∈ ⌊ ′ 1 ; 2 ⌋. It follows that ∈ ⟦ ′ 1 ; 2 ⟧. □
Handling parallel composition is more difficult. Indeed, a claim like Prop. 4.12 does not hold for parallel composition instead of sequential composition: since the rewrite rules change the chronicle in the trace, it may be that 1 R − → ′ 1 and ′ 1 ∥ 2 is defined, but 1 ∥ 2 is undefined. We address this problem by showing that in such cases there must be another trace 1 ∥ 2 and belongs to any concrete denotation that 2 belongs to. For the formal argument, we introduce a Proc
Proposition 4.14. For every command , ⌊ ⌋ is closed under D.
Proof. By induction on ∈ ⌊ ⌋ using the following claims for the inductive step: Claim 4.14.1: For every 2 for some ′ 1 and ′ 2 satisfying one of the following: With Propositions 4.14 and 4.15, we obtain the variant of Prop. 4.13 to handle parallel composition: We prove the first claim and the second proof is symmetric. Suppose that We prove this claim by induction on the number of rewrite steps in In the base case we have ′ 1 = 1 and we can take ′ 2 = 2 and ′ 1 ∥ ′ 2 = . For the induction step, suppose that for ′ 1 there exists ′ 2 such that
Using Propositions 4.13 and 4.16 for handling sequential and parallel composition, and similar lemmas for other constructs, we can easily establish the following lemma by induction on : As discussed above for the concrete semantics (see discussion after Lemma 3.4), the compositionality of ⟦•⟧ follows from Lemma 4.17. This also entails that there exists a (mathematical) function that maps the denotations of the immediate sub-commands of to the denotation of . To see this, consider again the case of = 1 ∥ 2 . Given ⟦ 1 ⟧ and ⟦ 2 ⟧, we can arbitrarily "pick" some commands
Candidates for a direct compositional definition of ⟦ 1 ; 2 ⟧ and ⟦ 1 ∥ 2 ⟧ are to take the R-closure of the set obtained by taking all possible sequential/parallel compositions of traces from ⟦ 1 ⟧ and ⟦ 2 ⟧. This works for sequential composition, as we have To see that the converse does not hold, let:
1 = x := 1 ; assume(y = 0) ; assume(z ≠ 1) ; assume(z = 1) ; x := 0
2 = y := 1 ; assume(x = 0) ; assume(z ≠ 1) ; assume(z = 1) ; y := 0
Using coalesce on ⟨s 0 , 0 , W(x, 1) Indeed, no trace in ⌊ 1 ∥ 2 ⌋ has a single environment write to z, and all rules of R preserve the environment actions.
From the full abstraction proof, we observe that although multiple rewrites of a trace may be necessary, these rewrites do not overlap. We only apply them to disjoint parts of the chronicle. Formally, we let R loc be the set consisting of "local" variants of the rules: and 1 The relation ⇒ between traces is inductively defined as follows: Then, the full abstraction proof shows that src ̸ ⇝ tgt whenever In fact, by analyzing the rewrite rules we prove the following: Lemma 4.26. For every set of traces, we have R = { ′ | ∃ ∈ .⇒ ′ }.

Full Abstraction Without Snapshots
The full abstraction proof above relies on the availability of the snapshot command, which gives the parallel context the ability to simultaneously observe the values of all variables. Next, we show that snapshots can be avoided in that proof provided that src is loop-free. Roughly speaking, we show that in this case it is possible to achieve the effect of a snapshot executing in parallel to src by repeatedly reading shared variables a number of times that can be determined from src . This means that when src is loop-free, snapshots do not increase the distinguishing power of the parallel context. In turn, we present a delicate example of a command src with loops, where a certain refinement holds for snapshot-free contexts but fails to hold for contexts with snapshot.
In particular, this implies that in the language without snapshot, full abstraction of the abstract semantics does not hold for code fragments with loops. Formally, we say that a transformation from a command src to a command tgt is sound for snapshot-free contexts, denoted by src ⇝ snapshot tgt , if ⟨ [ tgt ], ⟩ ↓ ′ implies ⟨ [ src ], ⟩ ↓ ′ for every snapshot-free context such that [ src ] and [ tgt ] are closed.
Proof Sketch. In the proof of Thm. 4.23, snapshot is needed in order to ensure that a certain state is reached when src is executed concurrently. When src is loop-free, we can achieve this result by repeatedly reading the shared variables used in src , and checking their values one-by-one. More precisely, given a state , let ≜ assume( 1 = ( 1 )) ; ... ; assume( = ( )) where 1 , ... , is an enumeration of all shared variables occurring in src . When src is loop-free, there exists a bound ∈ N on the number of writes performed by src (i.e., the number of component actions in ⌊ src ⌋). We use a sequential composition ; ... ; consisting of + 1 copies of instead of snapshot( ). If after every execution of we reach a state different than , then for the next execution of to terminate, we need at least one write by the concurrent context. Since src performs at most writes, executing + 1 times in a row ensures that at some point we visit . □ The above implication fails if src has loops. The simplest example we found is presented next.
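Before turning to that example, the counting argument of the proof sketch can be tested exhaustively on small instances. The Python sketch below makes simplifying assumptions that are ours, not the paper's: the component is reduced to a fixed list of writes, the context is a sequence of one-variable checks that block on a mismatch, and passes_imply_visit is a hypothetical helper name. It enumerates every interleaving and checks that whenever all passes succeed, the memory equaled the target state at some point.

```python
from itertools import combinations


def passes_imply_visit(init, writes, sigma, variables, passes=None):
    """Check all interleavings of the component's writes with `passes` rounds of
    one-by-one checks; return False if some interleaving lets every check
    succeed although the memory never equals sigma."""
    n = len(writes)
    if passes is None:
        passes = n + 1                      # the bound used in the proof sketch
    reads = [v for _ in range(passes) for v in variables]
    total = len(reads) + n
    for write_slots in combinations(range(total), n):
        mem = dict(init)
        visited = mem == sigma
        wi = ri = 0
        all_checks_ok = True
        for step in range(total):
            if step in write_slots:         # a write by the component
                var, val = writes[wi]
                wi += 1
                mem[var] = val
            else:                           # assume(v = sigma(v)) by the context
                v = reads[ri]
                ri += 1
                if mem[v] != sigma[v]:
                    all_checks_ok = False   # the assume blocks: discard this run
                    break
            visited = visited or mem == sigma
        if all_checks_ok and not visited:
            return False
    return True


sigma = {"x": 1, "y": 1}
writes = [("x", 0), ("y", 1)]               # two component writes, so n = 2
print(passes_imply_visit({"x": 1, "y": 0}, writes, sigma, ["x", "y"]))            # True
print(passes_imply_visit({"x": 1, "y": 0}, writes, sigma, ["x", "y"], passes=1))  # False
```

With a single pass, the checks on x and y can be separated by both writes, so they succeed without the memory ever being in the checked state. With n + 1 passes, at least one pass is not interrupted by any write, which is the pigeonhole step the proof sketch relies on.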
Example 4.28. For the commands src = while * do (y := 0 ; x := * ; x := 0 ; y := * ) and tgt = y := 0 ; x := 1 ; y := 1 ; x := 0, we have src ̸ ⇝ tgt but src ⇝ snapshot tgt . The former follows from Thm. 4.23 since we have ⟨s 0 , 0 , W(y, 0) To see that src ⇝ snapshot tgt , we have to resort to cumbersome operational reasoning, and provide a simulation relation that relates operational executions of [ tgt ] to those of [ src ]. Roughly speaking, the main idea is to execute y := 0 and x := * (with 1 for * ) in the source when the target executes y := 0 and x := 1, respectively. Then, when the target executes y := 1, the source executes x := 0 ; y := * (with 1 for * ). This creates a mismatch between the target's state that has x = 1 and the source's state that has x = 0. Nevertheless, whenever the concurrent context relies on the value of x, the source can do another half-iteration and execute y := 0 ; x := * to fix the value of x as it is in the target's state, moving the mismatch between the target and the source to y. This way, we are able to use the source's non-deterministic loop to provide the concurrent context with whatever value it needs for x and y, one at a time. Finally, when the target executes x := 0 the source executes x := 0 ; y := * (with the final value of y in the target for * ).
Making this intuition formal is rather challenging (which provides us with more confidence that the denotational semantics is beneficial for formal refinement proofs).In our Coq development, we do that by generalizing the notion of a command context, demonstrating how generalized contexts interact with the operational semantics, and using generalized contexts for defining the simulation.
Example 4.28 uses non-deterministic looping, while * do , but, by using the following proposition, it is possible to devise a similar example without non-deterministic looping: Using Prop. 4.29, we can adapt Example 4.28 to use a command ′ src that does not use while * do and := * instead of src and have ′ src ⇝ snapshot tgt . To see that ′ src ̸ ⇝ tgt , note that a concurrent snapshot observing x = y = 1 is possible for tgt but not for ′ src . Thus, snapshots strictly increase the distinguishing power of contexts also in a language without non-deterministic loops.
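The same kind of observation can be illustrated for the original commands of Example 4.28. The sketch below is a simplification under stated assumptions: both commands run without any environment interference, the initial state has x = y = 0, and * ranges over {0, 1}. The function names src_states and tgt_states are hypothetical. It collects the states (x, y) each command passes through.

```python
def src_states(values=(0, 1), init=(0, 0)):
    """States (x, y) visited by  while * do (y := 0 ; x := * ; x := 0 ; y := *)."""
    seen = set()
    loop_heads = set()
    frontier = [init]
    while frontier:
        x, y = frontier.pop()
        if (x, y) in loop_heads:
            continue
        loop_heads.add((x, y))
        seen.add((x, y))
        seen.add((x, 0))                 # after y := 0
        for v in values:
            seen.add((v, 0))             # after x := *
        seen.add((0, 0))                 # after x := 0
        for w in values:
            seen.add((0, w))             # after y := *
            frontier.append((0, w))      # next iteration starts here
    return seen


def tgt_states(init=(0, 0)):
    """States (x, y) visited by  y := 0 ; x := 1 ; y := 1 ; x := 0."""
    x, y = init
    states = {(x, y)}
    y = 0; states.add((x, y))
    x = 1; states.add((x, y))
    y = 1; states.add((x, y))
    x = 0; states.add((x, y))
    return states


print(sorted(src_states()))   # [(0, 0), (0, 1), (1, 0)]
print(sorted(tgt_states()))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Under these assumptions the target passes through x = y = 1 while the source never does, since the source sets x to a non-zero value only while y is 0. A context that can observe both variables at once can detect this difference, whereas reading x and y one at a time leaves room for the half-iteration trick described in Example 4.28.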

SEMANTICS FOR RMW-FREE CONTEXTS
In this section we show that RMWs strictly increase the power of contexts to distinguish between code fragments, and show how to modify the abstract semantics for the case of RMW-free contexts.

RELATED AND FUTURE WORK
We have already discussed the seminal work of Brookes [1996], from which we took a lot of inspiration. Our traces consist of write actions, rather than transitions (pairs of states) as in Brookes's traces, and are closer in spirit to models of Milner's CCS [Milner 1980] and Hoare's CSP [Hoare 1985]. This choice has several advantages. First, it directly reflects the property of the operational semantics that each transition updates at most one variable. Second, since reads are not recorded in traces, our concrete semantics, i.e., before imposing any closures, already validates a variety of refinements, including all those that do not involve writes. In contrast, in Brookes's traces reads are tracked as stuttering transitions, and closures are needed also for refinements of reads (and of skip). Third, explicit environment writes in traces allow us to have a rule like coalesce that mimics operational simulation that attaches component actions to one environment write.
Brookes's traces, which are very similar to the traces used for giving meaning to rely/guarantee judgements [Jones 1983; Xu et al. 1997], have provided a useful intuition and formal basis for multiple later frameworks, e.g., [Dingel 1999, 2002; Liang et al. 2012, 2014; Turon and Wand 2011], which propose relational program logics for reasoning about refinements. For example, Dingel [1999, 2002] used Brookes's semantics for deriving a refinement calculus allowing one to develop full concurrent programs by repeatedly refining a specification. Some works address the challenge of validating contextual refinements that are conditioned by some assumptions on the concurrent context. Our results on snapshot/RMW-free contexts go in this direction, but there is, of course, a variety of more fine-grained assumptions that will allow deriving useful refinements. For example, we would like to be able to reason about common concurrency primitives, such as locks and transactions. These can be implemented from standard shared memory constructs, but when studying full abstraction for them, one should only consider disciplined contexts that, e.g., properly interleave lock and unlock commands. Some works, which provide sound techniques but do not consider full abstraction, addressed similar challenges. For example, Liang et al. [2012, 2014] developed a framework for establishing contextual refinement that handles assumptions such as data-race-freedom and data encapsulation in concurrent objects, and demonstrated that their technique is sufficiently expressive for verifying a complex garbage collector. More recently, Frumin et al. [2021] and Song et al. [2023] studied refinements conditioned by separation logic premises, and Khyzha and Lahav [2022] and Singh and Lahav [2023] studied refinements that assume that clients adhere to a given library call policy. We hope that our denotational semantics will form a basis for continuations along these lines.
Another line of work, see e.g., [Benton et al. 2016], attempts to capture shared-memory concurrency in general, and Brookes's semantics in particular, using monadic constructions following [Moggi 1991], or even as an algebraic theory [Abadi and Plotkin 2010; Dvir et al. 2022]. A prominent advantage of these approaches is their ability to capture higher-order programs, while we are limited to first-order programs. Additionally, this approach detaches structural refinements from effectful ones and paves the way to type-and-effect systems, enabling reasoning about refinements using assumptions from a type analysis (see e.g., [Birkedal et al. 2012; Kammar 2014]).
Our work handles shared variables admitting sequentially consistent semantics (SC). Jagadeesan et al. [2012] modified Brookes's semantics to apply for x86-TSO memory (see [Owens et al. 2009]), and achieved full abstraction using await instructions. Dvir et al. [2024] developed Brookes's semantics for the Release/Acquire memory model (see [Lahav et al. 2016]), but did not study full abstraction. A large body of work, e.g., [Jagadeesan et al. 2020; Jeffrey and Riely 2019; Jeffrey et al. 2022; Kavanagh and Brookes 2018, 2019; Paviotti et al. 2020], has been devoted to the study of compositional semantics for weakly consistent memory that is not necessarily accompanying an existing operational semantics like in our case. A prominent idea there is the use of partially ordered multisets ("pomsets") [Pratt 1986] or event structures [Winskel 1987] that generalize linearly ordered traces, like those we work with. This aligns with axiomatic approaches (see, e.g., [Alglave et al. 2014]), which, as is, like operational semantics, are restricted to closed full programs.
In the realm of weak memory models, reasoning about correctness of local compiler optimizations is rather challenging and error-prone. Many works have addressed this issue in different levels of formality, e.g., [Burckhardt et al. 2010; Chakraborty and Vafeiadis 2016; Cho et al. 2022; Dodds et al. 2018; Morisset et al. 2013; Poetzl and Kroening 2016]. Interestingly, it is not always the case that a weaker memory model allows more optimizations than a stronger one (see, e.g., [Gopalakrishnan et al. 2023]). For instance, weak memory models usually do not support "store-after-load elimination" and "redundant FAA elimination" that are valid under SC (see Fig. 5). Attempting to allow local proofs of optimizations, some of these works develop compositional semantics, but these are restricted to top-level parallel composition. A noteworthy exception is the work of Dodds et al. [2018], who developed a denotational semantics for the Release/Acquire weak memory model. Their semantics is based on an axiomatic formulation, which they generalize to allow "block-local execution graphs" that iterate over all possible context execution graphs, thus achieving full abstraction. Their blocks are, however, restricted to be sequential, which enables local validation of program transformations without actually showing that ⟦ 1 ∥ 2 ⟧ is a function of ⟦ 1 ⟧ and ⟦ 2 ⟧.
Our notion of contextual refinement is based on partial correctness, and is insensitive to termination. In concurrent programs, termination is interesting assuming scheduler fairness [Francez 1986], and termination is generalized into a family of progress conditions [Liang and Feng 2020]. By using infinite traces, Brookes's semantics generalizes to fair infinite runs [Brookes 1996, §10], and is shown to be fully abstract w.r.t. operational "state-trace behaviors" consisting of sequences of states visited during the computation. We leave the task of incorporating this dimension into our semantics for future work, possibly by taking coinductive versions of our concrete semantics. For the abstract semantics, we expect that the local rewriting rules (see Lemma 4.26) will be handy.