Verification under Intel-x86 with Persistency

The full semantics of the Intel-x86 architecture has been defined by Raad et al. in POPL 2022, extending the earlier formalization, which was based on the TSO memory model, to incorporate persistency. This new semantics involves an intricate combination of the SC, TSO, and PSO models to account for the diverse features of the enlarged instruction set. In this paper we investigate the reachability problem under this semantics, including both its consistency and persistency aspects, each of which requires reasoning about unbounded operation reorderings. Our first contribution is to show that reachability under this model can be reduced to reachability under a model without the persistency component. This is achieved by showing that the persistency semantics can be simulated by a finite-state protocol running in parallel with the program. Our second contribution is to prove that reachability under the consistency model of Intel-x86 (even without crashes and persistency) is undecidable. Undecidability is obtained as soon as one thread in the program is allowed to use both TSO variables and two PSO variables. The third contribution is showing that for any fixed bound on the alternation between TSO writes (write-backs) and PSO writes (non-temporal writes), the reachability problem is decidable. This defines a complete parametrized schema for under-approximate analysis that can be used for bug finding.


INTRODUCTION
The semantics of a modern concurrent memory system can be quite complex and hard to comprehend. One component of such semantics is the consistency model, which determines the values that read operations may return along computations. The simplest such model is the sequential consistency (SC) model, where instructions within each thread are executed in the order in which they are issued (program order) and where memory writes are instantaneously visible to all other threads. In general, for performance reasons, consistency models allow violations of both of these properties: they permit reordering of operations, and they may also make the result of a write visible at different times to different threads (a simple example of this is when the writing thread may read the written value before it is available to other threads). These features result in behaviours that do not meet the strong consistency guarantees of the SC model, and the set of reachable memory configurations is larger than under SC. Some well-known examples of such relaxed/weak memory models are the Total Store Order (TSO) and Partial Store Order (PSO) models, which form part of the model studied in this paper.
In addition, when the memory system includes a mechanism for ensuring resilience to crashes, by the use of non-volatile storage, the semantics includes a second component that determines the order in which writes to memory take effect in such storage (persistent storage). For performance reasons, this order may differ from the order reflected in the consistency model, thus resulting in another dimension of reordering of operations. The result of this second type of reordering is visible after recovering from a crash: the values still available in memory are only those recorded in the persistent memory prior to the crash. As a result, the set of memory configurations reachable along runs with crashes and recoveries includes some that are not reachable based only on the consistency model.
Typically, a variety of instructions (fences) are provided in the instruction set which programmers may use in their code to force ordering of instructions, both w.r.t. the consistency and the persistency stages. The behaviour of these fences is another important component of the semantics. (Notice that fences are expensive, as they go against the purpose of the reorderings, which is to provide better performance.) These aspects of a modern memory system (consistency, fences, and persistency) interact in non-trivial ways, and thus programming concurrent applications under such models is hard and error prone. There is a need to develop verification methods applicable to this setting. In this work, we consider the issue of the decidability of the safety verification problem (reducible to solving the state reachability problem), which is fundamental for the development of automatic verification algorithms. We address this issue on a concrete instance that is significant enough to show the main issues that arise in this context: we consider the case of verifying concurrent programs running over the Intel-x86 architecture with persistent memory, for which a formal semantics has been defined in [Raad et al. 2022]. We assume in this work that programs have a finite data domain, and concentrate on the decidability issues related only to concurrency and the effect of reordering among operations.
The decidability of reachability verification under weak consistency and persistency has not yet been investigated extensively in the literature. The only work we are aware of in this context is [Abdulla et al. 2021a], where the authors prove the decidability of this problem for Persistent x86-TSO (PTSO) [Raad et al. 2020], which is an extension of the TSO (Total Store Order) consistency model with a persistency model. However, PTSO does not capture faithfully the semantics of the Intel-x86 architecture with persistency, as has been pointed out in [Raad et al. 2022]. Our aim in this paper is to address the decidability of the reachability problem for the full semantics of the Intel-x86 architecture with persistency as it was defined in [Raad et al. 2022].
According to the TSO model, write operations issued by a thread are placed in an unbounded FIFO store buffer where they remain pending until committed to (volatile) memory. During this time, the written value is visible only to the writing thread. Committing to memory makes it visible to all other threads simultaneously. While a write is pending in the store buffer, later reads, in program order, by the same thread to other locations can be executed, fetching values from the memory. The relaxation provided by the TSO model may still be unnecessarily strict for some applications: for instance, bulk transfers of video buffers may not need the preservation of ordering between writes on distinct locations.
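As a minimal illustration of the TSO behaviour described above, the following Python sketch models one thread's view: buffered writes, read-own-write, and delayed commits (the class and method names are ours, not the paper's).

```python
from collections import deque

class TSOThreadView:
    """Sketch of one thread under TSO: writes enter a FIFO store buffer;
    reads prefer the newest buffered write to the same location
    (read-own-write); commits drain the buffer head to shared memory."""

    def __init__(self, memory):
        self.memory = memory      # shared dict: location -> value
        self.buffer = deque()     # FIFO of (location, value) pairs

    def write(self, loc, val):
        self.buffer.append((loc, val))    # pending: visible only to this thread

    def read(self, loc):
        for l, v in reversed(self.buffer):  # newest buffered write wins
            if l == loc:
                return v
        return self.memory.get(loc, 0)      # else fetch from shared memory

    def commit_one(self):
        # commit the oldest pending write; other threads see it only now
        loc, val = self.buffer.popleft()
        self.memory[loc] = val

mem = {}
t = TSOThreadView(mem)
t.write("x", 1)
assert t.read("x") == 1       # the writer already sees its own write...
assert mem.get("x", 0) == 0   # ...while other threads still see 0
t.commit_one()
assert mem["x"] == 1
```

In a full simulation the `commit_one` steps would be interleaved nondeterministically with program steps, which is exactly the source of the reorderings discussed above.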
To provide flexibility, the Intel-x86 architecture provides instructions to be used for different types of consistency requirements, ranging from strong SC-like ones to very weak ones. Its subset of instructions with TSO semantics actually corresponds to the one targeting the so-called write-back (wb) memory, but there are others. Among the available memory types, uc (for strong-uncacheable) memory allows only updates that are immediately committed, like write operations in the SC model. On the other hand, wc (for write-combining) memory allows reorderings between writes on different variables issued by the same thread, as can happen in the PSO (Partial Store Order) model (though wc reads differ from PSO reads). In addition to having different semantics for different memory types, write operations can be declared to be non-temporal, which changes their original semantics. Roughly, non-temporal writes (ntw) behave similarly to wc memory updates, allowing reordering between writes on different variables, even if these writes are on write-back memory. The permitted reorderings of reads in these different memory types have subtle differences, and further there are also intricate rules governing reorderings of operations to distinct memory types. The different memory types, non-temporal writes, and the interactions between these features induce a consistency model eTSO (for extended TSO) that is more relaxed and far more complex than the TSO model.
Apart from the consistency stage, the Intel-x86 architecture also includes a persistency stage. Operationally, the consistency model is extended with an additional unbounded buffer to which writes can be moved from the volatile stage, observing certain ordering rules. Naturally, these rules differ for the different types of memory operations. This model, which we call the epTSO model, is formalized in [Raad et al. 2022], generalizing the one in [Raad et al. 2020] and taking into account all the features of the Intel-x86 architecture mentioned above (various memory types, non-temporal writes, and associated fence operations).
Even for finite-state threads (i.e., the data domain is finite and each thread has finite control), proving decidability (of reachability) in this context is hard because of the unbounded reorderings of operations permitted by the complex consistency model, the persistency model, and their interaction. All this hints at undecidability. Yet, despite the sources of infinity in the semantics, the models could still satisfy some properties that make their analysis possible. In fact, reachability under both the TSO and PSO consistency models, for instance, is decidable [Atig et al. 2010, 2012]. This has been proved by reduction to reachability problems in well-structured systems [Abdulla et al. 1996; Finkel and Schnoebelen 2001] (a class of systems for which it is known that this problem is decidable). More recently, the reachability problem for PTSO has also been proved to be decidable by a quite sophisticated reduction to well-structured systems. This makes the decidability of reachability an interesting and challenging problem.
Our first contribution is to prove that the reachability problem under epTSO can be reduced to the reachability problem under eTSO. To achieve that, we proceed in two steps. The first and easier step is to eliminate crashes from our analysis. A computation with crashes is decomposed into a sequence of crash-less phases separated by occurrences of crashes. Then the sequence of persistent states at the interfaces between these phases is guessed, reducing the original reachability problem to checking reachability between two given persistent memory states via computations without crashes. The second and more involved step is to reduce this crash-free reachability between persistent memory states to reachability without the persistency stage. Now, to solve the crash-less reachability problem (between persistent memory states), we first observe that (as far as reachability is concerned) wb and ntw updates, together with atomic read-writes and fences, can simulate all other types of write operations. Then, the main reduction consists in showing that reachability under the model with persistency can be reduced to reachability under a model without persistency that is an extension of TSO with non-temporal writes (only). The reduction is code-to-code, in the sense that given a program for which a reachability question is stated, this question is reduced to another reachability question for a program that is derived from the original one. The main idea behind the construction is to consider a manager running as an additional thread which, in cooperation with the other threads of the program, implements a protocol that simulates the persistency semantics. Basically, the manager guesses for each variable the value that will be recovered from the persistent memory upon the next crash, and then checks the validity of the guess by ensuring that no operation occurs which overwrites the persistent memory in a manner that invalidates the guessed persistent memory state. That this manager is just a thread,
and not an external finite-state observer synchronizing with all updates, is important: otherwise, since the TSO buffers are FIFO, such a manager in cooperation with the threads could solve undecidable problems such as PCP. This makes the interaction of the manager with the program threads quite subtle. Each thread helps the manager in its guessing by marking, for each variable, a write operation that it considers as its last write that persists. The manager takes one of these values as the guessed value for that variable. Then, by watching the operations done by each thread subsequent to this point, it can check that its guess was right. This requires an intricate protocol of signaling between the threads and the manager. The important and surprising fact is that the manager can be defined as a finite-control process, i.e., it needs to track only a finite amount of extra variables, and yet we get rid of the unbounded buffer usually employed to model the persistency stage, e.g., in [Raad et al. 2020] and [Raad et al. 2022]. Another important point is that the manager uses only SC variables. This implies that our reduction can be used already for SC extended with persistency, as well as for TSO or PSO extended with persistency. In particular, this construction remarkably simplifies the models introduced in [Abdulla et al. 2021a; Khyzha and Lahav 2021] to reason about reachability in PTSO.
The question then is whether the reachability problem for Intel-x86 without persistency (the eTSO consistency model) is decidable or not. Our second contribution is to prove its undecidability. This is surprising: the operations on each of the Intel-x86 memory types and non-temporal writes, when considered in isolation, have semantics corresponding to either the SC, TSO, or PSO models, and reachability is known to be decidable under each of these models. However, we prove that very limited interactions between operations in these models, as allowed in eTSO, lead to undecidability. The reachability problem becomes undecidable for programs where all threads run according to SC except one that uses TSO variables and two PSO variables.
Given the undecidability, a natural approach to this problem is to seek a decidable parametrized bounded under-approximate analysis schema. This means defining some appropriate bounding parameter such that the reachability problem is decidable under each fixed bound, i.e., by considering only the program behaviors satisfying the bounding constraint. This is a classical and widely adopted approach in the context of bug finding, particularly for concurrent programs, where a number of bounding concepts such as context-bounding [Qadeer and Rehof 2005] and delay-bounding [Emmi et al. 2011] have been introduced. Ideally, the parametrized bounding schema should be complete in the sense that the union of all program behaviors under all bounds is the set of all its possible behaviors, which implies that if there is a bug in the program, there must exist a bound where it will be observed.
We take as bounding parameter the number of alternations between wb writes and ntw's along computations. Clearly, this bounding schema is complete. We prove that for any fixed bound on such alternations, the reachability problem is decidable. The proof is by a reduction to the reachability problem under PSO, which is known to be decidable [Atig et al. 2012]. Our reduction is formulated as a code-to-code translation to PSO. In our reduction, each thread is now represented by 2 threads, one for each of its rounds (in each round it performs either only wb writes or only ntw writes), which run in parallel. The TSO rounds are handled by suitable insertion of fences. There are a number of subtle requirements imposed by fences that require orderings to be enforced between operations among these 2 threads. There is also the flow of information, through reading own writes, between these threads that has to be managed. Both of these are handled by a dedicated manager thread (i.e., we have one manager to handle the 2 threads corresponding to one eTSO thread). Interestingly, this manager is finite-state and needs to track only a finite amount of information at the interface between the different computation rounds.
Related Work. Our work uses the formal semantics of Intel-x86 memory types and non-temporal stores that has been proposed in [Raad et al. 2022]. For related work on formal semantics of weak memory models, the reader is referred to [Raad et al. 2022].
In the following, we focus on the decidability and complexity results for the verification problems of programs running under weak memory models. The decidability and complexity of the reachability problem for programs under Total Store Ordering (TSO) has been studied in [Abdulla et al. 2016, 2018; Atig et al. 2010, 2012], under Release-Acquire in [Lahav and Boker 2020, 2022], and under the promising semantics in [Abdulla et al. 2021b]. The work [Abdulla et al. 2022] studies the reachability problem for TSO programs with dynamic thread creation. The parameterized verification (i.e., the verification of an arbitrary number of identical threads) for TSO has been addressed in [Abdulla et al. 2016, 2018, 2020a, 2023] and for Release-Acquire in [Krishna et al. 2022].
The robustness problem (which can be seen as a stronger problem than reachability) for programs running under weak memory models has been addressed for TSO [Bouajjani et al. 2011, 2013], the POWER architecture [Derevenetc and Meyer 2014], and fragments of the C11 memory model [Lahav and Margalit 2019; Margalit and Lahav 2021]. A problem close to robustness, called persistence, has been studied in [Abdulla et al. 2015a].
All these works do not consider persistency. [Abdulla et al. 2021a] is the only work (as far as we know) that addresses the decidability and complexity of programs running under weak memory models with persistency. However, the formal model considered in [Abdulla et al. 2021a] uses the formal semantics of Intel-x86 persistency that was introduced in [Raad et al. 2020]. This formal semantics considers only write-back memory and does not model non-temporal stores, as is done in [Raad et al. 2022]. Our results are different from [Abdulla et al. 2021a]: First, the reachability problem for programs under Px86 was shown to be decidable in [Abdulla et al. 2021a] using the framework of well-structured systems [Abdulla 2010; Abdulla et al. 1996; Finkel and Schnoebelen 2001], while we show that this problem is undecidable for the full Intel-x86 consistency model regardless of the persistency feature. Furthermore, we show, in this paper, that the reachability problem under the full Intel-x86 architecture with persistency can be reduced to the reachability problem under a consistency model without persistency. Finally, our decidability result for the reachability problem when bounding the number of alternations between non-temporal writes and temporal write-back operations is more general than the decidability result of [Abdulla et al. 2021a], and the proof is done by reduction to the reachability problem for PSO.

Notation
Let Σ be an alphabet; Σ* (resp. Σ+) denotes the set of finite (resp. non-empty finite) words over Σ. Let ϵ denote the empty word. Consider a word w over Σ; we use |w| to denote the length of w. For i : 1 ≤ i ≤ |w|, we write w[i] to denote the i-th letter of w. For any a ∈ Σ, we write a ∈ w to denote that there exists i : 1 ≤ i ≤ |w| such that w[i] = a. Given two words w1, w2 ∈ Σ*, w1 · w2 stands for the concatenation of w1 and w2. Given a word w ∈ Σ* and Σ′ ⊆ Σ, we use w↓Σ′ to mean the word obtained by deleting from w all the letters not in Σ′. Given two sets A and B, we use [A → B] to denote the set of all functions from A to B, and we write f : A → B to denote that f ∈ [A → B]. We write f[a ← b] to denote the function f′ where f′(a) = b and f′(a′) = f(a′) for all a′ ≠ a.
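The projection and function-update notation can be illustrated by a small Python sketch (function names are ours, chosen to mirror the notation):

```python
def project(w, sigma_prime):
    """w↓Σ' : delete from the word w every letter not in Σ'."""
    return [a for a in w if a in sigma_prime]

def update(f, a, b):
    """f[a ← b] : the function equal to f except that a now maps to b."""
    g = dict(f)   # copy, so the original f is left unchanged
    g[a] = b
    return g

w = ["r", "w", "f", "w"]
assert project(w, {"w"}) == ["w", "w"]
f = {"x": 0, "y": 0}
assert update(f, "x", 1) == {"x": 1, "y": 0}
assert f["x"] == 0  # f itself is unchanged
```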

Transition System
Let Σ be a finite alphabet (also called the set of events) which contains a special empty event (denoted ϵ). A word h ∈ Σ* is called a history over Σ if the empty event does not occur in h. Let w be a word over Σ. We write w↓Σ\{ϵ} for the history obtained from w by deleting all occurrences of the empty event.
A transition system A is defined by a tuple ⟨Γ, Γ_init, Σ_in, Σ_out, →⟩ where Γ is a set of configurations, Γ_init ⊆ Γ is the set of initial configurations, Σ_in (resp. Σ_out) is the set of input (resp. output) events, and → is the transition relation. We also use γ1 → γ2 to denote that there are a1 ∈ Σ_in and a2 ∈ Σ_out such that γ1 makes a transition to γ2 on input a1 and output a2. Given a set of configurations C ⊆ Γ, we use C ⊨ γ′ to denote that there is a configuration γ ∈ C such that γ →* γ′. We abuse the notation and use γ ⊨ γ′ for {γ} ⊨ γ′. We say A ⊨ γ when Γ_init ⊨ γ. This definition is extended to sets of configurations as expected, denoted by A ⊨ C where C ⊆ Γ.
A run of A is a sequence of transitions of the form γ0 → γ1 → · · · → γn.

EPTSO-FORMAL SEMANTICS
In this section, we present the formal semantics of concurrent programs running under the epTSO semantics (following the style of [Abdulla et al. 2021a; Raad et al. 2022]). We assume a finite data domain D, which also contains the special value 0. To define a program, we define a simple programming language in Fig. 1. A program P is then any code that conforms to this programming language. Notice that a program then contains a set of shared (global) variables (say X) and a set of threads (say Θ). We sometimes refer to a program as P = ⟨Θ, D, X⟩. A thread τ ∈ Θ declares a set lvars(τ) of local variables, followed by its code. Let lvars = ∪_{τ∈Θ} lvars(τ) be the set of all local variables. We assume that the local variables and the global variables range over the data domain D. An instruction is of the form ℓ : stmt, where ℓ is the label and stmt is the statement of the instruction. A label occurs at most once in P, and hence, for a given label ℓ, the instruction and the thread to which ℓ belongs are uniquely defined. We use Λ to denote the set of all labels. We assume a set expr of Boolean expressions involving local and global variables. The set of thread instructions includes reading, writing, and atomic-read-write instructions on shared and local variables, and branching instructions. Moreover, we allow flush operations (flushopt and flush) and fence instructions (mfence and sfence). There are two types of write instructions, write-back (wb) and non-temporal writes (ntw).
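For concreteness, a thread in such a language might look as follows. This is an illustrative sketch only: the exact concrete syntax is fixed in Fig. 1 (not reproduced here), and the labels and variable names below are ours.

```
thread t:
  vars r
  l1: x :=wb 1        // write-back write to shared x
  l2: sfence          // store fence
  l3: y :=ntw 2       // non-temporal write to shared y
  l4: flushopt x      // optimized flush of x towards persistent memory
  l5: r := y          // read shared y into local r
  l6: if (r = 2) then goto l1
```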

Handling other types of operations
In the Intel-x86 architecture, there are several other kinds of reads and writes [Raad et al. 2022]. Further, the memory is partitioned into different types, and the operations permitted on a location are determined by its type. We allow only one memory type, which corresponds exactly to the wb-type in [Raad et al. 2022]. We also restrict our write operations to ntw and wb writes and rmw. We describe below how to handle other types of operations using only these operations.
The wc-type memory permits wc writes, which have exactly the same behaviour as ntw writes in the pending and persistency stages, so they can be simulated by ntw writes. However, wc reads behave differently from reads on wb-type memory. They do not overtake any operation (and are not overtaken by any operation). In effect, a wc read is like a wb read, but should execute only when the pending buffer is empty. One may be tempted to simulate it by an mfence followed by a wb read, and this matches the behaviour of the wc read in the pending stage. However, a memory fence has the effect of flushing the persistent buffers, while a wc read entails no such flushing. The correct solution is the following and uses an additional helper thread man and a new variable ◀empty▶ which takes values over {⊤, ⊥}. To simulate a wc read of x into r, we execute: flush(◀empty▶); ◀empty▶ :=wb ⊤; assume(◀empty▶ = ⊥); r := x. The helper thread man non-deterministically executes ◀empty▶ :=rmw ⊤, ⊥ (atomically reading ⊤ and writing ⊥). Observe that the flush at the beginning of the sequence ensures that no prior writes (in particular ntw writes) can be delayed beyond this point. The write followed by the read on ◀empty▶, in conjunction with the helper thread, verifies the emptiness of the pending buffer.
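The handshake at the core of this simulation can be sketched in Python with two threads (thread structure and names are ours; a lock stands in for the atomicity of the rmw, and the flush/buffer machinery is elided):

```python
import threading

# The reader publishes <empty> := TOP (True) and then waits to observe
# BOTTOM (False), which can only happen after the manager's atomic swap
# TOP -> BOTTOM, i.e. after the reader's own write has become visible.
flag = {"empty": False}   # False plays the role of ⊥, True of ⊤
lock = threading.Lock()
result = {}

def reader(memory):
    with lock:
        flag["empty"] = True          # <empty> :=wb ⊤ (after the flush)
    while True:                       # assume(<empty> = ⊥)
        with lock:
            if not flag["empty"]:
                break
    result["r"] = memory["x"]         # now safe: r := x

def manager():
    done = False
    while not done:
        with lock:                    # <empty> :=rmw ⊤, ⊥ (atomic swap)
            if flag["empty"]:
                flag["empty"] = False
                done = True

memory = {"x": 7}
t1 = threading.Thread(target=reader, args=(memory,))
t2 = threading.Thread(target=manager)
t1.start(); t2.start()
t1.join(); t2.join()
assert result["r"] == 7
```

Under SC-like Python threading this is a plain rendezvous; the point of the construction in the paper is that the same pattern certifies buffer emptiness under the weak semantics, since the reader's wb write can only be observed by the manager after everything before it in the FIFO buffer has committed.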
The other type of memory considered in [Raad et al. 2022] is uc-memory. The behaviour of a uc-read is exactly identical to that of a wc-read. The uc write, however, behaves differently from wb and ntw writes. Its effect on the pending buffer is the same as an mfence followed by a wb write followed by an mfence, but it has no effect on the persistent buffer.
We use the same idea as above, but in addition we have to ensure the emptiness of the buffer after the write completes. This can be arranged by executing the above protocol after the write.
We can also comply with the typing of memory by syntactically identifying memory locations with specific memory types. With this translation, we can restrict our attention to wb and ntw writes to prove our results.

Semantics
In the following, we give the operational semantics of a program in the epTSO semantics as a composition of three transition systems, namely the program, the volatile memory, and the persistent memory transition systems. These transition systems are defined below. For the purpose of these definitions, we fix the threads of the program to be Θ and the shared variables to be X.

The Program Transition System.
A program P induces a transition system A_P = ⟨Γ_P, Γ_P^init, Σ_P^in, Σ_P^out, →_P⟩ defined as follows: A configuration of A_P is a pair ⟨L, R⟩ where L : Θ → Λ returns, for each thread, the label of the next instruction to be executed, and R : lvars → D gives the values of the local variables. There is only one single initial configuration ⟨L_init, R_init⟩, where L_init(τ) returns the label of the initial instruction of thread τ, and R_init(r) = 0 for every r ∈ lvars.

The Volatile Stage Transition System. The volatile stage transition system A_V = ⟨Γ_V, Γ_V^init, Σ_V^in, Σ_V^out, →_V⟩ describes how we handle each instruction of the program in the volatile stage of epTSO. Here, Σ_V^in = Σ_P^out; that is, the set of input events is equal to the set of output events of the program. Furthermore, the set of output events Σ_V^out serves as the set of input events of the persistency stage. A configuration of the system is a pair of the form ⟨b, vm⟩ where: (1) the map b : Θ → Δ*, with Δ the buffer alphabet, records the operations delayed by the corresponding thread, and (2) the map vm : X → D gives the value of each shared variable in the volatile memory. In the sequel, we refer to b as the pending (or store) buffer and to the elements residing in it as messages. For instance, we say a wb-message on x in the buffer of τ, or an fo-message, etc. We may refer to messages by their types, e.g., a message of type wb, fo, etc. We define the initial configurations as Γ_V^init := {⟨b_init, vm_init⟩}, where b_init(τ) := ϵ for any τ ∈ Θ and vm_init(x) := 0 for any x ∈ X. We will sometimes use A_V[vm] to mean the transition system obtained by replacing the initial volatile memory with vm.
We define the transition relation according to the inference rules of Fig. 2. We classify the set of inference rules into four categories: (i) In the rules wb-get, ntw-get, sf-get, fl-get, and fo-get, the transition system gets the corresponding events from the program. We append the corresponding message to the tail of the store buffer. (ii) The transition described in the rule rmw can only be executed if the store buffer is empty and the value of x in the volatile memory is v1. The execution of this rule will set the value of x in the volatile memory to v2. Observe that the rule mf also requires that the store buffer is empty. (iii) Reading the value of a shared variable can be performed by two rules. In read-own-write, the store buffer contains a write message on x. Then, the most recent pending write message on x is read. In read-from-memory, there is no write message on x in the store buffer. In such a case, we read the value of x directly from the volatile memory. (iv) This category of transitions concerns updates (i.e., propagating the messages from the store buffer). We carry out the updates of sf in order (i.e., an sf does not reorder with any other instructions). The cases of wb and ntw are dealt with separately in the rules wb-update and ntw-update, respectively. In case the message to be updated is of the form wb on a variable x, then it is propagated only if the following messages are absent in front of it in the buffer: (i) ⟨sf⟩- and fl-messages, (ii) ntw-write messages on the same variable x, and (iii) wb-writes on any variable. In case the message to be updated is of the form ntw on a variable x, then it is propagated only if the following messages are absent in front of it in the buffer: (i) ⟨sf⟩- and fl-messages, (ii) write messages (whether wb or ntw) on the variable x. If the messages are of the form ⟨fl x⟩ or ⟨fo x⟩, then the rules fl-update and fo-update are used, respectively.
Fig. 3. The persistency stage transition relation.
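The side conditions of the wb-update and ntw-update rules can be sketched as predicates over a buffer of messages. This is an illustrative encoding (function names and the (kind, variable) representation are ours); buffer position 0 is the head, i.e., the oldest message.

```python
def wb_can_commit(buffer, i):
    """Side condition of rule wb-update: the wb message at position i may
    update the volatile memory only if no sf/fl message, no ntw message on
    the same variable, and no wb message on any variable sits in front of
    it (positions < i, closer to the buffer head)."""
    kind, var = buffer[i]
    assert kind == "wb"
    for k, v in buffer[:i]:              # messages in front of position i
        if k in ("sf", "fl"):
            return False                 # fences/flushes block it
        if k == "ntw" and v == var:
            return False                 # ntw on the same variable blocks it
        if k == "wb":
            return False                 # any older wb must commit first
    return True

def ntw_can_commit(buffer, i):
    """Side condition of rule ntw-update: blocked by sf/fl messages and by
    any write (wb or ntw) on the same variable in front of it."""
    kind, var = buffer[i]
    assert kind == "ntw"
    for k, v in buffer[:i]:
        if k in ("sf", "fl"):
            return False
        if k in ("wb", "ntw") and v == var:
            return False
    return True

buf = [("wb", "x"), ("ntw", "y")]
assert wb_can_commit(buf, 0)        # head wb: nothing in front of it
assert ntw_can_commit(buf, 1)       # an ntw on y may overtake the wb on x
assert not ntw_can_commit([("wb", "y"), ("ntw", "y")], 1)  # same variable
```

Note how the predicates capture the informal summary: wb writes stay FIFO among themselves, while ntw writes may overtake wb writes on other variables, which is exactly the PSO-like relaxation discussed in the introduction.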

3.2.3 The Persistent Stage Transition System. We capture the behavior of the persistency stage by the transition system A_PM := ⟨Γ_PM, Γ_PM^init, Σ_PM^in, Σ_PM^out, →_PM⟩. Here Σ_PM^in = Σ_V^out and Σ_PM^out = {ϵ}. In other words, the transition system gets its input events from the volatile stage transition system. The configurations Γ_PM are of the form ⟨pb, pm⟩ where (1) pb : X → Δ*, with Δ the buffer alphabet, is the content of the persistency buffer for every shared variable, and (2) pm : X → D gives the value of each variable in the persistent memory. The set of initial configurations is defined by Γ_PM^init = {⟨pb_init, pm_init⟩} where pb_init(x) = ϵ and pm_init(x) = 0 for any shared variable x. We use A_PM[pm], with initial configurations {⟨pb_init, pm⟩}, to refer to the transition system obtained by replacing the initial persistent memory. The transition relation →_PM is given in Fig. 3. The rule wb-get handles wb write messages arriving from the volatile stage. It appends the corresponding message to the tail of the persistency buffer of the relevant variable. The rule fo-get concerns fo-messages. Both the wr and fo messages are removed from the buffer in order. The ntw case is handled by the rule ntw. It is only enabled if the buffer of the corresponding variable is empty. In this case, the value is directly propagated to the persistent memory. The case of fl is similar, i.e., the rule fl is enabled only if the corresponding buffer of the variable is empty. The case of rmw-messages is described in rmw. The transition is enabled only if there are no fo messages of the issuing thread τ in any of the buffers. The cases of sf and mf are similar and are given in the rules sf and mf, respectively.
Remark 1. When only the program transition system and the volatile stage transition system are involved, we refer to their composition as the eTSO system. That is, given a program P, we refer to A_P ⊗ A_V as the eTSO transition system.

The Reachability Problems
Let P be a program with a finite set Θ of threads and a set Λ of labels. Let A_P be the transition system associated with the program. Let L be a labelling function and pm1, pm2 a pair of persistent memories. The Crash-Free Reachability Problem (CRP) asks whether a given configuration γ of the composed system can be reached along a computation without crashes. The Crash-Free Persistent Reachability Problem (CPRP) asks whether, for a given pm2, there is a configuration γ2 = ⟨γ, pm2⟩, for some γ, that can be reached; that is, whether a configuration with the persistent memory pm2 can be reached when starting from pm1 in both the persistent memory and the volatile memory. Let Rec be a recovery function that associates with each valuation of the persistent memory a labelling function. Intuitively, the recovery function defines the new initial labels of the threads after a crash of the system. We assume w.l.o.g. that Rec(pm_init) = L_init and that the recovery function is computable. Let pm be a valuation of the persistent memory. We define P□pm to be the transition system A_P[Rec(pm)] ⊗ A_V[pm] ⊗ A_PM[pm]. The Full-Reachability Problem (FRP) asks whether, for a given configuration γ of A_P and a recovery procedure Rec, there exists a finite sequence pm_0, pm_1, …, pm_n of persistent-memory valuations such that (1) pm_0 = pm_init, (2) A_i ⊨ pm_{i+1} with A_i = P□pm_i for all i < n, and (3) A_n ⊨ ⟨γ, c1, c2⟩ for some c1 and c2. To solve the full-reachability problem, it is sufficient to guess the intermediary valuations of the persistent memory pm_0, pm_1, …, pm_n, then solve the crash-free persistent reachability problems A_i ⊨ pm_{i+1} and the crash-free reachability problem A_n ⊨ ⟨γ, c1, c2⟩.
Theorem 3.1. The full-reachability problem is reducible to the crash-free persistent and crash-free reachability problems.
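The guess-and-check decomposition behind Theorem 3.1 can be sketched as follows, with the two crash-free problems abstracted as oracles (the interfaces and the bound on the number of crashes are ours; in the actual reduction the sequence of guesses is existentially quantified, not enumerated up to a bound):

```python
from itertools import product

def full_reachability(pm_init, target, states, cprp, crp, max_crashes=2):
    """Guess a sequence of intermediate persistent memories pm_0 .. pm_n,
    check each crash-free persistent hop with the CPRP oracle, and check
    the final crash-free hop to the target configuration with CRP."""
    for n in range(max_crashes + 1):
        for guess in product(states, repeat=n):
            seq = [pm_init] + list(guess)
            if all(cprp(seq[i], seq[i + 1]) for i in range(n)) \
                    and crp(seq[-1], target):
                return True
    return False

# Toy oracles over two persistent states, just to exercise the schema:
cprp = lambda a, b: (a, b) in {("p0", "p1")}   # crash-free persist hops
crp = lambda a, t: (a, t) in {("p1", "goal")}  # final crash-free reach
assert full_reachability("p0", "goal", ["p0", "p1"], cprp, crp)
assert not full_reachability("p0", "goal", ["p0", "p1"], cprp, crp,
                             max_crashes=0)
```

Since the data domain is finite, the set of persistent-memory valuations is finite, which is what makes guessing the interface states effective.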

REMOVING THE PERSISTENCY STAGE
In this section, we show informally how we can eliminate the persistency stage, while preserving correctness modulo reachability. More precisely, given a program P, we translate P to a new program P′ such that the CRP of P′ is equivalent to the CPRP of P. We provide an overview of the translation below.
Fig. 6. An ntw-spoiled run of P2 in Fig. 5.

Speculation
At the heart of our translation scheme is a speculation procedure: for each variable , the protocol guesses and freezes the value of an arbitrary write message on . The protocol then ensures that the value of in the persistent memory is just before the next crash. We implement the protocol with the help of an extra thread, the manager, that acts according to SC and that verifies the consistency of these guesses. In implementing such a protocol, we need to handle the following challenges:
• Soundness: We must preserve the behavior of the input program P up to reachability.
• Freezing: The manager and the other threads must agree on a freezing point for each variable.
• Non-Spoiling: The frozen values should not be spoiled, i.e., they should not be overwritten in the persistent memory.

Consider the program P 1 of Fig. 5, consisting of the threads and , in which we would like to persist the values = 1 and = 2. In Fig. 4, we give a run that persists these values. In 1 , the thread has executed its instructions and has placed the corresponding messages in its pending buffer. The run defines the freezing point of to be the message wb 1 . In 2 , the two messages of have updated the volatile memory and crossed to the persistency stage. In particular, the message wr 1 is induced by the message that was the freezing point of . Also, in 2 , the thread has performed its three instructions and placed the corresponding messages in its pending buffer. The configurations 3 and 4 show these messages updating the volatile memory and crossing to the persistency buffers. In 5 , the message wr 1 has updated the persistent memory. Since this message is a freezing point, the run can no longer write to in the persistent memory; otherwise, we would overwrite a frozen value. In 6 , we have transferred the messages in the -persistency buffer to the persistent memory, and, in particular, we have obtained the desired values of and .
Consider the program P 2 of Fig. 5, which we get from P 1 by replacing the instruction := wb 1 by the instruction := ntw 1. Also, consider the run of P 2 depicted in Fig. 6. The configurations 1 and 2 are similar to Fig. 4. We notice that, due to the presence of ⟨sf⟩, the message wb 2 will reach the end of the pending buffer after the message ntw 2 (this is true even in the case of P 1 ). The main difference is that the ntw 2 cannot cross to the persistency stage until the -persistency buffer is empty. Furthermore, the ntw must hit the persistent memory immediately, without passing through the -persistency buffer. This means that the frozen value = 1 will be overwritten in the persistent memory. We say that the ntw 2 acts as a spoiler. Sometimes, we explicitly refer to the spoiler's type; so, in this case, we say that ntw 2 is an ntw-spoiler. In our construction, we will rely on the fact that, to be able to persist a set of write messages, we should be able to find a run along which (i) we can freeze the values of the variables in some order, and (ii) once the value of a given variable is frozen, the value will not be spoiled (overwritten in the persistent memory). In general, different program runs may use different spoilers to overwrite the variable values in the persistent memory. In Fig. 4, we demonstrated that P 1 has such a run that allows freezing the correct values without spoiling. In particular, wb 2 is not a spoiler in the run of P 1 in Fig. 4, since it can be transferred to the persistent memory only after all the correct variable values have persisted.
Next, we consider another type of spoiler, namely sf-fo-wr-spoilers (SFW-spoilers for short). We illustrate SFW-spoilers using the program P 3 of Fig. 5. In any program run, the three highlighted instructions of will act as a spoiler. The three instructions generate the sequence of messages ⟨sf⟩ ⟨fo ⟩ wb 2; see Fig. 7. According to the epTSO semantics, these messages cannot be re-ordered in the pending buffer. Therefore, wb 2 will first enter the -persistency buffer as the message wr 2 . Next, the message ⟨fo ⟩ will also enter the -persistency buffer, and it cannot be re-ordered with wb 2 . Finally, when the message ⟨sf⟩ reaches the end of the pending buffer, it forces the message ⟨fo ⟩ to leave the -persistency buffer, which, in turn, causes the message wb 2 to persist. In general, to be a spoiler, an operation or combination of operations should be able to force a new value to be persisted (after the freezing). There are also other types of spoilers, namely fl-wr-, mf-fo-wr-, and rmw-fo-wr-spoilers, which are similar.
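The spoiling behavior of an ntw can be made concrete with a toy model. This is our own simplification, not the paper's formal epTSO semantics: wb-induced messages pass through the per-variable persistency buffer, while an ntw on the same variable must wait until that buffer is empty and then hits persistent memory directly, which is exactly what lets it overwrite a frozen value.

```python
class PersistencyStage:
    """Toy model of the per-variable persistency stage."""

    def __init__(self):
        self.buffers = {}   # variable -> FIFO of pending wb-induced values
        self.pmem = {}      # persistent memory

    def wb_arrives(self, var, val):
        # A wb-induced message enters the variable's persistency buffer.
        self.buffers.setdefault(var, []).append(val)

    def drain_one(self, var):
        # The oldest buffered message updates the persistent memory.
        if self.buffers.get(var):
            self.pmem[var] = self.buffers[var].pop(0)

    def ntw_arrives(self, var, val):
        # An ntw must wait for the variable's buffer to empty, then it
        # bypasses the buffer and hits persistent memory immediately.
        if self.buffers.get(var):
            return False
        self.pmem[var] = val
        return True
```

In a run where the frozen value is still buffered, the ntw is blocked; once the buffer drains, the ntw spoils the frozen value in persistent memory.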

Soundness and Visibility
Fig. 8. The Writing Protocol, illustrated on a single thread operating on two shared variables and .
The text boxes with a green background, on the right-hand side of the figure, describe the manager's actions. We also show part of the shared memory: the variables , , ◀ • ▶, and ◀ • ▶.
As we mentioned in Section 4.1, spoiler detection is a vital component of the speculation protocol. We must enable the manager to detect the spoilers that reach the shared memory. Recall that we no longer have the persistency stage; hence, we need to make all decisions based on messages that reach the volatile memory. The manager cannot detect spoilers only by inspecting the values of the variables in the volatile memory. The reason is twofold: (i) ntw-spoilers are difficult to detect, since we cannot distinguish between values written by wb or ntw instructions. A value, say = 2 in the memory, does not reveal whether the writer was of type wb or ntw. (ii) We cannot detect fence-based spoilers, since fences and barriers do not modify the memory in the first place. To solve this problem, we "divert the traffic" from the threads to the memory and make it pass through the manager. More precisely, we now have additional copies of the memory locations. The threads write to one of the copies, and the manager moves the written value to the main memory. Together with the normal data values, we augment the messages travelling through the store buffers with additional information that helps the manager to detect freezing points, the types of writes, etc. Furthermore, we make fences and barriers visible to the manager by replacing them with write instructions on special variables. The manager inspects the arriving messages, looking for frozen messages and spoilers, and then copies the relevant data, i.e., the variables' values, to the shared memory. A crucial challenge is to ensure that replacing instructions in this manner is "sufficiently precise" to preserve the epTSO semantics. We achieve this objective using the protocols of the following subsections. The manager processes exactly one message from one thread at a time. So messages are processed atomically, preserving order within threads (although some messages may be missed) and interleaving across threads.

The Writing Protocol
Consider the single-thread program P 1 of Fig. 8, performing three wb instructions. The writing protocol ensures that the manager observes and transfers enough write messages to preserve the epTSO semantics (up to reachability). We only need to preserve sufficiently many messages, but not necessarily all messages, since, similarly to the classical TSO semantics, the epTSO semantics is almost lossy (but not entirely lossy). In a sequence of wb messages on the same variable in a store buffer, we can lose all but the last message without compromising the semantics. As far as the manager is concerned, it needs to observe the last message in such a sequence, while it may or may not see the rest. We call this the last-message guarantee. In Fig. 8, we need to ensure that the manager observes the second write instruction, but not necessarily the first. If the manager missed the second write instruction, we would allow a memory configuration where the values of and are 1 and 3, respectively. This memory configuration is not reachable under the epTSO semantics. For each thread ∈ Θ and variable ∈ X, the protocol uses a shared variable ◀ • ▶.
In 1 , the thread has executed all its instructions, resulting in the three messages we show in the figure. At the other end of the buffer, the manager waits for the variables ◀ • ▶ and ◀ • ▶ to be populated. The first message will update the value of ◀ • ▶ (configuration 2 ). In this program run, the manager will not notice this message, since its value is overwritten by the next message (configuration 3 ). In 3 , the shared variable ◀ • ▶ carries the last value written on before a write message on another variable ( in this case) arrives. The one-at-a-time feature means that the manager must process this last written value on before the message on arrives: it fetches the value 2 of , verifying that no other variable updates have occurred. It thus provides the last-message guarantee. The rest of the simulation similarly transfers the write message on .
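The last-message guarantee can be checked on small message sequences with a sketch like the following (the framing and helper names are ours): any lossy variant that keeps, for each maximal run of writes on the same variable, at least the last message, yields the same final memory.

```python
from itertools import combinations

def apply_all(messages):
    """Apply a sequence of (var, val) writes to an empty memory."""
    mem = {}
    for var, val in messages:
        mem[var] = val
    return mem

def lossy_variants(messages):
    """Yield every sub-sequence that keeps, for each maximal run of
    writes on the same variable, at least the run's last message."""
    must_keep = {i for i, (var, _) in enumerate(messages)
                 if i + 1 == len(messages) or messages[i + 1][0] != var}
    optional = [i for i in range(len(messages)) if i not in must_keep]
    for r in range(len(optional) + 1):
        for drop in combinations(optional, r):
            yield [m for i, m in enumerate(messages) if i not in drop]
```

All such variants agree on the final memory valuation, which is why the manager only needs the last message of each same-variable run.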

The Freezing Protocol
The aim of the freezing protocol is twofold. First, it allows, for each shared variable ∈ X, guessing and freezing the value of a particular write instruction on . The intuition here is that (i) will persist and (ii) will not be overwritten in the persistent memory until the next crash occurs. Second, it lets each thread guess, within its execution, the position where the freeze occurs. Fig. 9 shows the simulation of the protocol for a program with two threads and sharing two variables and . Each thread guesses, for each variable ∈ X, an -freezing point. The freezing point is defined by a write message which we enrich with an extra flag ❆. A typical example of such a message is wb that is issued by thread in Fig. 9. This tells the manager that expects the value of to be frozen after the message is fetched from the store buffer, but before the next message of the form wb ◀ • ▶ ⊳ ⊲ is fetched from the buffer. The manager, for its part, waits until the freezing points for have arrived from all the threads. At that point, the manager freezes the value of currently in the shared memory; we refer to this point in the simulation as the -freeze.
In Fig. 9, the threads guess the freezing points for to be the first messages in their respective buffers, and guess the freezing points for to be the last messages in the buffers (configuration 1 ). In 2 , the -freezing point of has reached the memory, to which the manager reacts by ticking its local variable ◀ • • ❆ ▶. The manager copies the value 1 of carried by the message to the variable in the shared memory (configuration 3 ). The next message fetched from the buffer (corresponding to the second instruction of ) is not a freezing point, so the freezing protocol does not react to it. However, the value will still be transferred to the memory (configuration 4 ). In 5 , the manager receives the -freezing point of . At this point, the manager has received the -freezing points of all threads. The manager freezes the value of in the shared memory, which in 5 is equal to 4. Notice that although the manager requires all the threads to propose their own -freezing points, it freezes only a single value, namely the value accompanying the last -freezing point. In this case, the -freezing point of arrived last (i.e., after the -freezing point of ), and hence the value 4 was frozen (rather than 1). One might get the impression that, for a given variable , it is sufficient that only one of the threads guesses the freezing of . After all, only the last freezing point for is taken into consideration. However, as we see below, the per-thread freezing points for are needed so that the threads can help the manager handle potential -spoilers.
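The manager's freezing decision can be sketched as a small bookkeeping class (our own abstraction; thread and variable names are placeholders): per variable, it tracks which threads' freezing points have arrived, and freezes the value carried by the last one once all threads have proposed theirs.

```python
class FreezeManager:
    """Sketch of the manager's per-variable freezing decision."""

    def __init__(self, threads):
        self.threads = set(threads)
        self.seen = {}      # var -> set of threads whose point has arrived
        self.frozen = {}    # var -> frozen value

    def freezing_point(self, thread, var, val):
        # A freeze-flagged write on `var` from `thread` reaches the manager.
        if var in self.frozen:
            return                      # already frozen; nothing to do
        arrived = self.seen.setdefault(var, set())
        arrived.add(thread)
        if arrived == self.threads:     # last point: freeze its value
            self.frozen[var] = val
```

This mirrors the run in Fig. 9: with two threads, the value accompanying the second (last) freezing point is the one frozen.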

Handling Spoilers
4.5.1 The ntw Protocol. We recall that an ntw-spoiler is an ntw-message on in the buffer of a thread that occurs after the -freezing point of . The goal of the ntw-protocol is to enable the manager to detect ntw-spoilers (cf. Fig. 10). To that end, we enrich the data domain with values of the form ⊳ • ntw ⊲ where ∈ D, i.e., we tag the written values with an ntw-flag. Before the -freezing point, we generate write messages using the standard (un-tagged) values (the messages 1 and 2 in Fig. 10). After the -freezing occurs (the message 3 in Fig. 10), we watch for the first ntw-instruction on by (the message 4 in Fig. 10). The message 4 is an ntw-spoiler of . Instead of generating a message wb ◀ • ▶ ⊳ 5 ⊲ , the thread generates the message ntw ◀ • ▶ ⊳ 5•ntw ⊲ . In other words, the value 5 of the message is tagged with the ntw-flag. When the manager sees the message, it knows from the tag that it is an ntw-spoiler. From this point on, the thread tags all write messages on , whether of type wb or type ntw, with the ntw-flag. The reason is that these subsequent messages, e.g., the message 5 in Fig. 10, may overwrite the spoiler. For instance, assume we do not tag 5 with ntw. A possible scenario is that the manager misses 4 and only reads the memory after 5 has arrived (an allowed behavior according to the almost-lossiness property described above). This means that when the manager reads the memory, the message 5 has already overwritten the value ⊳ 5 • ntw ⊲ written by 4 , replacing it with the un-tagged value 4; hence the manager would have missed the fact that a spoiler has occurred. With the tagging of the subsequent messages, this scenario cannot occur. More precisely, whenever the manager sees the ntw-flag in a write message on , it knows that either the message itself or a preceding message is an -spoiler. Two further remarks. First, the manager halts the program execution whenever it sees an ntw-tagged message, since such a message indicates the existence of an ntw-spoiler; therefore, these tagged messages never reach the (volatile) memory. As for read-own operations, the thread strips off the ntw-flag and treats the message as a regular write operation. Second, the tagged messages preserve the message type (wb- or ntw-type), and hence the protocol does not affect the re-ordering of messages inside the buffers.
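The tagging discipline can be sketched as a pure function over a thread's post-freeze write stream (our own simplification; event and variable names are placeholders): once an ntw on a frozen variable occurs, that write and every later write on the same variable carry the ntw-tag, so any write the manager happens to sample after the spoiler is tagged.

```python
def tag_stream(events, frozen_vars):
    """events: list of (kind, var, val) with kind in {'wb', 'ntw'},
    emitted after the freezing points of the variables in frozen_vars.
    Returns (kind, var, val, tagged) per the ntw-tagging protocol."""
    spoiled = set()
    out = []
    for kind, var, val in events:
        if kind == 'ntw' and var in frozen_vars:
            spoiled.add(var)            # first ntw on a frozen var: spoiler
        out.append((kind, var, val, var in spoiled))
    return out
```

Even if the manager misses the spoiler itself, the next write it sees on that variable is tagged, so the spoiler cannot go unnoticed.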

4.5.2
The SFW Protocols. We describe the idea behind detecting an SFW spoiler in detail. Such a spoiler involves a pattern consisting of a freezing point, a write to , a flush opt on , and an sf-message. One difference from the case of ntw-spoilers is that the write to may be performed by another thread. We use this thread to reveal the possibility of such a spoiler to the manager.
The thread waits until it is past the freezing point for before activating the SFW-protocol. Once past this point, the thread switches to helping the manager detect a possible SFW spoiler by going through three phases, (i), (ii), and (iii), as follows. In phase (i), it guesses the position where it expects the violating write on to occur (which is possibly from another thread) and marks it with a write to a special variable ◀ nxt • • ▶. This guess can be verified by the manager. After inserting this message, the thread changes its behaviour and enters phase (ii). It no longer executes the flush opt ( ) but looks for one which can be a potential spoiler. Note that the ⟨fo ⟩ can re-order with other messages; hence, not all the flush opt ( ) in phase (ii) are potential spoilers. The thread remembers, in its local state, whether a potential spoiler indeed occurs. In phase (iii), the thread tracks ⟨sf⟩-messages. The reason is that a flush opt -instruction contributes a spoiler only if it is followed by an ⟨sf⟩-message. As mentioned earlier, there is no way for the manager to observe an sf-message, since it does not modify the memory. Hence, if such an ⟨sf⟩-message were to be generated, the thread inserts a write on the variable ◀ sf • ▶ with a special value ⊳ flush opt • ⊲ as a signal to the manager that an SFW spoiler has occurred. If this message reaches the manager, it detects the spoiler and aborts. Replacing an ⟨sf⟩ by a write to the variable ◀ sf • ▶ could allow more behaviours, since a write can re-order with other operations whereas an sfence cannot. We remedy this by guarding the write with ⟨sf⟩ messages. The other spoilers are handled using similar ideas.
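The three phases can be summarized as a small state machine (our own simplification of the thread-side protocol; event names are placeholders for the guessed write, a flush_opt on the frozen variable, and an sfence):

```python
def sfw_detect(events):
    """events: thread-local events after the freezing point, each one of
    'guessed_write', 'flush_opt', 'sfence'. Returns True iff the
    sf-fo-wr pattern (an SFW spoiler) is completed."""
    phase, pending_fo = 1, False
    for e in events:
        if phase == 1 and e == 'guessed_write':
            phase = 2                   # phase (ii): watch for flush_opt
        elif phase == 2 and e == 'flush_opt':
            pending_fo = True           # potential spoiler remembered locally
        elif e == 'sfence' and pending_fo:
            return True                 # phase (iii): signal the manager
    return False
```

A flush_opt before the guessed write, or one never followed by an sfence, is not reported, matching the observation that not every flush_opt in phase (ii) is a spoiler.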

Formal Translation
Recall that in CPRP, we are given a program P = ⟨Θ, D, X⟩ and a persistent memory , and we ask whether A P ⊗ A V ⊗ A P |= . We show in this section that we can construct another program P = ⟨Θ , D , X ⟩ and a configuration of A P such that can be reached if and only if can be reached, as stated in the theorem below.
Theorem 4.1. Given a program P and a persistent memory , we can construct another program P and a configuration such that the CPRP reachability in P reduces to the CRP reachability in P .
We show in Fig. 12 the translation of the given program P = ⟨Θ, D, X⟩ into P = ⟨Θ , D , X ⟩; in particular, we have Θ = { 1 , . . ., } ⊎ { man }, where man is the manager thread that acts as a validator.
Shared Variables and Data. Let us examine the shared variables and data values listed above and explain their roles. We begin with the variables in V instr . The variable ◀ • ▶ is used by the thread to implement the wb or ntw writes in such a way that the manager is aware of them, as explained in Section 4.3. The variable ◀ sf • • ▶ is used to make the occurrences of an SFW spoiler visible to the manager. Recall that an sfence is replaced by a write to this variable in case the thread identifies it as a spoiler, as discussed in Section 4.5.2. Further, in phase (i) of the protocol, each thread speculates where it expects a violation and writes to a variable ◀ nxt • • ▶ at that position. We also have other variables that will be used to detect the other spoilers.
Next we turn our attention to the data values. As indicated in Section 4.4, the freeze protocol requires each thread to speculate a freeze point. This is achieved by tagging the data value with a ❆ tag. Similarly, the ntw protocol described in Section 4.5.1, on detecting an ntw spoiler, requires that the data value be tagged with an ntw tag. The data values D wr serve this purpose. The values in D sync will be used to handle the synchronisations due to rmw, sf, and fl instructions.
Local variables. To implement the freezing protocol from Section 4.4, each thread has a variable ◀ • lfrz ▶ to indicate whether it has issued the freezing write on . The thread uses the variable ◀ • lntw ▶ to remember an ntw spoiler; the threads then ensure that any write that follows is also tagged as a spoiler (see Section 4.5.1). To implement the SFW protocol (and the other spoiler protocols) correctly, each thread has local variables ◀ • lpfo ▶, ◀ • fo ▶, and ◀ nxt • ▶. The role of the local variable ◀ • lpfo ▶ is important and subtle. It is used to arrange the re-orderings of the fo with the other messages. Recall we mentioned that not every flush opt in phase (ii) of the SFW protocol is a potential spoiler. This is because the fo can be removed from the store buffer before the write to ◀ nxt • ▶ is removed from the buffer. The local variable ◀ • lpfo ▶ allows for such re-orderings. It is set as soon as ◀ nxt • ▶ is written to, and any flush opt while it is set is deemed not a potential spoiler. The variable ◀ • lfo ▶ is used to remember a potential fo that is a spoiler. This gives the set of local variables used by each thread. Next we examine the local variables used by the manager thread. It uses a local variable ◀ • • lfrz ▶ to remember whether the freeze-marked write on variable from thread has been observed, and ◀ • lfrz ▶ to remember whether all the freeze markers have been observed. To detect the spoilers listed in Section 4.1, and to verify that the threads have guessed the next write positions correctly, it uses the variable ◀ • • lwr ▶. This variable is set when the manager encounters a write to a frozen variable from thread . This gives the set of local variables of the manager. Translation. We are now ready to describe the translation and the behaviour of the manager; we only provide the implementation relevant to the ntw spoiler and the SFW spoiler, the rest of the implementations being similar. Algorithms 1 through 4 implement the thread , and the rest of the algorithms implement the manager.
Program code. We now describe the code-to-code translations. We describe in Algorithm 3 how to simulate the sfence. In this case, it is first checked whether ◀ • lpfo ▶ is set; this guards any flush opt from re-ordering with it. In case there was a flush opt that was a potential spoiler (line 2), the variable ◀ • fo ▶ would be set; this also indicates an SFW spoiler. In this case, we write to ◀ sf • • ▶ (line 4) as described in Section 4.5.2. Otherwise, the instruction is simulated as it is. Notice the invocation of the procedure ND(); we will describe this in the sequel. The simulation of the instructions flush and rmw is similar; their implementation is guided by how to handle the fl-wr spoilers and rmw-fo-wr-spoilers, respectively.
Next we examine Algorithm 1, which provides the translation of wb-writes. First, we guard against re-ordering with a flush opt ; this is in line 1. We also ensure that we do not write before we speculate the next write; this is done in line 2. If the flag ◀ • lntw ▶ is set, then we tag the data value with ntw, indicating that this write is preceded by a spoiler. As explained in Section 4.5.1, this ensures that the manager never misses an ntw-type spoiler. The thread can also nondeterministically choose to set the freeze marker, if it was not set before. Finally, it simulates the instruction as a wb write to the variable ◀ • ▶.
Algorithm 2, for ntw-writes, is similar, but it is also tasked with setting ◀ • lntw ▶ if the thread has already used its freeze marker for . The rest of the code follows the pattern of Algorithm 1.
The flush opt ( ) instruction is described in Algorithm 4. The instruction is simply ignored as long as it is not a potential spoiler. It is deemed a spoiler if it cannot be re-ordered before the write to ◀ nxt • ▶.

We now turn to the alternation-bounded reachability problem, where we do not use wb instructions. Given a program P and a bound , we translate P to a new program P ′ such that the ntw reachability problem for P ′ is equivalent to the -alternation bounded reachability problem for P. We illustrate the ideas using the program P of Fig. 13. For simplicity, we let the program have a single thread without loops. In general, our framework deals with multiple threads and with loop constructs. Forbidding wb-write instructions means that we need to simulate such instructions by ntw-instructions. To preserve equivalence with the eTSO-semantics, we need to keep the allowed orderings between messages, i.e.: (i) We do not allow wb-messages to overtake each other. We simulate a wb-write instruction by an ntw instruction, and we encapsulate the latter by sf instructions, i.e., we put an sf instruction both before and after the ntw-instruction in our translation. The encapsulation will help prevent forbidden re-orderings of wb-messages. (ii) We allow ntw-messages to overtake wb-messages even on the same variable. To that end, we simulate each phase by a separate thread in P ′ . Since the threads have different buffers, the respective messages may now overtake each other. (iii) We do not allow wb-messages to overtake ntw-messages on the same variable; nor do we allow them to overtake sf- or fl-messages. To that end, we add a manager thread and implement a protocol that we refer to as the interface protocol. The protocol lets the manager, in collaboration with the other threads, ensure that write messages are updated to the memory in the correct order, and that read instructions see the correct values. In the rest of the section, we describe how we implement the interface protocol, by giving the set of threads, variables, data domains, updates, and reads.
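The wb-elimination rewrite of point (i) above can be sketched as a simple instruction-level translation (the tuple encoding and helper name are ours): each wb write becomes an ntw write encapsulated between sfences.

```python
def translate(instrs):
    """Rewrite each ('wb', var, val) as sf; ('ntw', var, val); sf,
    leaving other instructions unchanged."""
    out = []
    for op, var, val in instrs:
        if op == 'wb':
            out += [('sf', None, None), ('ntw', var, val), ('sf', None, None)]
        else:
            out.append((op, var, val))
    return out
```

The surrounding sfences prevent the simulated wb-messages from overtaking each other, which a bare ntw would otherwise be allowed to do.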
Threads. For the program of Fig. 13, we will consider a 2-alternating run in which executes the following instructions: (i) the first instruction in phase 1, (ii) the next three instructions in phase 1½, and (iii) the last three instructions in phase 2. We simulate each phase of in P ′ by a separate thread which we call the ⟨ , ⟩-thread, i.e., the threads are ⟨ , 1⟩, ⟨ , 1½⟩, and ⟨ , 2⟩. The program P ′ runs the instructions of the different phases one after the other. The current phase of is given by the shared variable ◀ phase • ▶. Although the instructions are run sequentially, the updates of the messages belonging to different phases of the same thread may now interleave, since we are using separate threads to simulate them. Therefore, we need to ensure that message updates and the values seen by read instructions faithfully mimic the behavior of the input program P. We do this by guessing the interfaces, i.e., the memory contents between the various phases of the same thread, and then running a protocol, the interface protocol, that ensures that the inter-phase interaction is carried out correctly. We implement the protocol using a manager, which is an extra thread to which we divert the traffic between the threads and the memory (similarly to the case of Section 4).
Interfaces. For each phase , thread , and variable , we guess the value of the last write operation on by during . For instance, in Fig. 13, the ⟨ , 1⟩-interface is defined by = ⊥ and = 1. These values are given by the shared variables ◀ interface • • • 1 ▶ and ◀ interface • • • 1 ▶, respectively. These values tell us that (i) will not perform any write instruction on during phase 1, and (ii) the last write instruction of on during phase 1 assigns the value 1 to . The other interfaces are interpreted similarly.
Variables and Data. The variable ◀ phase • ▶ gives the current phase of . In 1 , the current phase is 1½, which means that the thread ⟨ , 1⟩ has already executed its (only) instruction (it did so in phase 1, which we do not show in the figure) and has put the corresponding message in the buffer. To simulate a wb-write instruction by an ntw instruction, we encapsulate it by sf instructions, i.e., we put an sf instruction both before and after it in our translation. The encapsulation will help prevent forbidden re-orderings. To simplify the notation, we replace the sequence ⟨sf⟩ ntw ⟨sf⟩ by ntw , for any and . The enriched message ntw tells us that has performed a wb-write on in phase 1. The written value is 1; furthermore, the LW attribute tells us that it is the last write message on by thread . Also, in 1 , the thread ⟨ , 1½⟩ has performed its first two instructions and has generated the corresponding messages ntw . The first message does not have the flag LW in its value, since it is not the last write by the thread ⟨ , 1½⟩ on . In 3 , we have entered phase 2, and the thread ⟨ , 2⟩ has executed its write instruction, resulting in the message ntw .

Correct Updates. We need to guarantee that updates to the memory are performed in the correct order. We need to ensure that, within any thread, the following properties are satisfied: (i) for any given variable , the ntw write messages are not re-ordered within the same thread; (ii) the wb-write messages are not re-ordered. We let message updates go through the manager. For instance, in the transition from 1 to 2 , we transfer the message ntw to the variable ◀ • • 1½ ▶. This enables the manager to inspect the value before transferring the correct value to the memory (the last step is not shown in the figure; in 3 , we have already moved the next message of the buffer to get the value = 2 in the memory). We provide, for each variable , thread , and phase , the shared variable ◀ LWstatus • • • ▶. The latter is a Boolean flag that tells whether the last write message on generated by the thread ⟨ , ⟩ is still in the buffer (the value true) or has left the buffer (the value false). For instance, in 1 of Fig. 13, the last write on in phase 1 is still in the buffer, whence the corresponding flag has the value true. The value is false for , since there is no write on in phase 1. The manager changes the value of the LWstatus-flag to false when it receives a write message whose value contains the LW-flag. For instance, in 3 , the manager has switched the value of the flag ◀ LWstatus • • • 1½ ▶ since it has received the message ntw (which has LW as part of its value). In this manner, the manager can record whether a given buffer contains an outstanding write message on a given variable, and approve variable updates only if they do not violate the semantics. As an example, in 3 , the manager would not accept updating the message ntw from the buffer of the thread ⟨ , 2⟩, since the buffer of the thread ⟨ , 1⟩ still contains the message ntw . Such an update would correspond to re-ordering two ntw-write messages on the same variable, which is forbidden in the eTSO semantics. On the other hand, it allowed updating the messages of phase 1½, since they represent wb-messages overtaking ntw-messages on different variables, which is permitted under eTSO.
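The manager's update-approval check can be sketched as follows (our own rendering; the phase indexing is simplified to consecutive integers): an ntw update on a variable from some phase is approved only if no earlier phase still has an outstanding LW-flagged write on that variable in its buffer.

```python
def approve_update(var, phase, lw_status):
    """lw_status[(p, var)] is True while phase p's last write on `var`
    is still buffered. Approving an update from `phase` while an earlier
    phase still holds a buffered write on `var` would re-order two ntw
    writes on the same variable, which eTSO forbids."""
    return not any(lw_status.get((p, var), False) for p in range(1, phase))
```

In the scenario of Fig. 13, the update from the later phase on the same variable is rejected, while updates on other variables go through.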
Correct Reads. Similarly to updates, the interface protocol allows threads to read the correct values. Assume that the thread needs to read the value of the variable during phase . Then, the thread ⟨ , ⟩ in P ′ will perform the following sequence of actions: (i) It checks whether it has performed a write on ◀ • • ▶ (this information is maintained locally in ⟨ , ⟩). In such a case, it reads that value. (ii) Otherwise, it inspects the value of the last write on in the previous phase = − ½. If ◀ interface • • ▶ = ⊥, then the previous phase never performed a write on , and hence we continue to the phase before it (until we possibly reach the memory). If ◀ interface • • ▶ ≠ ⊥, then has performed a write on during phase . We check whether this last written value is still in the buffer. More precisely, ⟨ , ⟩ checks the value of the variable ◀ LWstatus • • • ▶. If that variable contains a value different from ⊥, it fetches the value from ◀ interface • • • ▶; otherwise, it reads the variable's value in the memory. For instance, in 2 , if executes the instruction a1:=y, it will read the value of ◀ interface • • • 1 ▶, which is 1. If executes the instruction a2:=y in 4 , it sees its own last write on , which is 2. If executes the instruction a3:=x in 5 , it needs to go all the way to the memory to find the value 2.
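The read-resolution chain above can be sketched as a lookup function (our own abstraction; the dictionaries and the integer phase indexing are simplifications of the shared variables in the protocol): a read first checks the thread's own writes in the current phase, then walks back through earlier phases' interfaces, and finally falls back to memory.

```python
def resolve_read(var, phase, own_writes, interface, lw_in_buffer, memory):
    """Resolve a read of `var` in `phase` following the interface protocol:
    (i) read-own-write, (ii) earlier phases' buffered last writes,
    (iii) fall back to the memory."""
    if (phase, var) in own_writes:            # (i) own write in this phase
        return own_writes[(phase, var)]
    for p in range(phase - 1, 0, -1):         # (ii) walk earlier phases
        val = interface.get((p, var))
        if val is None:
            continue                          # phase p never wrote var
        if lw_in_buffer.get((p, var), False):
            return val                        # last write still buffered
        break                                 # it already reached memory
    return memory[var]                        # (iii) read the memory
```

The three assertions below mirror the three reads a1:=y, a2:=y, and a3:=x discussed in the example.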
Finally, we invoke the result of [Atig et al. 2012], which states that the reachability problem under PSO is decidable, to obtain the following theorem.

Theorem 6.1. Given a program P, the alternation-bounded reachability problem for it is decidable.

CONCLUSION
We have investigated the decidability of the reachability problem under the Intel-x86 semantics defined in [Raad et al. 2022].
We have first provided a reduction that allows taking persistency into account without using an unbounded memory to model the persistency stage. The reduction is based on a program instrumentation that augments the given program with an extra finite-state thread, which allows reducing the original reachability problem to verifying reachability under the consistency model only. The reduction is valid in particular when the consistency model is SC, TSO, or PSO. In particular, this provides a simpler decidability proof for the reachability problem under PTSO than the one in [Abdulla et al. 2021a], which uses an unbounded buffer for the persistency stage. An interesting direction is to investigate the class of storage systems for which such a finite-memory instrumentation can encode the persistency semantics.
We have also shown that mixing operations obeying various consistency models with decidable reachability problems may lead to undecidability. However, for the case we consider, we have provided a condition under which verifying reachability becomes decidable: bounding the number of alternations between wb writes and ntw's in computations. This result is interesting as it provides a complete parametrized bounded-analysis schema for bug finding in the setting we consider. Other types of restrictions could be investigated, based on commonly adopted patterns for the use of operations on different memory types and for their interactions.

Fig. 5 .
Fig. 5. Three programs with threads and . The differences between the program codes are highlighted in pink.

Fig. 9 .
Fig. 9. The Freezing Protocol. We simulate the protocol on the program shown in the top-left corner. Together with the pending buffers, we depict part of the shared memory, namely the variables , , and the variables ◀ • ▶, ◀ • ▶, ◀ • ▶, and ◀ • ▶. We also show some of the manager's freeze variables, namely the local variables ◀ • • ❆ ▶, ◀ • • ❆ ▶, ◀ • • ❆ ▶, and ◀ • • ❆ ▶. The pink text boxes describe the threads' actions, while the green ones describe the manager's actions.
Fig. 13. An input program P consisting of a single thread . The highlighted parts of the code give the instructions executed in the phases 1, 1½, and 2 in the execution of P ′ , respectively.
Proc. ACM Program. Lang., Vol. 8, No. PLDI, Article 195. Publication date: June 2024.