Constraint Based Program Repair for Persistent Memory Bugs

We propose a constraint based method for repairing bugs associated with the use of persistent memory (PM) in application software. Our method takes a program execution trace and the violated property as input and returns a suggested repair, which is a combination of inserting new PM instructions and reordering these instructions to eliminate the property violation. Compared with the state-of-the-art approach, our method has three advantages. First, it can repair both durability and crash consistency bugs whereas the state-of-the-art approach can only repair the relatively-simple durability bugs. Second, our method can discover new repair strategies instead of relying on repair strategies hard-coded into the repair tool. Third, our method uses a novel symbolic encoding to model PM semantics, which allows our symbolic analysis to be more efficient than the explicit enumeration of possible scenarios and thus explore a large number of repairs quickly. We have evaluated our method on benchmark programs from the well-known Intel PMDK library as well as real applications such as Memcached, Recipe, and Redis. The results show that our method can repair all of the 41 known bugs in these benchmarks, while the state-of-the-art approach cannot repair any of the crash consistency bugs.


INTRODUCTION
Persistent memory (PM) is a type of non-volatile random-access memory with the capability of retaining data after the loss of electrical power. It has become commercially viable in the past few years. In modern computer architecture, PM may serve as the intermediate layer between volatile DRAM and non-volatile storage such as solid-state disks or replace part of the DRAM-based main memory. This will lead to a drastic reduction in latency and power consumption of the computing systems, and an increase in robustness against frequent and unpredictable power interruptions. This is why PM is used in more and more applications as commercial PM devices [20] come close to DRAM in terms of speed but with a significantly larger capacity. However, software developers are required to write PM related software code in order to unleash the full power of these PM devices [44]. Unfortunately, it is a challenging task to use PM instructions and APIs correctly and efficiently. This is because, due to performance concerns, PM instructions are often designed to have weaker persistency/consistency models than volatile memory instructions. Thus, what is considered correct behavior for volatile memory may no longer be correct for persistent memory. Since the persistency/consistency models are far from intuitive, unless developers have a deep understanding of both software and the PM semantics associated with hardware, it will be difficult to use these PM instructions and APIs correctly and efficiently.
Although a large number of program analysis techniques have been developed to help detect PM bugs [6, 8-10, 14, 28, 31-33, 38, 42] or prove their absence [13,27,40], little has been done on automated diagnosis and repair of PM bugs. In fact, the only existing repair technique that we are aware of is the Hippocrates tool developed by Neal et al. [37]. Unfortunately, Hippocrates only repairs one type of relatively simple PM bugs called durability bugs; these bugs are simple in that fixing them requires only the addition of missing PM instructions. There are more complex PM bugs, often called crash consistency bugs in the literature, that Hippocrates cannot repair; fixing them requires some of the existing instructions to be reordered. Furthermore, Hippocrates uses syntactic-level pattern matching: if a bug matches a known pattern, the tool repairs it by applying a pre-defined code transformation. However, if the bug does not match any known pattern hard-coded into the repair tool, the bug cannot be repaired.
To fill the gap, we propose a constraint based method for automatically computing repairs for a broader class of PM bugs. Unlike the syntactic-level pattern-matching based approach of Neal et al. [37], our method relies on a semantic analysis of PM instructions to compute repairs. By symbolically encoding the PM-related program behavior and the correctness property as a set of logical constraints, and then leveraging an off-the-shelf SMT solver to solve these constraints, our method is able to search for novel repair strategies in a large solution space. As a result, our method is able to repair durability and crash consistency bugs of arbitrary form, even if these bugs do not match any of the known syntactic-level bug patterns hard-coded into Hippocrates.
This work is licensed under a Creative Commons Attribution International 4.0 License.
Fig. 1 shows an overview of our method. The input consists of the program and the violated PM property, and the output is the suggested repair. Internally, our method first leverages a Valgrind based software tool to instrument the program and generate the execution trace. The traces generated at the end of this step may be fed to any existing PM bug detection tool [18,19,22] to confirm the property violation. To compute a repair, our method uses an SMT solver to symbolically encode the solution space. As shown in Fig. 1, it symbolically checks possible repairs in the solution space to find a valid repair. In this context, a repair can be thought of as a modification of the program through a combination of inserting new PM instructions and reordering PM instructions. Our search for a repair is an iterative process, involving multiple calls to the SMT solver for both finding the repair candidate and validating it. Only valid repairs are returned to the user.
SMT solver based symbolic analysis is at the center of our method for two reasons. First, symbolic analysis allows us to explore a large number of possible solutions quickly. Second, symbolic analysis is able to model various types of PM instructions and properties not only accurately but also uniformly, meaning that during symbolic encoding, everything boils down to a set of logical constraints. Since these constraints are expressed in a fragment of the SMT-LIB format, i.e., linear integer arithmetic (LIA), they can be solved efficiently using any off-the-shelf SMT solver.
We have implemented the method in a tool named PMBugAssist. During experimental evaluation, we focused on comparing our method with Hippocrates [37]. This is because our focus is on automated repair, for which Hippocrates represents the state of the art. In contrast, prior work on detecting PM bugs [8,27,31,33,40] and verifying their absence [13,27,40] is less relevant as a baseline; instead, it is complementary to our method.
Our benchmarks include programs from the well-known Intel PMDK library [21] as well as real applications such as Memcached [4], Recipe [30] and Redis [3]. According to prior works on PM bug detection, these benchmarks have 41 known bugs in total, including 23 durability bugs and 18 crash consistency bugs. Our experimental results show that the new method can repair all of these 41 bugs, whereas Hippocrates cannot repair any of the crash consistency bugs. We also evaluated the runtime performance of the new method, and found that, for all benchmark programs, it can finish the repair computation quickly.
To summarize, we make the following contributions:
• We propose the first constraint based method for repairing a broader class of PM bugs. Compared with the state-of-the-art approach, our method can repair PM bugs that do not match any known bug pattern.
• We formalize PM bug repair as a special case of the syntax-guided synthesis (SyGuS) [2] problem, through which we discuss the soundness and decidability of our method.
• We implement and evaluate the method on a large number of benchmark programs to demonstrate its advantages over the state of the art (Hippocrates).
The remainder of this paper is structured as follows. In Section 2, we review the technical background. In Section 3, we present the top-level procedure of our method. This is followed by our SMT solver based symbolic analysis in Section 4, our repair algorithm in Section 5, and discussion of correctness and optimizations in Section 6. We present the experimental results in Section 7 and review related work in Section 8. Finally, we give our conclusions in Section 9.

BACKGROUND

Persistent Memory (PM) Semantics
We focus on Intel's persistent x86 (Px86) model as published by Raad et al. [40]. In the standard x86 architecture, STORE instructions executed by the CPU are sequentialized in a store buffer before taking effect in memory, while LOAD instructions are allowed to take effect immediately. This allows a fast LOAD to take effect before a slow STORE, provided that they have no control/data dependency, while preserving the semantic equivalence of the program.
In the Px86 architecture, a persistent buffer is added after the store buffer to further sequentialize the STORE instructions, before the written values show up in persistent media. While the CPU still preserves the sequential program behavior during normal (crash-free) execution, when a program crashes due to power failure, the order in which the written values show up in persistent media may be significantly different. This may lead to PM bugs.
Table 1, which is taken from Raad et al. [40], characterizes an important aspect of Px86 that is relevant to our work: the order in which instructions take effect in persistent memory. Given a pair of instructions, ($I_i$, $I_j$), where $I_i$ is executed before $I_j$ by the CPU, the corresponding table entry shows whether Px86 guarantees that $I_i$ persists before $I_j$ using the symbols ✔ (yes) and ✘ (no). The third symbol, CL, means that $I_i$ persists before $I_j$ only when the two instructions access memory address blocks mapped to the same cache line.

The Persistency
For example, (STORE x, LOAD y) may persist in reverse order according to the ✘ symbol in Table 1 when the CPU chooses to execute the fast LOAD y before the slow STORE x for performance reasons. However, according to the table, (LOAD y, STORE x) must persist in the same order as they appear in the program, due to a possible control/data dependency. That is, since these two instructions may come from either the code snippet if(y>0) {x=1;} (with control dependency) or the code snippet {reg=y; x=1;} (without dependency), to be safe, the CPU would have to disallow the reordering-based optimization.

PM-related Instructions.
In this work, we are concerned with the following PM-related instructions besides LOAD, STORE, and RMW (read-modify-write) instructions.
• clflush, which stands for cache-line-flush, is a synchronous operation of the CPU that results in flushing the cache line associated with addr immediately. Since this legacy instruction is blocking and slow, it is rarely used.
• clflushopt, which stands for cache-line-flush-optimized, is an asynchronous operation that may postpone flushing to a convenient future time. It is fast, but the exact persistency time is less predictable.
• mfence, which stands for memory-fence, is a memory barrier for both STORE and LOAD instructions.
• sfence, which stands for store-fence, is a memory barrier for STORE instructions only.
• Following Raad et al. [40], we treat clwb (cache-line-writeback) the same as clflushopt since the two instructions are semantically equivalent.
While the legacy instruction CLFLUSH is semantically equivalent to CLFLUSHOPT followed by SFENCE or MFENCE or RMW according to Intel's user manual, in terms of performance, the fastest and most frequently used combination is CLFLUSHOPT followed by SFENCE. Thus, we focus on this combination in this paper.

Persistent Memory (PM) Bugs
We are concerned with two common types of PM bugs, called durability bugs and crash consistency bugs in the literature, which can be generated by many existing PM bug detection tools such as PMemCheck [19] and PMTest [33].

Durability Bugs.
Here, durability means that a value written by STORE eventually shows up in persistent media. However, this is not automatically guaranteed. Fig. 2 shows an example code snippet adapted from Intel's website, where the value written to header->counter may never show up in persistent media. This is because the program does not force the CPU to flush the corresponding cache line and, as a result, the written value (temporarily stored in the volatile part of the CPU) may be lost permanently if a power failure occurs while writer() is executed. After crash recovery, reader() may not have access to the values written by writer(), for example, due to the incorrect value of header->counter.
To make STORE instructions durable, _mm_clflushopt() and _mm_sfence() must be used to force the CPU to flush the cache line; these API calls correspond to CLFLUSHOPT and SFENCE. This is how values written to the name and addr fields of records[i] are made durable in Fig. 2 (Lines 12-14 and 20 for the THEN-branch, and Lines 18 and 20 for the ELSE-branch).
Note that neither instruction in the CLFLUSHOPT+SFENCE combination may be omitted; otherwise, durability is not guaranteed.
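The role of the CLFLUSHOPT+SFENCE combination can be illustrated with a deliberately simplified, worst-case toy model. This is a sketch under stated assumptions, not the Px86 semantics: the class `ToyPM`, its method names, and the cache-line labels are hypothetical, and real hardware may persist a value earlier than modeled here (for example, through cache eviction).

```python
class ToyPM:
    """Worst-case toy model of a volatile cache in front of persistent media.

    Assumption: a value is only *guaranteed* persistent once its cache line
    has been flushed (CLFLUSHOPT) and a later fence (SFENCE) has executed.
    """

    def __init__(self):
        self.cache = {}    # cache line -> latest written value (volatile)
        self.pending = {}  # cache line -> value scheduled by CLFLUSHOPT
        self.media = {}    # cache line -> value guaranteed to be persistent

    def store(self, line, value):
        self.cache[line] = value

    def clflushopt(self, line):
        # Asynchronous flush: only schedules the write-back.
        if line in self.cache:
            self.pending[line] = self.cache[line]

    def sfence(self):
        # The fence forces all scheduled write-backs to complete.
        self.media.update(self.pending)
        self.pending.clear()

    def crash(self):
        # Worst case: anything not yet guaranteed persistent is lost.
        return dict(self.media)


pm = ToyPM()
pm.store("records[0].name", "alice")
pm.clflushopt("records[0].name")
pm.sfence()                      # name is now durable
pm.store("header.counter", 1)    # never flushed or fenced: not durable
survivors = pm.crash()
print(survivors)
```

In this model, omitting either `clflushopt` or `sfence` leaves the stored value out of `media`, which mirrors the observation that neither instruction of the combination may be dropped.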

Crash Consistency Bugs.
When a program crashes due to power failure, it is possible that some (but not all) of the written values have been stored in persistent media. To prevent the persistent media from entering an inconsistent state, the program must use CLFLUSHOPT+SFENCE correctly, to force the STORE instructions to take effect in a certain order. The persistency order, in general, is determined by the reader() executed during crash recovery.
The reader() in Fig. 2
In existing bug detection tools, such as PMemCheck [19] and PMTest [33], the durability and must-persist-before properties are typically specified by the user and then checked for violations automatically. Such tools would be able to detect property violations in Fig. 2. For header->counter, the written value is not made durable at all using CLFLUSHOPT+SFENCE. As for records[i], there is a property violation since the reader() may read value 1 for records[i].valid from persistent media, and then expect records[i].name and records[i].addr to be available in persistent media, but end up with uninitialized or partially initialized values.

Detecting PM Bugs
Existing tools for detecting PM bugs (e.g., PMemCheck [19] and PMTest [33]) are based on analyzing the execution traces. Fig. 4 shows two example traces for branches of the loop body in Fig. 2. For simplicity, we only show the STORE, CLFLUSHOPT, and SFENCE instructions relevant to the violated property assertions.
The first assertion, violated by the ELSE-branch, represents a durability property. Assume that all the STOREs in Fig. 2 are expected to persist in PM media. For the STORE $I_1$, its persistency time is denoted PTime($I_1$). Assuming that TMAX is the upper bound of the persistency time (bounded by the number of executed instructions in this program), we can express durability as PTime($I_1$) < TMAX. The assertion is violated because clflushopt 0x4a3c0C0 is not used to force the CPU to flush the written value from cache to persistent media.
The assertion violated by the THEN-branch captures a crash consistency property. Here, the expectation is that the value written by $I_1$ always persists before the value written by $I_2$, as shown in PTime($I_1$) < PTime($I_2$). The assertion is violated because the CPU allows two CLFLUSHOPT instructions to take effect in reverse order, as shown by the ✘ symbol in Table 1.
Note that, even if we swap the execution order of the two instructions ($I_3$ and $I_4$) in the program, the assertion will still be violated. Fig. 5 illustrates the reason. Here, the solid edges represent the execution order, while the dashed edges represent the persistency order imposed by Px86. Since the dashed edges remain the same (before and after swapping the execution order of $I_3$ and $I_4$), the requirement that $I_1$ always persists before $I_2$ is still not satisfied.
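The two kinds of violations can be made concrete with a small trace checker. The following is a minimal sketch under a conservative assumption that is ours, not the detection tools': a STORE is only guaranteed persistent at the first SFENCE that follows a matching CLFLUSHOPT, and a must-persist-before property is only guaranteed if that SFENCE comes before the second STORE is executed. The function `analyze`, the tuple layout, and the cache-line labels are hypothetical.

```python
def analyze(trace, must_persist_before):
    """Return the property violations found in a linear trace.

    trace: list of (name, op, cache_line) tuples in execution order.
    must_persist_before: list of (a, b) STORE names; a must persist before b.
    """
    tmax = len(trace)  # plays the role of TMAX
    pos = {name: k for k, (name, _, _) in enumerate(trace)}
    ptime = {}
    for name, op, line in trace:
        if op != "STORE":
            continue
        ptime[name] = tmax  # not guaranteed unless flushed and then fenced
        flushed = False
        for k in range(pos[name] + 1, tmax):
            _, op_k, line_k = trace[k]
            if op_k == "CLFLUSHOPT" and line_k == line:
                flushed = True
            elif op_k == "SFENCE" and flushed:
                ptime[name] = k
                break
    found = [("durability", s) for s in ptime if ptime[s] >= tmax]
    for a, b in must_persist_before:
        # Conservative check: a's fence must precede the execution of b.
        if not ptime[a] < pos[b]:
            found.append(("order", a, b))
    return found


I1 = ("I1", "STORE", "name")
I2 = ("I2", "STORE", "valid")
I3 = ("I3", "CLFLUSHOPT", "valid")
I4 = ("I4", "CLFLUSHOPT", "name")
I5 = ("I5", "SFENCE", None)
I6 = ("I6", "SFENCE", None)

then_branch = analyze([I1, I2, I3, I4, I5], [("I1", "I2")])
swapped = analyze([I1, I2, I4, I3, I5], [("I1", "I2")])
reorder_only = analyze([I1, I4, I5, I2, I3], [("I1", "I2")])
with_new_fence = analyze([I1, I4, I5, I2, I3, I6], [("I1", "I2")])
```

Under these assumptions, swapping the two CLFLUSHOPT instructions leaves the ordering violation in place, reordering alone trades it for a durability violation on `I2`, and only the trace with the extra fence `I6` is free of violations.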

Repairing PM Bugs
Hippocrates [37] is the only existing method for repairing PM bugs, and it has two limitations. First, it only repairs the relatively simple durability bugs, such as the one shown in the ELSE-branch of Fig. 4, but not the more complex crash consistency bugs. Second, it only repairs bugs that syntactically match the patterns hard-coded into the repair tool. For bugs that do not have a syntactic match, Hippocrates would not know how to repair them. For example, if repairing a bug requires reordering some instructions, then Hippocrates cannot do it.
In contrast, our method can repair both durability and crash consistency bugs, and can repair bugs that do not syntactically match any of the known patterns hard-coded into Hippocrates. This is because our method has the ability to analyze the semantics of the PM instructions, and thus repair PM bugs through a combination of inserting new PM instructions and reordering instructions. We illustrate the technical challenges using examples in Fig. 6.
Fig. 6 shows two possible repairs of the bug in the THEN-branch of Fig. 4. The first attempt, based solely on reordering the existing instructions of the execution trace, is not a complete repair. By moving $I_4$ and $I_5$ before $I_2$ and $I_3$, the new version of the program guarantees that records[i].name persists before records[i].valid. However, reordering also introduces a new durability bug for $I_2$: without a subsequent SFENCE instruction, the value written by $I_2$ is no longer guaranteed to show up in persistent media, e.g., if the program crashes in the middle of the execution due to power failure.
Fig. 6 highlights the fact that, sometimes, it is impossible to repair a crash consistency bug solely by reordering instructions; we also need to add new PM instructions. We shall explain in the remainder of this paper how our method finds out that, by adding an SFENCE as $I_6$, we can completely repair the crash consistency bug.
To summarize, for the buggy writer() in Fig. 2, the repaired version is shown in Fig. 3. Through a combination of inserting new PM instructions and reordering instructions, the repaired version in Fig. 3 guarantees both the durability of header->counter and the crash consistency requirement that records[i].valid always persists before header->counter. Note that, to satisfy the second requirement, we not only have to add CLFLUSHOPT+SFENCE for header->counter, but also have to move header->counter++ (Line 7 in Fig. 2) after the IF-ELSE statement (Line 16 in Fig. 3).

OVERVIEW OF OUR METHOD
Our method takes an existing PM bug as input. Besides the PM bug, which is an execution trace T that violates a property assertion A, no other input or constraint needs to be provided by the user. The PM bug may be produced by any existing bug detection tools such as PMemCheck [19] and PMTest [33]. Specifically, the trace $T = \{I_1, \ldots, I_N\}$ is a sequence of instructions, each of which has an instruction type specified in Table 1.
The assertion A may be of the form PTime($I_i$) < TMAX (durability) or PTime($I_i$) < PTime($I_j$) (crash consistency) as shown in Fig. 4. Here, TMAX is the upper bound of the persistency time. Thus, if there exists a way of satisfying PTime($I_i$) ≥ TMAX, there exists a durability violation where $I_i$ has not yet taken effect in persistent media at the end of the execution.
Figure 6: The first repair is incomplete since it adds a new durability bug for $I_2$; the second repair is complete because it removes the new durability and original crash consistency bugs.
Algorithm 1 shows the top-level procedure. Since we only invoke the procedure on a buggy execution trace, the first call to the subroutine BugIsFound(T, A) always returns true. Next, we use ComputeRepair(T, A) to compute a potential repair R. It guarantees that, after applying the repair R to the given trace T, the assertion violation no longer exists. However, this is not yet enough to guarantee that R is a valid repair.
There are two possibilities. One possibility is that R indeed is a valid repair: by permuting the instructions in T, R removes all the bad executions and retains only the good executions. The other possibility is that R is a vacuous repair in that, by creating a contradiction between R and T, it artificially removes all valid executions of the instructions in T. Since there is no longer any valid execution, by definition, the SMT solver cannot detect any violation (which must be a valid, and yet buggy, execution).
To find out whether the repair R is valid or vacuous, we use the subroutine RepairIsValid(T, R) to check, after applying R to T, whether any valid execution exists. If the answer is yes, then R is a valid repair, and thus is returned to the user. Otherwise, we use AddInstructions(T, A, R) to add more SFENCE and CLFLUSHOPT instructions to T, and try again.
There is a distinction between the normal program behavior and PM-related behavior, only the latter of which can be affected by CLFLUSHOPT/SFENCE instructions. Since our method only inserts and reorders CLFLUSHOPT/SFENCE instructions, it will not change the normal program behavior. As for the PM-related behavior, due to the use of the verification subroutine BugIsFound(T, A) in Line 1 of Algorithm 1, our method is guaranteed to eliminate the violation of the property assertion A in the given trace T.
Our method explicitly considers the efficiency of the computed repair by adding SFENCE and CLFLUSHOPT instructions iteratively on a "need-to" basis. As soon as enough instructions are added, the while-loop in Algorithm 1 will terminate. In this sense, it minimizes the number of added instructions without resorting to an "optimizing solver" such as MAXSMT in DirectFix [35].
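The control flow described in this section can be summarized by the following minimal sketch. It is not the paper's Algorithm 1: the four subroutines are passed in as plain callables, and the `max_rounds` bound and the return convention are assumptions added here so that the sketch always terminates.

```python
def repair(trace, assertion, bug_is_found, compute_repair,
           repair_is_valid, add_instructions, max_rounds=10):
    """Iterate: compute a candidate repair, validate it, else extend the trace.

    Returns (trace, candidate), where trace may have been extended with extra
    CLFLUSHOPT/SFENCE instructions. max_rounds is a hypothetical safety bound.
    """
    if not bug_is_found(trace, assertion):
        return trace, None  # nothing to repair
    for _ in range(max_rounds):
        candidate = compute_repair(trace, assertion)
        if repair_is_valid(trace, candidate):
            return trace, candidate  # valid (non-vacuous) repair
        # Vacuous repair: add instructions on a "need-to" basis and retry.
        trace = add_instructions(trace, assertion, candidate)
    raise RuntimeError("no valid repair found within the round limit")


# Usage with stand-in subroutines (purely illustrative): a candidate is
# considered valid only once the trace contains at least two SFENCEs.
final_trace, final_repair = repair(
    ["STORE"], "A",
    bug_is_found=lambda t, a: True,
    compute_repair=lambda t, a: ("R", len(t)),
    repair_is_valid=lambda t, r: t.count("SFENCE") >= 2,
    add_instructions=lambda t, a, r: t + ["SFENCE"],
)
```

Because instructions are only added after a candidate turns out to be vacuous, the loop stops at the first trace extension for which a valid repair exists.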

SYMBOLIC ANALYSIS OF THE PM BUG
In this section, we present our SMT based method for analyzing the PM bug symbolically. It is the foundation of not only the subroutine BugIsFound(T, A) but also the subroutines ComputeRepair(T, A) and RepairIsValid(T, R) in Algorithm 1.

The Satisfiability Problem
Given the trace T and the assertion A, whether there exists a valid execution of the instructions in T that violates A can be formulated as a satisfiability (SAT) problem. Toward this end, we construct a logical formula $\Phi := \Phi_{program} \land \Phi_{persistency} \land \lnot\Phi_{assertion}$, where $\Phi_{program}$ encodes the program order, $\Phi_{persistency}$ encodes the persistency order, and $\Phi_{assertion}$ encodes the assertion. Thus, Φ is satisfiable if and only if there exists a valid execution of the instructions in T that violates A.
We express Φ in a fragment of the SMT-LIB format that allows only integer variables (such as $x$ and $y$) and Boolean compositions of linear integer arithmetic (LIA) constraints of the form $(x < y)$. Thus, the satisfiability of Φ can be efficiently decided using any off-the-shelf SMT solver.
Before presenting our method for constructing Φ, we define the two sets of variables used to encode Φ as follows:
The $X_{I_i}$ Variables. For each instruction $I_i \in T$, where $i = 1, \ldots, N$, we define a variable $X_{I_i}$ whose value may be any integer in the interval $[0, N)$; it stands for the execution time, i.e., when the instruction $I_i$ is executed by the CPU. Inside Φ, we will constrain the $X_{I_i}$ variables to allow only valid permutations of T.
The $P_{I_i}$ Variables. For each instruction $I_i \in T$ of the STORE type, we define a variable $P_{I_i}$ whose value may be any integer in the interval $[0, N + 1]$; it stands for the persistency time of $I_i$, i.e., when the value written by $I_i$ is actually stored in persistent media.
Figure 7: Our symbolic encoding of the subformulas in Φ.
The execution time starts from 0 and is bounded by $N$, the total number of instructions in T. We also require each $X_{I_i}$ variable to have a unique value. The definition of this subformula is presented in Fig. 7.

Using $\Phi_{program}$ to Encode Execution Order
Subformula $\Phi_{so}$. This store-order (so) constraint requires the STORE instructions in T to execute in the same order as they appear in the trace. This is because Px86 has a single store-buffer for all STORE instructions; thus, reordering of two STORE instructions ($I_i$, $I_j$) is not allowed, as shown by ✔ in Table 1. The definition of $\Phi_{so}$ is also presented in Fig. 7.
While computing the repair, we may choose to relax $\Phi_{so}$ in certain cases, to allow some of the STORE instructions to reorder. This is because some PM bugs cannot be repaired unless some STORE instructions are allowed to reorder in the program. We discussed an example at the end of Section 2, and we will discuss details of this relaxation in Section 6.

Subformula $\Phi_{fs}$. This flush-store (fs) constraint requires that, for each CLFLUSHOPT ($I_j$), its execution time must be after that of at least one STORE ($I_i$) that it can flush. This requires that $I_i$ and $I_j$ are mapped to the same cache line, i.e., the same-cache-line predicate holds for ($I_i$, $I_j$).

Subformula $\Phi_{fo}$. This fence-order (fo) constraint requires multiple SFENCE instructions to be executed in the same order as they appear in the trace.
Subformula $\Phi_{mo}$. This memory overwrite (mo) constraint says that two STORE instructions ($I_i$, $I_j$) cannot write the same address without a CLFLUSHOPT ($I_k$) inserted in between, to avoid memory overwrite. Memory overwrites must be avoided because, by definition, they violate the durability property.
Figure 8: Encoding for the THEN-branch of Fig. 6 with both durability and crash consistency assertions (program order constraints, persistency time constraints, and assertion violation constraints).
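To see what the program order constraints admit, the following sketch enumerates all permutations of a six-instruction trace explicitly instead of calling an SMT solver. This brute-force enumeration is a stand-in for the symbolic encoding, not the method itself. The trace layout (two STOREs to different cache lines, their CLFLUSHOPTs, and two SFENCEs) and all Python names are assumptions; the memory overwrite (mo) constraint is omitted because the two STOREs write different addresses.

```python
from itertools import permutations

# Hypothetical trace: instruction name -> (type, cache line).
TRACE = {
    "I1": ("STORE", "name"),
    "I2": ("STORE", "valid"),
    "I3": ("CLFLUSHOPT", "valid"),
    "I4": ("CLFLUSHOPT", "name"),
    "I5": ("SFENCE", None),
    "I6": ("SFENCE", None),
}
ORDER = list(TRACE)  # original trace order


def of_type(kind):
    return [i for i in ORDER if TRACE[i][0] == kind]


def satisfies_program_order(x):
    """x maps each instruction to a unique execution time in [0, N)."""
    stores, flushes, fences = of_type("STORE"), of_type("CLFLUSHOPT"), of_type("SFENCE")
    # so: STOREs execute in trace order.
    so = all(x[a] < x[b] for a, b in zip(stores, stores[1:]))
    # fo: SFENCEs execute in trace order.
    fo = all(x[a] < x[b] for a, b in zip(fences, fences[1:]))
    # fs: each CLFLUSHOPT executes after some STORE to the same cache line.
    fs = all(any(x[s] < x[f] for s in stores if TRACE[s][1] == TRACE[f][1])
             for f in flushes)
    return so and fo and fs


def valid_permutations():
    result = []
    for perm in permutations(ORDER):
        x = {inst: t for t, inst in enumerate(perm)}  # unique times by construction
        if satisfies_program_order(x):
            result.append(perm)
    return result


valid = valid_permutations()
print(len(valid), "of 720 permutations satisfy the program order constraints")
```

Explicit enumeration grows factorially with the trace length, which is the reason a symbolic encoding is preferable in practice.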

Using $\Phi_{persistency}$ to Encode Persistency Order
Let $\Phi_{persistency}$ be the conjunction of three subformulas (a range constraint on the persistency time variables, $\Phi_{pts}$, and $\Phi_{fi}$), which together form a set of constraints on the $P_{I_i}$ variables such that, for every satisfying assignment to $\Phi_{persistency}$, the values of the $P_{I_i}$ variables correspond to a valid persistency order of instructions in T. These subformulas are defined in Fig. 7
but not yet fenced, and $N + 1$ means $I_i$ has not even been flushed yet at the end of the execution.
Subformula $\Phi_{pts}$. This persistency time store (pts) constraint requires the persistency time of each STORE instruction $I_i$ to be no earlier than the execution time of $I_i$, i.e., the value of $X_{I_i}$.

Subformula $\Phi_{fi}$. This fence interval (fi) constraint requires that, for each STORE instruction $I_i$, matching CLFLUSHOPT instruction $I_j$, and SFENCE instruction $I_k$, the persistency time of $I_i$ is no later than the execution time of $I_k$.

Using $\Phi_{assertion}$ to Encode the Assertion
Let $\Phi_{assertion}$ be the conjunction of two subformulas, one representing the set of durability conditions and the other representing the set of crash consistency conditions. Both of them are defined in Fig. 7.
Recall that for each STORE instruction $I_i$, the value written by $I_i$ is expected to be stored in persistent media at the end of the execution (TMAX = $N$). Thus, if $(P_{I_i} \ge N)$ is satisfiable, there exists a durability bug. Similarly, given two STORE instructions $I_i$ and $I_j$, if $I_i$ is expected to always persist before $I_j$, then the satisfiability of $(P_{I_i} \ge P_{I_j})$ means there exists a crash consistency bug.

An Example for Our Encoding Method
Fig. 8 shows the constraints in Φ constructed by our method for the THEN-branch of Fig. 6, after the new SFENCE instruction $I_6$ has been added to the end of the trace.
Line 8 encodes the program order. In particular, $X_{I_1} < X_{I_2}$ encodes $\Phi_{so}$, which requires the two STORE instructions to execute in order. $X_{I_1} < X_{I_4}$ and $X_{I_2} < X_{I_3}$ encode $\Phi_{fs}$, which requires each CLFLUSHOPT to execute after a corresponding STORE. $X_{I_5} < X_{I_6}$ encodes $\Phi_{fo}$, which requires the two SFENCE instructions to execute in the same order as in the trace.
Line 10 encodes the range constraint on the persistency time variables and $\Phi_{pts}$: the former requires each $P_{I_i}$ to have a value in $[-1, 7]$, and the latter requires each $P_{I_i}$ to be no earlier than the corresponding $X_{I_i}$.
Finally, Line 16 encodes the conditions under which the assertion may be violated.
Since the set of constraints (Φ) in Fig. 8 is satisfiable, an SMT solver may return a solution corresponding to the permutation $T' = I_1, I_4, I_2, I_3, I_5, I_6$. This is a valid permutation of T because, according to the CL symbol in Table 1, CLFLUSHOPT ($I_4$) is allowed to reorder before $I_2$ and $I_3$. However, it violates the crash consistency property because $I_2$ may persist before $I_1$. In the next section, we present our method for repairing this violation.
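The satisfiability check can be mimicked on this small example by enumerating both the execution order and the persistency times explicitly. The sketch below is a brute-force stand-in for the SMT query, under simplifying assumptions of ours: the persistency time of each STORE ranges over [0, N+1], the fence interval constraint is read as "if the matching CLFLUSHOPT executes before an SFENCE, the STORE persists no later than that SFENCE", and the constraints are hard-coded for the six-instruction trace rather than generated from Fig. 7. All Python names are hypothetical.

```python
from itertools import permutations, product

ORDER = ["I1", "I2", "I3", "I4", "I5", "I6"]
N = len(ORDER)
STORES = {"I1": "I4", "I2": "I3"}  # STORE -> its matching CLFLUSHOPT
FENCES = ["I5", "I6"]


def is_valid_order(x):
    # so, fo, and fs constraints for this specific trace.
    return (x["I1"] < x["I2"] and x["I5"] < x["I6"]
            and all(x[s] < x[f] for s, f in STORES.items()))


def persistency_ok(x, p):
    for s, f in STORES.items():
        if p[s] < x[s]:  # pts: persist no earlier than execution
            return False
        for k in FENCES:  # fi: a fence after the flush bounds the persistency time
            if x[f] < x[k] and p[s] > x[k]:
                return False
    return True


def assertion_holds(p):
    durable = p["I1"] < N and p["I2"] < N
    ordered = p["I1"] < p["I2"]
    return durable and ordered


def violations_of(perm):
    """All persistency assignments that are allowed by perm and violate the assertion."""
    x = {inst: t for t, inst in enumerate(perm)}
    if not is_valid_order(x):
        return []
    found = []
    for p1, p2 in product(range(N + 2), repeat=2):
        p = {"I1": p1, "I2": p2}
        if persistency_ok(x, p) and not assertion_holds(p):
            found.append(p)
    return found


def phi_is_satisfiable():
    return any(violations_of(perm) for perm in permutations(ORDER))


print(phi_is_satisfiable())
```

In this model, the permutation in which `I4` is hoisted before `I2` and `I3` still admits a violating assignment, whereas placing the first fence between the two STOREs leaves none.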

COMPUTING THE REPAIR
Algorithm 2 shows our method for computing R when the formula Φ is satisfiable. Our method first uses the subroutine ComputeRepair(T, A) to compute a candidate R, and then uses the subroutine RepairIsValid(T, R) to check if R is a valid repair.

Subroutine ComputeRepair(T , A)
The repair R is represented by a conjunction of blocking constraints, each of which, denoted $\lnot\varphi$, removes a subset of permutations of T allowed by Φ. Recall that Φ allows only valid and yet buggy permutations. Thus, we want to compute a set of blocking constraints that remove all valid and yet buggy permutations.
In Algorithm 2, R is initialized to true, which represents an empty repair. Then, as long as Φ ∧ R remains satisfiable (Line 3), we compute a constraint $\varphi$ from the satisfying assignment ($\sigma$) to the formula Φ ∧ R. Here, $\varphi$ is a conjunction of happens-before constraints, $(X_{I_i} < X_{I_j})$, extracted from the antecedents of the subformula $\Phi_{fi}$ such that all these happens-before constraints are satisfied by the assignment ($\sigma$).
Since $\varphi$ captures a set of valid-and-yet-buggy permutations of T, by adding $\lnot\varphi$ to R, we remove them (Line 5). Inside the while-loop of Algorithm 2, we keep adding $\lnot\varphi$ until Φ ∧ R is no longer satisfiable.
Within each call to ExtractSatConstraint(Φ ∧ R), we compute a minimal set of constraints to be included in $\varphi$ based on the satisfying assignment ($\sigma$) returned by the SMT solver. This is accomplished using a greedy algorithm as follows: First, we extract the concrete values of the $X_{I_i}$ variables from the assignment ($\sigma$), and use these concrete values to decide, for each $(X_{I_i} < X_{I_j})$ constraint in the antecedents of $\Phi_{fi}$, whether the constraint is satisfied. All the satisfied $(X_{I_i} < X_{I_j})$ constraints are added to $\varphi$. Thus, the negation of $\varphi$ will eliminate permutations associated with the assignment ($\sigma$).
Before adding $\lnot\varphi$ to R, we remove the obviously-redundant constraints from $\varphi$. These are constraints that are implied by other constraints in $\varphi$. For example, if $\varphi$ contains both $(X_{I_1} < X_{I_2})$ and $(X_{I_2} < X_{I_3})$, then we remove $(X_{I_1} < X_{I_3})$ from $\varphi$ since it is redundant.
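The redundancy elimination step amounts to dropping happens-before constraints that follow from the others by transitivity. A minimal sketch is shown below; it assumes the constraints are acyclic (which holds when they are all satisfied by one concrete assignment), and the function names and the pair representation are hypothetical.

```python
def implied(constraints, target):
    """True if target = (a, b) follows from constraints by transitivity."""
    a, b = target
    stack, seen = [a], set()
    while stack:
        node = stack.pop()
        for u, v in constraints:
            if u == node and v not in seen:
                if v == b:
                    return True
                seen.add(v)
                stack.append(v)
    return False


def remove_redundant(constraints):
    """constraints: set of (a, b) pairs, each meaning X_a < X_b."""
    kept = set(constraints)
    for c in sorted(constraints):
        others = kept - {c}
        if implied(others, c):
            kept = others  # c adds nothing beyond the remaining constraints
    return kept


reduced = remove_redundant({("I1", "I2"), ("I2", "I3"), ("I1", "I3")})
print(sorted(reduced))
```

Keeping the conjunction small makes each blocking constraint more general, since fewer conjuncts need to hold for a permutation to be excluded.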

Subroutine RepairIsValid(T , R)
Algorithm 3 shows our method for validating the repair candidate R in two steps. First, we define a new formula $\Psi := \Phi_{program} \land \Phi_{persistency}$ to capture the set of valid permutations of T. Note that Ψ is a subformula of Φ because $\Phi := \Psi \land \lnot\Phi_{assertion}$. Next, we check if the combined formula (Ψ ∧ R) is satisfiable; we say that R is a valid repair only if (Ψ ∧ R) is satisfiable.
Fig. 9 illustrates why we check the validity of the repair in this way. Here, formulas $\lnot\Phi_{assertion}$ and Ψ can be thought of as filters of permutations of the trace T: red ones are buggy and black ones are non-buggy. In this sense, Ψ retains only the valid permutations of T, and the repair candidate R filters out the valid-and-yet-buggy permutations. If R retains at least some non-buggy permutation (black arrow), we say that R is a valid repair. But if R does not retain any non-buggy permutation at all, it is a vacuous repair.
The existence of some (valid and non-buggy) permutations means that the constraints imposed by R are realizable.
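The interplay between computing blocking constraints and validating the result can be sketched by explicit enumeration on a six-instruction trace. This is a simplified stand-in, not Algorithms 2 and 3: instead of extracting constraints from the antecedents of the fence interval subformula, each blocking constraint here records how every STORE and CLFLUSHOPT is ordered relative to every SFENCE in the buggy witness. The trace layout, the bug criterion, and all Python names are assumptions.

```python
from itertools import permutations

ORDER = ["I1", "I2", "I3", "I4", "I5", "I6"]
N = len(ORDER)
STORES = {"I1": "I4", "I2": "I3"}  # STORE -> its matching CLFLUSHOPT
FENCES = ["I5", "I6"]


def times(perm):
    return {inst: t for t, inst in enumerate(perm)}


def is_valid(x):
    # Execution-order part of Psi for this specific trace (so, fo, fs).
    return (x["I1"] < x["I2"] and x["I5"] < x["I6"]
            and all(x[s] < x[f] for s, f in STORES.items()))


def is_buggy(x):
    """True if some allowed persistency assignment violates the assertion."""
    latest = {}  # latest possible persistency time of each STORE
    for s, f in STORES.items():
        after = [x[k] for k in FENCES if x[f] < x[k]]
        latest[s] = min(after) if after else N + 1
    not_durable = any(t >= N for t in latest.values())
    # I2 may persist as early as its own execution time.
    wrong_order = latest["I1"] >= x["I2"]
    return not_durable or wrong_order


def extract_constraint(x):
    """Happens-before atoms (a, b), meaning X_a < X_b, that hold in x."""
    atoms = set()
    for a in list(STORES) + list(STORES.values()):
        for k in FENCES:
            atoms.add((a, k) if x[a] < x[k] else (k, a))
    return frozenset(atoms)


def satisfies(x, phi):
    return all(x[a] < x[b] for a, b in phi)


def compute_repair():
    blocked = []  # R is the conjunction of the negations of these
    while True:
        witness = next((x for x in map(times, permutations(ORDER))
                        if is_valid(x) and is_buggy(x)
                        and not any(satisfies(x, phi) for phi in blocked)), None)
        if witness is None:
            return blocked
        blocked.append(extract_constraint(witness))


def remaining_permutations(blocked):
    return [perm for perm in permutations(ORDER)
            if is_valid(times(perm))
            and not any(satisfies(times(perm), phi) for phi in blocked)]


R = compute_repair()
remaining = remaining_permutations(R)
print("valid repair" if remaining else "vacuous repair", remaining)
```

If the list of remaining permutations were empty, the candidate would be vacuous, and the trace would need additional CLFLUSHOPT or SFENCE instructions before trying again.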

An Example for Our Repair Method
We use the example in Fig. 8 to illustrate the repair computation and validation process. Fig. 10 shows the corresponding steps.
First, recall that the constraints (Φ) shown in Fig. 8 are satisfiable. From the first solution to Φ returned by the SMT solver, which orders the instructions as T′ = 1, 2, 3, 4, 5, 6 (by instruction index), our method identifies the happens-before constraints in the antecedents of the fence interval (fi) subformula Φ_fi. This corresponds to Line 4 of Algorithm 2. While there are four antecedents in Φ_fi as shown in Fig. 8, only two of them end up in the extracted constraint, as shown in Fig. 10. By adding the negation of the extracted constraint to R, our method removes the buggy permutation where instructions 1 and 2, together with their CLFLUSHOPT instructions, execute before the first SFENCE instruction (instruction 5).
Next, we check if Φ ∧ R is satisfiable (Line 3 of Algorithm 2). Since the answer is yes, from the second solution to Φ returned by the SMT solver, T′ = 5, 1, 2, 3, 4, 6, our method computes another such constraint and then uses its negation to remove the buggy permutation where instructions 1 and 2, together with their CLFLUSHOPT instructions, are moved in between instructions 5 and 6.
At this moment, the only remaining permutation is as follows: instruction 1 and its CLFLUSHOPT are before instruction 5, while instruction 2 and its CLFLUSHOPT are between instructions 5 and 6. Since this permutation does not violate the assertion, our method exits the while-loop in Algorithm 2 and returns R as a potential repair.
Finally, our method uses Algorithm 3 to validate the repair by checking the satisfiability of Ψ ∧ R. Since Ψ ∧ R is satisfiable, the SMT solver returns a solution that corresponds to the remaining permutation described above. This permutation of T shows exactly how to reorder instructions in the extended execution trace to avoid the assertion violation. Thus, by mapping the reordered instructions from T back to the original program, we obtain the repaired software code shown in the THEN-branch of Fig. 3.
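The iterative blocking loop in this example can be mimicked by a small brute-force sketch. Everything in it is an illustrative assumption rather than the actual encoding: the instruction names (`s1`, `s2` for stores, `f1`, `f2` for their flushes, `n1`, `n2` for fences), the simplified persistency rule (a store persists at the first fence after its flush), and the choice to block the complete flush-versus-fence ordering of each buggy solution.

```python
from itertools import permutations

TRACE = ("s1", "s2", "f1", "f2", "n1", "n2")
PAIRS = [(f, n) for f in ("f1", "f2") for n in ("n1", "n2")]

def before(p, a, b):
    return p.index(a) < p.index(b)

def psi(p):
    """Toy validity: store order, store-before-flush, fence order."""
    return (before(p, "s1", "s2") and before(p, "s1", "f1")
            and before(p, "s2", "f2") and before(p, "n1", "n2"))

def persist_time(p, flush):
    """Position of the first fence after the flush (infinity if none)."""
    later = [p.index(n) for n in ("n1", "n2") if before(p, flush, n)]
    return min(later) if later else float("inf")

def assertion(p):
    """Toy crash consistency property: s1 persists before s2, both persist."""
    return persist_time(p, "f1") < persist_time(p, "f2") < float("inf")

def flush_fence_orders(p):
    """Happens-before facts between flushes and fences in this solution."""
    return tuple(before(p, f, n) for f, n in PAIRS)

def compute_repair():
    blocked = set()  # each entry plays the role of one blocking constraint
    def repair(p):
        return flush_fence_orders(p) not in blocked
    while True:
        # Look for a valid, not-yet-blocked permutation that is still buggy.
        buggy = next((p for p in permutations(TRACE)
                      if psi(p) and repair(p) and not assertion(p)), None)
        if buggy is None:
            return repair, blocked
        blocked.add(flush_fence_orders(buggy))

repair, blocked = compute_repair()
# Validation step: the repair must keep at least one valid permutation.
survivors = [p for p in permutations(TRACE) if psi(p) and repair(p)]
print(len(blocked), survivors[0])
```

In this toy model, every surviving permutation places `f1` before the first fence and `f2` between the two fences, mirroring the single remaining permutation in the example above.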

CORRECTNESS AND OPTIMIZATIONS
In this section, we first discuss the correctness of our repair method by treating it as a special case of the well-known syntax-guided synthesis (SyGuS) problem [2]. Then, we discuss two optimizations.

Relating to SyGuS
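Our repair problem can be viewed as deciding the existence of a relation R such that Ψ ∧ R ⟹ Φ_assertion is valid (for all values of the execution-order and persist-order variables) and, at the same time, Ψ ∧ R is satisfiable (for some values of these variables). Here, the execution-order variables are the ones that appear in happens-before constraints, and the persist-order variables are the ones that appear in must-persist-before constraints.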
This is the well-known SyGuS problem [2].
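Written out with explicit quantifiers, the two requirements can be stated as below. The set names X and Y are placeholders assumed here for the execution-order and persist-order variables, respectively, and the assignment of R to X and of the assertion to Y is likewise an assumption based on how the constraints are used in the rest of the paper.

```latex
\exists R.\;
  \Bigl( \forall X, Y.\;\; \Psi(X, Y) \land R(X) \implies \Phi_{assertion}(Y) \Bigr)
  \;\land\;
  \Bigl( \exists X, Y.\;\; \Psi(X, Y) \land R(X) \Bigr)
```

The first conjunct requires the repair to rule out every assertion violation; the second requires that at least one valid execution remains.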
In our method, since the validity of Ψ ∧ R ⟹ Φ_assertion is equivalent to the unsatisfiability of the negated formula Ψ ∧ R ∧ ¬Φ_assertion, we rewrite the problem as two satisfiability subproblems. This allows us to use off-the-shelf SMT solvers to decide them. The first one says that Ψ ∧ R ∧ ¬Φ_assertion must be unsatisfiable, and the second one says that Ψ ∧ R must be satisfiable. They are the foundations of our method for computing and validating the repair in Algorithms 2 and 3.
The link to SyGuS allows us to understand the complexity of the repair problem. Since quantification is applied to the relation R, the problem is expressed as a formula in second-order logic, which is known to be undecidable in general. That is why practical solutions to the SyGuS problem tend to be sound (and yet incomplete) solutions. In our repair method, we adopt the same approach.
Our Method Is Guaranteed to Be Sound with Respect to the Given Trace. That is, the repair R computed by our method is guaranteed to be correct. This is because, by definition, R is able to make Ψ ∧ R ∧ ¬Φ_assertion unsatisfiable, as shown in Algorithm 2. At the same time, it is able to make Ψ ∧ R satisfiable, as shown in Algorithm 3. Thus, R can always eliminate the failed assertion.
Our method is not necessarily complete, meaning that even if there exists a valid repair, in theory, our method may not find it. We do not attempt to make the method complete for efficiency reasons, even if this may be achieved by restricting the search to a decidable solution subspace. Instead, we will demonstrate through experimental evaluation (Section 7) that, in practice, our repair method finds a valid repair for every bug in our benchmarks.

Adding New Instructions to T
So far, our analysis assumes that the set of instructions in the execution trace T is fixed. Sometimes, however, the PM bug cannot be fixed merely by permuting T; in addition, new CLFLUSHOPT and SFENCE instructions must be added. This is why Algorithm 1 contains a while-loop: whenever the PM bug cannot be repaired using the instructions in the given execution trace T, we use AddInstructions (Line 6 in Algorithm 1) to add instructions to T and try again.
Which instructions to add first depends on the violated assertion. If the violated assertion is a must-persist-before constraint between two STORE instructions (a crash consistency bug), our strategy is to add a CLFLUSHOPT instruction whose address is the same as the address of one of the two STORE instructions. If the violated assertion is a durability constraint on a single STORE instruction (a durability bug), our strategy is to add a CLFLUSHOPT instruction first and then check if a valid repair exists; if the violation still exists, we add an SFENCE instruction and check again.
Fig. 6 shows an example. Prior to adding instruction 6, the last violated assertion represents the durability of the value written by instruction 2. Thus, we add an SFENCE instruction. There is no need to add a CLFLUSHOPT instruction for instruction 2 because such an instruction already exists in the given execution trace.
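The order in which candidate instructions are tried can be sketched as a small generator. The function name `candidate_additions`, the tuple representation of instructions, and the `kind` labels are hypothetical; skipping a CLFLUSHOPT that already exists in the trace is applied only to the durability case here, which is a simplifying assumption.

```python
def candidate_additions(kind, store_addrs, trace=()):
    """Yield instructions to try adding, in the order described above.

    kind        -- "crash_consistency" or "durability" (hypothetical labels)
    store_addrs -- addresses of the STOREs named in the violated assertion
    trace       -- existing instructions as (opcode, address) tuples
    """
    if kind == "crash_consistency":
        # Try a flush for the address of either STORE in the assertion.
        for addr in store_addrs:
            yield ("CLFLUSHOPT", addr)
    elif kind == "durability":
        flush = ("CLFLUSHOPT", store_addrs[0])
        if flush not in trace:      # a flush already in the trace is reused
            yield flush
        yield ("SFENCE", None)      # tried only if the violation remains
    else:
        raise ValueError(f"unknown assertion kind: {kind}")

# The store is already flushed, so only a fence is proposed.
existing = [("STORE", 0x40), ("CLFLUSHOPT", 0x40)]
print(list(candidate_additions("durability", [0x40], trace=existing)))
```

A caller would add one candidate at a time, rerun the repair computation, and stop at the first candidate that yields a valid repair.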

Relaxing the Subformula Φ_so
So far, our analysis assumes that STORE instructions in the given trace T are executed in the same order as they appear in the program. This is codified in the subformula Φ_so. However, enforcing Φ_so may prevent some bugs from being repaired.
An example has been shown in the ELSE-branch of Fig. 4. In addition to the durability property for STORE 1, the user also wants to satisfy the crash consistency property that STORE 2 must persist before STORE 1. However, since this must-persist-before constraint contradicts the happens-before constraint in Φ_so, which requires STORE 1 to execute before STORE 2, it is impossible to repair the bug. If we assume that Φ_assertion correctly expresses the intended behavior, then we must relax the happens-before constraints in Φ_so.
In our repair method, the solution is to enforce the subformula Φ_so first. However, if this does not lead to a valid repair, we relax it. Toward this end, we first check if Φ_so contains a happens-before constraint between two STORE instructions that contradicts the transitive closure of the must-persist-before constraints imposed by the crash consistency requirement. If the answer is yes, then we remove the conflicting constraint from Φ_so and try again.
To summarize, whenever the must-persist-before constraints in Φ_assertion contradict the happens-before constraints in Φ_so, we assume that Φ_assertion is the intended behavior and relax Φ_so.
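The relaxation step can be sketched as follows, under the assumption that both kinds of constraints are represented as sets of index pairs and that a happens-before pair (i, j) is considered contradictory when the transitive closure of the must-persist-before pairs contains (j, i). The function names are hypothetical.

```python
def transitive_closure(pairs):
    """Transitive closure of a set of (a, b) ordering pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def relax_store_order(store_order, must_persist_before):
    """Drop happens-before pairs that conflict with the persist order.

    store_order         -- pairs (i, j): STORE i executes before STORE j
    must_persist_before -- pairs (i, j): STORE i persists before STORE j
    """
    closure = transitive_closure(must_persist_before)
    return {(i, j) for (i, j) in store_order if (j, i) not in closure}

# STORE 2 must persist before STORE 1, so the pair (1, 2) is relaxed.
print(relax_store_order({(1, 2), (2, 3)}, {(2, 1)}))  # {(2, 3)}
```

Only the conflicting pairs are removed, so the remaining program-order constraints continue to restrict the search.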

EXPERIMENTS
We implemented our method by using Z3 [7] to conduct the symbolic analysis described in Algorithms 1, 2 and 3. Our method takes an execution trace and a failed assertion as input and returns the repair as output. The known-to-be-buggy execution traces are generated using PMemCheck [19], although many other existing PM bug detection tools [18, 19, 22] can also be used to generate traces.

Benchmarks
Table 2 shows the benchmark statistics, including the name, the number of lines of C code (LoC), a short description, and the known PM bug type. These benchmark programs fall into two sets. The first set consists of programs with durability bugs. The first ten programs come from the Intel PMDK library. The last two programs are real applications: Memcached [4] is a high-performance object caching system, and Recipe [30] is a set of durable concurrent data structures for fast indexing. The durability bugs in these programs have been confirmed by prior work [37]. The second set consists of programs with crash consistency bugs. The first twelve are unit-testing programs for durable data structures implemented in the Intel PMDK library. These unit tests were created by Intel developers to illustrate various scenarios under which crash consistency bugs occur. The last two programs are real applications: Memcached and Redis [3], a distributed key-value database. All of these crash consistency bugs have been confirmed by the developers.

Experimental Set-up
Since the only prior work on repairing PM bugs is Hippocrates [37], we focus on comparing our tool, PMBugAssist, with Hippocrates on all benchmark programs. Our experiments were designed to answer the following research questions.
• RQ 1: Is PMBugAssist more effective than Hippocrates in repairing the PM bugs?
• RQ 2: Is PMBugAssist efficient enough for computing repairs for the benchmark programs?
• RQ 3: Does PMBugAssist correctly compute repairs for the benchmark programs?
The experiments were conducted on a computer with an AMD Ryzen 5 5600X CPU and 32GB memory, running Ubuntu 20.04.
First, we present the experimental results that answer RQ 1. They are shown in the last two columns of Table 3. Here, the first two columns show the benchmark name and the length of the original execution trace T. The last two columns show the effectiveness of the two repair methods: PMBugAssist and Hippocrates. Here, the symbol ✔ means that the method can repair the bug, whereas the symbol ✘ means that the method cannot repair the bug. For each suggested repair generated, we manually inspect it, compare it with the developers' fix, and verify its correctness. The first twelve rows of Table 3 are benchmark programs with 23 confirmed durability bugs. The last fourteen rows are benchmark programs with 18 confirmed crash consistency bugs. The results in Table 3 show that PMBugAssist was able to repair all of the 41 bugs, while Hippocrates was able to repair 22 of the 23 durability bugs and none of the 18 crash consistency bugs.
We also show, in Table 3, the CLFLUSHOPT+SFENCE instructions added and the time taken by the two methods. Overall, our method added either the same number of instructions or fewer instructions. For obj_mem, our method used significantly fewer CLFLUSHOPT instructions than Hippocrates (11+0 versus 210+0) because multiple STORE operations share the same cache line. For Memcached, our method used fewer instructions (9+1 versus 10+6) because an SFENCE may be shared by multiple STORE operations. For pmemspoil, our manual inspection shows that Hippocrates's repair is actually incorrect: at least one CLFLUSHOPT must be added.
While our method takes more time since it conducts the additional semantic analysis of the modified program, this is needed to discover new repair strategies; in contrast, Hippocrates only applies the predefined repair strategy for durability bugs but cannot repair crash consistency bugs. For pmem_memset-1, our method had a longer running time because the erroneous STORE residing in a loop showed up in the trace many times and thus slowed down our symbolic analysis. Overall, the time taken by our method is reasonable when compared to the alternative of relying on programmers to manually repair the bugs. Note that neither code size nor trace length is a reliable indicator of how hard the repair problem is. For example, although the majority of durability bugs have traces with more than 100K instructions, the repair problems are often simple, because each durability constraint involves only one STORE instruction, and many instructions in the trace are unrelated and thus may be ignored during the analysis. In contrast, while the crash consistency bugs have shorter traces, they have more complex interactions between the execution-order and persist-order variables and, as a result, have significantly larger search spaces.
For example, even with 10 to 30 instructions in the trace T, the total number of possible repairs in the solution space can be astronomically large (10! to 30!). This means that it is impossible for developers to enumerate the possible repairs manually. This is also the reason why SMT based symbolic analysis is needed.
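To give a sense of scale, the following short sketch prints the number of orderings of n instructions for a few trace lengths in this range. The helper name `num_orderings` is introduced here for illustration only.

```python
import math

def num_orderings(n):
    """Number of ways to order n distinct instructions (n factorial)."""
    return math.factorial(n)

for n in (10, 20, 30):
    # Scientific notation keeps the larger values readable.
    print(f"{n} instructions: {num_orderings(n):.2e} orderings")
```

Already at 10 instructions there are 3,628,800 orderings, and at 30 instructions the count exceeds 10^32, far beyond what explicit enumeration can cover.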
Column 3 of Table 3 shows that our SMT based symbolic analysis is efficient in computing repairs. Except for obj_memops, all durability bugs were repaired in a few seconds. This is the case even for applications such as Memcached and Recipe, for which our repair method finished within 10 seconds. For obj_memops, it took 15 minutes because the program has a very large number of PM accesses and thus requires many SMT solver calls. For crash consistency bugs, our method finished within seconds except for pmreorder_8, pmreorder_flushes_1 and pmreorder_flushes_2. For pmreorder_8, our method took longer because it went through more iterations in the while-loop, while adding 6 new PM instructions to the original execution trace (shown in Column 4) and reordering 4 instructions in the extended execution trace. For the last two benchmarks, pmreorder_flushes_1 and pmreorder_flushes_2, the reason is that there are more relevant instructions in the traces and more of these instructions need to be reordered to repair the bugs.
Our method also minimizes the number of SFENCE/CLFLUSHOPT instructions added (Section 3). For durability bugs, our repairs are as efficient as the repairs generated by Hippocrates. For crash consistency bugs (which cannot be handled by Hippocrates), the efficiency of our repairs is shown in Column 4 of Table 3.

Results for Answering RQ 3
To answer RQ 3, we inspected the repairs computed by our method to see if they are also correct for other traces. Recall that, since each repair is computed from a single trace, in theory there is no guarantee that the repair is also correct for other traces. However, our results show that for all the benchmarks in Table 3, our repairs are also correct for other traces. The reason is that our repair almost always resides in a local code block, such that the code block (basic block) is either executed in its entirety by a trace, or not executed at all by the trace. An example would be the THEN-branch (or the ELSE-branch) of an If-statement. It is extremely rare for the STORE instructions and the corresponding CLFLUSHOPT and SFENCE instructions to be separated into different code blocks. As a result, a trace executes either all or none of the instructions involved in our repair.
Table 4 shows how often this easy-to-check sufficient condition is satisfied in practice. Here, a repair is called Sequential when all instructions fall into a straight-line code block, called Branch In Scope when all instructions fall into a branch of an If-statement, and called Branch Out of Scope when some are in a branch but others are outside of the branch. For Sequential and Branch In Scope, correctness of the repair is guaranteed for all traces.
Table 4 shows that, for durability, 20 of the 23 repairs (87%) are Sequential and only 3 (13%) are Branch In Scope. For crash consistency, 16 of the 18 repairs (89%) are Sequential and only 2 (11%) are Branch In Scope. Whether a repair is Sequential or Branch In Scope can be checked automatically using static program analysis.
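A minimal sketch of such a check is given below. It assumes that a prior static analysis has already tagged every instruction in the repair with the identifier of the If-branch that encloses it, or `None` when the instruction lies in straight-line code; the function name and this input format are hypothetical, and repairs spanning two different branches are conservatively reported as out of scope.

```python
def classify_repair(enclosing_branches):
    """Classify a repair by where its instructions are located.

    enclosing_branches -- one entry per instruction in the repair: the id of
    the enclosing If-branch, or None if the instruction is outside any branch.
    """
    if not enclosing_branches:
        raise ValueError("a repair must involve at least one instruction")
    scopes = set(enclosing_branches)
    if scopes == {None}:
        return "Sequential"            # all in straight-line code
    if len(scopes) == 1:
        return "Branch In Scope"       # all inside the same branch
    return "Branch Out of Scope"       # mixed locations

print(classify_repair([None, None, None]))       # Sequential
print(classify_repair(["then_1", "then_1"]))     # Branch In Scope
print(classify_repair(["then_1", None]))         # Branch Out of Scope
```

Only the first two categories satisfy the sufficient condition under which the repair is guaranteed to be correct for all traces.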

RELATED WORK
As we have mentioned earlier, Hippocrates [37] is the only existing PM bug repair tool, but it is limited to repairing durability bugs. Our method, in contrast, can also repair crash consistency bugs.
Our work is complementary to existing trace based PM bug detectors [6, 9, 10, 12-14, 31, 42]. These include, for example, PMemCheck [19] and Persistence Inspector [18], which are trace based PM bug detection tools from Intel; Pmreorder [22], which is an extension of the Intel tools for explicitly generating trace permutations; Yat [28], which is a hypervisor based framework for testing persistency bugs on a POSIX-compliant file system (PMFS [41]); and Chipmunk [29], which is a framework for testing PM file systems for crash consistency bugs.
PMTest [33] is a tool that leverages user-specified checking rules to compute the persistency time interval of STORE instructions, to decide if there are persistency violations. XFDetector [32] is a tool that automatically injects failures into the program and then replays the execution traces before and after failure, to detect cross-failure bugs. PMDebugger [8] is also a tool that leverages user-specified constraints to detect PM bugs. In addition, there are techniques for verifying the absence of PM bugs [27, 40].
At a high level, our repair method is also related to techniques for repairing other software bugs. They include ExtractFix [11], which is a constraint-based semantic repair approach that leverages an execution trace and a crash-free constraint as input to generate candidate patches that satisfy the constraint; BugAssist [23, 24], which repairs assertion failures in a sequential program; and ConcBugAssist [26], which repairs failures in a multi-threaded program. Other similar repair techniques include SemFix [39], DirectFix [35], and the method proposed by Malik et al. [34] for repairing data structures. There are also techniques for synthesizing and optimizing fences and synchronization primitives for concurrent programs [1, 5, 25, 36]. However, none of these existing techniques can repair PM bugs.
Our SMT solver based symbolic analysis is related to techniques used by existing tools for trace-based analysis to detect concurrency bugs such as data races and atomicity violations [16, 43, 45-47], as well as symbolic analysis techniques for detecting information leaks through side channels [15, 17]. However, these techniques were designed exclusively for programs that use volatile memory, and thus cannot be used to detect or repair PM bugs.

CONCLUSIONS
We have presented a method for automatically repairing both durability and crash consistency bugs in application software that leverages byte-addressable persistent memory. Our method relies on a novel SMT based symbolic analysis to first identify the valid and yet buggy executions allowed by the program, and then remove these executions through iterative addition of blocking constraints. Due to the efficiency of the symbolic analysis over explicit enumeration, our method is able to explore possible repairs in a large solution space quickly. Our experiments on a diverse set of benchmark programs show that the proposed method is significantly more effective in repairing PM bugs than the state-of-the-art approach.

Figure 4: Execution traces of the program in Fig. 2, with a durability bug in ELSE-branch (write to header->counter may never show up in PM) and a crash consistency bug in THEN-branch (write to records[i].name may not persist before write to records[i].valid).

Figure 5: Ordering constraints for THEN-branch: modifying the program by swapping the two clflushopt instructions will not fix the crash consistency bug.

Figure 6: Two repairs for the bug in the THEN-branch of Fig. 4: The first repair is incomplete since it adds a new durability bug for instruction 2; the second repair is complete because it removes the new durability and original crash consistency bugs.

Figure 10: Illustrating the repair computation and validation.

The repaired writer() in the example program.
uses header->counter to decide whether to read records[i], and then uses the value of records[i].valid to decide whether to read records[i].name and records[i].addr.Thus, the correct persistency order, which must be enforced by writer(), is that both records[i].name and records[i].addr

Table 2: Statistics of the benchmark programs.

Table 3: Results of the experimental evaluation.
Now, we present the experimental results that answer RQ 2. There are two parts. The first part is shown in Column 2 of Table 2, which reports the program size. It shows that PMBugAssist is able to handle programs with reasonably large code sizes. For example, both Memcached and Recipe have more than 20K lines of C code. The second part is shown in Column 2 of Table 3, which reports the length of the execution trace. It shows that PMBugAssist is able to handle reasonably long execution traces.

Table 4: The type of code block that our repair belongs to.