ProveNFix: Temporal Property-Guided Program Repair

Model checking has traditionally been used to find violations of temporal properties. Recently, testing or fuzzing approaches have also been applied to software systems to find temporal property violations. However, model checking suffers from state explosion, while fuzzing can only partially cover program paths. Moreover, once a violation is found, the fix for the temporal error is usually manual. In this work, we develop the first compositional static analyzer for temporal properties, and the analyzer supports a proof-based repair strategy to fix temporal bugs automatically. To enable a more flexible specification style for temporal properties, on top of the classic pre/post-conditions, we allow users to write a future-condition to modularly express the expected behaviors after the function call. Instead of requiring users to write specifications for each procedure, our approach automatically infers the procedure's specification according to user-supplied specifications for a small number of primitive APIs. We further devise a term rewriting system to check the actual behaviors against the inferred specification. Our method supports the analysis of 1) memory usage bugs, 2) unchecked return values, 3) resource leaks, etc., with annotated specifications for 17 primitive APIs, and detects 515 vulnerabilities in over 1 million lines of code across ten real-world C projects. Intuitively, the benefit of our approach is that a small set of properties can be specified once and used to analyze/repair a large number of programs. Experimental results show that our tool, ProveNFix, detects 72.2% more true alarms than the latest release of the Infer static analyzer. Moreover, we show the effectiveness of our repair strategy when compared to other state-of-the-art systems: it fixes 5% more memory leaks than SAVER and 40% more resource leaks than FootPatch, and achieves a 90% fix rate for null pointer dereferences.


INTRODUCTION
Finding temporal logic property violations is typically accomplished by well-known reactive system verification methods like model checking [7]. The common restrictions of model checking are that it assumes that all the procedures used are available, it usually handles bounded state spaces, and it suffers from the "state explosion problem" [8]. Beyond model checking, testing [25] or fuzzing [26] approaches have also been applied to software systems capturing real-world implementations to find temporal logic property violations. While effective at finding bugs, they rely on higher-quality test suites or longer execution times to achieve better code coverage. Static analyzers can help mitigate the above problems, but no existing static analyzer is specifically designed for analyzing temporal properties. To allow systematic code coverage and effective bug finding, we are interested in developing the first compositional temporal-property-based static analyzer, where each part of the program is analyzed locally and independently of the global context. It is known that breaking the large analysis of a whole program into small analyses of its procedures gives us the ability to scale independently of the size of the analyzed code [35].
Here, we highlight three main difficulties in building a compositional static analyzer to detect temporal property violations: (1) To check program behaviors against given properties, existing works [7,25,26] rely on inclusion checkers for deterministic finite automata. However, the automata-based approach is not only complex but also prevents the analysis from being modular¹ due to the lack of high-level compositional patterns for hierarchical design; (2) Even with all the procedures at hand, the formal specifications (abbreviated as specs from here on) of each procedure are not always available, and writing specs for them is unnecessarily tedious and challenging. It is worth mentioning that it is not new for static analyzers to automatically generate specs. For example, Facebook's Infer tool [5] utilizes bi-abduction [23] to infer pre/post specs from bare code, given the specs for the primitives at the base level of the code. Hence, the human does not need to write pre/post-conditions for all the procedures, which is the key to achieving a high level of automation. However, simply developing a bi-abduction for temporal properties is not sufficient for temporal property analysis; (3) The classic pre/post-conditions only provide constraints for behaviors before the function call and behaviors expected from the current function call, respectively, but we cannot easily express behaviors after executing the function call, such as temporal constraints like "opening a read-only file should not be followed by any writing operations" or "some meaningful operations can only happen if the return value of loading the certificate is positive".
To solve the above-mentioned problems, this paper first introduces future-conditions to express constraints for behaviors after the function calls have finished. Together with the pre/post-conditions, a triplet-style spec modularly and expressively captures a usage protocol for the functions of interest. We further propose a compositional temporal property static analyzer, which automatically infers specs for each procedure and utilizes a term rewriting system as the back-end solver for proving temporal logic formulae inclusions.
Moreover, the proposed future-condition and the compositional analysis also advance automated program repair of temporal property violations. Program analysis-based repair has been previously shown to be effective in fixing various bugs. For example, FootPatch [41] deploys Infer [5], and fixes bugs related to resource leaks, memory leaks, and null dereferences, using templated repairs based on separation logic. MemFix [24] deploys a typestate analysis for small programs, fixing memory bugs, including memory leak, use-after-free, and double-free. The state-of-the-art tool SAVER [17] has targeted the same set of memory bugs and has supported the generation of conditional patches by constructing a full object flow graph for each given program. However, existing techniques either only support templated patches by inserting statements [41], or they cannot handle the generalized bug types such as unchecked return values and customized allocators/deallocators [17,24]. In this work, we leverage the expressiveness of temporal logic and derive a bi-directional fault localization that is steered by future-conditions to compose safe patches for various bug types.
¹ Each procedure declaration is only analyzed once, and procedures can be replaced by their already verified properties.
Putting it all together, this paper presents ProveNFix, a static analyzer guided by temporal properties and supported by a proof-based repair strategy for fixing detected bugs where possible. Our goal is to automatically detect and fix bugs from some classes of program bugs and to enable a modular and program-independent analysis/repair. An extensive evaluation shows that ProveNFix can fix various bug types, including null pointer dereferences, resource leaks, and memory bugs. In total, ProveNFix detects 515 and repairs 492 bugs in a benchmark of more than 1 million lines of code within 15 minutes, which outperforms SAVER and FootPatch regarding execution time and fix rate. Besides, ProveNFix can fix other bugs, such as unchecked return values and customized temporal bugs. Our main contributions are summarized as follows:
• We propose a novel feature, called future-condition, and formalize a modular and practical program analysis engine to effectively detect bugs with the help of minimal spec annotation.

ILLUSTRATIVE EXAMPLES
This section presents a few examples to show the core idea and benefits of our approach. As demonstrated in Fig. 1, we write pre/post/future-conditions for the key functions and primitive APIs. Each spec Φ contains a set of tuples, i.e., ⋁(π ∧ θ), where each disjoined case has a pure formula π for the arithmetic constraints and a trace formula θ for the temporal constraints. We use ←F, G, and F to denote the temporal operators for "past-time finally", "globally", and "finally", respectively.
The precondition of free says that, before freeing the (non-null) input pointer ptr, an event malloc(ptr) must have occurred in its past history. We use (π → θ) as a shorthand for (π ∧ θ ∨ ¬π ∧ _★). Its postcondition says that if ptr is null, then the function does nothing, i.e., ε; and if it is not null, the postcondition captures an event free(ptr). Its future-condition enforces that, after freeing it and within its lifetime, globally the identifier cannot be used by any events, i.e., G(!_(ptr)), which is used to prevent null pointer dereference, double-free, and use-after-free. Similarly, the precondition of malloc requires its input argument size to be positive, and it can be called at any point of the execution, i.e., _★. Its postcondition states that, when a pointer is successfully allocated, an event malloc(ret) is captured, where ret denotes the return value. Lastly, its future-condition enforces that the allocated pointer should be finally freed, which is used to prevent memory leaks. Although simple, these six lines of specs already cover the major memory usage bugs.
The above reasoning rule for function calls captures the essence of having future-conditions, where proof obligations are highlighted. The traditional Hoare-style forward reasoning rule for function calls works roughly as follows: it retrieves the callee's spec from the environment E and, if the current program state Φ entails the callee's precondition Φ_pre instantiated with the actual arguments, it obtains the instantiated postcondition to be the extended program state, i.e., Φ′_post. Now, having the future spec, we extend the rule with one more proof obligation: the behavior of the continuation 'e', i.e., Φ_e, entails the callee's instantiated future-condition. Each proof obligation enforces constraints for a code segment: the precondition for the code before the current call, and the future-condition for the code after the call, i.e., 'e'.
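The division of labor between the three conditions at a call site can be illustrated with a small Python sketch. It is only a toy monitor over finite, already-collected event traces, not ProveNFix's symbolic analysis over IntRE formulae; the event encoding and the names check_free, check_malloc, and check_trace are hypothetical, and all pointers are assumed to be non-null.

```python
from typing import List, Tuple

Event = Tuple[str, str]  # (label, argument), e.g. ("malloc", "p"); hypothetical encoding


def check_free(history: List[Event], rest: List[Event], ptr: str) -> List[str]:
    """Check one call free(ptr) against the free spec described in the text."""
    errors = []
    # Precondition: a malloc(ptr) event occurred somewhere in the past.
    if ("malloc", ptr) not in history:
        errors.append(f"pre: no malloc({ptr}) before free({ptr})")
    # Future-condition G(!_(ptr)): no later event may mention ptr.
    for label, arg in rest:
        if arg == ptr:
            errors.append(f"future: {label}({ptr}) after free({ptr})")
    return errors


def check_malloc(rest: List[Event], ret: str) -> List[str]:
    """Future-condition of malloc: the returned pointer is finally freed."""
    if ("free", ret) not in rest:
        return [f"future: {ret} is never freed"]
    return []


def check_trace(trace: List[Event]) -> List[str]:
    """Split the trace at every call into history and rest, then check both."""
    errors = []
    for i, (label, arg) in enumerate(trace):
        history, rest = trace[:i], trace[i + 1:]
        if label == "free":
            errors += check_free(history, rest, arg)
        elif label == "malloc":
            errors += check_malloc(rest, arg)
    return errors
```

In this sketch, the precondition only inspects the events before the call and the future-condition only inspects the events after it, which mirrors the two proof obligations of the call rule; a double free, a leak, and a use-after-free all surface as future-condition violations.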

Specification inference and interprocedural analysis
Many existing tools [17,24] perform program analysis via the call-strings technique [35], which blends interprocedural flow analysis with the analysis of intraprocedural flow, turning a whole program into a single flow graph. In this way, they split the interprocedural analysis into a preanalysis phase, which gathers overestimated information about each procedure, followed by a global intraprocedural analysis. We decide to target a more accurate analysis by viewing procedures as collections of structured blocks and aim to establish input-output relations via the pre/post/future specs, where procedures can be replaced by their verified properties. This approach relates closely to most of the known techniques for program verification, and has the advantage of being rather simple and potentially admitting efficient and scalable implementations.
Specification inference. Based on the primitive specs defined in Fig. 1, spec inference allows us to generate specs for bigger code blocks which make use of the malloc and free primitives. Here, we use the examples shown in Fig. 2 to demonstrate how ProveNFix propagates future-conditions for customized memory allocations in different scenarios. For the first two cases, i.e., wrap_malloc_I and wrap_malloc_II, the future-conditions have been associated with their input and return pointers, respectively; both contain disjunctive cases to distinguish the behaviors depending on whether the memory is successfully allocated or not. For the third case, the future-condition for wrap_malloc_III is no longer in a disjunctive form because the program terminates when the pointer is null, and in case it returns, the pointer must be non-null. For the last case, it dynamically allocates memory to a pointer, but there is no deallocation within its lifetime, so ProveNFix reports a bug in this case and generates a conditional patch to fix the bug. In the meantime, wrap_malloc_IV does not have any meaningful future-condition, i.e., it allows anything to happen afterward.
Interprocedural analysis. Fig. 3 presents a more involved example, drawn from prior work [24], which demonstrates a common failure point in existing repair techniques. Due to the possible aliasing between q and p.f introduced by calling foo at line 9, there is a possible double-free error at line 12. We handle this program by first generating the post- and future-conditions for foo, shown as follows (assuming there is sufficient heap and malloc will never fail):
     + if (p->flag) free(q);
  12   free(p.f);} // double-free
Fig. 3. Fixing a double-free bug [24].
Next, when we reason about the main procedure, the future-conditions accumulated by the malloc at line 8 and by calling foo at line 9 enforce that the behaviors of lines 10-12 should satisfy the following spec:
Then, after the free statement in line 10, free's future-condition, i.e., true ∧ G(!_(q)), is violated by the second free statement in line 12, i.e., (!p->flag ∧ q=p->f) ∧ free(p.f)@12 ⋢ true ∧ G(!_(q)); therefore, ProveNFix detects an error. When this assertion fails, the bi-directional constraint propagation computes the spec for line 10 to be Φ_10 = p->flag ∧ F free(q). Finally, guided by Φ_10, ProveNFix synthesizes a patch by deleting the free statement at line 10 and inserting a conditional free statement at line 11, which fixes this error.
Reducing false positives. We use the resource leak example in Fig. 4, detected in the Swoole project [32], to show that future-conditions help to reduce false positives in practice. The first leak happens at line 6, where the code returns without releasing the socket. The repair is simply to insert a close statement at line 5 before returning. However, Infer reports another leak at line 9 because the program never releases the socket. This is a false positive because of the assignment at line 8, and the program is safe as long as 'swPort_listen' or the caller of 'swReactorProcess_reuse_port' releases 'ls->sock' in the future. In our approach, ProveNFix manages to generate the specs for swPort_listen, which does not release the socket; therefore, we generate a future-condition for swReactorProcess_reuse_port, defined as follows: ret≠SW_ERR ∧ F(close(ls->sock)), pushing the obligation to close the resource to its callers and avoiding this false positive.

Handling generalized bug types
Many functions' return values indicate the success of their actions, which alerts the users whether or not to handle any bugs caused by that function. Ignoring the return values can cause the program to overlook unexpected states and conditions, leading to a crash or other unintended behaviors. One instance of unchecked return values (URV) is known as null pointer dereference (NPD). For example, as shown in Fig. 5, the function host_lookup takes an IP address, verifies that it is well-formed, and then looks up the hostname and copies it into a buffer². If an attacker provides an address that does not resolve to a hostname, the call to gethostbyaddr at line 4 will return null. Since the code does not check the return value, a null pointer dereference would occur in the call to strcpy. While there are no complete fixes aside from conscientious programming, one potential mitigation is annotated in line 5, which inserts a conditional statement to exit if hp is null. We detect and repair such NPD bugs by having a primitive spec which requires that, if the return value ret is null, globally ret cannot be used as a parameter in any events; and we let each pointer dereference generate a "deref" event; here, hp->h_name generates "deref(hp)". With these specs, ProveNFix automatically detects NPD bugs and generates patches, as provided in line 5. Notably, Infer cannot detect this bug because it does not accept primitive specs as input.
Without loss of generality, the URV bugs we can handle are not limited to NPDs but also cover various other APIs used in different contexts, such as fgets, returnChunkSize, and pthread_mutex_lock. Furthermore, other than APIs from standard C libraries, URV errors also often show up in the applications of internet-facing protocols. Fig. 6 presents a fix of an issue raised in the keepalived project³, where function SSL_new() returns a pointer to an SSL object on success or null on error. However, the code shown did not check the return value properly. There are other URV errors caused by not having any handlers, such as the example shown in Fig. 7, detected in the sslsplit project⁴. For such cases, our approach can produce patches by inserting temporary variables, here t0, which cannot be achieved by any existing repair tools. Experimental results in Sec. 7.5 show that ProveNFix can be used to generate specs for these APIs from the source code, and these specs can be used to analyze the applications.

SYSTEM OVERVIEW AND LANGUAGES
System overview. Fig. 8 shows the system overview, where the inputs and output of the system are indicated by the fat arrows. ProveNFix takes a target program and a set of primitive specs written in IntRE, and produces a bug report and safe patches at the end. Our technical contributions are captured in the rounded boxes: a Hoare-style forward reasoning which infers specs for procedures and generates constraints for code segments; the bi-directional constraint propagation for the buggy code segments; and the deductive synthesis to derive source code patches from existing ingredients and the spec pool. We use ⊑⁵ to denote the proof obligations between two IntRE formulae, and ⋢ to denote the failed assertions. The workflow of ProveNFix is as follows:
(1) For each procedure, a set of well-defined forward reasoning rules (presented in Sec. 4) summarizes the actual behavior of its body using IntRE formulae. In particular, they generate temporal constraints for program segments, dynamically aligning with the existing specs; in the meantime, they infer specs for the current procedure and add them into the spec pool;
(2) Assertions of actual behaviors against their specs are represented using proof obligations between IntRE formulae. The proving is discharged by a back-end term rewriting system (presented in Sec. 5), which is an extended inclusion checker for regular expressions. While proving the inclusions, proof obligations for arithmetic constraints are discharged by the Z3 solver [11];
(3) Then, if any of the inclusions fails, it is fed into the novel bi-directional spec propagation (presented in Sec. 6.1) to compute the spec for the core buggy code segments;
(4) Lastly, we use the deductive synthesis (presented in Sec. 6.2) to generate source-code patches.
The patches that ProveNFix can generate include inserting/deleting code blocks, conditional patches, and inserting temporary variables for unhandled return values.
Our approach has two principal benefits: the constant effort of spec annotation without restricting the bug types, and the highly reduced search space for patch generation. Intuitively, the events of interest are prescribed in the primitive specs, and after the spec inference, the search space is effectively pruned to the code ingredients that would generate effectful events. We show the experimental results in Sec. 7, and conclude in Sec. 9.
Target language. We target an imperative, well-typed, call-by-value core language, defined in Fig. 9. A program P comprises a list of primitive specs spec* and procedure declarations proc*. Here, we use the * superscript to denote a finite list of items; for example, x* refers to a list of variables, x1, ..., xn. Each procedure has a name nm, formal arguments x*, and an expression-oriented body e.
⁵ The inclusion relation between two IntRE specs, Φ1 ⊑ Φ2, is formally defined in Definition 5.
Proc. ACM Softw. Eng., Vol.
IntRE, the spec language. As defined in Fig. 10, IntRE, standing for Integrated Regular Expressions and denoted by Φ, contains a set of disjoined tuples (or conditioned traces per se), each pairing a pure formula π with its corresponding event sequence θ. Traces comprise false (⊥); the empty trace ε; singleton events I; sequence concatenations θ1 · θ2; disjunctions θ1 ∨ θ2; and the arbitrary-times (zero or many, possibly infinite) repetition of a trace, constructed by a Kleene star θ★. Singleton events are: parameterized events A(v); events with a specific label A(_); negation of parameterized events !A(v); negation of events which make use of the value v, i.e., !_(v); the wildcard _ matching all events; and event conjunctions I1 ∧ I2.

We use π to denote a pure formula which captures the simplified (decidable) Presburger arithmetic conditions on program inputs and local variables, where T and F denote true and false, respectively. A term t can be a simple value v, or a simple computation of terms, t1 + t2 and t1 - t2.
It is proven that the expressive power of regular expressions subsumes the classic linear temporal logic (LTL) formulae [43].While ProveNFix allows both syntaxes, the underlying reasoning and proving are formalized using IntRE.We deploy a standard translation from LTL formulae to IntRE, shown in Appendix A.

FORWARD REASONING AND SPECIFICATION INFERENCE
This section addresses the details of constraint generation for code segments and spec inference for procedures; while they happen simultaneously, we present them separately for clarity.

Constraint generation and bug reporting
Together with the rule [ -Call] presented in Sec. 2.1, Fig. 11 formalizes a set of syntax-directed forward rules for the target language. Our Hoare-style reasoning is in the form of E, C ⊢ {Φ} e {Φ′}, where E is an environment, initialized by the given primitive specs and extended with the inferred specs along the way; C denotes the current procedure being analyzed. E and C are omitted when not needed. The meaning of the relation is: if Φ describes the behaviors triggered before executing e, then by executing e, Φ′ describes the extension of the traces that will be triggered. Notice that the post-states of the basic rules, [ -Return] and [ -Assign], are associated with a completion code in {0, 1}: when the code is 0, the reasoning can proceed; when it is 1, the current procedure returns. Each completion code is initialized to 0, and only the return statements update it to 1 as delimiters, marking the end of the local procedure. Anything concatenated after states with non-zero completion codes will be abandoned. The completion code is essential when the compositional rules come in, such as [ -Seq]. Starting from a pre-state Φ, rule [ -Seq] first computes the behavior of e1, denoted by Φ1; then reasons about e2 with the extended pre-state, i.e., Φ • Φ1; lastly, the final result is a concatenation of Φ1 and Φ2. The concatenation between two singleton program states is formally defined in Definition 1.
Definition 1 (Program State Concatenation). Given two singleton program states Φ1 and Φ2, each consisting of a pure formula, a trace, and a completion code, we define:
Next, rule [ -While] computes the behaviors of loops by unfolding the loop body many times, i.e., as a Kleene-star repetition; in practice, ProveNFix unfolds loops once to balance precision and efficiency. Rule [ -If-Else] computes the behaviors from both branches by strengthening the pre-state with the branch condition and its negation, respectively; then, it takes the disjunction of the results. Rule [ -Local] creates an existential quantifier for the local variable. For assertions parameterised with Φ_pre and Φ_future, rule [ -Assert] creates proof obligations (as highlighted) for the precondition checking and the future-condition checking regarding the behavior of the rest of the code. The proof obligations are discharged by a term rewriting system, presented in Sec. 5. Each failed assertion is reported as a bug, and in this paper, we are interested in finding the true bugs, defined in Definition 2.
Definition 2 (Manifest True Bug [23]).There exists a path from the local procedure declaration that leads to the bug, and for any value of the input, the bug occurs.
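Returning to the completion codes used by the sequencing rule above, the following Python sketch shows one plausible reading of program state concatenation, based only on the prose ("anything concatenated after states with non-zero completion codes will be abandoned"). The State record, its field names, and the use of tuples of strings for pure formulae and traces are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class State:
    pure: Tuple[str, ...]   # conjuncts of the pure formula (hypothetical encoding)
    trace: Tuple[str, ...]  # finite event sequence
    code: int               # completion code: 0 = continue, 1 = procedure returned


def concat(s1: State, s2: State) -> State:
    """Concatenate two singleton program states, respecting completion codes."""
    if s1.code != 0:
        # s1 already returned: whatever follows is abandoned.
        return s1
    # Otherwise conjoin the pure parts, append the traces, and take s2's code.
    return State(s1.pure + s2.pure, s1.trace + s2.trace, s2.code)
```

Under this reading, once a return statement has set the code to 1, any further concatenation leaves the state unchanged, so code after a return does not contribute events to the summarized behavior.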
Unchecked return value without handlers. As demonstrated by the example in Fig. 7, there are cases where unchecked return values are caused by having no handlers, and the future-condition contains the pattern (π ∧ G(!_(ret))), indicating an error state where the purpose of calling the function has failed and the return value should not be used. The rule we present here, [-Call-Handling] (a special case of [ -Call]), is designed for such cases. It first generates a fresh variable to be the temporary handler, here t; then, as usual, it checks the precondition and instantiates the postcondition. Lastly, it synthesizes the handling code based on the future-condition and inserts the code as a patch. The synthesis algorithm is presented in Sec. 6.2.
Moreover, in the rule [ -Assert] (the spec inference version of [ -Assert]), if the inclusions only hold when there exist frames Φ′_pre and Φ′_future, we add the frames as the inferred pre/future-conditions of the current procedure C into the environment. The same inference applies to the inclusion checks in rule [ -Call] as well, and we omit it here for simplicity.
Informally, proving the IntRE inclusion is to check that for all the disjoined cases specified in the consequent, there always exists a compatible case from the antecedent to form a valid trace inclusion.Then, when the consequent cannot find a witness from the antecedent, we trigger the patch generation process to synthesize conditional statements accordingly.
Therefore, we have the following top-level rules to decompose the disjunctions. In particular, [TRS-Base] triggers the inclusion checking between traces by initialising the proof hypotheses using ∅, cf. Sec. 5.2. Rule [TRS-Missing-Case] triggers the repair process by synthesizing code that satisfies the required spec, cf. Sec. 6.2.

Auxiliary functions
To facilitate the inclusion rules in Sec. 5.2, we provide the definitions and core implementations of the deployed auxiliary functions⁶: Nullable (δ), First (fst), and Derivative (D). Informally, the Nullable function δ(θ) returns a Boolean value indicating whether θ contains the empty trace; the First function fst(θ) computes a set of possible initial events from θ; and lastly, the Derivative function D_I(θ) eliminates an event I from the head of θ and returns what remains. The subset relation I ⊆ J means that the set of events in I is a subset of the set of events in J.
Definition 6 (Nullable). Given any sequence θ, we recursively define δ(θ) as follows (false for unmentioned constructs):
Definition 7 (First). Let fst(θ) be the set of initial events derivable from sequence θ.
Definition 8 (Partial Derivative). The partial derivative D_I(θ) of trace θ w.r.t. an element I computes the effects for the left quotient, I⁻¹θ, defined as follows:
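For orientation, the three auxiliary functions can be written out in the way they are usually defined for regular expressions. The block below gives these textbook definitions, adapted to the trace constructs named in the text and using I ⊆ J for event matching; the symbol δ for Nullable is an assumption of this sketch, and the paper's own definitions may differ in detail (for instance, in how negated and conjoined events are handled).

```latex
% Textbook-style definitions; not claimed to be the paper's exact formulas.
\begin{align*}
\delta(\epsilon) &= \mathit{true}, \qquad \delta(\theta^\star) = \mathit{true}, \\
\delta(\theta_1 \cdot \theta_2) &= \delta(\theta_1) \wedge \delta(\theta_2), \qquad
\delta(\theta_1 \vee \theta_2) = \delta(\theta_1) \vee \delta(\theta_2), \\[6pt]
\mathit{fst}(\bot) &= \mathit{fst}(\epsilon) = \emptyset, \qquad
\mathit{fst}(I) = \{I\}, \qquad
\mathit{fst}(\theta^\star) = \mathit{fst}(\theta), \\
\mathit{fst}(\theta_1 \vee \theta_2) &= \mathit{fst}(\theta_1) \cup \mathit{fst}(\theta_2), \\
\mathit{fst}(\theta_1 \cdot \theta_2) &=
  \begin{cases}
    \mathit{fst}(\theta_1) \cup \mathit{fst}(\theta_2) & \text{if } \delta(\theta_1) \\
    \mathit{fst}(\theta_1) & \text{otherwise}
  \end{cases} \\[6pt]
D_I(\bot) &= D_I(\epsilon) = \bot, \qquad
D_I(J) =
  \begin{cases}
    \epsilon & \text{if } I \subseteq J \\
    \bot & \text{otherwise}
  \end{cases} \\
D_I(\theta_1 \vee \theta_2) &= D_I(\theta_1) \vee D_I(\theta_2), \qquad
D_I(\theta^\star) = D_I(\theta) \cdot \theta^\star, \\
D_I(\theta_1 \cdot \theta_2) &=
  \begin{cases}
    D_I(\theta_1) \cdot \theta_2 \;\vee\; D_I(\theta_2) & \text{if } \delta(\theta_1) \\
    D_I(\theta_1) \cdot \theta_2 & \text{otherwise}
  \end{cases}
\end{align*}
```

As a small worked case, take θ = _★ · free(p) · _★ (one way to read "finally free(p)"). Then δ(θ) is false, and D_{free(p)}(θ) = θ ∨ _★, which is nullable: after one free(p) event the obligation is already met.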

Inclusion rules
Given the well-defined auxiliary functions above, we now present the key rewriting rules in Fig. 12 deployed in inclusion proofs. During the rewriting process, the inclusions are in the form of π, Γ ⊢ θ1 ⊑ θ2, a shorthand for a judgment that also records the history traces, i.e., the traces from the antecedent that have been used to match the traces from the consequent; π is the path constraint, and Γ is the proof context, which contains a set of inclusion hypotheses. Rule [Prove] is used when the antecedent has no first elements. Rule [Reoccur] proves an inclusion when there exist inclusion hypotheses in the proof context Γ from which we can soundly prove the current goal. One of the special cases of this rule is when the identical inclusion is shown in the proof context; we then prove it valid. Rule [Unfold] is the inductive step of unfolding the inclusions. Firstly, we make use of the auxiliary function fst to get the set of all possible initial events from the antecedent. Secondly, we obtain a new proof context Γ′ by adding the current inclusion, as an inductive hypothesis, into the current proof context Γ. Thirdly, we iterate over each event I in this set and compute the derivatives (next-state formulae) of both the antecedent and consequent with respect to I. The proof of the original inclusion succeeds if all the derivative inclusions succeed.
There are two possible failing cases. Rule [Dis-Nullable] is a heuristic refutation step to disprove the inclusions early when the antecedent evidently contains more traces than the consequent; here, the nullable function witnesses the empty trace. Rule [Failed-Unfold] captures the situation where there exists an initial event I from the antecedent such that eliminating I from the consequent leads to false. When such failed assertions occur, we use ConstrProp to propagate the constraints for the core buggy code (cf. Sec. 6.1), which takes in the path constraint and the failed inclusion.
Termination of the TRS is guaranteed because the set of derivatives to be considered is finite, and possible cycles are detected using memoization, i.e., Γ. The term rewriting for regular expressions is proven to be sound and complete [2], and prior TRS-based works [1,3,18,21,36-39] suggest that TRS is a better average-case algorithm than those based on the comparison of automata, by avoiding the complex translation process and disproving invalid inclusions earlier.
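The unfolding strategy described by these rules can be sketched as a small derivative-based inclusion checker in Python. This is a simplified illustration for plain regular expressions over event names plus a wildcard; it ignores pure formulae, negated and conjoined events, and infinite traces, and it is intended for antecedents built from concrete events. The tuple encoding and all function names are assumptions of this sketch, and the rule names in the comments only indicate a loose correspondence.

```python
BOT, EPS, ANY = ("bot",), ("eps",), ("any",)  # false, empty trace, wildcard _


def ev(name):
    return ("ev", name)


def seq(a, b):
    if BOT in (a, b):
        return BOT
    if a == EPS:
        return b
    return a if b == EPS else ("seq", a, b)


def alt(a, b):
    # Disjunctions are kept as flat sets so equal derivatives are recognised.
    parts = set()
    for t in (a, b):
        parts |= set(t[1]) if t[0] == "or" else {t}
    parts.discard(BOT)
    if len(parts) <= 1:
        return next(iter(parts)) if parts else BOT
    return ("or", frozenset(parts))


def star(a):
    if a in (BOT, EPS):
        return EPS
    return a if a[0] == "star" else ("star", a)


def nullable(t):
    tag = t[0]
    if tag in ("eps", "star"):
        return True
    if tag == "seq":
        return nullable(t[1]) and nullable(t[2])
    if tag == "or":
        return any(nullable(x) for x in t[1])
    return False  # bot, ev, any


def first(t):
    tag = t[0]
    if tag == "ev":
        return {t[1]}
    if tag == "any":
        return {"_"}
    if tag == "seq":
        return first(t[1]) | (first(t[2]) if nullable(t[1]) else set())
    if tag == "or":
        return set().union(*(first(x) for x in t[1]))
    return first(t[1]) if tag == "star" else set()


def deriv(e, t):
    tag = t[0]
    if tag == "ev":
        return EPS if t[1] == e else BOT
    if tag == "any":
        return EPS
    if tag == "seq":
        d = seq(deriv(e, t[1]), t[2])
        return alt(d, deriv(e, t[2])) if nullable(t[1]) else d
    if tag == "or":
        out = BOT
        for x in t[1]:
            out = alt(out, deriv(e, x))
        return out
    return seq(deriv(e, t[1]), t) if tag == "star" else BOT


def included(a, b, hyps=frozenset()):
    if (a, b) in hyps:                       # cf. [Reoccur]
        return True
    if nullable(a) and not nullable(b):      # cf. [Dis-Nullable]
        return False
    events = first(a)
    if not events:                           # cf. [Prove]
        return True
    hyps = hyps | {(a, b)}                   # cf. [Unfold]
    return all(included(deriv(e, a), deriv(e, b), hyps) for e in events)
```

For example, with `finally_free = seq(star(ANY), seq(ev("free"), star(ANY)))`, the behavior `seq(ev("malloc"), ev("free"))` is included in `finally_free`, whereas `ev("malloc")` alone is not. The hypothesis set plays the role of Γ: a goal that reappears along the current proof path is accepted, which is what lets the check terminate on starred traces.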

CONSTRAINT PROPAGATION AND DEDUCTIVE PATCH SYNTHESIS
The repair process incorporates two main components: (1) the constraint propagation when there are failed inclusions, and (2) the code synthesis when we should insert extra code with a given spec.In particular, the constraint propagation triggers the synthesis process after it extrapolates the expected spec for the buggy segments.In other words, (1) deletes code while (2) inserts code.

Bi-directional constraint propagation
From the previous section, we can now automatically prove or disprove given inclusions. Here, we are concerned with the case where an invalid inclusion exists: how to safely uncover the core buggy code, i.e., the smallest program segment which leads to the failed inclusion, and how to derive the expected spec for it.
Intuitively, having the path constraint π and the failed inclusion θ1 ⋢ θ2, we do a backward matching to "sandwich" the buggy segment in the middle. As shown in the rule [Constr-Prop], if there exists a frame trace θ_f such that θ_f serves as a common prefix of the reversed traces of θ1 and θ2 (rev(θ1) and rev(θ2), respectively), and their postfixes form an invalid inclusion, θ′1 ⋢ θ′2, then it passes the spec π ∧ rev(θ′2) to the code synthesis process. More specifically, rev(θ′1) represents the current behavior of the core buggy segment, which failed to entail its intended spec. Therefore, the patch is generated by first deleting⁷ the code represented by rev(θ′1), and then inserting the code which is synthesized based on the intended behavior, i.e., π ∧ rev(θ′2). The reverse function is defined in Definition 9.
Definition 9 (Trace Reversing). Given a trace θ, its reversed trace is defined as follows:
Comment. Our repair strategy is more generic as it supports the deletion and insertion of code blocks, instead of only supporting insertions [41] or only supporting single-line repairs [17,24].
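The "sandwich" idea can be illustrated on flat, finite event sequences. The Python sketch below strips the history that both traces share at the front and then, following the text, reverses the remainders and strips their common prefix (the frame). What is left of the actual trace is the segment to delete, and what is left of the expected trace is the behavior to synthesize. It is a simplification: real IntRE traces contain disjunctions and stars, path constraints are ignored here, and the function names are hypothetical.

```python
from typing import List, Tuple


def split_common_prefix(xs: List[str], ys: List[str]) -> Tuple[List[str], List[str]]:
    """Drop the longest common prefix of xs and ys and return both remainders."""
    i = 0
    while i < len(xs) and i < len(ys) and xs[i] == ys[i]:
        i += 1
    return xs[i:], ys[i:]


def propagate(actual: List[str], expected: List[str]) -> Tuple[List[str], List[str]]:
    """Return (events to delete, events to insert) for a failed inclusion."""
    # Forward direction: skip the history on which both traces already agree.
    a_rest, e_rest = split_common_prefix(actual, expected)
    # Backward direction: the frame is a common prefix of the reversed traces.
    a_core_rev, e_core_rev = split_common_prefix(a_rest[::-1], e_rest[::-1])
    # Reverse back to obtain the core segments in program order.
    return a_core_rev[::-1], e_core_rev[::-1]
```

For a missing close before a return, `propagate(["socket", "bind", "return"], ["socket", "bind", "close", "return"])` leaves nothing to delete and a single `close` event to insert, which is then the target handed to the synthesis step.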

Source-level patch synthesis
Given the intended behavior π ∧ θ_target, the synthesis function searches through the environment E, including both primitive specs and inferred specs, and composes an expression e in the target language grammar that effects the following state transition: E ⊢ {π ∧ ε} e {π ∧ θ_target}.

Algorithm 1 Algorithm for the Deductive Synthesis
Require: E, (π ∧ θ_target)
Ensure: a source code block e
 1: e_acc = ()
 2: for each nm(x*) ∈ E do
 3:   if θ_target = ε then return if π then e_acc else ()
 4:   else
 5:     θ′_target = θ_target with the postcondition of nm(y*) subtracted from its head // there exists a set of program variables y*
 6:     if θ′_target ≠ ⊥ then
 7:       e_acc = e_acc; nm(y*)
 8:     end if
 9: end for
10: return without any suitable patches
We present a deductive synthesis algorithm in Algorithm 1. The procedure takes the target spec as the input, recursively searches through the environment E, and finally returns a source code block e.
In line 1, it initializes the accumulator $e_{acc}$ to unit. From line 2, it iterates over $\mathcal{E}$ and, for each procedure signature, denoted by $nm(x^*)$, checks whether a function call to nm forms a progressive step towards the target trace. The base case is presented in line 3, where the target trace is $\epsilon$, indicating that no synthesis obligations remain. In this case, the procedure returns a conditional statement if $\pi$ then $e_{acc}$ else (). Otherwise, it tries to find a set of program variables $y^*$ to instantiate the generic postcondition of $nm(x^*)$, and obtains the next-step target trace $\theta'_{target}$ by subtracting nm's postcondition from the head of $\theta_{target}$. If the subtraction does not lead to false, in line 7, it extends the current patch accumulator with a function call to nm, parameterized with $y^*$. Lastly, in line 10, if no satisfactory expression is found after iterating over all the ingredients from the program, the synthesis terminates with no patches.
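The search described above can be sketched as a small backtracking procedure. This is an illustration under simplifying assumptions rather than the authors' implementation: postconditions are plain event sequences over a single formal parameter, subtraction is prefix removal, the patch is rendered as a C-like string, and the names `subtract`, `synthesize`, and the example environment are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, str]     # (event name, argument), e.g. ("free", "p")
Env = Dict[str, List[str]]  # assumption: procedure name -> event names of its postcondition


def subtract(post: List[Event], target: List[Event]) -> Optional[List[Event]]:
    """Remove `post` from the head of `target`; None plays the role of false."""
    if target[:len(post)] == post:
        return target[len(post):]
    return None


def synthesize(env: Env, guard: str, target: List[Event],
               acc: Tuple[str, ...] = ()) -> Optional[str]:
    """Compose calls from `env` whose postconditions, in sequence, yield `target`."""
    if not target:  # base case: no synthesis obligations remain
        calls = " ".join(f"{call};" for call in acc)
        return f"if ({guard}) {{ {calls} }}"
    arg = target[0][1]  # candidate instantiation of the formal parameter
    for name, events in env.items():
        if not events:  # a call with an empty postcondition makes no progress
            continue
        rest = subtract([(ev, arg) for ev in events], target)
        if rest is not None:
            patch = synthesize(env, guard, rest, acc + (f"{name}({arg})",))
            if patch is not None:
                return patch
    return None  # no suitable patch


env = {"fclose": ["fclose"], "free": ["free"]}
print(synthesize(env, "p != NULL", [("free", "p")]))
```

Because only postconditions are consulted, a call chosen this way may still violate the precondition of the inserted procedure, which matches the limitation discussed next.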
Limitations. Since the ingredients in $\mathcal{E}$ are only procedure signatures, the synthesis only composes function calls. To mitigate this, the environment would need to be dynamically extended with mappings from statements to their effects. The current synthesis uses only the postconditions; therefore, inserting the function calls could lead to failed assertions introduced by these calls. Nevertheless, our approach generates sound patches, which resolve the targeted violated entailments, i.e., the current violations will no longer exist. Achieving a repair without re-verification is nontrivial, and we leave it as future work. The following section shows that the current design nonetheless yields promising experimental results.

IMPLEMENTATION AND EVALUATION
We prototype our proposal as a program analysis and repair tool, ProveNFix, using approximately 5,000 lines of OCaml code and leveraging the AST structures produced by the Infer front-end. Our implementation includes a lightweight parser for the user-defined temporal specs. To show the effectiveness of our approach in analyzing and fixing a wide variety of bugs, we design the experimental evaluation to answer the following research questions (RQ):
• RQ1: What is the effectiveness and efficiency of ProveNFix compared to Infer?
• RQ2: What is the performance of ProveNFix in fixing memory usage bugs compared to SAVER, and in fixing resource leaks compared to FootPatch?
• RQ3: Can ProveNFix automatically find/fix generalized temporal bugs, such as unchecked return values and properties involving execution orders, that prior works either cannot find or cannot fix?
• RQ4: Is it practical to use ProveNFix as a specification inference tool to find desirable behaviors of internet-facing protocol implementations?
We ran experiments on an Ubuntu 22.04 LTS server with Intel Xeon E-2278G CPUs and 20GB of RAM. The source code and evaluation benchmarks are openly accessible from [46].

Primitive spec annotation
To support RQ1-3 and allow comparison with earlier results, Table 1 presents the minimal effort (54 lines of code) we took to annotate the temporal specs. The marks ✓ and ✗ in columns Pre, Post, and Future indicate whether the corresponding spec is needed for the given API. Notably, the need for annotations varies across bug types; for example, malloc requires pre/post/future-conditions for memory bugs but only needs a future-condition for null pointer dereferences. We show the detailed specs for these 17 APIs in Appendix B.
We use the same dataset as SAVER [17] to evaluate ProveNFix; each project in it raises at least one error when analyzed with the latest release of Infer (v1.1.0, released on Mar 26, 2021). There are ten projects in total, and the basic information of the benchmarks is shown in the first two columns of Table 2.

RQ1: Comparison with Infer
As shown in Table 2, we compare the bug-detection performance of ProveNFix and Infer-v1.1.0. For each column, we manually classified the alarms into true and false positives. For example, for the Swoole project, Infer found 37 null pointer dereferences, comprising 30 true positives and 7 false positives. In comparison, ProveNFix found the same 30 true positives and 23 more true positives (i.e., actual null pointer dereference bugs). In addition, Infer found 20 memory leak bugs in the same project, including 16 true positives and 4 false positives; out of the 16 true positives, ProveNFix found 12 (missing 4 true alarms that Infer found), plus 16 more true positives. In total, Infer found 299 (166+107+26) true bugs, and ProveNFix found 249 (137+85+27) of them. In addition, ProveNFix found 266 (124+113+29) more true bugs, i.e., 72.2% ($\frac{249+266-299}{299}$) more true bugs overall, while missing 17% ($\frac{299-249}{299}$) of the true bugs that Infer could find. We discuss all the bugs reported by ProveNFix in Appendix C.
Infer uses heuristics when processing failed proofs and bug patterns to reduce the false positives, which leads to the loss of many true bugs. This is intuitively why ProveNFix manages to find significantly more bugs. Moreover, due to its limited support for global variables, ProveNFix deliberately misses some true bugs to keep a low false positive rate, cf. an example in Fig. 13.
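The aggregate figures above can be re-derived from the per-category counts quoted in the text. The short check below uses only those numbers; the variable names are illustrative.

```python
# Per-category counts (null pointer dereferences + memory leaks + resource leaks)
infer_true = 166 + 107 + 26  # true bugs reported by Infer
shared = 137 + 85 + 27       # of those, also reported by ProveNFix
extra = 124 + 113 + 29       # additional true bugs reported only by ProveNFix

provenfix_true = shared + extra
gain = (provenfix_true - infer_true) / infer_true  # relative increase in true alarms
missed = (infer_true - shared) / infer_true        # share of Infer's true bugs missed

print(f"ProveNFix true bugs: {provenfix_true}")  # 515
print(f"relative gain: {gain:.1%}")              # 72.2%
print(f"missed: {missed:.1%}")                   # 16.7%
```

The total of 515 agrees with the number of vulnerabilities reported in the abstract, and 16.7% rounds to the 17% quoted above.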

RQ2: Comparison with SAVER and FootPatch
Table 3 compares the repair abilities of ProveNFix with the state of the art. The right-hand side of the table records the performance of SAVER and FootPatch, run on a virtual machine running Ubuntu-16.04. Both of them are built on top of an older version of Infer (v0.9.3, released on Sep 22, 2016). Upgrading the Infer version of SAVER and FootPatch requires too much effort; hence, we choose to report FootPatch/SAVER results based on the older version of Infer. Given Infer's bug report, SAVER obtains a 73.7% fix rate for true memory errors, and FootPatch obtains a 60% fix rate for true resource leaks. The left-hand side of the table shows the repair results from ProveNFix, which has a 90% fix rate for null pointer dereferences, a 79% fix rate for memory leaks, and a 100% fix rate for resource leaks. The correctness of the auto-generated patches is confirmed by human validation.
Although the evaluation is based on different sources of static analysis, the true bugs recorded in the right-hand side of the table are fully subsumed by the left-hand side. Comparing the fix rates, we find that ProveNFix fixes 5% more memory leaks than SAVER (79% vs. 73.7%) and 40% more resource leaks than FootPatch (100% vs. 60%). Besides the higher fix rate, SAVER needs a significant pre-analysis time to construct the whole object control flow, e.g., up to 26.3 seconds for the flex project and 39.5 minutes for snort-2.9.13, while ProveNFix takes only several minutes to generate all the patches, averaging 0.25 seconds per patch over the 261+174+57 cases. ProveNFix can also find and fix generalized temporal bugs.

RQ3: Capturing general temporal bugs
Here, we use double-free (DF) errors as a case study to show the effectiveness of ProveNFix in finding and fixing bugs involving execution orders, and leave the case study for unchecked return values to Sec. 7.5 (RQ4). The benchmark shown in Table 4 is taken from SAVER [17]. SAVER usually relies on other bug detectors to generate bug reports before its repair. However, there were no suitable bug detectors for DF. Hence, among the ten projects shown in Table 2, SAVER manually records the DF errors by inspecting commit fixes made by developers of open-source projects. Note that Table 4 does not show the exact versions because these bugs come from different commit versions of their projects (lxc, p11-kit, and grub). Therefore, this experiment is done by injecting these errors into the versions we used in Table 2.
All the true bugs reported in Table 4 can be correctly fixed by SAVER and ProveNFix, respectively. In total, SAVER fixed four true bugs, while ProveNFix automatically found the same four bugs and two more true bugs, with two false positives. The two additional true bugs, which were missed before, are located in the same functions as the recorded bugs. The fixes provided by the developers were to remove the free statements at the recorded locations, which happened to solve the unrecorded bugs that ProveNFix discovered. Still, the extra true bugs that ProveNFix discovered seem never to have been spotted or recorded by people. The two false positives that ProveNFix generated are caused by unrolling loops only once and by complex aliasing/re-assignment, which exposes the limitations of our analysis. Nevertheless, the false positive rate of ProveNFix is still reasonably low. It takes ProveNFix 3m 4s to find and fix these bugs. All these bugs are documented in Appendix D.

RQ4: Effectiveness of specification inference for real-world programs
Our approach can generate specifications for real-world programs. We demonstrate the effectiveness of ProveNFix's specification generation on an internet-facing protocol implementation, OpenSSL [29]. Correct usage of Secure Socket Layer (SSL) APIs requires satisfying certain constraints, such as call conditions or call orders. Violations of these constraints can have severe security implications. For example, missing error status code validation of SSL APIs can cause a denial of service by remote attackers (CVE-2016-2182 [10]), and broken SSL certificate validation [13] can result in man-in-the-middle attacks [9]. Here, we focus on the bugs caused by unchecked return values (URV).
There are many security-sensitive URV bugs in OpenSSL applications [14]. Table 5 presents several real-world bugs that have been confirmed, fixed, and merged into the master branches. To detect and fix those bugs, we provide 2 predefined primitive specs and rely on ProveNFix to generate specs for all the exposed APIs by analyzing the OpenSSL-3.1.2 source code (3792bd7, 556.3 kLoC). Then, these automatically generated specs are used to analyze the projects shown in Table 5. Experimental results show that ProveNFix successfully detects and repairs almost all the reported bugs within 1 minute per project (the one failure case is caused by no spec being generated for 'BN_set_word'), demonstrating the effectiveness of ProveNFix in inferring correct procedure specs. Moreover, the generated specs do not leak the implementation details of the APIs apart from revealing the input-output relations, as they omit all the intermediate variables by annotating them as existential. Detailed configuration and all the inferred specs are recorded in Appendix E.
Fig. 13 presents a code snippet simplified from the Swoole project, where the function passes the pointer "ptr" to a global object "swoole_objects". For such cases, ProveNFix infers the postcondition with a special event "consume(ptr)", which, by design, entails any event, such as free(ptr) or close(ptr). As ProveNFix is designed for modular reasoning, without capturing the global states, it assumes global variables are well-managed, i.e., the traces of using them satisfy all the possible constraints. As a systematic way of strengthening the inferred postconditions, rather than relying on heuristics, consume events help us reduce false positives effectively; however, such a design also introduces false negatives. Therefore, in general, false negatives in ProveNFix are caused by the pre-defined pre/future-conditions being too weak or the inferred postconditions being too strong. Dually, false positives occur when the pre-defined pre/future-conditions are too strong or the inferred postconditions are too weak.
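The effect of the consume event can be illustrated with a toy entailment check. This is a sketch under assumptions, not ProveNFix's actual rule: events are (name, argument) pairs, a future-condition is reduced to a single required event, consume is taken to entail any event on the same argument, and the helper names are hypothetical.

```python
from typing import List, Tuple

Event = Tuple[str, str]  # (event name, argument), e.g. ("free", "ptr")


def event_entails(actual: Event, required: Event) -> bool:
    """consume(x) entails any event on x; other events must match exactly."""
    name, arg = actual
    req_name, req_arg = required
    if arg != req_arg:
        return False
    return name == "consume" or name == req_name


def satisfies_future(trace: List[Event], required: Event) -> bool:
    """Check whether some event in the trace discharges the required event."""
    return any(event_entails(ev, required) for ev in trace)


# Pointer escapes into a global object: modeled as consume(ptr), so no leak alarm.
escaping = [("malloc", "ptr"), ("consume", "ptr")]
# Pointer is neither freed nor handed over: the obligation stays open.
leaking = [("malloc", "ptr"), ("use", "ptr")]

print(satisfies_future(escaping, ("free", "ptr")))  # True
print(satisfies_future(leaking, ("free", "ptr")))   # False
```

If the global object is in fact never released, the first trace hides a real leak, which is the source of false negatives noted above.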

RELATED WORK
Analyzing for temporal properties. Model checking [7] is a well-known verification technique that can prove temporal properties in finite state systems, usually encoded using automata [6,16,20]. Furthermore, model checkers such as CMC [27], Java Pathfinder [42], and CPAchecker [4] are employed to identify counterexamples where properties are violated. While these tools effectively uncover assertion violations, they encounter challenges like state space explosion and slow logical formula solving. In contrast, ProveNFix performs a compositional analysis that scales to large real-world programs while achieving high precision.
Another approach to detecting temporal property violations is runtime verification, such as JavaMOP [25], MarQ [33] and Mufin [40], which monitors test executions against formal specs. In particular, our IntRE draws similarities to JavaMOP's spec language: both explicitly capture past-time LTL by putting it into preconditions. Specifically, JavaMOP monitors manually written and automatically mined specs against tests in open-source projects and finds hundreds of bugs. However, those runtime verification tools rely on large numbers of tests, which are not always available. Although LTLFuzz [26] could automatically generate tests via greybox fuzzing, the coverage of generated tests is usually low, leading to false negatives. Our method performs static analysis without requiring test cases, which could complement runtime verification.
The latest Infer release (v1.1.0) introduces a new checker, Topl, to detect errors based on user-provided state machines describing temporal properties. Like ProveNFix, Topl aims to encode different analyses, such as taint analysis, as temporal properties. For example, its analysis can prevent a value returned by method source() from being sent as an argument to a method sink(). However, Topl is still in the experimental phase and has not shown any potential for automated program repair; moreover, just like Infer, Topl possibly suffers from many false negatives due to the heuristics deployed for reducing false positives.
Program analyzer based repair. Recently, several repair techniques based on static analysis tools have been proposed in the program repair community [17,19,24,28,41]. One of the most relevant works is FootPatch [41], which relies on separation logic to fix bugs related to resource release, freeing memory, and null pointer dereference. However, FootPatch may introduce a double-free when fixing a memory leak. Moreover, to automatically generate patches, it requires templated annotations for the specs at the bug locations. Another tool, MemFix [24], targets safely fixing memory usage bugs such as memory leaks, use-after-frees, and double-frees. MemFix works for small-scale programs (<5kLoC) based on a variant of typestate static analysis and cannot produce patches with conditional deallocation. In contrast, SAVER [17] targets the same set of bugs, but deals with large-scale projects (up to 320kLoC) with the help of Infer. However, SAVER's patches are generated by keeping object flow graphs for the buggy programs. In other words, SAVER fails to make use of the intermediate bug information generated by Infer. The proposed technique in this work also draws similarities to Senx [19], which proposes property-based APR. Such properties are automatically inferred via an access range analysis, which can be used to deal with bugs including buffer overflows, integer overflows, and bad casts. Although Senx and ProveNFix target different bug types, the hint highlighted by Senx is valuable: the advantage of the property-based approach is that a small set of safety properties can be specified once and used on a vast number of programs without the need to specify anything specific about each of the programs, or to collect a comprehensive set of test cases. Moreover, such properties are inherently precise and complete. Our work is the first to deal with different bug types (e.g., memory leaks, unchecked return values, double-frees, and general temporal bugs) by encoding them into temporal properties.
Specification guided repair and deductive synthesis. Other existing works have also explored program repair guided by formal specs, in the form of pre/post-conditions [28,30] or assertions [15,34]. Those approaches then utilize verification-based techniques to generate patches, ensuring the given conditions are satisfied. They can produce precise and complete patches, but writing formal specs for each function is tedious and time-consuming. Instead of asking users to provide pre/post-conditions for every function, our approach relies on future-conditions to modularly enable automated spec generation for their callers. Prior work [31] describes a deductive approach to synthesizing imperative programs with pointers from specs expressed in Separation Logic. Their synthesis algorithm takes as input a pair of assertions (a pre/post-condition pair) that describe two states of the symbolic heap, and derives a program that transforms one state into the other, guided by the shape of the heap. Kneuss et al. [22] explore the problem of automatically repairing programs written as a set of mutually recursive functions in a purely functional subset of Scala, evaluated on seeded bugs in small programs. More recently, Nguyen et al. [28] propose a novel method to automatically repair buggy heap-manipulating programs using constraint solving and deductive synthesis [31]. All those approaches only target small programs, while our approach has been shown to scale to large programs.
Temporal property inference. Automatically generating temporal specifications is an important research direction. To enable a compositional analysis, both Infer and ProveNFix vastly reduce the human effort by automatically inferring the specifications starting from a core set of primitives. To further reduce the effort of annotating the primitive APIs, one approach is to use data mining tools to extract the specifications from "good" code. Prior work [44,45] proposes an automatic approach to inferring a target system's temporal properties by analyzing its event traces. The core of this technique is a set of predefined property templates crafted for a set of common events. These templates form a partial order in terms of their strictness, and the technique finds the strictest properties satisfied by a set of events based on the traces. Another approach is to make use of inference tools, such as Infer or ProveNFix, to generate specifications from the API implementations (hinted in Sec. 7.5), such as OpenSSL projects, and release the generated specifications for future analysis.
The first approach may be fast but imprecise, while the second is more precise but requires additional tools to support spec inference. Nevertheless, these other directions are orthogonal to our current proposal. In this paper, the temporal specs of primitive functions are manually annotated according to the API documentation for C or descriptions of program functionalities. Our spec inference uses a compositional analysis to infer the specs of procedures, based on their behaviors, from the given primitive APIs' specs.
Realizability checkers for reactive systems. Realizability checking [12] with assume/guarantee contracts has been studied for synthesis and checking temporal logic requirements. A contract is realizable if it is possible to synthesize a component such that, for any input allowed by the contract assumptions, the component can produce outputs that satisfy the contract guarantees. However, such techniques do not enforce the constraints on events occurring after the component, as we capture via future-conditions. We note that realizability checkers focus on synthesis, while our proposal focuses on temporal bug detection and repair.

CONCLUSION
This work is motivated by the question "how to modularly analyze and repair bugs which can be encoded using temporal properties?". Our main contribution is showing the feasibility of finding and repairing a wide range of bug types using expressive temporal specifications. Specifically, we present a compositional framework driven by a minimal effort of annotating primitive APIs, which then incorporates the novel future-conditions. This enables automatic specification inference and dynamic constraint generation. We prototype the proposal, present experimental results, and demonstrate nontrivial case studies to show its utility. ProveNFix is the first program repair tool guided by temporal properties.

Fig. 2. Four kinds of malloc wrappers and their inferred future-conditions.

Triplet specification inference
The relation $\mathcal{E} \models P$ denotes the reasoning for program $P$ starting with an environment $\mathcal{E}$. As shown in [SI-Proc], given any procedure declaration, the rule takes $\{T \wedge \epsilon, 0\}$ as the initial state, and reasons about the actual behavior $\Phi_{actual}$ of the procedure body. Being the strongest postcondition, $\Phi_{actual}$ is added as the inferred spec of $nm(x^*)$ into the environment. The notation $\mathcal{E}[C].post \mapsto \Phi$ means to associate the inferred spec $\Phi$ with $C$'s postcondition, and likewise for pre and future.
$$\frac{P = nm(x^*)\{e\}\,;\,P' \qquad \mathcal{E}, nm(x^*) \vdash \{T \wedge \epsilon, 0\}\ e\ \{\Phi_{actual}\} \qquad \mathcal{E}[nm(x^*)].post \mapsto \Phi_{actual} \qquad \mathcal{E} \models P'}{\mathcal{E} \models P}\ [\text{SI-Proc}]$$

Based on Definition 3, we formally define the inclusion relation $\sqsubseteq$ between trace formulae in Definition 4, where the derivative function $\theta^{-1}\theta_1$ eliminates a trace $\theta$ from the head of $\theta_1$ and returns what remains. Intuitively, proving the inclusion $\theta_1 \sqsubseteq \theta_2$ (presented in Sec. 5.2) amounts to checking whether all the possible traces in the antecedent $\theta_1$ are legitimately allowed in the possible traces from the consequent $\theta_2$. Next, we define the IntRE inclusion in Definition 5.
Proc. ACM Softw. Eng., Vol. 1, No. FSE, Article 11. Publication date: July 2024.
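To make the derivative-based reading of trace inclusion concrete, the following sketch checks inclusion between plain regular traces using Antimirov-style partial derivatives. It is an illustration in the spirit of the description, not the paper's term rewriting system: the trace language is reduced to events, concatenation, disjunction, and repetition, and all constructor and function names are assumptions.

```python
from typing import FrozenSet, Set

BOT, EPS = ("bot",), ("eps",)  # empty language and empty trace


def ev(a): return ("ev", a)
def cat(r, s): return s if r == EPS else ("cat", r, s)  # simplify eps . s to s
def alt(r, s): return ("or", r, s)
def star(r): return ("star", r)


def nullable(r) -> bool:
    """Does the trace formula accept the empty trace?"""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("bot", "ev"):
        return False
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    return nullable(r[1]) or nullable(r[2])  # "or"


def pderiv(r, a) -> FrozenSet:
    """Partial derivatives: what may remain after removing event `a` from the head."""
    tag = r[0]
    if tag in ("bot", "eps"):
        return frozenset()
    if tag == "ev":
        return frozenset({EPS}) if r[1] == a else frozenset()
    if tag == "or":
        return pderiv(r[1], a) | pderiv(r[2], a)
    if tag == "star":
        return frozenset(cat(p, r) for p in pderiv(r[1], a))
    heads = frozenset(cat(p, r[2]) for p in pderiv(r[1], a))  # "cat"
    if nullable(r[1]):
        heads = heads | pderiv(r[2], a)
    return heads


def events(r) -> Set[str]:
    """Collect the event names occurring in a trace formula."""
    if r[0] == "ev":
        return {r[1]}
    return set().union(*(events(x) for x in r[1:]))


def includes(r1, r2) -> bool:
    """Check that every trace of r1 is also a trace of r2."""
    alphabet = events(r1) | events(r2)
    todo = [(r1, frozenset({r2}))]
    seen = set(todo)
    while todo:
        p, qs = todo.pop()
        if nullable(p) and not any(nullable(q) for q in qs):
            return False  # some trace of r1 ends here but no trace of r2 does
        for a in alphabet:
            qs_next = frozenset().union(*(pderiv(q, a) for q in qs))
            for p_next in pderiv(p, a):
                pair = (p_next, qs_next)
                if pair not in seen:
                    seen.add(pair)
                    todo.append(pair)
    return True


strict = cat(ev("malloc"), ev("free"))
loose = cat(ev("malloc"), cat(star(ev("use")), ev("free")))
print(includes(strict, loose), includes(loose, strict))
```

The search terminates because a trace formula has finitely many partial derivatives, so only finitely many (antecedent, consequent-set) pairs can be visited.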
• We propose a novel program repair approach, guided by constraints encoded using an expressive temporal spec called IntRE, an abbreviation for Integrated Regular Expressions.
• We prototype our proposal as a repair tool, ProveNFix, on top of the Infer front-end, to support large-scale C projects.
• We evaluate ProveNFix on an extensive benchmark and demonstrate that ProveNFix outperforms state-of-the-art tools in fixing various types of bugs. The source code of ProveNFix and the dataset are available at [46].

Table 1. Summary of the Annotated API Specifications.

Table 2. Experimental results for analyzing 10 C projects, comparing with Infer-v1.1.0. Columns #NPD, #ML and #RL record the numbers of null pointer dereferences, memory leaks, and resource leaks, respectively. The numbers of false positives found by Infer and more true positives found by ProveNFix are represented by +n and +n respectively. Columns in #Time record the analysis time spent.

Table 3. Experimental results for repairing 10 C projects, comparing with SAVER and FootPatch. Columns marked as # are the numbers of total true positives found by Infer-v1.1.0 and ProveNFix, summarised from Table 2. The numbers of false positives reported by Infer-v0.9.3 are marked as +n.

Table 4. Automatically finding/fixing double free bugs.