Optimistic Prediction of Synchronization-Reversal Data Races

Dynamic data race detection has emerged as a key technique for ensuring reliability of concurrent software in practice. However, dynamic approaches can often miss data races owing to nondeterminism in the thread scheduler. Predictive race detection techniques cater to this shortcoming by inferring alternate executions that may expose data races without re-executing the underlying program. More formally, the dynamic data race prediction problem asks, given a trace σ of an execution of a concurrent program, can σ be correctly reordered to expose a data race? Existing state-of-the-art techniques for data race prediction either do not scale to executions arising from real-world concurrent software, or only expose a limited class of data races, such as those that can be exposed without reversing the order of synchronization operations. In general, exposing data races by reasoning about synchronization reversals is an intractable problem. In this work, we identify a class of data races, called Optimistic Sync(hronization)-Reversal races, that can be detected in a tractable manner and often includes non-trivial data races that cannot be exposed by prior tractable techniques. We also propose a sound algorithm OSR for detecting all optimistic sync-reversal data races in overall quadratic time, and show that the algorithm is optimal by establishing a matching lower bound. Our experiments demonstrate the effectiveness of OSR on our extensive suite of benchmarks: OSR reports the largest number of data races, and scales well to large execution traces.

The two conflicting events e_1 = ⟨t_1, w(x)⟩ and e_12 = ⟨t_4, w(x)⟩ form a predictable data race of σ_1, which is also an optimistic sync-reversal race, witnessed by the correct reordering ρ_1 that reverses critical sections.
Dynamic analysis techniques are the preferred class of techniques for detecting data races in industrial-scale software applications [59].
A dynamic data race detector observes an execution of a concurrent program P and infers the presence of a data race by analysing the trace of the observed execution. A key challenge in the design of such a technique is sensitivity to non-deterministic thread schedules: even for a fixed program input, a data race may be observed under a very specific thread schedule, but not under other thread schedules. This means that a simplistic race detector that, say, only checks for two conflicting events appearing simultaneously in the execution trace, is likely to miss many bugs. This is where predictive analysis techniques shine: instead of looking for bugs only in the execution that was observed, they additionally detect bugs in executions that, while not explicitly observed during testing, can nevertheless be inferred from the observed execution, without rerunning the underlying program [31,34,45,57,60,65]. Predictive techniques identify the space of executions, or reorderings, that can provably be inferred from a given observed execution σ, and then look for a reordering ρ in this space that can serve as a witness to a bug such as a data race. Consider the execution σ_1 in Figure 1a consisting of events e_1, e_2, ..., e_12, where e_i denotes the i-th event from the top. The two write events on variable x, e_1 and e_12, are far apart and not witnessed as a data race in σ_1. However, the correct reordering ρ_1 of σ_1, in which the two write events appear consecutively, shows that the pair is nevertheless a predictable data race of σ_1. Indeed, any program P that generates σ_1 will also generate ρ_1, albeit with a different thread interleaving.
In general, sound (no false positives) and complete (no false negatives) data race prediction is known to be an intractable problem [44]. Soundness is a key desired property, since false positives must otherwise be vetted manually, a task which is particularly challenging in the case of concurrent programs. Consequently, many recent works counter the intractability by proposing incomplete (but nevertheless sound) predictive race detection algorithms that work in polynomial time and have high precision in practice. The main contribution of this paper is a new race prediction algorithm OSR that is sound, has higher prediction power than prior algorithms, and achieves high scalability in practice.
The design of our algorithm OSR stems from the observation that, often, data races can be exposed only by inverting the relative order of (some pairs of) critical sections, or synchronizations. The data race (e_1, e_12) in Figure 1a, for instance, can in fact only be observed in correct reorderings that invert the order of the two critical sections on lock ℓ. However, reversing synchronization (lock/unlock) operations in the reordering can further force a reversal in the order in which memory access events must appear in the reordering, and can be intractable to reason about [44,45]. This strong tradeoff between precision (obtained by virtue of reversing the order of many synchronization operations) and performance has materialized on both extremes. Algorithms such as those based on the happens-before partial order [42,55] or the recently proposed SyncP [45] run in linear time but fail to expose races that mandate reasoning about synchronization reversals. On the other extreme, methods that exhaustively search for reversals either resort to expensive constraint solving [31,60] or saturation-style reasoning [18,54], and do not scale to the long execution traces observed in real-world concurrent applications. Our proposed algorithm OSR aims to strike a balance: it is designed to optimistically reason about synchronization reversals, and identifies those reversals that do not lead to the reversal of memory operations. The pair (e_1, e_12) in Figure 1 is an example of a race that OSR reports.
OSR reports all optimistic synchronization-reversal races in overall time O(N^2), spending O(N) time on each event in the given execution trace σ. Here, N is the number of events in σ, and the O(·) notation hides polynomial multiplicative factors in the number of locks and threads, which are typically considered constants. In order to check for the absence of memory reversals, OSR constructs a graph (the optimistic reordering graph) of events and checks whether it is acyclic. Naively, such an acyclicity check would take O(N) time for every pair of conflicting events, resulting in a total cubic running time. A key technical contribution of our work is to perform this check in amortized constant time by constructing a succinct representation of this graph, called the abstract optimistic reordering graph, of constant size. We show that this abstract graph preserves acyclicity and can be constructed incrementally in amortized constant time, allowing us to perform race prediction for the entire input execution in overall quadratic (instead of cubic) time. Finally, we show that the problem of checking the existence of an optimistic sync-reversal race also admits a matching quadratic-time lower bound, thereby implying that our algorithm is optimal.
We implemented OSR and evaluate its performance thoroughly. Our evaluation demonstrates the effectiveness of our algorithm on a comprehensive suite of 153 Java and C/C++ benchmarks derived from real-world programs. Our results show that OSR has comparable scalability to the linear-time algorithms SyncP and WCP, while it reports significantly more races than the second most predictive technique on many benchmarks, confirming our hypothesis that going beyond the principle of synchronization preservation allows us to predict more data races.

The pair (e_1, e_5) is not a data race but a predictable data race of σ_2, witnessed by the correct reorderings ρ_2 and ρ′_2.

Correct reordering. Predictive race detection, given a trace σ, asks whether an alternate execution trace ρ witnesses a data race and, more importantly, whether ρ can be inferred from σ. The notion of correct reorderings formalizes this precisely. Given well-formed traces ρ and σ with Events(ρ) ⊆ Events(σ), we say that ρ is a correct reordering of σ if ρ respects the thread order and reads-from relations of σ. This means that (1) Events(ρ) is (≤_TO^σ, rf_σ)-closed, (2) for any two events e_1, e_2 ∈ Events(ρ), if e_1 ≤_TO^σ e_2, then e_1 ≤_TO^ρ e_2, and (3) for any two events e_1, e_2 ∈ Events(ρ), if e_1 = rf_σ(e_2), then e_1 = rf_ρ(e_2).

Data races and predictable data races. A pair of events (e, e′) in σ is said to be a conflicting pair, denoted e ⊲⊳ e′, if both are access events on the same variable and at least one of them is a write event, i.e., (op(e), op(e′)) ∈ {(w(x), w(x)), (w(x), r(x)), (r(x), w(x))} for some x ∈ Vars(σ). For a trace ρ with Events(ρ) ⊆ Events(σ), we say that an event e is ρ-enabled if e ∉ Events(ρ) but all thread-predecessors of e are in ρ, i.e., {e′ ∈ Events(σ) | e′ ≠ e, e′ ≤_TO^σ e} ⊆ Events(ρ). A conflicting pair (e, e′) is said to be a data race of σ if there is a prefix ρ of σ such that both e and e′ are ρ-enabled. Finally, a conflicting pair (e, e′) is a predictable data race of σ if there is a correct reordering ρ of σ such that both e and e′ are enabled in some prefix of ρ. In this case, we say that ρ witnesses the data race (e, e′).
Example 1. Consider the trace σ_2 in Figure 2a, containing 6 events performed by two threads t_1 and t_2. As before, we use e_i to denote the i-th event of σ_2. The two events e_1 = ⟨t_1, w(x)⟩ and e_5 = ⟨t_2, w(x)⟩ are conflicting (i.e., e_1 ⊲⊳ e_5). The pair (e_1, e_5) is not a data race in σ_2, as no prefix of σ_2 has both these events simultaneously enabled. Consider the trace ρ_2 in Figure 2b; it is a correct reordering of σ_2 because it preserves both the thread order and the reads-from relation of σ_2. For the same reason, ρ′_2 is a correct reordering of ρ_2 (and also of σ_2). Now, observe that (e_1, e_5) is a data race in ρ_2 (and also in ρ′_2): in the prefix π = ⟨t_2, acq(ℓ)⟩ (a prefix of both ρ_2 and ρ′_2), both e_1 and e_5 are π-enabled. Thus, while (e_1, e_5) is not a data race in σ_2, it is a predictable data race of σ_2.
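To make the definition above concrete, the conditions of a correct reordering can be checked directly on explicit traces. The following Python sketch is purely illustrative and not the paper's implementation: events are modelled as (thread, op, target) triples, an event is identified by its position within its thread (which is stable across reorderings), and the reads-from function is taken to be "last write to the same variable".

```python
def reads_from(trace):
    """Map each read's index to the index of the write it reads from (or None)."""
    last_write, rf = {}, {}
    for i, (t, op, x) in enumerate(trace):
        if op == "w":
            last_write[x] = i
        elif op == "r":
            rf[i] = last_write.get(x)
    return rf

def is_correct_reordering(rho, sigma):
    """Check that rho's per-thread projections are prefixes of sigma's
    (thread-order preservation and TO-closure) and that every read in rho
    reads from the same write event as in sigma (rf preservation)."""
    def per_thread(tr):
        proj = {}
        for ev in tr:
            proj.setdefault(ev[0], []).append(ev)
        return proj
    ps, pr = per_thread(sigma), per_thread(rho)
    for t, evs in pr.items():
        if ps.get(t, [])[:len(evs)] != evs:
            return False
    def keys(tr):  # identify events by (thread, position-within-thread)
        count, out = {}, []
        for ev in tr:
            k = count.get(ev[0], 0)
            out.append((ev[0], k))
            count[ev[0]] = k + 1
        return out
    ks, kr = keys(sigma), keys(rho)
    rf_s, rf_r = reads_from(sigma), reads_from(rho)
    rf_s_keys = {ks[i]: (ks[j] if j is not None else None) for i, j in rf_s.items()}
    for i, j in rf_r.items():
        if rf_s_keys.get(kr[i], "missing") != (kr[j] if j is not None else None):
            return False
    return True
```

Note that, as in the definition, a reordering may flip two conflicting writes and still be correct: only thread order and reads-from are constrained.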
The problem of predicting data races, i.e., given an execution trace σ, determine whether there is a predictable data race of σ, has been studied before [31,34,54,57,60,65] and is known to be an intractable problem [44]. This means that any sound and complete algorithm for predicting data races is unlikely to scale to real-world software applications, whose execution traces can have billions of events. To address this, practical data race predictors resort to incomplete but sound algorithms that run in polynomial time. In the next section, we discuss the recently proposed SyncP algorithm, which employs the principle of synchronization preservation for predicting data races and has linear theoretical complexity.

Sync-Preserving Data Races
Our work is closest in spirit to that of [45], which presents the SyncP algorithm; it works in linear time and is the current state-of-the-art race prediction algorithm. The principle employed by SyncP is to focus on a special class of reorderings and the data races witnessed by such reorderings; we discuss these next. Sync-preserving reorderings and data races. A correct reordering ρ of a trace σ is said to be sync(hronization)-preserving if any two critical sections of σ (on the same lock) that are both present in ρ appear in the same relative order. That is, for every lock ℓ ∈ Locks(σ) and any two acquire events a_1, a_2 ∈ Events(ρ) with op(a_1) = op(a_2) = acq(ℓ), we have a_1 ≤_tr^σ a_2 iff a_1 ≤_tr^ρ a_2. A pair of conflicting events (e, e′) in Events(σ) is said to be a sync-preserving data race of σ if there is a sync-preserving correct reordering ρ of σ that witnesses this race.
Example 2. Consider again the trace σ_2, and recall from Example 1 that the pair (e_1, e_5) is not a data race of σ_2 but a predictable race witnessed by the correct reordering ρ_2. Observe, however, that ρ_2 is not a sync-preserving reordering of σ_2, because it flips the order of the two critical sections on lock ℓ. Nevertheless, (e_1, e_5) is a sync-preserving race of σ_2. This is because the reordering ρ′_2 is, in fact, a sync-preserving reordering of σ_2 (even though it is a prefix of the non-sync-preserving reordering ρ_2); there is only one critical section in ρ′_2, and thus, vacuously, the relative order of critical sections is the same as in σ_2.
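The sync-preservation condition can likewise be checked on explicit traces. Below is a hedged Python sketch (same illustrative (thread, op, target) event model as above, not the paper's implementation): acquires are identified by their position within their thread, and for each lock the acquires present in ρ must form a subsequence of σ's acquire order on that lock.

```python
def is_sync_preserving(rho, sigma):
    """Check that, for every lock, the acquire events common to rho and
    sigma appear in the same relative order (the sync-preservation
    condition); assumes rho is already a correct reordering of sigma."""
    def acq_orders(tr):
        pos, orders = {}, {}
        for (t, op, l) in tr:
            key = (t, pos.get(t, 0))      # stable event identity
            pos[t] = pos.get(t, 0) + 1
            if op == "acq":
                orders.setdefault(l, []).append(key)
        return orders
    sig, rh = acq_orders(sigma), acq_orders(rho)
    for lock, seq in rh.items():
        present = set(seq)
        # sigma's acquire order restricted to rho's acquires must equal
        # rho's acquire order
        filtered = [a for a in sig.get(lock, []) if a in present]
        if filtered != seq:
            return False
    return True
```

As in Example 2, flipping two critical sections on the same lock fails the check, while a reordering that keeps only one of the two critical sections passes vacuously.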
Limited predictive power of SyncP. While the SyncP algorithm runs in overall linear time, it can miss data races which are not synchronization-preserving. These are precisely those conflicting pairs (e, e′) such that any correct reordering that witnesses a race on e and e′ necessarily reverses the relative order of two critical sections on a common lock. We illustrate this next, and remark that, in general, reasoning about even a single reversal is intractable [45].
Example 3. Let us again consider the trace σ_1 in Figure 1a (Section 1). The two conflicting events e_1 = ⟨t_1, w(x)⟩ and e_12 = ⟨t_4, w(x)⟩ form a predictable data race of σ_1, as witnessed by the correct reordering ρ_1 in Figure 1b, which is not a sync-preserving correct reordering of σ_1. In fact, consider any correct reordering ρ of σ_1 that witnesses the race (e_1, e_12). Then ρ must include the events e_10 and e_11, and thus the corresponding write events e_4 and e_8, together with the thread predecessors e_3 = ⟨t_2, acq(ℓ)⟩ and e_7 = ⟨t_3, acq(ℓ)⟩. Next, for well-formedness, at least one of the matching releases e_6 = ⟨t_2, rel(ℓ)⟩ and e_9 = ⟨t_3, rel(ℓ)⟩ must also be present in ρ. However, including e_6 in ρ would enforce that e_5 = ⟨t_2, r(y)⟩, and its write event e_2 = ⟨t_1, w(y)⟩, are present in ρ, and then the event e_1 would also have to be present in the reordering, making it no longer enabled in ρ. This, therefore, means that e_6 ∉ Events(ρ), and thus the only other available release event, e_9, must be present in ρ (for well-formedness). Further, to ensure well-formedness, e_3 must appear after e_9 in ρ. Thus, any reordering ρ witnessing the race between e_1 and e_12 must reverse the order of the critical sections.

OPTIMISTIC REASONING FOR REVERSALS
Given that reasoning about synchronization reversals is computationally hard, how do we identify such races efficiently? At a high level, the intractability in data race prediction arises because a search for a correct reordering entails (1) a search for an appropriate set of events (amongst exponentially many sets) and, further, (2) given an appropriate set of events, a search for a linear order (amongst exponentially many linear orders) on this set which is well-formed, is a correct reordering, and witnesses the race. We propose (1) a new notion of data races, called optimistic sync(hronization)-reversal races, which can be predicted by opting for an optimistic approach to resolve both these steps, and (2) an algorithm OSR to detect all such data races in O(N^2) time. In this section, we introduce this notion of data races; we present our algorithm in Section 4.

Optimistic Sync-Reversal Races
A crucial aspect of choosing the correct set of events is to ensure that multiple acquire events on the same lock do not stay unmatched; otherwise, the set cannot be linearized to a well-formed trace. In general, adding a matching release event may lead to the recursive addition of further events. Some choices may (recursively) lead to the addition of one of the two focal events e_1, e_2 (the candidate data race), leaving them no longer enabled. We define a simple and tractable notion of optimistic lock-closure which, instead of considering all choices, simply includes all matching release events as long as the two focal events are not included. In the following, we fix a trace σ. Optimistic lock-closure. Let e_1, e_2 ∈ Events(σ). We say that a set S ⊆ Events(σ) is optimistically lock-closed with respect to (e_1, e_2) if (a) e_1, e_2 ∉ S and prev_σ(e_1), prev_σ(e_2) ∈ S, (b) S is (≤_TO^σ, rf_σ)-closed, and (c) for every acquire event a ∈ S, if e_1, e_2 ∉ TRClosure(match_σ(a)), then match_σ(a) ∈ S. We denote the smallest set that contains S and is optimistically lock-closed by OLClosure(S, e_1, e_2). Example 4. Let us recall trace σ_1 from Figure 1 and consider the set S_1 = {e_3, e_4, e_7, e_8, e_9, e_10, e_11}. Observe that S_1 is optimistically lock-closed with respect to (e_1, e_12), because (1) S_1 includes neither of e_1, e_12, (2) S_1 is (≤_TO^σ, rf_σ)-closed, and finally, (3) e_1, e_12 ∉ TRClosure(e_9). Note that e_1 ∈ TRClosure(e_6) but e_6 ∉ S_1.
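The fixpoint induced by this definition can be sketched as follows. This is an illustrative Python model, not the paper's Algorithm 1: the trace is a list of (thread, op, target) triples indexed by position, TRClosure is computed as closure under thread order and reads-from, and, optimistically, a matching release is added only when its closure avoids the two focal events. On a reconstruction of Figure 1's trace (the variable names u and v are our assumption, as the figure itself is not in this text), it recovers the set S_1 of Example 4.

```python
from collections import deque

def ol_closure(trace, i1, i2):
    """Smallest optimistically lock-closed set w.r.t. the conflicting
    events at indices i1, i2 of `trace`; returns a set of indices."""
    prev, last = {}, {}                     # thread predecessor of each event
    for i, (t, op, x) in enumerate(trace):
        prev[i] = last.get(t)
        last[t] = i
    rf, lastw = {}, {}                      # read index -> write index
    for i, (t, op, x) in enumerate(trace):
        if op == 'w':
            lastw[x] = i
        elif op == 'r' and x in lastw:
            rf[i] = lastw[x]
    match, open_acq = {}, {}                # acquire index -> release index
    for i, (t, op, x) in enumerate(trace):
        if op == 'acq':
            open_acq[x] = i
        elif op == 'rel':
            match[open_acq[x]] = i

    def tr_closure(seed):
        """Events required by `seed` under thread order and reads-from."""
        seen, work = set(), deque(seed)
        while work:
            i = work.popleft()
            if i is None or i in seen:
                continue
            seen.add(i)
            work.append(prev[i])
            if i in rf:
                work.append(rf[i])
        return seen

    S = tr_closure([prev[i1], prev[i2]])
    changed = True
    while changed:                          # fixpoint over unmatched acquires
        changed = False
        for a, r in match.items():
            if a in S and r not in S:
                extra = tr_closure([r])
                if i1 not in extra and i2 not in extra:
                    S |= extra              # optimistically match the acquire
                    changed = True
    return S
```

Applied to the twelve-event reconstruction of σ_1 with the pair (e_1, e_12), the closure skips e_6 (whose TRClosure contains e_1) but adds e_9, yielding S_1 = {e_3, e_4, e_7, e_8, e_9, e_10, e_11}, exactly as in Example 4.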
Even though the notion of an optimistically lock-closed set is simple, checking whether such a set can be linearized into a correct reordering that witnesses a data race is, in general, an intractable problem, as we show next (Theorem 3.1). Theorem 3.1. Let σ be a trace, let e_1, e_2 be conflicting events, and let S ⊆ Events(σ) be an optimistically lock-closed set with respect to (e_1, e_2). The problem of determining whether there is a correct reordering ρ such that Events(ρ) = S is NP-hard.
The proof of Theorem 3.1 is presented in Appendix A.1. Given the above result, we define the following more tractable notion of optimistic reordering, which ensures that there are no memory reversals and, moreover, that critical sections are reversed only when absolutely required, i.e., that unmatched critical sections appear later than matched ones. Optimistic correct reordering. A trace ρ is said to be an optimistic correct reordering of σ if (a) ρ is a correct reordering of σ, (b) for all pairs of conflicting memory access events e_1 ⊲⊳ e_2 in Events(ρ), e_1 ≤_tr^σ e_2 iff e_1 ≤_tr^ρ e_2, and (c) for any lock ℓ and any two acquire events a_1 ≠ a_2 (with op(a_1) = op(a_2) = acq(ℓ)), if a_1 and a_2 are both matched in ρ (i.e., match_σ(a_i) ∈ Events(ρ) for both i ∈ {1, 2}), then a_1 ≤_tr^σ a_2 iff a_1 ≤_tr^ρ a_2. We now formalize optimistic sync-reversal data races.
Definition 1 (Optimistic Sync-Reversal Race). Let σ be a trace and let (e_1, e_2) be a pair of conflicting events in σ. We say that (e_1, e_2) is an optimistic sync-reversal data race if there is an optimistic correct reordering ρ of σ such that Events(ρ) is optimistically lock-closed with respect to (e_1, e_2) and both e_1 and e_2 are ρ-enabled.
Example 5. In Figure 1, the pair (e_1, e_12) is an optimistic sync-reversal race, because the prefix ρ′_1 consisting of the first 7 events of ρ_1 is an optimistic reordering of the optimistically lock-closed set S_1 outlined in Example 4 (in which e_1 and e_12 are ρ′_1-enabled). This is because all conflicting accesses of ρ′_1 have the same relative order as in σ_1, and further, the unmatched acquire event is positioned after all closed critical sections. Similarly, for the trace σ_2 of Figure 2, the linearization ρ′_2 = ⟨t_2, acq(ℓ)⟩ of the set S_2 (outlined in Example 4) is trivially an optimistic correct reordering.

Comparison with other techniques
Here, we qualitatively compare our proposed class of races with those reported by other sound predictive race detection techniques proposed in the literature, namely SyncP [45] and M2 [54], and illustrate how the set of races reported by OSR is neither a strict subset nor a strict superset of those detected by each. Example 6. Recall again the execution trace σ_1 in Figure 1. In Example 5 we established that the pair (e_1, e_12) is an optimistic sync-reversal race, while in Example 3 we showed that it is not a sync-preserving data race. When determining whether (e_1, e_12) can be declared a predictive data race, the M2 algorithm computes the set S = {e_1, e_2, e_3, e_4, e_5, e_6, e_7, e_8, e_10, e_11} as the candidate set that witnesses the race. Observe, however, that this set contains the event e_1 and thus cannot witness the race (e_1, e_12), since one of these events is not enabled in S. Thus, some optimistic sync-reversal races are neither sync-preserving races, nor can they be detected by M2.
Example 7. Consider the trace in Figure 3a. The pair (e_1, e_21) is a sync-preserving data race, as witnessed by the correct reordering shown in Figure 3b. This pair, however, is not an optimistic sync-reversal data race, since the smallest optimistically lock-closed set capable of witnessing the race is the set S_OSR = {e_[3,9], e_[12,14], e_[17,20]}, where e_[i,j] is shorthand for e_i, e_{i+1}, ..., e_{j-1}, e_j. Observe that S_OSR contains two unmatched acquire events of lock ℓ_2, and adding either matching release will bring e_1 into the set. Likewise, M2 computes the set containing all events but e_21, which thus contains e_1. Thus, there are sync-preserving races which are neither optimistic sync-reversal races, nor can they be detected by M2.
Example 8. Finally, consider the trace in Figure 3c, derived from [54]. Here, the highlighted pair of conflicting events is a predictable data race (see Figure 3d for the witnessing execution). We remark that any correct reordering witnessing this race must reverse the order of the two acquire events e_8 and e_13, as well as the order of the conflicting memory access events e_9 and e_14. Consequently, this is an example of a race reported by M2 that is neither a sync-preserving race nor an optimistic sync-reversal race.

THE OSR ALGORITHM
We now describe our algorithm OSR that detects optimistic sync-reversal data races. For ease of presentation, we first discuss how to check whether a given pair (e_1, e_2) of conflicting events is an optimistic sync-reversal data race (Section 4.1) in O(N) time, where N is the number of events in the given trace. Naively, this check can be used to report all optimistic sync-reversal data races in O(N^3) time, by enumerating all O(N^2) pairs of conflicting events and checking each of them in O(N) time. Instead, OSR runs in overall O(N^2) time and is based on insights that enable it to perform incremental computation over the entire trace (Section 4.2). We present our overall algorithm and its optimality in Section 4.3.

Checking Race On A Given Pair Of Events
Based on Definition 1, the task of checking whether a given pair (e_1, e_2) of conflicting events is an optimistic sync-reversal data race entails examining all optimistically lock-closed sets and checking whether any of them can be linearized.
Constructing the optimistically lock-closed set. Our algorithm, however, exploits the following observation (Lemma 4.1) and focuses on only a single set, namely the smallest such set. In the following, we abuse notation and use OLClosure(e_1, e_2) to denote the set OLClosure(S_{e_1,e_2}, e_1, e_2), where S_{e_1,e_2} = {prev_σ(e_1)} ∪ {prev_σ(e_2)}. Here, prev_σ(e) is the last event e′ ≠ e such that e′ ≤_TO^σ e; if no such event exists, we say prev_σ(e) = ⊥, in which case {prev_σ(e)} = ∅.

Lemma 4.1. Let e_1, e_2 be conflicting events in trace σ. If (e_1, e_2) is an optimistic sync-reversal race, then it can be witnessed in an optimistic correct reordering ρ such that Events(ρ) = OLClosure(e_1, e_2).

In Algorithm 1, we outline our algorithm to compute the smallest set identified in Lemma 4.1. It takes 3 arguments: the two events e_1, e_2 and a set S_0; for computing OLClosure(e_1, e_2), we set S_0 = ∅; later, in Section 4.2, this set will be used to enable incremental computation. The algorithm performs a fixpoint computation starting from the set S_0 ∪ TRClosure(prev_σ(e_1)) ∪ TRClosure(prev_σ(e_2)); it repeatedly identifies an unmatched acquire event a and checks whether its matching release match_σ(a) can be added without adding e_1 or e_2; if so, match_σ(a) is added. Here, Acqs(S) denotes the set of acquire events in the set S. The algorithm ensures that the set is (≤_TO^σ, rf_σ)-closed at each step, and runs in O(T^2 N) = O(N) time.

Checking optimistic reordering. First, we check whether the set S constructed by Algorithm 1 is lock-feasible, i.e., whether for each lock ℓ the set of unmatched acquires OAcqs(S, ℓ) = {a ∈ Acqs(S) | match_σ(a) ∉ S} is either a singleton or empty. If so, we construct the optimistic reordering graph G_S^Opt over the events of S; its edge set is the union of four sets. The first set consists of all thread-order edges (e, e′), where e = prev_σ(e′). The set E_S^{Opt,⊲⊳} consists of all immediate conflict edges, i.e., all pairs (e, e′) in S such that e ⊲⊳ e′, e ≤_tr^σ e′, and there is no intermediate event in S that conflicts with both. The set E_S^{Opt,match} consists of all pairs (e, e′) such that e ≤_tr^σ e′ and there is a common lock ℓ for which op(e) = rel(ℓ), op(e′) = acq(ℓ), both e and e′ are matched in S, and there is no intermediate critical section on ℓ. Finally, the remaining set of edges orders matched critical sections before unmatched ones, i.e., E_S^{Opt,unmatch} = {(e, e′) | ∃ℓ, op(e) = rel(ℓ), op(e′) = acq(ℓ), match_σ(e′) ∉ S}. Since optimistic reorderings forbid reversals in the order of conflicting memory accesses, as well as in the order of same-lock critical sections that are completely matched, it suffices to check the acyclicity of G_S^Opt to guarantee the existence of a witness.

Lemma 4.2. Let σ be a trace and let S ⊆ Events(σ) be (≤_TO^σ, rf_σ)-closed and lock-feasible. Then, there is an optimistic reordering ρ of σ on the set S iff the graph G_S^Opt is acyclic.
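The graph construction and acyclicity check can be sketched as follows; this is an illustrative Python model rather than the paper's implementation (conflict edges are added only between immediate conflicting pairs, later conflicts following transitively, and acyclicity is tested with Kahn's topological sort).

```python
from collections import defaultdict, deque

def opt_graph_acyclic(trace, S):
    """Build the optimistic-reordering graph over the index set S of
    `trace` ((thread, op, target) triples) and test whether it is acyclic."""
    Sset, order = set(S), sorted(S)
    edges = set()
    match, open_acq = {}, {}                 # acquire -> matching release
    for i, (t, op, x) in enumerate(trace):
        if op == 'acq':
            open_acq[x] = i
        elif op == 'rel':
            match[open_acq[x]] = i
    matched = {a for a in match if a in Sset and match[a] in Sset}
    last_of = {}                             # (1) thread-order edges
    for i in order:
        t = trace[i][0]
        if t in last_of:
            edges.add((last_of[t], i))
        last_of[t] = i
    for k, i in enumerate(order):            # (2) immediate conflict edges
        _, oi, xi = trace[i]
        if oi not in ('r', 'w'):
            continue
        for j in order[k + 1:]:
            _, oj, xj = trace[j]
            if oj in ('r', 'w') and xj == xi and 'w' in (oi, oj):
                edges.add((i, j))
                break                        # later conflicts follow transitively
    acqs = defaultdict(list)
    for i in order:
        t, op, x = trace[i]
        if op == 'acq':
            acqs[x].append(i)
    for lock, alist in acqs.items():
        m = [a for a in alist if a in matched]
        u = [a for a in alist if a not in matched]
        for a1, a2 in zip(m, m[1:]):         # (3) matched sections keep order
            edges.add((match[a1], a2))
        for a1 in m:                         # (4) matched before unmatched
            for a2 in u:
                edges.add((match[a1], a2))
    indeg = {i: 0 for i in order}            # Kahn's algorithm
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        indeg[b] += 1
    queue = deque(i for i in order if indeg[i] == 0)
    seen = 0
    while queue:
        i = queue.popleft()
        seen += 1
        for j in adj[i]:
            indeg[j] -= 1
            if indeg[j] == 0:
                queue.append(j)
    return seen == len(order)
```

On the reconstruction of Figure 1's σ_1 used earlier (variable names u, v assumed) with the set S_1 of Example 4, the only edge going backward in trace order runs from e_9 (the release of the matched critical section) to the unmatched acquire e_3, no path returns to e_9, and the graph is acyclic, matching Example 5.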
We remark that G_S^Opt can be constructed and checked for cycles in O(T N) = O(N) time. Thus, the overall algorithm for checking whether a given pair (e_1, e_2) is an optimistic sync-reversal race is: first, compute OLClosure(e_1, e_2) in O(N) time; then, check lock-feasibility in O(LT) = O(1) time; finally, perform graph construction and cycle detection in O(N) time. We thus have the following theorem.
Theorem 4.1. Let σ be a trace and let e_1, e_2 be conflicting events in σ. The problem of determining whether (e_1, e_2) is an optimistic sync-reversal race can be solved in O(T(T N + L)) = O(N) time.

Incremental Race Detection
Overview. Recall that there are O(N^2) pairs of conflicting events; instead of naively examining each of them, we develop an incremental algorithm that determines the existence of an optimistic sync-reversal race in total O(N^2) time. We achieve this by spending O(N) time per (read/write) event e ∈ Events(σ), determining in overall O(N) time whether there is some event e′ such that (e′, e) is a race, by scanning the trace from earliest to latest events. To do so, our algorithm exploits several novel insights. Let us fix one of the events e. First, we show that the optimistic lock-closure can be computed incrementally from previously computed sets, instead of being computed from scratch for each e′. Even though the closure sets can be computed incrementally, the optimistic reordering graph G_S^Opt (Section 4.1) cannot be computed in an incremental fashion, because the edges in this graph depend upon precisely which events are present in the set. In particular, a previously unmatched acquire event may become matched in a larger set, and thus we may have fewer edges in the larger graph. Our second insight caters to this: we represent the graph succinctly as an abstract optimistic reordering graph which has O(1) (instead of O(N)) nodes and, moreover, can be computed by pre-populating an appropriate data structure and performing range-minima queries over it, to determine reachability information in the abstract graph in O(1) time. Incrementally constructing the optimistic lock-closure. The incremental closure computation relies on the observation that the closure is monotonic with respect to thread order (Lemma 4.3). Thus, if we fix a thread t and scan the events of t from earliest to latest, then we can reuse prior computations. In fact, Algorithm 1 already works in this fashion: it builds on top of the given input set S_0. Lemma 4.3 establishes the correctness and time complexity of the closure computation.
The abstract graph G_S^Abs is defined so that (e, e′) is an edge of G_S^Abs if there is a path from e to e′ in the graph G_S^Opt. In other words, G_S^Abs contains only O(L) vertices, corresponding to the last release events and the acquire events that are unmatched in S, and preserves the reachability information of G_S^Opt. Constructing vertices and backward edges of G_S^Abs. Recall that S is a (≤_TO^σ, rf_σ)-closed subset of Events(σ). The set of vertices of this graph can be determined in O(L) time by maintaining the last event of every thread present in S. This information can be maintained inductively as S is computed incrementally. The 'backward' edges, namely those pairs (e, a) where a ∈ S is an unmatched acquire on some lock ℓ and e = lastRel(S, ℓ) with a ≤_tr^σ e, can be computed in O(L) time. Pre-computing earliest immediate successors. For constructing forward edges, we first pre-compute, for each pair of threads t_1, t_2, a map EIS_{t_1,t_2} recording, for every event e of thread t_1, the earliest event of thread t_2 to which e has an immediate edge. Forward edges are then computed as follows. Recall that we are given a (≤_TO^σ, rf_σ)-closed subset S of Events(σ), and a path between two events must be contained within the events of S; thus, the arrays {EIS_{t_1,t_2}}_{t_1,t_2 ∈ Threads(σ)} cannot be used as-is to efficiently determine paths. However, a combination of range-minima queries [7] and shortest-path computation can nevertheless be used to determine path information efficiently. Let us use succ_{e,t} to denote the earliest event in thread t that is reachable from event e using only forward edges of G_S^Opt. The event succ_{e,t} can be computed using a Bellman-Ford-Moore [12,27,48] style shortest-path computation, as shown in Algorithm 2.
This algorithm performs rangeMin(A)[i, j] queries, which return the earliest event (according to ≤_TO^σ) in the segment of the array A starting at index i and ending at index j. With O(N) time and space pre-processing, each range-minimum query takes O(1) time [7,28]. Thus, the task of determining {succ_{e,t}}_{t ∈ Threads(σ)} takes O(T^2) time. Now, in the graph G_S^Abs, we add an edge from e to e′ if succ_{e,th(e′)} ≤_TO^σ e′. Thus, we add all forward edges of the graph in overall O(T^2 L) time.
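Constant-time range-minimum queries are a standard ingredient here. The following self-contained sparse-table sketch uses O(N log N) preprocessing; the O(N)-preprocessing structures cited as [7, 28] are more involved, but queries behave the same way.

```python
class RangeMin:
    """Sparse table: table[k][i] holds the minimum of a[i .. i + 2^k - 1].
    Preprocessing is O(N log N); each query is O(1) by overlapping two
    power-of-two windows that cover the query range."""
    def __init__(self, a):
        self.n = len(a)
        self.table = [list(a)]
        k = 1
        while (1 << k) <= self.n:
            prev = self.table[-1]
            half = 1 << (k - 1)
            self.table.append([min(prev[i], prev[i + half])
                               for i in range(self.n - (1 << k) + 1)])
            k += 1

    def query(self, i, j):
        """Minimum of a[i..j], inclusive."""
        k = (j - i + 1).bit_length() - 1
        row = self.table[k]
        return min(row[i], row[j - (1 << k) + 1])
```

In the setting above, rangeMin(A)[i, j] corresponds to query(i, j) over an array of candidate events keyed by trace order.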
Checking whether a given event e is in race with some event. We now have all the ingredients to describe our overall incremental algorithm for checking whether event e is in an optimistic sync-reversal race with some event of a given thread t (Algorithm 3). For this, we first initialize all the arrays {EIS_{t_1,t_2}}_{t_1,t_2 ∈ Threads(σ)} using a linear scan of the trace σ, and also perform the pre-processing for fast range-minima queries, spending overall O(T N) time. Then, we iterate over the events e′ of thread t that conflict with e, from earliest to latest. For each such event, we incrementally update the optimistic lock-closure set S and check whether it is lock-feasible. If so, we construct the abstract optimistic reordering graph G_S^Abs, check whether it is acyclic, and report a race if so. Theorem 4.2. Let σ be an execution, let e ∈ Events(σ) be a read or write event, and let t ∈ Threads(σ). The problem of checking whether there is an event e′ with th(e′) = t such that (e, e′) is an optimistic sync-reversal race can be solved in O((T^2 + L)LN) time.

Detecting All Optimistic Sync-Reversal Races
Given a trace σ, all the optimistic sync-reversal races in σ can now be detected by enumerating all events e and threads t and checking whether incrementalRaceDetection(e, t) reports a race. Our resulting algorithm OSR (Algorithm 4) runs in time O(T L(T^2 + L)N^2).
Theorem 4.3. Given a trace σ, the problem of checking whether σ has an optimistic sync-reversal data race can be solved in time O(T L(T^2 + L)N^2). Hardness of detecting optimistic sync-reversal races. We have, thus far, established that the problem of checking the existence of optimistic sync-reversal data races can be solved in quadratic time.
In the following, we also show a matching quadratic-time lower bound, thus establishing that our algorithm OSR is indeed optimal. The lower bound is conditional on the Strong Exponential Time Hypothesis (SETH), a widely believed conjecture. We use a fine-grained reduction from the orthogonal vectors problem, whose quadratic-time hardness holds under SETH [70]. The full proof of the following result is presented in Appendix B.8.
Theorem 4.4. Assume SETH holds. Given an arbitrary trace σ, the problem of determining whether σ has an OSR race cannot be solved in O(N^{2-ε}) time (where N = |Events(σ)|) for any ε > 0.
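For reference, the orthogonal vectors problem that the reduction starts from is easy to state, and a naive checker is quadratic in the number of vectors. The sketch below is illustrative only; the reduction itself (Appendix B.8) encodes such an instance as a trace whose OSR races correspond to orthogonal pairs.

```python
def has_orthogonal_pair(A, B):
    """Orthogonal Vectors: given two sets of d-dimensional 0/1 vectors,
    decide whether some a in A and b in B satisfy a . b = 0.
    Brute force: O(|A| * |B| * d)."""
    return any(all(x * y == 0 for x, y in zip(a, b))
               for a in A for b in B)
```

Under SETH, no O(n^{2-ε})-time algorithm exists for this problem [70], and the fine-grained reduction transfers that hardness to OSR-race detection.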

EVALUATION
We implemented our algorithm OSR in Java, using the Rapid dynamic analysis framework [4]. We evaluate the performance and precision of OSR on 153 benchmarks and compare it with prior state-of-the-art sound predictive race detection algorithms. We discuss our experimental setup in Section 5.1 and our evaluation results in Sections 5.2, 5.3 and 5.4.

Experimental Setup
Benchmarks. Our evaluation subjects are both Java (Category-1) and C/C++/OpenMP (Category-2) benchmarks. Category-1, derived from [45], contains 30 Java programs from the IBM Contest benchmark suite [24], the Java Grande forum benchmark suite [63], DaCapo [13], SIR [22] and other standalone benchmarks. Category-2 contains 123 benchmarks from OmpSCR [23], DataRaceBench [39], DataRaceOnAccelerator [62], the NAS parallel benchmarks [11], CORAL [5,6], ECP proxy applications [1] and the Mantevo project [2]. For an apples-to-apples comparison, we evaluate all compared techniques on the same execution trace, to remove bias due to the thread scheduler. For this, we generate traces from these programs using ThreadSanitizer [64] (for Category-2) and RVPredict [47] (for Category-1). For Java programs, we generate one trace per program; for C/C++ programs, we generate multiple traces of the same program with different thread counts and input parameters. Each compared method is then run on each generated trace 3 times. We did not exclude any traces from the benchmarks, except one corrupted trace.
As part of our evaluation, we also explored synthetically created benchmark traces from RaceInjector [3,69], which uses SMT solving to inject data races into existing traces. However, the traces in [3] are short, could not distinguish most compared methods, and were not useful for a conclusive evaluation. Our evaluation on these traces is deferred to Appendix C (Table 4). As observed in prior works [18,26,45,54], a large fraction of events in traces are thread-local; these do not affect the precision or soundness of race detection algorithms, but can significantly slow down race detection. Therefore, we filter out these thread-local events, as in prior work [34,45,54].
Compared methods. We compare OSR with state-of-the-art sound predictive algorithms: WCP [34], SHB [42], M2 [54] and SyncP [45]. Amongst these, SHB and WCP are partial-order-based methods and run in linear time. M2 and SyncP are closer in spirit to ours: they first identify a set of events and then a linearization of this set that can witness a data race. SyncP works in linear time, while M2 has the higher polynomial complexity of O(N^4 log(N)) [54]. For all these algorithms, we use the publicly available source code [34,42,45,54]. To achieve a fair comparison, we modify each of them so that (1) each algorithm reports on the same criteria (events vs. memory locations vs. program locations) and (2) any redundant operations not relevant to the reporting criteria are removed. A comparison with the recent work SeqCheck [18] was not possible because its implementation is neither publicly available nor could it be obtained even after contacting the authors. Our evaluation does not include solver-aided race predictors such as RVPredict [31]. Based on prior work [34], such predictors are known not to scale, have unpredictable race reports, and typically have lower predictive power than even the simplest race prediction algorithms, owing to the windowing strategy they implement.
Machine configuration and evaluation settings. The experiments were conducted on a 2.0GHz 64-bit Linux machine. For Category-1 (Java) benchmarks, we set the JVM heap size to 60GB and the timeout to 2 hours; this setup is similar to previous works [34,45], except for the larger heap space, mandated by the larger memory requirement of M2. For Category-2 (C/C++) benchmarks, we set the heap size to 400GB and the timeout to 3 hours, since these benchmarks are much more challenging: the number of events, locks and variables in them is typically 10−100× larger than in Category-1 traces. All experiments are repeated 3 times and the reported times are averaged over these 3 runs.
Reported metrics. Our evaluation aims to assess the prediction power (precision) as well as the scalability of OSR, and how it compares against existing state-of-the-art race prediction techniques. For each execution trace σ, we report key characteristics (number of events, threads, locks, read events, write events, acquire events and release events) to estimate how challenging each benchmark is. Next, we measure and report the following.
Running time. For each algorithm, we report the average running time (over 3 trials) for processing the entire execution. This is aimed at understanding whether the worst-case quadratic complexity of OSR affects its performance in practice, or whether it is on par with linear-time methods such as WCP, SHB and SyncP.
Race reports in Category-1. For benchmarks in Category-1, we report the number of racy events; an event e_2 is racy if there is a conflicting event e_1 earlier in the trace such that (e_1, e_2) is a race. We also report the number of distinct source code lines for these racy events. We note that one racy source code line can correspond to many racy events.
Race reports in Category-2. For benchmarks in Category-2, we report the number of variables (memory locations) that are racy. A variable x is racy if there is a racy event e that accesses x. The number of racy events in the C/C++ benchmarks is typically very large, and reporting each racy event throttles nearly all algorithms. If a compared method times out, we report the number of racy variables found before the timeout. This lets us better evaluate each method's ability to find races in a reasonable setting; besides, most algorithms report many races before they time out.
Scaling behavior of OSR. OSR runs in worst-case quadratic time. We empirically evaluate how OSR scales with trace length on a small set of benchmarks, to gauge its in-practice behavior.
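The three reporting granularities above (racy events, racy source lines, racy variables) can be illustrated with a short sketch. The triple format and the `summarize` helper are hypothetical, not part of any compared tool; the point is only that one racy source line or variable may correspond to several racy events.

```python
# Hypothetical race reports: (event_id, source_line, variable) triples.
def summarize(race_reports):
    racy_events = {eid for (eid, _, _) in race_reports}   # Category-1 metric
    racy_lines = {line for (_, line, _) in race_reports}  # distinct code lines
    racy_vars = {var for (_, _, var) in race_reports}     # Category-2 metric
    return len(racy_events), len(racy_lines), len(racy_vars)

reports = [
    (3, "Foo.java:17", "x"),
    (9, "Foo.java:17", "x"),  # same source line, different racy event
    (12, "Bar.java:4", "y"),
]
print(summarize(reports))  # (3, 2, 2): 3 racy events, 2 lines, 2 variables
```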

Evaluation Results For Java Benchmarks
Table 1 summarizes the results for Category-1.
Prediction power. OSR reports the largest number of races on each trace; it reports about 200 more racy events and 3 more racy locations than the second most predictive method (SyncP); we remark that any extra data race can be an insidious bug [15] and deserves rigorous attention from developers. Although WCP can, in principle, detect sync-reversal races, it reports far fewer races than OSR (and also misses races reported by SyncP). M2 takes much more memory and time than OSR, times out on two benchmarks (linkedlist and lufact), and runs out of memory on the benchmark tsp. On the other benchmarks, OSR demonstrates the same prediction power as M2. Overall, M2 detects 29.2k fewer races. In terms of racy source code locations, OSR reports 24, 47, 3 and 13 more than SHB, WCP, SyncP and M2, respectively. We remark that this class of benchmarks does not bring out the full potential of OSR: even though OSR reports the highest number of races individually for each benchmark, at least one other method also reports this number of races. Category-2 does better justice to OSR.
Running time. SHB and WCP are lightweight partial-order-based linear-time algorithms and finish fastest. On the other hand, M2 performs an expensive computation, times out on some large traces, and takes more than 6 hours to finish. SyncP runs in linear time, but our algorithm OSR outperforms it by about 1.5×. We note that the linkedlist benchmark is especially challenging, with a large number of variables, as a result of which SyncP allocates a large amount of memory to account for its heavy data structure usage.
Thus, for Category-1 benchmarks, OSR demonstrates the highest race coverage, and runs faster than the state-of-the-art SyncP.

Evaluation Results For C/C++ Benchmarks
Table 2 summarizes our evaluation over Category-2 (C/C++) benchmarks. In Appendix C, we present detailed statistics of these benchmarks (see Table 6 and Table 5).
Prediction power. OSR displays high race coverage on this set of traces. Overall, OSR reports 2.5× more races than the second most predictive method (SHB). On all but 5 of the 118 benchmarks, OSR reports the highest number of racy variables. Each of the remaining 5 benchmark traces has a large number of events, and only the lightweight algorithms (SHB and WCP) finish on them within the 3-hour time limit. In terms of total races found, OSR reports 2.5× and 2.7× more races than SHB (2nd highest) and WCP (3rd highest), respectively. SyncP and M2 time out on most benchmarks. We speculate that this is because both methods have high memory requirements and spend a large amount of time in garbage collection. OSR, therefore, has the highest race coverage on the C/C++ benchmarks as well.
We remark that the number of racy variables in this class of benchmarks is very high. We speculate this is because our instrumentation using ThreadSanitizer does not explicitly tag atomic operations. Further, many benchmarks perform matrix operations, giving rise to many distinct memory locations. Nevertheless, we choose to report all races because data races can render these programs potentially non-robust, and under weak memory consistency, data races can lead to undefined semantics.
Running time. Overall, SHB runs the fastest. SyncP and M2, on the other hand, frequently time out. The difference in performance between SyncP, M2 and OSR is exacerbated on the C/C++ benchmarks because these contain much larger execution traces than the Java benchmarks. The performance of OSR (total running time of 42 hours) is close to that of WCP (30 hours). OSR therefore achieves a desirable balance between predictive power and scalability: it has the highest predictive power among SHB, WCP, SyncP and M2, and often runs faster than the more exhaustive techniques.

Scalability
In this section, we take a closer look at the run-time behavior of OSR to understand its unexpectedly high scalability on some benchmarks. We select the most challenging benchmarks from each of the following groups in Category-2: HPCBench, CoMD, DataRaceBench and OMPRacer. For these benchmarks, we measure the time to process every million events and report it in Figure 6. We observe that on these four benchmarks, OSR scales linearly for a large prefix, while gradually slowing down on two of them. The near-linear behavior of OSR is likely an artefact of the fact that many of these benchmark traces have a large number of data races, so the race check for a single event succeeds quickly instead of requiring worst-case linear time. Therefore, instead of spending overall quadratic time, OSR spends linear time on average.
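The per-million-events measurement above can be sketched as follows. This is a minimal illustration of the methodology, not OSR's implementation; `process_event` stands in for the per-event race check and is hypothetical.

```python
import time

# Stream a trace and record the wall-clock time spent on each block of
# `block` events. Roughly constant per-block times indicate near-linear
# scaling; growing per-block times indicate super-linear behavior.
def profile(events, block=1_000_000, process_event=lambda e: None):
    times, start, count = [], time.perf_counter(), 0
    for e in events:
        process_event(e)
        count += 1
        if count % block == 0:
            now = time.perf_counter()
            times.append(now - start)  # seconds spent on this block
            start = now
    return times

print(profile(range(3_000_000), block=1_000_000))
```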

RELATED WORK
Dynamic predictive analysis. Happens-before (HB) [37] based race detection [26,55] has been adopted by mature tools [49,64], and has subsequently been strengthened to SHB [42] so that all reported races are sound. Causal Precedence (CP) [65] and Weak Causal Precedence (WCP) [34] weaken HB in favor of predictive power, and run in polynomial and linear time, respectively. Other works such as DC [56,58] and SDP [29] are also partial-order-based methods that are either sound by design or perform graph-based analysis to regain soundness. SyncP [45], M2 [54] and SeqCheck [18] work similarly to OSR, by constructing an appropriate set of events and an appropriate linearization over this set. SMT-solver-backed approaches [31,60] aim for sound and complete race prediction but do not scale to moderately large execution traces. The complexity of data race prediction was extensively studied in [44] and shown to be NP-hard and also W[1]-hard, implying that an FPT algorithm (parameterized by the number of threads) for race prediction is unlikely. The fine-grained complexity of HB and SyncP was studied in [36]; in practice, HB can be sped up using the tree clock data structure [43]. Predictive analyses have also been developed for deadlocks [33,68], atomicity violations [46,66], and more general temporal specifications [10], and have more recently been investigated through the lens of generalizing trace equivalence [25].
Other concurrency testing approaches. Static analysis techniques employ forms of lockset-style reasoning [61] to report data races [14,38,51,73], but are known to report false positives. Model checking techniques for concurrent software [8,35,52] have been employed to detect concurrency bugs [30,53]. Another class of systematic exploration techniques is controlled concurrency testing [9,21], including approaches that employ randomization [17,40,72] and state-based learning [50]. More recently, feedback-driven randomized techniques have been employed for testing concurrent programs [32,71]. Randomization has also been shown to reduce the time overhead of dynamic data race detection [16,41,67].

CONCLUSIONS AND FUTURE WORK
We propose OSR, a sound polynomial-time race prediction algorithm that identifies data races that can be witnessed by optimistically reversing synchronization operations. OSR significantly advances the state-of-the-art in sound dynamic data race prediction. OSR-style reasoning may also prove helpful for exposing other concurrency bugs such as deadlocks [33,68] and atomicity violations.

ACKNOWLEDGMENTS
This work is partially supported by the National Research Foundation, Singapore, and the Cyber Security Agency of Singapore under its National Cybersecurity R&D Programme (Fuzz Testing <NRF-NCR25-Fuzz-0001>), and by a research grant (VIL42117) from VILLUM FONDEN. Any opinions, findings and conclusions, or recommendations expressed in this material are those of the author(s) and do not reflect the views of the National Research Foundation, Singapore, or the Cyber Security Agency of Singapore.
A PROOFS FROM SECTION 3
A.1 Proof of Theorem 3.1
Theorem 3.1. Let σ be a trace, let e_1, e_2 be conflicting events and let S ⊆ Events(σ) be an optimistically lock-closed set. The problem of determining whether there is a correct reordering ρ such that Events(ρ) = S and both e_1 and e_2 are enabled in ρ is NP-hard.
We prove this theorem by establishing the following stronger Theorem A.1, which states that the problem of determining whether the smallest optimistically lock-closed set can be linearized is NP-hard.
Theorem A.1. Let σ be a trace, let e_1, e_2 be conflicting events, and let S = OLClosure(e_1, e_2). The problem of determining whether there is a correct reordering ρ s.t. Events(ρ) = S is NP-hard.
The high-level idea behind our proof is inspired by [44], which shows that the problem of checking whether a given pair of conflicting events is a predictable data race is NP-hard. In [44], the proof proceeds by first showing that an intermediate problem, namely RF-poset realizability, is NP-hard. An instance of this problem is a triple P = (X, P, RF), where X is a set of read, write, acquire and release events, P ⊆ X × X is a partial order on X, and RF is a function that maps every read event r ∈ X to a unique write event w ∈ X on the same memory location. P is a positive instance of RF-poset realizability if there is a linearization ρ of X that respects P ∪ {(w, r) | w = RF(r)} and also ensures that, between any read (resp. release) event r and its corresponding write (resp. matching acquire) event w = RF(r), there is no other write event on the same memory location (resp. lock) as w. In [44], the NP-hardness of RF-poset realizability is established via a reduction from INDEPENDENT-SET(c), which is the problem of checking whether an input graph G has an independent set of size at least c. Following this, [44] establishes a reduction from RF-poset realizability to the race prediction problem.
Our proof is inspired by this, but is a direct reduction from INDEPENDENT-SET(c) to our problem: given a trace σ and a pair (e_1, e_2), determine whether there is a correct reordering containing exactly the events OLClosure(e_1, e_2). Given an input graph G (an instance of INDEPENDENT-SET(c)), we construct a trace σ with two events e_1 and e_2 as follows. We first construct an intermediate RF-poset instance P by slightly modifying the RF-poset instance constructed by [44], ensuring that P is realizable iff the graph G has an independent set of size ≥ c. Starting with P, we then construct a trace σ with two specific events e_1, e_2 such that P can be realized iff there is a correct reordering of σ for which OLClosure(e_1, e_2) can be linearized.
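The source problem of this chain of reductions can be stated concretely. The brute-force checker below is only an illustration of INDEPENDENT-SET(c), the NP-hard problem the reduction starts from; it is not part of the construction.

```python
from itertools import combinations

# Brute-force INDEPENDENT-SET(c): does graph G, given as a vertex list and an
# edge list, contain a set of at least c pairwise non-adjacent vertices?
def has_independent_set(vertices, edges, c):
    edge_set = {frozenset(e) for e in edges}
    return any(
        all(frozenset(pair) not in edge_set
            for pair in combinations(subset, 2))
        for subset in combinations(vertices, c)
    )

# A 4-cycle has an independent set of size 2 (opposite corners) but not 3.
V, E = [1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)]
print(has_independent_set(V, E, 2), has_independent_set(V, E, 3))  # True False
```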
Proof. Given an INDEPENDENT-SET(c) instance on graph G, we encode an RF-poset realizability instance P as follows. The set of events X belongs to 2c + 2 threads t_1, t_2, ..., t_{2c+2}, and we describe the total order of events in each thread t_i next.
See an example of the reduction outlined above in Figure 7. Each variable in P is written once, so the reads-from relation RF is clear. The partial order P is the thread order induced by the threads t_1, ..., t_{2c+2}. We remark that the poset P is almost identical to the one in [44] (call it P′), except for the extra read and write events on the variables {y_1, ..., y_{2c+1}} at the end of each thread in P. We use X_{P′} to denote the subset of events that belong to P′.
Let us first argue that P is realizable iff P′ is realizable. If P′ is realizable, then there is a linearization ρ′ of X_{P′} that preserves the thread order and the reads-from relation of the events in X_{P′}. Consider the trace ρ = ρ′, w(y_1), ..., w(y_{2c+1}), r(y_1), ..., r(y_{2c+1}), where we omit the obvious thread identifiers of the events on {y_1, ..., y_{2c+1}}. Clearly, ρ witnesses the realizability of P. Conversely, if P is realizable using a linearization ρ, it is easy to argue that the linearization ρ′ obtained by removing the events on the memory locations {y_1, ..., y_{2c+1}} witnesses the realizability of P′. It thus also follows that G is a positive instance of INDEPENDENT-SET(c) iff P is realizable.
Let us now construct the trace σ and complete our reduction. The set of events of the trace will be Events(σ) = X ⊎ {w_1(a), w_2(a)}, where a is a fresh memory location, w_1(a) writes to a as the new last event of t_{2c+1}, and w_2(a) writes to a as the new last event of t_{2c+2}. Observe that in P, every write event in thread t_i is read by events in t_{i+c} for 1 ≤ i ≤ c. We let σ_i be an arbitrary interleaving of t_i and t_{i+c} that respects the thread order and the reads-from relation.
We construct the trace σ as follows.
First, it is clear that OLClosure(w_1(a), w_2(a)) = Events(σ). Thus, OLClosure(w_1(a), w_2(a)) can be linearized iff P can be realized. Consequently, the input graph G has an independent set of size ≥ c iff (w_1(a), w_2(a)) is witnessed as a race of σ using OLClosure(w_1(a), w_2(a)). Finally, it is clear that the construction takes time polynomial in the size of the graph G. It thus follows that the problem of checking, for a given trace σ and a pair of conflicting events (e_1, e_2) in σ, whether there is a correct reordering ρ of σ with Events(ρ) = OLClosure(e_1, e_2), is also NP-hard. □
B PROOFS FROM SECTION 4
B.1 Proof of Lemma 4.1
Lemma 4.1. Let (e_1, e_2) be a conflicting pair of events in trace σ. If (e_1, e_2) is an optimistic sync-reversal race, then it can be witnessed in an optimistic correct reordering ρ with Events(ρ) = OLClosure(e_1, e_2).
Proof Sketch. Given σ and two conflicting events e_1, e_2, let S = OLClosure(e_1, e_2) and let S′ be an arbitrary optimistically lock-closed set. Next, we show that for each backward edge (u, v) in G^Opt_S, there is a path from v to u in G^Opt_S; here, by a backward edge we mean that v ≤tr u. Notice that since (u, v) is a backward edge, it must be that u = last(rel(ℓ))_S is the last release of a lock ℓ and v = acqO(ℓ)_S is an unmatched acquire in S, for some ℓ. We first establish that v must indeed also be unmatched in ρ. Since S is optimistically lock-closed, it must be that {e_1, e_2} ∩ TRClosure(match_σ(v)) ≠ ∅. If, on the contrary, match_σ(v) ∈ S, then the fact that S is (≤TO, rf_σ)-closed would give {e_1, e_2} ∩ S ≠ ∅, which would contradict the fact that S is optimistically lock-closed. Thus, v is unmatched even in ρ. Further, ρ orders all events of the same thread as in σ; hence ρ respects ≤TO. Second, for every read event r, its corresponding writer rf_σ(r) is in S, and is ordered before r in the graph G^Opt_S, and every other conflicting write w′ is either after r or before rf_σ(r) in G^Opt_S, and thus also in ρ. Finally, lock semantics are preserved since S is lock-feasible, the matched critical sections are totally ordered, and the unmatched acquire is ordered after every other release of the same lock. Moreover, ρ is an optimistic reordering because the order of matched critical sections and the order of conflicting events is preserved, as these are explicit edges in G^Opt_S. Now assume that there is an optimistic reordering ρ of σ with Events(ρ) = S. We argue that for every edge (f_1, f_2) of G^Opt_S, f_1 is ordered before f_2 in ρ. We then prove S ⊆ S′. We consider an arbitrary run of Algorithm 1 computing S = ComputeOLClosure(e_1, e_2, ∅), and construct another valid run of Algorithm 1 computing S′ = ComputeOLClosure(e_1, e′_2, ∅). During this process, we prove that after i iterations, S_i ⊆ S′_i for all i.
For the third conclusion, we can take S as a starting point and call ComputeOLClosure(e_1, e′_2, S) to compute S′, since S ⊆ S′. Following the proof of the second conclusion, this takes Õ(|S′| − |S|) time. It remains to show that ComputeOLClosure(e_1, e′_2, S) returns S′. We show this by induction, denoting by S′_i the value of S′ after i iterations of Algorithm 1. The problem of determining whether (e_1, e_2) is an optimistic sync-reversal race can be solved in O(T(TN + L)) = O(N) time.
Proof. For given e_1, e_2, to determine whether they constitute an OSR race, we first compute their optimistic lock closure, check for lock-feasibility, and then build the abstract graph to check for cycles. We have shown in Section 4.1 that for any given e_1, e_2, OLClosure(e_1, e_2) can be computed in O(T^2 N) time. Lock-feasibility can be checked in O(TL) time.
To build the graph, we first add all vertices and backward edges. Then, we compute the earliest successors of each vertex in the graph and add the corresponding forward edges. The abstract graph contains at most 2L nodes by definition. As shown in Section 4.2, it takes O(L) time to add all backward edges and O(T^2 L) time to add all forward edges. Checking for cycles in the graph takes O(L + L^2) time, as there are at most O(L) vertices and O(L^2) edges. Therefore, building the graph and checking for a cycle take O(L + L + T^2 L + L^2), i.e., O(L(T^2 + L)) in total.
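The final step above is a plain cycle test on the abstract graph. A minimal sketch of such a test, using an iterative three-color depth-first search over an adjacency map (the graph encoding is illustrative, not OSR's internal representation):

```python
# Detect whether a directed graph (adjacency map: node -> list of successors)
# contains a cycle. Runs in time linear in vertices + edges, matching the
# O(L + L^2) bound for a graph with O(L) vertices and O(L^2) edges.
def has_cycle(graph):
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited / on DFS stack / finished
    color = {v: WHITE for v in graph}
    for root in graph:
        if color[root] != WHITE:
            continue
        stack = [(root, iter(graph[root]))]
        color[root] = GRAY
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if color[nxt] == GRAY:
                    return True        # back edge to the DFS stack: cycle
                if color[nxt] == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:                      # all successors explored
                color[node] = BLACK
                stack.pop()
    return False

acyclic = {"a": ["b"], "b": ["c"], "c": []}
cyclic = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(has_cycle(acyclic), has_cycle(cyclic))  # False True
```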
Performing race detection on a given pair e_1, e_2 thus takes O(T^2 N + TL + L(T^2 + L)) time in total. □
Theorem 4.2. Let σ be an execution, let e ∈ Events(σ) be a read or write event, and let t ∈ Threads(σ). The problem of checking whether there is an event e′ with th(e′) = t such that (e, e′) is an optimistic sync-reversal race can be solved in time O((T^2 + L)LN).
Proof. Following Algorithm 3, the computation of ComputeOLClosure for each (e, e′) with th(e′) = t is equivalent to computing ComputeOLClosure for e and the last such e′ in thread t, which can be done in O(T^2 N) time.
We also need to check lock-feasibility, build the graph and check for cycles for each e′ in t. There are at most N such e′. The total time to do so is O(N(LT + L(T^2 + L))), i.e., O(NL(T^2 + L)).
In total, we need O(T^2 N + NL(T^2 + L)), i.e., O((T^2 + L)LN) time to check for all races (e, e′) with th(e′) = t. □
Proof. Following Algorithm 4, we iterate over all events and, for a fixed event e, we iterate over all threads. Therefore, Algorithm 3 is called at most O(TN) times. The total complexity to check for races is thus bounded by O(TN · (T^2 + L)LN), i.e., O(TL(T^2 + L)N^2) time. □
Proof Sketch. First, we show that if there is a cycle C in G^Abs_S, then there is a cycle C′ in G^Opt_S. For every edge u → v in C, if it is a forward edge, we replace it with the corresponding forward path from u to v; if u → v is a backward edge, we keep it as it is. After this substitution, the replaced C is a cycle C′ in G^Opt_S. Conversely, if there is a cycle C′ in G^Opt_S, then there is a cycle C in G^Abs_S. Considering the edges of C′, it must contain backward edges, for otherwise C′ could not be a cycle. Let E_b be the set of backward edges of C′ and V_b the set of nodes incident on E_b. We observe that V_b is a subset of the vertices of G^Abs_S, because the vertex set of G^Opt_S is a superset of that of G^Abs_S. Therefore, C can be constructed as follows. First, we keep all last-release and open-acquire events as nodes. Second, we add all backward edges of C′. Lastly, we replace every forward path (a path containing no backward edges) of C′ with a direct edge. We have thus constructed the cycle C in G^Abs_S. □
Orthogonal Vectors Hypothesis (OV). The Orthogonal Vectors problem is defined as follows. Given two sets A, B, each containing n d-dimensional 0-1 vectors, where d = ω(log n), determine whether there exist two vectors v_1 ∈ A, v_2 ∈ B such that (v_1, v_2) have an inner product of zero. The OV Hypothesis, a well-known and widely accepted conjecture, states that the OV problem cannot be solved in sub-quadratic time [20,36]. We now reduce the OV problem to the problem of checking the existence of an OSR race, and show that the problem of determining whether σ has an OSR race also has a quadratic lower bound, unless the OV Hypothesis fails.
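For concreteness, the Orthogonal Vectors problem defined above can be stated as a brute-force check. The quadratic pairwise scan below is exactly the algorithm that the OV Hypothesis conjectures cannot be beaten by a polynomial factor:

```python
# Orthogonal Vectors: given sets A and B of d-dimensional 0-1 vectors, decide
# whether some a in A and b in B have inner product zero (i.e., no index at
# which both vectors have a 1). Naive time: O(|A| * |B| * d).
def has_orthogonal_pair(A, B):
    return any(
        all(x * y == 0 for x, y in zip(a, b))
        for a in A for b in B
    )

A = [(1, 0, 1), (1, 1, 0)]
B = [(0, 1, 1), (0, 1, 0)]
print(has_orthogonal_pair(A, B))  # True: (1, 0, 1) and (0, 1, 0) are orthogonal
```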
Proof. Given two sets A, B of d-dimensional 0-1 vectors, we construct a trace σ as follows (shown in Figure 8). σ contains two threads t_A, t_B. As A, B are finite sets, we enumerate their elements as a_1, a_2, ... and b_1, b_2, .... For an arbitrary vector a_i = (a_i[1], ..., a_i[d]) containing m non-zero bits, we use a list (p_1, ..., p_m) to denote the indices of the non-zero entries of a_i. For example, the vector (0, 1, 0, 1) has non-zero bits [2, 4], as its 2nd and 4th bits are 1. We define an event clause associated with each vector a_i, and observe the total order ≤tr on σ. We now show that there is a pair of orthogonal vectors in A, B iff there is an OSR race in σ. If there is an OSR race (w_a(x), w_b(x)) in σ, the two events must correspond to vectors a ∈ A, b ∈ B. Since (w_a(x), w_b(x)) is a data race, the two events must be from different threads and their lock sets must be disjoint. Therefore, for all 1 ≤ i ≤ d, either a[i] = 0 or b[i] = 0; thus a and b are orthogonal.
If there is a pair of orthogonal vectors a, b, then we consider their clauses C_a, C_b. Let w_a(x), w_b(x) be the two write operations in C_a, C_b; we now show that (w_a(x), w_b(x)) is an OSR race. For convenience, let S = OLClosure(w_a(x), w_b(x)). The following observations hold.

Figure 2: The two write events e_1 = ⟨t_1, w(x)⟩ and e_5 = ⟨t_2, w(x)⟩ in σ_2 are conflicting. (e_1, e_5) is not a data race but a predictable data race of σ_2, witnessed by the correct reorderings ρ_2 and ρ′_2.
Correct reordering. Predictive race detection, given a trace σ, asks whether an alternate execution trace ρ witnesses a data race, and, more importantly, whether ρ can be inferred from σ. The notion of correct reorderings formalizes this precisely. Given well-formed traces σ and ρ, with Events(ρ) ⊆ Events(σ), we say that ρ is a correct reordering of σ if ρ respects the thread order and reads-from relations of σ. This means that (1) Events(ρ) is (≤TO, rf_σ)-closed, (2) for any two events e_1, e_2 ∈ Events(ρ), if e_1 ≤TO e_2, then e_1 ≤

Figure 5: In σ_3, (e_4, e_9) is not a predictable race. The optimistic reordering graph and the abstract optimistic reordering graph are cyclic.
Lemma 4.4 formalizes the intuition behind this graph: it preserves the cyclicity information of the larger graph G^Opt_S, because any cycle in G^Opt_S must involve a 'backward' edge from a matched release to an unmatched acquire event. G^Abs_S can thus be used to check for the existence of an optimistic reordering using an O(1) check instead of an O(N) check based on Lemma 4.2.
Lemma 4.4. Let σ be a trace and let S ⊆ Events(σ) be a (≤TO, rf_σ)-closed set. G^Opt_S has a cycle iff G^Abs_S has a cycle.
Example 10. Figure 4b shows the abstract optimistic reordering graph for trace σ_1 in Figure 1a, corresponding to the set S_1 = OLClosure(e_1, e_12); it contains the last release of lock ℓ in S_1 as well as the only open acquire in S_1. This graph, like the graph in Figure 4a, is acyclic. In Figure 5, the abstract graph (Figure 5c) captures the path e_2 → e_3 → e_7 → e_8 of Figure 5b with a direct edge e_2 → e_8, thereby preserving the cycle.
in the full graph G^Opt_{Events(σ)}; observe the subscript Events(σ) instead of an arbitrary set S. EIS_{t_1,t_2} can be computed as a pre-processing step in O(TN) = O(N) time and stored as an array, indexed by the events of thread t_1. Determining the forward edges of G^Abs_S. The forward edges of G^Abs_S summarize paths in G^Opt_S.

Figure 6: Time spent processing every million events for 4 selected traces.

Figure 7: Given a graph G and an independent set size of 2, our construction to show the NP-hardness of linearizing the OLClosure(e_1, e_2) event set.
By definition, S is the smallest optimistically lock-closed set and thus S ⊆ S′. Let G^Opt_S and G^Opt_{S′} be the optimistic reordering graphs of S and S′, respectively. We now show that if there is a cycle in G^Opt_S, then G^Opt_{S′} also has a cycle. First we consider the nodes of G^Opt_S, for any node v. Further, from the definition of G^Opt_{S′}, it follows that (v′, v) is an edge of G^Opt_{S′}. Now let v′ = last(rel(ℓ))_{S′}. We have v ≤tr v′, because S ⊆ S′. This means that there is a path in G^Opt_{S′} of the form v → v′ → u. In other words, all paths of G^Opt_S are preserved in G^Opt_{S′}, and thus so are its cycles. By definition, if (e_1, e_2) is an optimistic sync-reversal race, there is an optimistic correct reordering ρ s.t. the optimistic reordering graph of Events(ρ) has no cycle. We can then conclude that G^Opt_S also has no cycle, and thus the linearization of

B.6 Proof of Theorem 4.3
Theorem 4.3. Given a trace σ, the problem of checking if σ has an optimistic sync-reversal data race can be solved in O(TL(T^2 + L)N^2) = O(N^2) time.
B.7 Proof of Lemma 4.4
Lemma 4.4. Let σ be a trace and let S ⊆ Events(σ) be a (≤TO, rf_σ)-closed set. G^Opt_S has a cycle iff G^Abs_S has a cycle.

B.8 Proof of Theorem 4.4
Theorem 4.4. Assume SETH holds. Given an arbitrary trace σ, the problem of determining if σ has an OSR race cannot be solved in time O(N^{2−ε}) (where N = |Events(σ)|) for any ε > 0.

Figure 8: Given two sets A, B of vectors of length 2, our construction to show the quadratic hardness of OSR race detection.
(e_10, e_19) is a data race that M2 can predict.

Table 1: Evaluation on Category-1 (Java benchmarks). Columns 1-3 denote the name, number of events and number of threads for each benchmark. Columns 4-13 give the number of racy events (and racy program locations) reported and the average running time of each algorithm.

Table 2: Evaluation summary on Category-2 (C/C++ benchmarks). Benchmarks are grouped based on their source, and each row corresponds to one group. Column 1 denotes the source and size of each group. Columns 2 and 3 respectively denote the range and the total number of events in each group. Column 4 denotes the range of the number of threads in the benchmarks. Columns 5-14 denote the total number of racy memory locations and the average running time (in minutes) reported by each algorithm.
□
B.4 Proof of Theorem 4.1
Theorem 4.1. Let σ be a trace and let e_1, e_2 be conflicting events in σ. The problem of determining if (e_1, e_2) is an optimistic sync-reversal race can be solved in O(T(TN + L)) = O(N) time.
(1) w_a(x) ∈ Events(t_A) and w_b(x) ∈ Events(t_B), so that a ∈ A and b ∈ B. (2) S = {e | e ≤TO prev(w_a(x))} ∪ {e | e ≤TO prev(w_b(x))}, and for every lock ℓ ∈ Locks(S) there is at most one open acquire on ℓ, because a and b are orthogonal, so the clauses C_a and C_b do not hold the same lock; this proves that S is potentially feasible. (3) G^Opt_S has no cycles, as no direct edge leaves an acquire event except via thread order. Following the definition, it is then immediate that (w_a(x), w_b(x)) is an OSR race, and thus we have proved that there is a pair of orthogonal vectors in A, B iff there is an OSR race in σ. If the OV Hypothesis holds, then the problem of checking the existence of an OSR race has a quadratic lower bound. □

Table 3: Statistics of the Java benchmarks. N, T, V, L, Reads, Writes, Acq are the number of events, threads, variables, locks, read events, write events and acquire events after filtering, respectively.

Table 4: Summarized races and running time (in seconds) for the RaceInjector traces.

Table 5: Details of the reported races and running time (in minutes) for each algorithm on the C/C++ benchmarks. Columns 1-3 state the source of the benchmarks, the trace name with the number of threads, and the number of events after filtering. Columns 4-13 give the races reported and the average running time of each algorithm.

Table 6: Details of the C/C++ benchmarks. Columns 1-2 state the source of the benchmarks and the trace name with the number of threads. Columns 3-6 give the number of events before filtering, the number of events after filtering, the number of variables, and the number of locks. Columns 7-10 give the number of read, write, acquire and release events after filtering.