On-the-Fly Static Analysis via Dynamic Bidirected Dyck Reachability

Dyck reachability is a principled, graph-based formulation of a plethora of static analyses. Bidirected graphs are used for capturing dataflow through mutable heap data, and are the usual formalism of demand-driven points-to and alias analyses. The best (offline) algorithm runs in O(m + n·α(n)) time, where n is the number of nodes and m is the number of edges in the flow graph, which becomes O(n²) in the worst case. In the everyday practice of program analysis, the analyzed code is subject to continuous change, with source code being added and removed. On-the-fly static analysis under such continuous updates gives rise to dynamic Dyck reachability, where reachability queries run on a dynamically changing graph, following program updates. Naturally, executing the offline algorithm in this online setting is inadequate, as the time required to process a single update is prohibitively large. In this work we develop a novel dynamic algorithm for bidirected Dyck reachability that has O(n·α(n)) worst-case performance per update, thus beating the O(n²) bound, and is also optimal in certain settings. We also implement our algorithm and evaluate its performance on on-the-fly data-dependence and alias analyses, comparing it with the two best-known alternatives, namely (i) the optimal offline algorithm, and (ii) a fully dynamic Datalog solver. Our experiments show that our dynamic algorithm is consistently, and by far, the top-performing algorithm, exhibiting speedups on the order of 1000X. The running time of each update is almost always unnoticeable to the human eye, making it ideal for the on-the-fly analysis setting.


INTRODUCTION
Dyck reachability in static analysis. Dyck reachability is an elegant and widespread graph-based formulation of a plethora of static analyses. The reachability problem is phrased on labeled directed graphs G = (V, E), where V is a set of nodes and E is a set of edges, labeled with opening and closing parenthesis symbols. Given two nodes u and v, the task is to determine whether v is reachable from u by means of a path P, such that the sequence of labels of the edges of P produces a parenthesis string that is properly balanced [Yannakakis 1990]. In the static analysis domain, G serves as the program model, where nodes represent basic program constructs such as program variables, pointers, or statements, and edges capture data flow between these constructs. Effectively, program executions carrying data flow between distant program points are captured by paths in G. Naturally, as G is an approximate model of the program, it may have paths that do not correspond to any valid program execution, leading to spurious data flow and thus to false-positive warnings. In order to increase the precision of the analysis, parenthesis symbols are used to model certain restrictions in the data flow, such as sensitivity to calling contexts and field accesses [Reps 1997, 2000; Späth et al. 2019]. Focusing on Dyck reachability on G (as opposed to plain reachability) thus filters out paths that are guaranteed not to correspond to valid program executions, thereby increasing the precision of the analysis.
As a modeling formalism, Dyck reachability strikes a remarkable balance between simplicity and expressiveness, and has been used to drive a wide range of static analyses, such as interprocedural data-flow analysis [Reps et al. 1995], slicing [Reps et al. 1994], shape analysis [Reps 1995a], impact analysis [Arnold 1996], type-based flow analysis [Rehof and Fähndrich 2001], taint analysis [Huang et al. 2015], data-dependence analysis [Tang et al. 2017], and alias/points-to analysis [Lhoták and Hendren 2006; Xu et al. 2009; Zheng and Rugina 2008], to name only a few. In practice, widely-used analysis tools, such as Wala [Wal 2003] and Soot [Bodden 2012], employ Dyck-reachability techniques to perform the analysis. From a complexity perspective, answering a single Dyck-reachability query "is u reachable from v?" takes O(n³) time, where n is the number of nodes of G, and although some slight improvements are possible [Chaudhuri 2008], this cubic dependency is often considered prohibitive [Heintze and McAllester 1997].
Bidirectedness. One important and practically motivated variant of Dyck reachability is that of bidirectedness. Intuitively, a bidirected graph G has the property that every (directed) edge is present in both directions with complementary labels: an edge from u to v labeled with the opening parenthesis (_i is present in G if and only if the edge from v to u labeled with the matching closing parenthesis )_i is also present in G. Bidirectedness makes reachability an equivalence relation: if v is reachable from u via a properly balanced path, the same path backwards (i.e., from v to u) is also properly balanced, thereby witnessing the reachability of u from v. Thus, the nodes of G can be partitioned into equivalence classes of inter-reachable nodes, called Dyck Strongly Connected Components (or DSCCs, for short); Figure 1 shows an example. From a semantics perspective, bidirectedness has been a standard approach to handle mutable heap data [Lu and Xue 2019; Sridharan and Bodík 2006; Xu et al. 2009; Zhang and Su 2017] (though it can sometimes be relaxed for read-only accesses [Milanova 2020]), and the de-facto formulation of demand-driven points-to analyses [Shang et al. 2012; Sridharan et al. 2005; Vedurada and Nandivada 2019; Yan et al. 2011; Zheng and Rugina 2008]. From an algorithmic perspective, bidirectedness allows Dyck reachability to be computed more efficiently [Yuan and Eugster 2009; Zhang et al. 2013], with the fastest algorithm running in O(m + n·α(n)) time [Chatterjee et al. 2018], where n is the number of nodes, m is the number of edges, and α(n) is the inverse Ackermann function. Compared to the cubic bound of directed Dyck reachability, bidirectedness thus allows for a speedup in the order of n². Due to this algorithmic benefit, bidirectedness also serves as an overapproximation of directed reachability, and has been suggested as a mechanism to speed up challenging static-analysis and verification problems [Ganardi et al. 2022a,b; Li et al. 2020].

[Figure 1 caption: The edges are labeled with the fields of ATree. For notational convenience, we only draw the edges labeled with closing parentheses; the reverse edges with opening parentheses are implied by bidirectedness. On the initial program (before the two updates), bidirected Dyck reachability reports four DSCCs.]
On-the-fly static analysis. In the everyday practice of program analysis, the analyzed code base is not static but rather subject to perpetual change; source-code lines are added and removed following patches, new modules and libraries, and bug fixes. In an even more demanding setting, lightweight static analyzers are embedded inside the IDE and run on-the-fly, so as to enable faster, more robust, and more productive development. As has been observed in the literature, a major computational challenge for static analyses is their apparent inability to adapt to these continuous changes efficiently [Arzt and Bodden 2014; Zadeck 1984]. Existing works approach this challenge mostly by tweaking the offline analyses and adapting them to the dynamic setting. Typically, such adaptations only focus on incremental updates (i.e., only the addition of code lines) and are based on some form of caching [Arzt and Bodden 2014; Burke and Ryder 1990; Liu et al. 2019; Pacak et al. 2020; Szabó et al. 2016]. Decremental updates (i.e., the deletion of code lines) are much more difficult to handle, intuitively due to the fixpoint nature of static analyses. As such, analyses that fully follow all code changes (both incremental and decremental) thus far offer no efficiency guarantees. This paper tackles this challenge of on-the-fly, fully dynamic analyses formulated as bidirected Dyck reachability in a rigorous and provably efficient way. Figure 1 illustrates the use of dynamic bidirected Dyck reachability for on-the-fly field-sensitive alias analysis on a small example.
Our contributions. We consider the problem of maintaining the DSCCs of a dynamic graph G, representing the on-the-fly static analysis of a program, where source-code changes are both incremental and decremental. In particular, an incremental update inserts an edge in G, while a decremental update deletes edges from G, and the task is to restore the DSCCs of G after each such update. To this end, we develop an algorithm DynamicAlgo, with guarantees as stated in the following theorem.

Theorem 1.1. Given a bidirected graph G of n nodes, DynamicAlgo correctly maintains the DSCCs of G across edge insertions and edge deletions, and uses at most O(n·α(n)) time for each update.
Observe that in general this is much faster than the optimal offline algorithm of [Chatterjee et al. 2018], as the latter spends time that is at least proportional to the number of edges m (which can be up to m = Θ(n²)). On the other hand, it can be easily shown that a single update (either edge insertion or deletion) can affect Θ(n) DSCCs. Thus DynamicAlgo is effectively optimal, at least in the natural setting where DSCCs have to be explicitly maintained at all times.
We have implemented our dynamic algorithm and evaluated its performance on Dyck graphs that arise in data-dependence analysis and alias analysis, on standard benchmarks. Our experiments show that our dynamic algorithm is consistently, and by far, the top-performing algorithm compared to (i) the optimal offline algorithm for the problem, and (ii) a fully dynamic Datalog solver, arguably the two most relevant approaches to our problem setting. In particular, our dynamic algorithm exhibits speedups between 100X-1000X for data-dependence analysis and around 1000X for alias analysis (which is also the more demanding of the two), compared to the best alternative. Although its theoretical worst-case complexity is linear in n, its average running time is barely (if at all) noticeable for all practical purposes, making it suitable for the on-the-fly analysis setting that runs continuously during development.
A note on earlier approaches. Dynamic bidirected Dyck reachability was studied recently in [Li et al. 2022]. To this end, that work claims a dynamic algorithm with O(n·α(n)) running time per update operation, similar to our Theorem 1.1. Unfortunately, both the complexity and the correctness claims of [Li et al. 2022] overlook certain cases. In particular, that algorithm exhibits Ω(n²) running time, as opposed to the claimed (nearly) linear bound. This quadratic behavior arises even on sparse graphs (i.e., with m = O(n) edges), for which the vanilla offline algorithm runs in O(n·α(n)) time. Hence on such graphs, processing a single line deletion in the source code is n times slower than performing the whole analysis from scratch. Moreover, that algorithm also suffers from correctness issues, as it fails to detect that certain nodes become unreachable after updates. In such cases, the on-the-fly static analysis returns wrong results. We provide further details in Appendix B.
Intuition behind our approach. At a high level, our approach works as follows. First, we observe that the component graph (i.e., the graph with a single node representing each DSCC) is sparse. Each edge insertion can then be handled by executing the offline algorithm of [Chatterjee et al. 2018] not from scratch, but directly on the already constructed component graph, yielding O(n·α(n)) running time. The main difficulty arises with edge deletions, as each deletion seemingly requires repeating all the computation from scratch, i.e., operating on the initial graph, which might be dense and thus incur O(n²) running time. We leverage techniques from dynamic undirected connectivity, together with some sparsification ideas that are specific to our richer bidirected setting, to start the recomputation not from scratch, but from a suitable preliminary component graph, called the primary component graph, which is already sparse, thereby recovering the O(n·α(n)) running time. Maintaining the primary component graph efficiently is the main technical challenge.

PRELIMINARIES
In this section we develop general notation on labeled graphs, Dyck reachability, and fully-dynamic reachability queries. As this is somewhat standard material, our exposition closely follows that of related works (e.g., [Chatterjee et al. 2018; Kjelstrøm and Pavlogiannis 2022; Zhang et al. 2013]).

Dyck Reachability on Bidirected Graphs
Dyck Languages. Hereinafter, we fix a natural number k ∈ N. Let [k] denote {1, . . . , k}. A Dyck alphabet is a set Σ = {(_i, )_i}_{i∈[k]} of k matched symbols, usually referred to as parentheses, where Open(Σ) = {(_i}_{i∈[k]} and Close(Σ) = {)_i}_{i∈[k]} are the sets of opening parenthesis symbols and closing parenthesis symbols of Σ, respectively. The Dyck language D is the set of strings over Σ produced by the grammar with initial non-terminal I and rules I → I I | (_i I )_i | ε, for each i ∈ [k]. Given a symbol α ∈ Σ, we write ᾱ for its complementary symbol, i.e., ᾱ is a closing parenthesis symbol if α is an opening parenthesis symbol, and vice versa. For example, (_1 (_2 )_2 )_1 ∈ D, whereas (_1 )_2 ∉ D.

Graphs and Dyck reachability. We consider labeled directed graphs G = (V, E), where V is a set of nodes and E is a set of labeled edges (u, v, λ), where u, v ∈ V and λ ∈ Σ ∪ {ε}. For notational convenience, given an edge e, we will refer to its label as λ(e). We occasionally are not interested in the label of an edge, in which case we simply denote it by its endpoints, e.g., e = (u, v). We also adopt the pictorial notation u -λ-> v for the edge (u, v, λ). The label λ(P) of a path P is the concatenation of the labels of its edges. A path can also be an empty sequence of edges, in which case its label is ε. We often write P : u ⇝ v to denote a path from node u to node v. Naturally, we say that v is reachable from u if such a path exists. Moreover, we say that v is Dyck-reachable from u if there exists a path P : u ⇝ v such that λ(P) ∈ D, in which case we call P a Dyck path.
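As a concrete illustration (ours, not from the paper), membership in D can be decided with a single stack-based scan: push opening parentheses, pop on matching closing ones, and accept iff the scan never mismatches and the stack ends empty. A minimal Python sketch, encoding (_i as ('(', i) and )_i as (')', i):

```python
def is_dyck(word):
    """Check membership in the Dyck language D over k parenthesis types.

    A symbol is a pair (kind, i): ('(', i) is the opening parenthesis of
    type i and (')', i) is its matching closing parenthesis.
    """
    stack = []
    for kind, i in word:
        if kind == '(':
            stack.append(i)
        elif not stack or stack.pop() != i:
            return False  # mismatched or unmatched closing parenthesis
    return not stack      # balanced iff nothing is left open
```

For instance, the empty string and (_1 (_2 )_2 )_1 are in D, while (_1 )_2 is not.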
In alignment with existing literature [Chatterjee et al. 2018; Zhang et al. 2013], we consider that the number k of parenthesis types is constant compared to the size of G (i.e., k = O(1)). From a theoretical standpoint this is not an oversimplification, as any Dyck graph with arbitrary k can be transformed to one with k = O(1), while preserving the reachability relationships (see, e.g., [Chistikov et al. 2022]).
Bidirected Graphs and Dyck SCCs. We call a graph G = (V, E) bidirected if every edge of E appears in both directions with complementary labels. Formally, for every label λ ∈ Σ, we have (u, v, λ) ∈ E if and only if (v, u, λ̄) ∈ E. Bidirectedness turns reachability into an equivalence relation, similarly to plain undirected graphs. To see this, notice that every path P : u ⇝ v has a reverse version P̄ : v ⇝ u, and λ(P) ∈ D if and only if λ(P̄) ∈ D. Hence, if P witnesses the Dyck reachability of v from u, then P̄ witnesses the Dyck reachability of u from v. Naturally, the nodes of G are partitioned into strongly connected components (SCCs), which are maximal sets of pairwise Dyck-inter-reachable nodes. We will call such sets Dyck strongly connected components, or DSCCs, for short. Note that, in contrast to standard SCCs, paths witnessing the Dyck reachability of two nodes in the same DSCC might have to traverse nodes outside the DSCC. For example, in Figure 1 (top right), the paths witnessing the Dyck inter-reachability of the two nodes of one DSCC go via a node that lies outside that DSCC. Given a node u, we let DSCC(u) be the DSCC in which u appears.
For notational convenience, hereinafter we represent bidirected graphs by only explicitly denoting their edges labeled with closing parentheses, with the understanding that the reverse edges (labeled with the matching opening parentheses) are also present. In our figures, when a graph contains edges with different labels, we color-code the edges for easier visual representation.
The efficiency of DSCC computation. In Dyck-reachability formulations of static analyses, the analysis queries are phrased as Dyck-reachability queries on the underlying graph. As we have seen above, the nodes of bidirected graphs are partitioned into DSCCs. Thus, reachability queries can be handled by first computing these DSCCs. Then, given a reachability query on two nodes u, v, we answer that they are inter-reachable iff they are found in the same DSCC, which costs a simple constant-time lookup. Computing Dyck reachability on general (non-bidirected) graphs takes cubic time O(n³) even for a single pair. On the other hand, it is known that the DSCCs of bidirected graphs can be computed efficiently.

Theorem 2.1 ([Chatterjee et al. 2018]). Given a bidirected graph G of n nodes and m edges, the DSCCs of G can be computed in O(m + n·α(n)) time, where α(n) is the inverse Ackermann function.
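The query step above is straightforward; as a small sketch of our own (function names hypothetical), once a DSCC partition is available, a reachability query reduces to one dictionary comparison:

```python
def build_reachability_index(dsccs):
    """Map each node to a canonical representative of its DSCC, so a
    Dyck-reachability query becomes a constant-time comparison.
    `dsccs` is any partition of the nodes, e.g., as output per Theorem 2.1.
    """
    rep_of = {}
    for comp in dsccs:
        rep = min(comp)  # any canonical member works as representative
        for v in comp:
            rep_of[v] = rep
    return rep_of

def dyck_reachable(rep_of, u, v):
    """u and v are Dyck-inter-reachable iff they share a representative."""
    return rep_of[u] == rep_of[v]
```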

Dynamic Reachability
Dynamic reachability on bidirected graphs. As the program source is being developed, the underlying graph model changes shape due to the addition and removal of nodes and edges. The goal of the fully dynamic setting is to maintain Dyck-reachability information on the fly under such changes, so that analysis queries can be performed fast. More formally, we process a sequence of operations S = (σ_1, σ_2, . . . , σ_t), where each operation σ_i has one of the following types.

(1) insert(u, v, λ): insert the edge (u, v, λ) (and its implied reverse edge) into the graph.
(2) delete(u, v, λ): delete the edge (u, v, λ) (and its implied reverse edge) from the graph.
(3) DSCCRepr(u): return a representative node of DSCC(u), common to all nodes of DSCC(u).

This sequence of operations creates a sequence of graphs G_1, G_2, . . ., where G_1 is the initial graph, and G_{i+1} is obtained from G_i by performing the operation σ_i (naturally, if σ_i = DSCCRepr(u) is a query operation, G_{i+1} = G_i). Note that we have kept the node set identical across all G_i. Indeed, this is a standard approach for studying such problems: we may simply assume that V contains all nodes that are ever added to the graph, and instead of deleting a node, we can delete all its adjacent edges. Finally, the reachability of v from u can be checked with two successive queries, i.e., testing whether DSCCRepr(u) = DSCCRepr(v).
Although in general there might be a spectrum of trade-offs between the time taken by operations modifying the graph (insert(u, v, λ), delete(u, v, λ)) and reachability queries (DSCCRepr(u)), the linear upper bound we establish here for updates means that query operations can be done in constant time O(1). This holds because, in theory, after every update operation we can easily, in O(n) time (which is within the upper bound we establish for processing update operations), record the DSCC of each node in a simple lookup table, so that subsequent queries take O(1) time each. In practice, of course, this approach is to be avoided, as dynamic updates cost much less than O(n) time, hence rebuilding the full lookup table after every update would dominate the overall cost.

Table 1. Data structures for dynamic reachability on undirected graphs.

Reference              | Update time | Query time | Guarantee
[Eppstein et al. 1997] | O(√n)       | O(1)       | Worst-case
[Holm et al. 2001]     | O(log² n)   | O(log n)   | Amortized

Dynamic reachability has been studied extensively for its plain directed and undirected variants. However, the richer semantics of Dyck reachability makes the Dyck setting quite more intricate. Our solution to the problem uses as a black box a data structure for dynamic reachability on undirected graphs. Although there exist several results in this direction, Table 1 shows two standard data structures that are relevant to our work. The one developed in [Eppstein et al. 1997] offers standard worst-case guarantees for the time to perform each update (i.e., inserting/deleting an undirected edge) and query (i.e., obtaining the component of a node u). The one developed in [Holm et al. 2001] offers amortized guarantees for the respective tasks. That is, some operations might exceed the stated time bounds, but this excess is balanced by later operations, so that the average time per operation is as stated. As we will see later, we use the former technique to establish our bound in Theorem 1.1, but the latter in our experiments.
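Since the connectivity structure is used purely as a black box, its interface is easy to pin down. Below is a naive stand-in of our own (names hypothetical): it supports the same insert/delete/component operations as the black box, but recomputes components by graph search on demand, so it is correct without meeting the bounds of Table 1.

```python
from collections import defaultdict

class NaiveDynamicConnectivity:
    """Stand-in for a dynamic undirected-connectivity data structure
    (e.g., [Eppstein et al. 1997] or [Holm et al. 2001]).

    Correct but naive: each query recomputes the component by DFS,
    so it matches the interface but not the stated time bounds.
    """
    def __init__(self):
        self.adj = defaultdict(set)

    def insert(self, u, v):
        self.adj[u].add(v)
        self.adj[v].add(u)

    def delete(self, u, v):
        self.adj[u].discard(v)
        self.adj[v].discard(u)

    def component(self, u):
        """Return the connected component of u as a frozenset."""
        seen, stack = {u}, [u]
        while stack:
            x = stack.pop()
            for y in self.adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return frozenset(seen)

    def connected(self, u, v):
        return v in self.component(u)
```

Swapping in a real worst-case or amortized structure only changes the running time, not the behavior observed by the Dyck-reachability layer.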

EFFICIENT DYNAMIC BIDIRECTED DYCK REACHABILITY
In this section we present our dynamic algorithm for bidirected Dyck reachability.As this is the main technical section of our paper, we provide here an outline of its structure to guide the reader.
(1) In Section 3.1 we present the key algorithmic intuition for solving Dyck reachability on bidirected graphs, and revisit the offline algorithm for the problem, as presented in [Chatterjee et al. 2018]. Some aspects of the offline algorithm (in particular, its fixpoint computation) will also be used later in our dynamic algorithm.
(2) In Section 3.2 we present the main technical challenges for handling dynamic updates efficiently, and the key concepts our algorithm uses for handling each challenge. This section is filled with examples that illustrate how each concept is used, and what data structure is used to support it.
(3) In Section 3.3 we present DynamicAlgo in detail, with pseudocode followed by a high-level description of each of its blocks. The pseudocode is also heavy in comments to guide the reader.
(4) In Section 3.4 we give a step-by-step execution of the algorithm on the running example of Figure 1.
(5) Finally, in Section 3.5, we establish the correctness and complexity of DynamicAlgo. We state its main invariants, together with some intuition about them and how they are used towards the correctness and complexity lemmas, while we delegate the proofs to Appendix A.

Offline Reachability
We start with the optimal offline algorithm for computing bidirected Dyck reachability [Chatterjee et al. 2018], i.e., for obtaining Theorem 2.1. This will allow us to build some intuition about the problem. Moreover, our data structure for handling graph updates later will use some components of the offline algorithm.
Intuitive description. The principle of operation of the algorithm behind Theorem 2.1 (initially observed in [Zhang et al. 2013]) is as follows. Suppose that two nodes u and x belong to the same DSCC, and that there are two edges u -)_i-> v and x -)_i-> y labeled with the same closing parenthesis. Then v and y also belong to the same DSCC, witnessed by the path v -(_i-> u ⇝ x -)_i-> y, where the intermediate path u ⇝ x is one witnessing the reachability of x from u; such a path exists since u and x belong to the same DSCC. In fact, any node v′ in the DSCC of v can reach any node y′ in the DSCC of y, via the path v′ ⇝ v -(_i-> u ⇝ x -)_i-> y ⇝ y′. Thus the DSCCs of v and y can be merged into one DSCC C. We can now repeat the above process to discover Dyck paths that go through C, and so on, up to a fixpoint. Figure 2 illustrates this process on a small example.
Algorithm OfflineAlgo. The above insights are made formal in Algorithm 1, which is an adaptation of the algorithm as it initially appeared in [Chatterjee et al. 2018]. The first insight that OfflineAlgo rests on is that, since DSCCs are equivalence classes that only merge together during the fixpoint computation, they can be maintained efficiently using a disjoint-sets data structure that supports Union/Find operations. The second insight is to use linked lists for storing the edges outgoing from each DSCC and labeled with a given symbol λ. When two DSCCs are merged into one, the algorithm also merges the corresponding linked lists, which can be done very efficiently, in O(1) time (using simple pointers). In particular, OfflineAlgo performs two big steps. First, it initializes a disjoint-sets data structure DisjointSets and a worklist Q (function Initialization() in Line 1). Then, it uses DisjointSets and Q to compute the fixpoint (function Fixpoint() in Line 2), achieving the running time of O(m + n·α(n)). As the correctness and complexity are established in [Chatterjee et al. 2018], we do not re-establish them here.
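To make the fixpoint concrete, here is a minimal Python sketch of our own (not the paper's pseudocode; all names are hypothetical). It deliberately trades the worklist and O(1) linked-list merging of OfflineAlgo for a simple rescan-until-stable loop, so it is correct but not O(m + n·α(n)):

```python
from collections import defaultdict

class DSU:
    """Disjoint-sets (Union/Find) over a fixed node set."""
    def __init__(self, nodes):
        self.parent = {v: v for v in nodes}

    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def union(self, a, b):
        self.parent[self.find(b)] = self.find(a)

def offline_dsccs(nodes, edges):
    """Unoptimized sketch of the fixpoint behind the offline algorithm.

    `edges` holds only the closing-parenthesis edges (u, label, v); the
    opening reverse edges are implied by bidirectedness.  Whenever one
    component has two same-label outgoing edges u -)i-> v and x -)i-> y,
    the components of v and y are merged.  We rescan all edges until no
    merge fires, then report the resulting node partition.
    """
    dsu = DSU(nodes)
    changed = True
    while changed:
        changed = False
        groups = defaultdict(set)  # (source component, label) -> target components
        for u, lab, v in edges:
            groups[(dsu.find(u), lab)].add(dsu.find(v))
        for targets in groups.values():
            targets = list(targets)
            for other in targets[1:]:
                if dsu.find(targets[0]) != dsu.find(other):
                    dsu.union(targets[0], other)
                    changed = True
    comps = defaultdict(set)
    for v in nodes:
        comps[dsu.find(v)].add(v)
    return list(comps.values())
```

For example, with edges u -)1-> v, u -)1-> w, v -)2-> x, w -)2-> y, the first pass merges v with w, which then enables merging x with y in the next pass, illustrating the cascading fixpoint.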

Main Concepts and Intuition
Having outlined the basic algorithm for offline reachability, we now turn our attention to the data structure for dynamic reachability. At a high level, our data structure will use the same components as OfflineAlgo; in particular, the representation of DSCCs using the DisjointSets data structure, the Edges linked lists for storing the outgoing edges of each component, and the Fixpoint() function for computing the DSCCs as a fixpoint after every graph update. However, instead of starting the computation from scratch, DisjointSets and Edges (as well as the worklist Q of Fixpoint()) will be at a state that represents some DSCCs that are guaranteed to exist in the updated graph, and thus need not be recomputed. The key technical challenge we have to solve is computing a non-trivial such state, and doing so efficiently. In this section we introduce the main concepts of our dynamic algorithm and provide the necessary intuition around them. The precise algorithm is presented in the next section.
High-level description of the dynamic algorithm. The main components that are new to our dynamic algorithm compared to the offline algorithm of Section 3.1 are geared towards handling edge deletions. Assume that we have computed the DSCCs of a graph G, and we now have to process an operation delete(u, v, λ). This will, in general, result in splitting some DSCCs into smaller ones. At a high level, we handle such an edge deletion in three steps.
(1) We compute a sound overapproximation of the set of DSCCs that are affected by the edge deletion, i.e., those that might have to be split into smaller components. We compute this overapproximation by effectively performing a forward search starting from DSCC(u) and repeatedly proceeding to neighboring DSCCs. In particular, given a current DSCC C, we proceed to those DSCCs C′ that have at least two incoming edges of the form x -λ-> y, with x ∈ C and y ∈ C′, for some label λ. This is because C′ might have been formed through C and the corresponding edges. Although this traversal might end up touching Θ(n) DSCCs, we expect that in practice it will perform much better, as the effect of an average edge deletion is usually very local.
(2) At this point, it would suffice to split all nodes y in the previously discovered DSCCs C′ into singleton components, gather in the worklist Q (see Algorithm 1) all pairs (x, λ) corresponding to incoming edges x -λ-> y, where y is the unique node of such a singleton component, and restart the fixpoint computation using Fixpoint(). Unfortunately, this would require Θ(n²) time, which is beyond our target bound, as there can be Θ(n²) such edges x -λ-> y, and Line 16 of Algorithm 1 would iterate over all of them. To circumvent this difficulty, we introduce the novel notion of primary DSCCs (or PDSCCs), which, intuitively, are connected components of small size. We keep track of the formation of PDSCCs dynamically by leveraging techniques from undirected dynamic reachability (see, e.g., Table 1). Now, instead of splitting each previously discovered DSCC C′ into singleton components, we split it into its PDSCCs. This allows us to re-initiate the fixpoint computation of the function Fixpoint() after traversing only O(n) edges of the form x -λ-> y (where y is now a node in a PDSCC). In practice, and due to the previous step, we typically traverse much fewer edges than n. In turn, this implies that Fixpoint() will converge after O(n) iterations, leading to the desired time bound.
We now describe the above concepts in detail.
Primal graphs and primary DSCCs. Consider a bidirected graph G = (V, E). The primal graph G_P = (V, E_P) is an unlabeled, undirected graph with the same node set, whose edge set E_P contains an edge (v, w) iff v and w are connected in G via a Dyck path of length 2. A primary DSCC (PDSCC) of G is a (maximal) connected component of the primal graph G_P. It is not hard to see that the PDSCC partitioning of G is a refinement of its DSCC partitioning, i.e., each DSCC contains one or more PDSCCs. See Figure 3 for an illustration. Similarly to the case of DSCCs, given a node u, we let PDSCC(u) be the PDSCC to which u belongs, while PDSCCRepr(u) returns a representative of PDSCC(u) that is common to all nodes of PDSCC(u). Hence, two nodes u, v are in the same PDSCC iff PDSCCRepr(u) = PDSCCRepr(v). Since PDSCCs are connected components of an undirected graph, we can use a data structure for dynamic undirected reachability (Table 1) to maintain them efficiently. PDSCCs serve the following function. When an edge is deleted, some DSCCs C of G have to be split into smaller DSCCs. This can lead to effectively repeating the fixpoint computation from scratch on G. However, this approach would require Θ(n²) running time, which is beyond our complexity bound. Instead, maintaining the PDSCCs allows us to split C into its PDSCCs (as opposed to individual nodes), and thus avoid recomputing the PDSCCs from scratch.
In turn, this allows us to achieve the desired O(n·α(n)) bound for edge deletions.
A sparsification approach for maintaining the PDSCCs. Since each PDSCC corresponds to a connected component of the undirected primal graph G_P, we will maintain PDSCCs dynamically by using any data structure for dynamic reachability on undirected graphs (see Table 1). We call this data structure for undirected reachability PrimCompDS.
An operation σ = insert(u, v, λ) inserts between 0 and n − 1 undirected edges in the primal graph G_P. Indeed, we have a primal edge (v, w) for every node w ≠ v for which the edge u -λ-> w is already present, and there may be n − 1 such nodes w. Note, however, that if there already exist two such nodes w and x, with u -λ-> w and u -λ-> x, then w and x already appear in the same PDSCC of G. Hence, we can faithfully maintain the PDSCCs of G by adding only one of the two primal edges (v, w) and (v, x) to PrimCompDS. This has the desirable effect of maintaining PDSCCs after edge insertions by inserting O(1) undirected edges in PrimCompDS, as opposed to O(n). This kind of technique is known as sparsification [Eppstein et al. 1997], in that we manage to represent the full connectivity of the primal graph G_P while only storing a subset of its edges. In particular, we achieve this effect by maintaining the outgoing edges of u labeled with λ in a linked list OutEdges[u][λ]. Conversely, an operation σ = delete(u, v, λ) deletes between 0 and n − 1 undirected edges of the primal graph G_P, with reasoning similar to the one above. However, because of the above invariant, we can restore the PDSCCs of G by removing the edges (w, v) and (v, x) from PrimCompDS, where w and x are the neighbors of v in OutEdges[u][λ]. Moreover, as w and x might no longer be connected in PrimCompDS, we restore their connectivity by inserting the edge (w, x) in PrimCompDS.
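The chain-instead-of-clique idea behind this sparsification can be sketched for a static graph as follows (our own illustration; names hypothetical). Two nodes are primal-adjacent iff some node u has same-label edges to both of them; instead of materializing the full clique on each target set, chaining consecutive targets preserves exactly the same connected components with at most one primal edge per graph edge:

```python
from collections import defaultdict

def sparse_primal_edges(edges):
    """Build a sparse edge set with the same connectivity as the primal graph.

    `edges` holds the closing-parenthesis edges (u, label, v).  Nodes w, x
    are primal-adjacent iff some u has same-label edges to both, i.e.,
    they are joined by a Dyck path of length 2: w -(_i-> u -)_i-> x.
    Chaining the targets of each (u, label) pair, rather than forming a
    clique, keeps the number of stored primal edges linear.
    """
    targets = defaultdict(list)
    for u, lab, v in edges:
        targets[(u, lab)].append(v)
    primal = []
    for ts in targets.values():
        for a, b in zip(ts, ts[1:]):  # consecutive pairs: a chain, not a clique
            primal.append((a, b))
    return primal
```

Feeding these chained edges into the undirected connectivity structure yields the PDSCCs as its connected components; the dynamic maintenance described above keeps exactly this invariant under insertions and deletions.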
Figure 4 illustrates this sparsification technique on a small example. An update insert(u, v, λ) creates two edges in the primal graph, but the second (dashed) edge is not stored explicitly in PrimCompDS, as its endpoints are already connected. An update delete(u, v, λ) removes two primal edges; to preserve the connectivity of the remaining neighbors, a dashed edge between them is restored in PrimCompDS.

Maintaining the neighbors of PDSCCs. The maintenance of the PDSCCs allows us to efficiently identify the incoming neighbors of a PDSCC, in O(n) worst-case time, even though the graph might have Θ(n²) edges. This fact stems from the following observation. For a given node x and label λ, if there are two edges x -λ-> y and x -λ-> z, then y and z are in the same PDSCC. In other words, for every such x and λ, there is at most one PDSCC that x is a neighbor of via a λ-labeled edge. Thus, given a set of PDSCCs (as described in Item (2) above), we can obtain all edges incoming to them in O(n) time, by iterating over all nodes x and labels λ. Though this is sufficient for our linear time bound, it can become unnecessarily slow when the number of such PDSCCs is small. Instead, we follow a different approach, which retains the O(n) worst-case behavior but is faster in practice.

Processing a delete operation. We are now in position to outline our approach for processing an operation delete(u, v, λ). As the edge u -λ-> v might be used to connect u to other nodes in DSCC(u), this operation might split DSCC(u) into smaller DSCCs. As nodes become disconnected in DSCC(u), other DSCCs might also be split, if the paths connecting their nodes passed through DSCC(u), resulting in a cascading effect (see, for instance, Figure 1). Thus, a single edge deletion may have an effect that propagates to the whole graph; however, we can obtain a sound overapproximation of the DSCCs affected by the initial edge deletion using the following observation. If the split of a DSCC C leads to the immediate split of another DSCC C′, then there exist two distinct edges x_j -λ-> y_j, for j ∈ [2], such that x_j ∈ C and y_j ∈ C′. Thus, for delete(u, v, λ), we can obtain our overapproximation by starting a forward search from DSCC(u): given a current DSCC C, we iterate over all labels λ such that C has at least two λ-labeled outgoing edges, and proceed to the resulting DSCC C′ (note that there can be at most one such C′).
After the overapproximation of the set of potentially splitting DSCCs has been computed, each such DSCC C is split into its constituent PDSCCs, using the undirected connectivity data structure PrimCompDS. Note that this splitting is an underapproximation of the set of final DSCCs, as some PDSCCs may have to be merged again. We discover this by running the fixpoint computation again, up to convergence; for this, we re-insert the appropriate pairs into the worklist Q (see function Fixpoint() in Algorithm 1).

Example. Figure 6 illustrates the above process on a small example, processing an update delete(u, v, λ). Before the deletion (top), the graph has four DSCCs, C_1, C_2, C_3, C_4 (marked). In the first step, we perform a forward search from C_2, following pairs of same-label edges, which discovers C_2, C_3, and C_4 as the set of DSCCs that are potentially affected by the delete operation. Observe that C_1 does not have to be split, and the algorithm avoids recomputation on C_1. Then, C_2, C_3, and C_4 are split into their constituent PDSCCs (middle). Observe that C_4 is only partly split, as a subset of its nodes still forms a PDSCC in the new graph (after the deletion); hence the algorithm avoids recomputing C_4 from scratch. Afterwards, the algorithm uses the InPrimary data structure to collect the set L of pairs (DSCCRepr(x), λ) such that DSCC(x) has a λ-labeled edge into PDSCC(y), where y is a node in the newly formed PDSCCs. Observe that pairs corresponding to edges incoming to C_1 are not collected in L, as C_1 was deemed unaffected by the edge deletion. Some of these PDSCCs may be merged during the Fixpoint computation to obtain the new set of DSCCs post deletion. For this, a pair (x, λ) ∈ L is added to Q if there are two λ-labeled edges outgoing from x's component.
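The forward-search overapproximation of step (1) can be sketched as follows (our own illustration over an explicit component map; names hypothetical). Starting from the DSCC of the deleted edge's source, we move to any DSCC that receives at least two same-label edges from the current one, since such a pair of edges may be what formed it:

```python
from collections import defaultdict, deque

def affected_dsccs(start_rep, dscc_of, edges):
    """Overapproximate the DSCCs potentially affected by an edge deletion.

    `dscc_of` maps each node to its DSCC representative; `edges` holds
    the (distinct) closing-parenthesis edges (u, label, v); `start_rep`
    is the representative of the deleted edge's source DSCC.  We follow
    pairs of same-label edges between components, as in the paper's
    forward search, and return the set of reached representatives.
    """
    # Count, per (source DSCC, label, target DSCC), the parallel edges.
    pair_edges = defaultdict(int)
    for u, lab, v in edges:
        pair_edges[(dscc_of[u], lab, dscc_of[v])] += 1

    out = defaultdict(set)
    for (src, lab, dst), cnt in pair_edges.items():
        if cnt >= 2 and dst != src:  # two same-label edges may have formed dst
            out[src].add(dst)

    seen = {start_rep}
    queue = deque([start_rep])
    while queue:
        cur = queue.popleft()
        for nxt in out[cur]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

DSCCs outside the returned set, like C_1 in the example above, are guaranteed to survive the deletion and are not touched by the subsequent splitting and fixpoint steps.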

The Dynamic Algorithm
We now make the concepts of the previous section formal, by developing precise algorithms for handling each insert and delete operation. In a static analysis context, the edges of the underlying graph correspond to statements in the analyzed source code. This means that a certain edge might be inserted several times in the graph, if, for example, it comes from a repeating program statement. Note, however, that only the presence of the edge impacts the reachability analysis and not its multiplicity. Operation insert(x, y, ε). Algorithm 2 handles each insert(x, y, ε) operation. In words, the algorithm first implements the logic for sparsely maintaining the PDSCCs (Line 3 to Line 8). In particular, if y is the first ε-neighbor of x, then x is marked as an ε-neighbor of PDSCC(y) via y, by inserting x in InPrimary[y][ε] (Line 4). Otherwise, x is already marked via some existing ε-neighbor z of x, which is also in PDSCC(y). Then the algorithm inserts an edge (y, z) in the PrimCompDS data structure (which maintains undirected connectivity) to reflect the change in y's PDSCC.
The second step of Algorithm 2 (Line 9 to Line 13) prepares the fixpoint computation that might be triggered due to the new edge. For this, it marks that y is a new ε-neighbor of DSCC(x) (Line 9, as the DSCC is represented by its root node). If there already exists another ε-labeled edge out of DSCC(x), then the corresponding pair (root, ε) is inserted in Q, and Fixpoint() is called to continue the fixpoint computation for merging the DSCCs (in DisjointSets); note that Fixpoint() is also used by OfflineAlgo and is defined in Algorithm 1. Operation delete(x, y, ε). Algorithm 3 handles each delete(x, y, ε) operation. Its first step mirrors Algorithm 2, by undoing the sparse maintenance of PDSCCs (Line 15). As the edge deletion might lead to the splitting of some DSCCs, the second step of Algorithm 3 splits some of the previously computed DSCCs and re-initiates the fixpoint computation (Line 16 and Line 17).
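To make the insertion logic concrete, the following is a minimal, insert-only Python sketch (the paper's implementation is in C/C++; all class and field names here are illustrative, not the paper's). It implements only the core merge rule, that two same-label edges out of one component force their endpoints into the same component, and omits the sparse InPrimary/PrimCompDS bookkeeping and deletions.

```python
from collections import defaultdict

class DSU:
    """Union-find with path halving; roots represent DSCCs."""
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

class DyckDSCC:
    """Insert-only sketch of DSCC maintenance under edge insertions."""
    def __init__(self, n):
        self.dsu = DSU(n)
        # edges[root][label] -> target nodes of label-edges out of the component
        self.edges = defaultdict(lambda: defaultdict(list))
        self.queue = []

    def insert(self, x, y, lab):
        # bidirectedness: record the edge and its implicit reverse edge
        self._add(x, y, ('+', lab))
        self._add(y, x, ('-', lab))
        self._fixpoint()

    def _add(self, x, y, lab):
        r = self.dsu.find(x)
        self.edges[r][lab].append(y)
        if len(self.edges[r][lab]) >= 2:
            self.queue.append((r, lab))

    def _merge(self, a, b):
        # roots a != b; b becomes the representative, inheriting a's edge lists
        self.dsu.parent[a] = b
        for lab, lst in self.edges.pop(a, {}).items():
            self.edges[b][lab].extend(lst)
            if len(self.edges[b][lab]) >= 2:
                self.queue.append((b, lab))

    def _fixpoint(self):
        # two same-label edges out of one component merge their endpoints
        while self.queue:
            r, lab = self.queue.pop()
            r = self.dsu.find(r)
            lst = self.edges[r][lab]
            while len(lst) >= 2:
                t1 = self.dsu.find(lst.pop())
                t2 = self.dsu.find(lst[-1])
                if t1 != t2:
                    self._merge(t1, t2)
                    r = self.dsu.find(r)
                    lst = self.edges[r][lab]
```

For instance, inserting x -a-> y and x -a-> z places y and z in the same DSCC, and the implicit reverse edges propagate merges symmetrically.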
Function MakePrimary(). Algorithm 4 prepares the new fixpoint computation by partially splitting some DSCCs to their constituent PDSCCs. In words, the algorithm first computes an overapproximation of the splitting DSCCs by running a forward graph search from DSCC(x) (Line 1 to Line 10), every time transitioning to new DSCCs by following pairs of same-label edges outgoing from the current DSCC. This transitioning reflects the intuition that such pairs of edges might have been used to form the neighboring DSCC: by splitting the current one, we might have to split the neighbor as well. It begins the forward search by placing DSCCRepr(x) in a set Z and a queue S. Eventually, after Line 1 to Line 10, Z will contain the roots of all affected DSCCs (including DSCC(x)), while S helps with the forward search of affected DSCCs.
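The forward search of this first step can be sketched as follows. This naive version rescans the component's edges directly rather than using the paper's sparse bookkeeping, so it is for illustration only; the function and variable names are ours.

```python
from collections import defaultdict, deque

def affected_dsccs(start_root, out_edges, dscc_of):
    """Overapproximate the set of DSCCs that may split after an edge deletion.

    out_edges: dict node -> list of (label, target) pairs
    dscc_of:   dict node -> root of its (pre-deletion) DSCC
    Returns the set of roots of potentially affected DSCCs.
    """
    members = defaultdict(list)
    for v, r in dscc_of.items():
        members[r].append(v)
    Z, S = {start_root}, deque([start_root])
    while S:
        r = S.popleft()
        # collect the label-outgoing edges of the whole component
        count = defaultdict(list)
        for v in members[r]:
            for lab, t in out_edges.get(v, []):
                count[lab].append(dscc_of[t])
        for lab, targets in count.items():
            if len(targets) >= 2:
                # a pair of same-label edges may have glued the target
                # DSCC together, so it may split too
                for t in targets:
                    if t not in Z:
                        Z.add(t)
                        S.append(t)
    return Z
```

On a chain where DSCC {0} has two 'a'-edges into DSCC {1, 2}, which in turn has a single 'b'-edge into DSCC {3}, the search started at 0 marks the first two components but correctly leaves the third untouched.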
The second step (Line 11 to Line 22) of Algorithm 4 splits these potentially affected DSCCs to their constituent PDSCCs. In Line 19, the algorithm performs this split in DisjointSets by iterating over the nodes of the DSCCs stored in Z. R holds the roots of all the newly formed PDSCCs after splitting (Line 20). Then Edges[r][•] is reset, where r is either the root node of an affected DSCC (Line 22), or the root node of an unaffected DSCC which has an outgoing edge entering one of the affected DSCCs (Line 18). The re-initialization is done keeping in mind the split of the affected DSCCs. These lists will be re-populated in the third step with edges outgoing from or entering the nodes of the newly formed PDSCCs. Although we are guaranteed that these DSCCs do not have to be split below their PDSCCs, it may be the case that certain PDSCCs have to be merged again.
In its third step, Algorithm 4 identifies the edges incoming to these PDSCCs, and inserts them in the set L (Line 23 to Line 30). L is later used to decide whether two PDSCCs must be merged. It also populates Edges with (i) edges incoming to the PDSCCs (Line 27 to Line 29), as well as with (ii) edges outgoing from the PDSCCs and entering an unaffected DSCC, i.e., one whose root is not in R (Line 31 to Line 34). Note that edges coming out of a PDSCC and entering an affected DSCC (one whose root is in R) are covered by Line 27 to Line 29.
To merge PDSCCs, the worklist Q is populated using the relevant entries of L to drive the new fixpoint computation (Line 23 to Line 37). In more detail, the algorithm iterates over all PDSCCs (Line 24), and for each node of each PDSCC (Line 25) and label ε (Line 26), it identifies the relevant edges via InPrimary. By structuring MakePrimary this way we have the following benefits: (i) we spend no time in the non-affected DSCCs, and (ii) because we maintain the PDSCCs, the time cost is only linear in the number of nodes of the affected DSCCs (and hence O(n) in the worst case), as opposed to linear in the number of edges incoming to these affected DSCCs (which would be O(n²)). This ensures that ℓ = Σ_{r,ε} |Edges[r][ε]| = O(n) at the end of MakePrimary, before calling Fixpoint; as shown in [Chatterjee et al. 2018, Lemma 3.4, Lemma 3.5], a call to Fixpoint() then takes time O(ℓ · α(n)).

Example
Here we give a step-by-step illustration of DynamicAlgo on the example of Figure 1.We start with a description of the contents of the various data structures before handling the edge insertion.
Let us fix some notation. Let (v1, v2, ε) : k denote that there are k edges from v1 to v2 labeled ε. The set Count will contain entries of this kind. Likewise, let (u, ε) : [v1, v2, ..., vk] represent that there are outgoing edges labeled ε from u to each of v1, ..., vk. OutEdges contains entries of this kind. We let InPrimary contain entries of the kind (u, ε) : [v1, ..., vk] to denote that there are edges labeled ε from each of v1, ..., vk to u. The components in the DisjointSets and PrimCompDS data structures are listed as sequences of nodes {(r), u, v, ...}, where the node in parentheses denotes the component representative.
(3) Z contains f and h; following InPrimary, the algorithm re-initializes the lists Edges[f][·] and Edges[h][·] to empty. (4) Since Z contains f and h, we split DSCC(f) and DSCC(h) into their PDSCCs. Hence, for the set L computed in the step above, no element is added to Q. (6) Since Q is empty, the call to Fixpoint() does not merge any components.
After the deletion, the data structures are in the following state.

Next, we state the main invariants that DynamicAlgo maintains, which support its correctness and complexity, as well as some intuition behind them. For proofs, we refer to Appendix A.
Correctness. The basis of the correctness of DynamicAlgo is a number of invariants that are maintained along edge insertions and deletions. Observe that OutEdges[x][ε] is, at all times, a linked-list representation of the edge set x -ε-> •.
Our first invariant concerns the correct maintenance of the PDSCCs of G in the PrimCompDS data structure maintaining undirected connectivity. To prove the invariant, we argue that DynamicAlgo inserts and removes sufficient and necessary undirected edges in PrimCompDS to (sparsely) represent the connected components of the corresponding primal graph. This follows directly from the sparsification approach outlined in Section 3. The next lemma captures the correctness of MakePrimary() and the invariants concerning the state that it passes on to the function Fixpoint() for the final fixpoint computation after processing a delete(x, y, ε) update. Recall that MakePrimary() identifies an overapproximation of the DSCCs that have to be split and rebuilt as a result of this edge deletion. The lemma has two parts. Item (1) states that the components that MakePrimary() passes on to Fixpoint() (i.e., those found in DisjointSets) are a refinement of the DSCC decomposition of G after the deletion, i.e., it suffices to merge some of them in order to arrive at the correct DSCC decomposition of the graph after the edge deletion. The Fixpoint() function performs this merging by processing the edges found in the Edges data structure. Item (2) states that MakePrimary() populates the Edges data structure with sufficiently many edges for Fixpoint() to process and arrive at the correct DSCC decomposition of G. Given the above invariants, MakePrimary creates a correct state of the worklist Q and the Edges linked lists for the Fixpoint() call of Algorithm 3 (Line 17) to compute the correct DSCC decomposition. This leads to the correctness of DynamicAlgo.

Lemma 3.5. After every insert(x, y, ε) and delete(x, y, ε), DisjointSets contains the DSCCs of G.
Complexity. We now turn our attention to the complexity bound of Theorem 1.1. We always start with a graph of n nodes but without any edges, for which the initialization of all data structures takes O(n) time. In practice, the initial graph might already have some edges, which can be thought of as being inserted one by one.
At a high level, the time DynamicAlgo spends on each edge insertion and deletion is the sum of two parts: (i) the time taken for maintaining the PDSCCs in the PrimCompDS data structure, and (ii) the time taken for all other computations. Regarding (i), so far we have not specified the precise data structure for implementing PrimCompDS, as DynamicAlgo treats PrimCompDS as a black box. The theoretical guarantees of Theorem 1.1 can be obtained by using the data structure of [Eppstein et al. 1997] for undirected connectivity (see Table 1). Although this guarantees O(√n) and O(1) time for edge insertions/deletions and queries, respectively, here we only use the fact that both bounds are O(n). Regarding (ii), the algorithm spends O(n · α(n)) time for each operation, which stems from the fact that the algorithm encounters each node of G across all data structures a constant number of times (recall that we have |Σ| = O(1) labels in G). In particular, we have the following lemma.

Lemma 3.6. Every insert(x, y, ε) and delete(x, y, ε) operation is processed in O(n · α(n)) time.
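Since DynamicAlgo treats PrimCompDS as a black box, any structure exposing the same interface can be plugged in. Below is a deliberately naive Python stand-in of our own, not the structures of [Eppstein et al. 1997] or [Holm et al. 2001]: it answers connectivity queries by BFS, so it illustrates the interface rather than the sublinear bounds.

```python
from collections import defaultdict, deque

class NaivePrimCompDS:
    """Naive stand-in for the dynamic undirected-connectivity black box.

    Same interface shape (insert/delete/connected), but each query costs
    O(n + m) instead of the bounds of the dedicated data structures.
    """
    def __init__(self):
        self.adj = defaultdict(set)

    def insert(self, u, v):
        # add an undirected edge
        self.adj[u].add(v)
        self.adj[v].add(u)

    def delete(self, u, v):
        # remove the undirected edge, in both directions
        self.adj[u].discard(v)
        self.adj[v].discard(u)

    def connected(self, u, v):
        # BFS from u; components are recomputed on every query
        seen, q = {u}, deque([u])
        while q:
            w = q.popleft()
            if w == v:
                return True
            for x in self.adj[w]:
                if x not in seen:
                    seen.add(x)
                    q.append(x)
        return False
```

Swapping this for an efficient structure changes only part (i) of the cost above, which is exactly why the analysis only needs the black-box bounds.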
Proof. After every insert and delete operation, Fixpoint() is invoked for merging the DSCCs (in DisjointSets). As shown in [Chatterjee et al. 2018, Lemma 3.4, Lemma 3.5], a call to Fixpoint() takes O(ℓ · α(n)) time, where ℓ is the total length of the Edges lists it processes.

Edge insertions. Consider the processing of an operation insert(x, y, ε) by Algorithm 2. Observe that the algorithm has no loops, thus the running time is dominated by the time taken to access the various data structures that the algorithm maintains. In particular, OutEdges is a simple linked list, and checking its length (Line 3), as well as accessing and inserting at the head (Line 6 and Line 8), takes constant time. Similarly, the data structures InPrimary and Edges are simple sets and linked lists with O(1) accesses. As we have already argued in Section 2.2, performing a DisjointSets.Find() operation (Line 9) takes O(1) time. Finally, inserting the edge (y, z) in PrimCompDS takes O(√n) time [Eppstein et al. 1997].
Edge deletions. Consider the processing of an operation delete(x, y, ε) by Algorithm 3. The time spent in the body of Algorithm 3 is O(n), by an analysis very similar to that of Algorithm 2 for edge insertions, and we do not repeat it here. Instead, we focus on the time spent in the call to MakePrimary(), which is the more complex part of processing the edge deletion (Algorithm 4). In the first step (Line 1 to Line 10), for each iteration of the while loop of Line 4, the condition in Line 7 can be checked in time linear in the size of the current DSCC, by iterating over its nodes. Observe that each DSCC is examined only once throughout the loop of Line 4, and since the DSCCs partition the node set, the total time for running this loop is O(n).
The time spent in the second step (Line 11 to Line 22) is the sum of two parts. The first part corresponds to the time spent in the nested loops, which is proportional to the number of times the inner-most loop in Line 15 is taken, that is, the number of nodes that appear in the lists InPrimary[t][ε] of nodes t in the components represented by their roots in Z. It suffices to argue that no node appears in two such lists for the same label ε. Thus, for every label ε, the loop in Line 15 is executed at most once per node, leading to O(n) total iterations. The second part corresponds to the total time spent in Line 19, where the algorithm splits the potentially affected DSCCs (in DisjointSets) into their PDSCCs. This takes O(n) time, as the algorithm simply iterates over the nodes of the DSCCs stored in Z. For each such node, the algorithm performs a single membership query to PrimCompDS, which takes O(1) time [Eppstein et al. 1997].
Finally, we consider the time spent by MakePrimary() in the third step (Line 23 to Line 37). Note that the time spent in these nested loops is proportional to the number of times the inner-most loop (Line 27) is executed. Again, it suffices to argue that no node is encountered twice for the same label ε. Thus, for every label ε, the loop in Line 27 is executed at most once per node, leading to O(n) total iterations. In the end, the loop in Line 35 runs for |L| iterations, which is bounded by the number of iterations of the previous loop.
Since for every label ε and node the loop in Line 27 is executed at most once, it follows that Line 29 is executed O(n) times. This leads to a total of O(n) new entries in Edges at Line 29. Similarly, Line 34 is executed O(n) times, leading to O(n) new entries in Edges. Therefore, at the end of MakePrimary() we have ℓ = Σ_{r,ε} |Edges[r][ε]| = O(n), and the final call to Fixpoint() runs in O(n · α(n)) time. □

EXPERIMENTS
In this section we report on an implementation of the DynamicAlgo algorithm behind Theorem 1.1, and an evaluation of its performance on various datasets of real-world static analyses. To some extent, our experimental setting follows [Li et al. 2022].
Compared algorithms. We compare three standard approaches to bidirected Dyck reachability.
(1) OfflineAlgo, as developed and implemented in [Chatterjee et al. 2018]. For each graph update (edge insertion/deletion), the algorithm is invoked from scratch to handle the updated graph. (2) Our DynamicAlgo, which is implemented in C/C++, and closely follows the pseudocode presented in Section 3. DynamicAlgo uses as a black box a data structure PrimCompDS for dynamic undirected connectivity (see Table 1). Although the one developed in [Eppstein et al. 1997] works best towards the complexity guarantees of Theorem 1.1, in our implementation we use the one developed in [Holm et al. 2001] (and implemented in [Tseng 2020]), as it is conceptually simpler and well-performing in practice. (3) A Datalog formulation of the analysis. To handle our dynamic setting, we rely on an efficient, fully dynamic Datalog solver [Ryzhyk and Budiu 2019], as opposed to solving the Datalog program from scratch every time.
The two algorithms (OfflineAlgo and dynamic Datalog) that support our experimental comparison serve as very fitting baselines. OfflineAlgo is an algorithm dedicated to the problem we are solving (i.e., bidirected Dyck reachability) but not dedicated to the setting we are solving it in (i.e., under dynamic updates). On the other hand, the dynamic-Datalog approach is dedicated to the dynamic setting, but not dedicated to the bidirected Dyck-reachability problem. As such, both algorithms are theoretically of worse complexity than our DynamicAlgo, yet they are still the closest that exist in the literature for this problem and setting. Our experiments aim to highlight to what extent the theoretical superiority of DynamicAlgo is realized in practice.
Benchmarks.We evaluate the above algorithms on two popular static analyses.
(1) Context-sensitive data dependence analysis as formulated in [Tang et al. 2015], and evaluated on benchmark programs from [SPE 2008].In this case the parenthesis labels represent calling contexts, and a properly-balanced-parenthesis path represents an interprocedurally-valid dataflow via parameter-passing in function invocation and return.(2) Field-sensitive alias analysis for Java as formulated in [Yan et al. 2011;Zhang et al. 2013], and evaluated on DaCapo benchmarks [Blackburn et al. 2006].In this case the parenthesis labels represent field accesses on composite objects, as illustrated in Figure 1 of Section 1.
Our graph models for the above analyses and benchmarks are obtained from [Li et al. 2022].
On-the-fly formulation and update sequences. To simulate the on-the-fly setting where the source code undergoes a sequence of changes, for each benchmark graph G, we generate three sequences S_G of updates (edge insertions/deletions), as follows.
(1) Incremental setting: We randomly select a set E+ of 90% of the edges of G and remove them from the graph. We create a sequence of edge insertions S_G^inc as a random permutation of E+. This is a fully incremental setting, where code lines are only added and never removed from the program.
(2) Decremental setting: We randomly select a set E- of 90% of the edges of G. We create a sequence of edge deletions S_G^dec as a random permutation of E-. This is a fully decremental setting, where code lines are only removed and never added to the program.
(3) Mixed setting: We randomly split the edges of G into two sets E+ and E-, with proportion 10% to 90%, and start with an initial graph containing the edges of E-. We create a sequence of mixed operations (both insertions and deletions) S_G^mix by repeated stochastic sampling: in each step, we randomly choose the next operation as an edge insertion/edge deletion. In the former case, we randomly select an edge from E+, move it to E-, and insert it in G. In the latter case, we randomly select an edge from E-, move it to E+, and delete it from G. The length of the sequence is equal to 90% of the number of edges in G.
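The three generators above can be sketched as follows. This is an illustrative Python reconstruction of the described setup; the helper name and the edge representation are ours.

```python
import random

def make_update_sequences(edges, frac=0.9, seed=0):
    """Generate incremental, decremental, and mixed update sequences
    over a benchmark edge set, mirroring the three settings above."""
    rng = random.Random(seed)
    k = int(frac * len(edges))
    # Incremental: re-insert a random 90% of the edges, one by one.
    inc = [('insert', e) for e in rng.sample(edges, k)]
    # Decremental: delete a random 90% of the edges, one by one.
    dec = [('delete', e) for e in rng.sample(edges, k)]
    # Mixed: start from E- (90% present); repeatedly pick a random
    # insertion from E+ or deletion from E-, moving the edge across.
    e_minus = set(rng.sample(edges, k))   # edges initially present
    e_plus = set(edges) - e_minus         # edges initially absent
    mixed = []
    for _ in range(k):
        if e_plus and (not e_minus or rng.random() < 0.5):
            e = rng.choice(sorted(e_plus))
            e_plus.remove(e)
            e_minus.add(e)
            mixed.append(('insert', e))
        else:
            e = rng.choice(sorted(e_minus))
            e_minus.remove(e)
            e_plus.add(e)
            mixed.append(('delete', e))
    return inc, dec, mixed
```

Replaying any of the three sequences against an analyzer, and dividing the total running time by the sequence length, yields the amortized per-update cost reported below.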
In each case, for each benchmark we report the amortized time that each algorithm took to handle the whole sequence: that is, the total running time over the whole sequence divided by the length of the sequence.To gain more confidence in our results, we repeat the above process three times and report the average numbers.We run our experiments on a conventional laptop with a 2.6GHz CPU and 16GBs of memory, which was always sufficient for the analysis.As a sanity check, we have verified that all three algorithms give the same results on each benchmark and update sequence.
Results on data-dependence analysis. Our experimental results on the on-the-fly data-dependence analysis are shown in Figure 7 (left column). We see that in all three settings (incremental/decremental/mixed), the dynamic-Datalog approach is measurably faster than OfflineAlgo. Although, in the worst case, the Datalog solver has worse complexity than OfflineAlgo, the nature of the updates on the analysis graphs allows an efficient Datalog solver dedicated to dynamic updates to perform faster, as it never exhibits its worst-case performance. Still, the time spent by the Datalog solver is typically quite large, given that we are looking at updates of a single edge (i.e., a single source-code line).
On the other hand, our DynamicAlgo spends much less time than both OfflineAlgo and the dynamic Datalog solver, leading to typical speedups between two and three orders of magnitude. The only exception is xml, where DynamicAlgo appears to spend more time than on the rest of the benchmarks. We have identified that this benchmark has a disproportionately large number of parenthesis symbols, which is the likely cause of this behavior. Still, DynamicAlgo is by far the fastest approach on this benchmark as well. Naturally, decremental updates yield the least speedup on average. This is expected given how much more complex our procedure for handling deletions is compared to that of handling insertions. Indeed, when processing an insert(x, y, ε) update, our algorithm only merges DSCCs, while processing a delete(x, y, ε) update both splits and merges DSCCs. Nevertheless, the speedups are still in the range of two orders of magnitude, enough to render DynamicAlgo the clear best approach overall.
Results on alias analysis. Our experimental results on the on-the-fly alias analysis are shown in Figure 7 (right column). In contrast to the data-dependence analysis, here the dynamic-Datalog approach is always somewhat worse than OfflineAlgo, indicating that reachability patterns in these graphs are more challenging; enough so to push the dynamic-Datalog solver to worse performance than the from-scratch OfflineAlgo. Note, also, that the analysis times here are considerably larger than in the case of the data-dependence analysis.
DynamicAlgo is again the best-performing approach by far, consistently by three orders of magnitude. In several benchmarks, its running time is not visible despite the log-scale of the plots. Finally, we again observe that decremental updates are overall more challenging than incremental updates.

In summary. Our experiments clearly show that DynamicAlgo is the right approach to on-the-fly static analyses formulated as bidirected Dyck reachability, giving several orders of magnitude speedups over both (i) the offline algorithm that is dedicated (and optimal) for bidirected Dyck reachability, and (ii) the dynamic Datalog approach that is dedicated to dynamic updates (but agnostic to the setting of bidirected Dyck reachability). Although the worst-case running time of DynamicAlgo is linear, we rarely observed this in our experiments. Instead, the time cost of each update is barely (if at all) noticeable to the human eye, and thus the algorithm is suitable for continuous analysis, e.g., integrated inside an IDE.

RELATED WORK
The importance of Dyck reachability in static analyses has led to a systematic study of its complexity in various settings. The problem has a simple O(n³) upper bound [Yannakakis 1990], which has resisted improvements beyond logarithmic factors [Chaudhuri 2008]. For this reason, the complexity of Dyck reachability has also been studied in terms of lower bounds. Even for a single-pair query, the problem has long been known to be 2NPDA-hard [Heintze and McAllester 1997], while its combinatorial cubic hardness persists even on constant-treewidth graphs [Chatterjee et al. 2018]. All-pairs Dyck reachability was recently shown to have a conditional n^2.5 lower bound based on popular complexity-theoretic hypotheses [Koutris and Deep 2023]. Despite the cubic hardness of the general problem, it is known to have sub-cubic certificates for both positive and negative instances [Chistikov et al. 2022]. All-pairs reachability with k = 1 parenthesis (aka one-counter systems) was recently shown to admit an O(n^ω · log² n) bound [Mathiasen and Pavlogiannis 2021], where ω is the matrix-multiplication exponent, a bound that is tight even for single-pair queries [Cetti Hansen et al. 2021]. We refer to [Pavlogiannis 2023] for a recent survey of this rich problem.
The technique developed in this work for on-the-fly bidirected Dyck reachability is motivated by the same setting on undirected connectivity, which has been studied extensively. We leverage the data structure developed in [Eppstein et al. 1997] for maintaining the PDSCCs of a graph as connected components of the underlying primal graph (which is indeed undirected), as well as a sparsification technique that is specific to our setting (even though the concept of sparsification was developed in [Eppstein et al. 1997] to handle undirected connectivity). It would be interesting to investigate whether techniques from undirected connectivity can be used further in our setting so as to reduce the complexity to sublinear (as is the status quo in undirected connectivity). However, as a single update can create or merge Θ(n) DSCCs, sublinear complexity can only arise in the amortized sense, or by not requiring the explicit maintenance of DSCCs throughout updates. Though we can easily obtain an O(α(n)) amortized insertion cost for our technique (by amortizing over the linear cost of deletions), tighter bounds appear non-trivial and are left as interesting future work.
Bidirected Dyck reachability is very similar to unification closure, though the latter problem is typically phrased with labels on the nodes, as opposed to the edges, of the input graph [Kanellakis and Revesz 1989]. Unification closure has found widespread applications in programming languages, such as in efficient binding-time analysis [Henglein 1991], simple first-order type inference [Møller and Schwartzbach 2018], efficient dynamic type inference for LISP [Henglein 1992], as well as Steensgaard's famous pointer analysis [Steensgaard 1996]. The insights developed here for the on-the-fly setting are likely extendable to these applications, though this merits further investigation.
In this work we have focused on online analyses where the analyzed source code changes frequently. Another "on-the-fly" style of analysis is that of on-demand analysis. Here the analyzed program remains unchanged, but the task is to answer a sequence of analysis queries that is not known in advance. This setting has been studied extensively in the context of pointer analysis [Heintze and Tardieu 2001; Sridharan et al. 2005; Yan et al. 2011; Zheng and Rugina 2008] and data-flow analysis [Horwitz et al. 1995; Lerch et al. 2015; Naeem et al. 2010]; the latter has also been efficiently parameterized by treewidth [Chatterjee et al. 2015, 2016, 2020] and treedepth [Goharshady and Zaher 2023].

CONCLUSION
On-the-fly analysis is a very appealing feature of static analyzers, in order to run in real time during code development and incorporate the constant changes in the source code. However, on-the-fly analysis algorithms with provable complexity benefits have been missing, due to the intricacies of typical static analyses. In this work we have considered a wide class of static analyses phrased as bidirected Dyck reachability. We have developed a dynamic algorithm for handling the addition and removal of source code lines, with a provable guarantee that each such modification takes only (nearly) linear time in the worst case. Our experiments show that our dynamic algorithm is extremely performant in practice, with a clear advantage over the (optimal) offline static analysis algorithm, as well as dynamic Datalog approaches, with typical speedups of three orders of magnitude. From a practical standpoint, our results indicate that our algorithm can directly support lightweight analyses inside IDEs at a time cost that is barely (if at all) noticeable to the developers.

Proof. First, observe that the first step of MakePrimary() (Algorithm 4) correctly stores in Z a sound overapproximation of the DSCCs that have to be split after removing the edge x -ε-> y (in particular, Z stores the representative nodes of the corresponding DSCCs). Indeed, the algorithm first marks DSCC(x) as affected (Line 3). In turn, any other DSCC S' that has to be split must contain two nodes w1 ≠ w2 with incoming edges S -ε-> w1 and S -ε-> w2, where S is a DSCC that has to be split. Thus, after the end of the first step (Line 1 to Line 10), Z soundly overapproximates the DSCCs that have to be split. After executing step 2 (Line 11 to Line 22), every component in DisjointSets is either a previously computed DSCC, or a PDSCC (computed from PrimCompDS) of a previously computed DSCC. Since PDSCCs are also DSCCs, Item (1) follows (Lemma 3.1 guarantees that the components of PrimCompDS are the PDSCCs).
We now turn our attention to Item (2). Note that the algorithm modifies the Edges lists of nodes of two types: (i) roots of the potentially affected DSCCs (those stored in Z), in Line 34, and (ii) roots of the unaffected DSCCs that have ε-labeled edges to nodes of affected DSCCs, in Line 29. (3) Finally, if a root is of neither of the two types (i) and (ii) above, then it is part of an unaffected DSCC which does not have an edge entering an affected DSCC. Since these DSCCs are untouched by the algorithm, the statement holds by the induction hypothesis; in the remaining cases, we have the guarantees from the Fixpoint computation, coming from Line 25 in Fixpoint.
(3) If a root is of neither of the two types (i) and (ii) above, then, as seen above, the statement holds by the induction hypothesis. □

Lemma 3.5. After every insert(x, y, ε) and delete(x, y, ε), DisjointSets contains the DSCCs of G.
Proof. We argue that the statement holds after each insert(x, y, ε) and delete(x, y, ε) operation.
Edge insertions. Consider the processing of an operation insert(x, y, ε). The Fixpoint() function (Algorithm 1) guarantees that, upon termination, DisjointSets represents the DSCCs of the input graph, while for every node that is the representative node of some set in DisjointSets, the Edges lists correctly capture the edges out of its component.

Edge deletions. Consider the processing of an operation delete(x, y, ε). Lemma 3.4 guarantees that at the end of MakePrimary(), every component in DisjointSets is a DSCC, while edges between components represented in the Edges data structure capture in a sound and complete way the edges between nodes in the corresponding components. Finally, Line 37 of MakePrimary() inserts in Q all the pairs (r, ε) that can trigger new fixpoint steps, hence after the call to Fixpoint() in Line 17, DisjointSets contains the DSCCs of G. □

The setting of dynamic bidirected Dyck reachability was studied recently in [Li et al. 2022]. Unfortunately, that approach suffers from correctness and complexity issues, which we illustrate here.
Complexity counterexamples. The approach developed in [Li et al. 2022] claims a running time of O(n · α(n)) for a graph of n nodes. As we show here, that statement is wrong: the proposed algorithm can take Ω(n²) time for a single edge deletion. At close inspection, there are two independent parts of the deletion algorithm that can exhibit this quadratic bound, and the complexity analysis fails to account for both. Here we illustrate these counterexamples (Figure 8) and the runtime behavior that the tool accompanying [Li et al. 2022] has on them (Figure 9). This behavior occurs in [Li et al. 2022, Procedure 4, Lines 12-18], while [Li et al. 2022, Lemma 4.16] does not make a thorough complexity analysis of these lines. On the counterexample graph, [Li et al. 2022] splits the first DSCC, which in turn splits the second, and so on, for Θ(n) DSCCs in total. After splitting each of them, the algorithm also processes a large DSCC C and checks whether it has to be split. Naturally, it discovers that C should not be split. However, each such attempt requires time proportional to the size of C, i.e., Ω(n) time. Since this process repeats after each of the Θ(n) splits, the algorithm takes quadratic time on this graph as well, as shown in the plot of Figure 9b.

Sparse inputs.
Note that here the vanilla offline algorithm takes O(n · α(n)) time for handling this edge deletion. Hence, for sparse graphs, the dynamic algorithm of [Li et al. 2022] can even become roughly a factor of n slower than the vanilla offline algorithm, which is evident in Figure 9b. This behavior occurs in [Li et al. 2022, Procedure 4, Lines 1-6 and Line 27], while [Li et al. 2022, Lemma 4.16] does not make a thorough complexity analysis of these lines.
Correctness counterexample.Figure 10 showcases a simple example in which [Li et al. 2022] gives an incorrect answer.
The bidirected graph (top right) corresponds to the input program, and the two marked nodes are inter-reachable, and thus belong to the same DSCC.

Fig. 1. An input program (left), to which two updates are made, an insertion and a deletion, and the corresponding flow graphs for field-sensitive alias analysis after each update (right). The edges are labeled with the fields of ATree. For notational convenience, we only draw the edges labeled with closing parentheses; the reverse edges with opening parentheses are implied by bidirectedness. On the initial program (before the two updates), bidirected Dyck reachability reports four DSCCs: one component of three nodes, and three singleton components, among them {h}.
to denote the existence of an edge (x, y, ε), and write x -ε-> • to denote the existence of an edge outgoing x and labeled with ε. Given a set of nodes Y, we sometimes write x -ε-> Y to denote that there exists some y ∈ Y such that x -ε-> y. We also use similar notation for incoming, instead of outgoing, edges.

(2) delete(x, y, ε), which deletes the edge x -ε-> y (as well as the implicit reverse edge from y to x). (3) DSCCRepr(u), which returns a representative node of the DSCC of u that is the same for all nodes v ∈ DSCC(u).

Fig. 5. Maintenance of the sets InPrimary (shown in orange dashed) along edge insertions and deletions. The first edge insertion a -ε-> b leads to a ∈ InPrimary[b][ε]. The following two edge insertions a -ε-> c and a -ε-> d do not modify InPrimary, as b, c and d belong to the same PDSCC; thus a can be retrieved as an ε-neighbor via the set InPrimary[b][ε]. When processing the deletion of the edge a -ε-> b, we move a to InPrimary[c][ε]; thus a can still be retrieved as an ε-neighbor of the PDSCC {c, d}.
For every node v and label α, we maintain a set InPrimary[v][α], which stores nodes u that have an edge u −α→ v. We maintain the invariant that v will be the only node in PDSCC(v) with u ∈ InPrimary[v][α], even though there might be other nodes w ∈ PDSCC(v) also having an edge u −α→ w. To identify the neighbors of each PDSCC, it now suffices to iterate over its nodes v, and for each label α, collect the nodes from InPrimary[v][α]. The invariant guarantees that every neighbor u will be accessed exactly once per label α, retaining the O(n) worst-case running time. The sets InPrimary[v][α], together with their invariant, can be maintained as follows. Upon inserting an edge insert(u, v, α), if u does not have any other outgoing edges labeled with α, we insert u in InPrimary[v][α]. Otherwise, there exists another edge u −α→ w, with already u ∈ InPrimary[w][α]. The existence of the two edges u −α→ v and u −α→ w implies that PDSCC(v) = PDSCC(w), hence it is sound to not insert u in InPrimary[v][α], as u is retrievable via w. Similarly, upon deleting an edge delete(u, v, α), if u ∈ InPrimary[v][α], we move u to another set InPrimary[w][α] for which there exists an edge u −α→ w. Figure 5 illustrates the maintenance of InPrimary on a small example.
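This bookkeeping can be sketched as follows (a minimal Python model with hypothetical names; the actual algorithm pins the designated successor to the last node of a linked list OutEdges[u][α], a detail the sketch abstracts away by re-designating an arbitrary remaining successor):

```python
from collections import defaultdict

# Hypothetical state (names are illustrative):
# out_succ[(u, a)]  : the a-successors of node u
# in_primary[(v, a)]: sources u whose designated a-successor is v
out_succ = defaultdict(set)
in_primary = defaultdict(set)

def insert(u, v, a):
    if not out_succ[(u, a)]:
        # First a-edge out of u: designate v, so u is retrievable via v.
        in_primary[(v, a)].add(u)
    # Otherwise all a-successors of u share one PDSCC, and u is already
    # retrievable via its existing designated successor.
    out_succ[(u, a)].add(v)

def delete(u, v, a):
    out_succ[(u, a)].discard(v)
    if u in in_primary[(v, a)]:
        in_primary[(v, a)].discard(u)
        if out_succ[(u, a)]:
            # Re-designate any remaining a-successor, keeping u retrievable.
            w = next(iter(out_succ[(u, a)]))
            in_primary[(w, a)].add(u)
```

The invariant modeled here is the one stated above: each source u appears in in_primary for exactly one of its α-successors, so iterating a PDSCC's nodes collects each α-neighbor once.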

node-label pairs (DSCCRepr(u), α) that represent edges u −α→ • that might lead to further component merging. Due to the efficient maintenance of the neighbors of PDSCCs (see the previous paragraph), this can be achieved by iterating over all nodes v of the PDSCCs and, for each label α ∈ Σ, inserting in Q the pair (DSCCRepr(u), α) for u ∈ InPrimary[v][α] if |Edges[DSCCRepr(u)][α]| ≥ 2. Finally, we continue with the fixpoint computation as in OfflineAlgo, with the guarantee that upon convergence, DisjointSets will contain the DSCCs of the new graph after the edge deletion.

Fig. 6. Illustration of handling delete(u, v, α), showing the state of the algorithm before the deletion (top), after preparing the new fixpoint computation (middle), and after completing the fixpoint computation (bottom).
the nodes u ∈ InPrimary[v][α] (Line 27). It then identifies the root node r of u's component (Line 28), and inserts v in Edges[r][α], as well as the pair (r, α) in L, to record that we have an incoming edge • −α→ v to a PDSCC that the fixpoint computation needs to process. Then, Line 34 inserts in Edges[r][α] the edges u −α→ v that outgo the component rooted at r and for which v is not in a breaking component. Finally, the algorithm identifies the pairs (r, α) in L for which it has recorded at least two α-neighbors out of r's component, and inserts them in Q to be processed by the fixpoint algorithm later (Line 35).
2 and Figure 4. Indeed, the algorithm maintains in PrimCompDS a set of edges that connect neighboring nodes in each linked list OutEdges[u][α]. Thus, the omitted edges (i.e., between non-neighboring nodes in OutEdges[u][α]) are anyway transitively connected in PrimCompDS. Formally, we have the following lemma.

Lemma 3.1. After every insert and delete operation, the connected components in PrimCompDS are precisely the PDSCCs of G.

The next invariant concerns the correct maintenance of the InPrimary data structure, and is established in Lemma 3.2 and Lemma 3.3. In words, the invariant states that u ∈ InPrimary[v][α] iff we have an edge u −α→ v and v is the last node in OutEdges[u][α]. Thus, when MakePrimary() constructs the PDSCCs and discovers their incoming edges, the edge u −α→ PDSCC(v) is correctly discovered by finding that u ∈ InPrimary[v][α] (Line 27). Lemma 3.2 is concerned with the soundness, and is straightforward. Any time the algorithm inserts u ∈ InPrimary[v][α], this is followed by inserting v in OutEdges[u][α] (in the case of edge insertions, Line 4 and Line 8 in Algorithm 2). In the case of edge deletions, the insertion u ∈ InPrimary[v][α] is preceded by inserting v in OutEdges[u][α] in a previous update (Line 6 and Line 7 in Algorithm 3). In this case, u ∈ InPrimary[v][α] takes place when v becomes the last node in OutEdges[u][α] after deletion of edges u −α→ w, where w appeared later than v in OutEdges[u][α].

Lemma 3.2. After every insert and delete operation, for every two nodes u, v ∈ V and label α ∈ Σ, if u ∈ InPrimary[v][α] then u −α→ v and v is the last node in OutEdges[u][α].

Similarly, for Lemma 3.3, if u −α→ PDSCC(v), then the last node w in OutEdges[u][α] is also in PDSCC(v), while the algorithm maintains that u ∈ InPrimary[w][α]. Hence, the fact that u −α→ PDSCC(v) is recoverable via discovering that u ∈ InPrimary[w][α].

Lemma 3.3. After every insert and delete operation, for every pair of nodes u, v ∈ V and label α ∈ Σ, if u −α→ PDSCC(v) then there exists a node w ∈ PDSCC(v) such that u ∈ InPrimary[w][α].

Lemma 3.4. At the end of MakePrimary(), the following assertions hold.
(1) Every component in DisjointSets is a (not necessarily maximal) DSCC of G.
(2) For every component in DisjointSets rooted in some node r, the following hold.
(a) For every node v and label α, if v ∈ Edges[r][α] then there is an edge u −α→ w, where u is a node in the component of r, and w is a node in the component of v in DisjointSets.
(b) For every node u and label α such that (i) u is in the component rooted at node r in DisjointSets, and (ii) there is an edge u −α→ v, there exists a node w in the component of v in DisjointSets such that w ∈ Edges[r][α].
Proc. ACM Program. Lang., Vol. 1, No. POPL, Article. Publication date: November 2024.

(3) A declarative approach in which the production rules of Dyck reachability are encoded as Datalog constraints and dispatched to a Datalog solver [Reps 1995b]. Datalog-based static analyses have been popularized in the Flix programming language [Madsen and Lhoták 2020] and the Doop framework [Bravenboer and Smaragdakis 2009]. Our bidirected setting allows us to optimize the Datalog program by explicitly focusing only on closing-parenthesis edges, in similar style to OfflineAlgo and DynamicAlgo. To be fair in our comparison, we follow this approach here. In particular, we use the following Datalog program.

Reaches(u, u).
Close(x, u, i) :- Edge(x, u, i).
Close(x, u, i) :- Edge(y, u, i), Reaches(x, y).
Reaches(u, v) :- Close(x, u, i), Close(x, v, i).
Reaches(u, v) :- Reaches(u, x), Reaches(x, v).
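For intuition, the semantics of these four rules can be emulated by a naive bottom-up fixpoint. The following Python sketch is illustrative only (the experiments use an actual Datalog solver, and real engines evaluate semi-naively):

```python
def dyck_reaches(edges):
    """Naive fixpoint over the four rules above.
    edges: a set of facts (x, u, i), one per closing-parenthesis edge."""
    nodes = {n for (x, u, _) in edges for n in (x, u)}
    reaches = {(u, u) for u in nodes}          # Reaches(u, u).
    close = set()
    while True:
        new_close = set(edges)                 # Close(x,u,i) :- Edge(x,u,i).
        new_close |= {(x, u, i)                # Close(x,u,i) :- Edge(y,u,i), Reaches(x,y).
                      for (y, u, i) in edges
                      for (x, y2) in reaches if y2 == y}
        new_reach = set(reaches)
        new_reach |= {(u, v)                   # Reaches(u,v) :- Close(x,u,i), Close(x,v,i).
                      for (x, u, i) in new_close
                      for (x2, v, i2) in new_close if x2 == x and i2 == i}
        new_reach |= {(u, v)                   # Reaches(u,v) :- Reaches(u,x), Reaches(x,v).
                      for (u, x) in new_reach
                      for (x2, v) in new_reach if x2 == x}
        if (new_close, new_reach) == (close, reaches):
            return reaches
        close, reaches = new_close, new_reach
```

For example, with edges {(x,u,a), (x,v,a)}, the third rule derives that u and v inter-reach, since both are closed from the same source x with the same label.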

Fig. 7. The average time to handle a single update on on-the-fly data-dependence analysis (a, c, e) and on-the-fly alias analysis (b, d, f), for sequence files generated with a 90-10 split of the original graph. Note that all results are in log-scale.

there exists a node w ∈ PDSCC(v) such that u ∈ InPrimary[w][α], and we are done. Otherwise v is the first α-neighbor of u, and Line 4 of Algorithm 2 will set u ∈ InPrimary[v][α], as desired.

Edge deletions. Consider the processing of an operation delete(u, v, α); it suffices to prove the statement for the node u and any node w such that u −α→ w. If w is not the last node in OutEdges[u][α], the statement holds by the induction hypothesis. Otherwise w is the last node in OutEdges[u][α], and there exists a penultimate node w' in OutEdges[u][α]. Then Line 7 of Algorithm 3 will set u ∈ InPrimary[w'][α], thereby restoring the invariant. □

Lemma 3.4. At the end of MakePrimary(), the following assertions hold.
(1) Every component in DisjointSets is a (not necessarily maximal) DSCC of G.
(2) For every component in DisjointSets rooted in some node r, the following hold.
(a) For every node v and label α, if v ∈ Edges[r][α] then there is an edge u −α→ w, where u is a node in the component of r, and w is a node in the component of v in DisjointSets.
(b) For every node u and label α such that (i) u is in the component rooted at node r in DisjointSets, and (ii) there is an edge u −α→ v, there exists a node w in the component of v in DisjointSets such that w ∈ Edges[r][α].

(1) If r is the root node of a potentially affected DSCC (case (i)), v must have been added to Edges[r][α] in Line 34, as Edges[r][α] was reinitialized to an empty list earlier in Line 22. But then v appears in OutEdges[u][α] for some node u ∈ DSCC(r) (Line 32), and thus we have an edge u −α→ v as desired (here we obtain w = v).

(2) On the other hand, if r is the root of an unaffected DSCC that has α-edges to nodes of affected DSCCs (case (ii)), v must have been added to Edges[r][α] in Line 29, as Edges[r][α] was reinitialized to an empty list earlier in Line 18. But then there exists a node u ∈ DSCC(r) (Line 28) such that u ∈ InPrimary[v][α] (Line 27). Hence, by Lemma 3.2, we have an edge u −α→ v as desired (here we obtain w = v).
Edges[r][α] either is empty, if there are no edges DSCC(r) −α→ •, or has exactly one node w such that DSCC(r) −α→ w (note that all other nodes w' for which we also have DSCC(r) −α→ w' belong to DSCC(w), and are thus in the same set of DisjointSets). Now, when processing insert(u, v, α), Algorithm 2 inserts v in Edges[r][α], where r = DisjointSets.Find(u) is the representative of DSCC(u) (Line 10). The new edge u −α→ v leads to the (potential) merge of DSCC(v) and DSCC(w), where w is another node with DSCC(w) ≠ DSCC(v) and DSCC(r) −α→ w. The condition |Edges[r][α]| ≥ 2 will further insert (r, α) in Q and trigger a new fixpoint computation, and the lemma then follows from the correctness of Fixpoint() (as established in [Chatterjee et al. 2018]).

Fig. 8. A family of dense graphs (a) and a family of sparse graphs (b) on which the edge deletion method of [Li et al. 2022] takes quadratic time.
Figure 8b showcases a family of sparse graphs parameterized by n and having O(n) nodes and Θ(n) edges. Before deleting the edge u −α→ v1, we have a sequence of DSCCs

Fig. 9. Running time of the offline and dynamic algorithms of [Li et al. 2022] on the dense graphs of Figure 8a (a) and the sparse graphs of Figure 8b (b).

Fig. 11. The average time to handle a single update on on-the-fly data-dependence analysis (a, c, e) and on-the-fly alias analysis (b, d, f), for sequence files generated with an 80-20 split of the original graph. Note that all results are in log-scale.
table causes an unnecessary cost. Moreover, even without a lookup table, queries cost O(α(n)) time, which is constant for all practical purposes.

Table 1. Classic results on dynamic undirected reachability.
Assume that we have computed a DSCC S1 that has two nodes u, v ∈ S1, and further, we have two edges u −α→ w and v −α→ z. Here w and z are not necessarily distinct, meaning that possibly w = z. Then we can conclude that w and z also belong to the same DSCC, and thus form another DSCC S2 = {w, z}.
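This merging rule suggests a simple offline fixpoint, sketched below under assumed names: repeatedly unify the α-targets of each current component until no merge applies.

```python
def dsccs(nodes, edges):
    """Repeatedly merge the targets of same-label edges that leave the
    same component, until stabilization (offline fixpoint sketch)."""
    parent = {u: u for u in nodes}

    def find(u):
        # Disjoint-set find with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    changed = True
    while changed:
        changed = False
        witness = {}  # (component, label) -> one representative target
        for (u, v, a) in edges:
            key = (find(u), a)
            if key not in witness:
                witness[key] = find(v)
            elif find(witness[key]) != find(v):
                parent[find(v)] = find(witness[key])  # merge the two targets
                changed = True
    return {u: find(u) for u in nodes}
```

Each pass groups the edges leaving one component with one label; whenever two such edges point into different components, those components are merged, which is exactly the rule stated above.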
Since the PDSCCs of G have a direct representation as connected components in the corresponding undirected primal graph, we can leverage existing techniques from dynamic undirected connectivity (see, e.g., Sparsification) for the maintenance of the PDSCCs across edge insertions and deletions, by maintaining connected components in the corresponding primal graphs.
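The primal-graph view can be illustrated statically as follows (a sketch with assumed names; the paper maintains these components dynamically rather than recomputing them, linking only neighboring entries of each successor list):

```python
from collections import defaultdict

def primal_components(edges):
    """PDSCCs as connected components of the primal graph: link the
    consecutive a-targets of every source, then collect components."""
    by_source, adj = defaultdict(list), defaultdict(set)
    for (u, v, a) in edges:
        by_source[(u, a)].append(v)
    for targets in by_source.values():
        for x, y in zip(targets, targets[1:]):  # neighboring list entries only
            adj[x].add(y)
            adj[y].add(x)
    # Plain DFS over the sparse primal edges to collect components.
    seen, comps = set(), []
    for s in {v for ts in by_source.values() for v in ts}:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps
```

Linking only consecutive targets keeps the primal graph sparse; nodes that share a source and label but are not list neighbors remain transitively connected, mirroring the PrimCompDS discussion above.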
v1, v2 and label α such that u ∈ InPrimary[v1][α] and u ∈ InPrimary[v2][α]. Indeed, by Lemma 3.2, if u ∈ InPrimary[v1][α] then v1 is the last node in OutEdges[u][α]. Clearly, as a linked list, OutEdges[u][α] can have at most one last node, hence u ∉ InPrimary[v2][α].

If v is a node of a potentially affected DSCC (case (i)), the claim follows when Line 31 is executed for some node u in the component of v, since we have an edge u −α→ v. On the other hand, if r is the root of an unaffected DSCC that has α-edges to nodes of affected DSCCs (case (ii)): by definition, all the nodes having the incoming α-edge will belong to the same PDSCC. By Lemma 3.3, one such node w from the PDSCC will have u ∈ InPrimary[w][α], where u ∈ DSCC(r). Then Line 29 will insert w to Edges[r][α] for w being one of these nodes.