Abstract
Determining upper bounds on the time complexity of a program is a fundamental problem with a variety of applications, such as performance debugging, resource certification, and compile-time optimizations. Automated techniques for cost analysis excel at bounding the resource complexity of programs that use integer values and linear arithmetic. Unfortunately, they fall short when the complexity depends more intricately on the evolution of data during execution. In such cases, state-of-the-art analyzers have been shown to produce loose bounds, or even no bound at all.
We propose a novel technique that generalizes the common notion of recurrence relations based on ranking functions. Existing methods usually unfold one loop iteration and examine the resulting arithmetic relations between variables. These relations assist in establishing a recurrence that bounds the number of loop iterations. We propose a different approach, where we derive recurrences by comparing whole traces with whole traces of a lower rank, avoiding the need to analyze the complexity of intermediate states. We offer a set of global properties, defined with respect to whole traces, that facilitate such a comparison and show that these properties can be checked efficiently using a handful of local conditions. To this end, we adapt state squeezers, an induction mechanism previously used for verifying safety properties. We demonstrate that this technique encompasses the reasoning power of bounded unfolding, and more. We present some seemingly innocuous, yet intricate, examples that previous tools based on cost relations and control flow analysis fail to solve, but that our squeezer-powered approach handles successfully.
1 INTRODUCTION
Cost analysis is the problem of estimating the resource usage of a given program, over all of its possible executions. It complements functional verification—of safety and liveness properties—and is an important task in formal software certification. When used in combination with functional verification, cost analysis ensures that a program is not only correct but also completes its processing in a reasonable amount of time, uses a reasonable amount of memory, communication bandwidth, and so forth. In this work, we focus on runtime complexity analysis. Although the area has been studied extensively (e.g., [3, 7, 10, 17, 19, 21, 23, 26, 35]), the general problem of constraining the number of iterations in programs containing loops with arbitrary termination conditions remains hard.
A prominent approach to computing upper bounds on the time complexity of a program identifies a well-founded numerical measure over program states that decreases in every step of the program, also called a ranking function. In this case, an upper bound on the measure of the initial states comprises an upper bound on the program’s time complexity. Finding such measures manually is often extremely difficult. The cost relations approach, dating back to the work of Wegbreit [35], attempts to automate this process by using the control flow graph of the program to extract recurrence formulas that characterize this measure. Roughly speaking, the recurrences relate the measures (costs) of adjacent nodes in the graph, taking into account the cost of the step between them. In this way, the cost relations track the evolution of the measure between every pair of consecutive states along the executions of the program.
One limitation of cost relations is the need to capture the number of steps remaining for execution in every state—that is, all intermediate states along all executions. If the structure of the state is complex, this may require higher-order expressions, such as summing over an unbounded number of elements. As an example, consider the program in Figure 1 that implements a binary counter represented by an array of bits.
Fig. 1. A program that produces all combinations of n bits.
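A minimal Python sketch of the counting loop, reconstructed from the description in Section 2 (the function name and the explicit `steps` counter are our additions, not part of the original program):

```python
def binary_counter(n):
    """Sketch of the program in Fig. 1: enumerate all n-bit values.

    c[0] is the least significant bit; `steps` counts loop iterations,
    which is the cost measure analyzed throughout the paper.
    """
    c = [0] * n
    i = 0
    steps = 0
    while i < n:
        steps += 1
        if c[i] == 1:
            c[i] = 0      # zero the prefix of 1s
            i += 1
        else:
            c[i] = 1      # flip the leftmost 0 ...
            i = 0         # ... and restart the scan from the LSB
    return steps
```

For instance, `binary_counter(3)` performs 14 loop iterations, matching the closed form \( 2^{n+1}-2 \) derived later in Section 2.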
In this case, a ranking function that decreases between every two consecutive iterations of the loop, or even between two iterations that print the value of the counter, depends on the entire content of the array. Attempting to express a ranking function over the scalar variables of this program is analogous to abstracting the loop as a finite-state system that ignores the content of the array, and as such contains transition cycles (e.g., the abstract state \( \langle n\mapsto n_0,i\mapsto 0\rangle \), obtained by projecting the state to the scalar variables only, repeats multiple times in any trace)—meaning that no strictly decreasing function can be defined in this way. Similarly, any attempt to consider a bounded number of bits will encounter the same difficulty.
In this article, we propose a novel approach for extracting recurrence relations capturing the time complexity of an imperative program, modeled as a transition system, by relating whole traces instead of individual states. The key idea is to relate a trace to (one or more) shorter traces. This allows us to formulate a recurrence that resolves to the length of the trace and recurs over the values at the initial states only. We sidestep the need to take into account the more complex parts of the state that change along the trace (e.g., in the case of the binary counter, the array is initialized with zeros).
Our approach relies on the notion of state squeezers [27], previously used exclusively for the verification of safety properties. We present a novel aspect where the same squeezers can be used to determine complexity bounds, by replacing the safety property check with trace length judgments.
Squeezers provide a means to perform induction on the “size” of (initial) states to prove that all reachable states adhere to a given specification. This is accomplished by attaching ranks from a well-founded set to states, and defining a squeezer function that maps states to states of a lower rank. Note that the notion of a rank used in our work is distinct from that of a ranking function, and the two should not be confused; in particular, a rank is not required to decrease on execution steps. Previously, squeezers were utilized for safety verification: the ability to establish safety is achieved by having the squeezer map states in a way that forms a relaxed form of a simulation relation, ensuring that the traces of the lower-rank states simulate the traces of the higher-rank states. Due to the simulation property, which is verified locally, safety over states with a base rank carries over to states of any higher rank by induction over the rank.
In this work, we use the construction of well-founded ranks and squeezers to define a recurrence formula representing an upper bound on the time complexity of the procedure being analyzed. We do so by expressing the complexity (length) of traces in terms of the complexity of lower-rank traces. This new setting raises additional challenges: it is no longer sufficient to relate traces to lower-rank traces; we also need to quantify the discrepancy between the lengths of the traces, as well as between their ranks. This is achieved by a certain form of simulation that is parameterized by stuttering shapes, which capture the discrepancy in length, and by means of a rank bounding function, which captures the decrease in rank. Furthermore, although our earlier work [27] limits each trace to relate to a single lower-rank trace, we have found that it is sometimes beneficial to employ a decomposition of the original trace into several consecutive trace segments so that each segment corresponds to some, possibly different, lower-rank trace. The segmentation simplifies the analysis of the length of the entire trace, as it creates sub-analyses that are easier to carry out, and the sum of which gives the desired recurrence formula. This also enables a richer set of recurrences to be constructed automatically, namely non-single recurrences (meaning that the recursive reference may appear more than once on the right-hand side of the equation).
The base case of the recurrence is obtained by computing an upper bound on the time complexity of base-rank states. This is typically a simpler problem that may be addressed, e.g., by symbolic execution, thanks to the bounded nature of the base. The solution to the recurrence formula with the respective base case soundly overapproximates the time complexity of the procedure.1
We show that, conceptually, the classical approach for generating recurrences based on ranking functions can be viewed as a special case of our approach where the squeezer maps a state to its immediate successor. The real power of our approach is in the freedom to define other squeezers, producing simpler recursions, and avoiding the need for complex ranking functions.
Our use of squeezers for extracting recurrences that bound the complexity of imperative programs is related to the way analyses for functional programs (e.g., [25]) use the term(s) in recursive function calls to extract recurrences. The functional programming style naturally provides such candidate terms. The novelty of our approach is in introducing the concept of a squeezer explicitly, leading to a more flexible analysis because it does not restrict the squeezer to follow specific terms in the program. In particular, this allows reasoning over space in imperative programs as well.
The main results of this work can be summarized as follows:
We propose a novel technique for runtime complexity analysis of imperative programs based on state squeezers. Squeezers, together with rank bounding functions, are used for extracting recurrence relations whose solutions overapproximate the length of executions of the input program.
We formalize the notions of state squeezers, partitioned simulation, and rank bounding functions that underlie the approach, and establish conditions that ensure soundness of the recurrence relations.
We demonstrate that there are cases where compact squeezers and rank bounding functions exist, and can be verified efficiently, including cases where explicit ranking functions are too complex for existing tools.
We implemented our approach and applied it successfully to several small but intricate programs, some of which could not have been handled by existing techniques.
A preliminary version of this work appeared in [28]. The original version could handle only deterministic programs. This extended version shows how to extend the approach to non-deterministic programs. We consider two ways to do so and discuss the tradeoff between them.
2 OVERVIEW
In this section, we give a high-level description of our technique for complexity analysis using the binary counter example in Figure 1. Example: Binary counter. The procedure in Figure 1 receives as input a number n of bits and iterates over all their possible values in the range \( 0\ldots 2^n-1 \). The “current” value is maintained in an array c that is initialized to zero and whose length is n. \( c[0] \) represents the least significant bit. The loop scans the array from the least significant bit forward, looking for the leftmost 0 and zeroing the prefix of 1s along the way. As soon as it encounters a 0, it sets it to 1 and restarts the scan from the beginning. The program terminates when the scan reaches the end of the array (\( i=n \)): at that point, all array entries are zero again, the last value enumerated was \( 11\ldots 1 \), and all \( 2^n \) values have been enumerated.
Existing analyses. All recent methods that we are aware of (e.g., [4, 21, 25]) fail to analyze the complexity of this procedure (in fact, most methods will fail to realize that the loop terminates at all). One reason for that is the need to model the contents of the array, whose size is unknown at compile time. However, even if data were modeled somehow and taken into account, finding a ranking function, which underlies existing approaches, is hard since this function is required to decrease between any two consecutive iterations along any execution. Here, for instance, to the best of our knowledge, such a function would depend on an unbounded number of elements of the array; it would need to extract the current value as an integer, along the lines of \( \sum _{j=0}^{n-1}c[j] \cdot 2^j \).
The use of a ranking function for complexity analysis is somewhat analogous to the use of inductive invariants in safety verification. Both are based on induction over time along an execution. This work is inspired by previous work [27] showing that verification can also be done when the induction is performed on the size (rank) of the state rather than on the number of iterations, where the size of the state may correspond, for example, to the size of an unbounded data structure. We argue that similar concepts can be applied in a framework for complexity classification. In other words, we try to infer a recurrence relation that is based on the rank of the state and correlates the lengths of complete executions—executions that start from an initial state—of different ranks. This sidesteps the need to express the length of partial executions, which start from intermediate states. Although the approach applies to bounded-state systems as well, its benefits become most apparent when the program contains a priori unbounded stores, such as arrays.
Our approach. Roughly speaking, our approach for computing recurrence formulas that provide an upper bound on the complexity of a procedure is based on the following ingredients:
A rank function \( \mathit {r}: \mathit {init}\rightarrow X \) that maps initial states to ranks from a well-founded set \( (X,\prec) \) with base B. Intuitively, the rank of the initial state governs the time complexity of the entire trace, and we also consider it to be the rank of the trace. As we shall soon see, this rank can be significantly simpler than a ranking function.
A squeezer \( {\curlyvee }: \Sigma \rightarrow \Sigma \) that maintains (some variant of) a simulation relation between states in \( \Sigma \), thus ensuring a bona fide correspondence between higher-rank traces and lower-rank traces through correspondence between states.
A trace partition \( p_d: \Sigma \rightarrow [1..d] \) that maps each state to a segment-identifier \( i \in [1..d] \), and induces a decomposition of a trace into segments, allowing \( {\curlyvee } \) to map each of them to a separate, lower-rank mini-trace.
A rank bounding function \( \hat{{\curlyvee }}: X \times [1..d] \rightarrow X \) that provides an upper bound on the rank of the initial states of the d mini-traces based on the rank of the higher-rank trace. (The rank is not required to be uniform across mini-traces.)
All of these ingredients are synthesized automatically, as we discuss in Section 4. Next, we elaborate on each of these ingredients and illustrate them using the binary counter example. We further demonstrate how we use these ingredients to find recurrence formulas describing (an upper bound on) the complexity of the program.
Some notations. We adopt a standard encoding of a program as a transition system over a state space \( \Sigma \), with a set of initial states \( \mathit {init}\subseteq \Sigma \) and transition function \( \mathit {tr}: \Sigma \rightarrow \Sigma \), where a transition corresponds to a loop iteration. We use \( \mathit {reach}\subseteq \Sigma \) to denote the set of reachable states, \( \mathit {reach}= \lbrace \sigma ~|~ \exists \sigma _0,k.~\mathit {tr}^k(\sigma _0)=\sigma \wedge \sigma _0\in \mathit {init}\rbrace \).
Defining the rank of a state. Ranks are taken from a well-founded set \( (X,\prec) \) with a basis \( B \subseteq X \) that contains all minimal elements of X. The rank function, \( \mathit {r}: \mathit {init}\rightarrow X \), aims to abstract away irrelevant data from the (initial) state that does not affect the execution time, and only uses state “features” that do. When proper ranks are used, the rank of an initial state is all that is needed to provide a tight bound on its trace length. Since ranks are taken from a well-founded set, they can be recursed over. In the binary counter example, the chosen rank is n, namely the rank function maps each state to the size of the array. (Notice that the rank does not depend on the contents of the array; in contrast, bounding the trace length from any intermediate state, and not just initial states, would have required considering the content of the array.)
Given the rank function, our analysis extracts a recurrence formula for the complexity function \( \mathit {comp}_x: X \rightarrow \mathbb {N}\cup \lbrace \infty \rbrace \) that provides an upper bound on the number of iterations of \( \mathit {tr} \) based on the rank of the initial states. In our exposition, we sometimes also refer to a time complexity function over states, \( \mathit {comp}_s : \mathit {init}\rightarrow \mathbb {N}\cup \lbrace \infty \rbrace \), which is defined directly on the (initial) states, as the number of iterations in an execution that starts with some \( \sigma _0\in \mathit {init} \).
Defining a squeezer. The squeezer \( {\curlyvee }: \Sigma \rightarrow \Sigma \) is a function that maps states to states that belong to some lower-rank traces (where the rank of a trace is determined by the rank of its initial state), down to the base ranks B. Its importance is in defining a correspondence between higher-rank traces and lower-rank ones that can be verified locally, by examining individual states rather than full traces. The kind of correspondence that the squeezer is required to ensure affects the flexibility of the approach and the kind of recurrence formulas that it may yield. To start off, consider a rather naive squeezer that satisfies the following local properties:
Rank decrease of non-base initial states, \( \sigma _0 \in \mathit {init}\wedge \mathit {r}(\sigma _0) \notin B \Rightarrow \mathit {r}({\curlyvee }(\sigma _0)) \prec \mathit {r}(\sigma _0) \), and
Simulation
We refer to an instance of the stuttering-step property witnessed by a fixed k as the “k-step” property. Whenever a state \( \sigma \) satisfies the k-step property, we will refer to it as being \( (k,1) \)-stuttering. (We usually only care about the smallest k that satisfies the property for a given \( \sigma \).) As an example, the squeezer we consider for the binary counter program is rather intuitive: it removes the least significant bit (\( c[0] \)) and adjusts the index i accordingly. Doing so yields a state with rank \( \mathit {r}({\curlyvee }(\sigma _0)) = \mathit {r}(\sigma _0) - 1 \). Figure 2 shows the correspondence between a 4-bit binary counter and a 3-bit one. The figure illustrates the simulation k-step property for \( k=1,2,3 \): both \( \sigma _{0} \) and \( \sigma _{3} \) are \( (3,1) \)-stuttering, \( \sigma _{1} \) and \( \sigma _{4} \) are \( (2,1) \)-stuttering, and \( \sigma _{2} \), \( \sigma _{5} \), and \( \sigma _{6} \) are \( (1,1) \)-stuttering.
Fig. 2. Correspondence between two traces of the binary counter program. Squeezer removes the leftmost array entry, which represents the least significant bit. The rank is the array size, i.e., four on the upper trace and three on the lower one. The simulation includes only 1-, 2-, and 3-steps, so the length of the upper trace is at most three times that of the lower trace, yielding an overall complexity bound of \( O(3^{n}) \) .
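The stuttering structure described in the caption can be checked empirically on a concrete encoding of the program. The sketch below is ours: states are pairs `(bits, index)`, and the index adjustment `max(i - 1, 0)` in the squeezer is an assumption about how i is "adjusted accordingly":

```python
def tr(state):
    """One loop iteration of the binary counter; state = (bits, index)."""
    c, i = state
    if i >= len(c):
        return state                      # terminal state
    c = list(c)
    if c[i] == 1:
        c[i], i = 0, i + 1                # zero the prefix of 1s
    else:
        c[i], i = 1, 0                    # flip leftmost 0, restart scan
    return (tuple(c), i)

def trace(n):
    """The full trace from the initial state with n zeroed bits."""
    tau = [((0,) * n, 0)]
    while tr(tau[-1]) != tau[-1]:
        tau.append(tr(tau[-1]))
    return tau

def squeeze(state):
    """Drop the least significant bit c[0] and adjust the scan index."""
    c, i = state
    return (c[1:], max(i - 1, 0))

def stutter_shapes(n):
    """Run lengths k of consecutive equal squeezed states: the k-step shapes."""
    squeezed = [squeeze(s) for s in trace(n)]
    runs, k = [], 1
    for prev, cur in zip(squeezed, squeezed[1:]):
        if cur == prev:
            k += 1
        else:
            runs.append(k)
            k = 1
    runs.append(k)
    return runs
```

Collapsing runs of equal squeezed states reproduces the lower-rank trace, and every run has length at most 3, consistent with the \( k=1,2,3 \) stuttering shapes in Figure 2.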
The simulation property induces a correlation between a higher-rank trace \( \tau \) and a lower-rank one \( \tau ^{\prime } \) such that every step of \( \tau ^{\prime } \) is matched by k steps in \( \tau \). Now suppose that there exists some \( \widehat{k} \in \mathbb {N}^{+} \) such that for every trace \( \tau (\sigma _{0}) \) and every state \( \sigma \in \tau (\sigma _{0}) \), \( \sigma \) is \( (k,1) \)-stuttering with \( 1 \le k \le \widehat{k} \). This would yield the following complexity bound: (1) \( \begin{equation} comp_s(\sigma _0) \le \widehat{k} \cdot comp_s({\curlyvee }(\sigma _0)). \end{equation} \)
All your base.2 What should happen if we repeatedly apply \( {\curlyvee } \) to some initial state \( \sigma _0 \), each time obtaining a new, lower-rank trace? Since \( \mathit {r}({\curlyvee }(\sigma _0)) \prec \mathit {r}(\sigma _0) \), and since \( (X,\prec) \) is well founded, we will eventually hit some state of base rank: \( \begin{equation*} {\curlyvee }({\curlyvee }(\ldots (\sigma _{0}))\ldots) = \sigma _{0}^{\circ } \hspace{10.0pt}\text{such that}\hspace{10.0pt} \mathit {r}(\sigma _{0}^{\circ }) \in B. \end{equation*} \) Hence, if we know the complexity of the initial states with a base rank, we can apply Equation (1) iteratively to compute an upper bound of the complexity of any initial state.
How many steps will be needed to get from an arbitrary initial state \( \sigma _0 \) to \( \sigma _{0}^{\circ } \)? Clearly, this depends on the rank and the way in which \( {\curlyvee } \) decreases it.
Consider the binary counter program again, with the rank \( \mathit {r}(\sigma)=n \). \( (\mathbb {N},\lt) \) is well founded, with a single minimum 0. If we define, for example, \( B=\lbrace 0,1\rbrace \), we know that the length of any trace with \( n\in B \) is bounded by a constant, 2. (Bounding the length of traces starting from an initial state \( \sigma _0 \) where \( \mathit {r}(\sigma _0)\in B \) can be done with known methods, e.g., symbolic execution.) Since the rank decreases by 1 on each “squeeze,” we get the following exponential bound: (2) \( \begin{equation} \mathit {comp}_s(\sigma _{0}) \le 2\cdot 3^{n-1} = O(3^{n}). \end{equation} \)
The last logical step, going from (1) to (2), is, in fact, highly involved: since Equation (1) is a mapping of states, solving such a recurrence for arbitrary \( {\curlyvee } \) cannot be carried out using known automated methods. Instead, we implicitly used the rank of the state, n, to extract a recurrence over scalar values and obtain a closed-form expression. Let us make this reasoning explicit by first expressing Equation (1) in terms of \( \mathit {comp}_x \) instead of \( \mathit {comp}_s \): \( \begin{equation*} comp_x(n) \le \widehat{k} \cdot comp_x(n-1). \end{equation*} \) Here, \( n-1 \) denotes the rank obtained when squeezing an initial state of rank n. Unlike Equation (1), this is a recurrence formula over \( (\mathbb {N},\lt) \) that may be solved algorithmically, leading to the solution \( comp_x(n) = O(3^{n}) \).
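As a sanity check, the solution \( 2\cdot 3^{n-1} \) of this recurrence can be compared numerically against the actual iteration counts; the re-implementation of the counter below is our own sketch, not the paper's artifact:

```python
def actual_steps(n):
    """Loop-iteration count of the binary counter (sketch of Fig. 1)."""
    c, i, steps = [0] * n, 0, 0
    while i < n:
        steps += 1
        if c[i] == 1:
            c[i], i = 0, i + 1
        else:
            c[i], i = 1, 0
    return steps

def bound(n, k_hat=3, base_cost=2):
    """Solution of comp_x(n) <= k_hat * comp_x(n - 1) with comp_x(1) = base_cost."""
    return base_cost * k_hat ** (n - 1)

# The true count 2^(n+1) - 2 is dominated by the bound 2 * 3^(n-1).
for n in range(1, 8):
    assert actual_steps(n) <= bound(n)
```

The bound is tight for \( n\le 2 \) and grows looser afterward, reflecting the conservatism of assuming the worst k for every step.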
Surplus analysis
Assuming the worst k for all states in the trace can be too conservative—in particular, if there are only a few states that satisfy the \( \widehat{k} \)-step property and all of the others satisfy the 1-step property. In the latter case, if we know that at most b states in any one trace have \( k \gt 1 \), we can formulate the tighter bound: (3) \( \begin{equation} comp_s(\sigma _{0}) \le comp_s({\curlyvee }(\sigma _{0})) ~+~ \widehat{k} \cdot b. \end{equation} \)
Incidentally, in the current setting of the binary counter program, the number of \( \widehat{k} \)-steps (3-steps) is not bounded. So we cannot apply the inequality (3) repeatedly on any trace, as the number of 3-steps depends on the initial state. However, we can improve the analysis by partitioning the trace into two parts, as we explain next.
Segments and mini-traces
Note that both (1) and (3) “suffer” from an inherent restriction that the right-hand side contains exactly one recursive reference. As such, they are limited in expressing certain kinds of complexity classes.
To get more diverse recurrences, including recurrences with multiple recursive terms, we propose an extension of the simulation property that allows more than one lower-rank trace:
Partitioned simulation:
This definition allows a new mini-trace to start at any point along a higher-rank trace \( \tau \), thus marking the beginning of a new segment of \( \tau \). When this occurs, we call \( \mathit {tr}(\sigma) \) a switch state. For the sake of uniformity, we also refer to all initial states \( \sigma _0\in \mathit {init} \) as switch states. Hence, each segment of \( \tau \) starts with a switch state, and the mini-traces are the lower-level traces that correspond to the segments (these are the traces that start from \( {\curlyvee }(\sigma _s) \), where \( \sigma _s \) is a switch state). The length of \( \tau \) can now be expressed as the sum of lower-level mini-traces.
However, there are two problems remaining. First, we need to extend the “rank decrease of non-base initial states” requirement to any switch state to ensure that the ranks of all mini-traces are indeed lower. Namely, we need to require that if \( \sigma _s \) is any switch state in a trace from \( \sigma _0 \), then \( r\big ({\curlyvee }(\sigma _s)\big) \prec r(\sigma _0) \). Second, even if we extend the rank decrease requirement, this definition does not suggest a way to bound the number of correlated mini-traces and their respective ranks, and therefore suggests no effective way to produce an equation for \( \mathit {comp}_s \) as before.
To sidestep the problem of a potentially unbounded number of mini-traces, we augment the definition of simulation with a trace partition function; to address the challenge of the rank decrease, we use a rank bounding function, which is responsible both for ensuring that the rank of the mini-traces decreases and for bounding their ranks.
Defining a partition. We define a function \( p_{d}: \Sigma \rightarrow \lbrace 1,\ldots ,d\rbrace \), parameterized by a constant d, called a partition function, that is weakly monotone along any trace (\( p_d(\sigma) \le p_d(\mathit {tr}(\sigma)) \)). This function induces a partition of any trace \( \tau \) into (at most) d segments by grouping states based on the value of \( p_d(\sigma) \). To ensure the segments and mini-traces are aligned, we require that switch states only occur at segment boundaries:
d-Partitioned simulation:
In our running example, let us change \( {\curlyvee } \) so that it shrinks the state by removing the most significant bit instead of the least. This leads to a partition of the execution trace for \( \mathit {r}(\sigma _0)=n \) into two segments, as shown in Figure 3. The partition function is \( p_d = (i \ge n~||~c[n-1])~?~2~:~1 \) (essentially, \( c[n-1]+1 \), except that the final state is slightly different). As can be seen from the figure, each segment simulates a mini-trace of rank \( n-1 \), with \( k=1 \) for all steps except for the last step (at \( \sigma _{28}) \) where \( k=2 \). In this case, it would be folly to use the recurrence (1) with \( \widehat{k}=2 \) since all steps are 1:1 except for the 2-step leaving \( \sigma _{28} \). Instead, we can formulate a tighter bound: \( \begin{equation*} \mathit {comp}_s(\sigma _0) \le \mathit {comp}_s(\sigma ^{\prime }_0) + \mathit {comp}_s(\sigma ^{\prime \prime }_0) + 2, \end{equation*} \) where \( \mathit {comp}_s(\sigma ^{\prime }_0) \) and \( \mathit {comp}_s(\sigma ^{\prime \prime }_0) \) are the lengths of the mini-traces, and 2 is the surplus from the switch transition \( \sigma _{14}\rightarrow \sigma _{15} \) plus the extra step at \( \sigma _{28} \). In the case of this program, we know that \( \mathit {r}(\sigma ^{\prime }_0) = \mathit {r}(\sigma ^{\prime \prime }_0) = \mathit {r}(\sigma _0) - 1 \), for any initial state \( \sigma _0 \); therefore, turning to \( \mathit {comp}_x \), we can derive and solve the recurrence \( \mathit {comp}_x(n)=2\cdot \mathit {comp}_x(n-1) + 2 \), which together with the base yields the following bound: \( \begin{equation*} \mathit {comp}_x(n) = 2^{n+1}-2. \end{equation*} \)
Fig. 3. An execution trace of the binary counter program that corresponds to two mini-traces of lower rank.
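The segmentation in Figure 3 can be reproduced on a concrete encoding of the program; the sketch below is ours, with the squeezer now dropping the most significant bit and the index clamped by `min(i, n - 1)` as an assumption about the adjustment:

```python
def tr(state):
    """One loop iteration of the binary counter; state = (bits, index)."""
    c, i = state
    if i >= len(c):
        return state
    c = list(c)
    if c[i] == 1:
        c[i], i = 0, i + 1
    else:
        c[i], i = 1, 0
    return (tuple(c), i)

def trace(n):
    tau = [((0,) * n, 0)]
    while tr(tau[-1]) != tau[-1]:
        tau.append(tr(tau[-1]))
    return tau

def p_d(state):
    """Partition function: segment 2 once the MSB is set (or i >= n)."""
    c, i = state
    return 2 if i >= len(c) or c[-1] == 1 else 1

def squeeze_msb(state):
    """Squeezer that removes the most significant bit c[n-1]."""
    c, i = state
    return (c[:-1], min(i, len(c) - 1))

def segments(n):
    tau = trace(n)
    return [[s for s in tau if p_d(s) == j] for j in (1, 2)]

def dedup(states):
    out = []
    for s in states:
        if not out or out[-1] != s:
            out.append(s)
    return out
```

Deduplicating each segment's squeezed states yields exactly the rank-\( (n-1) \) trace, confirming the correspondence to two lower-rank mini-traces, and \( p_d \) is weakly monotone along the trace.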
Clearly, a general condition is required to identify the ranks of the corresponding initial states of the (lower-rank) mini-traces (and at the same time ensure that they decrease).
Bounding the ranks of squeezed switch states. This is not a trivial task, because as noted previously, the squeezed ranks could be different and may depend on properties present in the corresponding switch states. To achieve this goal, once a partition function \( p_d \) is defined, we also define a rank bounding function \( \hat{{\curlyvee }}: X \times \lbrace 1,\ldots ,d\rbrace \rightarrow X \), where for any \( \sigma _0\in \mathit {init} \) and switch state \( \sigma _s \) along the trace that starts at \( \sigma _0 \), \( \hat{{\curlyvee }} \) provides a bound for the rank of \( {\curlyvee }(\sigma _s) \) based on that of \( \sigma _0 \): (4) \( \begin{equation} \mathit {r}({\curlyvee }(\sigma _{s})) \preceq \hat{{\curlyvee }}\big (\mathit {r}(\sigma _{0}), p_d(\sigma _s)\big) \prec \mathit {r}(\sigma _0). \end{equation} \)
The rightmost inequality ensures that a mini-trace that starts from \( {\curlyvee }(\sigma _s) \) is of lower rank than \( \sigma _0 \), and as such extends the “rank decrease” requirement to all mini-traces. Based on this restriction, we can formulate a recurrence for \( \mathit {comp}_x \) based on the initial rank \( \rho = \mathit {r}(\sigma _0) \), as follows: (5) \( \begin{equation} comp_x(\rho) \le \sum _{i=1}^{d}\mathit {comp}_x\big (\hat{{\curlyvee }}(\rho , i)\big) ~+~ (d-1) ~+~ \widehat{k} \cdot b, \end{equation} \) where b, as before, is the number of k-steps for which \( k\gt 1 \), and \( \widehat{k} \) is the bound on k (\( k\le \widehat{k} \)). The expression \( (d-1) \) represents the transitions between segments, and \( \widehat{k} \cdot b \) represents the surplus of the \( \rho \)-rank trace over the total lengths of the mini-traces.
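A recurrence of the form (5) can be evaluated top-down over integer ranks. The evaluator below, and the parameter values instantiating it for the binary counter, are illustrative placeholders of ours, not the synthesized objects:

```python
def comp_x(rho, base, base_cost, squeeze_rank, d, k_hat, b):
    """Evaluate the recurrence (5) top-down over integer ranks.

    squeeze_rank(rho, i) plays the role of the rank bounding function;
    base / base_cost describe the base case of the recurrence.
    """
    if rho in base:
        return base_cost(rho)
    mini = sum(comp_x(squeeze_rank(rho, i), base, base_cost,
                      squeeze_rank, d, k_hat, b)
               for i in range(1, d + 1))
    return mini + (d - 1) + k_hat * b

# Binary counter instance: d = 2, both mini-traces have rank rho - 1,
# one 2-step per trace (b = 1, k_hat = 2), base B = {1} with cost 2.
bound = comp_x(5, base={1}, base_cost=lambda r: 2,
               squeeze_rank=lambda r, i: r - 1, d=2, k_hat=2, b=1)
```

Here the recurrence solves to \( 5\cdot 2^{n-1}-3 \), slightly looser than the tight \( 2^{n+1}-2 \) of Section 2 because (5) charges both the \( d-1 \) switch transitions and the full \( \widehat{k}\cdot b \) surplus.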
It should be clear from the preceding definition that \( \hat{{\curlyvee }} \) is quite intricate. How would we compute it effectively? The rank decrease of the initial states and the simulation properties were local by nature and thus amenable to validation with an SMT solver. The \( \hat{{\curlyvee }} \) function is inherently global, defined w.r.t. an entire trace. This makes the property (4) challenging for verification methods based on SMT. To render this check more amenable to first-order reasoning, we introduce two special cases where the problem of checking (4) becomes easier: rank preservation and a single segment, explained next.
Taming \( \hat{{\curlyvee }} \) with rank preservation. To obtain rank preservation, we extend the rank function to all states (instead of just the initial states) and require that the rank is preserved along transitions. This is appropriate in some of the scenarios we encountered. For example, the binary counter illustration satisfies the property that along any execution \( \lbrace \sigma _{i}\rbrace _{i=0}^{\infty } \), the rank is preserved: \( r(\sigma _{i})=r(\sigma _{i+1}) \). Rank preservation means that given a switch state \( \sigma _{s} \) of an arbitrary segment i, we know that \( r(\sigma _{s})=r(\sigma _{0}) \). Once this is set, \( \hat{{\curlyvee }} \) only needs to overapproximate the rank of \( {\curlyvee }(\sigma _{s}) \) in terms of the rank of the same state \( \sigma _{s} \).
Taming \( \hat{{\curlyvee }} \) with a single segment. In this case, checking (4) reduces to a single check of the initial state, which is the only switch state. It turns out that the restriction to a single segment is still expressive enough to handle many loop types.
Putting it all together. In principle, \( \mathit {r} \), \( {\curlyvee } \), \( p_d \), and \( \hat{{\curlyvee }} \) can be written manually by the user. However, this is a tedious task, and one that is mechanical enough to be automated. We observed that all of the aforementioned functions are simple enough to be expressed in a restricted first-order syntax. As in our earlier work [27], we apply a generate-and-test synthesis procedure that enumerates a space of candidate expressions representing them. This process is explained in Section 4.
Outline
The rest of the article is organized as follows:
Section 3 lays out the theoretical foundations for state squeezers, their properties, and the time complexity bounds that can be obtained with them.
Section 4 describes our approach to find appropriate state squeezers for imperative programs automatically through enumerative synthesis.
Section 5 presents an empirical evaluation and benchmarks, comparing our implementation with state-of-the-art techniques based on cost relations.
Section 6 presents an extension of the method that makes it applicable to non-deterministic programs as well. Two alternative ways to achieve this are explored.
3 COMPLEXITY ANALYSIS BASED ON SQUEEZERS
In this section, we develop the formal foundations of our approach for extracting recurrence relations that describe the time complexity of an imperative program based on state squeezers. We present the ingredients that underlie the approach, the conditions they are required to satisfy, and the recurrence relations they induce. In the next section, we explain how to extract the recurrences automatically. Given the recurrence relation, a dedicated (external) tool may be applied to obtain a closed-form formula, similar to Albert et al. [3].
We use transition systems to capture the semantics of a program.
(Transition Systems).
A transition system is a tuple \( (\Sigma ,\mathit {init},\mathit {tr}) \), where \( \Sigma \) is a set of states, \( \mathit {init}\subseteq \Sigma \) is a set of initial states, and \( \mathit {tr}: \Sigma \rightarrow \Sigma \) is a transition function (rather than a transition relation, since, for now, only deterministic procedures are considered). The set of terminal states \( F\subseteq \Sigma \) is implicitly defined as \( F = \lbrace \sigma \in \Sigma \mid \mathit {tr}(\sigma)=\sigma \rbrace \). An execution trace (or a trace in short) is a finite or infinite sequence of states \( \tau = \sigma _0,\sigma _1,\ldots \) such that \( \sigma _{i+1} = \mathit {tr}(\sigma _i) \) for every \( 0 \le i \lt |\tau |-1 \). A state \( \sigma \in \Sigma \) defines an execution trace \( \tau (\sigma) = \lbrace \mathit {tr}^{i}(\sigma)\rbrace _{i \in \mathbb {N}} \). Whenever there exists an index \( 0 \le k \lt |\tau | \) s.t. \( \sigma _k \in F \), we truncate \( \tau (\sigma) \) into a finite trace \( \lbrace \mathit {tr}^{i}(\sigma)\rbrace _{i=0}^{k} \), where k is the minimal such index. The trace is initial if it starts from an initial state (i.e., \( \sigma \in \mathit {init} \)). Unless explicitly stated otherwise, all traces we consider are initial. The set of reachable states is \( \mathit {reach}= \lbrace \sigma \in \Sigma \mid \exists \sigma _0 \in \mathit {init}~.~\sigma \in \tau (\sigma _0)\rbrace \).
Roughly, to represent a program by a transition system, we translate it into a single-loop program, where \( \mathit {init} \) consists of the states encountered when entering the loop and transitions correspond to iterations of the loop. Nested loops are first translated into a single, flat loop by introducing an auxiliary variable that serves as a program counter. This transformation is standard and straightforward, so we do not delve into it.
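The flattening transformation can be sketched as follows. This is a minimal illustration, assuming a hypothetical doubly nested loop (`for i in range(n): for j in range(n): pass`); the field names and `pc` encoding are our own, and terminal states are modeled as self-loops, matching the definition above.

```python
# A minimal sketch of flattening two nested loops into a single loop by
# adding a program counter `pc` to the state.

from typing import NamedTuple

class State(NamedTuple):
    pc: int   # 0 = outer head, 1 = inner head, 2 = inner body, 3 = done
    i: int
    j: int
    n: int

def tr(s: State) -> State:
    if s.pc == 0:                                   # outer loop head
        return s._replace(pc=1) if s.i < s.n else s._replace(pc=3)
    if s.pc == 1:                                   # inner loop head
        return s._replace(pc=2) if s.j < s.n else s._replace(pc=0, i=s.i + 1, j=0)
    if s.pc == 2:                                   # inner loop body
        return s._replace(pc=1, j=s.j + 1)
    return s                                        # pc == 3: terminal self-loop

def comp(s: State, cutoff: int = 100_000) -> int:
    """Count transitions of the flattened loop until the terminal self-loop."""
    steps = 0
    while tr(s) != s and steps < cutoff:
        s, steps = tr(s), steps + 1
    return steps
```

Every iteration of either loop, including the head tests, becomes one transition of the flat system; e.g., for n = 3 the flattened trace takes n·(2n+2)+1 = 25 transitions.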
In the sequel, we fix a transition system \( (\Sigma ,\mathit {init},\mathit {tr}) \) with a set F of terminal states and a set \( \mathit {reach} \) of reachable states.
(Complexity over States).
For a state \( \sigma \in \Sigma \), we denote by \( \mathit {comp}_{s}(\sigma) \) the number of transitions from \( \sigma \) to a terminal state along \( \tau (\sigma) \) (the trace that starts from \( \sigma \)). Formally, if \( \tau (\sigma) \) does not include a terminal state (i.e., the procedure does not terminate from \( \sigma \)), then \( \mathit {comp}_{s}(\sigma) = \infty \). Otherwise, \( \begin{equation*} \mathit {comp}_{s}(\sigma) = \min \lbrace k \in \mathbb {N}\mid \mathit {tr}^{k}(\sigma) \in F\rbrace . \end{equation*} \) The complexity function of the program maps each initial state \( \sigma _0 \in \mathit {init} \) to its time complexity \( \mathit {comp}_{s}(\sigma _0) \in \mathbb {N}\cup \lbrace \infty \rbrace \).
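The definition of \( \mathit{comp}_s \) can be read off directly as code. Below is a minimal sketch on a toy countdown system of our own (state = a natural number x, tr(x) = x − 1 for x > 0, 0 terminal), with a cutoff standing in for the infinite case \( \mathit{comp}_s(\sigma) = \infty \).

```python
import math

# comp_s for a toy countdown system: Sigma = N, tr(x) = x - 1 for x > 0,
# and x = 0 is terminal (tr(0) = 0). A cutoff approximates divergence.

def tr(x: int) -> int:
    return x - 1 if x > 0 else x

def comp_s(x: int, cutoff: int = 10_000) -> float:
    """Number of transitions from x to the first terminal state."""
    k = 0
    while tr(x) != x:            # not yet a terminal (self-loop) state
        if k >= cutoff:
            return math.inf      # assume divergence past the cutoff
        x, k = tr(x), k + 1
    return k
```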
Our complexity analysis derives a recurrence relation for the complexity function by expressing the length of a trace in terms of the lengths of traces that start from lower-rank states. This is achieved by (i) attaching to each initial state a rank from a well-founded set that we use as the argument of the complexity function and that we recur over, and (ii) defining a squeezer that maps each state from the original trace to a state in a lower-rank trace; the mapping forms a partitioned simulation according to a partition function that decomposes a trace to segments; each segment is simulated by a (separate) lower-rank trace, allowing to express the length of the former in terms of the latter, and, finally, (iii) defining a rank bounding function that expresses (an upper bound on) the ranks of the lower-rank traces in terms of the rank of the higher-rank trace. We elaborate on these components next.
3.1 Time Complexity as a Function of Rank
We start by defining a rank function that allows us to express the time complexity of an initial state by means of its rank.
(Rank).
Let X be a set and \( \prec \) be a well-founded partial order over X. Let \( B \supseteq \min (X) \) be a base for X, where \( \min (X) \) is the set of all minimal elements of X w.r.t. \( \prec \). A rank function \( r: \mathit {init}\rightarrow X \) maps each initial state to a rank in X. We extend the notion of a rank to initial traces as follows. Given an initial trace \( \tau = \tau (\sigma _0) \), we define its rank to be the rank of \( \sigma _0 \). We refer to states \( \sigma _0 \) such that \( r(\sigma _0)\in B \) as the base states. Similarly, (initial) traces whose ranks are in B are called base traces.
In our analysis, ranks range over \( X = \mathbb {N}^m \) (for some \( m \in \mathbb {N}^{+} \)), with \( \prec \) defined by the lexicographic order. Ranks let us abstract away data inside the initial execution states, which does not affect the worst-case bound on the trace length. For example, the length of traces of the binary counter program (Figure 1) is completely agnostic to the actual content of the array at the initial state. The only parameter that affects its trace length is the array size, not which integers are stored inside it. Hence, a suitable rank function in this example maps an initial state to its array length. This is despite the fact that the execution does depend on the content of the array, and, in particular, the number of remaining iterations from an intermediate state within the execution depends on it. The partial order \( \prec \) and the base set B will be used to define the recurrence formula as we explain in the sequel.
We will assume from now on that \( (X,\prec ,B) \), as well as the rank function, are fixed, and can be understood from context. The rank function r induces a complexity function \( \mathit {comp}_{x}: X \rightarrow \mathbb {N}\cup \lbrace \infty \rbrace \) over ranks, defined as follows.
(Complexity over Ranks).
The complexity function over ranks, \( \mathit {comp}_{x}: X \rightarrow \mathbb {N}\cup \lbrace \infty \rbrace \), is defined by \( \begin{equation*} \mathit {comp}_{x}(\rho) = \max \lbrace \mathit {comp}_{s}(\sigma _0) \mid r(\sigma _0) \preceq \rho \wedge \sigma _0 \in \mathit {init}\rbrace . \end{equation*} \)
The definition ensures that for every initial state \( \sigma _{0} \in \mathit {init} \), we can compute (an upper bound on) its time complexity based on its rank, as follows: \( \mathit {comp}_s(\sigma _{0}) \le \mathit {comp}_x(r(\sigma _{0})) \). The complexity of \( \rho \) takes into account all initial states with \( \mathit {r}(\sigma) \preceq \rho \) and not only those with rank exactly \( \rho \), to ensure monotonicity of \( \mathit {comp}_x \) in the rank (i.e., if \( \rho _1 \preceq \rho _2, \) then \( \mathit {comp}_x(\rho _1) \le \mathit {comp}_x(\rho _2) \)). Our approach is targeted at extracting a recurrence relation for \( \mathit {comp}_x \).
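A brute-force sketch of \( \mathit{comp}_x \) on the same toy countdown system (an assumption of ours, not the paper's example) makes the monotonicity point tangible: taking the maximum over all initial states of rank \( \preceq \rho \), rather than rank exactly \( \rho \), forces the function to be monotone.

```python
# Brute-force comp_x for the toy countdown system, where the rank of an
# initial state x is x itself (X = N, ordered by <=).

def tr(x):
    return x - 1 if x > 0 else x

def comp_s(x):
    k = 0
    while tr(x) != x:
        x, k = tr(x), k + 1
    return k

def comp_x(rho: int) -> int:
    # max over ALL initial states of rank <= rho, ensuring monotonicity
    return max(comp_s(x) for x in range(rho + 1))
```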
3.2 Complexity Decomposition by Partitioned Simulation
To express the length of a trace in terms of the lengths of traces of lower ranks, we use a squeezer that maps states from the original trace to states of lower-rank traces and (implicitly) induces a correspondence between the original trace and the lower-rank trace(s). For now, we do not require the squeezer to decrease the rank of the trace; this requirement will be added later. The squeezer is accompanied by a partition function to form a partitioned simulation that allows a single higher-rank trace to be matched to multiple lower-rank traces such that their lengths may be correlated.
(Squeezer, ⋎).
A squeezer is a function \( {\curlyvee }: \Sigma \rightarrow \Sigma \).
A function \( p_d: \Sigma \rightarrow \lbrace 1,\ldots ,d\rbrace \), where \( d \in \mathbb {N}^{+} \), is called a d-partition function if for every trace \( \tau = \sigma _0,\sigma _1, \ldots \) it holds that \( p_d(\sigma _{i+1}) \ge p_d(\sigma _{i}) \) for every \( 0 \le i \lt |\tau |-1 \).
The partition function partitions a trace into a bounded number of segments, where each segment consists of states with the same value of \( p_d \). We refer to the first state of a segment as a switch state and to the last state of a finite segment as a last state (note that if \( \tau \) is infinite, its last segment has no last state). In particular, this means that the initial state of a trace is a switch state, and that terminal states are last states. In other words,
Switch states are \( \mathit {init}\cup \lbrace \mathit {tr}(\sigma) ~|~ \sigma \in \mathit {reach}\wedge p_d(\sigma) \lt p_d(\mathit {tr}(\sigma))\rbrace \), and
Last states are \( \lbrace \sigma \in \mathit {reach}~|~ \sigma \in F\vee p_d(\sigma) \lt p_d(\mathit {tr}(\sigma))\rbrace \).
For example, in Figure 3, \( \sigma _0 \) and \( \sigma _{15} \) are switch states and \( \sigma _{14} \) and \( \sigma _{30} \) are last states. (Note that a state may be a switch state in one trace but not in another, whereas a last state is a last state in any trace, as long as the same partition function is considered.)
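The two sets displayed above translate directly into code. The following is a hedged sketch on a toy two-phase program of our own (not Figure 3): state (x, y), where phase 1 decrements y to 0, phase 2 then decrements x to 0, and (0, 0) is terminal; \( p_2 \) reports the phase and is non-decreasing along every trace, as required.

```python
# Switch and last states of a trace under a 2-partition function, on a
# toy two-phase program (an illustrative assumption, not the paper's).

def tr(s):
    x, y = s
    if y > 0:
        return (x, y - 1)      # phase 1: drain y
    if x > 0:
        return (x - 1, 0)      # phase 2: drain x
    return s                   # (0, 0): terminal self-loop

def p2(s):
    return 1 if s[1] > 0 else 2

def trace(s):
    t = [s]
    while tr(t[-1]) != t[-1]:
        t.append(tr(t[-1]))
    return t

def switch_states(t):
    """The initial state, plus every state whose predecessor has a
    smaller p_2 value (the first state of each segment)."""
    return [t[0]] + [b for a, b in zip(t, t[1:]) if p2(a) < p2(b)]

def last_states(t):
    """Every state whose successor has a larger p_2 value, plus the
    terminal state (the last state of each finite segment)."""
    return [a for a, b in zip(t, t[1:]) if p2(a) < p2(b)] + [t[-1]]
```

For the trace from (3, 2), the segments are (3,2), (3,1) and (3,0), (2,0), (1,0), (0,0), so the switch states are (3,2) and (3,0), and the last states are (3,1) and (0,0).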
Our complexity analysis requires the squeezer to form a partitioned simulation with respect to \( p_d \). Roughly, this means that the squeezer maps each segment of a trace to a (lower-rank) trace that “simulates” it. To this end, we require all states \( \sigma \) within a segment of a trace to be \( (h,\ell) \)-“stuttering,” for some \( h\ge 1 \) and \( h\ge \ell \ge 0 \). Stuttering lets h consecutive transitions of \( \sigma \) be matched to \( \ell \) consecutive transitions of its squeezed counterpart. If \( h= \ell \), the state \( \sigma \) contributes to the complexity the same number of steps as the squeezed state. Otherwise, \( \sigma \) contributes \( h-\ell \) additional steps, resulting in a longer trace. Recall that terminal states also have outgoing transitions (to themselves), but these transitions do not capture actual steps; they do not contribute to the complexity. Hence, stuttering also requires that “real” transitions of \( \sigma \) are matched to “real” transitions of its squeezed counterpart. This is ensured by requiring that none of the states \( \mathit {tr}^{i}({\curlyvee }(\sigma)) \) for \( i \lt \ell \) may be terminal states. For the last states of segments, the requirement is slightly different, as the simulation ends at the last state and a new simulation begins in the next segment. To account for the transition from the last state of one segment to the first (switch) state of the next segment, last states are considered \( (1,0) \)-stuttering if they are squeezed into terminal states, unless they are terminal themselves. In any other case, they are considered \( (1,1) \)-stuttering. The formal definitions follow.
(Stuttering States).
Let \( {\curlyvee } \) be a squeezer and \( p_d \) a partition function. (a) A non-last state \( \sigma \in \Sigma \) is called an \( (h,\ell) \)-stuttering state, for \( h\ge 1 \), \( h\ge \ell \ge 0 \), if (i) \( \mathit {tr}^\ell ({\curlyvee }(\sigma)) = {\curlyvee }(\mathit {tr}^{h}(\sigma)) \) and (ii) for every \( i \lt \ell \), \( \mathit {tr}^i({\curlyvee }(\sigma)) \not\in F \). (b) A last state \( \sigma \in \Sigma \) is said to be \( (1,0) \)-stuttering if \( \sigma \not\in F \) and \( {\curlyvee }(\sigma) \in F \); otherwise, it is \( (1,1) \)-stuttering.
For example, in Figure 3, all states are \( (1,1) \)-stuttering with the exception of \( \sigma _{14} \) (a non-terminal last state that is squeezed into a terminal state) and \( \sigma _{29} \), which are both \( (1,0) \)-stuttering. Note that in Section 2, we refer to \( \sigma _{28} \) as \( (2,1) \)-stuttering (and gloss over \( \sigma _{29} \)) since, there, to simplify the exposition, we only consider k-steps, which are a special case of \( (h,\ell) \)-stuttering pairs where \( \ell =1 \).
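The stuttering conditions are checkable by simply running \( \mathit{tr} \) a few steps. Below is a sketch on our toy countdown system (state x, tr(x) = x − 1, terminal 0) with the assumed squeezer sq(x) = max(x − 1, 0) and a single segment, so the only last state is the terminal state 0; everything here is illustrative, not the paper's example.

```python
# Classifying states of the toy countdown system as (1,1)- or
# (1,0)-stuttering under sq(x) = max(x - 1, 0), per the definition.

def tr(x):
    return x - 1 if x > 0 else x

def sq(x):
    return max(x - 1, 0)

def is_terminal(x):
    return tr(x) == x

def stuttering(x, h, l):
    """Condition (a) for a non-last state x: (i) tr^l(sq(x)) equals
    sq(tr^h(x)), and (ii) tr^i(sq(x)) is non-terminal for every i < l."""
    lhs = sq(x)
    for _ in range(l):
        if is_terminal(lhs):
            return False                 # condition (ii) violated
        lhs = tr(lhs)
    rhs = x
    for _ in range(h):
        rhs = tr(rhs)
    return lhs == sq(rhs)

def classify(x):
    if is_terminal(x):
        return (1, 1)                    # terminal last state: (1,1) by (b)
    if stuttering(x, 1, 1):
        return (1, 1)
    assert stuttering(x, 1, 0)           # the remaining case here
    return (1, 0)
```

Every state x ≥ 2 is (1,1)-stuttering, while x = 1 is only (1,0)-stuttering, because its squeezed counterpart sq(1) = 0 is already terminal, violating condition (ii) for \( \ell = 1 \).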
To obtain a partitioned simulation, switch states (along any trace), which start new segments, are further required to be squeezed into initial states (since our complexity analysis only applies to initial states). We denote by \( \mathbb {S}_{p_d}(\tau) \) the switch states of trace \( \tau \) according to partition \( p_d \) and by \( \mathbb {S}_{p_d} \) the switch states of all traces according to the partition \( p_d \). Namely, \( \mathbb {S}_{p_d} = \mathit {init}\cup \lbrace \mathit {tr}(\sigma) ~|~ \sigma \in \mathit {reach}\wedge p_d(\sigma) \lt p_d(\mathit {tr}(\sigma))\rbrace \).
(Partitioned Simulation).
We say that a squeezer \( {\curlyvee }: \Sigma \rightarrow \Sigma \) forms a \( \lbrace (h_i,\ell _i)\rbrace _{i=1}^{n} \)-partitioned simulation according to \( p_d \), denoted \( {\curlyvee }\sim \mathbb {PS}_{p_d}\big (\lbrace (h_i,\ell _i)\rbrace _{i=1}^{n}\big) \), if for every non-terminal reachable state \( \sigma \) we have that
\( \sigma \) is \( (h_i,\ell _i) \)-stuttering for some \( 1 \le i \le n \), and
if \( \sigma \in \mathbb {S}_{p_d} \), then \( {\curlyvee }(\sigma) \in \mathit {init} \).
When \( \lbrace (h_i,\ell _i)\rbrace _{i=1}^{n} \) is irrelevant or clear from the context, we omit it from the notation and simply write \( {\curlyvee }\sim \mathbb {PS}_{p_d} \).
A trace squeezed by \( {\curlyvee }\sim \mathbb {PS}_{p_d}\big (\lbrace (h_i,\ell _i)\rbrace _{i=1}^{n}\big) \) may have an unbounded number of \( (h_i,\ell _i) \)-stuttering states, which hinders the ability to define a recurrence relation based on the simulation. To overcome this, our complexity decomposition may use \( \widehat{k} \ge 1 \) to capture a common multiplicative factor of all stuttering pairs, with the aim of leaving only a bounded number of states whose stuttering exceeds \( \widehat{k} \) and needs to be added separately. This will become important in Theorem 3.10. Given \( \widehat{k} \ge 1 \), we denote by \( {\mathbb {E}}_{\widehat{k}} \subseteq \lbrace 1,\ldots ,n\rbrace \) the set of indices such that \( \tfrac{h_i}{\ell _i} \gt \widehat{k} \) (by convention, here and elsewhere, this includes all indices with \( \ell _i = 0 \)).
Let \( {\curlyvee }\sim \mathbb {PS}_{p_d}\big (\lbrace (h_i,\ell _i)\rbrace _{i=1}^{n}\big) \) and \( \widehat{k} \ge 1 \). Then for every \( \sigma _0 \in \mathit {init} \), we have that \( \begin{equation*} \mathit {comp}_{s}(\sigma _{0}) \le \sum _{\sigma \in \mathbb {S}_{p_d}(\tau (\sigma _0))} \widehat{k}\cdot \mathit {comp}_{s}({\curlyvee }(\sigma)) + \sum _{i \in {\mathbb {E}}_{\widehat{k}}} \sum _{\sigma \in \mathbb {K}_i(\tau (\sigma _0))} \big (h_i - \ell _i \cdot \widehat{k}\big), \end{equation*} \) where \( \mathbb {K}_i\big (\tau (\sigma _0)\big) \) is the multiset of \( (h_i,\ell _i) \)-stuttering states in \( \tau (\sigma _0) \).
In the observation, the first addend summarizes the complexity contributed by all lower-rank traces while using \( \widehat{k} \) as an upper bound on the “inflation” of the traces. However, the states that are \( (h_i,\ell _i) \)-stuttering with \( \tfrac{h_i}{\ell _i} \) that exceeds \( \widehat{k} \) contribute additional \( h_i - (\ell _i \cdot \widehat{k}) \) steps to the complexity and, as a result, need to be taken into account separately. This is handled by the second addend, which adds the steps that were not accounted for by the first addend: for every such state \( \sigma ^{\prime } \), \( h_i \) steps of the higher-rank trace are matched to \( \ell _i \) steps of the lower-rank trace. The \( \ell _i \) steps of the lower-rank trace are already counted as part of \( \mathit {comp}_{s}({\curlyvee }(\sigma)) \) in the first addend, where \( \sigma \) is the switch state at the beginning of the segment of \( \sigma ^{\prime } \). Further, these steps are inflated by \( \widehat{k} \) in the first addend. Hence, we only need to add the remaining \( h_i - (\ell _i \cdot \widehat{k}) \) to the length of the higher-rank trace in the second addend. Although we use the same inflation factor \( \widehat{k} \) across the entire trace, a simple extension of the decomposition property may consider a different factor \( \widehat{k} \) in each segment. Note that the first addend always sums over a finite number of elements since the number of switch states is at most d—the number of segments. If \( \tau (\sigma _0) \) is finite, the second addend also sums over a finite number of elements.
For example, in Figure 3, \( {\curlyvee }\sim \mathbb {PS}_{p_d}\big (\lbrace (1,1),(1,0)\rbrace \big) \), \( \widehat{k}=1, \) and \( {\mathbb {E}}_{\widehat{k}}=\lbrace 2\rbrace \). The trace \( \tau (\sigma _0) \) has two switch states (\( \sigma _0 \) and \( \sigma _{15} \)), and two occurrences of states (\( \sigma _{14} \) and \( \sigma _{29} \)) whose stuttering is \( (1,0) \) and exceeds \( \widehat{k} \). Hence, Observation 1 yields the inequality \( \mathit {comp}_s(\sigma _0) \le 1 \cdot \mathit {comp}_s(\sigma ^{\prime }_0) + 1 \cdot \mathit {comp}_s(\sigma ^{\prime \prime }_0) + 2 \cdot (1-0 \cdot 1) \), which simplifies to \( \mathit {comp}_s(\sigma _0) \le \mathit {comp}_s(\sigma ^{\prime }_0) + \mathit {comp}_s(\sigma ^{\prime \prime }_0) + 2 \). Note that in Section 2, we obtain the same inequality but in a slightly different manner, by considering the stuttering pairs \( \lbrace (1,1), (2,1)\rbrace \). There, the total surplus of 2 is charged to the state \( \sigma _{14} \) (as we do here) and to the \( (2,1) \)-stuttering state \( \sigma _{28} \), instead of to \( \sigma _{29} \)—the surplus of \( \sigma _{29} \) is rolled over to \( \sigma _{28} \).
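The decomposition inequality can also be checked numerically on our toy countdown system (an assumption of ours, not the figure's program): with a single segment, the one switch state is the initial state, the squeezer is sq(x) = max(x − 1, 0), \( \widehat{k} = 1 \), and the only stuttering pair exceeding \( \widehat{k} \) is (1, 0), realized exactly once per trace (at x = 1), contributing a surplus of 1·(1 − 0·1) = 1.

```python
# Numeric check of the complexity-decomposition inequality on the toy
# countdown system (single segment, k_hat = 1, one (1,0)-stuttering
# state per trace).

def tr(x):
    return x - 1 if x > 0 else x

def comp_s(x):
    k = 0
    while tr(x) != x:
        x, k = tr(x), k + 1
    return k

def sq(x):
    return max(x - 1, 0)

k_hat = 1
for x0 in range(1, 50):
    # Observation 1: k_hat * comp_s(sq(initial)) + surplus term
    bound = k_hat * comp_s(sq(x0)) + 1 * (1 - 0 * k_hat)
    assert comp_s(x0) <= bound, (x0, bound)
```

Here the bound is tight: comp_s(x0) = x0 and the right-hand side is (x0 − 1) + 1.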
Observation 1 considers the complexity function over states and is oblivious to the rank. In particular, it does not rely on the squeezer decreasing the rank of states. Next, we use this observation as the basis for extracting a recurrence relation for the complexity function over ranks, in which case decreasing the rank becomes important.
3.3 Extraction of Recurrence Relations over Ranks
Based on the complexity decomposition, we define recurrence relations that capture \( \mathit {comp}_x \)—the time complexity of the initial states as a function of their ranks. To go from the complexity as a function of the actual states (as in Observation 1) to the complexity as a function of their ranks, we need to express the rank of \( {\curlyvee }(\sigma _{s}) \) for a switch state \( \sigma _{s} \) as a function of the rank of \( \sigma _0 \). To this end, we define \( \hat{{\curlyvee }} \).
Given r, \( {\curlyvee } \), and \( p_{d} \) such that \( {\curlyvee }\sim \mathbb {PS}_{p_d} \), a function \( \hat{{\curlyvee }}: X \times \lbrace 1,\ldots ,d\rbrace \rightarrow X \) is a rank bounding function if for every \( \rho \in X - B \) and \( 1 \le i \le d \), whenever \( \tau = \tau (\sigma _{0}) \) is an initial trace with \( r(\sigma _0) = \rho \) and \( \sigma _{s} \in \mathbb {S}_{p_d}(\tau) \) is a switch state in \( \tau \) with \( p_{d}(\sigma _{s}) = i \), the following holds: \( \begin{equation*} \text{ (i) upper bound: } r\big ({\curlyvee }(\sigma _s)\big) \preceq \hat{{\curlyvee }}(\rho ,i) \hspace{14.22636pt} \text{and} \hspace{14.22636pt} \text{ (ii) rank decrease: } \hat{{\curlyvee }}(\rho ,i) \prec \rho . \end{equation*} \)
In other words, Definition 3.9 requires that for every non-base initial state \( \sigma _0 \in \mathit {init} \) and switch state \( \sigma _s \) at segment i of \( \tau (\sigma _0) \), we have that \( r({\curlyvee }(\sigma _s)) \preceq \hat{{\curlyvee }}(r(\sigma _0),i) \prec r(\sigma _0) \). Recall that \( r({\curlyvee }(\sigma _s)) \) is well defined since \( {\curlyvee }(\sigma _s) \) is required to be an initial state. The definition states that \( \hat{{\curlyvee }}(\rho ,i) \) provides an upper bound on the rank of squeezed switch states in a non-base trace of rank \( \rho \). \( \mathit {comp}_x(r({\curlyvee }(\sigma))) \le \mathit {comp}_x(\hat{{\curlyvee }}(\rho ,i)) \) is ensured by the monotonicity of \( \mathit {comp}_x \). This definition also requires the rank of non-base traces to strictly decrease when they are squeezed, as captured by the “rank decrease” inequality. Rank decrease is essential for ensuring that the extracted recurrences for \( \mathit {comp}_x \) have a solution, as the recurrences, formally defined in Theorem 3.10, bound \( \mathit {comp}_x(\rho) \) in terms of \( \mathit {comp}_x(\hat{{\curlyvee }}(\rho ,i)) \); rank decrease guarantees that \( \hat{{\curlyvee }}(\rho ,i) \) is strictly smaller than \( \rho \).
For example, in Section 2, both in the case of \( d=1 \) and of \( d=2 \), we can bound the ranks of the squeezed switch states via \( \hat{{\curlyvee }}(n,i)=n\mathbin {\dot{-}}1 \). For instance, in Figures 2 and 3, we have that \( \hat{{\curlyvee }}(r(\sigma _0),1) = \hat{{\curlyvee }}(4,1)= 3 = r(\sigma ^{\prime }_0) = r({\curlyvee }(\sigma _0)) \). In Figure 3, we have, additionally, \( \hat{{\curlyvee }}(r(\sigma _0),2) = \hat{{\curlyvee }}(4,2) = 3 = r(\sigma ^{\prime \prime }_{0}) = r({\curlyvee }(\sigma _{15})) \).
Obtaining a rank bounding function, or even verifying that a given \( \hat{{\curlyvee }} \) satisfies this requirement, is a challenging task. We return to this question later in this section.
These conditions allow us to substitute ranks for states in the first addend of Observation 1 and hence to obtain recurrence relations for \( \mathit {comp}_x \) over the (decreasing) ranks. To handle the second addend, we also need to bound the number of states whose stuttering ratio, \( \tfrac{h_i}{\ell _i} \), exceeds \( \widehat{k} \). This is summarized by the following theorem.
Let \( r: \mathit {init}\rightarrow X \) be a rank function, \( {\curlyvee }: \Sigma \rightarrow \Sigma \) a squeezer, and \( p_d: \Sigma \rightarrow \lbrace 1,\ldots ,d\rbrace \) a partition function such that \( {\curlyvee }\sim \mathbb {PS}_{p_d}\big (\lbrace (h_i,\ell _i)\rbrace _{i=1}^{n}\big) \). Let \( \hat{{\curlyvee }}: X \times \lbrace 1,\ldots ,d\rbrace \rightarrow X \) be a rank bounding function w.r.t. r, \( {\curlyvee }, \) and \( p_d \). If, for some \( \widehat{k} \ge 1, \) the number of \( (h_i,\ell _i) \)-stuttering states that appear along any non-base initial trace is bounded by a constant \( b_i \in \mathbb {N} \) whenever \( i \in {\mathbb {E}}_{\widehat{k}} \), then (6) \( \begin{equation} \mathit {comp}_{x}(\rho) \le \sum _{i=1}^d \widehat{k}\cdot \mathit {comp}_{x}\big (\hat{{\curlyvee }}(\rho ,i)\big) + \sum _{i \in {\mathbb {E}}_{\widehat{k}}} b_i \cdot \big (h_i - \ell _i {\cdot } \widehat{k}\,\big). \end{equation} \)
Note that a state may be \( (h_i,\ell _i) \)-stuttering for several i’s, in which case it is sound to count it toward any of the \( b_i \)’s; in particular, we choose the one that minimizes \( h_i - \ell _i {\cdot } \widehat{k} \).
Under the premises of Theorem 3.10, if \( f: X \rightarrow \mathbb {N}\cup \lbrace \infty \rbrace \) satisfies \( f(\rho) = \sum _{i=1}^d \widehat{k}\cdot f(\hat{{\curlyvee }}(\rho ,i)) + \sum _{i \in {\mathbb {E}}_{\widehat{k}}} b_i \cdot (h_i - \ell _i \cdot \widehat{k}) \) for every \( \rho \in X - B \), and \( \mathit {comp}_x(\rho) \le f(\rho) \) for every \( \rho \in B \), then \( \mathit {comp}_{x}(\rho) \le f(\rho) \) for every \( \rho \in X \). We conclude that \( \mathit {comp}_s(\sigma _0) \le f(r(\sigma _0)) \) for every \( \sigma _0 \in \mathit {init} \).
For example, in Section 2, we bound the length of traces of the binary counter program using two recurrence relations that take as parameter the length of the array. Formally, these recurrences are obtained as follows. In the case of \( d=1 \) (Figure 2), we consider the stuttering pairs \( \lbrace (1,1),(2,1),(3,1)\rbrace \) and choose \( \widehat{k} = 3 \), hence \( {\mathbb {E}}_{\widehat{k}} = \varnothing \), leading to \( f(n) = 3 \cdot f(n-1) \). In the case of \( d=2 \) (Figure 3), we consider the stuttering pairs \( \lbrace (1,1),(1,0)\rbrace \) and choose \( \widehat{k} = 1 \), hence \( {\mathbb {E}}_{\widehat{k}} = \lbrace 2\rbrace \) and the bound on the number of occurrences of \( (1,0) \)-stuttering states is \( b_2=2 \), leading to \( f(n)=2\cdot f(n-1) + 2 \). (As explained before, in Section 2 we slightly deviate from this formulation to simplify the presentation.)
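The two recurrences above can be unfolded mechanically. The sketch below does so; note that the base-case values (f(0) = 1 for d = 1, f(0) = 0 for d = 2) are illustrative placeholders of ours, since the base-case complexity is obtained by a separate analysis (see the discussion of base cases below).

```python
import functools

# Unfolding the two recurrences extracted for the binary counter
# example. Base-case values are illustrative placeholders.

@functools.lru_cache(maxsize=None)
def f_d1(n):          # d = 1: pairs {(1,1),(2,1),(3,1)}, k_hat = 3
    return 1 if n == 0 else 3 * f_d1(n - 1)

@functools.lru_cache(maxsize=None)
def f_d2(n):          # d = 2: pairs {(1,1),(1,0)}, k_hat = 1, b_2 = 2
    return 0 if n == 0 else 2 * f_d2(n - 1) + 2
```

Under these placeholder bases, f_d2 admits the closed form 2^(n+1) − 2, which grows more slowly than the 3^n of f_d1, illustrating how the finer partition (d = 2) yields a tighter bound.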
Base-case complexity. To apply Corollary 3.11, we need to accompany Equation (6) with a bound on \( \mathit {comp}_x(\rho) \) for the base ranks, \( \rho \in B \). Fortunately, this is usually a significantly easier task. In particular, the running time of the base cases is often constant, because intuitively, the following are correlated: (a) the rank, (b) the size of the underlying data structure, and (c) the number of iterations. In this case, symbolic execution may be used to obtain bounds for base cases (as we do in our work). In essence, any method that can yield a closed-form expression for the complexity of the base cases is viable. In particular, it might be possible to apply our technique on the base case as a subproblem, using a different rank function.
3.4 Establishing the Requirements of the Recurrence Relations Extraction
Theorem 3.10 defines a recurrence relation from which an upper bound on the complexity function, \( \mathit {comp}_x \), can be computed (Corollary 3.11). However, to ensure correctness, the premises of Theorem 3.10 must be verified. The requirement that \( {\curlyvee }\sim \mathbb {PS}_{p_d}(\lbrace (h_i,\ell _i)\rbrace _{i=1}^{n}) \) (see Definition 3.8) may be verified locally by examining individual (reachable) states: for any (reachable) state \( \sigma \), the check for \( (h_i,\ell _i) \)-stuttering and switch states can, and should, be done in tandem, and require only observing at most \( \max _i h_i \) transition steps from \( \sigma \) and \( \max _i \ell _i \) from \( {\curlyvee }(\sigma) \). In contrast, the property required of \( \hat{{\curlyvee }} \) is global: it requires \( \hat{{\curlyvee }}(\rho ,i) \) to provide an upper bound on the rank of any squeezed switch state that may occur in any position along any non-base initial trace whose initial state has rank \( \rho \). Similarly, the property required of the bounds \( b_i \) is also global: that the number of \( (h_i,\ell _i) \)-stuttering states along any non-base initial trace is at most \( b_i \). It is therefore not clear how these requirements may be verified in general. We overcome this difficulty by imposing additional restrictions, as we discuss next.
3.4.1 Establishing Bounds on the Number of Occurrences of Stuttering States.
Bounds on the number of occurrences per trace that are sound for every trace are difficult to obtain in general. Although clever analysis methods exist that can do this kind of accounting (e.g., [23]), we found that a stronger, simpler condition applies in many cases:
For every \( \sigma \in \mathit {reach} \), either
\( \sigma \) is \( (h_i,\ell _i) \)-stuttering for some \( i \not\in {\mathbb {E}}_{\widehat{k}} \), or
\( \sigma \) is a switch state, a last state, or a state such that \( \mathit {tr}^{h_i}(\sigma) \) is a last state, where \( (h_i,\ell _i) \) is the stuttering pair of \( \sigma \).
This restricts these cases to occur only at the beginnings and ends of segments. It implies a total bound of \( 2d{\cdot }\max _i (h_i - \ell _i{\cdot }\widehat{k}) \) on the “surplus” of any trace, and therefore we substitute this expression for the rightmost sum in Equation (6). This expression assumes the “worst case” scenario where the surplus occurs both at the beginning and at the end of every segment; in some cases, this may be tightened, such as when we can verify that the surplus never occurs at the beginning of segments, in which case we may tighten the bound on the surplus to \( d{\cdot }\max _i (h_i - \ell _i{\cdot }\widehat{k}) \). This is the case in Figure 3. In this example, the states whose stuttering exceeds \( \widehat{k} = 1 \) are the state \( \sigma _{14} \), which is a last state, and the state \( \sigma _{29} \), for which \( \mathit {tr}(\sigma _{29}) \) is a terminal state. Both occur at the end of segments. Since both are \( (1,0) \)-stuttering, we obtain a bound of \( d{\cdot }\max _i (h_i - \ell _i{\cdot }\widehat{k}) = 2 \) on the surplus.
More generally, it is possible to obtain the bound on the number of states whose stuttering exceeds \( \widehat{k} \) by defining an auxiliary function \( {\it fuel}: \Sigma \rightarrow \mathbb {N} \) that is non-increasing along every transition of the system and is required to decrease between \( \sigma \) and \( \mathit {tr}^{h}(\sigma) \) whenever the stuttering pair, \( (h,\ell) \), of \( \sigma \) exceeds \( \widehat{k} \). In this case, the total bound on the “surplus” of any trace is \( c \cdot \max _i (h_i - \ell _i{\cdot }\widehat{k}) \), where \( c = \max _{\sigma _0 \in \mathit {init}}{\it fuel}(\sigma _0) \).
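The two conditions on the fuel function are again purely local and easy to check by sampling. The sketch below instantiates them on our toy countdown system (sq(x) = max(x − 1, 0), \( \widehat{k} = 1 \)), where the only state whose stuttering pair (1, 0) exceeds \( \widehat{k} \) is x = 1; the fuel function chosen here is our own illustrative guess.

```python
# A fuel function for the toy countdown system: it never increases
# along tr, and it strictly decreases across the single state (x = 1)
# whose stuttering pair (1, 0) exceeds k_hat = 1.

def tr(x):
    return x - 1 if x > 0 else x

def fuel(x):
    return 1 if x >= 1 else 0

def exceeds_k_hat(x):
    return x == 1          # the single (1,0)-stuttering state, h = 1

for x in range(0, 50):
    assert fuel(tr(x)) <= fuel(x)          # non-increasing along tr
    if exceeds_k_hat(x):
        assert fuel(tr(x)) < fuel(x)       # decreases over the h = 1 step

# Resulting surplus bound: (max initial fuel) * max_i (h_i - l_i * k_hat).
surplus_bound = max(fuel(x0) for x0 in range(1, 50)) * (1 - 0 * 1)
```

Since every initial state carries fuel 1, the surplus of any trace is bounded by 1, matching the single (1, 0)-stuttering state each countdown trace contains.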
3.4.2 Validating a Rank Bounding Function.
The definition of a rank bounding function (Definition 3.9) encapsulates two parts. Part (ii) ensures that the rank decreases: \( \hat{{\curlyvee }}(\rho ,i) \prec \rho \) for every \( \rho \in X - B \). Verifying that this requirement holds does not involve any reasoning about the states, nor traces, of the transition system. Part (i) ensures that \( \hat{{\curlyvee }} \) provides an upper bound on the rank of squeezed switch states. Formally, it requires that \( r({\curlyvee }(\sigma _s)) \preceq \hat{{\curlyvee }}(r(\sigma _0),i) \) for every switch state \( \sigma _s \) in segment \( i \in \lbrace 1,\ldots ,d\rbrace \) along a trace that starts from a non-base initial state \( \sigma _0 \). Namely, it relates the rank of the squeezed switch state, \( {\curlyvee }(\sigma _s) \), to the rank of the initial state, \( \sigma _0 \), where no bound on the length of the trace between the initial state \( \sigma _0 \) and the switch state \( \sigma _s \) is known a priori. As such, it involves global reasoning about traces. We identify two cases in which such reasoning may be avoided: (i) the partition \( p_d \) consists of a single segment (i.e., \( d=1 \)), or (ii) the rank function extends to any state (and not just the initial states), while being preserved by \( \mathit {tr} \). In both of these cases, we are able to verify the correctness of \( \hat{{\curlyvee }} \) locally.
A single segment. In this case, the only switch state along a trace is the initial state, and hence the upper-bound requirement of \( \hat{{\curlyvee }} \) boils down to the requirement that for every \( \sigma _0 \in \mathit {init} \) such that \( r(\sigma _0) \in X - B \), we have that \( r({\curlyvee }(\sigma _0)) \preceq \hat{{\curlyvee }}(r(\sigma _0),1) \). This is the case in Figure 2, for instance.
Let \( r \), \( {\curlyvee } \), and \( p_{1}: \Sigma \rightarrow \lbrace 1\rbrace \) be such that \( {\curlyvee }\sim \mathbb {PS}_{p_1} \). Then, \( \hat{{\curlyvee }}: X \times \lbrace 1\rbrace \rightarrow X \) satisfies the upper-bound requirement of a rank bounding function if and only if \( r({\curlyvee }(\sigma _0)) \preceq \hat{{\curlyvee }}(r(\sigma _0),1) \) for every \( \sigma _0 \in \mathit {init} \) such that \( r(\sigma _0) \in X - B \).
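In the single-segment case, the check is one condition per initial state. A minimal sketch on our toy countdown system (rank r(x) = x, base B = {0}, squeezer sq(x) = max(x − 1, 0), candidate bounding function rb(ρ, 1) = ρ − 1; all assumptions of ours):

```python
# The single-segment check: with d = 1, the only switch state of a
# trace is its initial state, so the global upper-bound requirement
# reduces to one local check per non-base initial state.

def sq(x):
    return max(x - 1, 0)

def r(x):                    # rank of an initial state
    return x

def rb(rho, i):              # candidate rank bounding function
    return rho - 1

def single_segment_check(init_samples):
    return all(r(sq(x0)) <= rb(r(x0), 1)          # (i) upper bound
               and rb(r(x0), 1) < r(x0)           # (ii) rank decrease
               for x0 in init_samples
               if r(x0) != 0)                     # skip base ranks (B = {0})
```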
Rank preservation. Another case in which the upper-bound property of \( \hat{{\curlyvee }} \) may be verified locally is when r can be extended to all states while being preserved by \( \mathit {tr} \).
A function \( \hat{r}: \Sigma \rightarrow X \) extends the rank function \( r: \mathit {init}\rightarrow X \) if \( \hat{r} \) agrees with r on the initial states—that is, \( \hat{r}(\sigma _0) = r(\sigma _0) \) for every initial state \( \sigma _0\in \mathit {init} \). The extended rank function \( \hat{r} \) is preserved by \( \mathit {tr} \) if for every reachable state \( \sigma \) we have that \( \hat{r}(\mathit {tr}(\sigma)) = \hat{r}(\sigma) \).
Preservation of \( \hat{r} \) by \( \mathit {tr} \) ensures that all states along an initial trace share the same rank. In particular, for a reachable switch state \( \sigma _s \) that lies along \( \tau (\sigma _0) \), rank preservation ensures that \( \hat{r}(\sigma _s) = \hat{r}(\sigma _0) = r(\sigma _0) \) (the last equality is due to the extension property), allowing us to recover the rank of \( \sigma _0 \) from the rank of \( \sigma _s \). Therefore, the upper-bound requirement of \( \hat{{\curlyvee }} \) simplifies into the local requirement that for every reachable switch state \( \sigma _s \) such that \( \hat{r}(\sigma _s) \in X - B \), we have that \( \hat{r}({\curlyvee }(\sigma _s)) \preceq \hat{{\curlyvee }}(\hat{r}(\sigma _s),i) \), for every \( i \in \lbrace 1,\ldots ,d\rbrace \).
Let \( r, \) \( {\curlyvee }, \) and \( p_{d}: \Sigma \rightarrow \lbrace 1,\ldots ,d\rbrace \) be such that \( {\curlyvee }\sim \mathbb {PS}_{p_d} \). Suppose that \( \hat{r}: \Sigma \rightarrow X \) extends r and is preserved by \( \mathit {tr} \). Then, \( \hat{{\curlyvee }}: X \times \lbrace 1,\ldots ,d\rbrace \rightarrow X \) satisfies the upper-bound requirement of a rank bounding function if and only if \( \hat{r}({\curlyvee }(\sigma _s)) \preceq \hat{{\curlyvee }}(\hat{r}(\sigma _s),i) \) for every reachable switch state \( \sigma _s \) such that \( \hat{r}(\sigma _s) \in X - B \) and for every \( i \in \lbrace 1,\ldots ,d\rbrace \).
For example, in Figure 3, we extend r from initial states to all states using the same definition, \( \hat{r}(\langle n,i,c\rangle) = n \). Since n is not changed by the program, it is sufficient to check the upper-bound property of \( \hat{{\curlyvee }} \) locally on the switch states, which are, in the example trace, \( \sigma _0 \) and \( \sigma _{15} \). For \( \sigma _0 \), the check is already local since it is an initial state. However, for \( \sigma _{15} \), rank preservation allows us to check the simpler, local, property \( \hat{{\curlyvee }}(\hat{r}(\sigma _{15}),2) = \hat{{\curlyvee }}(4,2) = 3 = \hat{r}({\curlyvee }(\sigma _{15})) = \hat{r}(\sigma ^{\prime \prime }_{0}) \), as opposed to checking \( r({\curlyvee }(\sigma _{15})) \preceq \hat{{\curlyvee }}(r(\sigma _0),2) \), where we need to track the relation between \( \sigma _0 \) and \( \sigma _{15} \). We use this simplification to extend the verification to all traces.
The notion of a partitioned simulation requires a switch state \( \sigma _s \) to be squeezed into an initial state. This requirement may be relaxed to squeezing \( \sigma _s \) into a reachable state \( {\curlyvee }(\sigma _s) \), provided that we can still ensure that the rank of (some) initial state \( \sigma ^{\prime }_0 \) leading to \( {\curlyvee }(\sigma _s) \) is smaller than the rank of the trace on which \( \sigma _s \) lies, and that the rank of \( \sigma ^{\prime }_0 \) is properly captured by \( \hat{{\curlyvee }} \). One case in which this is possible is when r is extended to \( \hat{r} \) that is preserved by \( \mathit {tr} \), as in this case \( \hat{r}({\curlyvee }(\sigma _s)) = \hat{r}(\sigma ^{\prime }_0) = r(\sigma ^{\prime }_0) \).
This section described local properties that ensure that a given program satisfies the requirements of Theorem 3.10. The locality of the properties facilitates the use of SMT solvers to perform these checks automatically. This is a key step for effective application of the method.
3.5 Trace-Length vs. State-Size Recurrences with Squeezers
A plethora of work exists for analyzing the complexity of programs (see Section 7 for a discussion of related work). Most existing techniques for automatic complexity analysis aim to find a recurrence relation on the length of the execution trace, relating the length of a trace from some state to the length of the remaining trace starting at its successor. These are recurrences on time, if you will, whereas our approach generates recurrences on the state size (captured by the rank). Is our approach completely orthogonal to preceding methods? Not quite. It turns out that, from a conceptual point of view, our approach can formulate a recurrence on time as well, as we demonstrate in this section.
Obtaining trace-length recurrences based on state squeezers. The key idea is to use \( \mathit {tr} \) itself as a squeezer that squeezes each state into its immediate successor. Putting aside the initial-anchor requirement momentarily, such a squeezer forms a partitioned simulation with a single segment (i.e., \( p_d \equiv 1 \)), in which all states along a trace are \( (1,1) \)-stuttering, except for the penultimate one (if the trace is finite), which is \( (1,0) \)-stuttering. Recall that squeezers must also preserve initial states (see Definition 3.8), a property that may be violated when \( {\curlyvee }= \mathit {tr} \), as the successor of an initial state is not necessarily an initial state. We restore the initial-anchor property by setting \( \widehat{\mathit {init}} = \Sigma \)—that is, every state is considered an initial state.
A consequence of this definition is that \( \mathit {comp}_x \) will now provide an upper bound on the time complexity of every state and not only of the initial states, in terms of a rank that needs to be defined. If we further define a rank bounding function \( \hat{{\curlyvee }}, \) we may extract a recurrence relation of the form \( \begin{equation*} \mathit {comp}_{x}(\rho) = \mathit {comp}_{x}(\hat{{\curlyvee }}(\rho))+1 \end{equation*} \) (we use \( \hat{{\curlyvee }}(\rho) \) as an abbreviation of \( \hat{{\curlyvee }}(\rho ,1) \), since this is a special case where \( d=1 \)).
Defining the rank and the rank bounding function. Recall that the rank \( \mathit {r}:\Sigma \rightarrow X \) captures the features of the (initial) states that determine the complexity. To allow maximal precision, especially since all states are now initial, we set X to be the set of states \( \Sigma \), and define \( \mathit {r} \) to be the identity function, \( \mathit {r}(\sigma) = \sigma \). With this definition, \( \mathit {comp}_x \) and \( \mathit {comp}_s \) become one. Next, we need to define \( \prec \) and B while ensuring that \( {\curlyvee } \) squeezes the (non-base) initial states, which are now all the states, into states of a lower rank according to \( \prec \). Since squeezers act like transitions now, having that \( {\curlyvee }=\mathit {tr} \), they have the effect of decreasing the number of transitions remaining to reach a terminal state (provided that the trace is finite). We use this observation to define \( {\prec } \subseteq \Sigma \times \Sigma \). Care is needed to ensure that \( (\Sigma ,\prec) \) is well founded—that is, every descending chain is finite, even though the program may not terminate. Here is the definition that achieves this goal: (7) \( \begin{equation} \sigma _{1} \prec \sigma _{2} ~\Leftrightarrow ~ \mathit {comp}_{s}(\sigma _{1}) \lt \mathit {comp}_{s}(\sigma _{2}). \end{equation} \)
Since \( {\curlyvee }=\mathit {tr} \) does not decrease \( \mathit {comp}_s \) for states that belong to infinite (non-terminating) traces (\( \mathit {comp}_s({\curlyvee }(\sigma)) = \mathit {comp}_s(\sigma) = \infty \), hence \( {\curlyvee }(\sigma) \not\prec \sigma \)), they must be included in B, together with the terminal states, which are minimal w.r.t. \( \prec \). Namely, \( B = F\cup \lbrace \sigma \mid \mathit {comp}_s(\sigma) = \infty \rbrace \). Technically, this means that the base of the recurrence needs to define \( \mathit {comp}_x \) for these states.
The final piece in the puzzle is setting \( \hat{{\curlyvee }}= \mathit {tr} \). Since \( {\curlyvee }\sim \mathbb {PS}_{p_d}\big (\lbrace (1,1),(1,0)\rbrace \big) \) (when \( \widehat{\mathit {init}} = \Sigma \)), where the number of \( (1,0) \)-stuttering states that appear along any non-base initial trace is bounded by 1, we may use Theorem 3.10, setting \( \widehat{k} =1 \), to derive the following recurrence relation, which reflects induction over time: \( \begin{equation*} \mathit {comp}_{x}(\sigma) = \mathit {comp}_{x}(\mathit {tr}(\sigma))+1. \end{equation*} \)
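The degenerate construction can be made concrete with a small sketch: when the squeezer is \( \mathit {tr} \) itself, solving the recurrence \( \mathit {comp}_{x}(\sigma) = \mathit {comp}_{x}(\mathit {tr}(\sigma))+1 \) simply counts the remaining transitions of the trace. The transition function below (a simple countdown loop) and the choice of base value 0 for terminal states are illustrative assumptions, not taken from the paper.

```python
# A sketch of the degenerate construction: the squeezer is tr itself, so the
# recurrence comp_x(sigma) = comp_x(tr(sigma)) + 1 counts the remaining
# transitions.  The transition function (a countdown) is a hypothetical
# stand-in; terminal states self-loop, as in the paper's setup.

def tr(state):
    n, i = state
    return (n, i + 1) if i < n else (n, i)   # terminal states self-loop

def is_terminal(state):
    return tr(state) == state

def comp_x(state):
    """Solve the time recurrence: base case on terminal states, otherwise
    recurse on the successor, which is also the squeezed state."""
    if is_terminal(state):
        return 0                              # base of the recurrence
    return comp_x(tr(state)) + 1

# The recurrence reproduces the number of steps from each state.
print(comp_x((5, 0)))    # 5 steps from (5, 0) to the terminal state (5, 5)
```

Note that, as in the text, states of infinite traces must be handled in the base B; the sketch assumes all traces terminate.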
The preceding formulation represents a degenerate, naive choice of ingredients for the sake of a theoretical construction, whose purpose is to lay the foundation for a general framework that takes its strengths from both induction over time and induction over rank. This construction does not exploit the full flexibility of our framework. In particular, ranking functions obtained from termination proofs, as used in the work of Albert et al. [5], may be used to augment the rank in this setting. Further, invariants inferred from static analysis can be used to refine the recurrences.
4 SYNTHESIS
So far we have assumed that the rank function r, partition function \( p_{d} \), squeezer \( {\curlyvee }, \) and a rank bounding function \( \hat{{\curlyvee }} \) are all readily available. Clearly, they are specific to a given program. It would be too tedious for a programmer to provide these functions for the analysis of the underlying complexity. In this section, we show how to automate the process of obtaining \( (\mathit {r},p_{d},{\curlyvee },\hat{{\curlyvee }}) \) for a class of typical looping programs. We take advantage of the fact that these components may be compact even in cases where other kinds of auxiliary functions commonly used for resource analysis, such as monotonically decreasing measures used as ranking functions, are complicated. For example, a ranking function for the binary counter program shown in Figure 1 is \( \begin{equation*} m(n,i,c) = \left(n\cdot \sum _{j=0}^{n-1} 2^j\cdot c[j]\right) + (2^i-1) + (n-i), \end{equation*} \) whereas the rank, partition, \( {\curlyvee }, \) and \( \hat{{\curlyvee }} \) are \( \begin{eqnarray*} \mathit {r}(n,i,c) = n & {\curlyvee }(n,i,c) = \big (n-1, (i\ge n)~?~i-1:i, c[{:\,}n-1]\big) \\ \hat{{\curlyvee }}(\rho) = \rho - 1 & p_d(n,i,c) = (i \ge n~||~c[n-1])~?~2~{:}~1. \end{eqnarray*} \) This enables the use of a relatively naive enumerative approach of multi-phase generate-and-test, employing some early pruning to discard obviously non-qualifying candidates.
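To make the contrast concrete, the following sketch transcribes the displayed formulas for the binary counter into Python: the compact ingredients \( \mathit {r} \), \( {\curlyvee } \), \( \hat{{\curlyvee }} \), \( p_d \), next to the complicated monotone measure m. The encoding of a state as a Python tuple \( (n,i,c) \) and the random sampling are our own choices; the formulas themselves are taken verbatim from the text.

```python
import random

# A direct transcription of the synthesized ingredients for the binary
# counter example (Figure 1); a state is a triple (n, i, c) with c a list of
# n bits.  This exercises only the given formulas, not the program itself.

def r(state):                              # rank
    n, i, c = state
    return n

def squeeze(state):                        # squeezer
    n, i, c = state
    return (n - 1, i - 1 if i >= n else i, c[:n - 1])

def rank_bound(rho):                       # rank bounding function
    return rho - 1

def p_d(state):                            # partition function (two segments)
    n, i, c = state
    return 2 if (i >= n or c[n - 1]) else 1

# The complicated monotone measure that a classical ranking-function proof
# would need, shown for comparison:
def m(state):
    n, i, c = state
    return n * sum(2**j * c[j] for j in range(n)) + (2**i - 1) + (n - i)

# Locally, squeezing lowers the rank by exactly the amount predicted by
# rank_bound, independently of i and c:
random.seed(0)
for _ in range(1000):
    n = random.randint(1, 10)
    s = (n, random.randint(0, n), [random.randint(0, 1) for _ in range(n)])
    assert r(squeeze(s)) == rank_bound(r(s))
print("ok")
```

The point of the comparison is that each compact ingredient is a one-liner, whereas m mixes an exponential sum with two correction terms, which is far harder for an enumerative synthesizer to find.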
4.1 SyGuS
The generation step of the synthesis loop applies syntax-guided synthesis (SyGuS [8]). As with any SyGuS method, defining the underlying grammars is more art than science: a grammar should be expressive enough to capture the desired terms but strict enough to effectively bound the search space. The grammars in Figures 4 and 5 describe essentially infinite spaces of expressions, but, as is customary in SyGuS, a bound on the depth of the expressions is imposed. In our preliminary experiments, described in Section 5, a bound of 2 was used.
Fig. 4. Grammar for partition functions ( \( p_{d} \) ) generated by our SyGuS procedure.
Fig. 5. Grammars for candidate squeezers ( \( {\curlyvee } \) ) and rank bounding functions ( \( \hat{{\curlyvee }} \) ). “vexp” denotes integer expressions, and “bexp” denotes Boolean conditions, drawn from combinations of program variables, integer constants, and arithmetic and logical operators (the specific expressions used in our experiments are described in Section 5.1).
Ranks are taken from \( \mathbb {N}^{m} \), where \( m \in \lbrace 1,2,3\rbrace , \) and \( \prec \) is the usual lexicographic order. The rank function \( \mathit {r} \) comprises one expression for each coordinate, constructed by adding/subtracting integer variables and array sizes. Boolean variables are not used in rank expressions.
Partition functions \( p_d \). Our implementation currently supports a maximum number of two segments. This means that the partition function only assigns the values 1 and 2, and we synthesize it by generating a condition over the program’s variables, \( \mathit {cond} \), that selects between them: \( p_d(\sigma) = \mathit {cond}(\sigma)~?~2~{:}~1 \). Figure 4 shows the syntax of expressions used for defining partition functions.
Handling up to two segments is not an inherent limitation of the approach, but it is sufficient for our examples. Squeezers \( {\curlyvee } \). Squeezers are the only ingredient that requires substantial synthesis effort. We represent them as small loop-free imperative programs, written in a minimal fragment of the C programming language, which is a natural representation of state-to-state mappings. The syntax is rather standard, with “if-then-else” and assignments, plus a \( \texttt {remove-adjust} \) operation that removes array entries and adjusts the indices referring to them accordingly; this “remove” operation is the key feature used to squeeze array stores. The corresponding grammar is listed in Figure 5.
Rank bounding functions \( \hat{{\curlyvee }} \). With a well-chosen squeezer \( {\curlyvee } \), it suffices to consider quite simple rank bounds for the mini-traces. Hence, the rank bounds defined by \( \hat{{\curlyvee }} \) are obtained by adding, subtracting, and multiplying variables with small constants (for each coordinate of the rank). This is shown by the grammar in Figure 5. Similar to the choice of ranks, targeting simple expressions for \( \hat{{\curlyvee }} \) helps reduce the complexity of the final recurrence that is generated from the process.
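The generate-and-test flavor of the search for \( \hat{{\curlyvee }} \) can be sketched as follows. This is a much-simplified illustration: candidates are restricted to the form \( \rho \mapsto a\cdot \rho + b \) for small constants a and b (the real grammar in Figure 5 is richer), and the sample pairs of observed ranks are hypothetical.

```python
from itertools import product

# A simplified sketch of enumerative generate-and-test for rank bounding
# functions: candidates have the form rho -> a*rho + b for small constants.

CONSTS = range(-2, 3)          # hypothetical pool of small constants

def candidates():
    for a, b in product(CONSTS, CONSTS):
        yield (a, b), (lambda rho, a=a, b=b: a * rho + b)

def screen(samples, cands):
    """Keep candidates consistent with observed (initial rank, squeezed rank)
    pairs harvested from concrete traces: squeezed <= candidate(initial)."""
    return [name for name, f in cands
            if all(squeezed <= f(rho) for rho, squeezed in samples)]

# Hypothetical samples: squeezing always dropped the rank by exactly one.
samples = [(n, n - 1) for n in range(1, 8)]
survivors = screen(samples, candidates())
print(survivors)
```

Prioritizing simpler expressions among the survivors would pick \( (a,b) = (1,-1) \), i.e., \( \hat{{\curlyvee }}(\rho) = \rho - 1 \), the tightest bound consistent with the samples; the looser survivors such as \( 2\rho \) would yield a weaker final recurrence.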
4.2 Verification
For the sake of verifying the synthesized ingredients, we fix a set \( \lbrace (h_i,\ell _i)\rbrace _{i=1}^n \) of stuttering pairs and check the requirements of Theorem 3.10 as discussed in Section 3.4. In particular, we check that \( p_d \) is weakly monotone (i.e., that \( \mathit {cond} \) cannot change from \( \mathsf {true} \) to \( \mathsf {false} \) in any step of \( \mathit {tr} \)). Note that some of the properties may be used to discriminate some of the ingredients independently of the others. For example, the simulation requirement depends only on \( {\curlyvee } \) and \( p_d \). As such, it may be used to completely eliminate such candidates, for any choice of r and \( \hat{{\curlyvee }} \). Similarly, the rank decrease property can be used to eliminate some choices of \( \hat{{\curlyvee }} \) irrespective of the other ingredients.
Initial screening. Each candidate tuple \( (\mathit {r},p_d,{\curlyvee },\hat{{\curlyvee }}) \) is first examined against a pool of concrete traces. If one of the properties is violated by any of the states along these traces, the candidate is discarded.
Unbounded verification. Once candidates pass the preliminary screening phase, they are verified by encoding the program and all components \( \mathit {r},p_d,{\curlyvee },\hat{{\curlyvee }} \) as first-order logic expressions, and using an SMT solver (Z3 [18]) to verify that the requirements are fulfilled for all traces of the program. As mentioned in Section 3.4, all checks are local and require observing a bounded number of steps starting from a given \( \sigma \). The only facet of the criteria that is difficult to encode is the fact that they are required only of the reachable states (and not of every state). Of course, if we are able to ascertain that they are met for all \( \sigma \in \Sigma \), including unreachable states, then the result is sound.
However, for some programs and squeezers, the required properties (especially simulation) do not hold universally but are violated by unreachable states. To cope with this situation without having to manually provide invariants that capture properties of the reachable states, we use a CHC solver, Spacer [29], which is part of Z3, to check whether all reachable states in the unbounded-state system induced by the input program satisfy these properties. This can be seen as a reduction from the problem of verifying the premises of Theorem 3.10 to that of verifying a safety property.
Figure 6 demonstrates that in some cases, trying to prove the simulation property for all states is indeed too strong. The following is a valid solution for \( {\curlyvee } \) (with a single segment, \( d=1 \)): \( \begin{equation*} {\curlyvee }~{~\widehat{=}~}~ {\color{blue}{\texttt {if}}}~(m+1=n)~\texttt {\{ }~ i=i-1;~ m=m-1;~ n=n-1; ~\texttt {\} }~{\color{blue}{\texttt {else}}}~\texttt {\{ }~ n=n-1; ~\texttt {\}}. \end{equation*} \) However, the simulation property (Definition 3.7) is violated, for example, by the concrete state \( \begin{equation*} \check{\sigma }=\big (n\mapsto 6,m\mapsto 5,\mathit {dir}\mapsto \mathsf {true},i\mapsto 1\big), \end{equation*} \) since \( {\curlyvee }(\mathit {tr}^{h}\!(\check{\sigma })) = \big (n\mapsto 5,m\mapsto 4,\mathit {dir}\mapsto \mathsf {true},i\mapsto h\big) \), but \( \mathit {tr}^{\ell }\!({\curlyvee }(\check{\sigma })) = \big (n\mapsto 5,m\mapsto 4,\mathit {dir}\mapsto \mathsf {true},i\mapsto 0\big) \), which, for \( h\ge \ell \ge 1 \), are not equal. A closer look reveals that \( \check{\sigma } \) is not a reachable state, since the following property is a loop invariant that \( \check{\sigma } \) violates: \( \begin{equation*} \mathit {dir} \Rightarrow (i \ge m). \end{equation*} \) Spacer manages to automatically discover an appropriate invariant, which eliminates this and other spurious counterexamples, establishing the simulation property for all reachable states.
Fig. 6. A case demonstrating that the reachability assumption is, in fact, required for successful verification of the simulation property (Definition 3.7). Here, an unreachable state exists that violates the property. Restricting the check to reachable states (using Spacer) allows the verification to succeed.
5 EMPIRICAL EVALUATION
We implemented a prototype of our complexity analyzer as a publicly available tool, SqzComp, that receives a program in a subset of C and produces recurrence relations. SqzComp is written in C++, using the Z3 C++ API [18], and using Spacer [29] via its SMTLIB2-compatible interface. For the base case of generated recurrences, we use the symbolic execution engine KLEE [12] to bound the total number of iterations by a constant.
5.1 Implementation
Our enumerative synthesis implementation prioritizes simpler expressions for all of the components, as they lead to a lighter burden on the SMT-solver-based checks and also contribute to simpler recurrences if they succeed. For example, prioritizing one-dimensional (scalar) ranks makes sense, since eventually the rank determines the structure of the recurrence relation. For similar reasons, we prioritize rank bounding functions \( \hat{{\curlyvee }} \) that do not involve disjunctive bounds (i.e., conditionals). For partition functions, \( p_{d} \equiv 1 \) (no switch states) is considered first, and failing that, non-trivial partitions are tried (the search space is finite since we restrict the size of all expressions, as well as the set of constants that may be used). Following the guidelines of Lemma 3.14, a non-trivial \( p_d \) raises the need for rank preservation, which imposes further restrictions on \( \mathit {r} \) (which becomes, in fact, \( \hat{\mathit {r}} \)). Unlike the other ingredients, there is no clear way to prioritize the squeezer enumeration. A general rule of thumb is that the simpler the expressions are, the easier they are to handle during verification. Empirically, however, changing the enumeration order had no significant effect on the verification time.
Limitations.
Our prototype implementation only handles single-loop programs where \( \texttt {if-then-else} \) statements are at the outermost level of the loop body (we manually carry out the general transformation of more complex control structures to this form). We note that the automatic synthesis of a squeezer, rank, and rank bounding function is currently the most significant barrier for scaling the approach. In the implementation, partitioned simulations are restricted to the stuttering pairs \( (1,1) \), \( (1,0) \), \( (2,1) \), \( (3,2) \), \( (4,3) \). These are relevant both for the screening phase and for the verification phase. For integer constants used in synthesized expressions, a small set of pre-defined values is considered.
Implementation Details.
The implementation consists of several steps, pertaining to the steps discussed in Section 4.
Screening by random traces. To enable fast screening of false candidates \( (\mathit {r},p_{d},{\curlyvee },\hat{{\curlyvee }}) \), we generate a set \( \mathbb {P}_{\mathit {init}} \) of approximately 100 initial states (as we explain in the sequel). We repeatedly apply the transition function \( \mathit {tr} \) on each \( \sigma _{0} \in \mathbb {P}_{\mathit {init}} \) to construct \( \tau (\sigma _{0}) \), until either the trace reaches a terminal state, in which case it is inserted into the pool of traces \( \mathbb {P}_{\tau } \), or it exceeds a maximal iteration threshold, \( |\tau (\sigma _{0})| \gt \mathbf {M} \), in which case it is excluded from the pool (and additional initial states may be added to \( \mathbb {P}_{\mathit {init}} \)). We generate \( \mathbb {P}_{\mathit {init}} \) by sampling random concrete states and checking whether they satisfy the initial conditions. If randomization fails to provide on the order of 100 bona fide initial states within some maximal number of attempts, we encode the initial conditions as an SMT query and attempt to find a set of distinct initial states by repeatedly running this query, excluding the states that are already in the pool.
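The random-sampling half of this pipeline can be sketched as follows. The program (a countdown loop), its initial condition, and the concrete values of the threshold M and the pool size are all illustrative stand-ins; the SMT fallback for hard-to-hit initial conditions is omitted.

```python
import random

# A sketch of the screening pipeline: sample candidate initial states, grow
# traces under tr, and keep only traces that terminate within M steps.  The
# program (a hypothetical countdown), M, and the pool size are stand-ins.

M = 1000                      # max iteration threshold
POOL_TARGET = 100             # desired size of the initial-state pool

def tr(state):
    n, i = state
    return (n, i + 1) if i < n else (n, i)

def is_terminal(state):
    return tr(state) == state

def is_initial(state):        # hypothetical initial condition
    n, i = state
    return i == 0 and n >= 0

def build_trace_pool(rng):
    pool_init, pool_traces = [], []
    while len(pool_init) < POOL_TARGET:
        s = (rng.randint(0, 500), rng.randint(0, 5))
        if not is_initial(s):
            continue          # random sampling may miss the initial condition
        trace = [s]
        while not is_terminal(trace[-1]) and len(trace) <= M:
            trace.append(tr(trace[-1]))
        if len(trace) > M:
            continue          # exceeded the threshold: exclude from the pool
        pool_init.append(s)
        pool_traces.append(trace)
    return pool_init, pool_traces

inits, traces = build_trace_pool(random.Random(1))
print(len(traces), all(is_terminal(t[-1]) for t in traces))
```

Each candidate tuple is then evaluated against every state in the pooled traces before any SMT query is issued.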
Figure 7 shows an example that is bound to fail a random search for initial states (due to the use of specific initial values for the variables) but is handled easily with an SMT solver. We still favor random sampling whenever possible, as it tends to produce a more uniform distribution of initial states. As an example, Figure 8 shows a code example with a trivial initial condition (\( \mathit {init}= \Sigma \)), where the initial states produced by the SMT solver always satisfy \( x\lt y\lt n\lt m \), which limits the traces in \( \mathbb {P}_\tau \) to exposing only some of the paths inside the loop (y is always smaller than m). Of course, real code can suffer from both limitations, which can make the task of finding initial states harder. This is a generally known problem [16, 33] that is not the focus of this article.
Fig. 7. Naive randomization fails to satisfy initial conditions of the program since hitting an initial state happens with low probability.
Fig. 8. Distinct initial states extracted from subsequent calls to the Z3 SMT solver may appear “innocently random.” In fact, they are not. All returned initial states satisfied \( x\lt y\lt n\lt m \) , thus limiting the traces pool explored from \( \mathbb {P}_{\mathit {init}} \) .
SMT-based verification. To verify the candidates that passed the initial screening procedure, we encode the program and the properties in SMT. Our analyzer extracts the body of the loop and employs a standard translation to represent it as a transition function \( \mathit {tr} \) using SMTLIB2 operations, and constructs a representation of \( \mathit {init} \) as a Boolean formula based on the path up to and including the loop pre-header. We encode integer and Boolean variables with their corresponding SMT native sorts and use additional constraints to handle unsigned integers. As for arrays, we experimented with two different encodings: the theory of arrays and the theory of strings (sequences). Although the theory of arrays seems like the natural choice, our squeezers may remove elements from the array (see Figure 5), which is easier to encode using string operations. However, we found it beneficial to restrict squeezers to remove only the first or last elements of an array; with this restriction, the theory of arrays admits an efficient encoding and proved to be the superior choice in our experiments.
Base case. In cases where there is a small, finite number of paths (as is expected in the base case for the right rank), KLEE converges almost instantaneously. Consequently, we set a small timeout (1 second) for computing the bound on the time complexity of the base case.
5.2 Experiments
We evaluated our tool, SqzComp, on a variety of benchmark programs taken from the work of Flores-Montoya [21], as well as three additional programs: the binary counter example from Section 2, a subsets example described in Section 5.3, and an example computing monotone sequences. These examples exhibit intricate time complexities. From the benchmark suite of Flores-Montoya [21], we filtered out non-deterministic programs, as well as programs with syntactic constructs that our frontend cannot currently handle. We compared SqzComp to CoFloCo [21]—a state-of-the-art tool for complexity analysis of imperative programs.
Table 1 summarizes the results of our experiments. The first column presents the name of the program, which describes its characteristics (each of the “two-phase loop” programs consists of a loop with an if statement, where the branch executed changes starting from some iteration). The second column specifies the real complexity, whereas the following two columns present the bounds inferred by SqzComp and by CoFloCo, respectively. (For SqzComp, the reported bounds are the solutions of the recurrences output by the tool.) The fourth and fifth columns respectively present the analysis running time and the number of segments used in the analysis of SqzComp. CoFloCo’s analysis time is always on the order of 0.1 second, whether or not it succeeds in finding a complexity bound. Our analysis is considerably slower, mostly due to the naive implementation of the synthesizer. The runtime of our tool is dominated by the enumerative screening phase; the time required to verify the candidates that pass the screening phase is negligible (even when Spacer is called). When both CoFloCo and SqzComp succeed, the bounds inferred by CoFloCo are sometimes tighter. However, SqzComp manages to find tight complexity bounds for the new examples, which are not solved by CoFloCo and, to the best of our knowledge, are beyond the reach of existing tools.
(We also encoded the new examples as OCaml programs and ran the tool of Hoffmann et al. [25] on them; it failed to infer bounds.) Although it may seem from Table 1 that the number of segments in the partition (d) determines whether the extracted complexity bound is polynomial or exponential, this is coincidental. For example, in Section 2, we obtain a bound of \( O(3^n) \) for the binary counter example with a single segment in the partition. In general, the complexity class inferred by our approach depends on the combination of the number of segments, the structure of the rank, the stuttering pairs, and the rank bounding functions.
Table 1. Experimental Results
5.3 Case Study: Subsets Example
This section presents one challenging example from our benchmarks, the subsets example, and the details of its complexity analysis. Notably, our method is able to infer a binomial bound, which is asymptotically tight.
The code, shown in Figure 9, iterates over all subsets of \( \lbrace \)\( \texttt {m,...,n-1} \)\( \rbrace \) of size \( \texttt {k} \). The “current” subset is maintained in an array \( \texttt {I} \) whose length is \( \texttt {k} \), and which is always sorted, thus avoiding generating the same set more than once. The first k iterations of the loop fill the array with values \( \lbrace \)\( \texttt {m,m+1,...,m+k-1} \)\( \rbrace \), which represent the first subset generated. This is taken care of by the branches at lines 5 and 6 that perform a “right fill” phase, filling in the array with an ascending sequence starting from \( \texttt {m} \) at \( \texttt {I[0]} \). Once the first k iterations are done, \( \texttt {j} \) reaches the end of the array (\( \texttt {j=k} \)), and so the next iteration will execute line 4, turning off the flag \( \texttt {f} \), signifying that the array should now be scanned leftward. In each successive iteration, \( \texttt {j} \) is decreased, looking for the rightmost element that can be incremented. For example, if \( n=8, I=[2,6,7] \), this rightmost element is \( I[0]=2 \). After that element is incremented, the flag \( \texttt {f} \) is turned on again, completing the “left scan” phase and starting a “right fill” phase.
A univariate recurrence. Consider the rank function \( r(I,n,k,m,j,f) = n-m, \) defined with respect to \( (\mathbb {N},\lt) \), and the squeezer shown below the program in Figure 9. The squeezer observes the first element of the array: if it is equal to m (the lower bound of the range), it removes it from the array, shrinking its size (k) by 1. It then adjusts the index j to keep pointing to the same element, unless \( j=0 \), in which case that element is removed. This squeezer forms a 2-partitioned simulation, as illustrated by the traces in Figure 10. All states are \( (1,1) \)-stuttering, except for \( \sigma _0 \), which is \( (2,1) \)-stuttering, as caused by the removal of \( I[0] \) when \( j=0 \).
The rank bounding function is \( \hat{{\curlyvee }}(\rho ,i) = \rho -1 \) for \( i \in \lbrace 1,2\rbrace \). We therefore obtain the following recurrence relation: \( \begin{equation*} \mathit {comp}_x(\rho) \le 1 + \mathit {comp}_x(\rho -1) + \mathit {comp}_x(\rho -1). \end{equation*} \) The base of the recurrence is \( \mathit {comp}_x(0) = 1 \), leading to the solution \( \mathit {comp}_x(\rho) \le 2^{\rho +1} - 1 \). This means that for an initial state, \( \mathit {comp}_s(I,n,k,m,0,\mathsf {true}) \le \mathit {comp}_x(n-m) \le 2^{n-m+1}-1 \).
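The closed-form solution of this univariate recurrence is easy to confirm mechanically. The sketch below solves the recurrence \( \mathit {comp}_x(\rho) = 1 + 2\,\mathit {comp}_x(\rho -1) \) with base \( \mathit {comp}_x(0)=1 \) by memoized recursion and checks it against \( 2^{\rho +1}-1 \) (the memoization via lru_cache is our implementation choice).

```python
from functools import lru_cache

# Solve the univariate recurrence extracted for the subsets example:
# comp_x(rho) = 1 + 2 * comp_x(rho - 1), comp_x(0) = 1,
# and check it against the closed form 2^(rho + 1) - 1.

@lru_cache(maxsize=None)
def comp_x(rho):
    if rho == 0:
        return 1                         # base of the recurrence
    return 1 + comp_x(rho - 1) + comp_x(rho - 1)

for rho in range(20):
    assert comp_x(rho) == 2 ** (rho + 1) - 1
print("closed form verified")
```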
Fig. 9. An example program that produces all subsets of \( \lbrace m,\ldots ,n-1\rbrace \) of size k; at the bottom is the synthesized squeezer.
Fig. 10. An illustration of the 2-partitioned simulation for the subsets example. In the univariate case, the rank of the upper trace is \( n-m \) and that of the lower traces is \( n-m-1 \) . In the multivariate case, the upper trace is of rank \( (n-m,k) \) , and lower traces of ranks \( (n-m-1, k-1) \) and \( (n-m-1, k) \) .
A multivariate recurrence. Consider an alternative rank definition \( r(I,n,k,m,j,f) = (n-m,k) \) defined with respect to \( (\mathbb {N}\times \mathbb {N}, \lt) \), where “\( \lt \)” denotes the lexicographic order, together with the same squeezer and partition as before. The rank bounding function is now \( \begin{equation*} \hat{{\curlyvee }}((\rho _1,\rho _2),i) = \left\lbrace \begin{array}{@{}ll@{}} (\rho _1 - 1, \rho _2 - 1) & i = 1 \\ (\rho _1 - 1, \rho _2) & i = 2 \end{array} \right. \end{equation*} \) The corresponding recurrence relation is \( \begin{equation*} \mathit {comp}_x(\rho _1,\rho _2) \le 1 + \mathit {comp}_x(\rho _1 - 1, \rho _2 - 1) + \mathit {comp}_x(\rho _1 - 1, \rho _2), \end{equation*} \) with base \( \mathit {comp}_x(0,\_) = 1 \), resulting in the solution \( \mathit {comp}_x(\rho _1,\rho _2) \le \binom{\rho _1 + 2}{\rho _2} \). In other words, for an initial state, \( \mathit {comp}_s(I,n,k,m,0,\mathsf {true}) \le \mathit {comp}_x(n-m,k) \le \binom{n-m + 2}{k} \). Interestingly, this example demonstrates that the same squeezer may yield different recurrences when different ranks (and rank bounding functions) are considered. It also demonstrates a case where different segments of a trace are mapped to mini-traces of a different rank.
6 COMPLEXITY ANALYSIS FOR NON-DETERMINISTIC PROGRAMS
In this section, we extend our approach for extracting recurrence relations describing the time complexity of a program to handle non-deterministic programs. We consider two approaches. In Section 6.1, we use a transition relation rather than a transition function to model non-determinism; we adapt the definition of a partitioned simulation accordingly. In Section 6.2, we use (infinite) input arrays that store the non-deterministic choices to model non-determinism, thus reducing the complexity analysis of non-deterministic programs to the analysis of deterministic programs with additional inputs. In this case, the program being analyzed is deterministic, and thus its behavior is still captured by a transition function, but the transition function depends on the contents of the arrays of non-deterministic choices.
6.1 Partitioned Simulation for Transition Relations
The standard approach to modeling the semantics of non-deterministic programs as transition systems uses a transition relation to capture the transitions (steps) of the program.
(Non-deterministic Transition Systems).
A non-deterministic transition system is a tuple \( (\Sigma ,\mathit {init},\mathit {TR}) \), where \( \Sigma \) and \( \mathit {init}\subseteq \Sigma \) are sets of states and initial states, as in Definition 3.1, and \( \mathit {TR}\subseteq \Sigma \times \Sigma \) is a transition relation (as opposed to a transition function). We assume that the transition relation is right-total—that is, for every \( \sigma \in \Sigma \), there exists \( \sigma ^{\prime } \in \Sigma \) such that \( (\sigma ,\sigma ^{\prime })\in \mathit {TR} \) (this may be obtained by adding self-transitions to states that have no outgoing transitions). The set of terminal states \( F\subseteq \Sigma \) is implicitly defined as \( \lbrace \sigma \mid \forall \sigma ^{\prime }.~(\sigma ,\sigma ^{\prime }) \in \mathit {TR}\Longleftrightarrow \sigma ^{\prime } = \sigma \rbrace \)—that is, the states that only transition to themselves. An execution trace (or a trace in short) is a finite or infinite sequence of states \( \tau = \sigma _0,\sigma _1,\ldots \) such that \( (\sigma _i,\sigma _{i+1}) \in \mathit {TR} \) for every \( 0 \le i \lt |\tau |-1 \). We use \( \tau _k \) to denote \( \sigma _k \) (i.e., the \( (k+1) \)-st state along \( \tau \)). When there exists an index \( 0 \le k \lt |\tau | \) s.t. \( \tau _k \in F \), we say that \( \tau \) is terminating, and we truncate \( \tau \) into a finite trace \( \sigma _0 .. \sigma _k \), where k is the minimal such index. A state \( \sigma \in \Sigma \) defines a set of execution traces \( T(\sigma) \) that consists of all traces that start at \( \sigma \), where terminating traces are truncated as described previously. A trace is initial if it starts from an initial state (i.e., \( \sigma _0 \in \mathit {init} \)). Unless explicitly stated otherwise, all traces we consider are initial. The set of reachable states is \( \mathit {reach}= \lbrace \sigma \in \Sigma \mid \exists \sigma _0 \in \mathit {init},~\tau \in T(\sigma _0).~\sigma \in \tau \rbrace \).
The definition of the complexity (Definition 3.2) is adapted accordingly to take into account all execution traces of a state.
(Complexity of Non-deterministic Programs).
For a state \( \sigma \in \Sigma \), we denote by \( \mathit {comp}_{s}(\sigma) \) the maximal number of transitions that can be executed from \( \sigma \) before a terminal state is encountered. Formally, if there exists \( \tau \in T(\sigma) \) that does not include a terminal state (i.e., the procedure has an execution that does not terminate from \( \sigma \)), then \( \mathit {comp}_{s}(\sigma) = \infty \). Otherwise, \( \begin{equation*} \mathit {comp}_{s}(\sigma) = \max _{\tau \in T(\sigma)} \min \lbrace k \in \mathbb {N} \mid \tau _k \in F\rbrace . \end{equation*} \) As before, the complexity function of the program maps each initial state \( \sigma _0 \in \mathit {init} \) to its time complexity \( \mathit {comp}_{s}(\sigma _0) \in \mathbb {N}\cup \lbrace \infty \rbrace \).
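Definition 6.2 can be prototyped directly over an explicitly enumerated transition relation. The following toy system is our own illustration (not from the paper); it treats a state as terminal exactly when its only transition is a self-loop, and computes \( \mathit{comp}_s \) as a maximum over successors:

```python
# Toy illustration of Definition 6.2 (our own system, not from the paper).
# Terminal states are exactly those whose only transition is a self-loop;
# comp_s is the maximal number of transitions before reaching one.

TR = {           # transition relation, as successor sets
    'a': {'b', 'c'},
    'b': {'c', 'd'},
    'c': {'d'},
    'd': {'d'},  # terminal: only a self-loop
}

def terminal(s):
    return TR[s] == {s}

def comp_s(s, seen=()):
    if terminal(s):
        return 0
    if s in seen:    # a reachable non-terminal cycle => a diverging trace
        return float('inf')
    return 1 + max(comp_s(t, seen + (s,)) for t in TR[s])

print(comp_s('a'))  # 3: the longest trace is a -> b -> c -> d
```

Here a cycle among non-terminal states witnesses a trace that never reaches a terminal state, so the function returns \( \infty \) in that case, matching the first clause of the definition.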
The definition of complexity over ranks (Definition 3.4) carries over and requires no adaptation. To use squeezers as a mechanism for comparing high-rank traces to low-rank traces via a partitioned simulation, we need to account for the fact that a state in a non-deterministic transition system may have multiple outgoing traces. In this case, we need to ensure that every outgoing trace \( \tau \) of a high-rank state is properly matched with some outgoing trace \( \tau ^{\prime } \) of the corresponding low-rank state. Note that we do not need to require a matching with every outgoing trace of the lower-rank state, since the complexity of the lower-rank state considers the maximum over all of its traces, including \( \tau ^{\prime } \). Since we are also interested in keeping track of the degree of stuttering (which determines the relation between the lengths of the traces and, accordingly, the complexities of the states), we first define a notion of stuttering traces. Roughly, a (higher-rank) trace of length h is \( \ell \)-stuttering if there exists a (lower-rank) trace of length \( \ell \) such that the first and last states of the traces are matched by the squeezer. We further require that the lower-rank trace does not reach a terminal state unless the higher-rank trace does.
(Stuttering Traces).
An h-step trace is a sequence \( \sigma _0 .. \sigma _{h} \) of \( h+1 \) states such that \( \langle \sigma _i, \sigma _{i+1}\rangle \in \mathit {TR} \) for every \( 0\le i \lt h \). Given a squeezer \( {\curlyvee } \), an h-step trace is \( \ell \)-stuttering if there is an \( \ell \)-step trace \( \sigma ^{\prime }_0..\sigma ^{\prime }_{\ell } \) such that (i) \( \sigma ^{\prime }_0= {\curlyvee }(\sigma _0) \) and \( \sigma ^{\prime }_{\ell } = {\curlyvee }(\sigma _{h}) \), and (ii) for all \( i\lt \ell \), \( \sigma ^{\prime }_i\not\in F \). A (finite or infinite) trace \( \tau \) is \( (h,\ell) \)-stuttering if it has a prefix of length h that is \( \ell \)-stuttering.
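On small, explicitly enumerated systems, this definition can be checked by brute force. The sketch below (a toy countdown system of our own; the paper's examples are richer) enumerates all \( \ell \)-step traces from the squeezed start state and tests requirements (i) and (ii):

```python
def steps(TR, s, k):
    """All k-step traces from state s, as lists of k + 1 states."""
    traces = [[s]]
    for _ in range(k):
        traces = [t + [u] for t in traces for u in TR[t[-1]]]
    return traces

def is_l_stuttering(TR, F, squeeze, trace, l):
    """Is the given h-step `trace` l-stuttering? Per requirements (i)-(ii):
    some l-step trace from squeeze(trace[0]) must end at squeeze(trace[-1])
    and avoid terminal states before its last position."""
    target = squeeze(trace[-1])
    return any(t[-1] == target and all(s not in F for s in t[:-1])
               for t in steps(TR, squeeze(trace[0]), l))

# Toy countdown system (our own): n -> n-1, with 0 terminal (self-loop).
TR = {n: {max(n - 1, 0)} for n in range(6)}
F = {0}
squeeze = lambda n: max(n - 1, 0)
print(is_l_stuttering(TR, F, squeeze, [3, 2], 1))  # True: 2 -> 1 matches
```

The enumeration is exponential in \( \ell \), which is why the paper instead discharges such conditions symbolically (see Equation (8)); the sketch only serves to make the definition concrete.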
Next, we refine the notion of stuttering states (Definition 3.7) to refer to a given trace on which the state resides. Last states require special treatment as before. Note that due to non-determinism, a state may be a last state in one trace but not in another.
(Stuttering States in a Trace).
Let \( {\curlyvee } \) be a squeezer and \( p_d \) a partition function. Let \( \tau \) be a trace and \( \sigma \in \Sigma \) a state that resides on \( \tau \). Denote by \( \tau ^\sigma \) the suffix of \( \tau \) that starts from \( \sigma \). (a) If \( \sigma \) is a non-last state in \( \tau \), then it is called \( (h,\ell) \)-stuttering in \( \tau \), for \( |\tau ^\sigma | \ge h\ge 1 \), \( h\ge \ell \ge 0 \), if \( \tau ^\sigma \) is \( (h,\ell) \)-stuttering; (b) if \( \sigma \) is a last state in \( \tau \), then it is said to be \( (1,0) \)-stuttering in \( \tau \) if \( \sigma \not\in F \) and \( {\curlyvee }(\sigma) \in F \), and otherwise it is \( (1,1) \)-stuttering in \( \tau \).
Note that when the transition relation \( \mathit {TR} \) is in fact a function, Definition 6.4 collapses into Definition 3.7. Having adapted the definition of stuttering states to be relative to a trace, we now adapt the definition of a partitioned simulation (Definition 3.8) to require that every outgoing trace of \( \sigma \) admits one of the stuttering behaviors in \( \lbrace (h_i,\ell _i)\rbrace _{i=1}^{n} \).
(Partitioned Simulation).
We say that a squeezer \( {\curlyvee }: \Sigma \rightarrow \Sigma \) forms a \( \lbrace (h_i,\ell _i)\rbrace _{i=1}^{n} \)-partitioned simulation according to \( p_d \), denoted \( {\curlyvee }\sim \mathbb {PS}_{p_d}\big (\lbrace (h_i,\ell _i)\rbrace _{i=1}^{n}\big) \), if for every non-terminal reachable state \( \sigma \) and every outgoing trace \( \tau \) of \( \sigma \), we have that \( \sigma \) is \( (h_i,\ell _i) \)-stuttering in \( \tau \) for some \( 1 \le i \le n \).
The decomposition observation (Observation 1), which provides the foundation for our analysis, is refined to provide a bound on the length of each initial trace (as opposed to the complexity of an initial state, which corresponds to the maximum over all of its outgoing traces). The bound for trace \( \tau \) is given in terms of the complexities of lower-rank states that match the switch states along \( \tau \). Each trace may have different switch states, but their number is at most d (when a d-partition is considered). The decomposition observation is later used to provide a bound on the complexity of the initial states.
Let \( {\curlyvee }\sim \mathbb {PS}_{p_d}\big (\lbrace (h_i,\ell _i)\rbrace _{i=1}^{n}\big) \) and \( \widehat{k} \ge 1 \). Let \( {\mathbb {E}}_{\widehat{k}} \subseteq \lbrace 1,\ldots ,n\rbrace \) be the set of indices such that \( \tfrac{h_i}{\ell _i} \gt \widehat{k} \) (including indices where \( \ell _i = 0 \)). Then for every initial trace \( \tau \), we have that \( \begin{equation*} |\tau | \le \sum _{\sigma \in \mathbb {S}_{p_d}(\tau)} \widehat{k}\cdot \mathit {comp}_{s}({\curlyvee }(\sigma)) + \sum _{i \in {\mathbb {E}}_{\widehat{k}}} \sum _{\sigma \in \mathbb {K}_i(\tau)} \big (h_i - \ell _i \cdot \widehat{k}\big), \end{equation*} \) where \( \mathbb {S}_{p_d}(\tau) \) is the set of switch states in \( \tau \) and \( \mathbb {K}_i(\tau) \) is the multiset of \( (h_i,\ell _i) \)-stuttering states in \( \tau \).
As in the deterministic case, to make the leap from the complexity decomposition to a recurrence relation that captures the complexity by means of the rank, we use a rank bounding function. The requirement of a rank bounding function refers to all initial traces, as before (Definition 3.9); the only difference is that an initial state no longer induces a unique initial trace \( \tau (\sigma _0) \) but may be the source of multiple initial traces in \( T(\sigma _{0}) \):
Given r, \( {\curlyvee }, \) and \( p_{d} \) such that \( {\curlyvee }\sim \mathbb {PS}_{p_d} \), a function \( \hat{{\curlyvee }}: X \times \lbrace 1,\ldots ,d\rbrace \rightarrow X \) is a rank bounding function if for every \( \rho \in X - B \) and \( 1 \le i \le d \), if \( \tau \in T(\sigma _{0}) \) is an initial trace such that \( r(\sigma _0) = \rho \), and \( \sigma _{s} \in \mathbb {S}_{p_d}(\tau) \) is a switch state in \( \tau \) such that \( p_{d}(\sigma _{s}) = i \), the following holds: \( \begin{equation*} \text{ (i) upper bound: } r\big ({\curlyvee }(\sigma _s)\big) \preceq \hat{{\curlyvee }}(\rho ,i) \hspace{14.22636pt} \text{and} \hspace{14.22636pt} \text{ (ii) rank decrease: } \hat{{\curlyvee }}(\rho ,i) \prec \rho . \end{equation*} \)
With the adapted definitions, Theorem 3.10 and Corollary 3.11 are preserved, allowing us to extract recurrence relations for non-deterministic programs.
Establishing the Required Properties.
We now describe how to extend the results of Section 3.4 to the non-deterministic setting. Simulation property. With \( \mathit {TR} \) being a relation, we can no longer map a given \( \sigma \in \Sigma \) to a single outgoing trace and accordingly to a single state \( \mathit {tr}^{h_i}(\sigma) \) as in Section 3.7. Similarly, we cannot use \( \mathit {tr}^{\ell _i}\big ({\curlyvee }(\sigma)\big) \). Instead, we have to use Definitions 6.3 and 6.4, which mention trace prefixes of lengths \( h_i \) and \( \ell _i \) starting at \( \sigma \) and \( {\curlyvee }(\sigma) \), respectively. Validating the property that a state is \( (h_i,\ell _i) \)-stuttering in \( \tau \) for one of the pairs in \( \lbrace (h_i,\ell _i)\rbrace _i \) means introducing state variables for the intermediate states along the sequences, and quantifying over them, as captured by the following formula. For uniformity of the presentation, we use \( \sigma _0 \) to denote the state \( \sigma \) for which the stuttering requirement is formulated. We further denote \( h= \max _i{h_i} \) and \( \ell = \max _i{\ell _i} \): (8) \( \begin{equation} \begin{array}{l}\displaystyle \forall \sigma _1\cdots \,\sigma _{h}.~ \bigwedge _{0\le j\lt h}\!\!\mathit {TR}(\sigma _j,\sigma _{j+1}) \rightarrow \\ \qquad \qquad \exists \sigma ^{\prime }_0\cdots \,\sigma ^{\prime }_{\ell }.~ \displaystyle \bigvee _i~ \displaystyle \left(\sigma ^{\prime }_0={\curlyvee }(\sigma _0) \wedge \!\! \bigwedge _{0\le j\lt \ell _i}\!\! \mathit {TR}(\sigma ^{\prime }_j,\sigma ^{\prime }_{j+1}) ~~ \wedge ~ \sigma ^{\prime }_{\ell _i} = {\curlyvee }(\sigma _{h_i}) \right). \end{array} \end{equation} \) For simplicity of the presentation, the preceding formula encodes only requirement (i) of Definition 6.3. Requirement (ii) is straightforward to add. Since \( (h_i,\ell _i) \) are known to the analysis, the number of such variables is finite.
This formulation does, however, introduce quantifier alternation that did not exist in the original, deterministic setting. For a system that uses SMT to verify properties, this may have adverse effects. Bound on the number of occurrences of stuttering states that exceed \( \widehat{k} \). We use the same approach as before to simplify the task of obtaining bounds \( b_i \) on the number of \( (h_i,\ell _i) \)-stuttering states where \( \frac{h_i}{\ell _i}\gt \widehat{k} \). Namely, we restrict such states \( \sigma \) to occur at most twice along each segment: at the beginning of the segment (meaning \( \sigma \) is a switch state) and at the end, totaling at most 2d occurrences along each trace. The difference is that for non-deterministic programs, a state may have multiple outgoing traces and different stuttering behaviors in each of them. Therefore, to ensure that \( \sigma \) occurs at the end of the segments in all traces in which its stuttering behavior exceeds \( \widehat{k} \), we require that whenever an outgoing trace \( \tau \) of \( \sigma \) is \( (h_i,\ell _i) \)-stuttering for \( \frac{h_i}{\ell _i}\gt \widehat{k} \), the state at position \( h_i \) along \( \tau \) is a last state. The encoding of this requirement is added to Equation (8). Rank upper-bound property. This is, in fact, unchanged from our previous encoding. We had two cases in which rank bounds can be validated w.r.t. a given function \( \hat{{\curlyvee }}: X \rightarrow X \): a single segment (Lemma 3.12) and rank preservation (Lemma 3.14). Both still hold when non-deterministic programs are considered.
The program \( \texttt {Loopus2011_ex1} \) in Figure 11 demonstrates how a non-deterministic transition system may affect complexity analysis. The condition \( \texttt {nondet() > 0} \) may cause the inner loop to exit at any time, so it executes an arbitrary number of iterations between 0 and n. We show how our framework can handle reasoning about this loop. First, we transform the nested-loop version to a single loop, shown on the right, using a standard transformation that introduces a program counter indicating whether the next iteration to execute is of the outer loop (\( \texttt {pos == 0} \)) or the inner loop (\( \texttt {pos == 1} \)). Next, we detail the rank, squeezer, and simulation parameters as used in Theorem 3.10. We denote a state by \( \sigma = \langle n,i,j,\text{$\texttt {pos}$}\rangle \). Then
Rank: \( r(\sigma) = n \), over \( (\mathbb {N}, \lt) \)
Squeezer: \( {\curlyvee }(\sigma) = \langle n-1,\ i \lt n ~?~i: i-1,\ j, \text{$\texttt {pos}$}\rangle \)
Stuttering pairs: \( \lbrace (h_i, \ell _i)\rbrace _i = \lbrace (1,1), (2,2), (3, 0), (5, 1), (5, 3)\rbrace \)
Fig. 11. An example program with a nested loop and a non-deterministic choice (from Loopus [34]). On the right, the same nested loop is rewritten as a flat loop.
To demonstrate the simulation relation, we show that every reachable state is \( (h,\ell) \)-stuttering for one of the stuttering pairs defined previously. We split into three cases depending on the value of i:
\( i \lt n-1 \): Consider a state \( \sigma \) where \( i \lt n - 1 \), and let \( \sigma ^{\prime } = {\curlyvee }(\sigma) = \langle n-1, i, j, \text{$\texttt {pos}$}\rangle \). The only condition involving n in the program is \( \texttt {i < n} \), and that condition is true both in \( \sigma \) and in \( \sigma ^{\prime } \). As a result, if \( \sigma _1 \) is a successor of \( \sigma \) by executing command c, then executing c on \( \sigma ^{\prime } \) results in a state \( \sigma _1^{\prime } \) with the same mutation of \( \texttt {i,j,pos} \). Since \( i \lt n-1 \), this implies that \( {\curlyvee }(\sigma _1) = \sigma _1^{\prime } \), as required for \( (1,1) \)-stuttering.
\( i = n \): The case of a state \( \sigma \) where \( i = n \) is similar to the case where \( i \lt n-1 \): both \( \sigma \) and \( \sigma ^{\prime } = {\curlyvee }(\sigma) = \langle n-1, i-1, j, \text{$\texttt {pos}$}\rangle \) may execute the same command, leading to \( \sigma _1 \) and \( \sigma _1^{\prime } = {\curlyvee }(\sigma _1) \), respectively. This holds since \( i - 1 = n-1 \), and therefore the condition \( \texttt {i < n} \) is false in both \( \sigma \) and \( \sigma ^{\prime } \). This is consistent with \( (1,1) \)-stuttering. The only corner case occurs when \( \text{$\texttt {pos}$}=1 \) and \( j\gt 0 \): both instances execute the command \( \texttt {i--; pos = 0} \), leading to \( \sigma _1=\langle n,n-1,j,0\rangle \) and \( \sigma _1^{\prime }=\langle n-1,n-2,j,0\rangle \). At this point, \( {\curlyvee }(\sigma _1) = \langle n-1, n-1, j, 0\rangle \ne \sigma _1^{\prime } \). One extra transition is required to synchronize the traces—observe the next step in each trace, \( \sigma _2=\langle n,n,j,0\rangle \) and \( \sigma _2^{\prime }=\langle n-1,n-1,j,0\rangle \). Now, \( {\curlyvee }(\sigma _2) = \sigma _2^{\prime } \), as needed, satisfying \( (2,2) \)-stuttering.
\( i = n-1 \): Consider the more subtle case of a state \( \sigma =\langle n, n-1, j, \text{$\texttt {pos}$}\rangle \). It is tricky because \( \texttt {i < n} \) is true in \( \sigma \) but false in \( \sigma ^{\prime } = {\curlyvee }(\sigma) = \langle n-1, n-1, j, \text{$\texttt {pos}$}\rangle \). Fortunately, \( \sigma \) and \( \sigma ^{\prime } \) are very close to the end of their respective traces, allowing more forms of \( (h,\ell) \)-stuttering. The choice of \( (h,\ell) \) depends on the values of j and \( \texttt {pos} \), as well as the non-deterministic choice. A non-deterministic choice occurs when \( \text{$\texttt {pos}$}= 1 \); we use nd to denote the value chosen by \( \texttt {nondet()} \) in the current trace \( \tau \). This gives rise to five cases.
The five cases are depicted in Figure 12. For ease of reading, the figure shows for each state the values \( \langle i, j\rangle \) with the value of \( \texttt {pos} \) on the outer rim. To make the diagram compact, we depict the property for \( n=5 \) (and in the squeezed trace, \( n=4 \)), but the same result is obtained if \( n, n-1, n-2 \) are used in place of \( 5, 4, 3 \) in all states. As can be seen, the possible stuttering pairs are \( (3,0) \), \( (5,1) \), \( (5,3) \), and \( (1,1) \). The cases that are not \( (1,1) \) end in a terminal state (\( \text{$\texttt {pos}$}= 0 \wedge i = n \)), as required by the definition.
With this, the recurrence relation obtained from Theorem 3.10 is \( \begin{equation*} \mathit {comp}_x(n) = \mathit {comp}_x(n - 1) + (3 + 4 + 2), \end{equation*} \) resulting in an \( O(n) \) complexity bound.
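Unrolling this recurrence gives a linear closed form; a quick sketch (the base value \( \mathit{comp}_x(0) \) is assumed to be a constant, taken as 1 here purely for illustration):

```python
# Unroll comp_x(n) = comp_x(n - 1) + 9, where 9 is the 3 + 4 + 2 above.
# The base value comp_x(0) is an assumed constant (1 here), so
# comp_x(n) = 9 * n + comp_x(0), i.e., O(n).
def comp_x(n, base=1):
    return base if n == 0 else comp_x(n - 1, base) + 9

print(comp_x(10))  # 91
```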
Fig. 12. The five cases of stuttering occurring in Section 6.7 when \( \sigma =\langle n, n-1, j, \text{$\texttt {pos}$}\rangle \). For presentation purposes, these are shown for \( n=5 \), but the treatment is general. The pairs \( \langle i,j \rangle \) depict i and j in each state, and the \( 0/1 \) values above/below display \( \text{$\texttt {pos}$} \).
6.2 Pushing Non-determinism to the Inputs
An alternative approach to analyzing non-deterministic programs is via a reduction to deterministic programs with an extended input. Namely, given a non-deterministic program, we apply a rather standard transformation that introduces the following:
A new array input for every non-deterministic choice instruction in the program, and
An index variable for each array, initialized to 0.
Every non-deterministic instruction is replaced by an instruction that reads a value from the corresponding array and increments the index variable. The resulting program is deterministic and induces a deterministic transition system.
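A schematic rendering of this transformation (our own sketch; the names alpha and c_alpha follow the notation used later in this section, and the loop shape is only illustrative, not the paper's exact benchmark code):

```python
# Original form: choices come from an external source `nondet`.
def inner_nondet(n, nondet):
    j = 0
    while j < n:
        if nondet() > 0:       # non-deterministic early exit
            break
        j += 1
    return j

# Transformed, deterministic form: each former nondet() site reads
# alpha[c_alpha] and then increments the index variable c_alpha.
def inner_det(n, alpha):
    j, c_alpha = 0, 0
    while j < n:
        nd = alpha[c_alpha]    # replaces nondet()
        c_alpha += 1
        if nd > 0:
            break
        j += 1
    return j

print(inner_det(5, [0, 0, 1]))  # 2: two iterations, then an early exit
```

Each run of the transformed program corresponds to one resolution of the non-deterministic choices, fixed in advance by the contents of alpha.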
Let \( \Sigma \) be the set of states of the non-deterministic transition system that captures the semantics of the original program. Then the corresponding deterministic transition system is defined over an augmented set of states \( \Sigma ^{\prime } = \Sigma \times \Delta \), where \( \Delta \) is the auxiliary state storing the input arrays and their index variables. The initial states, \( \mathit {init}^{\prime } \), are augmented accordingly, where the auxiliary arrays may have an arbitrary content and the auxiliary index variables have value 0.
With this transformation, the complexity of an initial state \( \sigma _0 \in \mathit {init} \) in the original non-deterministic transition system (per Definition 6.2) is equal to the maximal complexity of an initial state \( \sigma _0^{\prime } \in \mathit {init}^{\prime } \) that augments \( \sigma _0 \) in the deterministic transition system (per Definition 3.2).
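As a toy sanity check of this equality (our own example, not one of the benchmarks), the complexity of a non-deterministic loop can be computed as a maximum over all contents of the auxiliary choice array:

```python
from itertools import product

# Deterministic version of a toy loop (our own): x is decremented, but a
# positive entry of the auxiliary array `alpha` stops the loop early.
def steps_det(x, alpha):
    n, c = 0, 0
    while x > 0:
        nd = alpha[c]          # former nondet() site
        c += 1
        if nd > 0:
            break
        x -= 1
        n += 1
    return n

# Definition 6.2, via the transformation: the non-deterministic complexity
# is the maximum over all contents of the choice array (length x suffices,
# since each iteration consumes exactly one choice).
def comp_nondet(x):
    return max(steps_det(x, alpha)
               for alpha in product([0, 1], repeat=max(x, 1)))

print(comp_nondet(4))  # 4: the worst case never stops early
```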
The analysis is applied to the deterministic transition system. Accordingly, all ingredients of the analysis are defined with respect to the augmented set of states \( \Sigma ^{\prime } \). In particular, the squeezer is defined over the augmented states (i.e., it squeezes both the original state and the auxiliary state), thus implicitly matching non-deterministic choices of the higher-rank trace to non-deterministic choices of the lower-rank trace. The operation of the squeezer on the auxiliary state may be understood as a Skolem function that takes the place of the \( \forall \exists \) quantifier alternation in the definition of partitioned simulation for non-deterministic transition systems (Equation (8)).
However, since we are interested in analyzing the worst-case complexity of the program over all non-deterministic choices (i.e., all values of the auxiliary input arrays), we do not allow the rank function \( r: \mathit {init}^{\prime } \rightarrow X \) to depend on the auxiliary state. Namely, we require that there exists a function \( r_{\Sigma } : \mathit {init}\rightarrow X \) such that for \( \sigma ^{\prime } \in \mathit {init}^{\prime } \), \( r(\sigma ^{\prime }) = r_{\Sigma }(\sigma ^{\prime }_{|\Sigma }) \), where \( \sigma ^{\prime }_{|\Sigma } \) denotes the projection of \( \sigma ^{\prime } \) to \( \Sigma \) (i.e., removing the auxiliary state). Combined with Definition 3.4, this restriction on r ensures that \( \mathit {comp}_x(\rho) \) provides an upper bound on the complexity of all initial states in \( \mathit {init}^{\prime } \) with rank \( \rho \) according to r, regardless of the non-deterministic choices, captured by the auxiliary state. As a result, \( \mathit {comp}_x(\rho) \), which is the target of the analysis, also provides an upper bound on the complexity of all initial states in the non-deterministic transition system with rank \( \rho \) according to \( r_\Sigma \).
Revisiting the example
In the program \( \texttt {Loopus2011_ex1} \), the extended state is now \( \sigma =\langle n, i, j, \text{$\texttt {pos}$}, \alpha , c_\alpha \rangle \), where \( \alpha \) is an auxiliary array of \( \texttt {int} \) values and \( c_\alpha \) is an index into \( \alpha \) that is incremented on every call to \( \texttt {nondet()} \). We now show that we can use the original definitions of Section 3 for the complexity analysis of the instrumented program.
All ingredients of Section 6.7 remain in place, except for \( {\curlyvee }, \) for which there is a trivial modification:
Squeezer: \( {\curlyvee }(\sigma) = \langle n-1, i \lt n ~?~i: i-1, j, \text{$\texttt {pos}$}, \alpha , c_\alpha \rangle . \)
In other words, \( {\curlyvee } \) has the same effect as before on the program variables and no effect at all on the auxiliary variables \( \alpha , c_\alpha \).
The stuttering scenarios shown in Figure 12 are still valid in this case. The only difference is that instead of a non-deterministic choice transition, symbolized by the edges labeled \( \mathit {nd}{\gt }0 \), \( \mathit {nd}{\not\gt }0 \), all transitions are deterministic and the value of \( \mathit {nd} \) is read from the auxiliary state, \( \alpha [c_\alpha ] \). Consequently, the simulation property still holds and the same complexity bound applies.
6.3 Discussion
We have presented two approaches for extending the complexity analysis to non-deterministic programs. The approach presented in Section 6.1 follows the more traditional approach of modeling non-determinism via a transition relation; however, the resulting simulation requirement is more involved and, in particular, exhibits quantifier alternation (see Equation (8)), which is challenging for automated solvers. The alternative approach that models non-determinism using additional inputs, presented in Section 6.2, shifts this burden into the squeezer: the squeezer has to match the non-deterministic choices of the higher-rank and lower-rank traces, much like a Skolem function.
Both approaches are only able to extract recurrence relations for programs where the worst-case time complexity bound does not depend on the non-deterministic choices. This results from the restrictions imposed on the rank. In the first approach, the rank that is used in the recurrence relations is the rank of the initial state. Hence, it may not reflect non-deterministic choices performed during the execution. Similarly, when non-determinism is modeled via auxiliary state, the rank is not permitted to depend on the auxiliary state. For example, the time complexity of the following program depends on the (unbounded) sequence of non-deterministically chosen values of the variable y, and no bound may be expressed in terms of x. Hence, since the rank is restricted to be defined over x, no squeezer, rank, and rank bounding functions that satisfy all requirements exist in either approach.
The total running time of such a program, although finite, is unbounded: the number of iterations of the inner loop depends on the non-deterministically selected values of y.
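A hypothetical program of the kind described (the shape and names are our own illustration):

```python
# Hypothetical program matching the description: the outer loop is driven
# by x, but each iteration draws a fresh, unbounded value for y, so the
# total number of steps cannot be bounded by any function of x alone.
def run(x, nondet):
    ticks = 0
    while x > 0:
        x -= 1
        y = nondet()           # unbounded non-deterministic choice
        while y > 0:           # runs for y iterations
            y -= 1
            ticks += 1
        ticks += 1
    return ticks

choices = iter([3, 10])
print(run(2, lambda: next(choices)))  # 15 steps for the choices y = 3, 10
```

Every execution terminates, yet for any candidate bound \( f(x) \) there is a choice sequence exceeding it, which is exactly why no rank defined over x can satisfy the requirements.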
6.4 Evaluation
We evaluate our approach on non-deterministic programs using the encoding that pushes non-determinism to the inputs (Section 6.2). We use the benchmark suite of C4B [13], which includes seven non-deterministic programs. A comparison of our results to C4B, KoAT [11], Rank [6], Loopus [34], and SPEED [23] is presented in Table 2. The results reported for the external tools are taken from Carbonneaux et al. [13] since some of the tools do not take C programs as input, and for others, executables are not available.
Note: The dash (—) means that the tool failed to produce a result. The question mark (?) means that the corresponding data entry is not available.
\( \dagger \) A different squeezer exists that would allow discovering a bound of \( O(n^2) \), but that squeezer is not in the space explored by our synthesis pass. SPEED and C4B report an \( O(n) \) bound because, in this particular benchmark, they are set up to count the number of times a specific line, marked with \( \texttt {tick} \), executes; SqzComp lacks this feature. \( \dagger \!\dagger \) Requires running SqzComp twice; the second run is to bound the base case of the first run. This is currently a manual process.
Table 2. Empirical Results of Running SqzComp on the Non-deterministic Programs from the C4B Benchmarks Shown Alongside the Results Obtained from Previous Tools, Taken from Existing Publications
7 RELATED WORK
This section focuses on exploring existing methods for static complexity analysis of imperative programs. Dynamic profiling and analysis [32] are a separate research area, more related to testing, and generally do not provide formal guarantees. We further focus on works that determine asymptotic complexity bounds and use the number of iterations executed as their cost model; we refrain from thoroughly covering previous techniques that analyze complexity at the instruction level.
Static cost analysis. The seminal work of Wegbreit [35] defined a two-step meta-framework where recurrence relations are extracted from the underlying program, then analyzed to provide closed-form upper bounds. Broadly speaking, cost relations are a generalized framework that captures the essence of most of the works mentioned in this section.
COSTA [4] and CoFloCo [21] infer cost relations of imperative programs written in Java and C, respectively. Cost relations resemble somewhat limited C procedures: they are capable of recursive calls to other cost relations, and they can handle non-determinism that arises either as a consequence of direct
SPEED [23] uses multiple counter instrumentations that are automatically inserted at various points in the code, initialized, and incremented. These ghost counters make it possible to infer an overall complexity bound by applying abstract interpretation over suitable numeric domains. In the work of Gulwani et al. [22] and Gulwani and Zuleger [24], code transformations are applied to represent multi-path loops and nested loops in a canonical way. Then, paths connecting pairs of “interesting” code points \( \pi _{1},\pi _{2} \) (loop headers, etc.) are identified, in a way that satisfies certain properties: for instance, that \( \pi _{1} \) is reached twice without reaching \( \pi _{2} \). The path properties induce progress invariants, which are then analyzed to infer the overall complexity bound.
In the work of Lee et al. [30], an abstraction of the program to a size-change graph is defined, where transition edges of the control flow graph are annotated to capture sound overapproximation relations between integer variables. The graph is then searched for infinitely decreasing sequences, represented as words in an \( \omega \)-regular language. This representation concisely characterizes program termination. The analysis of Zuleger et al. [37] harnesses the size-change abstraction of Lee et al. [30] to analyze the complexity of imperative programs. First, they apply standard program transformations like path-wise analysis to summarize inner nested loops. Then, they heuristically define a set of scalar rank functions they call norms. These norms are somewhat similar to our rank function in the sense that they help to abstract away program parts that do not affect its complexity. The program is then represented as a size-change graph, and multi-path contextualization [31] prunes subsequent transitions that are infeasible.
In the work of Ben-Amram [9], difference constraints are introduced in the context of termination, to bound a variable \( x^{\prime } \) in the current iteration by some variable y from the previous iteration plus a constant c: \( x^{\prime } \le y + c \). Loopus [34] extends difference constraints to complexity analysis; indeed, it is quite often the case that ideas from the area of program termination are assimilated in the context of complexity analysis and vice versa. Loopus exploits the observation that typical operations on loop counters, such as increments, decrements, and resets, are essentially expressible as difference constraints. It builds an abstraction based on the domain of difference constraints and obtains relevant invariants, which are then used in determining upper bounds. KoAT [11] is very similar, except that it represents a program as an integer transition system and allows non-linear numerical constraints and ranking functions.
In the work of Winkler and Moser [36], an approach is presented for analyzing the runtime complexity of Logically Constrained Rewrite Systems (LCTRSs). They use the dependency graph, which describes the dependencies between rules in a rewrite system, to decompose the analysis, and apply several techniques to modularly extract expressions that bound variables that appear in the head of rules in terms of their initial values. Ultimately this results in recurrence relations that provide bounds on the worst-case time complexity of the rewrite system.
As mentioned earlier, all of these approaches are based on identifying the progress of executions over time, characterizing the progress between two given points in the program. In contrast, our approach reasons over state size and compares whole executions.
Squeezers. The notion of squeezers was introduced in our earlier work [27] for the sake of safety verification. In a similar spirit, Diffy [14, 15] uses difference relations to verify the safety of parameterized array-manipulating programs by comparing an instance of the program to an instance with a smaller value of the parameter. As discussed in Section 1, the challenges in complexity analysis are different and require additional ingredients beyond squeezers (or difference relations). Other works [1, 2, 20] introduce well-structured transition systems, where a well-quasi order (wqo) on the set of states induces a simulation relation. This property ensures decidability of safety verification for such systems (via a backward reachability algorithm). Our use of squeezers, which decrease the rank of a state and induce a sort of simulation relation, may resemble the wqo of a well-structured transition system. However, there are several key differences: we do not require the order (which is defined on ranks) to be a wqo. Further, we do not require a simulation relation between any two states whose ranks are ordered, but only between a state and its squeezed counterpart. Notably, our work considers complexity analysis rather than safety verification.
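The rank-reduction idea behind squeezers can be illustrated on a toy array-zeroing loop. The sketch below is our own simplification, not the paper's formal definitions: `step` is one loop iteration, `rank` is the array length, and `squeeze` drops the last element; the local condition checked is that squeezing commutes with stepping (or the squeezed trace has already terminated, in the spirit of footnote 3).

```python
def step(state):
    """One iteration of: while i < len(a): a[i] = 0; i += 1."""
    a, i = state
    if i >= len(a):
        return None              # terminal state: the loop has exited
    b = list(a)
    b[i] = 0
    return (tuple(b), i + 1)

def rank(state):
    """Rank of a state: the size of the array it manipulates."""
    return len(state[0])

def squeeze(state):
    """Map a state to a lower-rank one by dropping the last element."""
    a, i = state
    return (a[:-1], i)

def simulation_holds(state):
    """Local condition: stepping then squeezing agrees with squeezing
    then stepping, or the squeezed run has already reached a last state."""
    post = step(state)
    sq_post = step(squeeze(state))
    if sq_post is None:          # squeezed trace ended: nothing to match
        return True
    if post is None:             # the original state was itself terminal
        return False
    return squeeze(post) == sq_post
```

Because each squeeze lowers the rank by one and the higher-rank trace exceeds its squeezed counterpart by a single step, these local checks support a recurrence of the form T(n) ≤ T(n−1) + 1, and hence a linear bound, without ever describing how intermediate states evolve over time.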
8 CONCLUSION
This work introduces a novel framework for runtime complexity analysis. The framework supports derivation of recurrence relations based on inductive reasoning, where the form of induction depends on the choice of a squeezer (and rank bounding function). The new approach thus offers more flexibility than the classical methods where induction is coupled with the time dimension. For example, when the rank captures the “state size,” the approach mimics induction over the space dimension, reasoning about whole traces, and alleviating the need to describe the intricate development of states over time. We demonstrate that such squeezers and rank bounding functions, which we manage to synthesize automatically, facilitate complexity analysis for programs that are beyond reach for existing methods. Thanks to the simplicity and compactness of these ingredients, even a rather naive enumeration was able to find them efficiently.
Footnotes
1 In this article, we focus on the extraction of recurrence relations rather than solving them automatically. In particular, in our experiments, recurrence relations are solved manually.
2 https://knowyourmeme.com/memes/all-your-base-are-belong-to-us.
3 In fact, case (a(i)) of Definition 3.7 can be relaxed to \( \mathit {tr}^\ell ({\curlyvee }(\sigma)) = {\curlyvee }(\mathit {tr}^{h}(\sigma)) \) or \( \mathit {tr}^{h}(\sigma) \) is a last state; this is sound because when \( \mathit {tr}^{h}(\sigma) \) is a last state, the higher-rank segment ends, so continuing the simulation is not needed.
4 In fact, it suffices to consider \( \widehat{\mathit {init}} = \mathit {reach} \), in which case we may be able to take advantage of information from static analyses.
REFERENCES
- [1] 1996. General decidability theorems for infinite-state systems. In Proceedings of the 11th Annual IEEE Symposium on Logic in Computer Science. IEEE, Los Alamitos, CA, 313–321.
- [2] 2000. Algorithmic analysis of programs with well quasi-ordered domains. Inf. Comput. 160, 1–2 (2000), 109–127.
- [3] 2008. Automatic inference of upper bounds for recurrence relations in cost analysis. In Static Analysis. Springer, Berlin, Germany, 221–237.
- [4] 2007. COSTA: Design and implementation of a cost and termination analyzer for Java bytecode. In Proceedings of the 6th International Symposium on Formal Methods for Components and Objects (FMCO’07): Revised Lectures. 113–132.
- [5] 2019. Resource analysis driven by (conditional) termination proofs. Theory Pract. Log. Program. 19, 5–6 (2019), 722–739.
- [6] 2010. Multi-dimensional rankings, program termination, and complexity bounds of flowchart programs. In Proceedings of the International Static Analysis Symposium. 117–133.
- [7] 2012. On the limits of the classical approach to cost analysis. In Static Analysis. Springer, Berlin, Germany, 405–421.
- [8] 2015. Syntax-guided synthesis. In Dependable Software Systems Engineering. NATO Science for Peace and Security Series, D: Information and Communication Security, Vol. 40. IOS Press, 1–25.
- [9] 2008. Size-change termination with difference constraints. ACM Trans. Program. Lang. Syst. 30, 3 (May 2008), Article 16, 31 pages.
- [10] 2020. Templates and recurrences: Better together. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’20). ACM, New York, NY, 688–702.
- [11] 2014. Alternating runtime and size complexity analysis of integer programs. In Tools and Algorithms for the Construction and Analysis of Systems. Lecture Notes in Computer Science, Vol. 8413. Springer, 140–155.
- [12] 2008. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (OSDI’08). 209–224. http://dl.acm.org/citation.cfm?id=1855741.1855756
- [13] 2015. Compositional certified resource bounds. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation. 467–478.
- [14] 2020. Verifying array manipulating programs with full-program induction. In Proceedings of the 26th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’20). 22–39.
- [15] 2021. Diffy: Inductive reasoning of array programs using difference invariants. In Proceedings of the 33rd International Conference on Computer Aided Verification (CAV’21): Part II. 911–935.
- [16] 2011. QuickCheck: A lightweight tool for random testing of Haskell programs. ACM SIGPLAN Not. 46, 4 (2011), 53–64.
- [17] 1978. Automatic discovery of linear restraints among variables of a program. In Proceedings of the 5th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL’78). ACM, New York, NY, 84–96.
- [18] 2008. Z3: An efficient SMT solver. In Tools and Algorithms for the Construction and Analysis of Systems. Lecture Notes in Computer Science, Vol. 4963. Springer, 337–340.
- [19] 1993. Cost analysis of logic programs. ACM Trans. Program. Lang. Syst. 15, 5 (Nov. 1993), 826–875.
- [20] 1998. Well-structured transition systems everywhere! Theor. Comput. Sci. 256, 1 (1998), 2001.
- [21] 2016. Upper and lower amortized cost bounds of programs expressed as cost relations. In FM 2016: Formal Methods. Lecture Notes in Computer Science, Vol. 9995. Springer, 254–273.
- [22] 2009. Control-flow refinement and progress invariants for bound analysis. In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’09). ACM, New York, NY, 375–385.
- [23] 2009. SPEED: Precise and efficient static estimation of program computational complexity. ACM SIGPLAN Not. 44, 1 (2009), 127–139. http://dblp.uni-trier.de/db/conf/popl/popl2009.html#GulwaniMC09
- [24] 2010. The reachability-bound problem. In Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’10). ACM, New York, NY, 292–304.
- [25] 2012. Resource aware ML. In Computer Aided Verification. Lecture Notes in Computer Science, Vol. 7358. Springer, 781–786.
- [26] 2010. Amortized resource analysis with polynomial potential: A static inference of polynomial bounds for functional programs (extended version). In Proceedings of the 19th European Conference on Programming Languages and Systems.
- [27] 2020. Putting the squeeze on array programs: Loop verification via inductive rank reduction. In Verification, Model Checking, and Abstract Interpretation. Lecture Notes in Computer Science, Vol. 11990. Springer, 112–135.
- [28] 2021. Run-time complexity bounds using squeezers. In Programming Languages and Systems. Lecture Notes in Computer Science, Vol. 12648. Springer, 320–347.
- [29] 2016. SMT-based model checking for recursive programs. Formal Methods Syst. Des. 48, 3 (2016), 175–205.
- [30] 2001. The size-change principle for program termination. In Proceedings of the 28th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’01). ACM, New York, NY, 81–92.
- [31] 2006. Termination analysis with calling context graphs. In Computer Aided Verification. Springer, Berlin, Germany, 401–414.
- [32] 2007. Combining static analysis and profiling for estimating execution times. In Proceedings of the International Symposium on Practical Aspects of Declarative Languages. 140–154.
- [33] 2009. ScalaCheck: Property-Based Testing for Scala. Retrieved April 5, 2022 from https://www.scalacheck.org.
- [34] 2017. Complexity and resource bound analysis of imperative programs using difference constraints. J. Autom. Reasoning 59, 1 (2017), 3–45.
- [35] 1975. Mechanical program analysis. Commun. ACM 18, 9 (Sept. 1975), 528–539.
- [36] 2020. Runtime complexity analysis of logically constrained rewriting. In Proceedings of the 30th International Symposium on Logic-Based Program Synthesis and Transformation (LOPSTR’20). 37–55.
- [37] 2011. Bound analysis of imperative programs with the size-change abstraction. In Static Analysis. Springer, Berlin, Germany, 280–297.