Abstract
Program sketching is a program synthesis paradigm in which the programmer provides a partial program with holes and assertions. The goal of the synthesizer is to automatically find integer values for the holes so that the resulting program satisfies the assertions. The most popular sketching tool, Sketch, can efficiently solve complex program sketches but uses an integer encoding that often performs poorly if the sketched program manipulates large integer values. In this article, we propose a new solving technique that allows Sketch to handle large integer values while retaining its integer encoding. Our technique uses a result from number theory, the Chinese Remainder Theorem, to rewrite program sketches to only track the remainders of certain variable values with respect to several prime numbers. We prove that our transformation is sound and that the encodings of the resulting programs are exponentially more succinct than existing Sketch encodings. We evaluate our technique on a variety of benchmarks manipulating large integer values. Our technique provides speedups over existing Sketch solvers and can solve benchmarks that those solvers cannot handle.
1 INTRODUCTION
Program synthesis, the art of automatically generating programs that meet a user’s intent, promises to increase the productivity of programmers by automating tedious, error-prone, and time-consuming tasks. Syntax-guided Synthesis (SyGuS) [2], where the search space of possible programs is defined using a grammar or a domain-specific language, has emerged as a common program synthesis paradigm for many synthesis domains. One of the earliest and most successful syntax-guided program synthesis frameworks is program sketching [22], where (i) the search space of the synthesis problem is described using a partial program in which certain integer constants are left unspecified (represented as holes) and (ii) the specification is provided as a set of assertions describing the intended behavior of the program. The goal of the synthesizer is to automatically replace the holes in the program with integer values so that the resulting complete program satisfies all the assertions. Thanks to its simplicity, program sketching has found wide adoption in applications such as data-structure design [23], personalized education [21], program repair [7], and many others.
The most popular sketching tool, Sketch [24], can efficiently solve complex program sketches with hundreds of lines of code. However, Sketch often performs poorly if the sketched program manipulates large integer values. Sketch’s synthesis is based on an algorithm called counterexample-guided inductive synthesis (Cegis) [24]. The Cegis algorithm iteratively considers a finite set \( I \) of inputs for the program and performs SAT queries to identify values for the holes so that the resulting program satisfies all the assertions for the inputs in \( I \). Further SAT queries are then used to verify whether the generated solution is correct on all the possible inputs of the program. Sketch represents integers using a unary encoding (a variable for each integer value) so that arithmetic computations such as addition, multiplication, and so on, can be represented efficiently in the SAT formulas as lookup operations. This unary encoding, however, results in huge formulas for solving sketches with large integer values, as we also observe in our evaluation. Recently, a technique that extends the SAT solver with native integer variables and integer constraints was proposed to alleviate this issue in Sketch. It guesses values for the integer variables, propagates them through the integer constraints, and learns from conflict clauses. However, this technique does not scale well when the sketches contain complex arithmetic operations, e.g., non-linear integer arithmetic.
In this article, we propose a program transformation technique that allows Sketch to solve program sketches involving large integer values while retaining the unary encoding used by the traditional Sketch solver. Our technique rewrites a Sketch program into an equivalent one that performs computations over smaller values. The technique is based on the well-known Chinese Remainder Theorem, which states that, given distinct prime numbers \( p_1, \ldots , p_n \) such that \( N=p_1\cdot \ldots \cdot p_n \), for every two distinct numbers \( 0\le k_1,k_2\lt N \), there exists a \( p_i \) such that \( k_1 \ \mathrm{mod}\ p_i \ne k_2 \ \mathrm{mod}\ p_i \). Intuitively, this theorem states that tracking the modular values of a number smaller than \( N \) for each \( p_i \) is enough to uniquely recover the actual value of the number itself. We use this idea to replace a variable \( x \) in the program with \( n \) variables \( x_{p_1},\ldots ,x_{p_n} \), so that for every \( i \), \( x_{p_i}=x \ \mathrm{mod}\ p_i \). Using closure properties of modular arithmetic we show that, as long as the program uses the operators \( +,-,*,== \), tracking the modular values of variables and performing the corresponding operations on such values is enough to ensure correctness. For example, to reflect the variable assignment \( x=y+z \), we perform the assignment \( x_{p_i}=(y_{p_i}+z_{p_i})\ \mathrm{mod}\ p_i \), for every \( p_i \). Similarly, the Boolean operation \( x==y \) will only hold if \( x_{p_i}=y_{p_i} \) for every \( p_i \). To identify what variables and values in the program can be rewritten, we develop a dataflow analysis that computes what variables may flow into operations that are not sound in modular arithmetic, e.g., \( \lt ,\gt ,\le \), and \( / \).
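The rewriting described above can be sketched in a few lines of Python (an illustration of the idea only, not the actual Sketch transformation; the prime set `[3, 5, 7]` is just an example):

```python
# Illustrative sketch of the rewriting idea: a value is tracked as its
# remainders modulo a fixed prime set, and operations act componentwise.
PRIMES = [3, 5, 7]  # example prime set; the real tool picks enough primes

def to_mod(x):
    """Replace x by the tuple [x mod p for each prime p]."""
    return [x % p for p in PRIMES]

def add_mod(xs, ys):
    """Modular counterpart of x = y + z: componentwise addition mod p_i."""
    return [(a + b) % p for a, b, p in zip(xs, ys, PRIMES)]

def eq_mod(xs, ys):
    """x == y holds only if x_{p_i} == y_{p_i} for every prime p_i."""
    return all(a == b for a, b in zip(xs, ys))

y, z = 38, 63
# Tracking remainders commutes with addition: m(y) + m(z) == m(y + z).
assert add_mod(to_mod(y), to_mod(z)) == to_mod(y + z)
assert eq_mod(to_mod(101), to_mod(y + z))
```

The same componentwise pattern applies to subtraction and multiplication, which is why the transformation is restricted to the operators \( +,-,*,== \).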
We provide a comprehensive theoretical analysis of the complexity of the proposed transformation. First, we derive how many prime numbers are needed to track values in a certain integer range. Second, we analyze the number of bits required to encode values in the original and rewritten program and show that, for the unary encoding used by Sketch, our technique offers an exponential saving in the number of required bits. We also present an incremental algorithm that lazily increases the number of primes (and therefore the integer range) used in the modular semantics.
We evaluate our technique on 181 benchmarks from various applications of program sketching. Our results show that our technique results in significant speedups over existing Sketch solvers and is able to solve 48 benchmarks on which Sketch times out.
Contributions. In summary, our contributions are as follows:
A language IMP-MOD together with a modular semantics that represents integer values using their remainders for a given set of primes and a proof that this semantics is equivalent to the standard integer semantics (Section 4).
A dataflow analysis for detecting variables that can be soundly executed in the modular semantics and an algorithm for translating IMP programs into IMP-MOD ones (Section 5).
A synthesis algorithm for IMP-MOD programs and an incremental synthesis algorithm that lazily increases the number of primes used in the modular semantics (Section 6).
A complexity analysis that shows that synthesis for IMP-MOD programs requires exponentially smaller SAT queries than synthesis in IMP (Section 7).
An evaluation of our technique on 181 benchmarks that manipulate large integer values. Our solver outperforms the default Sketch unary solver, it can solve 48 new benchmarks that no Sketch solver can solve, and it is 15.9\( \times \) faster than the Sketch Native-Ints integer solver on the hard benchmarks that take more than 10 seconds to solve (Section 8).
This article is an extended version, with proofs and additional examples, of the short version of the article with the same title published at ESOP 2020 [17].
2 MOTIVATING EXAMPLE
In this section, we use a simple example to illustrate our technique and its effectiveness. Consider the Sketch program in Figure 1(a).
Fig. 1. Sketch program (a) and rewritten version with values tracked for different moduli (b).
Solving the problem amounts to finding non-negative integer values for the holes so that all the assertions in the program hold.
When attempting to solve this problem, the Sketch synthesizer times out after 300 seconds. To solve this problem, Sketch creates SAT queries in which the variables are the holes. Due to the large numbers involved in the computation of this program (including intermediate expression computations), the unary encoding of Sketch ends up producing SAT formulas with approximately 45 million clauses.
Sketch Program with Modular Arithmetic. The technique we propose in this article has the goal of reducing the complexity of the synthesis problem by transforming the program into an equivalent one that manipulates smaller integer values and that yields easier SAT queries. Given the Sketch program in Figure 1(a), our technique produces the modified Sketch program shown in Figure 1(b).
In the rewritten program, each variable of the original program is represented by a set of variables that track its remainders modulo the primes 2, 3, 5, 7, 11, 13, and 17, and each arithmetic operation is performed componentwise in the corresponding modular arithmetic.
Sketch can solve the rewritten program in less than 2 seconds and produce hole values that are correct solutions for the original program. This speedup is due to the small integer values manipulated by the modular computations. In fact, the intermediate SAT formulas generated by Sketch for the rewritten program are dramatically smaller than those generated for the original one.
While this technique is quite powerful, it does have some limitations. In particular, the solution to the rewritten Sketch is guaranteed to be a correct solution only for inputs that cause intermediate values of the program to be in a range \( [d_1,d_2] \) such that \( d_2-d_1\le 2\times 3\times 5\times 7\times 11\times 13\times 17=510,510 \). We will prove this result in Section 4.
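This bound is easy to check concretely; the following snippet (with an arbitrary example value) verifies that two integers that differ by exactly \( 510{,}510 \) have identical remainders for all seven primes, and are therefore indistinguishable when only remainders are tracked:

```python
# Two values that differ by 2*3*5*7*11*13*17 = 510510 collapse to the same
# remainder tuple, so the guarantee only covers windows of that size.
primes = [2, 3, 5, 7, 11, 13, 17]

N = 1
for p in primes:
    N *= p
assert N == 510510

x = 123456            # arbitrary example value
y = x + N             # lies outside any window of size N containing x
assert all(x % p == y % p for p in primes)
```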
3 PRELIMINARIES
In this section, we describe the IMP language we use to model program sketches and how the Sketch solver solves IMP sketches.
3.1 IMP Language for Sketching
For simplicity, we consider a simple imperative language, IMP, with integer hole values; its syntax is shown in Figure 2.
Fig. 2. The syntax for a simple imperative language IMP with integer hole values.
Fig. 3. Semantics of IMP. Valuations \( \sigma \) and \( \sigma _H \) assign integer values to variables and holes, respectively.
An example IMP sketch is the following program:
\( \begin{equation*} \mathtt{triple(n,h,??){ h=??; assert h*n==n+n+n; }} \end{equation*} \)
The goal of the synthesizer is to compute a value for the hole \( \texttt{??} \) such that the assertion holds for every input \( \texttt{n} \); here, the only solution is \( \texttt{??}=3 \).
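As an illustration (not how Sketch searches internally), the triple sketch is small enough to solve by brute force over a bounded hole domain:

```python
# Brute-force analogue of solving the triple sketch:
# find a hole value h such that h*n == n+n+n for all tested inputs n.
def satisfies(h, inputs):
    return all(h * n == n + n + n for n in inputs)

# Search a small bounded hole domain against a set of sample inputs.
solution = next(h for h in range(16) if satisfies(h, range(-10, 11)))
assert solution == 3   # h = 3 makes h*n == 3n hold for every n
```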
3.2 Solving IMP Sketches
The Sketch solver uses the counterexample-guided inductive synthesis (Cegis) algorithm to find hole values such that the desired assertions hold for all input values. Formally, the Sketch synthesizer solves the following constraint:
\( \begin{equation*} \exists \vec{\texttt {??}}\equiv (\texttt {??}_1, {\ldots }, \texttt {??}_m) {\in } \mathbb {Z}^m.~\forall {in}{\in } \mathcal {I}.~ [\![ f\texttt {(}in,\vec{\texttt {??}}\texttt {)} ]\!] ^\texttt {IMP} \ne \bot , \end{equation*} \) where \( \mathbb {Z} \) denotes the domain of all integer values, \( \vec{\texttt {??}} \) denotes the list of unknown hole values \( (\texttt {??}_1, \ldots , \texttt {??}_m) \in \mathbb {Z}^m \), \( \mathcal {I} \) denotes the domain of all input argument values to the function \( f \), and \( [\![ f\texttt {(}in,\vec{\texttt {??}}\texttt {)} ]\!] ^\texttt {IMP} \ne \bot \) denotes that the program satisfies all assertions. The synthesis problem is in general undecidable for a language with complex operations such as the ones in IMP; Sketch therefore bounds both the domain of hole values (a bounded domain \( \mathbb {Z}_b \)) and the domain of input values (a bounded domain \( \mathcal {I}_b \)).
The bounded domains make the synthesis problem decidable, but the formula with nested quantifiers results in a search space of hole values that is still huge for any reasonable bounds. To solve such bounded equations efficiently, Sketch uses the Cegis algorithm to incrementally add inputs from the domain until obtaining hole values \( \vec{\texttt {??}} \) that satisfy the assertion predicates for all the input values in the bounded domain. The algorithm solves the formula with two quantifiers by iteratively solving a series of first-order queries with a single quantifier. It first encodes the existential query (synthesis query) over a randomly selected input value \( in_0 \) to find the hole values \( \vec{H} \) that satisfy the predicate for \( in_0 \) using a SAT solver in the backend, \( \begin{equation*} \exists \vec{\texttt {??}}\equiv (\texttt {??}_1, {\ldots }, \texttt {??}_m) \in \mathbb {Z}_b^m.~[\![ f\texttt {(}in_0,\vec{\texttt {??}}\texttt {)} ]\!] ^\texttt {IMP} \ne \bot . \end{equation*} \) It then encodes another existential query (verification) to find a counter-example \( in_1 \) for which the predicate is not satisfied for the previously found hole values, \( \begin{equation*} \exists {in} \in \mathcal {I}_b.~ [\![ f\texttt {(}in,\vec{H}\texttt {)} ]\!] ^\texttt {IMP} = \bot . \end{equation*} \) If no counter-example input can be found, then the hole values are returned as the desired solution. Otherwise, the algorithm computes new hole values that satisfy the assertions for all the counter-example inputs found so far. This process continues iteratively until either a desired hole value is found (i.e., no counter-example input exists), no satisfying hole values exist (i.e., the synthesis problem is infeasible), or the SAT solver times out.
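The Cegis loop can be sketched in Python; here brute-force search stands in for the SAT queries, and the bounded domains are small example ranges (all names below are our own):

```python
# A minimal CEGIS loop (illustrative; Sketch uses a SAT solver where we
# brute-force). We synthesize the hole h in:  assert h*n == n+n+n
def holds(h, n):
    return h * n == n + n + n

HOLE_DOMAIN = range(0, 8)      # bounded hole domain (Z_b in the text)
INPUT_DOMAIN = range(-8, 9)    # bounded input domain (I_b in the text)

def cegis():
    examples = [1]             # start from one (here fixed) input in_0
    while True:
        # Synthesis query: hole values consistent with all examples so far.
        h = next((h for h in HOLE_DOMAIN
                  if all(holds(h, n) for n in examples)), None)
        if h is None:
            return None        # infeasible within the bounded domain
        # Verification query: look for a counter-example input.
        cex = next((n for n in INPUT_DOMAIN if not holds(h, n)), None)
        if cex is None:
            return h           # no counter-example: h is a solution
        examples.append(cex)   # refine the synthesis query and retry

assert cegis() == 3
```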
Integer Encoding. The Sketch solver can efficiently solve the synthesis constraint in many domains, but it does not scale well for sketches manipulating large numbers. Sketch uses a unary encoding to represent integers, where the encoded formula contains a variable for each integer value. The unary encoding allows for simplifying the representation of complex non-linear arithmetic operations. For example, a multiplication operation can be represented simply as a lookup table using this encoding. In practice, the unary encoding is efficient for many practical problems. However, it also results in huge SAT formulas in the presence of large integers. Recently, a new technique based on extending the SAT solver with native integer variables and constraints was proposed to alleviate this issue in Sketch (the Native-Ints solver).
4 MODULAR ARITHMETIC SEMANTICS
In this section, we present the language IMP-MOD together with its modular semantics, which represents integer values using their remainders with respect to a given set of primes.
4.1 The Chinese Remainder Theorem
The Chinese Remainder Theorem is a powerful number theory result that shows the following: Given a set of distinct primes \( \mathbb {P}=\lbrace p_1,\ldots ,p_k\rbrace \), any number \( n \) in an interval of size \( p_1\cdot \ldots \cdot p_k \) can be uniquely identified from the remainders \( [n\ \mathrm{mod}\ p_1,\ldots ,n\ \mathrm{mod}\ p_k] \). In Section 4.2, we will use this idea to define the semantics of the IMP-MOD language.
For \( \mathbb {P}=[3,5,7] \) and an integer 101, its remainders \( [2,1,3] \) are much smaller than 101. However, any number of the form \( 101+105\times n \) also has remainders \( [2,1,3] \) with respect to the same prime set.
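The aliasing described in this example is easy to verify:

```python
# 101 and every value of the form 101 + 105*n collapse to the same
# remainder tuple for the prime set P = [3, 5, 7] (note 105 = 3*5*7).
P = [3, 5, 7]

assert [101 % p for p in P] == [2, 1, 3]
for n in range(1, 5):
    alias = 101 + 105 * n
    assert [alias % p for p in P] == [2, 1, 3]
```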
In general, one cannot uniquely determine an arbitrary integer value from its remainders for some set \( \mathbb {P} \), i.e., the mapping from a number to its remainders is an abstraction in the sense of abstract interpretation [6]. However, if we are interested in a limited range of integer values \( [L,U) \), then one can choose a set of primes \( \mathbb {P}=\lbrace p_1,\ldots ,p_k\rbrace \) such that, for values \( L\le x\lt U \), the map \( [r_1,\ldots ,r_k] \mapsto x \), where \( x\equiv r_i \ \mathrm{mod}\ p_i \), is an injection.
Let \( p_1 \), ..., \( p_k \) be positive integers that are pairwise co-prime, i.e., no two numbers share a divisor larger than 1. Denote \( N=\prod _{i=1}^kp_i \), and let \( d \), \( r_1 \), \( r_2 \), ..., \( r_k \) be any integers. Then there is one and only one integer \( d\le x \lt d+N \) such that \( x \equiv r_i \ \mathrm{mod}\ p_i \) for every \( 1\le i \le k \).
We define the translation function \( m_\mathbb {P} (x):=[x\ \mathrm{mod}\ p_1,\ldots ,x\ \mathrm{mod}\ p_k] \) that maps an integer to its tuple of remainders with respect to \( \mathbb {P} \). The following corollary follows from Theorem 4.2.
Let \( \mathbb {P}=[p_1, \ldots ,p_k] \) be a set of distinct primes such that \( p_1\cdot \ldots \cdot p_k=N \). For every integer \( d \), the map \( m_\mathbb {P} (x):[d,d+N)\rightarrow [0,p_1)\times \cdots \times [0,p_k) \) is a bijection.
In Section 4.2, we will use Corollary 4.3 to relate the modular semantics of IMP-MOD programs to their standard integer semantics.
Let \( x \) be an integer in the range \( [0,105) \) (note that \( 105=3\times 5\times 7 \)). If we know that the value of \( x \) is congruent to \( [2,1,3] \) modulo \( \lbrace 3,5,7\rbrace \), then we can uniquely identify the value of \( x \) to be 101 by observing that \( 101\equiv 2\ \mathrm{mod}\ 3,\ 101\equiv 1\ \mathrm{mod}\ 5,\ \text{and}\ 101\equiv 3\ \mathrm{mod}\ 7. \)
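A brute-force check confirms that 101 is the only such value in the window \( [0,105) \):

```python
# Within a window of size 105 = 3*5*7, the remainders [2, 1, 3] pin down
# exactly one integer (a direct check of the bijection in Corollary 4.3).
P = [3, 5, 7]

matches = [x for x in range(0, 105)
           if [x % p for p in P] == [2, 1, 3]]
assert matches == [101]
```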
The following lemma shows that the function \( m_\mathbb {P} \) is closed under addition, subtraction, and multiplication of integers.
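This closure property can be spot-checked numerically; the snippet below compares applying an operation before and after translating to remainders (again for the example prime set `[3, 5, 7]`):

```python
# Spot-check that m_P commutes with +, -, and *: applying the operation on
# integers and then taking remainders equals applying it componentwise
# modulo each prime.
import random

P = [3, 5, 7]
m = lambda x: [x % p for p in P]          # the translation function m_P

def apply_mod(op, xs, ys):
    """Apply op componentwise, reducing each component modulo its prime."""
    return [op(a, b) % p for a, b, p in zip(xs, ys, P)]

random.seed(0)
for _ in range(1000):
    x = random.randint(-10**6, 10**6)
    y = random.randint(-10**6, 10**6)
    for op in (lambda a, b: a + b, lambda a, b: a - b, lambda a, b: a * b):
        assert m(op(x, y)) == apply_mod(op, m(x), m(y))
```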
4.2 The IMP-MOD Language
In this section, we define the IMP-MOD language. Its syntax is shown in Figure 4 and its modular semantics in Figure 5.
Fig. 4. Syntax of the IMP-MOD language.
Fig. 5. Modular semantics.
The key idea of the modular semantics is that the value of each program variable in \( v^\mathbb {P} \) and arithmetic expressions in \( a^\mathbb {P} \) is denoted by a tuple of values, one for each prime number \( p_i \in \mathbb {P} \). For example, the value of the constant \( c^\mathbb {P} \) is represented by the tuple \( [c\ \mathrm{mod}\ p_1,\ldots ,c\ \mathrm{mod}\ p_k] \), where each individual value denotes the remainder of \( c \) when divided by the prime number \( p_i \in \mathbb {P} \). Formally, the program \( f \) has two sets of variables \( V^\mathbb {Z}=\lbrace v_1,\ldots , v_n\rbrace \) and \( V^\mathbb {P}=\lbrace v_1^\mathbb {P},\ldots , v_m^\mathbb {P}\rbrace \), which contain all the integer and prime variables, respectively, and a set of holes \( H=\lbrace \texttt {??}_1, \ldots ,\texttt {??}_k\rbrace \). The denotation function uses two valuation functions: (i) \( \sigma :V^\mathbb {Z}\cup H\rightarrow \mathbb {Z} \), which maps variables and holes to integer values, and (ii) \( \sigma ^\mathbb {P}:V^\mathbb {P}\rightarrow [0,p_1)\times \cdots \times [0,p_k) \), which maps primed variables to modular values. The expression toPrime(a) converts the integer value of an integer expression \( a \) to a modular tuple. Arithmetic expressions in \( a^\mathbb {P} \) are computed using modular values, with the result being obtained using modular arithmetic with respect to the corresponding primes in \( \mathbb {P} \). Note that the only comparison operator allowed over modular expressions is \( == \) and that the division operator cannot be applied to modular expressions. While the syntax does not directly allow for holes to be represented modularly (i.e., we do not have expressions of the form \( \texttt {??}^\mathbb {P} \)), an expression of the form toPrime(\( \texttt {??} \)) effectively achieves the objective of representing a hole \( \texttt {??} \) modularly.
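The modular value domain can be mimicked by a small Python class (an illustrative sketch, not the formal semantics; `ModVal` and the prime set are our own names): values are tuples of remainders, operators act componentwise, and only equality is available as a comparison.

```python
# Illustrative mirror of the modular semantics: a value in a^P is a tuple
# of remainders, operators act componentwise modulo each prime, and ==
# compares all components.
PRIMES = (3, 5, 7)   # assumed example prime set P

class ModVal:
    def __init__(self, rems):
        # Normalize each component into [0, p_i).
        self.rems = tuple(r % p for r, p in zip(rems, PRIMES))

    @staticmethod
    def to_prime(x):
        """toPrime(a): convert an integer value to a modular tuple."""
        return ModVal([x % p for p in PRIMES])

    def _zip(self, other, op):
        return ModVal([op(a, b) for a, b in zip(self.rems, other.rems)])

    def __add__(self, o): return self._zip(o, lambda a, b: a + b)
    def __sub__(self, o): return self._zip(o, lambda a, b: a - b)
    def __mul__(self, o): return self._zip(o, lambda a, b: a * b)
    def __eq__(self, o):  return self.rems == o.rems  # only allowed comparison

x = ModVal.to_prime(38) + ModVal.to_prime(63)
assert x == ModVal.to_prime(101)
```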
4.3 Equivalence between the Two Semantics
Next, we provide an alternative integer semantics, which applies the standard integer interpretation to IMP-MOD programs, and we prove that, under suitable range assumptions, the two semantics are equivalent.
Integer Semantics. The integer semantics of IMP-MOD, shown in Figure 6, evaluates modular expressions as ordinary integer expressions.
Fig. 6. Integer semantics.
Relation between the Two Semantics. We now show that the modular semantics is, in some sense, equivalent to the integer semantics. For the rest of this section, we fix a set of distinct primes \( \mathbb {P}=\lbrace p_1,\ldots ,p_k\rbrace \).
To prove the equivalence of the two program semantics, we will require the values of modular expressions to lie in some range that is covered by the prime numbers in \( \mathbb {P} \). The following definition captures this restriction.
Given a modular arithmetic expression \( a^\mathbb {P} \) (respectively, Boolean expression \( b \)) and some integers \( L\lt U \), we say \( a^\mathbb {P} \) with context \( (\sigma _1,\sigma _2) \) is uniformly in the range \( R:=[L,U) \) (written \( a^\mathbb {P}\in _{\sigma _1,\sigma _2}R \) for short) if, under the integer semantics, all evaluations of modular subexpressions of \( a^\mathbb {P} \) (respectively, \( b \)) are in the range \( R \):
\( a^\mathbb {P}\in _{\sigma _1,\sigma _2}R \), iff \( [\![ a^\mathbb {P} ]\!] _{\sigma _1,\sigma _2}\in R \);
\( a_1^\mathbb {P}~==~a_2^\mathbb {P}\in _{\sigma _1,\sigma _2}R \), iff \( a_1^\mathbb {P}\in _{\sigma _1,\sigma _2}R \), \( a_2^\mathbb {P}\in _{\sigma _1,\sigma _2}R \);
\( b_1~\texttt {and}~b_2\in _{\sigma _1,\sigma _2}R \), iff \( b_1\in _{\sigma _1,\sigma _2}R \), \( b_2\in _{\sigma _1,\sigma _2}R \);
\( b_1~\texttt {or}~b_2\in _{\sigma _1,\sigma _2}R \), iff \( b_1\in _{\sigma _1,\sigma _2}R \), \( b_2\in _{\sigma _1,\sigma _2}R \);
\( \texttt {not}~b\in _{\sigma _1,\sigma _2}R \), iff \( b\in _{\sigma _1,\sigma _2}R \);
\( a_1~op_c~a_2\in _{\sigma _1,\sigma _2}R \) always holds, for any (non-modular) arithmetic expressions \( a_1,\ a_2 \) and comparison operator \( op_c \), since such expressions contain no modular subexpressions.
Given a valuation function \( \sigma :V^\mathbb {P}\mapsto \mathbb {Z} \), we write \( m_\mathbb {P} \circ \sigma \) to denote the modular valuation obtained by applying the \( m_\mathbb {P} \) function to \( \sigma \), i.e., for every \( v^\mathbb {P}\in V^\mathbb {P} \), \( (m_\mathbb {P} \circ \sigma)(v^\mathbb {P})=m_\mathbb {P} (\sigma (v^\mathbb {P})) \). Similarly, for a modular valuation function \( \sigma ^\mathbb {P}:V^\mathbb {P}\rightarrow [0,p_1)\times \cdots \times [0,p_k) \), we denote by \( m_\mathbb {P} ^{-1,R}\circ \sigma ^\mathbb {P} \) the integer valuation from \( V^\mathbb {P} \) to \( R \) such that, for every \( v^\mathbb {P}\in V^\mathbb {P} \), \( (m_\mathbb {P} ^{-1,R}\circ \sigma ^\mathbb {P})(v^\mathbb {P})=m_\mathbb {P} ^{-1,R}(\sigma ^\mathbb {P}(v^\mathbb {P})) \). The following lemma shows that, when the values of modular arithmetic expressions lie in an interval of size \( N=p_1\cdot \ldots \cdot p_k \), the modular and integer semantics of modular arithmetic expressions are equivalent. In practice, the assumption that numbers need to lie in a given range is not a limitation, since one can choose enough prime numbers to cover any practical range. We will show in Section 8 how the choice of prime numbers affects the efficiency of our technique.
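Under the stated range assumption, both \( m_\mathbb {P} \) and its inverse \( m_\mathbb {P}^{-1,R} \) are straightforward to implement; the sketch below (our own helper names) uses standard CRT reconstruction and then shifts the result into the interval \( R=[d,d+N) \):

```python
# m_P and its range-restricted inverse m_P^{-1,R} via standard CRT
# reconstruction (requires Python 3.8+ for pow(x, -1, p) and math.prod).
from math import prod

P = [3, 5, 7]
N = prod(P)  # 105

def m(x):
    """m_P: integer -> tuple of remainders modulo each prime."""
    return [x % p for p in P]

def m_inv(rems, d):
    """m_P^{-1,R}: remainders -> the unique x in R = [d, d+N)."""
    # Classic CRT: x0 = sum_i r_i * (N/p_i) * ((N/p_i)^{-1} mod p_i), mod N.
    x0 = sum(r * (N // p) * pow(N // p, -1, p) for r, p in zip(rems, P)) % N
    # Shift the canonical representative into the interval [d, d+N).
    return d + (x0 - d) % N

assert m_inv(m(101), 0) == 101      # R = [0, 105)
assert m_inv(m(101), 100) == 101    # R = [100, 205): still recovers 101
assert m_inv(m(206), 0) == 101      # 206 collapses to 101 within [0, 105)
```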
Given a set of primes \( \mathbb {P}=\lbrace p_1,\ldots ,p_k\rbrace \), an arithmetic expression \( a^\mathbb {P} \), and two valuation functions \( \sigma _1:V^\mathbb {Z}\cup H\mapsto \mathbb {Z} \) and \( \sigma _2:V^\mathbb {P}\mapsto \mathbb {Z} \), we have \( \begin{equation*} m_\mathbb {P} ([\![ a^\mathbb {P} ]\!] _{\sigma _1,\sigma _2})=[\![ a^\mathbb {P} ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2}. \end{equation*} \)
Moreover, if there exists an interval \( R \) of size \( N=p_1\cdot \ldots \cdot p_k \) such that \( a^\mathbb {P}~{\in }_{\sigma _1,\sigma _2}~R \), then \( \begin{equation*} m_\mathbb {P} ^{-1,R}([\![ a^\mathbb {P} ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2})=[\![ a^\mathbb {P} ]\!] _{\sigma _1,\sigma _2}. \end{equation*} \)
We prove this lemma by induction on \( a^\mathbb {P} \).
If \( a^\mathbb {P}=c^\mathbb {P} \), then we have \( m_\mathbb {P} ([\![ c^\mathbb {P} ]\!] _{\sigma _1,\sigma _2})=m_\mathbb {P} (c) \)
\( = [c\ \mathrm{mod}\ p_1,\ldots ,c\ \mathrm{mod}\ p_k]=[\![ c^\mathbb {P} ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2}. \)
If \( a^\mathbb {P}=v^\mathbb {P} \), then we have \( m_\mathbb {P} ([\![ v^\mathbb {P} ]\!] _{\sigma _1,\sigma _2})=m_\mathbb {P} (\sigma _2(v^\mathbb {P}))=(m_\mathbb {P} \circ \sigma _2)(v^\mathbb {P})=[\![ v^\mathbb {P} ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2}. \)
If \( a^\mathbb {P}=a_1^\mathbb {P}~op_a^\mathbb {P}~a_2^\mathbb {P} \), then we have from the induction hypothesis that \( m_\mathbb {P} ([\![ a_1^\mathbb {P} ]\!] _{\sigma _1,\sigma _2})=[\![ a_1^\mathbb {P} ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2} \) and \( m_\mathbb {P} ([\![ a_2^\mathbb {P} ]\!] _{\sigma _1,\sigma _2})=[\![ a_2^\mathbb {P} ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2} \). Then \( \begin{align*} m_\mathbb {P} ([\![ a_1^\mathbb {P}~op_a^\mathbb {P}~a_2^\mathbb {P} ]\!] _{\sigma _1,\sigma _2}) &=m_\mathbb {P} ([\![ a_1^\mathbb {P} ]\!] _{\sigma _1,\sigma _2}~op_a^\mathbb {P}~[\![ a_2^\mathbb {P} ]\!] _{\sigma _1,\sigma _2}) \\ \text{(Lemma 4.5)}&=m_\mathbb {P} ([\![ a_1^\mathbb {P} ]\!] _{\sigma _1,\sigma _2})~op_a^\mathbb {P}~m_\mathbb {P} ([\![ a_2^\mathbb {P} ]\!] _{\sigma _1,\sigma _2})\\ \text{(Induction hyp.)}&=[\![ a_1^\mathbb {P} ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2}~op_a^\mathbb {P}~[\![ a_2^\mathbb {P} ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2} \\ \text{(Def. of }[\![ \cdot ]\!] ^\mathbb {P} _{\sigma ,\sigma ^\mathbb {P}}) &=[\![ a_1^\mathbb {P}~op_a^\mathbb {P}~a_2^\mathbb {P} ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2}. \end{align*} \)
If \( a^\mathbb {P}= \texttt{toPrime}(a_1) \), then we have \( \begin{align*} m_\mathbb {P} ([\![ \texttt{toPrime}(a_1) ]\!] _{\sigma _1,\sigma _2}) &= m_\mathbb {P} ([\![ a_1 ]\!] _{\sigma _1,\sigma _2})\\ &=[[\![ a_1 ]\!] _{\sigma _1,\sigma _2}\ \mathrm{mod}\ p_1,\ldots ]\\ &=[[\![ a_1 ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2}\ \mathrm{mod}\ p_1,\ldots ] \quad \text{(*)}\\ &=[\![ \texttt{toPrime}(a_1) ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2}, \end{align*} \) where the deduction (*) follows from the fact that, on an integer expression \( a_1 \), the semantics \( [\![ a_1 ]\!] \) and \( [\![ a_1 ]\!] ^\mathbb {P} \) are identical and are only affected by \( \sigma _1 \).
For the second part of the lemma, according to Corollary 4.3 and the assumption that \( a^\mathbb {P}{\in }_{\sigma _1,\sigma _2}~R\Rightarrow [\![ a^\mathbb {P} ]\!] _{\sigma _1,\sigma _2}\in R \), the function \( m_\mathbb {P} \) is a bijection from \( R \) to \( [0,p_1)\times \cdots \times [0,p_k) \). Hence, \( m_\mathbb {P} ^{-1,R}([\![ a^\mathbb {P} ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2})=[\![ a^\mathbb {P} ]\!] _{\sigma _1,\sigma _2} \) follows directly from \( m_\mathbb {P} ([\![ a^\mathbb {P} ]\!] _{\sigma _1,\sigma _2})=[\![ a^\mathbb {P} ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2} \).□
Similarly, we show that the two semantics are also equivalent for Boolean expressions.
Given a set of primes \( \mathbb {P}=\lbrace p_1,\ldots ,p_k\rbrace \), an interval \( R \) of size \( N=p_1\cdot \ldots \cdot p_k \), a Boolean expression \( b \), and two valuation functions \( \sigma _1:V^\mathbb {Z}\cup H\mapsto \mathbb {Z} \) and \( \sigma _2:V^\mathbb {P}\mapsto \mathbb {Z} \), if \( b~{\in }_{\sigma _1,\sigma _2}~R \), then \( [\![ b ]\!] _{\sigma _1,\sigma _2}=[\![ b ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2} \).
We prove this lemma by induction on \( b \).
If \( b=(a_1^\mathbb {P}==a_2^\mathbb {P}) \), then we have \( \begin{equation*} [\![ b ]\!] _{\sigma _1,\sigma _2}=([\![ a_1^\mathbb {P} ]\!] _{\sigma _1,\sigma _2}==[\![ a_2^\mathbb {P} ]\!] _{\sigma _1,\sigma _2}) \end{equation*} \) and \( \begin{equation*} [\![ b ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2}=([\![ a_1^\mathbb {P} ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2}==[\![ a_2^\mathbb {P} ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2}). \end{equation*} \) According to Lemma 4.7 and the assumption that \( b\in _{\sigma _1,\sigma _2}R~\Rightarrow ~a_1^\mathbb {P},a_2^\mathbb {P}\in _{\sigma _1,\sigma _2}R \), the function \( m_\mathbb {P} \) is bijective over \( R \) and the following implication holds: \( \begin{eqnarray*} [\![ a_1^\mathbb {P} ]\!] _{\sigma _1,\sigma _2}==[\![ a_2^\mathbb {P} ]\!] _{\sigma _1,\sigma _2} & \Leftrightarrow &m_\mathbb {P} ([\![ a_1^\mathbb {P} ]\!] _{\sigma _1,\sigma _2}){==}m_\mathbb {P} ([\![ a_2^\mathbb {P} ]\!] _{\sigma _1,\sigma _2})\\ & \Leftrightarrow &[\![ a_1^\mathbb {P} ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2}{==}[\![ a_2^\mathbb {P} ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2}. \end{eqnarray*} \)
If \( b=b_1 ~\texttt {or}~ b_2 \), then we assume by induction that \( [\![ b_1 ]\!] _{\sigma _1,\sigma _2}=[\![ b_1 ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2} \) and \( [\![ b_2 ]\!] _{\sigma _1,\sigma _2}=[\![ b_2 ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2} \). Then \( \begin{eqnarray*} [\![ b_1 ~\texttt {or}~ b_2 ]\!] _{\sigma _1,\sigma _2}&\Leftrightarrow & \left([\![ b_1 ]\!] _{\sigma _1,\sigma _2}~\texttt {or}~[\![ b_2 ]\!] _{\sigma _1,\sigma _2}\right)\\ &\Leftrightarrow & \left([\![ b_1 ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2}~\texttt {or}~[\![ b_2 ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2}\right)\\ & \Leftrightarrow & [\![ b_1~\texttt {or}~b_2 ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2}. \end{eqnarray*} \)
The cases of \( \texttt {and} \) and \( \texttt {not} \) are similar to the previous case.
If \( b=a_0~op_c~a_1 \), then \( [\![ a_0~op_c~a_1 ]\!] _{\sigma _1,\sigma _2}=[\![ a_0~op_c~a_1 ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2} \), since \( a_0 \) and \( a_1 \) are not modular expressions and the two semantics are therefore identical.
We are now ready to show the equivalence between the modular semantics and the integer semantics for programs \( P\in \texttt {IMP-MOD} \). The semantics of a program \( P=f\texttt {(}V^\mathbb {Z},V^\mathbb {P},H\texttt {)}~ \lbrace s \rbrace \) is a map from valuations to valuations, i.e., given a valuation \( \sigma _1:V^\mathbb {Z}\rightarrow \mathbb {Z} \) for integer variables, a valuation \( \sigma _2:V^\mathbb {P}\rightarrow \mathbb {Z} \) for modular variables and a valuation \( \sigma ^H:H\rightarrow \mathbb {Z} \) for holes, we have \( [\![ P ]\!] (\sigma _1,\sigma _2,\sigma ^H)=[\![ s ]\!] _{\sigma _1\cup \sigma ^H,\sigma _2} \) and \( [\![ P ]\!] ^\mathbb {P} (\sigma _1,\sigma _2,\sigma ^H)= [\![ s ]\!] ^\mathbb {P} _{\sigma _1\cup \sigma ^H,m_\mathbb {P} \circ \sigma _2} \). Therefore, it is sufficient to show that the two semantics are equivalent for any statement \( s \).
The two semantics are equivalent for a statement \( s \) if, under the same input valuations, the resulting valuations of the semantics can be translated to each other. Formally, given valuations \( \sigma _1 \), \( \sigma _2 \) and an interval \( R \) of size \( N \), we say \( [\![ s ]\!] _{\sigma _1,\sigma _2}\equiv _\mathbb {P}[\![ s ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2} \) iff \( \sigma ^{\prime }_1=\sigma ^{\prime \prime }_1 \), \( m_\mathbb {P} \circ \sigma ^{\prime }_2=\sigma ^\mathbb {P}_2 \) and \( \sigma ^{\prime }_2=m_\mathbb {P} ^{-1,R}\circ \sigma ^\mathbb {P}_2 \) where \( [\![ s ]\!] _{\sigma _1,\sigma _2}=(\sigma ^{\prime }_1,\sigma ^{\prime }_2) \) and \( [\![ s ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2}=(\sigma ^{\prime \prime }_1,\sigma ^\mathbb {P}_2) \).
We define uniform inclusion for statements.
Given a set of primes \( \mathbb {P} \), two integers \( L\lt U \), and a statement \( s \), we say \( s \) with context \( (\sigma _1,\sigma _2) \) is uniformly in the range \( R:=[L,U) \) (written \( s\in _{\sigma _1,\sigma _2}R \) for short) if, under the integer semantics, all evaluations of modular subexpressions of \( s \) are in the range \( R \):
\( (v^\mathbb {P}=a^\mathbb {P})\in _{\sigma _1,\sigma _2}R \) iff \( a^\mathbb {P}\in _{\sigma _1,\sigma _2}R \).
\( \texttt {while}(b)\lbrace s\rbrace \in _{\sigma _1,\sigma _2}R \) iff \( s\in _{\sigma _1,\sigma _2}R \) and \( b\in _{\sigma _1,\sigma _2}R \).
\( s_1;s_2\in _{\sigma _1,\sigma _2}R \) iff \( s_1\in _{\sigma _1,\sigma _2}R \) and \( s_2\in _{\sigma _1,\sigma _2}R \).
\( \texttt {if}(b)~{s_1}~\texttt {else}~s_2\in _{\sigma _1,\sigma _2}R \) iff \( s_1\in _{\sigma _1,\sigma _2}R \), \( s_2\in _{\sigma _1,\sigma _2}R \) and \( b\in _{\sigma _1,\sigma _2}R \).
\( \texttt {assert}~b\in _{\sigma _1,\sigma _2}R \) iff \( b\in _{\sigma _1,\sigma _2}R \).
Finally, we show that the two semantics are equivalent for statements.
Given a set of primes \( \mathbb {P}=[p_1,\ldots ,p_k] \), a statement \( s \) and two valuation functions \( \sigma _1:V^\mathbb {Z}\cup H\rightarrow \mathbb {Z} \) and \( \sigma _2:V^\mathbb {P}\rightarrow \mathbb {Z} \), if there exists an interval \( R \) of size \( N \) such that \( s\in _{\sigma _1,\sigma _2}R \), then \( [\![ s ]\!] _{\sigma _1,\sigma _2}\equiv _\mathbb {P}[\![ s ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2} \).
We prove this theorem by induction on statements \( s \). In the following, we let \( [\![ s ]\!] _{\sigma _1,\sigma _2}=(\sigma ^{\prime }_1,\sigma ^{\prime }_2) \) and \( [\![ s ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2}=(\sigma ^{\prime \prime }_1,\sigma ^\mathbb {P}_2) \).
If \( s=(v^\mathbb {P}=a^\mathbb {P}) \), then we have that \( [\![ v^\mathbb {P}=a^\mathbb {P} ]\!] _{\sigma _1,\sigma _2}=(\sigma _1,\sigma _2[v^\mathbb {P}\mapsto [\![ a^\mathbb {P} ]\!] _{\sigma _1,\sigma _2}]) \) and \( [\![ v^\mathbb {P}=a^\mathbb {P} ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2}=(\sigma _1,(m_\mathbb {P} \circ \sigma _2)[v^\mathbb {P}\mapsto [\![ a^\mathbb {P} ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2}]) \). To show \( m_\mathbb {P} \circ \sigma ^{\prime }_2=\sigma ^\mathbb {P}_2 \), we need to show that \( \begin{equation*} m_\mathbb {P} \circ (\sigma _2[v^\mathbb {P}\mapsto [\![ a^\mathbb {P} ]\!] _{\sigma _1,\sigma _2}])=(m_\mathbb {P} \circ \sigma _2)[v^\mathbb {P}\mapsto [\![ a^\mathbb {P} ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2}]. \end{equation*} \) In fact, for any primed variable \( u^\mathbb {P}\ne v^\mathbb {P} \), both sides of the above equation return \( (m_\mathbb {P} \circ \sigma _2)(u^\mathbb {P}) \). For the primed variable \( v^\mathbb {P} \), we have \( \begin{equation*} \textit{left side}=m_\mathbb {P} ([\![ a^\mathbb {P} ]\!] _{\sigma _1,\sigma _2})= [\![ a^\mathbb {P} ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2}= \textit{right side}, \end{equation*} \) where the middle equality follows from Lemma 4.7 and the assumption \( s\in _{\sigma _1,\sigma _2}R\Rightarrow a^\mathbb {P}\in _{\sigma _1,\sigma _2}R \).
If \( s=\texttt {while}(b)\lbrace s_1\rbrace \), then we have by induction that \( [\![ s_1 ]\!] _{\sigma _3,\sigma _4}\equiv _\mathbb {P}[\![ s_1 ]\!] ^\mathbb {P} _{\sigma _3,m_\mathbb {P} \circ \sigma _4} \) for any valuations \( \sigma _3 \) and \( \sigma _4 \). From Lemma 4.8 and the assumption that \( s\in _{\sigma _1,\sigma _2}R\Rightarrow b\in _{\sigma _1,\sigma _2}R \), we know that the Boolean values \( [\![ b ]\!] _{\sigma _1,\sigma _2}=[\![ b ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2} \) coincide. If both \( [\![ b ]\!] _{\sigma _1,\sigma _2} \) and \( [\![ b ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2} \) are false, then \( \begin{eqnarray*} [\![ \texttt {while}(\texttt {false})\lbrace s_1\rbrace ]\!] _{\sigma _1,\sigma _2}=(\sigma _1,\sigma _2)\equiv _\mathbb {P}(\sigma _1,m_\mathbb {P} \circ \sigma _2)\\ =[\![ \texttt {while}(\texttt {false})\lbrace s_1\rbrace ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2}. \end{eqnarray*} \) So it remains to show, for the true branch, that \( [\![ s_1 ]\!] _{\sigma _1,\sigma _2}\equiv _\mathbb {P}[\![ s_1 ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2} \), which is exactly the induction hypothesis.
If \( s=s_1;s_2 \), then we assume by induction that \( [\![ s_1 ]\!] _{\sigma _1,\sigma _2}\equiv _\mathbb {P}[\![ s_1 ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2} \) and \( [\![ s_2 ]\!] _{\sigma _3,\sigma _4}\equiv _\mathbb {P}[\![ s_2 ]\!] ^\mathbb {P} _{\sigma _3,m_\mathbb {P} \circ \sigma _4} \) for any valuations \( \sigma _3 \) and \( \sigma _4 \). Then \( \begin{eqnarray*} [\![ s_1;s_2 ]\!] _{\sigma _1,\sigma _2}=[\![ s_2 ]\!] _{\sigma ^{\prime }_3,\sigma ^{\prime }_4}\equiv _\mathbb {P}[\![ s_2 ]\!] ^\mathbb {P} _{\sigma ^{\prime }_3,m_\mathbb {P} \circ \sigma ^{\prime }_4}\\ =[\![ s_1;s_2 ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2}, \end{eqnarray*} \) where \( [\![ s_1 ]\!] _{\sigma _1,\sigma _2}=(\sigma ^{\prime }_3,\sigma ^{\prime }_4)\equiv _\mathbb {P}(\sigma ^{\prime }_3,m_\mathbb {P} \circ \sigma ^{\prime }_4)=[\![ s_1 ]\!] ^\mathbb {P} _{\sigma _1,m_\mathbb {P} \circ \sigma _2} \) is implied by IH.
If \( s=\texttt {assert}~b \), then the reasoning is similar to the case of while statements. Since the two semantics agree on Boolean expressions, they take the same branch, yielding either \( \bot \) or an unchanged valuation.
If \( s=\texttt {if}(b)~s_1~\texttt {else}~s_2 \), then the reasoning is also similar to the case of while statements.
If \( s=(v=a) \), then the two semantics are equivalent, since there are no modular expressions in \( s \).
5 From IMP to IMP-MOD Programs
In this section, we develop a dataflow analysis for detecting variables in an IMP program that can be soundly tracked using modular values (Section 5.1) and a transformation that rewrites IMP programs into IMP-MOD programs (Section 5.2).
5.1 Dataflow Analysis
The formalization of IMP-MOD in Section 4 assumes that the variables are already partitioned into the sets \( V^\mathbb {Z} \) and \( V^\mathbb {P} \); we now show how such a partition can be computed automatically.
Consider an integer variable \( x \) with modular values \( x_2 \) under modulus 2 and \( x_3 \) under modulus 3, and an integer variable \( y \) with modular values \( y_2 \), \( y_3 \) under the corresponding moduli. Then the assignment \( x = y + y; \) implies \( x_2 = (y_2 + y_2)\ \ \mathrm{mod}\ 2; \) and \( x_3 = (y_3 + y_3)\ \ \mathrm{mod}\ 3 \). However, \( x = x / y; \) does not imply \( x_2 = (x_2 / y_2)\ \ \mathrm{mod}\ 2; \) and \( x_3 = (x_3 / y_3)\ \ \mathrm{mod}\ 3 \).
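This distinction can be checked directly in Python (an illustrative snippet, not part of the formalism): ring operations such as addition and multiplication commute with taking remainders, while integer division and comparisons do not.

```python
# Addition and multiplication commute with taking remainders.
x, y, p = 123456, 789, 5
assert (x + y) % p == ((x % p) + (y % p)) % p
assert (x * y) % p == ((x % p) * (y % p)) % p

# Integer division does not: the residues alone lose the quotient.
x, y, p = 7, 2, 3
lhs = (x // y) % p                        # (7 // 2) % 3 == 0
rhs = ((x % p) * pow(y % p, -1, p)) % p   # "dividing" the residues gives 2
assert lhs != rhs

# Comparisons are equally undecidable from residues alone:
assert 3 % 5 == 8 % 5                     # yet 3 < 8: order is invisible mod 5
```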

Dataflow Analysis for Partitioning Variables. We now define a dataflow analysis for computing which variables in a program must be tracked with the integer semantics (i.e., the set \( V^\mathbb {Z} \)) and which variables can be soundly tracked using the modular semantics (i.e., the set \( V^\mathbb {P} \)). For each operator \( op \) in \( \lbrace /, \lt , \gt , \le , \ge \rbrace \), the analysis computes the set of variables that may flow into the operands of an expression of the form \( e_1~op~e_2 \). In practice, this is done via a backward may analysis (Algorithm 1).
Implementation Remark. Since our implementation also supports arrays and recursion, the dataflow analysis in Algorithm 1 is inter-procedural and the set \( S \) also contains the array indexing operator \( [ \ ] \), i.e., given an expression \( arr[a] \), if a variable \( v \) may flow into \( a \), then \( v \) must be tracked using the integer semantics. Furthermore, while in our formalization we allow variables to be tracked using only one of the two semantics, in our implementation, we allow variables to be tracked differently (using actual values or modular values) at different program points by tracking, for each variable \( v \), the program points for which the actual value of \( v \) is needed, which is done by using the same dataflow analysis. In this case, a variable might initially need to be tracked using actual values but can later be tracked using modular values.
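The essence of the partitioning can be sketched in a few lines of Python (our own simplification for straight-line assignments, not Algorithm 1 verbatim; the real analysis is inter-procedural and flow-sensitive). A statement is represented as a triple of a left-hand-side variable, an operator, and its operand variables:

```python
# Operators whose operands force the integer semantics.
NEEDS_INT = {"/", "<", ">", "<=", ">=", "[]"}

def partition(stmts, all_vars):
    v_int = set()
    # seed: variables used directly under an integer-only operator
    for _, op, args in stmts:
        if op in NEEDS_INT:
            v_int |= set(args)
    # backward propagation: if lhs must be tracked as an integer,
    # so must the variables that flow into it
    changed = True
    while changed:
        changed = False
        for lhs, op, args in stmts:
            if lhs in v_int and not set(args) <= v_int:
                v_int |= set(args)
                changed = True
    return v_int, all_vars - v_int    # (V^Z, V^P)
```

On a loop such as LoopInc in Figure 8 (counter \( i \) compared against \( n \), accumulator \( x \)), this computes \( V^\mathbb {Z}=\lbrace i,n\rbrace \) and places \( x \) in \( V^\mathbb {P} \).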
Consider the sketch program
5.2 From IMP to IMP-MOD
Now that we have computed what sets of variables can be tracked modularly, we can transform the IMP program into an equivalent IMP-MOD program using the rules shown in Figure 7.
Fig. 7. Rules for the translation from IMP to IMP-MOD programs. Rules are parametric in \( V^\mathbb {Z} \) , \( V^\mathbb {P} \) with \( \mathbb {P} \) : \( {\rm\small R}_f(f(V,\texttt {??})\lbrace s\rbrace) = f(V^\mathbb {Z}, V^\mathbb {P}, \texttt {??})\lbrace {\rm\small R}_s(s)\rbrace \) .
Once we get a solution for the IMP-MOD sketch, we can use the same hole values as a solution for the original IMP sketch.
Consider the example program transformation shown in Figure 8. For this program, the dataflow analysis computes \( V^\mathbb {Z}=\lbrace i,n\rbrace \) and \( V^\mathbb {P}=\lbrace x\rbrace \). Therefore, the transformation rewrites the statement \( x=x+i+1 \) to \( x^\mathbb {P}=x^\mathbb {P}+\textsc {ToPrime}(i)+1^\mathbb {P} \).
Fig. 8. Transformation from the IMP program LoopInc in (a) to an IMP-MOD program LoopIncMod in (b).
The transformation \( {\rm\small R}_f \) is sound.
Given an IMP function \( f \), we have \( [\![ f ]\!] ^\texttt {IMP} =[\![ {\rm\small R}_f(f) ]\!] \).
We first prove a few lemmas (note that in the formalization we only consider the language described in Figure 4, which, for example, does not include arrays).
Lemma 1. For every \( a \), \( {\rm\small R}_a(a) \) is a modular arithmetic expression in \( a^\mathbb {P} \). We proceed by induction on \( a \). The interesting case is the “otherwise” case. In this case, \( {\rm\small R}_a(a)=\textsc {ToPrime}(a) \). Hence, we need to show that \( a \) is an integer arithmetic expression, i.e., that \( a \) does not contain variables in \( V^\mathbb {P} \):
If \( a\equiv v \), and \( v\in V^\mathbb {Z} \), then \( a \) does not contain variables in \( V^\mathbb {P} \).
If \( a\equiv \texttt {??} \), then \( a \) does not contain variables in \( V^\mathbb {P} \).
If \( a\equiv a_1 / a_2 \), then, from the dataflow analysis, all variables in \( a_1 \) and \( a_2 \) belong to \( V^\mathbb {Z} \).
Lemma 2. For every \( b \), \( {\rm\small R}_b(b) \) is an IMP-MOD Boolean expression. We proceed by induction on \( b \).
The other interesting case is the “otherwise” case. In this case, \( {\rm\small R}_b(b)=b \) and \( b \) is of the form \( a_1~op_c~a_2 \) where \( op_c\in \lbrace \lt ,\gt ,\le ,\ge \rbrace \). Hence, we need to show that all variables in \( a_1 \) and \( a_2 \) belong to \( V^\mathbb {Z} \), which is guaranteed by the dataflow analysis.
Now, we are ready to show that \( {\rm\small R}_f(f)=f(V^\mathbb {Z}, V^\mathbb {P}, \texttt {??})\lbrace {\rm\small R}_s(s)\rbrace \) is in the IMP-MOD language. We proceed by induction on \( s \).
If \( s\equiv v^\mathbb {P}~=~a \), then \( {\rm\small R}_s(s)=v^\mathbb {P}~=~{\rm\small R}_a(a) \). From Lemma 1, \( {\rm\small R}_a(a) \) is a modular arithmetic expression in \( a^\mathbb {P} \). Hence we are done.
If \( s\equiv \texttt {assert}~b \), then \( {\rm\small R}_s(s)=\texttt {assert}~{\rm\small R}_b(b) \). From Lemma 2, \( {\rm\small R}_b(b) \) is an IMP-MOD Boolean expression. Hence we are done.
The fact that \( [\![ f ]\!] ^\texttt {IMP} =[\![ {\rm\small R}_f(f) ]\!] \) follows from a straightforward induction on \( s, b \), and \( a \).□
6 Solving IMP-MOD Sketches
In this section, we discuss how synthesis in the modular semantics relates to synthesis in the integer semantics and provide an incremental algorithm for solving IMP-MOD sketches.
6.1 Synthesis in IMP-MOD
Given a set of integers \( R \), we say that a variable valuation \( \sigma \) is in \( R \) (denoted \( \sigma \in R \)) if for every \( v \) we have \( \sigma (v)\in R \). Similarly to what we saw in Section 3, we assume that the sketch has to be solved for finite ranges of possible values for the holes (\( R_H \)) and input values (\( R_{in} \)). Solving an IMP-MOD sketch \( f \) then amounts to finding a hole valuation \( \sigma ^H\in R_H \) such that, for every input valuation in \( R_{in} \), the assertions of \( f \) hold under the modular semantics.
According to Theorem 4.10, given a set of distinct primes \( \mathbb {P}=\lbrace p_1,\ldots ,p_k\rbrace \) and variable valuations \( \sigma ^H,\sigma _1, \) and \( \sigma _2 \), if there exists a range \( R \) of size \( N=p_1\cdot \ldots \cdot p_k \) such that \( s\in _{\sigma _1\cup \sigma ^H,\sigma _2} R \), then the modular semantics and the integer semantics are equivalent to each other. Using this observation, we can define the set of variable valuations for which the two semantics are guaranteed to be equivalent: \( \begin{eqnarray*} \mathcal {I}_R^\mathbb {P}{:=}\left\lbrace (\sigma _1,\sigma _2)\mid \forall \sigma ^H{\in } R_H. \exists R.~|R|{=}N \wedge s{\in _{\sigma _1\cup \sigma ^H,\sigma _2}}R\right\rbrace \!. \end{eqnarray*} \)
Since for every \( \sigma ^H\in R_H \) and \( (\sigma _1,\sigma _2)\in \mathcal {I}_R^\mathbb {P} \) we have that \( [\![ s ]\!] ^\mathbb {P} _{\sigma _1\cup \sigma ^H,m_\mathbb {P} \circ \sigma _2}=[\![ s ]\!] _{\sigma _1\cup \sigma ^H,\sigma _2} \), any solution to an IMP-MOD sketch is also a solution to the original IMP sketch for all valuations in \( \mathcal {I}_R^\mathbb {P} \).
To summarize, if the synthesizer returns UNSAT for the IMP-MOD sketch, then the original IMP sketch admits no solution either; if it returns a solution, that solution is guaranteed to be correct for the original sketch on all valuations in \( \mathcal {I}_R^\mathbb {P} \).
6.2 Incremental Synthesis Algorithm
In this section, we propose an incremental synthesis algorithm that builds on the following observation. The set of variable valuations for which modular and integer semantics are equivalent increases monotonically in the size of \( \mathbb {P} \): (1) \( \begin{eqnarray} \mathbb {P}_1\subseteq \mathbb {P}_2 \Longrightarrow \mathcal {I}_R^{\mathbb {P}_1}\subseteq \mathcal {I}_R^{\mathbb {P}_2}. \end{eqnarray} \)
Algorithm 2 uses Equation (1) to add prime numbers lazily during the synthesis process. The algorithm first constructs a set \( \mathbb {P}^{\prime }=\lbrace p_1\rbrace \) with the first prime number \( p_1 \in \mathbb {P} \) and synthesizes a solution that is correct for computations modulo the set \( \mathbb {P}^{\prime } \). It then checks whether the synthesized solution \( f_\texttt {syn} \) satisfies the assertions with respect to all prime numbers in \( \mathbb {P} \). If yes, then \( f_\texttt {syn} \) is returned as the solution. Otherwise, the algorithm finds a prime \( p_\texttt {cex} \in \mathbb {P} \) for which \( \texttt {Verify}(f_\texttt {syn},p_\texttt {cex}) \) does not hold, adds it to the set \( \mathbb {P}^{\prime } \), and continues the iterative process. Due to Equation (1), Algorithm 2 is sound and complete with respect to the synthesis algorithm that considers the full prime set \( \mathbb {P} \) all at once.
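A high-level Python sketch of this loop, assuming externally provided synthesize and verify procedures (both names are ours; in the implementation these are calls to the Sketch solver):

```python
def incremental_synthesis(primes, synthesize, verify):
    """Sketch of Algorithm 2: synthesize(active) returns a candidate
    solution (or None for UNSAT) correct modulo the active primes;
    verify(cand, p) checks the assertions modulo a single prime p."""
    active = [primes[0]]                  # start with the first prime
    while True:
        cand = synthesize(active)         # solve modulo the active primes
        if cand is None:
            return None                   # UNSAT for a subset implies UNSAT for P
        failing = [p for p in primes if not verify(cand, p)]
        if not failing:
            return cand                   # correct modulo every prime in P
        active.append(failing[0])         # refine with a counterexample prime
```

For instance, if a hole must agree with a hidden value modulo every prime in \( \lbrace 2,3,5,7\rbrace \), the loop refines the active prime set until the candidate matches the hidden value modulo all four primes.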
In practice, the user can use domain knowledge to estimate a suitable set of primes or, alternatively, use our incremental algorithm to discover an appropriate prime set. In our experience, the set of prime numbers \( \lbrace 2,3,5,7,11,13,17\rbrace \) instantiates a range \( R \) large enough for most synthesis tasks based on Sketch.

7 COMPLEXITY OF REWRITTEN PROGRAMS
In this section, we analyze how many bits are necessary to encode numbers for both semantics using unary and binary bit-vector encodings of integers (Sections 7.1 and 7.2) and show how many prime numbers are necessary in the modular semantics to cover values up to a certain bound (Section 7.3). The following results build upon several number theory results that the reader can consult in References [9, 15].
7.1 Bit-complexity of Binary Encoding
In this section, we analyze how many bits are necessary to represent an interval of size \( N \) in binary in our modular semantics. In the rest of the section, we consider the set of primes \( \mathbb {P}_n=\lbrace p\mid p\lt n\rbrace =\lbrace p_1,\ldots , p_k\rbrace \) containing the prime numbers smaller than \( n \). We will show in Section 8 that this choice of primes also yields good performance in practice. Concretely, we are interested in the magnitude of the number \( N=p_1\cdot \ldots \cdot p_k \) and in how many bits are used to represent the numbers in \( \mathbb {P}_n \).
We start by introducing the notion of primorial.
(Primorial).
Given a number \( n \), the primorial \( n\# \) is defined as the product of all primes smaller than \( n \), i.e., \( n\# = \prod \nolimits _{p\in \mathbb {P}_n} p \).
The primorial captures the size \( N \) of the interval covered by the Chinese Remainder Theorem when using prime numbers up to \( n \). The following number theory result gives us a closed form for the primorial and shows that the number \( N \) has approximately \( n \) bits, (2) \( \begin{equation} n\# = e^{(1 + o(1))n} = 2^{(1 + o(1))n}. \end{equation} \)
We use another number theory notion to quantify the number of bits in \( \mathbb {P}_n \).
(Chebyshev Function).
Given a number \( n \), the Chebyshev function \( \vartheta (n) \) is the sum of the logarithms of all the prime numbers smaller than \( n \), i.e., \( \vartheta (n) = \sum \nolimits _{p\in \mathbb {P}_n} \log p \).
The following number theory result relates the primorial to the Chebyshev function: (3) \( \begin{equation} \vartheta (n) = \log (n\#) =\log 2^{(1 + o(1))n} = (1+o(1))n. \end{equation} \) Aside from rounding errors, the Chebyshev function captures the number of bits required to represent the numbers in \( \mathbb {P}_n \). To obtain a more precise bound on this number, we need a bound for the formula \( \sum \nolimits _{p\in \mathbb {P}_n} \lceil \log p\rceil \).
We start by recalling the following fundamental number theory result.
The set \( \mathbb {P}_n \) has size approximately \( n/\log n \) (the Prime Number Theorem).
Using Theorem 7.3, we get the following result: (4) \( \begin{equation} \sum \limits _{p\in \mathbb {P}_n} \lceil \log p\rceil \le n/\log n + \sum \limits _{p\in \mathbb {P}_n} \log p \approx (1+o(1))n. \end{equation} \)
Representing a number \( 2^n \) in a classic binary encoding requires \( \log _2 (2^n)=n \) bits, and, combining Equations (2) and (4), we get the following result.
Representing a number \( 2^n \) in binary requires \( (1+o(1))n \) bits under both modular and integer semantics.
Hence, representing a number in binary requires asymptotically the same number of bits in both semantics.
Consider the set \( \mathbb {P}_{18}=\lbrace 2,3,5,7,11,13,17\rbrace \), which can model an interval of \( N=510,510 \) integers (i.e., \( n=18 \) in Theorem 7.4). Representing \( N \) in binary requires 19 bits while the binary representations of all the primes in \( \mathbb {P}_{18} \) use 22 bits. Both numbers are close to 18 as predicted by the theorem.
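The numbers in this example can be checked with a few lines of Python (using a naive primality test, which suffices at this scale):

```python
import math

n = 18
primes = [p for p in range(2, n) if all(p % d for d in range(2, p))]
assert primes == [2, 3, 5, 7, 11, 13, 17]

N = math.prod(primes)                    # the primorial 18#
assert N == 510_510

bits_N = N.bit_length()                  # bits to represent N in binary: 19
bits_primes = sum(math.ceil(math.log2(p)) for p in primes)  # 22 in total
assert (bits_N, bits_primes) == (19, 22) # both close to n = 18
```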
7.2 Bit-complexity of Unary Encoding
As discussed in Section 3, the default Sketch solver encodes numbers using a unary encoding, i.e., Sketch requires \( 2^n \) bits to encode the number \( 2^n \). Representing the same number in unary under the modular semantics requires only prime numbers smaller than \( n \) and therefore \( \sum \nolimits _{p\in \mathbb {P}_n} p \) bits. We can then use the following closed form to approximate this quantity: (5) \( \begin{equation} \sum \limits _{p\in \mathbb {P}_n} p \sim \frac{n^2}{2 \log n}. \end{equation} \) Equation (5) yields the following theorem.
Representing a number \( 2^n \) in unary requires \( 2^n \) bits in the integer semantics and approximately \( \frac{n^2}{2 \log n} \) bits in the modular semantics.
These results show that, under a unary encoding, the modular semantics is exponentially more succinct than the integer semantics.
Consider again the prime set \( \mathbb {P}_{18}=\lbrace 2,3,5,7,11,13,17\rbrace \), which can model an interval of \( N=510,510 \) integers. Representing \( N \) in unary requires 510,510 bits. However, the sum of the bits in the unary encoding of the primes in \( \mathbb {P}_{18} \) is 58.
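Equation (5) and this example can likewise be checked numerically (again with a naive trial-division primality test, for illustration only):

```python
import math

def primes_below(n):
    # naive trial division; fine for small n
    return [p for p in range(2, n) if all(p % d for d in range(2, p))]

# the worked example: the primes below 18 need 2 + 3 + ... + 17 = 58 unary bits
assert sum(primes_below(18)) == 58

# Equation (5): the sum of the primes below n is about n^2 / (2 log n)
n = 1000
actual = sum(primes_below(n))
estimate = n * n / (2 * math.log(n))
assert abs(actual - estimate) / actual < 0.1   # within 10% already at n = 1000
```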
7.3 Number of Required Primes
We analyze how many primes are needed to represent a certain number in the modular semantics. We start by introducing the following alternative version of the primorial.
(Prime Primorial).
For the \( n \)th prime number \( p_n \), the prime primorial \( p_n\# \) is defined as the product of the first \( n \) primes, i.e., \( p_n\# = \prod \nolimits _{i=1}^n p_i \).
The following known number theory result gives us an approximation for the prime primorial: (6) \( \begin{equation} p_n\# = e^{(1 + o(1))n\log n}. \end{equation} \) Notice how the approximation of the primorial differs from that of the prime primorial. This is due to the fact that prime numbers are sparse, i.e., the \( n \)th prime number is approximately \( n\log n \).
Using Equation (6) we obtain the following result.
Representing numbers in an interval of size \( N=e^{n\log n} \) in the modular semantics requires the first \( n \) prime numbers.
Since the relation \( k=n\log n \) does not admit a closed form for \( n \), we cannot derive exactly how many primes are needed to represent a number \( 2^k \) with \( k \) bits. It is, however, clear from the theorem that the number of required primes grows slower than \( k \).
Consider again the prime set \( \mathbb {P}_{18}=\lbrace 2,3,5,7,11,13,17\rbrace \), which can model an interval of \( N=510,510 \) integers and consists of the first seven primes. According to Theorem 7.9, \( \mathbb {P}_{18} \) should be able to model an integer interval of size approximately \( e^{(1+o(1))(7 \log 7)} \). In this case, \( N \) is approximately \( e^{0.96(7 \log 7)} \), close to the predicted value.
8 EVALUATION
We implemented a prototype of our technique as a simple compiler in Java. Our implementation provides a simplified Sketch frontend, which only allows the limited syntax we support. Given a Sketch file, our tool rewrites it into a different Sketch file that operates according to the modular semantics. We will use Unary to denote the result obtained by running the default version of Sketch with unary integer encoding on the original Sketch file; Native-Ints to denote the result obtained by running the version of Sketch using a native integer solver that extends the SAT solver with native integer variables, propagates them through the integer constraints, and learns from conflict clauses (flag
Our evaluation answers the following research questions:
Q1. How does the performance of Unary-p compare to Unary and Native-Ints?
Q2. How does the incremental algorithm compare to the non-incremental one?
Q3. Is Unary-p’s performance sensitive to the set of selected prime numbers?
Q4. How many primes are needed by Unary-p to produce correct solutions?
Q5. Does Unary generate larger SAT queries than Unary-p?
8.1 Benchmarks
We perform our evaluation on three families of programs.
Polynomials. The first set of benchmarks contains 81 variants of the polynomial synthesis problem presented in Figure 1. The original version of this benchmark appears in the Sketch benchmark suite under the name
Invariants. The second set of benchmarks contains 46 variants of two invariant generation problems obtained from a public set of programs that require polynomial invariants to be verified [8]. We selected the two programs in which at least one variable could be tracked modularly by our tool (the other programs involved complex array operations or inequality operators) and turned the verification problems into synthesis problems by asking Sketch to find a polynomial equality (using the program variables) that is an invariant for the loop in the program. To control the size of the magnitudes of the inputs, we only require the invariants to hold for a fixed set of input examples.
The first problem,
The second problem,
Program Repair. The third set of benchmarks contains 54 variants of Sketch problems from the domain of automatic feedback generation for introductory programming assignments [7]. Each benchmark corresponds to an incorrect program submitted by a student and the goal of the synthesizer is to find a small variation of the program that behaves correctly on a set of test cases. We select the 6/11 benchmarks from the tool Qlose [7] for which (i) our implementation can support all the features in the program and (ii) our dataflow analysis identifies at least one variable that can be tracked modularly. Of the remaining benchmarks, 3/11 do not contain variables that can be tracked modularly and 2/11 call auxiliary functions that cannot be translated into Sketch. For each program, we consider the original problem and two variants where the integer inputs are multiplied by 10 and 100, respectively. Further, for each program variant, we impose an assertion specifying that the distance between the original program and the repaired program is within a certain bound. We select three different bounds for each program: the minimum cost \( c \), \( c+100 \), and \( c+200 \).
8.2 Performance of Unary-p
Table 1 summarizes our comparison.
| Solver | Solved | Polynomials (SAT / UNSAT / TO) | Invariants (SAT / UNSAT / TO) | Program repair (SAT / UNSAT / TO) |
|---|---|---|---|---|
| Unary | 69/181 | 12 / 4 / 65 | 5 / 0 / 41 | 48 / 0 / 6 |
| Native-Ints | 127/181 | 70 / 6 / 5 | 17 / 0 / 29 | 34 / 0 / 20 |
| Unary-p | 169/181 | 73 / 5 / 3 | 41 / 2 / 3 | 48 / 0 / 6 |
| Unary-p-inc | 172/181 | 73 / 6 / 2 | 41 / 2 / 3 | 50 / 0 / 4 |

Table 1. Effectiveness of Different Solvers. SAT (respectively, UNSAT) denotes the number of benchmarks for which the solver could find a solution (respectively, prove that no solution existed), while TO denotes the number of timeouts.
First, we compare the performance of Unary-p and Unary. We use \( \mathbb {P}= \lbrace 2,3,5,7,11,13,17\rbrace \), which is enough for Unary-p to always find correct solutions (we verify the correctness of a solution by instantiating the hole values in the original sketch programs and using Sketch (without the
Next, we compare the performance of Unary-p and Native-Ints (Figure 10(b)). On the 64 easier benchmarks that Native-Ints can solve in less than 1 second, Native-Ints (average 0.55 seconds) outperforms Unary-p (average 2.32 seconds), but Unary-p still has reasonable performance. On the 49 benchmarks that Native-Ints can solve in between 1 and 10 seconds, Unary-p (average 3.5 seconds) is on average 1.9\( \times \) faster than Native-Ints (average 6.9 seconds). Most interestingly, for the 14 harder benchmarks for which Native-Ints takes more than 10 seconds, Unary-p (average 5.7 seconds) is on average 15.9\( \times \) faster than Native-Ints (average 90.9 seconds). Remarkably, Unary-p solved 43 of the benchmarks (in less than 8 seconds each) for which Native-Ints timed out, and Unary-p only timed out for two benchmarks that Native-Ints could solve in less than a second and one benchmark that Native-Ints could solve in 260 seconds. Finally, we highlight that for \( 41/208 \) benchmarks, even Unary outperforms Native-Ints. As expected from the discussion throughout the article, these are benchmarks typically involving complex operations but not involving overly large numbers.
We can now answer Q1. First, Unary-p consistently outperforms Unary across all benchmarks. Second, Unary-p outperforms Native-Ints on hard-to-solve problems and can solve problems that Native-Ints cannot solve, e.g., Unary-p solved 28/46 invariant problems that Sketch could not solve with either encoding. Unary-p and Native-Ints have similar performance on easy problems.
Comparison to full SMT encoding. For completeness, we also compare our approach to a tool that uses SMT solvers to model the entire synthesis problem. We choose the state-of-the-art SMT-based synthesizer Rosette [26] (version 3.0 with Z3 v4.8.7) for our comparison. Rosette is a programming language that encodes verification and synthesis constraints written in a domain-specific language into SMT formulas that can be solved using SMT solvers.
We only run Rosette on the Polynomials set because, while Rosette supports the theory of integers, it does not have native support for loops, so there is no direct way to encode the Invariants and Program Repair benchmarks. To our knowledge, Rosette provides a way to specify the number \( k \) it uses to model integers and reals as \( k \)-bit words, but the user has no control over how many bits are used specifically for the unknown holes. We therefore evaluate 27 instead of 81 variants of the polynomial synthesis problem on Rosette, i.e., we do not vary the number of cbits. We run Rosette with the theory of integers (though Rosette supports many SMT theories, including reals) in our experiments.
Figure 9 shows the running times (log scale) for Rosette and Native-Ints with
Fig. 9. Rosette vs. Native-Ints.
Fig. 10. Performance of Unary, Native-Ints, and Unary-p.
We also consider the unsound encoding in which the holes and numbers are encoded as bit-vectors (16 bits) and directly encode the polynomial problems in SMT using quantifiers (to encode that the polynomial should be correct on all inputs) and the theory of bit-vectors. Z3 (v4.8.7) can efficiently solve all these instances (<1 second per instance). However, it often returns solutions that are correct in bit-vector arithmetic but not in integer arithmetic (due to overflow).
Finally, we tried applying our prime-based technique to Rosette (with integer variables), and the technique is not beneficial and causes all benchmarks to timeout.
To summarize, (i) SMT solvers cannot efficiently handle the synthesis problems considered in this article when considering the theory of integers but can efficiently solve (though at times unsoundly) some of the problems using the theory of bit-vectors, and (ii) our technique is better suited for the Sketch unary encoding of integers than for the encodings used for integers in SMT solvers.
The results leave open whether it is possible to build an SMT-based approach for solving program sketches that can benefit from the techniques presented in this article. This direction is beyond the scope of this article, as our focus is on addressing the existing limitations of the Sketch solver.
8.3 Performance of Incremental Solving
Our implementation of the incremental solver Unary-p-inc first attempts to find a solution with the prime set \( \mathbb {P}=\lbrace 2,3,5,7\rbrace \). If the solver returns a correct solution, then Unary-p-inc terminates. Otherwise, Unary-p-inc incrementally adds the next prime to \( \mathbb {P} \) until it finds a correct solution, proves that there is no solution, or times out. Unary-p-inc is 25.2% (geometric mean) slower than Unary-p (Figure 11 (log scale)). However, Unary-p-inc can solve three benchmarks for which both Unary-p and Native-Ints timed out. To answer Q2, Unary-p-inc trades a modest average slowdown for the ability to solve more benchmarks than Unary-p.
Fig. 11. Unary-p-inc vs. Unary-p.
8.4 Varying the Prime Number Set ℙ
In this experiment, we evaluate how different prime number sets affect the performance and correctness of Unary-p.
We consider five increasing sets of primes: \( \mathbb {P}_5=\lbrace 2,3,5\rbrace \), \( \mathbb {P}_7=\lbrace 2,3,5,7\rbrace \), \( \mathbb {P}_{11}=\lbrace 2,3,5,7,11\rbrace \), \( \mathbb {P}_{13}=\lbrace 2,3,5,7,11,13\rbrace \), and \( \mathbb {P}_{17}=\lbrace 2,3,5,7,11,13,17\rbrace \). Figure 12(a) (log scale) shows the running times for all the polynomial benchmarks with each of these prime sets.
Fig. 12. Performance for different sets of prime numbers.
In terms of correctness, we find that smaller prime sets often yield incorrect solutions (\( \mathbb {P}_5 \) (37% correct), \( \mathbb {P}_7 \) (70%), \( \mathbb {P}_{11} \) (86%), \( \mathbb {P}_{13} \) (97%), and \( \mathbb {P}_{17} \) (100%)), because fewer primes provide less discriminative power and the solutions may overfit to the smaller set of intermediate values. It is interesting to note that even prime sets of intermediate size often lead to correct solutions in practice, which explains some of the speedups observed in the incremental synthesis algorithm. To answer Q4, Unary-p is able to synthesize correct solutions even with intermediate-sized sets of primes.
Changing Magnitude of Primes. We also evaluate the performance of Unary-p when using primes of different magnitudes. We consider the sets of primes \( \lbrace 11,17,19,23\rbrace \), \( \lbrace 31,41,47\rbrace \), and \( \lbrace 251,263\rbrace \), which define similar integer ranges, but pose different tradeoffs between the number of used primes and their sizes, e.g., the set \( \lbrace 251,263\rbrace \) only uses two very large primes. Since the different sets cover similar integer ranges, they all produce correct solutions. Figure 12(b) (log scale) shows the running time of Unary-p for the same benchmarks as Figure 12(a). Larger prime sets of smaller prime values require less time to solve than smaller prime sets of larger prime values. This result is expected, since, in the unary encoding of numbers, representing larger numbers requires more bits.
8.5 Size of SAT Formulas
In this experiment, we compare the sizes of the intermediate SAT formulas generated by Unary-p and Unary. Figure 13(a) shows a scatter plot (log scale) of the number of clauses of the largest intermediate SAT query generated by the CEGIS algorithm for the two techniques. We only plot the instances in which Unary was able to produce at least one SAT formula. Unary produces SAT formulas that are on average 19.3\( \times \) larger than those produced by Unary-p. To answer Q5, as predicted by our theory, Unary-p produces significantly smaller SAT queries than Unary.
Fig. 13. SAT formulas sizes and performance.
Performance vs. Size of SAT Queries. We also evaluate the correlation between synthesis time and size of SAT queries. Figure 13(b) plots the synthesis times of both solvers against the sizes of the SAT queries. It is clear that the synthesis time increases with larger SAT queries. The plot illustrates how the solving time strongly depends on the size of the generated formulas.
9 RELATED WORK
Program Sketching. Program sketching was designed to automatically synthesize efficient bit-vector manipulations from inefficient iterative implementations [24]. A program sketch is a program template where a finite number of constants (holes) are missing and the goal of the synthesizer is to find values for such holes that can make a set of assertions hold. While sketching is limited to synthesizing constants, it is easy to encode problems that involve synthesizing a finite set of expressions by using the holes as guards of if-then-else operators. The Sketch tool can in fact support complex language features and operations [22]. Thanks to its simplicity, sketching has found wide adoption in applications such as optimizing database queries [3], automated feedback generation [21], program repair [7], and many others. Our work further extends the capabilities of Sketch in a new direction by leveraging number theory results. In particular, our technique allows Sketch to handle sketches manipulating large integer numbers. To the best of our knowledge, our technique is the first one that can solve many of the benchmarks presented in this article.
Uses of Chinese Remainder Theorem. The Chinese Remainder Theorem and its derivative corollaries have found wide application in several branches of Computer Science and, in particular, in Cryptography [11, 29].
The idea of using modular arithmetic to abstract integer values has been used in program analysis. Since modular fields are finite, they can be used as an abstract domain for verifying programs manipulating integers [5, 16, 19], e.g., the abstract domain can track whether a number is even or odd. Our work extends this idea to the domain of program synthesis and requires us to solve several challenges. First, when used for verifying programs, the modular abstraction is used to overapproximate the set of possible values of the program and does not need to be precise. In particular, Clark et al. [5] allow program operations that are in the
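To illustrate the modular-abstraction idea (the example and the helper name `track_mod` are ours, not from the cited work): because \( x \mapsto x \bmod p \) is a ring homomorphism for addition, subtraction, and multiplication, a computation can be replayed on residues alone, and questions such as divisibility are answered exactly without ever materializing the full value.

```python
# Tracking a straight-line integer computation only modulo a small prime p.
# Since x -> x mod p commutes with +, -, and *, the residue of the final
# result is computed exactly, even if the concrete value would be huge.

def track_mod(p, ops, start):
    """Replay a sequence of (op, constant) steps, keeping only residues mod p."""
    r = start % p
    for op, k in ops:
        if op == "add":
            r = (r + k) % p
        elif op == "mul":
            r = (r * k) % p
    return r

# Is 7 * 123456789 + 21 divisible by 7? Decide it on residues mod 7 alone.
residue = track_mod(7, [("mul", 123456789), ("add", 21)], start=7)
assert residue == 0  # divisible by 7, without computing the full product
```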
Pruning Spaces in Program Synthesis. One of the key challenges in program synthesis is to tackle the large search space of possible programs [14] and many techniques have been proposed to efficiently prune large search spaces. Enumerative techniques [27] and version-space algebra synthesis techniques [12, 13, 20] enumerate programs in a search space and avoid enumerating syntactically and semantically equivalent terms.
Some synthesizers such as Synquid [18] and Morpheus [10] use refinement types and first-order formulas over specifications of DSL constructs to refute inconsistent programs. Recently, Wang et al. [28] proposed a technique based on abstraction refinement for iteratively refining abstractions to construct synthesis problems of increasing complexity for incremental search over a large space of programs.
We note that our work differs from these approaches because it targets program sketching, where the goal is to identify constant values, rather than arbitrary program synthesis. While sketching can be used to encode a number of synthesis problems, it limits the types of search spaces one can express and the ability to syntactically prune search spaces. We believe our technique could be used in tandem with existing program synthesis approaches to synthesize programs containing large constants, but this extension is beyond the scope of this article. In terms of technical differences, instead of pruning programs in the syntactic space as the aforementioned techniques do, our technique uses modular arithmetic to prune the semantic space (i.e., it reduces the complexity of verifying the correctness of the synthesized solution), while maintaining the syntactic space of programs. Our approach is related to that of Tiwari et al. [25], who present a technique for component-based synthesis using dual semantics, where syntactic symbols in a language are given two different semantics to capture different requirements. Our technique is similar in the sense that we also provide an additional semantics based on modular arithmetic. However, we formalize our analysis using number theory results and develop it in the context of general-purpose Sketch programs that manipulate integer values, unlike Tiwari et al.'s work, which is developed for straight-line programs composed of components.
Synthesis for Large Integer Values. Abate et al. propose Cegis(T), a modification of the Cegis algorithm for solving SyGuS problems with large constants [1]. Cegis(T) uses two nested loops. In an outer loop, it enumerates “template programs” that contain holes for the constants, while in an inner loop it relies on an SMT solver to find the valuations for these constants. The main novelty is that the inner loop can use techniques such as quantifier elimination to discard large sets of constants that cannot fill the holes and potentially prove that no constants exist.
Our approach is orthogonal to that of Abate et al. First, our work focuses on program sketching and not SyGuS. While the two ways of specifying synthesis problems are related, they are fundamentally different. Sketching limits the user to templates in which only a finite number of constants are missing and requires bounding many quantities (e.g., loop iterations, input sizes), but it allows one to use many complex programming constructs while retaining practical synthesis. SyGuS allows the user to specify grammars of programs (which need to consist of terms in a given theory) from which the final program can be synthesized. Furthermore, existing SyGuS solvers are typically designed for specific theories (e.g., LIA). Many of the examples described in this article cannot directly be encoded in SyGuS, as they use complex control constructs (e.g., encoding loops would require explicitly unrolling the program) and operations, such as non-linear integer arithmetic, that cannot be handled by existing SyGuS solvers. Dually, one cannot encode the infinite search spaces described by a SyGuS grammar as a program sketch, making the two formalisms complementary. Both our approach and the one by Abate et al. were designed to address the inability of existing solvers to synthesize large constants, and they have different uses. Second, for Cegis(T) to work, the constants that need to be synthesized have to appear in specific places in the template, e.g., they cannot be the coefficients of linear terms. For our technique to work, the constants can appear as coefficients but cannot be manipulated using certain operators (e.g., division). In principle, the two approaches could be combined, and the Cegis(T) algorithm presented by Abate et al. could potentially use our technique to fill the holes in the templates it synthesizes. This possible extension is beyond the scope of this article.
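The restriction on operators such as division can be seen concretely: taking remainders commutes with addition, subtraction, and multiplication (a ring homomorphism), but not with integer division, so a sketch that divides tracked values cannot be rewritten to operate on residues alone. A minimal check of this fact:

```python
# Why some operators block the modular rewriting: x -> x mod p is a ring
# homomorphism for +, -, and *, but integer division is not determined by
# residues, so divided values cannot be tracked modulo primes.

p = 11
a, b = 100, 7

# Addition and multiplication can be replayed purely on residues:
assert (a + b) % p == ((a % p) + (b % p)) % p
assert (a * b) % p == ((a % p) * (b % p)) % p

# Integer division cannot: residues alone lose the quotient.
# (100 // 7) % 11 == 3, but (100 % 11) // (7 % 11) == 1 // 7 == 0.
assert (a // b) % p != ((a % p) // (b % p)) % p
```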
10 CONCLUSION
We presented a new technique for solving program sketches with large integer values. Our technique rewrites the sketches to operate under a different semantics that only manipulates small values. In particular, instead of tracking concrete integer values, the rewritten sketches only track the remainders of such values with respect to an appropriate set of prime numbers. Using the Chinese Remainder Theorem, we showed that our technique is sound and yields correct solutions as long as the values manipulated by the program never exceed certain well-defined boundaries. We provided a dataflow analysis for detecting when program variables and values can be modeled using this modified semantics and implemented our technique in Unary-p. The evaluation of our technique on 181 benchmarks that manipulate large integer values shows that our solver can solve 100 new benchmarks that the Sketch unary solver could not solve and is 15.9\( \times \) faster than the Sketch SMT-like integer solver on the hard benchmarks that take more than 10 seconds to solve.
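The reconstruction guaranteed by the Chinese Remainder Theorem can be sketched in a few lines. The following is a textbook CRT implementation for illustration, not the article's code: the residues of a value with respect to pairwise-coprime primes determine it uniquely, as long as it stays below the product of the primes (the "well-defined boundaries" above).

```python
# Minimal CRT reconstruction (textbook algorithm, illustrative only):
# the residues x mod p for pairwise-coprime primes p determine x uniquely
# whenever 0 <= x < product of the primes.
from math import prod

def crt(residues, primes):
    """Reconstruct x from the residues x mod p via the Chinese Remainder Theorem."""
    N = prod(primes)
    x = 0
    for r, p in zip(residues, primes):
        n = N // p
        # pow(n, -1, p) is the modular inverse of n modulo p (Python 3.8+)
        x = (x + r * n * pow(n, -1, p)) % N
    return x

primes = [2, 3, 5, 7, 11]                # product = 2310: values < 2310 are safe
secret = 1234
residues = [secret % p for p in primes]  # all that the rewritten sketch tracks
assert crt(residues, primes) == secret
```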
This article opens new opportunities for encoding constraint problems involving large integer values. The proposed techniques are general and could be applied to domains beyond program sketching. We leave investigating their effectiveness on other domains (e.g., SMT solvers) as future work.
ACKNOWLEDGMENTS
We thank the anonymous reviewers for commenting on earlier drafts and Rong Pan for his contributions to this article.
Footnotes
1 In Sketch, holes can only assume positive values. This is why we need the sign holes, which are implemented using regular holes as follows: \( \texttt {if(??) then 1 else -1} \).
2 Our implementation also supports for-loops, recursion, C-style arrays (i.e., non-infinite arrays), and complex types.
3 We consider the simple subset for a clear presentation of the semantics, but our framework works for the full IMP language (and for more complex language constructs) as we will see in the later sections.
4 Sketch does not currently implement a binary encoding, but we include this result for completeness and to show that applying our technique to a binary encoding is not as beneficial.
5 In the short version of this article published at ESOP'20 [17], this version of Sketch was denoted with the name Binary. Since the encoding used by Native-Ints does not operate directly on a binary representation of numbers, we decided to use the name of the flag instead.
6 During our experiments, we observed that Native-Ints incorrectly reported UNSAT for 10 satisfiable benchmarks. We reported these benchmarks as timeouts and have contacted the authors of Sketch to address the issue.
- [1] 2018. Counterexample guided inductive synthesis modulo theories. In Proceedings of the International Conference on Computer-Aided Verification (CAV'18), Lecture Notes in Computer Science. Springer.
- [2] 2013. Syntax-guided synthesis. In Proceedings of Formal Methods in Computer-Aided Design (FMCAD'13). 1–8.
- [3] 2013. Optimizing database-backed applications with query synthesis. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'13). 3–14.
- [4] (Ed.). 2009. The Chinese Remainder Theorem. Springer, New York, NY, 253–281.
- [5] 1994. Model checking and abstraction. ACM Trans. Program. Lang. Syst. 16, 5 (Sept. 1994), 1512–1542.
- [6] 1977. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Proceedings of the 4th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL'77). ACM, New York, NY, 238–252.
- [7] 2016. Qlose: Program repair with quantitative objectives. In Proceedings of the International Conference on Computer-Aided Verification (CAV'16), Lecture Notes in Computer Science, Vol. 9780. Springer, 383–401.
- [8] 2016. Polynomial invariants by linear algebra. In Automated Technology for Verification and Analysis. Springer International Publishing, Cham, 479–494.
- [9] 2016. Estimates of \( \psi \), \( \vartheta \) for large values of x without the Riemann hypothesis. Math. Comput. 85, 298 (2016), 875–888.
- [10] 2017. Component-based synthesis of table consolidation and transformation tasks from examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'17). ACM, New York, NY, 422–436.
- [11] 2000. The Chinese remainder theorem and its application in a high-speed RSA crypto chip. In Proceedings of the 16th Annual Computer Security Applications Conference (ACSAC'00). IEEE Computer Society, Washington, DC, 384–.
- [12] 2011. Automating string processing in spreadsheets using input-output examples. In Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'11). 317–330.
- [13] 2012. Spreadsheet data manipulation using examples. Commun. ACM 55, 8 (2012), 97–105.
- [14] 2017. Program synthesis. Found. Trends Program. Lang. 4, 1–2 (2017), 1–119.
- [15] 2003. The Prime Number Theorem. Cambridge University Press.
- [16] 1995. Residue BDD and its application to the verification of arithmetic circuits. In Proceedings of the 32nd Design Automation Conference. 542–545.
- [17] 2020. Solving program sketches with large integer values. In Proceedings of the 29th European Symposium on Programming (ESOP'20), Lecture Notes in Computer Science, Vol. 12075. Springer, 572–598.
- [18] 2016. Program synthesis from polymorphic refinement types. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'16). 522–538.
- [19] 1996. Modular verification of multipliers. In Proceedings of the 1st International Conference on Formal Methods in Computer-Aided Design (FMCAD'96). Springer-Verlag, Berlin, 49–63.
- [20] 2016. Transforming spreadsheet data types using examples. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'16). 343–356.
- [21] 2013. Automated feedback generation for introductory programming assignments. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'13). 15–26.
- [22] 2013. Program sketching. Int. J. Softw. Tools Technol. Transf. 15, 5–6 (2013), 475–495.
- [23] 2008. Sketching concurrent data structures. In Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation. 136–148.
- [24] 2006. Combinatorial sketching for finite programs. SIGOPS Operat. Syst. Rev. 40, 5 (Oct. 2006), 404–415.
- [25] 2015. Program synthesis using dual interpretation. In Proceedings of the 25th International Conference on Automated Deduction (CADE'15). 482–497.
- [26] 2014. A lightweight symbolic virtual machine for solver-aided host languages. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'14). ACM, New York, NY, 530–541.
- [27] 2013. TRANSIT: Specifying protocols with concolic snippets. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'13). 287–296.
- [28] 2018. Program synthesis using abstraction refinement. Proc. ACM Program. Lang. 2 (2018), 63:1–63:30.
- [29] 2003. RSA speedup with Chinese remainder theorem immune against hardware fault cryptanalysis. IEEE Trans. Comput. 52, 4 (April 2003), 461–472.