Optimal Program Synthesis via Abstract Interpretation

We consider the problem of synthesizing programs with numerical constants that optimize a quantitative objective, such as accuracy, over a set of input-output examples. We propose a general framework for optimal synthesis of such programs in a given domain-specific language (DSL), with provable optimality guarantees. Our framework enumerates programs in a general search graph, where nodes represent subsets of concrete programs. To improve scalability, it uses A* search in conjunction with a search heuristic based on abstract interpretation; intuitively, this heuristic establishes upper bounds on the value of subtrees in the search graph, enabling the synthesizer to identify and prune subtrees that are provably suboptimal. In addition, we propose a natural strategy for constructing abstract transformers for monotonic semantics, which is a common property for components in DSLs for data classification. Finally, we implement our approach in the context of two such existing DSLs, demonstrating that our algorithm is more scalable than existing optimal synthesizers.


INTRODUCTION
Due to their interpretability, robustness, and data-efficiency, there is recent interest in synthesizing programs to solve data processing and querying tasks, including handling semi-structured and unstructured data such as images and natural language text. Examples include neurosymbolic programs that incorporate deep neural network (DNN) components to extract semantic information from raw data [Chen et al. 2021; Shah et al. 2020], as well as fuzzy matching programs that use predicates with quantitative semantics to approximately match real-valued data [Mell et al. 2023]. For instance, Shah et al. [2020] synthesizes programs that label sequence data, Mell et al. [2023] synthesizes queries over trajectories output by an object tracker, and Chen et al. [2021] synthesizes web question answering programs. Most work focuses on programming by example (PBE), where the user provides a set of input-output (IO) examples, and the goal is to synthesize a program that generates the correct output for each input. There are two key properties that distinguish synthesis of such programs from traditional PBE:
• Quantitative objectives: The goal in neurosymbolic synthesis is typically to optimize a quantitative objective such as accuracy or F1 score rather than to identify a program that is correct on all examples (which may be impossible).
• Numerical constants: Programs operating on fuzzy real-world data or the outputs of DNN components typically include real-valued constants that serve as thresholds; for example, when querying video trajectories, one constant may be a threshold on the maximum velocity of the object.
While these properties occasionally arise in traditional PBE settings (e.g., minimizing resource consumption), they are fundamental issues in neurosymbolic synthesis. Furthermore, in the neurosymbolic setting, there is often additional structure that can be exploited to improve synthesis performance; for instance, some of the numerical components might be monotone in their inputs.
Most existing systems focus on synthesizing programs in a particular domain-specific language (DSL). In these settings, prior work has leveraged monotonicity of the semantics to prune the search space [Chen et al. 2021; Mell et al. 2023]. One general framework is Shah et al. [2020], which uses neural relaxations to guide search over a general DSL. At a high level, they use A* search to enumerate partial programs in the DSL, which are represented as a directed acyclic graph (DAG). In general, A* search prioritizes the order in which to enumerate partial programs based on a score function (called a heuristic) that maps each partial program to a real-valued score. When the heuristic is admissible (i.e., its output is an upper bound on the objective value for any completion of that partial program, assuming the goal is to maximize the objective), then A* search is guaranteed to find the optimal program if it terminates. Shah et al. [2020] proposes the following heuristic: fill each hole in the partial program with an untrained DNN, and then maximize the quantitative objective as a function of the DNN parameters, using the resulting objective value as the heuristic. However, this score function is only guaranteed to be admissible under assumptions that typically do not hold in practice: (i) the neural relaxations are sufficiently expressive to represent any program in the DSL, which requires very large DNNs, and (ii) maximization over the DNN parameters converges to the global optimum, which does not hold for typical strategies such as stochastic gradient descent (SGD). Furthermore, SGD cannot handle non-differentiable objectives, which include common objectives such as accuracy and F1 score.
Thus, a natural question is whether we can construct practical heuristics that are guaranteed to be admissible. In this work, we take inspiration from deduction-guided synthesis, which uses automated reasoning techniques such as SMT solvers [Bornholt et al. 2016; Gulwani et al. 2011; Solar-Lezama et al. 2006] or abstract interpretation [Cousot and Cousot 1977] to prune partial programs from the search space, i.e., prove that no completion of a given partial program can satisfy the given IO examples. In particular, we propose using abstract interpretation to construct heuristics for synthesis with quantitative objectives. Traditionally, abstract interpretation can be used to prune partial programs by replacing each hole with an abstract value overapproximating all possible concrete values that can be taken by that hole in the context of a given input. Then, if the abstract output does not include the corresponding concrete output, that partial program cannot possibly be completed into a program that satisfies that IO example, so it can be pruned.
Our key insight is that abstract interpretation can similarly be used to construct an admissible heuristic for a quantitative objective. Essentially, we can use abstract interpretation to overapproximate the possible objective values obtained by any completion of a given partial program; then, the supremum of concrete values represented by the abstract output serves as an upper bound of the objective, so it can be used as an admissible heuristic. Thus, given abstract transformers for the DSL components and for the quantitative objective, our framework can synthesize optimal programs.
In addition, we propose general strategies for constructing abstract domains and transformers for common DSLs and objectives. As discussed above, many DSLs have monotone components.

Fig. 1. A frame from a video of two mice interacting [Sun et al. 2021]; the mice are very close together, and are exhibiting the "sniff" behavior. The video has been processed using deep neural networks to produce certain keypoints, which are shown.

In
these settings, a natural choice of abstract domain is to use intervals for the real-valued constants; then, a natural abstract transformer is to evaluate the concrete semantics on the upper and lower bounds of the intervals.This strategy can straightforwardly be shown to correctly overapproximate the concrete semantics.
We implement our approach in the context of two DSLs, namely, the NEAR [Shah et al. 2020] DSL for the CRIM13 [Burgos-Artizzu et al. 2012] dataset, and the Quivr [Mell et al. 2023] DSL and benchmark. In our experiments, we demonstrate that our approach significantly outperforms an adaptation of Metasketches [Bornholt et al. 2016] (an existing optimal synthesis framework based on SMT solving) to our setting, as well as an ablation that uses breadth-first search instead of A* search. Our approach significantly outperforms both of these baselines in terms of running time. In summary, our contributions are:
• We propose a novel algorithm for optimal synthesis which performs enumerative search over a space of generalized partial programs. To prioritize search, it uses the A* algorithm with a search heuristic based on abstract interpretation. If it returns a program, then that program is guaranteed to be optimal (Section 3).
• In practice, many DSLs have types that are equipped with a partial order, such as the real numbers or Booleans. For these types, we propose to use intervals as the abstract domains. For monotone DSL components (i.e., those whose concrete semantics respects the partial orders), a natural choice of abstract transformer is to simply apply the concrete semantics to the lower and upper bounds of the interval (Section 4).
• We implement our framework in the context of two existing DSLs (Section 5) and show in our experiments that it outperforms Metasketches [Bornholt et al. 2016], a state-of-the-art optimal synthesis technique based on SMT solvers, and a baseline that uses breadth-first search instead of our search heuristic (Section 6).

MOTIVATING EXAMPLE
We consider a task where the goal is to synthesize a program for predicting the behavior of mice based on a video of them interacting [Shah et al. 2020], motivated by a data analysis problem in biology. In particular, biologists use mice as model animals both to investigate basic biological processes and to develop new therapeutic interventions, which sometimes requires determining the effect of an intervention on mouse behavioral patterns, such as the nature and duration of interactions with other mice. For example, Figure 1 depicts two mice in an enclosure, engaging in the "sniff" behavior.
This behavior analysis typically involves researchers viewing and manually annotating these behaviors in hours of video, which is very labor intensive. As a result, program synthesis has been applied to automate this task [Shah et al. 2020]. Their approach first uses an object tracker to track each mouse across frames in the video, producing trajectories represented as a sequence of 2D positions for each mouse. Then, they featurize each step in the trajectory; for instance, if there are two mice in the video, then one feature might be the distance between them in each frame. Based on this sequence of features, the goal is to predict a label for the behavior (if any) that the mice are engaged in during each frame (producing a sequence of labels, one for each frame). Shah et al. [2020] solve this problem by synthesizing a program in a functional DSL for processing trajectories (the NEAR DSL). In summary, the goal is to synthesize a program that takes as input a sequence of feature vectors and outputs a sequence of labels. We consider the programming by example (PBE) setting, where we are given a number of human-annotated examples, and the goal is to synthesize a program that maximizes some objective, such as accuracy or F1 score.
For example, consider synthesizing a program that, given a featurized trajectory representing two mice in a video, outputs the behavior of the mice at each time step. In particular, the input is a trajectory x ∈ ℝ* (where S* denotes lists of elements of S), where the feature x[i] ∈ ℝ on time step i encodes the distance between the two mice at that time step, and the output is y ∈ {f, t}*, where y[i] encodes whether the mice are engaging in the "sniff" behavior (y[i] = t if so, and y[i] = f if not) at time step i.
We consider synthesizing such a program based on a single training example x₁ = (101, 65) and y₁ = (f, t) (i.e., the first frame has mouse distance 101 and is labeled "not sniff" and the second frame has mouse distance 65 and is labeled "sniff"). Our goal is to find some program that classifies a dataset of videos well; i.e., if we evaluate the program on the videos to get predicted labels, and then compute a classification metric (e.g., accuracy) between the predicted and true labels, the discrepancy should be small. Given a program π, its accuracy is the fraction of time steps i on which the predicted label ⟦π⟧(x₁)[i] matches the true label y₁[i] (where s[i] is the i-th item in a sequence s).
Consider the candidate program "map(x ≤ 50)". In traditional syntax, this program would be written with an explicit lambda abstraction, roughly "map (λx. x ≤ 50)", but here the input is specified separately and the λ is omitted (in combinator style, similar to the NEAR DSL). This program performs a map over the sequence of frames, and for each one, it outputs whether the mouse distance in that frame is less than or equal to 50. For example, when evaluated on x₁ = (101, 65), it outputs (f, f). Thus, its accuracy is 1/2, since it correctly labels the first frame but not the second.
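This evaluation can be sketched in a few lines of Python (the names below are illustrative, not the authors' implementation):

```python
# Illustrative sketch of evaluating the candidate program map(x <= 50)
# on the single training example and computing its accuracy.

def make_program(threshold):
    """Concrete semantics of map(x <= threshold): one Boolean per frame."""
    return lambda xs: tuple(x <= threshold for x in xs)

def accuracy(program, xs, ys):
    """Fraction of frames whose predicted label matches the true label."""
    preds = program(xs)
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)

x1 = (101, 65)       # mouse distances in the two frames
y1 = (False, True)   # f = "not sniff", t = "sniff"

p = make_program(50)
print(p(x1))                # (False, False): both frames predicted "not sniff"
print(accuracy(p, x1, y1))  # 0.5: first frame correct, second frame wrong
```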
One strategy for computing the optimal program (i.e., the one that maximizes the objective) is to enumerate partial programs (i.e., programs with holes representing pieces that need to be filled to obtain a concrete program) in the DSL, evaluate the objective on every concrete program, and then choose the best one. There are several challenges to this approach:
• Unlike traditional synthesis, where we can stop enumerating when we reach a concrete program that satisfies the given specification, for optimal synthesis, we need to enumerate all programs or risk returning a suboptimal program.
• Traditional synthesizers use a variety of techniques to prune the search space to improve scalability. For instance, they might use deduction to prove that no completion of a partial program can satisfy the specification, i.e., no matter how the holes in the partial program are filled, the specification will not hold. However, these techniques are not directly applicable to optimal synthesis.
• Synthesizing real-valued constants poses a problem: one approach is to discretize the constants, enumerate all of them, and choose the best, but this approach can be prohibitively slow. For example, suppose we are enumerating completions of the partial program map(x ≤ ??), where ?? is a hole that needs to be filled with a real value c ∈ [0, 100] to obtain a concrete program. If we discretize c ∈ {0, 1, ..., 100}, then we would enumerate map(x ≤ 0), ..., map(x ≤ 100), evaluate each of these on (x₁, y₁) to measure its accuracy, and choose the program with the highest accuracy.
Our framework uses two key innovations to address these challenges:
• Generalized partial programs: Our framework takes the traditional notion of partial programs, representing sets of concrete programs as completions of syntax with holes, and extends it to more general sets of programs, equipped with a directed acyclic graph (DAG) structure. This allows us to avoid discretization of real-valued constants and reduces the branching factor of the search space.
• A* search: Rather than enumerate programs in an arbitrary order (e.g., breadth-first search), our framework uses A* search to enumerate programs. This allows us to avoid considering all programs in the search space, similar to deductive pruning, while still returning the optimal program.
We describe each of these techniques in more detail below.
Generalized Partial Programs. Traditionally, the search space over partial programs is a DAG, where the nodes are partial programs, and there is an edge π̂ → π̂′ if π̂′ can be obtained by filling a hole in π̂ using some production in the DSL. For instance, there is an edge map(x ≤ ??) → map(x ≤ 50) in this DAG, since we have filled the hole with the value 50. However, even if we discretize the search space, there are 101 ways to fill this hole. As a consequence, if even a single completion of map(x ≤ ??) is valid, then we cannot prune it from the search space (even ignoring that we want the optimal program rather than just any valid program).
Instead, in our framework, we allow search DAGs beyond just programs with holes, so long as each node represents a set of concrete programs and the node's children collectively represent that set. As a practical instantiation of the general framework, we consider partial programs where holes for real-valued constants may be annotated with constraints on the value that can be used to fill them. For instance, the generalized partial program map(x ≤ ??_[50,100]) represents the concrete programs map(x ≤ c) for c ∈ [50, 100]. Then, the children of this generalized partial program in the search DAG should split the constraint in a way that covers the search space, e.g., children(map(x ≤ ??_[50,100])) = {map(x ≤ ??_[50,75]), map(x ≤ ??_[75,100])}.
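The splitting of an interval constraint into covering children can be sketched as follows (bisection is one illustrative choice of split; any covering split satisfies the requirement):

```python
# Illustrative sketch: the children of a generalized partial program
# map(x <= ??_[lo,hi]) bisect the interval constraint, so that together
# they cover exactly the parent's set of concrete programs.

def children(interval):
    lo, hi = interval
    mid = (lo + hi) / 2
    return [(lo, mid), (mid, hi)]

print(children((50, 100)))  # [(50, 75.0), (75.0, 100)]

# Coverage check: the children's endpoints meet, and span the parent.
(a, b), (c, d) = children((50, 100))
assert a == 50 and d == 100 and b == c
```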
This strategy presents more opportunities for pruning the search space. For instance, even if we cannot prune the program map(x ≤ ??_[50,100]), we may be able to prune the program map(x ≤ ??_[50,75]). Then, rather than needing to enumerate 51 programs (i.e., one for each c ∈ {50, ..., 100}), we would only need to prune map(x ≤ ??_[50,75]) and evaluate 26 programs (i.e., one for each c ∈ {75, ..., 100}). Of course, we can further subdivide the search space to further reduce enumeration.
A* Search. Next, we describe how we achieve the optimal synthesis analogue of pruning by using A* search. At a high level, A* search enumerates nodes in a search graph according to a heuristic; for our purposes, a heuristic is a function that maps a partial program π̂ to a real value h(π̂), and a heuristic is said to be admissible if its value is an upper bound on the best possible objective value for any completion of π̂, i.e., π ∈ completions(π̂) ⇒ h(π̂) ≥ objective(π).
The heuristic adapts deductive reasoning to optimal synthesis: whereas deductive reasoning guarantees that no completion of π̂ can satisfy the given specification, the heuristic guarantees that no completion of π̂ can achieve objective value greater than h(π̂); e.g., if we find a concrete program with objective value ≥ h(π̂), we can safely prune completions of π̂ from the search DAG.
While A* search has previously been used in synthesis for quantitative objectives [Shah et al. 2020], the heuristics used are not admissible and so do not provide theoretical guarantees. Our key contribution is showing that abstract interpretation can naturally be adapted to design admissible heuristics. Abstract interpretation can be used for traditional deductive reasoning as follows: fill each hole in the current partial program with an abstract value ⊤, evaluate the partial program using abstract semantics, and check if the abstract output is consistent with the specification.
In our example, a natural choice of abstract domain for constants is the interval domain. In addition, rather than fill each hole with ⊤, if a hole has a constraint ??_[a,b], we can instead fill it with the interval [a, b]. For instance, for our example program map(x ≤ ??_[50,75]), we can fill the hole with the interval [50, 75] to obtain map(x ≤ [50, 75]). Then, the abstract semantics ⟦·⟧# evaluates as follows:
⟦map(x ≤ [50, 75])⟧#(x₁) = (f, ⊤).    (1)
In other words, the first element is f since x₁ = (101, 65), and we know 101 ≰ c for any c ∈ [50, 75], and the second element is ⊤ since the relationship between 65 and c ∈ [50, 75] can be either f or t.
Importantly, here the abstract values are over holes in the program rather than over the input to it. Traditionally, we would then check whether this abstract output is consistent with the specification. For optimal synthesis, we observe that we can define an abstract semantics for the objective function. In our example, we can compute an "abstract accuracy" as follows (where 1[·] is an indicator function, and ≤, 1[·], +, ·, and = are abstracted in the obvious way):
(1/2) · (1[f = f] + 1[⊤ = t]) = (1/2) · ([1, 1] + [0, 1]) = [1/2, 1].    (2)
In other words, for the first frame, the concrete programs represented by map(x ≤ ??_[50,75]) all predict f, which equals y₁[1], so this frame is always correctly classified. In contrast, for the second frame, concrete programs may output either f or t, so we are uncertain whether this frame is correctly classified. Thus, the true accuracy is in the interval [1/2, 1]. Since abstract interpretation is guaranteed to overapproximate the semantics, we can use the upper bound of the abstract objective value as our heuristic; e.g., for map(x ≤ [50, 75]), this heuristic computes h = 1.
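The abstract semantics and abstract accuracy for this example can be sketched directly (an illustrative encoding, with ⊤ represented as a sentinel value):

```python
# Illustrative sketch of the abstract semantics in Equation (1) and the
# "abstract accuracy" in Equation (2): a hole constrained to [lo, hi] is
# filled with that interval, and each frame's abstract label is f, t, or TOP.

TOP = "TOP"  # sentinel for the abstract value "either f or t"

def abstract_map_leq(interval, xs):
    """Abstract semantics of map(x <= ??_[lo,hi]) on a feature sequence."""
    lo, hi = interval
    out = []
    for x in xs:
        if x <= lo:
            out.append(True)    # x <= c for every c in [lo, hi]
        elif x > hi:
            out.append(False)   # x > c for every c in [lo, hi]
        else:
            out.append(TOP)     # depends on the choice of c
    return tuple(out)

def abstract_accuracy(abs_preds, ys):
    """Interval of accuracies attainable by some concretization."""
    n = len(ys)
    lo = sum(1 for p, y in zip(abs_preds, ys) if p != TOP and p == y)
    hi = lo + sum(1 for p in abs_preds if p == TOP)
    return (lo / n, hi / n)

x1, y1 = (101, 65), (False, True)
preds = abstract_map_leq((50, 75), x1)
print(preds)                         # (False, 'TOP'), matching Equation (1)
print(abstract_accuracy(preds, y1))  # (0.5, 1.0), matching Equation (2)
```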
The abstract accuracies of π̂₁ = map(x ≤ ??_[0,50]) and π̂₂ = map(x ≤ ??_[50,100]) are [1/2, 1/2] and [1/2, 1], so their heuristic values are h₁ = 1/2 and h₂ = 1, respectively. Since h₂ > h₁, our algorithm explores π̂₂ next, splitting it into π̂₃ = map(x ≤ ??_[50,75]) and π̂₄ = map(x ≤ ??_[75,100]), which have abstract accuracies of [1/2, 1] and [1, 1], respectively, and heuristic values of h₃ = 1 and h₄ = 1, respectively. In this example, the lower bound on the accuracy of π̂₄ is also 1, so we know that any choice c ∈ [75, 100] for filling this hole is guaranteed to achieve an accuracy of 1; thus, any concrete program map(x ≤ c) such that c ∈ [75, 100] is optimal, and our algorithm can terminate without ever considering π̂₁ or π̂₃. In general, our algorithm terminates once the range of possible optimal values is sufficiently small. For each node on the search frontier, we have an upper and lower bound on the objective value of all the programs it represents. The greatest of these lower bounds provides a lower bound on the best possible objective value, and the greatest of these upper bounds provides an upper bound on the best possible objective value. As a result, once the difference between these bounds is at most ε, we know that we have a program within ε of being optimal.
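The whole walkthrough can be reproduced end to end by a small A*-style loop over interval constraints; the sketch below is illustrative (bisection split, the single training example above) rather than the paper's implementation:

```python
# Illustrative A*-style search over interval constraints on the hole in
# map(x <= ??): nodes are prioritized by the upper bound of their abstract
# accuracy, and search stops when the frontier's best lower and upper
# bounds are within eps of each other.
import heapq

TOP = "TOP"
x1, y1 = (101, 65), (False, True)   # the single training example

def abstract_accuracy(interval):
    """Lower/upper bound on accuracy over all thresholds c in [lo, hi]."""
    lo_c, hi_c = interval
    correct = unknown = 0
    for x, y in zip(x1, y1):
        if x <= lo_c:
            pred = True
        elif x > hi_c:
            pred = False
        else:
            pred = TOP
        if pred == TOP:
            unknown += 1
        elif pred == y:
            correct += 1
    n = len(y1)
    return correct / n, (correct + unknown) / n

def astar(root, eps=0.0, max_iters=1000):
    lb, ub = abstract_accuracy(root)
    heap = [(-ub, lb, root)]            # min-heap on -upper = max-heap on upper
    for _ in range(max_iters):
        best_ub = -heap[0][0]
        best = max(heap, key=lambda e: e[1])   # entry with best lower bound
        if best_ub - best[1] <= eps:    # frontier gap is small enough
            return best[2]
        _, _, (lo, hi) = heapq.heappop(heap)
        mid = (lo + hi) / 2
        for child in [(lo, mid), (mid, hi)]:
            clb, cub = abstract_accuracy(child)
            heapq.heappush(heap, (-cub, clb, child))
    raise RuntimeError("did not converge")

print(astar((0, 100)))  # (75.0, 100): every threshold in it attains accuracy 1
```

Note that, as in the narrative, the node [0, 50] is never split: its upper bound (1/2) never becomes the best on the frontier.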

OPTIMAL SYNTHESIS VIA ABSTRACT INTERPRETATION
In this section, we consider the program synthesis problem where: (i) programs in the domain-specific language may have real-valued constants, and (ii) the synthesis objective is real-valued, where the goal is to return the optimal program (Section 3.1). Then, we describe our synthesis algorithm for solving this problem, which uses A* search in conjunction with a search heuristic based on abstract interpretation (Section 3.2).

Problem Formulation
Domain-Specific Language. For concreteness, consider a DSL whose syntax is given by a context-free grammar G = (N, Σ, S, R), where N is the set of nonterminals, Σ is the set of terminals, S ∈ N is the start symbol, and R is the set of productions, which we assume have the form S → x | c | f(S, ..., S), where x is a symbol representing the input, c ∈ C is a constant (including real-valued constants, i.e., ℝ ⊆ C), and f ∈ F is a DSL component (i.e., function); our framework extends straightforwardly to more general grammars. We let π ∈ P = L(G) ⊆ Σ* denote the concrete programs in our DSL. Furthermore, we assume the DSL has denotational semantics ⟦·⟧, where ⟦π⟧ : X → Y maps inputs x ∈ X to outputs y ∈ Y according to the rules
⟦x⟧(x) = x,  ⟦c⟧(x) = c,  ⟦f(π₁, ..., πₖ)⟧(x) = f(⟦π₁⟧(x), ..., ⟦πₖ⟧(x)),
where we assume the functions f : X₁ × ... × Xₖ → Y are given.
In Section 2, we considered programs like map(x ≤ 50), which we will use as a running example in this section. They are simplified versions of programs from the NEAR DSL, and are generated by a grammar of the form (where x represents the input)
S → map(x ≤ C),  C → c (for each constant c ∈ ℝ).
Task Specification. We consider programming by example (PBE), where each task is specified by a sequence Z ∈ 𝒵 ≔ (X × Y)* of input-output (IO) examples, and the goal is to compute a program π* that maximizes a given quantitative objective J:
π* = argmax_{π ∈ P} J(π, Z).
We assume J is a function of the semantics applied to the examples (x, y) ∈ Z, i.e., there is a function J₀ such that
J(π, Z) = J₀((⟦π⟧(x), y)_{(x,y) ∈ Z}).    (3)
In our running example, we choose J₀ so that J(π, Z) is the accuracy of π on Z, i.e., J₀ averages, over all examples (x, y) ∈ Z and all time steps i, the indicator 1[⟦π⟧(x)[i] = y[i]], where 1[·] is the indicator function.
Partial Programs.A common strategy for PBE is to enumerate partial programs-programs in the DSL that have holes-to try and find a concrete program that satisfies the given IO examples.
Intuitively, partial programs are partial derivations in the grammar G. To formalize this notion, given two sequences π̂, π̂′ ∈ (Σ ∪ N)*, we write π̂ → π̂′ if π̂′ can be obtained by replacing a nonterminal symbol A ∈ N in π̂ with the right-hand side of a production r = A → σ₁ ... σₖ ∈ R for that nonterminal, i.e., π̂ = α₁ ... αᵢ₋₁ A αᵢ₊₁ ... αₕ and π̂′ = α₁ ... αᵢ₋₁ σ₁ ... σₖ αᵢ₊₁ ... αₕ. We denote this relationship by π̂′ = fill(π̂, r, i), i.e., we obtain π̂′ by filling the i-th hole in π̂ using production r. Next, we write π̂ →* π̂′ if there exists a sequence π̂ = π̂₁ → ... → π̂ₙ = π̂′, and say π̂′ can be derived from π̂. Note that concrete programs π ∈ P are sequences π ∈ Σ* that can be derived from the start symbol (i.e., S →* π). Similarly, a partial program is a sequence π̂ ∈ P̂ ≔ (Σ ∪ N)* that can be derived from S, i.e., S →* π̂. The only difference is that π̂ may contain nonterminals, which are called holes. The space of partial programs naturally forms a directed acyclic graph (DAG) via the relation π̂ → π̂′; note that concrete programs are leaf nodes in this DAG. Thus, we can perform synthesis by enumerating partial programs according to the structure of this DAG. Furthermore, given a partial program π̂ ∈ P̂ and a concrete program π ∈ P, we say π is a completion of π̂ if π̂ →* π (i.e., π can be obtained from π̂ by iteratively filling the holes of π̂). In our running example, traditional partial programs correspond to the grammar above, where unexpanded nonterminals such as C are holes (e.g., map(x ≤ ??)).

Generalized Partial Programs.
A key challenge is searching over real-valued constants c ∈ ℝ. Our grammar contains infinitely many productions of the form A → c for c ∈ ℝ, and even if we discretize this search space, the number of productions is still large in practice.
To address this challenge, we propose a strategy where we enumerate generalized partial programs π̂ ∈ P̃, which generalize (i) the fact that partial programs correspond to sets of concrete programs (i.e., the set of their completions), and (ii) the DAG structure of partial programs.
Definition 3.1. A space of generalized partial programs is a set P̃ together with a concretization function γ : P̃ → 2^P and a DAG structure children : P̃ → 2^P̃, such that
γ(π̂) ⊆ ⋃_{π̂′ ∈ children(π̂)} γ(π̂′).    (4)
Intuitively, γ(π̂) ⊆ P is the set of concrete programs represented by the abstract program π̂. In addition, children encodes a DAG structure on P̃ that is compatible with γ, i.e., the children π̂′ of π̂ must collectively contain all the concrete programs in γ(π̂).
For example, to capture traditional partial programs, we let γ(π̂) = {π ∈ P | π̂ →* π}, i.e., π is a completion of π̂, and children(π̂) = {π̂′ ∈ P̂ | π̂ → π̂′}. In Section 4, we will propose generalized partial programs that can include constraints on real-valued holes; e.g., ??_[0,1] is a hole that can only be filled by a real value c ∈ [0, 1].
Finally, a simple way to satisfy Definition 3.1 is to define γ based on the children function, i.e., we can define π ∈ γ(π̂) if there exists a sequence π̂ = π̂₁, ..., π̂ₙ = π of generalized partial programs such that π̂ᵢ₊₁ ∈ children(π̂ᵢ) for all 1 ≤ i < n. In other words, we can reach the concrete program π from the generalized partial program π̂ in the search DAG. This strategy straightforwardly guarantees (4), since by definition, every π ∈ γ(π̂) must be a descendant of some child of π̂.
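This reachability-based construction of γ can be checked on a toy search DAG (the integer-interval nodes below are an illustrative assumption, not part of the formal development):

```python
# Toy check that defining gamma(node) as the set of concrete programs
# reachable from node in the search DAG satisfies the compatibility
# condition (4): gamma(node) is covered by the union over its children.

def children(node):
    """Toy search DAG: an integer interval splits in half; a singleton
    [c, c] plays the role of a concrete program."""
    lo, hi = node
    if lo == hi:
        return []
    mid = (lo + hi) // 2
    return [(lo, mid), (mid + 1, hi)]

def gamma(node):
    """Concrete programs reachable from node (here: the leaves under it)."""
    if not children(node):
        return {node}
    out = set()
    for ch in children(node):
        out |= gamma(ch)
    return out

root = (0, 7)
union_of_children = set().union(*(gamma(c) for c in children(root)))
print(gamma(root) <= union_of_children)  # True: condition (4) holds
```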
In our running example, the generalized partial programs correspond to the grammar of Section 2 extended with interval-annotated holes ??_[a,b], while the concretization function satisfies, e.g., γ(map(x ≤ ??_[a,b])) = {map(x ≤ c) | c ∈ [a, b]}.
Abstract Objective. For now, we consider an abstract objective ⟦·⟧# that directly maps generalized partial programs to abstract real values; it is typically constructed compositionally by providing abstract transformers for each component f ∈ F as well as for the objective function J₀, and then composing them together. In particular, the abstract objective has type ⟦π̂⟧# : 𝒵 → R̂, where R̂ is an abstract domain for the reals representing the potential objective values J(π, Z) (e.g., the interval domain). Rather than require a concretization function for R̂, we only need an upper bound for this abstract domain, i.e., a function u : R̂ → ℝ, which encodes the intuition that "u(r̂) is larger than any real number contained in r̂".
Definition 3.2. Given objective J and generalized partial programs (P̃, γ, children), an abstract objective is a pair (⟦·⟧#, u) such that
π ∈ γ(π̂) ⇒ u(⟦π̂⟧#(Z)) ≥ J(π, Z)    (5)
and
u(⟦π⟧#(Z)) = J(π, Z) for all concrete programs π ∈ P.    (6)
Intuitively, (5) says that u(⟦π̂⟧#(Z)) is an upper bound on the objective value J(π, Z) for concrete programs π contained in the abstract program π̂. In addition, (6) says that for concrete programs, the abstract objective and concrete objective coincide.
Finally, a typical choice of R̂ is the space of intervals, where (r, r′) ∈ R̂ represents the set of real numbers {r″ ∈ ℝ | r ≤ r″ ≤ r′} (see Definition 4.1 for details). Then, the upper bound is u((r, r′)) = r′.
Algorithm 1. Our algorithm takes as input a task specification Z, along with a DSL G, a space of generalized partial programs (P̃, γ, children), an abstract objective (⟦·⟧#, u), an objective lower bound ℓ, and an objective error tolerance ε, and returns the optimal program π* for task Z. To do so, it uses the abstract objective as a heuristic in A* search, starting from the initial generalized partial program π̂₀.
In our running example, the objective decomposes according to Equation 3 into a semantics ⟦·⟧ and accuracy J₀. We thus define the abstract objective ⟦·⟧# in terms of an abstract semantics ⟦·⟧# (Equation 1) and an abstract accuracy (Equation 2).

A* Synthesis via Abstract Interpretation
Given a set of IO examples Z, Algorithm 1 uses A* search over generalized partial programs π̂ ∈ P̃ in conjunction with the heuristic π̂ ↦ u(⟦π̂⟧#(Z)) to compute the optimal program. In particular, it uses a heap h to keep track of the generalized partial programs π̂ on the frontier of the search DAG; at each iteration, it pops the current best node π̂, and then enumerates the children π̂′ of π̂ and adds them to h according to the heuristic. Termination occurs when the maximum over π̂ ∈ h of the objective lower bound ℓ : R̂ → ℝ (analogous to u, but lower-bounding rather than upper-bounding the abstract objective) is within a tolerance ε ≥ 0 of the maximum over π̂ ∈ h of the upper bound u, returning the generalized partial program with the highest lower bound. When the objective abstraction R̂ is real intervals, ℓ((r, r′)) = r is a natural choice.
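The loop just described can be sketched generically; `children` and `bounds` below are caller-supplied stand-ins for the search DAG and the abstract objective interval (ℓ, u), and the toy instantiation is an illustrative assumption rather than the paper's implementation:

```python
# Illustrative skeleton of the search loop: pop the node with the best
# upper bound, stop when the frontier's best lower and upper bounds are
# within eps, and return a node achieving the best lower bound.
import heapq
import itertools

def optimal_synthesis(root, children, bounds, eps=0.0):
    """bounds(node) -> (lower, upper) bounds on the objective over all
    concrete programs the node represents."""
    counter = itertools.count()                 # tie-breaker for heap entries
    lb, ub = bounds(root)
    heap = [(-ub, next(counter), lb, root)]
    while heap:
        best_ub = -heap[0][0]
        best = max(heap, key=lambda e: e[2])    # entry with best lower bound
        if best_ub - best[2] <= eps:
            return best[3]                      # within eps of optimal
        _, _, _, node = heapq.heappop(heap)
        for child in children(node):
            clb, cub = bounds(child)
            heapq.heappush(heap, (-cub, next(counter), clb, child))
    raise ValueError("empty search space")

# Toy instantiation: integer intervals that bisect; objective f(c) = c,
# so an interval [lo, hi] bounds the objective exactly.
def toy_children(node):
    lo, hi = node
    if lo == hi:
        return []
    mid = (lo + hi) // 2
    return [(lo, mid), (mid + 1, hi)]

def toy_bounds(node):
    return node

print(optimal_synthesis((0, 10), toy_children, toy_bounds))  # (10, 10)
```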
Finally, we observe that, for the abstract objective intervals in the heap h, the distance between the greatest lower bound (1) and greatest upper bound (1) is 0, and so search terminates, returning π̂₄.

□
We can ensure termination straightforwardly by using a finite DAG as the search space; for instance, we can do so by discretizing the real-valued constants.In general, we can guarantee convergence when the abstract losses of every infinite chain converge.For example, this property holds when the objective is Lipschitz continuous in the real-valued program parameters.(The gap between the upper and lower bounds of the objective value is bounded by the Lipschitz constant times the diameter of the box, which goes to zero as search proceeds and the boxes become smaller.)However, Lipschitz continuity often does not hold; we leave exploration of alternative ways to ensure convergence to future work.

INSTANTIATION FOR INTERVAL DOMAINS
Section 3 described a general framework for optimal synthesis when given an abstract semantics and search DAG for the target DSL. In this section, we describe a natural strategy for constructing abstract semantics when the DSL types are partially ordered. We begin by showing how to construct abstract domains for partially ordered types (Section 4.1), abstract semantics for individual components with monotone semantics (Section 4.2), and abstract transformers for monotone objectives (Section 4.3). Next, we describe a space of partial programs where holes corresponding to real-valued constants are optionally annotated with interval constraints (Section 4.4). Finally, we show how to combine abstract transformers and objectives to perform abstract interpretation for our interval-constrained partial programs (Section 4.5).

Interval Domains from Partial Orders
Many DSLs have the property that their types are equipped with a partial order-e.g. the usual order on the real numbers, or the order f ≤ t on the Booleans.

Interval Transformers for Monotone Functions
Next, we define our notion of abstract transformer for the interval domain. DSLs often contain functions that are monotone, respecting the partial orders between their input and output types.
Definition 4.2. Consider a function f : X₁ × ... × Xₖ → Y, where X₁, ..., Xₖ and Y are partially ordered sets. We say f is monotone if
x₁ ≤ x₁′, ..., xₖ ≤ xₖ′ ⇒ f(x₁, ..., xₖ) ≤ f(x₁′, ..., xₖ′).
For example, the + operator is monotone in both of its inputs.
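The natural interval transformer for a monotone function, evaluating the concrete semantics on the interval endpoints, can be sketched as follows (illustrative names, not the paper's implementation):

```python
# Illustrative sketch of the interval transformer for a monotone function:
# apply the concrete function to all the lower endpoints to get the
# output's lower endpoint, and to all the upper endpoints for its upper one.

def lift_monotone(f):
    """Lift a monotone f : X1 x ... x Xk -> Y to interval arguments."""
    def f_hat(*intervals):
        lows = tuple(lo for lo, _ in intervals)
        highs = tuple(hi for _, hi in intervals)
        return (f(*lows), f(*highs))
    return f_hat

add_hat = lift_monotone(lambda a, b: a + b)
print(add_hat((1, 2), (10, 20)))   # (11, 22)

# Soundness spot-check at the endpoints: a + b lands in the output interval.
lo, hi = add_hat((1, 2), (10, 20))
assert all(lo <= a + b <= hi for a in (1, 2) for b in (10, 20))
```

Note that this construction is sound only for monotone f; a function like subtraction, which is antitone in its second argument, would need its argument order (or endpoint roles) adjusted.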

Partial Programs with Interval Constraints
Next, we describe a space of generalized partial programs which extend partial programs with hole annotations that constrain the values that can be used to fill those holes.
Definition 4.5. Assume that the space of constants C is partially ordered, and let Ĉ be its interval domain. Then, an interval-constrained partial program π̃ = (π̂, κ) is a partial program π̂ together with a mapping κ : holes(π̂) → Ĉ ∪ {∅}, where holes(π̂) ⊆ ℕ are the indices of the holes in π̂.
For example, suppose that π̂ = σ₁ ... σᵢ ... σₕ is a partial program, where σᵢ = A is a nonterminal (and therefore a hole), so i ∈ holes(π̂). Intuitively, an annotation κ(i) = ĉ imposes the constraint that the value used to fill σᵢ must be a constant c ∈ C, and that c must satisfy c ∈ γ(ĉ) (i.e., c is contained in the interval [c₀, c₁]). Alternatively, if κ(i) = ∅, then no such constraint is imposed, i.e., σᵢ may be filled with any constant or a different production in the grammar. (Note that ∅ is not the same as providing the interval [−∞, +∞], because that constraint would require this hole to be filled by a constant, whereas the optimal program might need a non-constant expression.) We let P̃ denote the space of interval-constrained partial programs.
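One illustrative encoding of Definition 4.5 (the class and method names below are assumptions, not the paper's implementation):

```python
# Illustrative encoding of an interval-constrained partial program: a token
# sequence plus a mapping kappa from hole indices to an interval constraint
# or None (playing the role of the "no constraint" annotation).
from dataclasses import dataclass, field

@dataclass
class IntervalConstrainedProgram:
    tokens: tuple                               # terminals and nonterminals
    kappa: dict = field(default_factory=dict)   # hole index -> (lo, hi) or None

    def may_fill_with_constant(self, hole, c):
        """Does the annotation on `hole` permit filling it with constant c?"""
        ann = self.kappa.get(hole)
        if ann is None:
            return True          # unconstrained: any production is allowed
        lo, hi = ann
        return lo <= c <= hi

p = IntervalConstrainedProgram(
    tokens=("map", "(", "x", "<=", "C", ")"),   # nonterminal C is a hole at index 4
    kappa={4: (50, 75)},
)
print(p.may_fill_with_constant(4, 60))   # True
print(p.may_fill_with_constant(4, 80))   # False
```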
These constraints are imposed by the structure of the search DAG, which is an extended version of the search DAG over partial programs. Below, we first describe the children function for interval-constrained partial programs; the concretization function is constructed from the children function.
Children Function. Next, we describe the children of an interval-constrained partial program, children(p̃) ⊆ P̃. Intuitively, if a hole σᵢ in a partial program p̂ can be filled with a constant value c ∈ C (i.e., there is a production σ → c) to obtain p̂′, then p̂′ ∈ children(p̂). In contrast, for an interval-constrained partial program p̃, we include a child annotating σᵢ with [−∞, ∞]. Subsequent children can then further split interval constraints to obtain finer-grained interval constraints (a concrete constant value c can be represented by the constraint (c, c) ∈ Ĉ). To formalize this notion, we first separate out productions for constants.

Definition 4.6. A production r ∈ R is constant if it has the form r = σ → c for some c ∈ C.

Now, we can partition R into the set R_C of constant productions and its complement R̂ = R \ R_C, which we call non-constant productions. Next, given an interval-constrained partial program p̃ = (p̂, α), its holes are simply the holes of the underlying partial program p̂: holes(p̃) = holes(p̂). Then, we partition these holes into unannotated holes (those with α(i) = ∅) and annotated holes (those with α(i) ∈ Ĉ), respectively. Finally, we define three kinds of children for an interval-constrained partial program p̃. First, we include children obtained by filling an unannotated hole with a non-constant production; these children are the same as the children constructed in the original search DAG over partial programs. Here, repair(α; p̂, r, i) "repairs" α by accounting for how the indices in p̂ change after applying production r to fill hole i in p̂. In particular, this operation changes the indices of nonterminals in p̂; the repaired mapping α′ accounts for these changes without modifying the annotations themselves. Formally, if p̂′ = fill(p̂, σ → τ₁ ... τₕ, i), with p̂ = σ₁ ... σᵢ ... σₖ and p̂′ = σ₁ ... τ₁ ... τₕ ... σₖ, then the indices of holes after position i are shifted accordingly. In particular, α′ includes the same annotations as α.
Second, we consider children obtained by annotating an unannotated hole with the interval [−∞, ∞]; in other words, the partial program p̂ remains unchanged, but we introduce an annotation onto one of the previously unannotated holes of p̃. Third, we consider children obtained by replacing an annotation ĉ = [c₀, c₁] with a tighter annotation ĉ′ = [c₀′, c₁′]. Here, subset(ĉ′, ĉ) checks whether c₀ ≤ c₀′ and c₁′ ≤ c₁, where at least one of these inequalities is strict; equivalently, γ(ĉ′) ⊊ γ(ĉ). Finally, our overall search DAG is defined by the union of these three kinds of children. Note that by defining children in this way, there may be infinitely many children; in addition, multiple children may cover the same concrete program, leading to redundancy in the search DAG. Practical implementations can restrict to a finite subset of these children as long as they satisfy (4), i.e., the union of the concrete programs in the children of p̃ covers all concrete programs in p̃.
In addition, these children are ideally chosen so that their overlap is minimal.
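The three kinds of children can be sketched as a generator. The representation (symbol lists, uppercase-string nonterminals, bisection as the finite refinement strategy) is a simplification of the paper's definitions, not its actual implementation:

```python
NEG_INF, POS_INF = float("-inf"), float("inf")

def children(symbols, annotations, nonconstant_productions):
    """Yield (symbols, annotations) children of an interval-constrained
    partial program.  Holes are indices of nonterminal symbols, and
    `nonconstant_productions` is a list of (lhs, rhs-symbol-list) pairs."""
    holes = [i for i, s in enumerate(symbols)
             if isinstance(s, str) and s.isupper()]
    for i in holes:
        if i not in annotations:
            # (1) Fill an unannotated hole with a non-constant production.
            for lhs, rhs in nonconstant_productions:
                if lhs == symbols[i]:
                    # "repair": shift annotation indices past the filled hole.
                    shift = len(rhs) - 1
                    new_ann = {(j + shift if j > i else j): c
                               for j, c in annotations.items()}
                    yield symbols[:i] + rhs + symbols[i + 1:], new_ann
            # (2) Annotate the hole with the top interval.
            yield symbols, {**annotations, i: (NEG_INF, POS_INF)}
        else:
            # (3) Refine an annotation to strict subintervals (here: bisect).
            lo, hi = annotations[i]
            mid = 0.0 if lo == NEG_INF or hi == POS_INF else (lo + hi) / 2
            for sub in ((lo, mid), (mid, hi)):
                yield symbols, {**annotations, i: sub}
```

Restricting kind (3) to two subintervals keeps the branching finite while still covering all concrete programs, as required by condition (4).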

Interval Transformers for Partial Programs with Interval Constraints
Next, we describe how to implement abstract interpretation for partial programs with interval constraints. While abstract interpretation is typically performed with respect to program inputs, in our case it is with respect to program constants. We assume every component f ∈ F has an abstract transformer f̂, and that the objective o₀ has an abstract transformer ô₀ (if they are monotone, their abstract transformers can be defined as in Sections 4.2 and 4.3).
First, we modify the grammar of programs so that the constants C are replaced by abstract values (c₀, c₁) ∈ Ĉ. While the concrete semantics cannot be applied to these programs, we will define abstract semantics for them. Now, given a generalized partial program p̃ = (p̂, α), for each unannotated hole i ∈ holes_∅(p̃), we replace the corresponding nonterminal in p̂ with the abstract value (−∞, ∞), and for each annotated hole i ∈ holes_C(p̃), we replace the corresponding nonterminal with the annotation α(i). Finally, we replace any constant c in p̂ with the abstract value (c, c). Once we have performed this transformation, we can define abstract semantics for p̂: we apply the abstract semantics ⟦p̂⟧# to each input xᵢ to obtain the corresponding abstract output ŷᵢ′, and then apply ô₀ to the resulting set {(ŷᵢ′, yᵢ)}.
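Putting the pieces together, abstract interpretation of an annotated program evaluates interval semantics bottom-up on each input, then bounds the objective over the resulting abstract outputs. A toy sketch for a small expression language with threshold-at-0 classification and accuracy as the objective (these particular choices, and all names, are our illustrative assumptions):

```python
def eval_abstract(expr, x):
    """Interval semantics; x is a concrete input, constants are intervals."""
    op = expr[0]
    if op == "x":
        return (x, x)
    if op == "const":
        return expr[1]                       # (lo, hi) hole annotation
    if op == "+":
        (a, b), (c, d) = (eval_abstract(e, x) for e in expr[1:])
        return (a + c, b + d)                # + is monotone in both inputs
    if op == "neg":                          # anti-monotone: swap endpoints
        lo, hi = eval_abstract(expr[1], x)
        return (-hi, -lo)
    raise ValueError(op)

def objective_bounds(expr, examples):
    """Bound accuracy of the classifier 'output >= 0' over all concrete
    constants in the annotated intervals."""
    lo_correct = hi_correct = 0
    for x, label in examples:
        lo, hi = eval_abstract(expr, x)
        pred_lo, pred_hi = lo >= 0, hi >= 0  # the two possible predictions
        hi_correct += (pred_lo == label) or (pred_hi == label)
        lo_correct += (pred_lo == label) and (pred_hi == label)
    n = len(examples)
    return lo_correct / n, hi_correct / n
```

The returned upper bound is exactly the kind of quantity used as the admissible A* heuristic: if it falls below the best score found so far, the whole subtree can be pruned.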

IMPLEMENTATION
We instantiate our framework for two different domain-specific languages (DSLs):
• NEAR DSL: The NEAR language for the CRIM13 dataset (Section 5.1). The motivating example described in Section 2 is derived from this setting.
• Quivr DSL: A query language over video trajectories, which uses constructs similar to regular expressions to match trajectories (Section 5.2).
Although both of these DSLs process sequence data, their computation models are quite different: NEAR focuses on folding combinators over the inputs, whereas Quivr's primary operations reduce to matrix multiplication.
For both DSLs, we refine the definition of children compared to Equation 8. In particular, in Section 4.4, when defining the search space over programs with interval constraints on holes, children_C is defined such that the children of [a, b] are all of its strict subintervals. As discussed there, this means that each node may have infinitely many children, which is impractical for an implementation. Instead, our implementation splits intervals into just two children, i.e., the children of [a, b] are [a, (a + b)/2] and [(a + b)/2, b]. These intervals partition [a, b], so all concrete programs are still contained in the search space, retaining soundness. This splits an interval into child intervals of equal length, but with additional domain knowledge, other choices could be made (e.g., given a prior distribution over parameters, splitting into intervals of equal probability mass).
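The bisection rule is a one-liner; for contrast, a quantile-based split under an assumed prior (the `cdf`/`inv_cdf` parameters are hypothetical) could look like:

```python
def split_halves(lo, hi):
    """Split [lo, hi] into two subintervals of equal length."""
    mid = (lo + hi) / 2
    return [(lo, mid), (mid, hi)]

def split_by_mass(lo, hi, cdf, inv_cdf):
    """Split [lo, hi] at the median of a prior restricted to it.
    `cdf` and `inv_cdf` are an assumed prior's CDF and its inverse."""
    mid = inv_cdf((cdf(lo) + cdf(hi)) / 2)
    return [(lo, mid), (mid, hi)]
```

With a uniform prior the two rules coincide; a heavier-tailed prior would allocate narrower child intervals where constants are more likely.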
Finally, we also describe how we construct the abstract transformer for the F₁ score, which is commonly used as the objective function in binary classification problems (Section 5.3).
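Since F₁ = 2·TP / (2·TP + FP + FN) is monotone increasing in the true-positive count and monotone decreasing in the false-positive and false-negative counts, interval bounds on the three counts yield interval bounds on F₁. The sketch below is our reading of such a transformer, not necessarily the exact construction in Section 5.3:

```python
def f1(tp, fp, fn):
    """F1 score from true positives, false positives, false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def f1_bounds(tp_bounds, fp_bounds, fn_bounds):
    """Bound F1 given (lo, hi) bounds on the counts, using the fact
    that F1 increases in TP and decreases in FP and FN."""
    tp_lo, tp_hi = tp_bounds
    fp_lo, fp_hi = fp_bounds
    fn_lo, fn_hi = fn_bounds
    return f1(tp_lo, fp_hi, fn_hi), f1(tp_hi, fp_lo, fn_lo)
```

For example, with TP ∈ [5, 10], FP ∈ [0, 5], and FN ∈ [0, 5], the F₁ score is bounded by [0.5, 1.0].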

NEAR DSL for Trajectory Labeling
In the NEAR DSL [Shah et al. 2020], the program input is a featurized trajectory, which is a sequence of feature vectors x ∈ (ℝᵈ)*, where d is the dimension of the feature vector for each frame. The output is a sequence of labels y ∈ {t, f}* of the same length as the input, where y[i] = t if frame i exhibits the given behavior and y[i] = f otherwise (i.e., the task is binary classification at the frame level).
Syntax. This DSL has three kinds of expressions, encoded by their corresponding nonterminals in G: (i) functions mapping feature vectors to real values (e.g., the body of map), (ii) functions mapping lists to real values (e.g., fold), and (iii) functions mapping lists to lists (e.g., map). The DSL syntax is in the combinatory style, so λs are omitted. In particular, the feature-vector expressions are combinators designed to be used within a higher-order function such as map or fold. These combinators are applied to individual elements of the input list, where xᵢ is a variable representing the ith component of the current feature vector, and a special symbol is used inside fold to represent its accumulated running state. The start symbol is the list-to-list nonterminal. The DSL is structured such that a list of feature vectors is mapped to a list of real values of the same length, and each real-valued output implicitly encodes the label. The running example involved programs in a toy DSL of the form map(x ≤ c). In the NEAR DSL, this would be represented as map(−1 · x₄ + c), as "distance between mice" is the 4th feature, and the label is obtained by comparing the program output to 0.
Finally, as described above, the labels are obtained by thresholding the program's real-valued outputs: a frame is labeled t if the corresponding output is at least 0, and f otherwise.

Abstract Semantics. We abstract ℝ with the usual real intervals, and feature vectors with products of real intervals. Addition is monotone, and multiplication by a constant is monotone up to the sign of the constant, so both can be abstracted with the corresponding interval operations.

Search Space. The search space over feature-vector expressions has redundancy due to commutativity, associativity, and distributivity, which unduly hinders search. Instead, we use a normalized version, where expressions are constrained to be sums-of-products (fully distributed), and where we ignore commutativity and associativity in the sums of products. Essentially, we only consider polynomials, where the variables are the features xᵢ, the fold variable, and indicator variables 1[e ≥ 0] (to maintain the expressiveness of ite). Further, we consider only constants in [−1, 1], and we normalize the dataset so each feature is in [−1, 1]. We define a notion of size for programs so that we can bound the search space: ite, map, and mapprefix have size 1, while fold has size 0 (since mapprefix must contain a fold and already has size 1). Each monomial in the polynomial has size 1, and each polynomial variable in a monomial has size 1. In a given polynomial, the size of indicators is amortized: each nested ite has size 1 and produces a new polynomial variable.

5.2 Quivr DSL for Trajectory Queries
In the Quivr DSL [Mell et al. 2023], the program input is a featurized trajectory, which is a sequence x ∈ (ℝᵈ)* as before. However, the output is now a single label y ∈ {t, f} for the entire trajectory. This DSL is designed to allow the user to select trajectories that satisfy certain properties, e.g., the user may want to identify all cars that make a right turn at a certain intersection in a traffic video.
Syntax. This DSL is based on Kleene algebra with tests [Kozen 1997]; intuitively, its expressions are regular expressions whose "characters" are actually predicates. Predicates are drawn from two given sets of domain-specific predicates, F_∅ and F_C, the latter of which have constants c ∈ ℝ that need to be chosen by the synthesizer.
Semantics. Expressions in this DSL denote functions mapping sequences of feature vectors x ∈ (ℝᵈ)* to whole-sequence labels in {t, f}.

Fig. 2. The time (seconds, log scale) to identify the optimal program and prove its optimality, for our approach (blue, solid) and an SMT solver (red, dashed), as a function of the size of the training dataset, for two different tasks.

Comparison to Metasketches
Metasketches performs optimal synthesis using an SMT solver by adding, in addition to the correctness specification, an SMT constraint that the program's score be greater than the best score seen so far. Thus, if the SMT solver returns "SAT", a better program will have been found, and the process is repeated. Note that in our setting, there is no correctness specification, so achieving a better score is the only SMT constraint.
A similar strategy, which we found to be more effective, is to perform binary search on the objective score: supposing that the objective is in [0, 1], we ask the SMT solver whether there is a program achieving score at least 1/2; if it returns "SAT", we ask for 3/4, and if "UNSAT", we ask for 1/4, and so on. Our implementation uses the Z3 SMT solver [De Moura and Bjørner 2008]. For a fairer comparison, in this experiment we restricted PyTorch to a single CPU core.
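The binary-search loop itself is independent of the solver. In this sketch, `achieves_at_least` stands in for the SMT query (e.g., issued via Z3), returning a witness program on "SAT" and `None` on "UNSAT":

```python
def optimize_by_bisection(achieves_at_least, tol=1e-3):
    """Binary search on the objective value in [0, 1].

    `achieves_at_least(t)` should return a program whose score is >= t,
    or None if the solver proves no such program exists (UNSAT)."""
    lo, hi = 0.0, 1.0
    best = None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        witness = achieves_at_least(mid)
        if witness is not None:   # "SAT": a better program exists
            best, lo = witness, mid
        else:                     # "UNSAT": no program reaches mid
            hi = mid
    return best, (lo, hi)
```

Each query halves the interval, so the optimal score is located to within `tol` in about log₂(1/tol) solver calls, regardless of how many programs beat the running best.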
The two approaches were both run until they had converged to the exact optimal program. Figure 2 shows that the SMT solver scales very poorly as a function of the number of trajectories in the training dataset. While it is competitive for three or four trajectories, we would like to evaluate on datasets of hundreds or thousands of trajectories.

Comparison to Breadth-First Search
Next, to show the benefit of the search heuristic, we compare against an ablation which ignores the heuristic and does breadth-first search.
At any point in the search process, there is a heap of search nodes, each of which has a lower and an upper bound on the F₁ scores reachable from it. Rather than using the lower bound from the abstract objective value, we instead evaluate the concrete program whose parameters are the midpoint of the hyper-rectangle of abstract parameters to obtain a concrete objective value; this is a better lower bound, and it is cheap to compute. The greatest of these lower bounds provides a lower bound on the optimal F₁ score, and the greatest of these upper bounds provides an upper bound on the optimal F₁ score. Note that if the lower and upper bounds are equal, they equal the true optimal F₁ score, and search terminates.
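This bookkeeping can be sketched as a branch-and-bound loop over a max-heap ordered by upper bound; `expand`, `upper_bound`, `concrete_score`, and `is_leaf` are stand-ins for the framework's definitions, not the paper's actual code:

```python
import heapq

def search(root, upper_bound, concrete_score, expand, is_leaf):
    """Best-first search tracking global lower/upper bounds on the
    optimal objective.  Terminates when the bounds meet."""
    best_lb, best_node = concrete_score(root), root
    heap = [(-upper_bound(root), 0, root)]  # max-heap via negation
    counter = 1                             # tie-breaker for the heap
    while heap:
        neg_ub, _, node = heapq.heappop(heap)
        if -neg_ub <= best_lb:   # bounds met: best_lb is optimal
            break
        if is_leaf(node):
            continue
        for child in expand(node):
            lb = concrete_score(child)      # e.g., midpoint evaluation
            if lb > best_lb:
                best_lb, best_node = lb, child
            cub = upper_bound(child)
            if cub > best_lb:               # prune provably suboptimal nodes
                heapq.heappush(heap, (-cub, counter, child))
                counter += 1
    return best_node, best_lb
```

The breadth-first ablation corresponds to ignoring the heap order (and hence the heuristic); the bound-tracking and termination condition are the same in both variants.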
We use 100 trajectories from each dataset. For CRIM13, these are randomly sampled from the training set. For Quivr, because positives are very sparse on some tasks, we ensure that we have some positive examples by using 2 positive and 10 negative trajectories specially designated in the dataset; the remaining 88 are sampled randomly from the training set.
Table 1 shows, at different times during the search process, the best F₁ score found (the lower bound of the interval), as well as the width of the interval bounding the optimal F₁ score. On most tasks, our approach (H) achieves higher F₁ scores more quickly than the ablation (B), as well as tighter intervals.

RELATED WORK
Neurosymbolic Synthesis. There has been a great deal of recent interest in neurosymbolic synthesis [Chaudhuri et al. 2021], including synthesis of functional programs [Gaunt et al. 2016; Shah et al. 2020; Valkov et al. 2018], reinforcement learning policies [Anderson et al. 2020; Inala et al. 2020], programs for extracting data from unstructured text [Chen et al. 2021, 2023; Ye et al. 2021], and programs that extract data from video trajectories [Bastani et al. 2021; Mell et al. 2023; Shah et al. 2020]. Some of these approaches have proposed pruning strategies based on monotonicity [Chen et al. 2021; Mell et al. 2023], but only for specific DSLs. NEAR is a general framework for neurosymbolic synthesis based on neural heuristics [Shah et al. 2020]; however, their approach is not guaranteed to synthesize optimal programs. To the best of our knowledge, our work proposes the first general framework for optimal synthesis of neurosymbolic programs.
Optimal Synthesis. More broadly, there has been recent interest in optimal synthesis [Bornholt et al. 2016; Smith and Albarghouthi 2016], typically focusing on optimizing performance properties of the program, such as running time, rather than accuracy; superoptimization is a particularly well-studied application [Bansal and Aiken 2008; Massalin 1987; Mukherjee et al. 2020; Phothilimthana et al. 2016; Sasnauskas et al. 2017]. Our experiments demonstrate that our approach outperforms Bornholt et al. [2016], a general framework for optimal synthesis based on SMT solvers. There has also been work on synthesizing a program that maximizes an objective expressed as a neural network scoring function [Ye et al. 2021]; however, they do not consider real-valued constants, their quantitative objective is syntactic rather than semantic, and abstract interpretation is only used for pruning according to the Boolean correctness specification, not the quantitative objective. Optimal synthesis has also been leveraged for synthesizing minimal guards for memory safety [Dillig et al. 2014], chemical reaction networks [Cardelli et al. 2017], and optimal layouts for quantum computing [Tan and Cong 2020].
Abstract Interpretation for Synthesis. There has been work on leveraging abstract interpretation for pruning portions of the search space in program synthesis [Guria et al. 2023; So and Oh 2017], as well as using abstraction refinement [Wang et al. 2017]; however, these approaches target traditional synthesis. Rather than evaluating an abstract semantics on partial programs, Wang et al. [2017] construct a data structure compactly representing concrete programs whose abstract semantics are compatible with the input-output examples. However, it is not obvious how their data structure (which targets Boolean specifications) can be adapted to our quantitative synthesis setting.
Abstract Interpretation for Planning. One line of work, initiated by the FF planner [Hoffmann and Nebel 2001], uses abstract semantics to perform reachability analysis to prune invalid plans [Gregory et al. 2012; Hoffmann 2003; Zhi-Xuan et al. 2022]. However, other than pruning invalid plans, the reachability analysis is not used in the computation of the search heuristic. Instead, traditional heuristics such as h_max (which computes the shortest plan in a "relaxed model" that drops delete lists from the postconditions of abstract actions, and outputs the length of this plan) are used. In contrast, we use an abstract transformer for the objective function to provide a bound on the objective that is directly used as the search heuristic.
A second line of work [Marthi et al. 2008;Vega-Brown and Roy 2018] considers computing optimal plans by underapproximating the cost function.They assume that the total cost of a plan equals the sum of the costs of the individual actions in that plan and then, given a lower bound on the cost of each action, simply sum these lower bounds to obtain a lower bound on the cost of the overall plan.This strategy makes strong assumptions about the structure of the overall cost function, whereas our abstract interpretation based approach requires no such assumptions.
Another key difference is that we are abstracting over real-valued parameters of partial programs, whereas the above approaches are abstracting over continuous states.Thus, our framework requires a way to iteratively refine the program space (specified by the "children" function), which is absent from their frameworks.

CONCLUSION
We have proposed a general framework for synthesizing programs with real-valued constants, using A* search in conjunction with a search heuristic based on abstract interpretation. Our framework searches over a space of generalized partial programs, which represent sets of concrete programs, and uses the search heuristic to establish upper bounds on the objective value of a given generalized partial program. In addition, we propose a natural strategy for constructing abstract transformers for components with monotone semantics. If our algorithm returns a program, then this program is guaranteed to be optimal. Our experimental evaluation demonstrates that our approach is more scalable than existing optimal synthesis techniques. Directions for future work include improving the scalability of our approach and applying it to additional synthesis tasks.