Superfusion: Eliminating Intermediate Data Structures via Inductive Synthesis

Intermediate data structures are a common cause of inefficiency in functional programming. Fusion attempts to eliminate intermediate data structures by combining adjacent data traversals into one; existing fusion techniques, however, are based on predefined rewrite rules and hence are limited in expressiveness. In this work we explore a different approach to eliminating intermediate data structures, based on inductive program synthesis. We dub this approach superfusion (by analogy with superoptimization, which uses inductive synthesis for program optimization). Starting from a reference program annotated with data structures to be eliminated, superfusion first generates a sketch where program fragments operating on those data structures are replaced with holes; it then fills the holes with constant-time expressions such that the resulting program is equivalent to the reference. The main technical challenge here is scalability because optimized programs are often complex, making the search space intractably large for naive enumeration. To address this challenge, our key insight is to first synthesize a ghost function that describes the relationship between the original intermediate data structure and its compressed version; this function, although not used in the final program, serves to decompose the joint sketch filling problem into independent simpler problems for each hole. We implement superfusion in a tool called SuFu and evaluate it on a dataset of 290 tasks collected from prior work on deductive fusion and program restructuring. The results show that SuFu solves 264 out of 290 tasks, exceeding the capabilities of rewriting-based fusion systems and achieving comparable performance with specialized approaches to program restructuring on their respective domains.


INTRODUCTION
Simplicity and efficiency are often at odds in programming. This is especially true in functional languages, where the idiomatic programming style is to compose library functions that operate on lists and other data structures. Programs written in this compositional style, however, are often inefficient because they have to allocate and traverse intermediate data structures.
Consider a function mts that returns the maximum tail sum of a (possibly negative) integer list; for example, mts [1, −2, 3, −1, 2] = 4, the sum of the last three elements [3, −1, 2]. This function can be implemented by composing list functions, like so:
    mts xs = maximum (map sum (tails xs))
where tails returns a (nested) list of all tails of the input, map applies the function sum to each tail and obtains the list of all tail sums, and maximum returns the maximum among these sums. This program is short and idiomatic. All four list functions it uses are commonly available in standard libraries, such as Data.List in Haskell. However, this program is also inefficient due to the large intermediate data structure constructed by tails, the list of all tails: the size of this data structure is quadratic in the size of the input list, causing mts to take quadratic time.
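As a concrete point of reference, the compositional definition above can be written out as a self-contained Haskell module. This is only an illustrative sketch: intermediateSize is a hypothetical helper, not part of the original program, added to make the quadratic size of the list of tails visible.

```haskell
import Data.List (tails)

-- Reference implementation: maximum over the sums of all tails.
-- `tails` also returns the empty tail, so the result is never below 0.
mts :: [Int] -> Int
mts xs = maximum (map sum (tails xs))

-- Hypothetical helper: total number of elements stored in the
-- intermediate list of tails, i.e. n + (n-1) + ... + 0 for length n.
intermediateSize :: [Int] -> Int
intermediateSize = sum . map length . tails
```

For instance, mts [1, -2, 3, -1, 2] evaluates to 4, while intermediateSize reports 15 stored elements for the same five-element input.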

The inefficiency of compositional programs can often be addressed by eliminating intermediate data structures, replacing them with scalar attributes that are sufficient to compute the final result. For example, in the mts program, we can replace the list of all tails with a pair of attributes: (1) the maximum tail sum, and (2) the sum of the whole list. The intuition is that the mts of a non-empty list is either the mts of its tail, or the sum of the tail plus the head element. Using this observation, we can rewrite mts into an efficient program mts', which takes only linear time. This program, however, is harder to write and understand than the original one. Hence we would like to write programs in the style of mts, and then transform them into efficient programs like mts' automatically.
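One possible shape for such a linear-time program is sketched below in Haskell. The names tails' and mts' follow the text, but the definitions are a reconstruction from the two attributes described above and not necessarily SuFu's literal output; the reference mts is repeated so that the two versions can be compared.

```haskell
import Data.List (tails)

-- Reference program: quadratic, builds the list of all tails.
mts :: [Int] -> Int
mts xs = maximum (map sum (tails xs))

-- Fused counterpart of tails (assumed shape): returns the pair
-- (maximum tail sum, sum of the whole list) instead of a nested list.
tails' :: [Int] -> (Int, Int)
tails' []       = (0, 0)
tails' (x : xs) =
  let (m, s) = tails' xs      -- attributes of the tail
      total  = x + s          -- sum of the whole list
  in  (max m total, total)

-- Linear-time maximum tail sum: one pass, no intermediate list.
mts' :: [Int] -> Int
mts' = fst . tails'
```

On the running example, both mts [1, -2, 3, -1, 2] and mts' [1, -2, 3, -1, 2] evaluate to 4.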
Deductive Fusion. Automatic elimination of intermediate data structures is a well-studied problem in functional programming, also known as fusion or deforestation [Chin 1992; Coutts et al. 2007; Fokkinga 1992; Gill et al. 1993; Hamilton 2001; Hu et al. 1996; Meijer et al. 1991; Takano and Meijer 1995; Wadler 1988]. Existing approaches to fusion are deductive, i.e. based on a predefined set of rewrite rules that transform the reference program into an optimized one. Deductive fusion is fast and its results are correct by construction, but its main downside is limited expressiveness: it only applies to programs that match the rewrite rules. The state-of-the-art deductive approach [Hinze et al. 2010] can only handle around 50% of our benchmark suite, which we collected from the literature, and to our knowledge, no existing automatic fusion system can handle the mts example.
Superfusion. When faced with a similar expressiveness limitation of traditional compiler optimizations, Massalin [Massalin 1987] proposed superoptimization, an approach that abandoned deductive rewrite rules in favor of inductive synthesis, i.e. constructing an optimized program from scratch by searching the space of all programs until one is found that matches the input-output behavior of the reference implementation. In this paper, we take a similar approach to eliminating intermediate data structures, which we accordingly refer to as superfusion.
The input to superfusion is a reference program annotated with data structures to be eliminated, e.g. the nested list returned by tails in the mts example. The first step is to turn the reference program into a sketch [Solar-Lezama et al. 2006], where any program fragment that consumes or produces the undesirable intermediate data structure is replaced with a hole. The second step is to solve the sketch, filling the holes with new expressions that operate only on scalar attributes, while ensuring that the input-output behavior of the whole program is unchanged.
Challenge 1: Scalability. The main technical challenge of superfusion is the scale of the search space. The target program of superfusion is often large because optimized programs are typically much more complex than idiomatic programs. Moreover, even when the solution to each single hole has a manageable size, all holes still have to be solved jointly, since our specification (equivalence to the reference program) is global. This scalability challenge makes a direct application of existing inductive synthesizers infeasible.
To address this challenge, our first insight is to synthesize a "ghost" compression function, denoted as ?compress, which maps the intermediate data structure to be eliminated to the scalar attributes required in the optimized program. For example, the compression function for mts takes a list of tails and returns the maximum tail sum and the sum of the first tail (i.e. the full list):
    ?compress ts = (maximum (map sum ts), sum (head ts))
This is a "ghost" function, since it does not appear in the final solution mts'; its sole purpose is to decompose the global specification into local input-output specifications for each sketch hole, which can then be solved independently using existing programming-by-example (PBE) solvers [Alur et al. 2017b; Ji et al. 2021]. For example, with the definition of ?compress above, it is straightforward to get input-output examples for the base case of tails' (the optimized version of tails): since tails [] = [[]] and ?compress [[]] = (0, 0), tails' [] should return (0, 0).
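The role of ?compress as a specification can be made concrete with a small Haskell sketch. Here compress is the ghost function from the text, tails' is an assumed implementation of the optimized tails, and specHolds is a hypothetical helper stating the intended relationship tails' xs = ?compress (tails xs).

```haskell
import Data.List (tails)

-- Ghost compression function from the text: maps the list of tails
-- to (maximum tail sum, sum of the first tail).
compress :: [[Int]] -> (Int, Int)
compress ts = (maximum (map sum ts), sum (head ts))

-- Assumed optimized version of tails, operating on scalars only.
tails' :: [Int] -> (Int, Int)
tails' []       = (0, 0)
tails' (x : xs) =
  let (m, s) = tails' xs
  in  (max m (x + s), x + s)

-- Hypothetical check of the local specification induced by compress:
-- the optimized tails must return the compressed original result.
specHolds :: [Int] -> Bool
specHolds xs = tails' xs == compress (tails xs)
```

In particular, compress [[]] evaluates to (0, 0), which is exactly the input-output example derived above for the base case of tails'.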
Challenge 2: Synthesizing Compression Functions. At this point, the reader might be wondering why synthesizing ?compress is any easier than synthesizing the optimized program directly. After all, the only specification we have for ?compress is that all sketch holes can be filled correctly while only operating on the scalar attributes. This appears to be a chicken-and-egg problem: we need the definition of ?compress to efficiently fill the holes, but to decide if we got the right ?compress, we need to know whether the holes can be filled! Our second insight is that, as long as the program space of sketch holes is rich enough, this apparent circular dependency can be broken using second-order quantifier elimination, resulting in an independent synthesis task for ?compress. This task is simpler than the original superfusion task because ?compress does not need to be efficient, and hence is smaller (and easier to synthesize) than the sketch holes.
Evaluation. We implement superfusion in a tool called SuFu and evaluate it on a suite of 290 benchmarks collected from prior work. Our first source of benchmarks consists of fusion tasks from the deductive fusion literature [Bird 1989; Bird and de Moor 1997; Gill et al. 1993; Hu et al. 1997; Wadler 1988]. For our second source of benchmarks, we turn to prior work on program restructuring [Acar et al. 2005; Farzan et al. 2022; Farzan and Nicolet 2017, 2021a,b; Ji et al. 2024b; Morita et al. 2007; Pu et al. 2011], where the problem is to transform a reference program into a specific target form, such as "divide-and-conquer" or "single pass". We show that for several specific target forms studied in the literature, program restructuring can be reduced to fusion.
Our evaluation results show that SuFu can solve 264 out of 290 problems, at least half of which are beyond the reach of deductive fusion techniques. Moreover, while being general, SuFu is also competitive with two specialized synthesizers for program restructuring [Farzan et al. 2022; Ji et al. 2024b] on their respective domains, in terms of the number of solved problems (although it is somewhat slower in terms of synthesis times).
Contributions. To sum up, this paper makes the following main contributions.

OVERVIEW
This section gives an overview of our tool SuFu, using the mts example from the introduction; the workflow of SuFu on this example is shown in Fig. 1. Recall that the idiomatic implementation of mts is inefficient because of the intermediate data structure produced by tails. To get a more efficient program, the user annotates the output type of tails with Packed to specify that it should be eliminated (Fig. 1a). SuFu then replaces the annotated data structure with scalar attributes and rewrites all related program fragments, generating a more efficient implementation (Fig. 1c).

Superfusion as Program Sketching
The first step in SuFu's workflow is to turn the annotated reference program into a sketch [Solar-Lezama et al. 2006], as shown in Fig. 1b. To this end, SuFu uses a type-directed approach, which we detail in Sec. 3, to identify the subterms that produce or consume data structures annotated with Packed (in our case, the NList generated by tails). There are three such terms in mts, labeled ?t1, ?t2, and ?t3 in Fig. 1b: ?t1 and ?t2 produce an NList, while ?t2 and ?t3 consume an NList. Each of these three terms is replaced with a sketch hole, which the synthesizer will need to fill with a new term that operates only on scalar attributes of the original NList.
To make the synthesizer's job easier, SuFu attempts to reuse as much of the original program as possible, moving subterms that do not directly operate on Packed data structures out of the holes, into let-bindings.For example, the two invocations of tails in Fig. 1b are moved out, because they do not contain any NList-specific operations.
Once the sketch has been generated, SuFu's task is to solve it, i.e. to fill the holes so that the resulting program is both correct and efficient:
• Correctness requires that the final program has the same input-output behavior as the reference. SuFu is based on the CEGIS framework [Solar-Lezama et al. 2006] and assumes the existence of an external verifier capable of generating counter-examples for incorrect programs; hence the synthesizer only needs to ensure correctness on a finite set of inputs.
• Efficiency. Fusion is only helpful if the resulting program is more efficient than the reference, but without any restrictions on the program space for the holes, this is not guaranteed: for example, the solution for ?t3 could ignore the new optimized tails and simply recreate the original implementation from scratch. To prevent this, we restrict the program space of sketch holes to include only recursion-free programs that run in O(1) time. With this restriction, we can prove an efficiency guarantee on the resulting program (Thm. 4.11).

Sketch Solving with Compression Functions
Superfusion sketches are challenging to solve because optimized programs are often much more complex than idiomatic programs, so the expressions that need to be synthesized for each hole are relatively large. In our dataset, the average size of these expressions is 45.5 AST nodes, with a maximum of 559. This scale exceeds the capabilities of general-purpose sketch solvers [Solar-Lezama et al. 2006; Torlak and Bodík 2013], especially given that a superfusion sketch typically contains multiple holes, which must jointly satisfy the global IO specification. In our experiments, a state-of-the-art sketch solver [Lu and Bodík 2023] can only solve ∼30% of tasks in our dataset.
To overcome this challenge, SuFu decomposes the global IO examples for the whole sketch into local IO examples for each hole. With these local examples, SuFu uses off-the-shelf PBE solvers for recursion-free programs [Alur et al. 2017b; Ji et al. 2021] to solve each hole independently.
To generate the local examples, we leverage the reference program. Specifically, we start by observing the IO behavior of the original terms that were replaced by holes. For example, Tab. 1 illustrates the behavior of the three terms in the original mts program, corresponding to sketch holes ?t1, ?t2, and ?t3, when executed on the input list [2]. Of course, these IO behaviors cannot directly be used as the specification for the holes because they involve the intermediate data structures to be eliminated (the nested lists, shown in the table in blue).
Our first key insight is to bridge the gap between the behavior of the original and the target programs by introducing an unknown compression function, ?compress, which maps the undesired intermediate data structure to scalar attributes.Using this function, we can express local examples for each hole in a symbolic form, by simply compressing the intermediate data structure in each original behavior, as shown in the fourth column of Tab. 1.
Assume for a second we had an oracle for ?compress, which knew to compress the list of tails into two scalar attributes, the maximum tail sum and the sum of list elements:
    ?compress ts = (maximum (map sum ts), sum (head ts))    (1)
Substituting this definition into the symbolic examples of Tab. 1 makes them concrete, so that each hole can be solved independently, yielding the program in Fig. 1c. The remaining challenge is to synthesize a suitable compression function, i.e. to guess which scalar attributes are sufficient to implement the sketch holes in O(1) time; the rest of this section is devoted to this task.

Synthesizing the Compression Function
The only specification we have for ?compress consists of symbolic local examples, such as those in Tab. 1. The issue with this specification, of course, is that it also refers to the unknown sketch holes, ?t1, ?t2, and ?t3. The naive way to approach this problem is to synthesize ?compress and all sketch holes simultaneously, but that defeats the purpose of introducing ?compress in the first place.
Our second key insight is that domain-specific properties of superfusion enable an efficient synthesis algorithm that combines enumeration and (quantifier) elimination. Specifically, we observe that the unknown programs (?compress and sketch holes) can be further decomposed into components that fall into two categories: (1) components with a small implementation, which can be efficiently enumerated; (2) components that can be quantified over as uninterpreted functions and eliminated. This observation leads to the following synthesis algorithm:
• The top-level algorithm iteratively refines ?compress, adding scalar attributes as required by local examples for a single hole.
• In each iteration, SuFu uses a second-order quantifier elimination method inspired by Ackermann's reduction [Lewis 1978] to obtain a simpler specification involving only the new scalar attributes plus at most a small component of a sketch hole, allowing efficient enumeration.
Next, we discuss the iterative refinement and the quantifier elimination method in more detail.

Iterative Refinement
[...] is the space of all constant-time programs with integer output, and the two conjuncts originate from the two executions of mts mentioned above. Clearly such a ?t3 does not exist because it would have to produce different outputs for the same input. Hence we need to add a new scalar attribute to ?compress that provides enough information to differentiate between the two executions.
More formally, we need to find a function ?compress1 that satisfies the following specification: [...] This specification is also unrealizable. Intuitively, this is because the mts of the tail (i.e., the second input of ?t2, which is 0 in both cases) clearly does not contain enough information to compute the mts of the whole list, which is 2 in the first case and 1 in the second case; and although ?t2 also has access to the input list xs through its first input, it cannot compute mts from xs in O(1) time since xs can be arbitrarily long.
To fix this unrealizable hole, we need to add another attribute, ?compress2, to the compression function, satisfying the specification: [...] Once again, given enough examples, our quantifier elimination will discover that the additional attribute required here is the sum of the list elements:
    ?compress2 ts = sum (head ts)
Iteration 3. Once we substitute ?compress ts = (?compress1 ts, ?compress2 ts) into local examples, we find that all sketch holes are realizable, which concludes the synthesis of ?compress.
2.3.2 Quantifier Elimination. We conclude this section by explaining how SuFu synthesizes each scalar attribute of ?compress from the specifications Eq. 2 and Eq. 3.
Iteration 1. Let us consider the specification for ?compress1 in Eq. 2. This specification involves two unknown programs, ?compress1 and ?t3, and our goal is to avoid synthesizing both of them simultaneously. To this end, we observe that:
(1) to synthesize ?compress1, we do not need to know the implementation of ?t3, only that such a function exists;
(2) such a function exists if and only if the examples assign a unique output to every input.
These observations suggest a simple second-order quantifier elimination procedure, which transforms Eq. 2 into the following equivalent form that involves only ?compress1:
    $\forall \langle ts, o \rangle, \langle ts', o' \rangle \in E_3.\;\; ?compress_1\, ts = ?compress_1\, ts' \;\Rightarrow\; o = o'$    (4)
where E3 is the set of local examples collected for ?t3 from the executions of mts (such as the last row, the Example column of Tab. 1), each pairing an intermediate data structure ts with the expected output o.
We can now enumerate candidates for ?compress1 from smaller to larger, checking them against the specification Eq. 4. SuFu can find the desired program ?compress1 ts = maximum (map sum ts) from 21 executions of mts. Tab. 2 illustrates three smaller but incorrect candidates for ?compress1, together with a pair of executions sufficient to reject them.
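The enumerate-and-check step can be illustrated with a minimal Haskell sketch. It assumes that a local example for ?t3 pairs the list of tails with the output of the reference program; the candidate pool, its ordering, and the helper names (examples, consistent, firstConsistent) are hypothetical and do not reproduce SuFu's enumerator or the candidates of Tab. 2.

```haskell
import Data.List (tails)

type NList = [[Int]]

-- A local example for hole ?t3: the list of tails seen by the hole,
-- paired with the output produced by the reference program.
type Example = (NList, Int)

examples :: [[Int]] -> [Example]
examples inputs =
  [ (ts, maximum (map sum ts)) | xs <- inputs, let ts = tails xs ]

-- The eliminated specification as a check: whenever two examples have
-- equal attribute values, their outputs must be equal as well.
consistent :: Eq a => (NList -> a) -> [Example] -> Bool
consistent attr es =
  and [ o == o' | (ts, o) <- es, (ts', o') <- es, attr ts == attr ts' ]

-- Hypothetical candidate pool, ordered from smaller to larger.
candidates :: [(String, NList -> Int)]
candidates =
  [ ("length ts",            length)
  , ("sum (head ts)",        sum . head)
  , ("sum (map sum ts)",     sum . map sum)
  , ("maximum (map sum ts)", maximum . map sum)
  ]

-- First candidate that passes the check on the given executions.
firstConsistent :: [[Int]] -> Maybe String
firstConsistent inputs =
  case [ n | (n, attr) <- candidates, consistent attr (examples inputs) ] of
    (n : _) -> Just n
    []      -> Nothing
```

With the executions [], [1], [2], [2, -1], and [-1, 2], the three smaller candidates are each rejected by some pair of examples and firstConsistent returns Just "maximum (map sum ts)". With the single execution [2], every candidate passes, which illustrates why enough executions are needed.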
A careful reader might object that Eq. 4 is only equivalent to Eq. 2 when the space of ?t3 can implement all possible functions, whereas in our case ?t3 is restricted to the space of O(1)-time programs. In this case, however, it is reasonable to assume that the O(1)-time requirement imposes no extra restrictions on the space of ?t3, because both its inputs and outputs are scalar values, and most scalar calculations can be done in O(1) time.
Iteration 2. The specification Eq. 3 for ?compress2 is a little more involved. The main difference is that we cannot eliminate ?t2 using the same reasoning as above, because ?t2 no longer operates only on scalar values: it takes the list xs as its first input, and many list-processing functions do not have any O(1)-time implementation.
Fortunately, this issue can be addressed by further decomposing ?t2. Specifically, since ?t2 is an O(1)-time program, it can only access a constant number of scalar values in the input. Therefore, without loss of generality, we can assume that ?t2 has the form:
    ?t2 (xs, ts) = ?comb (?extract (xs, ts))
where ?extract extracts a tuple of scalar values from the input variables and ?comb combines them into the final result. With this decomposition in place, we can now use our previous technique to eliminate ?comb (which operates on scalars), and then enumerate ?extract jointly with ?compress2 (note that ?extract is small because it does not perform any actual computation), resulting in:
    ?extract (xs, ts) = (ts.1, ts.2, head xs)
    ?compress2 ts = sum (head ts)
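The decomposition of ?t2 can be sketched in Haskell as follows. The definition of extract mirrors the ?extract above; the body of comb and the recursive shape of tails' are assumptions chosen to be consistent with the mts intuition from the introduction, since ?comb is eliminated rather than shown in the text.

```haskell
-- Compressed list of tails: (maximum tail sum, sum of list elements).
type Attrs = (Int, Int)

-- ?extract from the text: selects ts.1, ts.2 and head xs, and performs
-- no computation of its own.
extract :: ([Int], Attrs) -> (Int, Int, Int)
extract (xs, ts) = (fst ts, snd ts, head xs)

-- A plausible ?comb (assumption): combines the extracted scalars in
-- constant time into the attributes of the whole list.
comb :: (Int, Int, Int) -> Attrs
comb (m, s, h) = (max m (h + s), h + s)

-- Hole ?t2 as the composition of the two components.
t2 :: ([Int], Attrs) -> Attrs
t2 = comb . extract

-- Optimized tails with the recursive case filled by ?t2
-- (assumed sketch shape; xs is the whole non-empty list).
tails' :: [Int] -> Attrs
tails' []         = (0, 0)
tails' xs@(_ : t) = t2 (xs, tails' t)
```

For example, t2 ([2], (0, 0)) evaluates to (2, 2), and tails' [1, -2, 3, -1, 2] evaluates to (4, 3).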

SKETCH GENERATION
Given a reference program whose intermediate data structures have been annotated with Packed, SuFu's first step is to translate this program into a sketch by replacing some of its subterms with holes. Intuitively, sketch holes must satisfy the following requirements:
(1) Correctness: the holes must cover all subterms that directly produce or consume intermediate data structures because these data structures must be replaced with scalar attributes.
(2) Feasibility: the output of each hole must be scalar; otherwise, this hole's solution will need to reconstruct a non-scalar data structure from scalar attributes, which is generally impossible.
(3) Minimality: the subterms covered by holes should be as small as possible because the larger they are, the more SuFu needs to synthesize, and thus the harder the synthesis task will be.
To formalize these requirements, we design an intermediate language, dubbed λsk, which makes the scope of the holes and all uses of intermediate data structures explicit. Importantly, any well-typed λsk program corresponds to a sketch that is both correct and feasible. Sketch generation is then reduced to the problem of elaborating the reference program into a well-typed intermediate representation, while minimizing the scope of the holes.

Intermediate Language
The syntax of λsk is shown in Fig. 2. The language is a simply typed lambda calculus with inductive data types, which we augment with the following annotations:
• At the type level, we introduce Packed annotations, which are treated as a type constructor: Packed T is a new type, different from T. In addition, λsk treats Packed T as a scalar type because, intuitively, users annotate data structures as Packed to turn them into scalars.
• At the term level, we introduce three annotations (label, unlabel, and rewrite), which are added automatically during elaboration. label and unlabel are the constructor and destructor for Packed values; for example, label [1, 2] is of type Packed List and unlabel (label [1, 2]) evaluates to [1, 2]. These constructs make the production and consumption of intermediate data structures explicit. Finally, the rewrite annotation marks the scope of a hole; rewrite t has the same type and behavior as t, but is treated as a hole to be replaced with a constant-time term.
The typing rules of λsk are standard apart from those for annotations, shown in Fig. 3. Note that the typing judgment Γ ⊢s t : T is parameterized by a scope s ∈ {in, out}, which keeps track of whether the current term t is inside a rewrite. The rules T-Label and T-Unlabel only allow producing and consuming intermediate data structures inside a rewrite, thereby enforcing the correctness requirement above. The rule T-Rewrite restricts the argument to rewrite to have a scalar type, thereby enforcing the feasibility requirement.
Fig. 3. Typing rules for annotations in λsk, with judgment Γ ⊢s t : T for scope s ∈ {in, out}. The scope s represents whether the current term is inside a rewrite.

Example 3.1. Terms (A)-(D): four candidate elaborations of mts.
Term A inserts an unlabel to provide the correct argument type to map sum; it is still ill-typed, however, because unlabel is not inside a rewrite. Intuitively, this term violates the correctness requirement: the invocation of tails produces an intermediate data structure, but is not covered by a hole. Term B does cover the unlabel with a rewrite, but it is nevertheless ill-typed because the argument type of its rewrite is a List instead of a scalar type. Intuitively, this term violates the feasibility requirement: if we tried to synthesize a solution for this hole, we would need to reconstruct the full list of tail sums from only scalar attributes of the input list, which is impossible. Finally, terms C and D are both well-typed elaborations of mts; the difference is that term D moves the invocation of tails into a let binding and out of the rewrite, thereby reducing the scope of the hole and allowing the synthesizer to reuse the recursion structure of the original program.
Algorithm 1: Sketch solving. Input: a sketch p and inputs I.
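The scoping discipline behind this example can be mimicked by a toy checker, sketched below in Haskell. Everything here is an assumption made for illustration: the encoding of types and terms, the monomorphic environment (mapSum stands for map sum), the rule that a rewrite may only be opened from the outer scope, and the four terms, which are reconstructed from the prose descriptions of terms A to D rather than copied from the original example.

```haskell
data Ty = TInt | TList Ty | TPacked Ty | TFun Ty Ty
  deriving (Eq, Show)

-- Whether the current term is inside a rewrite.
data Scope = In | Out
  deriving (Eq, Show)

data Term
  = Var String
  | App Term Term
  | Let String Term Term
  | Label Term
  | Unlabel Term
  | Rewrite Term

type Env = [(String, Ty)]

-- Packed types count as scalar, as in the text.
isScalar :: Ty -> Bool
isScalar TInt        = True
isScalar (TPacked _) = True
isScalar _           = False

-- Returns Nothing for ill-typed terms.
check :: Scope -> Env -> Term -> Maybe Ty
check _ env (Var x) = lookup x env
check s env (App f a) = do
  TFun p r <- check s env f
  ta       <- check s env a
  if ta == p then Just r else Nothing
check s env (Let x e b) = do
  tx <- check s env e
  check s ((x, tx) : env) b
-- Label and unlabel are only allowed inside a rewrite.
check In env (Label t)   = fmap TPacked (check In env t)
check In env (Unlabel t) = do
  TPacked ty <- check In env t
  Just ty
-- The argument of rewrite must have a scalar type.
check Out env (Rewrite t) = do
  ty <- check In env t
  if isScalar ty then Just ty else Nothing
check _ _ _ = Nothing

nlist :: Ty
nlist = TList (TList TInt)

env0 :: Env
env0 =
  [ ("xs",      TList TInt)
  , ("tails",   TFun (TList TInt) (TPacked nlist))
  , ("mapSum",  TFun nlist (TList TInt))
  , ("maximum", TFun (TList TInt) TInt)
  ]

call :: String -> Term -> Term
call f = App (Var f)

termA, termB, termC, termD :: Term
termA = call "maximum" (call "mapSum" (Unlabel (call "tails" (Var "xs"))))
termB = call "maximum" (Rewrite (call "mapSum" (Unlabel (call "tails" (Var "xs")))))
termC = Rewrite (call "maximum" (call "mapSum" (Unlabel (call "tails" (Var "xs")))))
termD = Let "ts" (call "tails" (Var "xs"))
          (Rewrite (call "maximum" (call "mapSum" (Unlabel (Var "ts")))))
```

Checking the four terms in the outer scope with check Out env0 rejects termA and termB and accepts termC and termD at type TInt, matching the discussion above.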

Generating Minimal Annotations
With the intermediate language in place, we can now formalize the task of sketch generation:
Definition 3.2 (Sketch Generation). Given a reference program p in λsk (potentially ill-typed due to Packed annotations), find a sketch, i.e. a program p' in λsk, such that:
(1) p' can be obtained from p by adding label, unlabel, and rewrite annotations, or extracting subterms into let-bindings;
(2) p' is well-typed;
(3) the total size of arguments to rewrite in p' is minimized.
Note that the IO behavior of the sketch is always the same as that of the reference program, because all the transformations performed during sketch generation are semantics-preserving.
SuFu reduces the sketch generation problem to a MaxSAT instance and solves it using an off-the-shelf constraint solver (Z3 [de Moura and Bjørner 2008] in our implementation). The encoding is straightforward: SuFu encodes the search space (condition 1 above) as a symbolic program, where the choice to insert an annotation or a let-binding for each subterm is controlled by a Boolean variable; it encodes the typing (condition 2) as a hard constraint by symbolically executing the type checker on the symbolic program; finally, it encodes the minimality requirement (condition 3) as a soft constraint by symbolically executing the objective function.
Example 3.3. All four elaborations of mts in Example 3.1 satisfy condition (1) of Def. 3.2, but only terms C and D are well-typed, and only term D is also minimal. The resulting sketch (i.e. the well-typed and minimal elaboration) for the full mts example from Fig. 1a is shown in Fig. 4.

SKETCH SOLVING
In this section, we introduce the sketch solving approach in SuFu. We start with the problem definition (Sec. 4.1) and a high-level overview of the approach (Sec. 4.2), and then delve into the details in the following subsections.

Synthesis Problem
A solution for a sketch in λsk consists of (1) a list of scalar surface-level types to replace Packed T types, and (2) a list of constant-time surface-level terms to replace holes, i.e. rewrite t terms. The restriction to surface level simply means that the solution cannot use any annotations, and the restriction to scalar types and constant-time terms ensures that the solution is efficient.
Definition 4.1 (Sketch Solution). A solution for a sketch p in λsk is a pair (T_i, t_h) of a list of types T_i and a list of terms t_h satisfying the following conditions.
• T_i contains only scalar types in the surface language.
• t_h contains only constant-time terms in the surface language, or formally, terms that can be evaluated within a fixed number of steps under any well-typed context.
• The lengths of T_i and t_h are equal to the number of Packed annotations and the number of rewrite annotations, respectively.
Given a solution (T_i, t_h), we obtain a synthesized program from the sketch p by replacing each Packed T with the corresponding type in T_i and each rewrite t with the corresponding term in t_h.
Example 4.2. For the sketch mts_label in Fig. 4, the desired solution is to take the singleton list [Int × Int] as T_i and the three highlighted terms in Fig. 1c as t_h.
Our synthesis problem then is to find a sketch solution such that the synthesized program has the same IO behavior as the sketch (and hence, as the reference program it was generated from).
Definition 4.3 (Sketching Problem). Given a sketch p in language λsk and a finite set I of inputs, the sketching problem is to find a solution of p such that the synthesized program has the same output as p on every input in I.
For simplicity, we consider only a finite set of inputs in this definition.In our implementation, we incorporate the CEGIS framework [Solar-Lezama 2009] to reduce the general case of an infinite input space to a finite set of representative inputs.
In the remainder of this section, we focus our discussion on a special case where Packed is used only once (and hence, |T_i| = 1). Our approach extends straightforwardly to the general case with multiple Packed annotations by synthesizing a separate compression function for each intermediate data structure. Details on this extension can be found in the appendix [Ji et al. 2024a]. The core of the sketch-solving algorithm (Algorithm 1) consists of the functions CollectExamples and CompressSynthesis, which we introduce in order in the next two subsections (Sec. 4.3 and Sec. 4.4).

Example Collection
We use big-step environment semantics [Dikotter 1990] of λsk to formalize the collection of local examples. The evaluation in this semantics follows strictly along the syntax, hence local examples can be directly constructed from the results of evaluating rewrite terms.
Example 4.4. The following shows a part of the derivation for evaluating mts_label [2, −1]. The judgment has the form E ⊢ t ⇓ v, where E is an environment assigning values to free variables, t is the term to be evaluated, and v is the evaluation result.
    (xs → [2, −1]) ⊢ let ts = tails xs in rewrite (maximum (map sum (unlabel ts))) ⇓ 1
The right branch of this derivation evaluates the third rewrite term in mts_label, corresponding to hole ?t3. This term evaluates to 1 under the environment (xs → [2, −1], ts → label [[2, −1], [−1], []]). As you can see, the values collected from a sketch evaluation can contain the label constructor (in this case, in the input, but generally also in the output); these label v values correspond to intermediate data structures that must be "compressed" into scalar attributes. Hence, we replace label with the unknown function ?compress to obtain a symbolic local example for a hole; for instance, ⟨(xs → [2, −1], ts → ?compress [[2, −1], [−1], []]), 1⟩ is a symbolic local example for ?t3 that corresponds to the sketch evaluation above and is added to E_sym[?t3] by CollectExamples.
To convert symbolic examples to concrete examples, Algorithm 1 uses Subst(E_sym, compress), which substitutes the concrete compression function compress for the variable ?compress, and then β-reduces the resulting term. For example, invoking Subst on the symbolic example above with compress = λx. sum (head x) yields the concrete example ⟨(xs → [2, −1], ts → 1), 1⟩.
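The effect of Subst on a single example can be sketched in Haskell as follows. The data types SymVal and ConVal and the function subst are hypothetical encodings chosen for this illustration; they only cover environments whose values are integer lists or labeled nested lists, as in the running example.

```haskell
-- A value in a symbolic example: either a plain list, or an
-- intermediate data structure wrapped by the label constructor.
data SymVal = Plain [Int] | Label [[Int]]
  deriving (Eq, Show)

-- A value in a concrete example: labeled structures are replaced by
-- their compressed attribute of type a.
data ConVal a = PlainC [Int] | Attr a
  deriving (Eq, Show)

type SymExample   = ([(String, SymVal)], Int)
type ConExample a = ([(String, ConVal a)], Int)

-- Substitute a concrete compression function for ?compress and
-- evaluate it on every labeled value of the input environment.
subst :: ([[Int]] -> a) -> SymExample -> ConExample a
subst compress (env, out) = ([ (x, go v) | (x, v) <- env ], out)
  where
    go (Plain v) = PlainC v
    go (Label v) = Attr (compress v)

-- The symbolic example for ?t3 from the evaluation of mts_label [2, -1].
symT3 :: SymExample
symT3 = ([("xs", Plain [2, -1]), ("ts", Label [[2, -1], [-1], []])], 1)
```

Evaluating subst (sum . head) symT3 yields ([("xs", PlainC [2, -1]), ("ts", Attr 1)], 1), which corresponds to the concrete example given above.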

Synthesizing the Compression Function
Given the symbolic local examples E_sym, our task is to pick an implementation for ?compress such that each sketch hole is realizable in constant time, or more formally:
    $\exists\, ?compress.\; \bigwedge_{i=1}^{n} \rho(?t_i)$, where $\rho(h) \equiv \exists\, h \in \mathcal{L}^{scalar}_{O(1)}.\; \forall \langle in, out \rangle \in E_{sym}[h].\; h(in) = out$
Here n is the number of sketch holes and $\mathcal{L}^{scalar}_{O(1)}$ is the space of constant-time scalar-typed terms in the surface language. We will refer to the constraint ρ(h) as the realizability constraint for hole h.
Example 4.5. Assume that we have collected a series of local symbolic examples for the hole ?t3, including the one from Example 4.4. Then the realizability constraint ρ(?t3) for this hole is:
    $\exists\, ?t_3 \in \mathcal{L}^{scalar}_{O(1)}.\; ?t_3\,(xs \mapsto [2, -1],\; ts \mapsto ?compress\ [[2, -1], [-1], []]) = 1 \;\wedge\; \ldots$ (other examples)
If we use a trivial compression function that does not extract any attributes, i.e. ?compress → λx. (), this constraint will be unrealizable since we cannot compute the mts in constant time just from the input list. However, if we add the attribute of the maximum tail sum, i.e. ?compress → λx. (maximum (map sum x)), this constraint becomes realizable with ?t3 simply returning ts.
SuFu solves realizability constraints by iteratively refining ?compress with new attributes, as shown in Algorithm 2. This algorithm maintains the current tuple of attributes A. Every attribute α ∈ A is a term with a free variable x that denotes the intermediate data structure, so that λx. A is a valid compression function; for example, we might have:
    A = (maximum (map sum x), sum (head x))
to denote that the current compression function computes two attributes from the list of tails x: the maximum tail sum and the sum of list elements, referred to hereafter as mts and sum, respectively.
The tuple A starts out empty and is iteratively extended with new attributes found by Refine. Refine iterates through each hole h, synthesizing additional attributes A_h required to make this hole realizable under the current concrete examples E[h]; the heavy lifting here is done by the function SolveSingleHole, which will be introduced in Sec. 4.5. In Line 12, Refine returns the union of all new attributes modulo observational equivalence (two attributes are observationally equivalent if their values are the same for all intermediate data structures in E_sym). If Refine did not synthesize any new attributes, then all the holes are already realizable with the current compression function, and the algorithm terminates, returning λx. A (Line 5). Otherwise, SuFu adds the new attributes to A and continues.

Iteration 1. We start with A = (); the corresponding concrete local examples are shown in Row 1 of Tab. 3. At this point, hole ?t_3, whose output is a surface-language scalar, is unrealizable for the reasons explained in Example 4.5. On the other hand, hole ?t_2, whose output is an intermediate data structure, is trivially realizable, since it simply returns unit; hence, Refine can safely ignore ?t_2 in this iteration (denoted by its examples being grayed out in the table). Based on the examples for ?t_3, Refine will discover that the attribute mts is necessary to make this hole realizable, resulting in A = (mts) after the first iteration.
Iteration 2. Since the first iteration synthesized a new attribute mts, we need to call Refine again with the new concrete examples (shown in Row 2 of Tab. 3) to check whether the new attribute caused any holes to become unrealizable. This is indeed the case for hole ?t_2: this hole now must compute the mts of the full list from only the mts of the tail (and the whole input list), which is impossible in constant time. Note that the hole ?t_3 is ignored in this iteration because its output did not change, so we already know it is realizable. In this round, Refine will discover a new attribute sum (the sum of list elements) for hole ?t_2, thus updating the tuple A to A = (mts, sum).

Iteration 3. Since a new attribute sum was added, SuFu invokes Refine once again on the new concrete examples (Row 3 of Tab. 3). This invocation can still ignore the hole ?t_3, and additionally can ignore the old attribute mts in the output of ?t_2, because previous iterations already ensure that it can be calculated in constant time. Hence, Refine focuses on the new output of ?t_2, checking whether it is realizable; since this is the case, no new attributes are added, and the algorithm terminates.

Synthesizing Attributes for a Single Sketch Hole
Algorithm 2 invokes the function SolveSingleHole to discover missing attributes for a hole h based on the concrete examples E[h] for that hole. In other words, this function needs to synthesize an additional compression function ?compress′ such that the examples E[h] become realizable after extending their inputs with the new attributes specified by ?compress′ (writing E for E[h]):

\[ \exists\, ?t_h \in L^{scalar}_{O(1)},\ \forall \langle in, out \rangle \in E,\ ?t_h\,(in,\ ?compress'\ in^{orig}_i) = out \tag{6} \]

Here in^orig_i denotes the i-th intermediate data structure in the original input, which has been compressed into scalar attributes in the current input in.
Example 4.7. When SolveSingleHole is invoked on ?t_2 in the second iteration of Algorithm 2, the specification of ?compress′ is as follows (with the example in Row 2 of Tab. 3 shown explicitly): (… examples)

It is challenging to solve the specification in Eq. 6 because it involves the sketch hole ?t_h, which is too complex to be synthesized together with ?compress′. To overcome this challenge, we transform this specification in two steps to eliminate (most of) the sketch hole.
First, since ?t_h is limited to constant time, it can only access a constant number of scalar values in the input. Hence, it can be decomposed as ?t_h in = ?comb (?extract in), where ?extract extracts a tuple of scalar values and ?comb performs the calculation using the extracted values. With this decomposition, Eq. 6 can be rewritten as follows:

\[ \exists\, ?comb \in L^{scalar}_{O(1)},\ \forall \langle in, out \rangle \in E,\ ?comb\,(?extract\ in,\ ?compress'\ in^{orig}_i) = out \]
Second, since most common scalar calculations can be accomplished efficiently, we assume that our program space L^scalar_{O(1)} is expressive enough to implement all possible scalar functions.

Assumption 4.8. For any function f whose input and output are both scalar values, there always exists a constant-time term with the same semantics, i.e., ∃t ∈ L^scalar_{O(1)}, ∀in, f in = t in.

We then use a quantifier elimination procedure to remove ?comb from the specification. Specifically, by Asm. 4.8, we can regard ?comb as an uninterpreted function and thus apply Ackermann's reduction [Lewis 1978] to ?comb. This reduction eliminates an uninterpreted function using its congruence property (i.e., identical inputs imply identical outputs); here, the result of the reduction is an equivalent specification that no longer involves ?comb:

\[ \forall \langle in, out \rangle, \langle in', out' \rangle \in E,\ \big(?extract\ in = ?extract\ in' \,\wedge\, ?compress'\ in^{orig}_i = ?compress'\ {in'}^{orig}_i\big) \rightarrow out = out' \tag{7} \]
SuFu treats Eq. 7 as a joint specification for ?compress′ and ?extract and synthesizes them simultaneously by enumeration. Given a program space for ?compress′, it enumerates all pairs of candidate programs for ?compress′ and ?extract in increasing order of total size, until a pair satisfying Eq. 7 is found. This enumeration method, although straightforward, is effective in practice because in most cases the target ?compress′ and ?extract are both small: ?compress′ can be constructed compactly from library functions since its efficiency is not important, while ?extract only needs to access scalar values in the input, as opposed to performing complex scalar calculations.
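The following Python sketch illustrates this size-ordered enumeration on the running example, for the hole that extends the list by one element when the current attribute tuple is A = (mts). The candidate spaces `COMPRESS` and `EXTRACT`, their sizes, the example generator, and the convention that `tails` includes the empty suffix are all hypothetical choices made for illustration; SuFu's real program spaces are far larger.

```python
from itertools import product


def tails(xs):
    # All suffixes of xs, including the empty one (an assumption of this sketch).
    return [xs[i:] for i in range(len(xs) + 1)]


def mts(xs):
    # Maximum tail sum of xs.
    return max(sum(t) for t in tails(xs))


def make_examples(max_len=3, lo=-2, hi=2):
    # One example per non-empty list h:t, as a triple (in, in_orig, out):
    #   in      = (h, mts t)  scalar input under the current attributes A = (mts)
    #   in_orig = tails t     the original intermediate data structure
    #   out     = mts (h:t)   the scalar the hole has to produce
    examples = []
    for n in range(1, max_len + 1):
        for xs in product(range(lo, hi + 1), repeat=n):
            h, t = xs[0], list(xs[1:])
            examples.append(((h, mts(t)), tails(t), mts(list(xs))))
    return examples


# Hypothetical candidate spaces: (name, size, function).
COMPRESS = [
    ("()", 1, lambda ts: ()),
    ("length (head ts)", 3, lambda ts: len(ts[0])),
    ("sum (head ts)", 3, lambda ts: sum(ts[0])),
    ("minimum (map sum ts)", 4, lambda ts: min(sum(t) for t in ts)),
]
EXTRACT = [
    ("()", 1, lambda i: ()),
    ("h", 1, lambda i: (i[0],)),
    ("mts", 1, lambda i: (i[1],)),
    ("(h, mts)", 3, lambda i: i),
]


def congruent(extract, compress, examples):
    # Congruence check in the style of Eq. 7: equal (extracted scalars,
    # new attributes) must imply equal outputs.
    seen = {}
    for inp, orig, out in examples:
        key = (extract(inp), compress(orig))
        if seen.setdefault(key, out) != out:
            return False
    return True


def solve(examples):
    # Enumerate (extract, compress') pairs in increasing order of total size.
    pairs = sorted(product(EXTRACT, COMPRESS), key=lambda p: p[0][1] + p[1][1])
    for (e_name, _, e), (c_name, _, c) in pairs:
        if congruent(e, c, examples):
            return e_name, c_name
    return None


result = solve(make_examples())
```

On these examples the first satisfying pair extracts `(h, mts)` and adds the attribute `sum (head ts)`, which agrees with the attribute sum discovered in Iteration 2 of the walkthrough above.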

Properties
Soundness. SuFu ensures that the synthesized program is observationally equivalent to the sketch, and hence to the reference program.

Theorem 4.9 (Soundness). For any sketching problem with sketch p and input set I, the synthesized program has the same output as p on set I, if the underlying PBE solver for sketch holes is sound.

Proof. Due to the space limit, we defer the proofs to the appendix [Ji et al. 2024a]. □

Completeness. The completeness of SuFu depends on the program space of compression functions. Since local examples are collected from the evaluation of rewrite terms, there is always enough information in the inputs to calculate the scalar outputs and any required scalar attributes. Consequently, when the program space of ?compress is expressive enough, SuFu can always find an appropriate ?compress, and with it, a solution to the sketching problem.
Theorem 4.10 (Completeness).For any sketching problem, SuFu can find a solution if Asm. 4.8 holds, the underlying PBE solver for sketch holes is complete, and the program space of ?compress can implement any function mapping from intermediate data structures to scalar values.
Efficiency of synthesized programs. SuFu replaces zero or more subterms in the reference program with constant-time terms. Hence, SuFu ensures that its synthesized program cannot have a higher asymptotic complexity than the reference.

Theorem 4.11 (Efficiency). Let cost(p, in) be the size of the derivation tree of evaluating program p on input in. For any sketching problem, the following formula is always satisfied by the reference program p and the program p′ synthesized by SuFu.
In practice, the synthesized program is usually strictly more efficient than the reference because the rewrite terms typically involve non-constant-time operators on intermediate data structures.

APPLICATIONS TO PROGRAM RESTRUCTURING
In this section, we consider several lines of prior work on synthesizing efficient recursive programs, which we collectively refer to as program restructuring. Program restructuring aims to rewrite a reference program to follow a given efficient template, such as divide-and-conquer [Farzan and Nicolet 2021b; Ji et al. 2024b; Morita et al. 2007] or single-pass recursion [Farzan et al. 2022; Pu et al. 2011]. We show how this task can be reduced to a superfusion problem and solved by SuFu.

Let us illustrate this reduction taking divide-and-conquer (D&C) as an example. For programs over lists, D&C suggests dividing the input list into two halves, recursively computing the result for each half, and then combining the two results. Fig. 5a shows a D&C program for our running example, mts. To combine the mts of the two halves, this program introduces the sum of list elements as an auxiliary output. The intuition here is that the mts of the whole list is either the mts of the right half or the sum of elements in the right half plus the mts of the left half. As we can see, D&C restructuring is a challenging task: not only do we need to determine how to combine the recursive results, but often we also need to discover auxiliary outputs.
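The combination rule described above can be sketched in Python as follows. This is an illustrative transcription rather than the program of Fig. 5a: the names `mts_ref` and `mts_dac` are hypothetical, and the sketch assumes that the empty tail counts as a tail (so the result is never negative).

```python
def mts_ref(xs):
    # Reference: maximum over the sums of all tails
    # (assumed here to include the empty tail, whose sum is 0).
    return max(sum(xs[i:]) for i in range(len(xs) + 1))


def mts_dac(xs):
    # Divide-and-conquer version returning the pair (mts, sum);
    # sum is the auxiliary output needed by the combine step.
    if len(xs) == 0:
        return (0, 0)
    if len(xs) == 1:
        return (max(xs[0], 0), xs[0])
    mid = len(xs) // 2
    mts_l, sum_l = mts_dac(xs[:mid])
    mts_r, sum_r = mts_dac(xs[mid:])
    # The best tail either lies entirely in the right half, or it is the
    # whole right half preceded by the best tail of the left half.
    return (max(mts_r, sum_r + mts_l), sum_l + sum_r)
```

For instance, `mts_dac([2, -1])` returns `(1, 1)`. Python list slicing copies the halves, so this sketch shows the recursive structure only and does not reflect the cost model of the paper.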
To reduce this problem to superfusion, we introduce a template program dac_id, shown in Fig. 5b.A careful reader will see that dac_id is simply the identity function with the recursive structure of a D&C program: splitting the input list into two halves, only to put them back together again.
Since the template program is the identity, we can compose it with our original reference program mts from Fig. 1a to obtain a new program dac_mts, shown in Fig. 5b, which behaves equivalently to mts. In this new program, the template dac_id produces an intermediate data structure, which we can annotate with Packed, and apply SuFu to eliminate it. This will have the effect of forcing SuFu to use the recursive structure of dac_id (that is, divide-and-conquer) to compute the result of mts. In fact, the result of applying SuFu to dac_mts is precisely the D&C program in Fig. 5a.
This approach can be generalized to other templates, beyond D&C.To this end, the first step is to implement a template function that returns the input unchanged, but follows the desired recursive structure; the second step is to compose the template function with the reference program, and finally, we apply SuFu to eliminate the intermediate data structure produced by the template function, thus getting a synthesized program in the target form.Another example of applying this approach can be found in the appendix [Ji et al. 2024a], which is for the template of single-pass recursion.Please note that the template and the Packed annotations need only be written once for each target form, since they are independent of the reference program.
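To make the first two steps of this recipe concrete, here is a small Python sketch of an identity template with a divide-and-conquer shape, composed with a reference program. The function bodies are illustrative assumptions and are not copied from Fig. 5b; in particular, the sketch assumes that tails include the empty tail.

```python
def dac_id(xs):
    # Identity on lists, written with divide-and-conquer recursion:
    # split the list into halves, recurse, and concatenate the results.
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    return dac_id(xs[:mid]) + dac_id(xs[mid:])


def mts(xs):
    # Reference program: maximum tail sum (empty tail included by assumption).
    return max(sum(xs[i:]) for i in range(len(xs) + 1))


def dac_mts(xs):
    # Composition of the reference with the template. The list returned by
    # dac_id is the intermediate data structure that superfusion would
    # eliminate (the role played by the Packed annotation in the paper).
    return mts(dac_id(xs))
```

Because `dac_id` is the identity, `dac_mts` and `mts` agree on every input; the composition only exposes the recursive structure that the fused program is expected to follow.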

IMPLEMENTATION
Our implementation of SuFu is in C++ and is available online [Ji et al. 2024a].
Program spaces.SuFu is parameterized by two program spaces: one for sketch holes and one for compression functions.We construct these two spaces as follows.
• The program space of sketch holes (i.e., L^scalar_{O(1)}) is based on the SyGuS theory of conditional linear integer arithmetic [Alur et al. 2017a], which includes the if-then-else operator, arithmetic operators such as +, relational operators such as ≤, and Boolean operators such as and. In addition, to operate on tuples and inductive data types, we augment this program space with tuple constructors and projections, and the pattern-match operator on inductive data types (e.g., match ? with Nil -> ? | Cons(h, t) -> ? for lists).
• The program space of compression functions includes all functions defined in the reference program, the DeepCoder library [Balog et al. 2017] of list functions, and the fold operators for all involved inductive data types. These fold operators enable SuFu to synthesize recursive compression functions not present in the reference program.
Verification. SuFu is based on the CEGIS framework and requires an external verifier to generate counterexamples. Following previous studies on synthesizing recursive programs [Miltner et al. 2022; Solar-Lezama et al. 2006; Torlak and Bodík 2014], we use bounded verification in our implementation (details in the appendix [Ji et al. 2024a]). To reduce the overhead, SuFu initializes the example set of CEGIS with 10^3 random examples. These examples can exclude the majority of the incorrect results and thus greatly reduce the number of invocations of the bounded verifier.

EVALUATION
We design our evaluation to answer the following research questions.
• RQ1: How effective is SuFu in eliminating intermediate data structures?
• RQ2: How does SuFu compare to specialized program restructuring tools?
• RQ3: How does SuFu compare to a general-purpose sketch solver?

Experimental Setup
Baseline solvers. We compare SuFu with three inductive synthesizers. The first two are specialized solvers for program restructuring tasks that can be reduced to fusion:
• Synduce [Farzan et al. 2022] restructures recursive data-structure traversals according to a user-provided sketch (with the goal of making the traversals more efficient); Synduce uses a whitebox technique based on program unfolding, and requires both the reference and the target programs to traverse the data structure at most once.
• AutoLifter [Ji et al. 2024b] restructures programs into the divide-and-conquer (D&C) paradigm; it leverages domain-specific properties of D&C to decompose the synthesis problem.
SuFu is strictly more general than these two baselines and can be applied to all tasks in their domains. We have explained the reduction from D&C restructuring to fusion in Sec. 5; the reduction from the domain of Synduce to fusion can be found in the appendix [Ji et al. 2024a].
Our third baseline is Grisette [Lu and Bodík 2023], the most recent sketch solver at the time of writing. For this comparison, we first generate a sketch using the generation approach of SuFu (Sec. 3) and then solve this sketch using either SuFu (Algorithm 1) or Grisette.
Note that we do not perform an empirical comparison with traditional deductive fusion systems: to our knowledge, no automated fusion tool (including the most recent work [Hinze et al. 2010]) can synthesize auxiliary attributes (such as the sum of list elements in our mts example), which are essential for many tasks in our dataset.
Dataset. We collect a dataset of 290 tasks from three different sources, each task specified by a reference program with some data types annotated as Packed.
• Fusion. We collect 16 tasks from fusion literature [Bird 1989; Bird and de Moor 1997; Gill et al. 1993; Hu et al. 1997; Wadler 1988]; 8 of them come from work on manual optimization [Bird 1989; Bird and de Moor 1997], which cannot be straightforwardly automated.
• Recursion. We include all 178 tasks from the original dataset of Synduce. In the process, we discovered that 60 of these tasks include manually provided auxiliary attributes, which presumably were added due to the limitations of Synduce. For example, in the mts task, the reference program returns not only the maximum tail sum but also the sum of list elements. We remove these auxiliary attributes when constructing our dataset.
• Divide and Conquer (D&C). We include all 96 tasks from the original dataset of AutoLifter.
We list some statistics of our dataset in Tab. 4. Note that tasks from different sources present different challenges.For example, Fusion tasks often require eliminating multiple data structures at once, while Recursion tasks require eliminating 35 different data types, including lists, trees, zippers, natural numbers, and expression ASTs.D&C tasks are the most challenging, both in terms of requiring auxiliary attributes and the program size.Overall, about half of the tasks in our dataset require auxiliary attributes, and hence are out of scope for deductive fusion systems.
Our experiments are conducted on an Intel Core i7-8700 3.2 GHz 6-core processor, with a timeout of 10 minutes per task. Our full dataset and results are available in the supplementary material.

RQ1: Overall Effectiveness
We summarize the performance of SuFu in Tab. 5. Overall, SuFu successfully solves 264 out of 290 tasks in the dataset (91%), taking around 24 seconds on average, and many of them (115 out of 264) require auxiliary attributes. Note that although SuFu uses MaxSAT for sketch generation, in practice this step takes only fractions of a second, and the run time is dominated by sketch solving. In terms of the size of the synthesized expressions, Tab. 5 confirms our key hypothesis that the ghost function ?compress is much smaller than the solutions to sketch holes, and hence the extra work of synthesizing ?compress (as well as ?extract) pays off by decomposing the synthesis problem.

Tab. 6. Task descriptions of the two sample programs synthesized by SuFu (program fragments not reproduced here).
• Synthesize a D&C program that calculates the maximum tail product of a (possibly negative) integer list.
• Given two languages L_1 and L_2 of Boolean expressions, synthesize the interpreter of L_1 from the interpreter of L_2.
Quality of synthesized programs. We manually examined all synthesized programs and confirmed that they are functionally equivalent to the reference implementations on the entire input domain (though SuFu only uses bounded verification). In terms of efficiency, for Recursion and D&C tasks, we have a theoretical guarantee that SuFu's results have the same asymptotic complexity as those synthesized by Synduce and AutoLifter, respectively; this guarantee is a direct consequence of our restriction on the program space, that is, that SuFu always fills sketch holes with O(1)-time expressions. Finally, for the Fusion tasks, we manually inspected the 14 programs produced by SuFu and compared them to the fused implementations from the original papers; we confirmed that (1) 13/14 programs are the same as the original, and (2) the remaining program is strictly more efficient, as SuFu fuses a sum over the list [l, ..., r] into a constant-time expression (l + r)(r − l + 1)/2.
Sample programs. Tab. 6 shows fragments from two sample programs synthesized by SuFu. The first one illustrates the scalability of our tool: it presents a term synthesized for a single hole in a D&C task (the size is fairly typical for the tasks in this domain). The second one demonstrates the applicability of SuFu beyond lists: in particular, this task requires eliminating an AST.

RQ2: Comparison with Program Restructuring Tools
In this experiment, we run Synduce and AutoLifter on their original datasets, and compare their performance with SuFu. The results are shown in Tab. 7 and Fig. 6. Compared with Synduce, SuFu takes more time on simple tasks but can eventually solve more tasks. We find that SuFu has a clear advantage on tasks that require inventing auxiliary attributes. Although Synduce can do this in principle, its whitebox approach is too restrictive for some of the tasks; in comparison, SuFu uses a blackbox approach, which receives less guidance from the reference program, but is more flexible in terms of supported auxiliary attributes.
Compared with AutoLifter, SuFu is slower but eventually (after about five minutes) solves a similar number of tasks. It is not surprising that AutoLifter has an advantage on this domain: like SuFu, it is a blackbox inductive synthesizer, but it implements domain-specific optimizations for divide-and-conquer algorithms, making its synthesis task simpler. In summary, we observe that although SuFu is slower than specialized tools, it can solve a similar or higher number of tasks given enough time, while being strictly more general.

RQ3: Comparison with the Sketch Solver
In this experiment, we ablate SuFu's sketch solver, replacing it with the off-the-shelf sketch solver Grisette. The results are shown in Tab. 8 and Fig. 6. Overall, our synthesis algorithm significantly outperforms Grisette in both the number of solved tasks and the time cost. Predictably, SuFu's advantage is most pronounced on tasks that require synthesizing large expressions: for example, on D&C, where the average size of a sketch solution is a whopping 85.9 AST nodes, SuFu solves almost four times more tasks than Grisette and is more than four times faster on jointly solved tasks. This is because Grisette searches for all sketch holes simultaneously, while SuFu can effectively decompose the synthesis problem with the help of a (much smaller!) compression function.

Discussion
Failure analysis. SuFu fails to solve 26 out of 290 tasks in our dataset (Tab. 5). We identify three reasons for these failures. First, on 17 tasks, SuFu times out while searching for a valid compression function ?compress. Second, on 4 tasks, SuFu succeeds in finding the expected ?compress, but the underlying PBE solver times out synthesizing sketch holes, despite having local IO examples available. Finally, on the last 5 tasks, SuFu uses an unexpected ?compress whose sketch holes are more complex than necessary, making the PBE solver time out. Note that although our specification for ?compress (Eq. 5) ensures the existence of valid sketch holes, their size may vary significantly depending on the choice of ?compress. For example, one reference program in our dataset is sqrsum (upto n), where upto constructs a Packed list of numbers from 1 to n, and sqrsum sums their squares. On this task, SuFu synthesizes an empty ?compress because the integer n is enough to determine the output as n(n + 1)(2n + 1)/6, but this expression is too complex for the PBE solver to synthesize.
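The two closed forms mentioned in the evaluation, n(n + 1)(2n + 1)/6 for sqrsum (upto n) and (l + r)(r − l + 1)/2 for the sum over [l, ..., r], can be checked against their list-based definitions with a few lines of Python. The function names below are hypothetical and chosen only for this check.

```python
def sqrsum_upto(n):
    # List-based definition: build 1..n, then sum the squares.
    return sum(i * i for i in range(1, n + 1))


def sqrsum_closed(n):
    # Constant-time closed form n(n + 1)(2n + 1)/6, valid for n >= 0.
    return n * (n + 1) * (2 * n + 1) // 6


def range_sum(l, r):
    # List-based definition: sum over the list [l, ..., r], assuming l <= r.
    return sum(range(l, r + 1))


def range_sum_closed(l, r):
    # Constant-time closed form (l + r)(r - l + 1)/2; the product is always
    # even, so integer division is exact.
    return (l + r) * (r - l + 1) // 2
```

Both closed forms avoid constructing the list, which is the kind of result superfusion aims for; the first one also shows why a small ?compress does not guarantee a small hole solution.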
Verification.Perhaps the biggest limitation of our tool is that it performs only bounded verification.Although this is common in program synthesis with loops and recursion [Miltner et al. 2022;Solar-Lezama et al. 2006;Torlak and Bodík 2013], this makes SuFu unsuitable for applications where correctness is critical, such as compiler optimization.Fortunately, since SuFu can be combined with any verifier, it can automatically benefit from advances in verification technology.Effective verifiers already exist for some of the domains targeted by SuFu, such as structural recursion [K. et al. 2022] and D&C algorithms with a single-pass reference [Farzan and Nicolet 2017].
Moreover, we believe that the compression functions produced by SuFu as a by-product of synthesis can serve as useful lemmas that aid unbounded verification. For example, in the mts task, the compression function connects the second output of tails' to the sum of the list, i.e., ∀xs, sum xs = (tails' xs).2. This is a necessary lemma for proving the correctness of the synthesis result mts', and it is challenging for existing verifiers to conjecture this lemma out of thin air.

RELATED WORK
Fusion. Many deductive fusion systems have been designed to eliminate intermediate data structures. They iteratively apply predefined rules to rewrite the reference program toward a form with fewer intermediate data structures. Representatives of such systems include the fold/unfold framework for handling generic recursion [Chin 1992; Gill et al. 1993; Hamilton 2001; Wadler 1988] and the program calculation framework dealing with specific forms of recursive functions [Bird 1989; Bird and de Moor 1997; Hinze et al. 2010; Meijer et al. 1991; Takano and Meijer 1995].
Compared with these deductive systems, SuFu has both advantages and limitations:
• On the one hand, deductive fusion is correct by construction, generally faster, and does not require users to annotate the intermediate data structures to be eliminated.
• On the other hand, SuFu achieves significantly better expressiveness via inductive synthesis.
Besides deductive transformation, there is another line of work that achieves fusion by permuting the instructions in the reference program [Sakka et al. 2017; Sundararajah and Kulkarni 2019; Wang et al. 2021]. Although these approaches work well in certain domains, they can hardly be applied to our dataset because many of our tasks require generating entirely new expressions that cannot be obtained by shuffling around the sub-terms in the reference program.
Synthesizing efficient programs. There are several lines of previous research on synthesizing efficient programs. First, superoptimization [Bornholt et al. 2016; Phothilimthana et al. 2014, 2016; Schkufza et al. 2013; Sharma et al. 2015] uses inductive synthesis to generate the most efficient implementation for the reference program. However, existing superoptimization approaches consider only low-level, loop-free programs, and thus cannot be applied to our problem.
Second, program restructuring tools rewrite a reference program into a given target form that is known to be efficient [Acar et al. 2005; Farzan et al. 2022; Farzan and Nicolet 2017, 2021a; Fedyukovich et al. 2017; Ji et al. 2024b; Morita et al. 2007; Pu et al. 2011]. Each of these tools is designed to handle a specific target form, and thus cannot be applied to the general fusion problem. On the other hand, as discussed in Sec. 5, program restructuring for some target forms [Farzan et al. 2022; Ji et al. 2024b] can be reduced to fusion and solved by SuFu.
Finally, resource-aware synthesis [Hu et al. 2021;Knoth et al. 2019] aims to find a program satisfying an efficiency requirement specified in a type system.Compared with SuFu, these approaches can deal with more refined efficiency requirements via complex type systems but do not scale to synthesizing large programs because they synthesize the whole program from scratch.
Other related program synthesis approaches. First, since the core synthesis problem of SuFu is a sketch problem, SuFu is related to previous sketch solvers [Jeon et al. 2015; Lu and Bodík 2023; Lubin et al. 2020; Porncharoenwase et al. 2022; Solar-Lezama et al. 2006; Torlak and Bodík 2014]. General-purpose sketch solvers, however, can hardly scale to superfusion tasks, where the target programs of sketch holes are typically large.
Second, SuFu addresses the scalability challenge by decomposing the sketch problem into subproblems.Hence, SuFu is related to previous approaches for decomposing synthesis tasks.
• Angelic synthesis (or uninterpreted functions) [Ji et al. 2024b; Kuncak and Blanc 2013; Singh et al. 2014] uses the congruence property of functions (i.e., identical inputs imply identical outputs) to eliminate unknown programs from a complex specification. It is also the key idea of quantifier elimination in SuFu. However, these previous techniques cannot be directly applied to eliminate sketch holes in our tasks because the program space of sketch holes is limited to O(1)-time programs for the sake of efficiency. We cannot regard such a sketch hole as a whole as an uninterpreted function, because most functions operating on data structures cannot be implemented in O(1) time. To address this issue, SuFu decomposes the sketch hole into ?extract and ?comb and applies the congruence property only to ?comb.
• Model learning [Huang and Qiu 2022] synthesizes models (e.g., pre- and post-conditions) to replace concrete invocations of library functions and thus avoid analyzing complex library functions during synthesis. This process requires input-output oracles for library functions. However, in our case, such oracles are not available, neither for the compression function nor for the sketch holes, because we do not even know the type of the compressed data structures (i.e., the scalar attributes) these programs operate on, let alone the concrete values.
Finally, Revamp [Pailoor et al. 2024] solves a related synthesis problem, dubbed code refactoring for abstract data types (ADTs).Given an original implementation of an ADT and a relational specification relating the original data representation to a new one, Revamp synthesizes a new ADT implementation using the new representation.SuFu is similar to Revamp in that both are concerned with replacing data structures in a program, and our compression function is similar to the relational specification in Revamp.Despite these similarities, we believe Revamp and SuFu are complementary.Specifically, Revamp supports replacing a data structure with another complex data structure but requires the user to manually specify the relation connecting the two.In contrast, SuFu automatically synthesizes the relation (i.e., the compression function) but currently only supports compressing data structures into scalar values.

CONCLUSION
In this paper, we present superfusion, a novel approach to eliminating intermediate data structures in functional programs using inductive program synthesis. Given a reference program annotated with data structures to be eliminated, superfusion first generates a sketch by transforming the reference program into an intermediate language and then fills the sketch holes with O(1)-time expressions. To make the synthesis scale, our approach synthesizes a ghost function ?compress and uses it to decompose the sketch problem into independent synthesis problems for each hole. We implement superfusion in a tool called SuFu and evaluate it on a dataset of 290 tasks collected from previous studies. The results demonstrate SuFu's superior expressiveness compared to existing tools for fusion and program restructuring.
The two most exciting directions for future work are (1) unbounded verification and (2) support for non-scalar compression functions. Unbounded verification will enable using SuFu in settings like compiler optimizations, where correctness is critical and the user cannot be expected to inspect the synthesized program. Supporting arbitrary complex data structures as outputs of the compression function will enable SuFu to be used in a broader range of applications, promoting it from a fusion tool to a general-purpose program optimization tool.

SuFu infers kind labels from types. Specifically, given a program in λ_sk, SuFu first attaches a symbolic kind label to each Packed and label, and then runs a round of type-checking with these symbolic labels. In this process, SuFu regards Packed types with different kind labels as different types and takes label[x] as the specific constructor for type Packed[x]. Hence, type-checking raises a series of equality constraints between kind labels, and SuFu assigns the same value to each equivalence class. For example, suppose the kind label of the Packed in map is x, and the kind label of the label in hole ?t_1 is y. Then, the output type of map will be Packed environment: To obtain symbolic local examples, SuFu introduces two compression functions ?compress_1 and ?compress_2, one for each kind of intermediate data structure, as they can be replaced with different scalar attributes. Correspondingly, SuFu replaces every label with the compression function of its kind label; for example, the above two examples will result in the symbolic examples below.
Compression synthesis. Our synthesis algorithm for a single compression function (Algorithm 2) can be straightforwardly extended to synthesize both ?compress_1 and ?compress_2, as shown below.
(1) Algorithm 2 can be extended to iterate with two sets of scalar attributes, corresponding to the two compression functions, respectively.
(2) The specification solved by Refine (Eq. 6) can be extended as follows to synthesize new attributes for both ?compress_1 and ?compress_2.
Here, the unknown functions ?compress′_1 and ?compress′_2 are introduced to specify new attributes for the two compression functions, respectively, and the intermediate data structures in the original input are divided into two sets, in^orig_{1,i} and in^orig_{2,i}, according to their kind labels.
(3) The transformation for eliminating ?t_h still works, and the final enumeration approach can be extended to this case by enumerating ?extract, ?compress′_1, and ?compress′_2 at the same time.

A.2 Program Restructuring for Data Structure Traversals
One baseline solver in our evaluation is Synduce [Farzan et al. 2022], which restructures recursive data-structure traversals according to a user-provided sketch that traverses the input data structure at most once. This kind of program restructuring task can be reduced to fusion and thus can be solved by SuFu. Fig. 8 shows an example of this reduction.
• In the original task (Fig. 8a), the reference program mits is a composition of two traversals: flatten traverses the input tree and collects all leaf values into a list in order, and mts then returns the maximum tail sum of the value list. The goal of this task is to restructure mits using the given template rec, which traverses the input tree only once and directly calculates the result during the traversal, via unknown functions ?f_1 and ?f_2.
• Fig. 8b shows the expected result of this restructuring task, which calculates the sum of leaf values as an auxiliary output for calculating mits in a single traversal.
As mentioned in Sec. 5, SuFu can be applied to a program restructuring task by (1) implementing an identity function in the target form as the template function, (2) composing the template function with the reference program to be restructured, and (3) invoking SuFu to eliminate the intermediate data structure generated by the template function. Fig. 8c shows the input of SuFu for this task: rec is the template function that implements the identity function as a traversal, and res composes this template with the reference program mits. By invoking SuFu to eliminate the intermediate tree produced by rec, we can get the same program as the expected result in Fig. 8b.
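The restructuring described above can be sketched in Python as follows. The tree type (binary nodes with integer values at the leaves), the name `mits_single_pass`, and the convention that the empty tail counts as a tail are assumptions for illustration; the sketch is not a transcription of Fig. 8.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    value: int


@dataclass
class Node:
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def flatten(t):
    # Collect all leaf values into a list, in order.
    if isinstance(t, Leaf):
        return [t.value]
    return flatten(t.left) + flatten(t.right)


def mts(xs):
    # Maximum tail sum (empty tail included by assumption).
    return max(sum(xs[i:]) for i in range(len(xs) + 1))


def mits(t):
    # Reference: two traversals with an intermediate list in between.
    return mts(flatten(t))


def mits_single_pass(t):
    # Single traversal returning (mits, sum); sum is the auxiliary output.
    if isinstance(t, Leaf):
        return (max(t.value, 0), t.value)
    mits_l, sum_l = mits_single_pass(t.left)
    mits_r, sum_r = mits_single_pass(t.right)
    return (max(mits_r, sum_r + mits_l), sum_l + sum_r)
```

For the tree `Node(Node(Leaf(2), Leaf(-1)), Leaf(3))`, both `mits` and the first component of `mits_single_pass` evaluate to 4, and the single-pass version never builds the flattened list.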

B APPENDIX: VERIFICATION
SuFu follows the CEGIS framework and thus requires a verifier to generate counter-examples for incorrect results. We use bounded verification in our implementation. Specifically, we limit the size of inductive data structures to no larger than 10 and limit the range of integers to [−3, 3]. Given a candidate program, our verifier will evaluate this program and the reference program on every input within these limits and check whether their outputs are the same. If not, the corresponding input will be returned as a counter-example.
Besides, to reduce the time cost of verification, SuFu initializes the input set of CEGIS with 10^3 random inputs (sampled from the same range as verification). These random inputs can exclude most incorrect results and thus can significantly reduce the number of necessary CEGIS iterations.
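The verification loop described above can be sketched in Python as follows. This is a minimal illustration for programs over integer lists, not SuFu's implementation: the function names, the restriction to lists, and the synthesize callback (which stands in for the synthesizer and receives input/output pairs) are all assumptions. The default bounds mirror the text, but exhaustive enumeration at those bounds covers hundreds of millions of lists, so smaller bounds are advisable when experimenting.

```python
import itertools
import random


def all_inputs(max_len, lo, hi):
    """Yield every integer list of length <= max_len with elements in [lo, hi]."""
    values = range(lo, hi + 1)
    for n in range(max_len + 1):
        for xs in itertools.product(values, repeat=n):
            yield list(xs)


def bounded_verify(candidate, reference, max_len=10, lo=-3, hi=3):
    """Return the first input on which the two programs differ, or None."""
    for xs in all_inputs(max_len, lo, hi):
        if candidate(xs) != reference(xs):
            return xs
    return None


def random_inputs(count, max_len=10, lo=-3, hi=3, seed=0):
    """Sample inputs from the same range that the verifier enumerates."""
    rng = random.Random(seed)
    return [[rng.randint(lo, hi) for _ in range(rng.randint(0, max_len))]
            for _ in range(count)]


def cegis(synthesize, reference, max_len=10, lo=-3, hi=3, seed_count=1000):
    """CEGIS loop seeded with random inputs; `synthesize` is a hypothetical callback."""
    inputs = random_inputs(seed_count, max_len, lo, hi)
    while True:
        candidate = synthesize([(xs, reference(xs)) for xs in inputs])
        cex = bounded_verify(candidate, reference, max_len, lo, hi)
        if cex is None:
            return candidate
        inputs.append(cex)  # the counter-example refines the next round
```

Seeding with random inputs does not change what the loop can return; it only makes it less likely that an obviously wrong candidate reaches the comparatively expensive exhaustive check.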

C APPENDIX: PROOFS
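Theorem C.1 (Thm. 4.9). For any sketching problem with sketch p and input set I, the synthesized program has the same output as p on set I, if the underlying PBE solver for sketch holes is sound.
Proof. A sound PBE solver ensures that the solution to every hole satisfies the concrete examples corresponding to the synthesized ?compress. Hence, to prove the soundness of SuFu, we only need to prove that any ?compress and sketch solution that satisfy all local examples always induce a correct synthesized program. We achieve this by proving the claim below.
• For any sketch solution (T_i, t_h) and compression function ?compress, an evaluation judgment E ⊢ t ⇓ v still holds if all intermediate data structures in environment E and value v are replaced with ?compress, term t is rewritten by the sketch solution, and ?compress and the sketch solution satisfy all local examples collected from E ⊢ t ⇓ v. Or more formally:
E ⊢ t ⇓ v =⇒ (subst ?compress E) ⊢ subst (T_i, t_h) t ⇓ (subst ?compress v)
where ?compress and (T_i, t_h) are a compression function and a sketch solution satisfying all local examples collected from E ⊢ t ⇓ v, and function subst replaces label with ?compress, Packed types with T_i, and rewrite terms with t_h.
If this claim is correct, we can obtain the correctness of the synthesized program by taking E as empty, taking t as the application p in for every in ∈ I, and taking v as the corresponding evaluation result. We prove this claim by induction on the derivation of E ⊢ t ⇓ v. This process is straightforward, and we demonstrate it using three representative evaluation rules shown in Fig. 9.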
Case 1: the last evaluation rule is E-Var. In this case, the target claim becomes as follows.
Case 2: the last evaluation rule is E-Let. In this case, the target claim becomes as follows.
E ⊢ let x, t, t′ ⇓ v =⇒ (subst ?compress E) ⊢ let x, subst (T_i, t_h) t, subst (T_i, t_h) t′ ⇓ (subst ?compress v)
Using the latter two premises, the induction hypothesis, and the definition of subst on environments, we obtain the following two evaluation judgments.
(subst ?compress E) ⊢ subst (T_i, t_h) t ⇓ (subst ?compress v_x)
subst ?compress E, x → subst ?compress v_x ⊢ subst (T_i, t_h) t′ ⇓ (subst ?compress v)
These judgments form the premises of rule E-Let, and by applying this rule, we obtain the consequence of the target claim in this case, as shown below.
(subst ?compress E) ⊢ let x, subst (T_i, t_h) t, subst (T_i, t_h) t′ ⇓ (subst ?compress v)
Case 3: the last evaluation rule is E-Rewrite. In this case, the target claim becomes as follows, where h denotes the index of the current rewrite term.
E ⊢ rewrite(t) ⇓ v =⇒ (subst ?compress E) ⊢ t_h ⇓ (subst ?compress v)
Recall that SuFu collects local examples from the evaluation judgments of rewrite terms. Here, SuFu will collect environment E and value v as a local example, whose symbolic form is as follows.
?t_h (subst ?compress E) = (subst ?compress v)
Hence, the target claim in this case is implied by the soundness of the PBE solver. □
Theorem C.2 (Thm. 4.10). For any sketching problem, SuFu can find a solution if Asm. 4.8 holds, the underlying PBE solver for sketch holes is complete, and the program space of ?compress can implement any function mapping from intermediate data structures to scalar values.
Proof. A complete PBE solver can always find solutions to sketch holes from concrete examples (if realizable); hence SuFu can always find a sketch solution if it can find a valid compression function ?compress. Moreover, since the synthesis approach of ?compress (Algorithm 2) returns only when the result is known to be valid, we only need to prove the termination of this algorithm.
We achieve this by proving the following claim.
• For any finite set of symbolic local examples E_sym, there always exists a function compress* such that for any candidate compression function compress, compress* is always a valid result for Refine(Subst(E_sym, compress)) (Lines 3-4 in Algorithm 2).
Let us first show how this claim implies the termination of Algorithm 2. Suppose such a compress* exists but Algorithm 2 does not terminate. Then, there must be infinitely many invocations of Refine.
• On the one hand, the results of these invocations must be pairwise different because, in each invocation, the attributes found previously have been merged into the input; hence they will never be returned again by the bottom-up enumeration.
• On the other hand, the results of these invocations cannot be larger than compress* because the bottom-up enumeration will return compress* as the result once it visits it.
However, the number of programs no larger than compress* must be finite, so there cannot be an infinite number of different results, which is a contradiction.
Now the remaining task is to prove the claim, i.e., to construct compress* for a finite set of local examples. Recall that the specification of ?compress′ (the result of Refine) is to make it possible for the sketch holes to satisfy the concrete examples, and there are two restrictions on the sketch holes.
(1) The time complexity of the sketch hole should be O(1).
(2) The sketch hole should have a scalar type, which induces a requirement that the output type of the compression function should be scalar.
If we ignore both restrictions, the identity function id in = in must be a valid compression function. Note that the output of local examples includes only (1) the scalar outputs of the original rewrite term, and (2) some known attributes (specified by a known function compress) of intermediate data structures constructed by the original term. When the compression function is id, the input of local examples will include the whole input of the original term; hence we can construct the sketch hole by (1) using the original rewrite term to produce the original outputs, and (2) using compress to calculate the attributes for each intermediate data structure.
Moreover, id is still valid even if we bring back the restriction on the time complexity.This is because the sketch hole is only constrained by a finite set of local examples, hence we can always remove the recursions/loops in the sketch hole by unrolling them enough times, thus generating a constant-time program that has the same IO behavior on the given set of local examples.
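Finally, when both restrictions are present, we tune the output type of id to scalar using the example set again. Specifically, in Algorithm 2, the compression function will only be evaluated on intermediate data structures in the local examples. Therefore, a truncation function trunc_d that returns only the first d levels of the input will have the same effect as the identity function, where d is the maximum depth of the intermediate data structures in the examples. The output of trunc_d includes only a constant number of values and thus can be trivially encoded into a scalar type. □
Theorem C.3 (Thm. 4.11). Let cost(p, in) be the size of the derivation tree of evaluating program p on input in. For any sketching problem, the following formula is always satisfied by the reference program p and the program p′ synthesized by SuFu.
∃c > 0, ∀in, cost(p′, in) ≤ c · cost(p, in)
Proof. Let (T_i, t_h) be the sketch solution found by SuFu. Since SuFu ensures t_h to be constant-time, we can take constant c as a large enough integer such that the derivation tree of every t_h will never be larger than c. The task remaining is to prove that Eq. 8 holds for this constant.
For any input in, let T and T′ be the derivation trees of evaluating p and p′ on in, respectively. Then, T′ can always be obtained from T by replacing all subtrees related to rewrite terms with derivation trees of evaluating some t_h. Therefore, the target theorem is implied by the inequality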

Fig. 1. The workflow of SuFu on the mts example. (a) The input to SuFu: a reference implementation of mts with the output type of tails annotated for elimination. (b) Sketch generation: subterms highlighted in red produce or consume NList and hence are replaced with holes. (c) Sketch solving: the optimized program synthesized by SuFu where terms filled into sketch holes are highlighted.

Fig. 2. SuFu's intermediate language λ_sk presented via abstract binding trees [Harper 2016]. We use ind(•) to denote inductive data types, and C to denote data constructors. Annotations (highlighted in red) are not part of the surface language. SuFu automatically introduces term-level annotations (rewrite, label, and unlabel) given the type-level Packed annotations provided by the user.

Example 3.1. Consider again the mts program from Fig. 1a. Here the user has annotated the output of tails with Packed, thereby making the program ill-typed in λ_sk: specifically, the body of tails is ill-typed because it produces an NList where a Packed NList is expected, and conversely, the body of mts is ill-typed because tails xs returns a Packed NList where an NList is expected by map sum. The task of the elaboration process is to insert label, unlabel, and rewrite annotations to make the program well-typed in λ_sk. Consider four possible elaborations of the body of mts:
maximum (map sum (unlabel (tails xs)))    (A)
maximum (rewrite (map sum (unlabel (tails xs))))    (B)
rewrite (maximum (map sum (unlabel (tails xs))))    (C)
let ts = tails xs in rewrite (maximum (map sum (unlabel ts)))    (D)
Algorithm 1 is SuFu's top-level sketch solving algorithm. Given a sketch p and a set of inputs I, SuFu evaluates the sketch on each input and collects symbolic local examples E_sym for the holes (Line 1). From the symbolic examples it synthesizes a compression function compress, which maps intermediate data structures to their scalar attributes (Line 2). It then substitutes compress into the symbolic examples to obtain concrete local examples E_io (Line 3). Finally, SuFu uses a PBE solver to synthesize a solution for each sketch hole h from its concrete examples, E_io[h] (Line 4).
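The four steps of Algorithm 1 can be outlined in Python as below. This is a structural sketch under stated assumptions rather than SuFu's code: the Packed wrapper, the (hole, environment, output) triple used to represent a symbolic local example, and the three callbacks collect_examples, synthesize_compress, and pbe_solve are hypothetical names introduced here; only the substitution step (Line 3) is implemented concretely.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Packed:
    """Marks an intermediate data structure inside a symbolic local example."""
    data: Any


def subst(compress, value):
    """Replace every Packed structure inside `value` by its compressed attributes."""
    if isinstance(value, Packed):
        return compress(value.data)
    if isinstance(value, tuple):
        return tuple(subst(compress, v) for v in value)
    if isinstance(value, list):
        return [subst(compress, v) for v in value]
    if isinstance(value, dict):
        return {k: subst(compress, v) for k, v in value.items()}
    return value  # scalars are left unchanged


def solve_sketch(inputs, collect_examples, synthesize_compress, pbe_solve):
    # Line 1: run the sketch on each input and gather symbolic local examples,
    # represented here as (hole, environment, output) triples.
    e_sym = [ex for inp in inputs for ex in collect_examples(inp)]
    # Line 2: synthesize the compression function from the symbolic examples.
    compress = synthesize_compress(e_sym)
    # Line 3: substitute compress to obtain concrete examples, grouped by hole.
    e_io = {}
    for hole, env, out in e_sym:
        e_io.setdefault(hole, []).append((subst(compress, env), subst(compress, out)))
    # Line 4: solve each hole independently from its own concrete examples.
    return compress, {hole: pbe_solve(exs) for hole, exs in e_io.items()}
```

Once compress is fixed, each hole only sees its own entry of e_io, which is what allows the holes to be solved as independent PBE problems.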

Example 4.6. Let us walk through Algorithm 2 for the mts task. Here we assume that enough executions of mts are available to avoid spurious solutions and focus on two local examples for holes ?t_2 and ?t_3 in Tab. 3; both of these examples are collected from the execution mts_label [2, -1]. The row sym shows the symbolic local examples (with intermediate data structures in blue), while the remaining rows show the concrete version in each iteration of Algorithm 2 (Lines 2-7), where intermediate data structures are replaced according to the current compression function.

Fig. 5. D&C program restructuring as superfusion. The input to SuFu includes the template dac_id, which is the same for all D&C restructuring tasks, and a reference program dac_mts that composes the template with the original reference program mts. For simplicity, we consider only non-empty lists.

Fig. 6. Number of tasks solved by SuFu and each baseline solver over time. Comparisons with Synduce and AutoLifter are on their respective datasets, while the comparison with Grisette is on our full dataset.

Fig. 7. An example of eliminating multiple data structures at once. (a) A reference program of mts with two annotated data structures. (b) The sketch generated by SuFu, where [i] following each label and Packed shows the kind label of that constructor. Here we omit the details in tails for simplicity.

Fig. 8. An example for restructuring data structure traversals. (a) The input of the restructuring task, where mits is the reference program, and rec is the user-provided sketch. (b) The expected result of program restructuring, which introduces the sum of leaf values as an auxiliary output. (c) The corresponding input of SuFu for solving this restructuring task.
intermediate data structures in the local examples. Therefore, a truncation function trunc_d that returns only the first d levels of the input will have the same effect as the identity function, where d is the maximum depth of the intermediate data structures in the examples. The output of trunc_d includes only a constant number of values and thus can be trivially encoded into a scalar type. □
Theorem C.3 (Thm. 4.11). Let cost(p, in) be the size of the derivation tree of evaluating program p on input in. For any sketching problem, the following formula is always satisfied by the reference program p and the program p′ synthesized by SuFu.
∃c > 0, ∀in, cost(p′, in) ≤ c · cost(p, in)

Table 1. Local examples collected by executing mts [2]. From left to right: hole id and the variables it has in scope; the original term for this hole; IO behavior of the original term (with intermediate data structures marked in blue); symbolic local examples obtained by "compressing" intermediate data structures with an unknown program ?compress; concrete local examples obtained using the ?compress in Eq. 1.
This definition can now be substituted into the symbolic local examples to obtain concrete local examples for each sketch hole, as shown in the last column of Tab. 1.With enough local examples (collected by executing mts on enough inputs), an off-the-shelf PBE solver can efficiently synthesize the correct solution to each hole, shown in Fig.

Table 2. Three invalid candidates for ?compress_1 and examples sufficient to reject them based on Eq. 4.

In each iteration, SuFu first instantiates the symbolic examples E_sym with the current compression function λx. A to obtain the concrete local examples E (Line 3). The function Refine (Lines 8-12)

Table 3. Workflow of Algorithm 2 on mts. Row sym shows two symbolic local examples collected for holes ?t_2 and ?t_3 (with intermediate data structures in blue). The remaining three rows show concrete versions of those examples in the first three iterations, with the intermediate data structures replaced by the scalar attributes added so far. Grayed out examples (and attributes in the output) are ignored because they are already known to be satisfied by the current compression function.

Table 4. Profile of our dataset. #No-Aux reports the number/percent of tasks that do not require inventing auxiliary attributes (which is an upper bound on the performance of deductive fusion). Program Size reports the number of AST nodes in the reference program per task. Packed annotations reports the number of Packed annotations per task, while Unique reports the total number of unique data types to be eliminated.

Table 5. Performance of SuFu on the full dataset. #Solved reports the number/percent of tasks solved. Time Cost reports the average time costs (in seconds) of sketch generation (Gen) and sketch solving (Syn). Result Size reports the average number of AST nodes in the programs synthesized by SuFu: Compress for ?compress, Extract for ?extract, and Holes for the sketch holes.

Table 6. Sample synthesis results. Terms filled into sketch holes are highlighted in red.

Table 7. Comparison between SuFu and program restructuring tools, where Time is the mean time (in seconds) on tasks solved by both tools.

Table 8. Comparison between SuFu and Grisette; Time is the mean time on tasks solved by both tools.
example, from the execution of mts [2], SuFu can collect the local example below for hole ?t_2.
environment: h → [2], ts → label[2] [0]
value: label[2] [2, 0]
Here both ts and the output value have the kind label [2] because they are constructed by the label[2] in map. Similarly, SuFu can collect the local example below for ?t_3 from the same execution mts [2], where ts has the kind label [1] because it is constructed by some label[1] in tails.

• For any sketch solution (T_i, t_h) and compression function ?compress, an evaluation judgment E ⊢ t ⇓ v still holds if all intermediate data structures in environment E and value v are replaced with ?compress, term t is rewritten by the sketch solution, and ?compress and the sketch solution satisfy all local examples collected from E ⊢ t ⇓ v. Or more formally:
E ⊢ t ⇓ v =⇒ (subst ?compress E) ⊢ subst (T_i, t_h) t ⇓ (subst ?compress v)
where ?compress and (T_i, t_h) are a compression function and a sketch solution satisfying all local examples collected from E ⊢ t ⇓ v, and function subst replaces label with ?compress, Packed types with T_i, and rewrite terms with t_h.