Abstract
In application domains that store data in a tabular format, a common task is to fill the values of some cells using values stored in other cells. For instance, such data completion tasks arise in the context of missing value imputation in data science and derived data computation in spreadsheets and relational databases. Unfortunately, end-users and data scientists typically struggle with many data completion tasks that require non-trivial programming expertise. This paper presents a synthesis technique for automating data completion tasks using programming-by-example (PBE) and a very lightweight sketching approach. Given a formula sketch (e.g., AVG(?1, ?2)) and a few input-output examples for each hole, our technique synthesizes a program to automate the desired data completion task. Towards this goal, we propose a domain-specific language (DSL) that combines spatial and relational reasoning over tabular data, and a novel synthesis algorithm that generates DSL programs consistent with the input-output examples. The key technical novelty of our approach is a new version space learning algorithm based on finite tree automata (FTA). Using FTAs in the learning algorithm yields a more compact representation that allows more sharing between programs that are consistent with the examples. We have implemented the proposed approach in a tool called DACE and evaluated it on 84 benchmarks taken from online help forums. We also illustrate the advantages of our approach by comparing our technique against two existing synthesizers, namely Prose and Sketch.
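To make the FTA-based version space idea more concrete, the following is a minimal Python sketch under strong simplifying assumptions. It is not DACE's DSL or algorithm: the four-move DSL (`Up`, `Down`, `Left`, `Right`), the function names `build_fta` and `accepted`, and the 3x3 grid are all hypothetical and chosen only for illustration. Each automaton state is the cell that a sub-program evaluates to on a single example, so every program reaching the same cell shares one state.

```python
from collections import defaultdict

# Hypothetical one-step moves; the DSL described in the paper is richer.
MOVES = {"Up": (-1, 0), "Down": (1, 0), "Left": (0, -1), "Right": (0, 1)}

def build_fta(start, rows, cols, max_moves):
    """Return transitions[q] = {(move, child), ...}, meaning move(child) -> q.

    Each state q is a cell (row, col): the value a sub-program evaluates to
    on this example. Programs that reach the same cell share one state.
    """
    transitions = defaultdict(set)
    seen, frontier = {start}, {start}
    for _ in range(max_moves):
        nxt = set()
        for (r, c) in frontier:
            for name, (dr, dc) in MOVES.items():
                q = (r + dr, c + dc)
                if 0 <= q[0] < rows and 0 <= q[1] < cols:
                    transitions[q].add((name, (r, c)))
                    if q not in seen:
                        nxt.add(q)
        seen |= nxt
        frontier = nxt
    return transitions

def accepted(transitions, start, state, max_moves):
    """Enumerate programs with at most max_moves moves that evaluate to `state`."""
    progs = ["x"] if state == start else []
    if max_moves > 0:
        for name, child in sorted(transitions.get(state, ())):
            for sub in accepted(transitions, start, child, max_moves - 1):
                progs.append(f"{name}({sub})")
    return progs

# One example for one hole: input cell (2, 0) should map to cell (0, 0).
fta = build_fta(start=(2, 0), rows=3, cols=3, max_moves=2)
programs = accepted(fta, start=(2, 0), state=(0, 0), max_moves=2)
print(programs)  # ['Up(Up(x))']

# Two distinct programs reach cell (1, 1) and share that single state.
shared = accepted(fta, start=(2, 0), state=(1, 1), max_moves=2)
```

In this toy setting, the accepting state plays the role of the example's output, and the set of programs consistent with the example is read off the automaton rather than stored explicitly.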
Index Terms
Synthesis of data completion scripts using finite tree automata