skip to main content
research-article
Open Access

Synthesis of data completion scripts using finite tree automata

Published:12 October 2017Publication History
Skip Abstract Section

Abstract

In application domains that store data in a tabular format, a common task is to fill the values of some cells using values stored in other cells. For instance, such data completion tasks arise in the context of missing value imputation in data science and derived data computation in spreadsheets and relational databases. Unfortunately, end-users and data scientists typically struggle with many data completion tasks that require non-trivial programming expertise. This paper presents a synthesis technique for automating data completion tasks using programming-by-example (PBE) and a very lightweight sketching approach. Given a formula sketch (e.g., AVG(?1, ?2)) and a few input-output examples for each hole, our technique synthesizes a program to automate the desired data completion task. Towards this goal, we propose a domain-specific language (DSL) that combines spatial and relational reasoning over tabular data and a novel synthesis algorithm that can generate DSL programs that are consistent with the input-output examples. The key technical novelty of our approach is a new version space learning algorithm that is based on finite tree automata (FTA). The use of FTAs in the learning algorithm leads to a more compact representation that allows more sharing between programs that are consistent with the examples. We have implemented the proposed approach in a tool called DACE and evaluate it on 84 benchmarks taken from online help forums. We also illustrate the advantages of our approach by comparing our technique against two existing synthesizers, namely Prose and Sketch.

References

  1. Parosh A Abdulla, Ahmed Bouajjani, Lukáš Holík, Lisa Kaati, and Tomáš Vojnar. 2008. Composed bisimulation for tree automata. In International Conference on Implementation and Application of Automata. Springer, 212–222. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Aws Albarghouthi, Sumit Gulwani, and Zachary Kincaid. 2013. Recursive Program Synthesis (CAV). Springer-Verlag, 934–950.Google ScholarGoogle Scholar
  3. Rajeev Alur, Pavol Čern`y, and Arjun Radhakrishna. 2015. Synthesis through unification. In International Conference on Computer Aided Verification . Springer, 163–179. Google ScholarGoogle ScholarCross RefCross Ref
  4. James Bornholt, Emina Torlak, Dan Grossman, and Luis Ceze. 2016. Optimizing Synthesis with Metasketches (POPL). ACM, 775–788.Google ScholarGoogle Scholar
  5. Julien Cristau, Christof Löding, and Wolfgang Thomas. 2005. Deterministic Automata on Unranked Trees (FCT). SpringerVerlag, 68–79.Google ScholarGoogle Scholar
  6. Yu Feng, Ruben Martins, Jacob Van Geffen, Isil Dillig, and Swarat Chaudhuri. 2017. Component-based synthesis of table consolidation and transformation tasks from examples. In PLDI. ACM, 422–436. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. John K. Feser, Swarat Chaudhuri, and Isil Dillig. 2015. Synthesizing Data Structure Transformations from Input-output Examples (PLDI). ACM, 229–239.Google ScholarGoogle Scholar
  8. Sumit Gulwani. 2011. Automating String Processing in Spreadsheets Using Input-output Examples (POPL). ACM, 317–330.Google ScholarGoogle Scholar
  9. Haruo Hosoya and Benjamin C. Pierce. 2003. XDuce: A Statically Typed XML Processing Language. ACM Trans. Internet Technol. 3, 2 (2003), 117–148. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Bishoksan Kafle and John P. Gallagher. 2015. Tree Automata-Based Refinement with Application to Horn Clause Verification (VMCAI 2015) . Springer-Verlag New York, Inc., 209–226.Google ScholarGoogle Scholar
  11. Kevin Knight and Jonathan May. 2009. Applications of weighted automata in natural language processing. In Handbook of Weighted Automata . Springer, 571–596. Google ScholarGoogle ScholarCross RefCross Ref
  12. Tessa Lau, Steven A. Wolfman, Pedro Domingos, and Daniel S. Weld. 2003. Programming by Demonstration Using Version Space Algebra. Mach. Learn. 53, 1-2 (2003), 111–156.Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A Solar Lezama. 2008. Program synthesis by sketching. Ph.D. Dissertation.Google ScholarGoogle Scholar
  14. Parthasarathy Madhusudan. 2011. Synthesizing Reactive Programs. In Computer Science Logic. 428–442.Google ScholarGoogle Scholar
  15. Wim Martens and Joachim Niehren. 2005. Minimizing Tree Automata for Unranked Trees. Springer Berlin Heidelberg, 232–246.Google ScholarGoogle Scholar
  16. Jonathan May and Kevin Knight. 2008. A Primer on Tree Automata Software for Natural Language Processing. (2008).Google ScholarGoogle Scholar
  17. Tom M Mitchell. 1982. Generalization as search. Artificial intelligence 18, 2 (1982), 203–226.Google ScholarGoogle Scholar
  18. Peter-Michael Osera and Steve Zdancewic. 2015. Type-and-example-directed Program Synthesis (PLDI). ACM, 619–630.Google ScholarGoogle Scholar
  19. Michael Pardowitz, Bernhard Glaser, and Rüdiger Dillmann. 2007. Learning Repetitive Robot Programs from Demonstrations Using Version Space Algebra. In Proceedings of the 13th IASTED International Conference on Robotics and Applications (RA) . ACTA Press, 394–399.Google ScholarGoogle Scholar
  20. Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama. 2016. Program Synthesis from Polymorphic Refinement Types (PLDI) . ACM, 522–538.Google ScholarGoogle Scholar
  21. Oleksandr Polozov and Sumit Gulwani. 2015. FlashMeta: A Framework for Inductive Program Synthesis (OOPSLA). ACM, 107–126.Google ScholarGoogle Scholar
  22. Eric Schkufza, Rahul Sharma, and Alex Aiken. 2013. Stochastic Superoptimization (ASPLOS). 305–316.Google ScholarGoogle Scholar
  23. Rishabh Singh and Sumit Gulwani. 2012. Synthesizing number transformations from input-output examples (CAV). Springer, 634–651.Google ScholarGoogle Scholar
  24. Calvin Smith and Aws Albarghouthi. 2016. MapReduce Program Synthesis (PLDI). ACM, 326–340.Google ScholarGoogle Scholar
  25. Armando Solar-Lezama, Gilad Arnold, Liviu Tancau, Rastislav Bodik, Vijay Saraswat, and Sanjit Seshia. 2007. Sketching Stencils (PLDI). ACM, 167–178.Google ScholarGoogle Scholar
  26. Armando Solar-Lezama, Rodric Rabbah, Rastislav Bodík, and Kemal Ebcioğlu. 2005. Programming by Sketching for Bit-streaming Programs (PLDI). ACM, 281–294.Google ScholarGoogle Scholar
  27. Armando Solar-Lezama, Liviu Tancau, Rastislav Bodik, Sanjit Seshia, and Vijay Saraswat. 2006. Combinatorial Sketching for Finite Programs (ASPLOS). ACM, 404–415.Google ScholarGoogle Scholar
  28. James W Thatcher and Jesse B Wright. 1968. Generalized finite automata theory with an application to a decision problem of second-order logic. Theory of Computing Systems 2, 1 (1968), 57–81.Google ScholarGoogle Scholar
  29. Abhishek Udupa, Arun Raghavan, Jyotirmoy V. Deshmukh, Sela Mador-Haim, Milo M. K. Martin, and Rajeev Alur. 2013. TRANSIT: specifying protocols with concolic snippets (PLDI). 287–296.Google ScholarGoogle Scholar
  30. Xinyu Wang, Sumit Gulwani, and Rishabh Singh. 2016. FIDEX: Filtering Spreadsheet Data using Examples (OOPSLA). ACM, 195–213.Google ScholarGoogle Scholar
  31. Navid Yaghmazadeh, Christian Klinger, Isil Dillig, and Swarat Chaudhuri. 2016. Synthesizing Transformations on Hierarchically Structured Data (PLDI). ACM, 508–521.Google ScholarGoogle Scholar

Index Terms

  1. Synthesis of data completion scripts using finite tree automata

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader
    About Cookies On This Site

    We use cookies to ensure that we give you the best experience on our website.

    Learn more

    Got it!