Component-based synthesis of table consolidation and transformation tasks from examples

Published: 14 June 2017

Abstract

This paper presents a novel component-based synthesis algorithm that marries the power of type-directed search with lightweight SMT-based deduction and partial evaluation. Given a set of components together with their over-approximate first-order specifications, our method first generates a program sketch over a subset of the components and checks its feasibility using an SMT solver. Since a program sketch typically represents many concrete programs, the use of SMT-based deduction greatly increases the scalability of the algorithm. Once a feasible program sketch is found, our algorithm completes the sketch in a bottom-up fashion, using partial evaluation to further increase the power of deduction for rejecting partially filled program sketches. We apply the proposed synthesis methodology to automate a large class of data preparation tasks that commonly arise in data science. We have evaluated our synthesis algorithm on dozens of data wrangling and consolidation tasks obtained from online forums, and we show that our approach can automatically solve a large class of problems encountered by R users.
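The sketch-feasibility check described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The component names, their shape-based specifications, and the helpers `feasible` and `candidate_sketches` are all assumptions made for illustration, and a bounded enumeration of table shapes stands in for the SMT solver. Because of that bound, this toy check may reject a sketch that an SMT solver over unbounded integers would accept.

```python
from itertools import product

# Hypothetical over-approximate specifications (illustrative only).
# Each one relates an input table shape (rows, cols) to an output shape.
SPECS = {
    "select": lambda i, o: o[0] == i[0] and o[1] < i[1],   # drops columns
    "filter": lambda i, o: o[0] < i[0] and o[1] == i[1],   # drops rows
    "gather": lambda i, o: o[0] >= i[0] and o[1] <= i[1],  # wide to long
    "spread": lambda i, o: o[0] <= i[0] and o[1] >= i[1],  # long to wide
}


def feasible(sketch, in_shape, out_shape, bound=None):
    """Check whether a linear chain of components could map in_shape to out_shape.

    The sketch is feasible if some choice of intermediate shapes satisfies
    every component's specification. Shapes are enumerated up to `bound`,
    which is an assumption standing in for a real SMT query.
    """
    if bound is None:
        bound = max(in_shape[0] * in_shape[1], out_shape[0] * out_shape[1])
    shapes = [(r, c) for r in range(1, bound + 1) for c in range(1, bound + 1)]
    reachable = {in_shape}
    for name in sketch:
        spec = SPECS[name]
        # Keep every shape that the spec allows from some reachable shape.
        reachable = {o for i in reachable for o in shapes if spec(i, o)}
    return out_shape in reachable


def candidate_sketches(max_len):
    """Enumerate component chains of length 1..max_len (holes left unfilled)."""
    for n in range(1, max_len + 1):
        yield from product(SPECS, repeat=n)


# Example: the input table has 4 rows and 3 columns, the output has 2 rows and 3 columns.
surviving = [s for s in candidate_sketches(1) if feasible(s, (4, 3), (2, 3))]
print(surviving)
```

Each rejected sketch rules out every concrete program that fills its holes, which is the scalability argument the abstract makes. Sketch completion and partial evaluation are not covered by this example.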
