Abstract
This paper presents a novel component-based synthesis algorithm that marries the power of type-directed search with lightweight SMT-based deduction and partial evaluation. Given a set of components together with their over-approximate first-order specifications, our method first generates a program sketch over a subset of the components and checks its feasibility using an SMT solver. Since a program sketch typically represents many concrete programs, the use of SMT-based deduction greatly increases the scalability of the algorithm. Once a feasible program sketch is found, our algorithm completes the sketch in a bottom-up fashion, using partial evaluation to further increase the power of deduction for rejecting partially filled program sketches. We apply the proposed synthesis methodology to automate a large class of data preparation tasks that commonly arise in data science. We have evaluated our synthesis algorithm on dozens of data wrangling and consolidation tasks obtained from online forums, and we show that our approach can automatically solve a large class of problems encountered by R users.
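The sketch-feasibility step described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper's implementation: the component names, the row/column specifications in `SPECS`, the function `sketch_feasible`, and the use of bounded enumeration over table shapes as a stand-in for a real SMT solver. The idea it shows is that a sketch (a sequence of components with unfilled arguments) can be rejected as a whole when no choice of intermediate table shapes satisfies every component's over-approximate specification.

```python
from itertools import product

# Hypothetical over-approximate specs (illustrative, not the paper's):
# each relates the (rows, cols) of a component's input table to the
# (rows, cols) of its output table.
SPECS = {
    "filter": lambda ri, ci, ro, co: ro < ri and co == ci,
    "select": lambda ri, ci, ro, co: ro == ri and co < ci,
    "gather": lambda ri, ci, ro, co: ro >= ri and co <= ci,
    "spread": lambda ri, ci, ro, co: ro <= ri and co >= ci,
}

def sketch_feasible(sketch, in_shape, out_shape, bound=8):
    """Return True if some assignment of intermediate table shapes
    satisfies the spec of every component in the sketch.

    Bounded enumeration stands in for an SMT query here, so the check
    only considers shapes with at most `bound` rows and columns.
    """
    if not sketch:
        return in_shape == out_shape
    shapes = list(product(range(1, bound + 1), repeat=2))
    # One unknown shape per intermediate table between components.
    for mids in product(shapes, repeat=len(sketch) - 1):
        chain = [in_shape, *mids, out_shape]
        if all(SPECS[c](*chain[i], *chain[i + 1])
               for i, c in enumerate(sketch)):
            return True
    return False

# Example: a 4x3 input table and a 2x3 output table.
print(sketch_feasible(["filter"], (4, 3), (2, 3)))
print(sketch_feasible(["select", "filter"], (4, 3), (2, 3)))
print(sketch_feasible(["gather", "filter"], (4, 3), (2, 3)))
```

Because each specification is an over-approximation, a feasible result only means the sketch is worth completing; an infeasible result rules out every concrete program the sketch represents, which is where the pruning benefit comes from.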
Component-based synthesis of table consolidation and transformation tasks from examples
PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation