Abstract
Programming-by-Examples (PBE) involves synthesizing an "intended program" from a small set of user-provided input-output examples. A key PBE strategy has been to restrict the search to a carefully designed, small domain-specific language (DSL) with "effectively-invertible" (EI) operators at the top and "effectively-enumerable" (EE) operators at the bottom. This facilitates an effective combination of a top-down synthesis strategy (which backpropagates outputs over various paths in the DSL using inverse functions) with a bottom-up synthesis strategy (which propagates inputs over various paths in the DSL). We address the problem of scaling synthesis to large DSLs with several non-EI/EE operators. This is motivated by the need to support a richer class of transformations and the need for readable code generation. We propose a novel solution strategy that relies on propagating fewer values over fewer paths.
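To make the top-down and bottom-up combination concrete, here is a minimal Python sketch over a toy string DSL. It is an illustration only and not the FlashFill++ algorithm: the operator (a two-argument concatenation), the function names `inverse_concat`, `enumerate_substrings`, and `meet`, and the example strings are all assumptions introduced here.

```python
def inverse_concat(output):
    """Top-down step: back-propagate an output through a toy Concat operator.

    Returns every way to split `output` into two non-empty parts.
    """
    return [(output[:i], output[i:]) for i in range(1, len(output))]


def enumerate_substrings(inp):
    """Bottom-up step: forward-propagate an input through a toy Substring operator.

    Returns every non-empty substring of `inp`.
    """
    return {inp[i:j] for i in range(len(inp)) for j in range(i + 1, len(inp) + 1)}


def meet(inp, output):
    """Keep only the splits whose two parts are both reachable bottom-up."""
    reachable = enumerate_substrings(inp)
    return [(left, right)
            for (left, right) in inverse_concat(output)
            if left in reachable and right in reachable]


if __name__ == "__main__":
    # Which splits of the output can be built from pieces of the input?
    print(meet("John Smith", "JS"))
```

Even in this toy setting, both directions produce sets that grow quickly with string length, which is the cost that propagating fewer values over fewer paths is meant to reduce.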
Our first key idea is that of "cut functions" that prune the set of values being propagated by using knowledge of the sub-DSL on the other side. Cuts can be designed to preserve completeness of synthesis; however, DSL designers may use incomplete cuts to have finer control over the kind of programs synthesized. In either case, cuts make search feasible for non-EI/EE operators and efficient for deep DSLs. Our second key idea is that of "guarded DSLs" (gDSLs) that allow a precedence on DSL operators, which dynamically controls exploration of various paths in the DSL. This makes search efficient over grammars with large fanouts without losing recall. It also makes ranking simpler yet more effective in learning an intended program from very few examples. Both cuts and precedence give the DSL designer a mechanism to restrict search to a reasonable, and possibly incomplete, space of programs.
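The two ideas can be sketched in a few lines of Python. This is a hedged toy, not the paper's formal definitions: `cut_bottom_up`, the three learners, and `GUARDED_ALTERNATIVES` are hypothetical names, and the "cut" here simply discards enumerated values that cannot occur in the desired output, while the "guard" tries alternatives in precedence order and stops at the first one that succeeds.

```python
def substrings(s):
    """All non-empty substrings of s (the values a bottom-up search would propagate)."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def cut_bottom_up(values, output):
    """Hypothetical cut: keep only values that can appear inside the output."""
    return {v for v in values if v in output}


def learn_identity(inp, out):
    """Highest precedence: the input itself is the answer."""
    return ["x"] if inp == out else []


def learn_substring(inp, out):
    """Middle precedence: a slice of the input, searched over cut values only."""
    kept = cut_bottom_up(substrings(inp), out)
    if out in kept:
        start = inp.find(out)
        return [f"x[{start}:{start + len(out)}]"]
    return []


def learn_constant(inp, out):
    """Lowest precedence: fall back to a constant string."""
    return [repr(out)]


# Alternatives in precedence order; later ones are explored only if earlier ones fail.
GUARDED_ALTERNATIVES = [learn_identity, learn_substring, learn_constant]


def learn_guarded(inp, out):
    for learner in GUARDED_ALTERNATIVES:
        programs = learner(inp, out)
        if programs:
            return programs
    return []


if __name__ == "__main__":
    print(learn_guarded("2021-11-19", "2021"))
```

In this sketch the constant program is never even generated when a substring program exists, so precedence both narrows the search and removes some of the burden from ranking.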
Using cuts and gDSLs, we have built FlashFill++, an industrial-strength PBE engine for performing rich string transformations, including datetime and number manipulations. The FlashFill++ gDSL is designed to enable readable code generation in different target languages including Excel's formula language, PowerFx, and Python. We show that FlashFill++ is more expressive, more performant, and generates higher-quality code than comparable existing PBE systems. FlashFill++ is being deployed in several mass-market products ranging from spreadsheet software to notebooks and business intelligence applications, each with millions of users.
FlashFill++: Scaling Programming by Example by Cutting to the Chase