research-article
Open Access

FlashFill++: Scaling Programming by Example by Cutting to the Chase

Published: 11 January 2023

Abstract

Programming-by-Examples (PBE) involves synthesizing an "intended program" from a small set of user-provided input-output examples. A key PBE strategy has been to restrict the search to a carefully designed small domain-specific language (DSL) with "effectively-invertible" (EI) operators at the top and "effectively-enumerable" (EE) operators at the bottom. This facilitates an effective combination of a top-down synthesis strategy (which backpropagates outputs over various paths in the DSL using inverse functions) with a bottom-up synthesis strategy (which propagates inputs over various paths in the DSL). We address the problem of scaling synthesis to large DSLs with several non-EI/EE operators. This is motivated by the need to support a richer class of transformations and the need for readable code generation. We propose a novel solution strategy that relies on propagating fewer values and over fewer paths.
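To make the top-down/bottom-up combination concrete, here is a minimal Python sketch over a toy DSL. The DSL (`Concat` of `SubStr` and `Const` atoms) and the names `bottom_up`, `inverse_concat`, `synthesize`, and `run` are illustrative assumptions, not the FlashFill++ DSL or its implementation. `Concat` plays the role of an effectively-invertible operator (its inverse splits the output), and `SubStr` plays the role of an effectively-enumerable one (all its values can be listed from the input).

```python
def bottom_up(inp):
    """Bottom-up: enumerate every SubStr(i, j) and index it by the value it yields."""
    table = {}
    for i in range(len(inp)):
        for j in range(i + 1, len(inp) + 1):
            table.setdefault(inp[i:j], []).append(("SubStr", i, j))
    return table


def inverse_concat(out):
    """Top-down: the inverse of Concat is every split into non-empty prefix and suffix."""
    return [(out[:k], out[k:]) for k in range(1, len(out))]


def synthesize(inp, out, depth=3):
    """Return all programs (lists of at most `depth` atoms) mapping inp to out."""
    table = bottom_up(inp)

    def atoms(value):
        # An atom is either an extracted substring or a literal constant.
        return table.get(value, []) + [("Const", value)]

    def go(target, d):
        results = [[a] for a in atoms(target)]
        if d > 1:
            for prefix, suffix in inverse_concat(target):
                for a in atoms(prefix):
                    for rest in go(suffix, d - 1):
                        results.append([a] + rest)
        return results

    return go(out, depth)


def run(program, inp):
    """Evaluate a program (a list of atoms) on an input string."""
    parts = []
    for atom in program:
        if atom[0] == "SubStr":
            parts.append(inp[atom[1]:atom[2]])
        else:
            parts.append(atom[1])
    return "".join(parts)


programs = synthesize("ab", "b-a")
swap = [("SubStr", 1, 2), ("Const", "-"), ("SubStr", 0, 1)]
```

Every program returned reproduces the example, but only some generalize: `swap` maps `"xy"` to `"y-x"`, whereas the all-constant program always returns `"b-a"`. Even in this toy setting the number of candidates grows quickly with output length, which is the scaling pressure the abstract describes.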

Our first key idea is that of "cut functions" that prune the set of values being propagated by using knowledge of the sub-DSL on the other side. Cuts can be designed to preserve completeness of synthesis; however, DSL designers may use incomplete cuts to have finer control over the kind of programs synthesized. In either case, cuts make search feasible for non-EI/EE operators and efficient for deep DSLs. Our second key idea is that of "guarded DSLs" that allow a precedence on DSL operators, which dynamically controls exploration of various paths in the DSL. This makes search efficient over grammars with large fanouts without losing recall. It also makes ranking simpler yet more effective in learning an intended program from very few examples. Both cuts and precedence provide a mechanism to the DSL designer to restrict search to a reasonable, and possibly incomplete, space of programs.
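The two ideas can be illustrated with a small Python sketch. It is a loose analogy under stated assumptions, not the paper's definitions: `cut` is a hypothetical cut that keeps only those bottom-up values the output side could possibly use, and `guarded_learn` is a hypothetical precedence scheme that explores a lower-precedence operator only when the higher-precedence ones yield no program. The helper names and the `SubStr`/`Const` operators are invented for illustration.

```python
def all_substrings(s):
    """Every non-empty substring of s."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def cut(inp, out):
    """Hypothetical cut: propagate only input substrings that also occur in the output."""
    return all_substrings(inp) & all_substrings(out)


def learn_substr(inp, target):
    """Programs that extract `target` from the input, one per occurrence."""
    n = len(target)
    return [("SubStr", i, i + n)
            for i in range(len(inp) - n + 1)
            if inp[i:i + n] == target]


def learn_const(inp, target):
    """The fallback program that emits `target` as a literal."""
    return [("Const", target)]


def guarded_learn(inp, target, alternatives):
    """Try operators in precedence order; stop at the first one that succeeds."""
    for learn in alternatives:
        programs = learn(inp, target)
        if programs:
            return programs
    return []


inp, out = "2023-01-11", "Jan 2023"
kept = cut(inp, out)
precedence = [learn_substr, learn_const]
year = guarded_learn(inp, "2023", precedence)
month = guarded_learn(inp, "Jan ", precedence)
```

Here `kept` retains `"2023"` and its substrings while discarding values such as `"-"` and `"01"` that cannot contribute to the output. `year` resolves to an extraction and never considers the constant, while `month` falls back to the constant because no extraction exists. This simple first-success guard can discard valid programs, so it corresponds to the "possibly incomplete" end of the design space the abstract mentions.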

Using cuts and gDSLs, we have built FlashFill++, an industrial-strength PBE engine for performing rich string transformations, including datetime and number manipulations. The FlashFill++ gDSL is designed to enable readable code generation in different target languages including Excel's formula language, PowerFx, and Python. We show FlashFill++ is more expressive, more performant, and generates better quality code than comparable existing PBE systems. FlashFill++ is being deployed in several mass-market products ranging from spreadsheet software to notebooks and business intelligence applications, each with millions of users.

