Abstract

Every day, millions of computer end-users need to perform tasks over large, tabular data, yet lack the programming knowledge to do such tasks automatically. In this work, we present an automatic technique that takes from a user an example of how the user needs to transform a table of data, and provides to the user a program that implements the transformation described by the example. In particular, we present a language of programs, TableProg, that can describe transformations that real users require. We then present an algorithm, ProgFromEx, that takes an example input table and output table, and infers a program in TableProg that implements the transformation described by the example. When the program is applied to the example input, it reproduces the example output. When the program is applied to another, potentially larger, table whose layout is 'similar' to that of the example input table, the program produces a corresponding table whose layout is similar to that of the example output table. A user can apply ProgFromEx interactively, providing multiple small examples to obtain a program that implements the transformation that the user desires. Moreover, ProgFromEx can help identify 'noisy' examples that contain errors.
To evaluate the practicality of TableProg and ProgFromEx, we implemented ProgFromEx as a module for the Microsoft Excel spreadsheet program. We applied the module to automatically implement over 50 table transformations that end-users specified through examples on online Excel help forums. In seconds, ProgFromEx found programs that satisfied the examples and could be applied to larger input tables. This experience demonstrates that TableProg and ProgFromEx can significantly automate the tasks over tabular data that users need to perform.
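To make the idea of inferring a program from one input/output example concrete, the following is a minimal Python sketch. It is not TableProg or ProgFromEx: the primitive operations (`transpose`, `drop_empty_rows`, `drop_first_row`, `reverse_rows`) and the brute-force enumeration in `synthesize` are hypothetical stand-ins chosen only for illustration. The sketch shows only the contract described above: the returned program reproduces the example output on the example input, and can then be run on a larger table with a similar layout.

```python
from itertools import product

Table = list[list[str]]

# Hypothetical primitive table operations (illustrative, not from TableProg).
def transpose(t: Table) -> Table:
    return [list(col) for col in zip(*t)]

def drop_empty_rows(t: Table) -> Table:
    return [row for row in t if any(cell != "" for cell in row)]

def drop_first_row(t: Table) -> Table:
    return t[1:]

def reverse_rows(t: Table) -> Table:
    return t[::-1]

PRIMITIVES = {
    "transpose": transpose,
    "drop_empty_rows": drop_empty_rows,
    "drop_first_row": drop_first_row,
    "reverse_rows": reverse_rows,
}

def run(program, table: Table) -> Table:
    """Apply a sequence of primitive names to a table, left to right."""
    for name in program:
        table = PRIMITIVES[name](table)
    return table

def synthesize(example_in: Table, example_out: Table, max_len: int = 3):
    """Return the shortest primitive sequence consistent with the example.

    Naive enumeration over all sequences up to max_len; returns None if
    no candidate reproduces example_out from example_in.
    """
    for length in range(max_len + 1):
        for program in product(PRIMITIVES, repeat=length):
            if run(program, example_in) == example_out:
                return program
    return None

# A small example table with a blank separator row, and the desired output.
example_in = [["a", "1"], ["", ""], ["b", "2"]]
example_out = [["a", "b"], ["1", "2"]]
program = synthesize(example_in, example_out)

# The inferred program is then applied to a larger table of the same layout.
larger = [["x", "7"], ["", ""], ["y", "8"], ["", ""], ["z", "9"]]
result = run(program, larger)
```

Because the search returns the first consistent candidate, a single small example may admit several programs that behave differently on larger tables; supplying additional examples, as the abstract describes for interactive use, narrows the choice.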
Spreadsheet table transformations from examples. In PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation.