Transforming spreadsheet data types using examples

Published: 11 January 2016

Abstract

Cleaning spreadsheet data types is a common problem faced by millions of spreadsheet users. Data types such as dates, times, names, and units are ubiquitous in spreadsheets, and cleaning transformations on these data types involve parsing and pretty-printing their string representations. This is challenging for users because cleaning such data requires background knowledge about the data itself, and the data is typically non-uniform, unstructured, and ambiguous. Spreadsheet systems and programming languages provide some UI-based and programmatic solutions to this problem, but these are either insufficient for users' needs or beyond their expertise. In this paper, we present a programming-by-example methodology for cleaning data types that learns the desired transformation from a few input-output examples. We propose a domain-specific language with probabilistic semantics that is parameterized with declarative data type definitions. The probabilistic semantics is based on three key aspects: (i) approximate predicate matching, (ii) joint learning of data type interpretation, and (iii) weighted branches. This probabilistic semantics enables the language to handle non-uniform, unstructured, and ambiguous data. We then present a synthesis algorithm that learns the desired program in this language from a set of input-output examples. We have implemented our algorithm as an Excel add-in and present its successful evaluation on 55 benchmark problems obtained from online help forums and the Excel product team.
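To make the idea of learning a transformation from input-output examples concrete, the following is a minimal sketch and not the paper's language or synthesis algorithm. It assumes a small, hypothetical set of candidate date formats (INPUT_FORMATS, OUTPUT_FORMATS) and keeps every pair of formats consistent with all examples, so an ambiguous interpretation such as day/month versus month/day is resolved jointly across the examples. It does not model approximate predicate matching or weighted branches.

```python
from datetime import datetime
from itertools import product

# Hypothetical candidate formats, chosen only for illustration; the
# paper's declarative data type definitions are not reproduced here.
INPUT_FORMATS = ["%d/%m/%Y", "%m/%d/%Y", "%Y-%m-%d"]
OUTPUT_FORMATS = ["%Y-%m-%d", "%d %b %Y", "%b %d, %Y"]


def consistent(in_fmt, out_fmt, source, target):
    """Return True if parsing `source` and re-printing it yields `target`."""
    try:
        parsed = datetime.strptime(source, in_fmt)
    except ValueError:
        return False  # `source` cannot be read under this interpretation
    return parsed.strftime(out_fmt) == target


def learn(examples):
    """Return all (input format, output format) pairs that explain every example."""
    return [
        (in_fmt, out_fmt)
        for in_fmt, out_fmt in product(INPUT_FORMATS, OUTPUT_FORMATS)
        if all(consistent(in_fmt, out_fmt, s, t) for s, t in examples)
    ]


def run(program, source):
    """Apply a learned (input format, output format) pair to a new string."""
    in_fmt, out_fmt = program
    return datetime.strptime(source, in_fmt).strftime(out_fmt)


# One example is ambiguous: both day/month and month/day readings fit.
ambiguous = learn([("03/03/2016", "2016-03-03")])

# A second example rules out the month/day reading, since 25 is not a month.
programs = learn([("03/03/2016", "2016-03-03"), ("25/12/2015", "2015-12-25")])
```

In this sketch a single example leaves two candidate programs, and the second example narrows them to one, which can then be applied to unseen cells with run(programs[0], "01/02/2016").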
