skip to main content
article
Public Access

Synthesizing highly expressive SQL queries from input-output examples

Published:14 June 2017Publication History
Skip Abstract Section

Abstract

SQL is the de facto language for manipulating relational data. Though powerful, many users find it difficult to write SQL queries due to highly expressive constructs.

While using the programming-by-example paradigm to help users write SQL queries is an attractive proposition, as evidenced by online help forums such as Stack Overflow, developing techniques for synthesizing SQL queries from given input-output (I/O) examples has been difficult, due to the large space of SQL queries as a result of its rich set of operators.

In this paper, we present a new scalable and efficient algorithm for synthesizing SQL queries based on I/O examples. The key innovation of our algorithm is development of a language for abstract queries, i.e., queries with uninstantiated operators, that can be used to express a large space of SQL queries efficiently. Using abstract queries to represent the search space nicely decomposes the synthesis problem into two tasks: 1) searching for abstract queries that can potentially satisfy the given I/O examples, and 2) instantiating the found abstract queries and ranking the results.

We have implemented this algorithm in a new tool called Scythe and evaluated it using 193 benchmarks collected from Stack Overflow. Our evaluation shows that Scythe can efficiently solve 74% of the benchmarks, most in just a few seconds, and the queries range from simple ones involving a single selection to complex queries with 6 nested subqueires.

Skip Supplemental Material Section

Supplemental Material

References

  1. M. B. S. Ahmad and A. Cheung. Leveraging parallel data processing frameworks with verified lifting. In Proceedings of the Fifth Workshop on Synthesis, [email protected] 2016, Toronto, Canada, July 17-18, 2016., pages 67–83, 2016.Google ScholarGoogle ScholarCross RefCross Ref
  2. R. Alur, R. Bodík, G. Juniwal, M. M. K. Martin, M. Raghothaman, S. A. Seshia, R. Singh, A. Solar-Lezama, E. Torlak, and A. Udupa. Syntax-guided synthesis. In Formal Methods in Computer-Aided Design, FMCAD 2013, Portland, OR, USA, October 20-23, 2013, pages 1–8, 2013.Google ScholarGoogle Scholar
  3. R. Bodík and B. Jobstmann. Algorithmic program synthesis: introduction. STTT, 15(5-6):397–411, 2013.Google ScholarGoogle Scholar
  4. A. Cheung and A. Solar-Lezama. Computer-assisted query formulation. Found. Trends Program. Lang., 3(1):1–94, June 2016. ISSN 2325-1107. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. A. Cheung, A. Solar-Lezama, and S. Madden. Using program synthesis for social recommendations. In 21st ACM International Conference on Information and Knowledge Management, CIKM’12, Maui, HI, USA, October 29 - November 02, 2012, pages 1732–1736, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. S. Chu, C. Wang, K. Weitz, and A. Cheung. Cosette: An automated prover for SQL. In CIDR 2017, 8th Biennial Conference on Innovative Data Systems Research, Chaminade, CA, USA, January 8-11, 2017, Online Proceedings, 2017.Google ScholarGoogle Scholar
  7. S. Chu, K. Weitz, A. Cheung, and D. Suciu. Hottsql: Proving query rewrites with univalent SQL semantics. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’17 (To Appear), 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. L. De Raedt. Inductive logic programming. In Encyclopedia of Machine Learning, pages 529–537. Springer, 2011.Google ScholarGoogle Scholar
  9. J. Frankle, P.-M. Osera, D. Walker, and S. Zdancewic. Exampledirected synthesis: A type-theoretic interpretation. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, pages 802–815, New York, NY, USA, 2016. ACM. ISBN 978-1- 4503-3549-2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. J. Galenson, P. Reames, R. Bodik, B. Hartmann, and K. Sen. Codehint: Dynamic and interactive synthesis of code snippets. In Proceedings of the 36th International Conference on Software Engineering, pages 653–663. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. S. Gulwani. Dimensions in program synthesis. In Proceedings of the 12th International ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming, pages 13–24. ACM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. S. Gulwani. Automating string processing in spreadsheets using input-output examples. In ACM SIGPLAN Notices, volume 46, pages 317–330. ACM, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. W. R. Harris and S. Gulwani. Spreadsheet table transformations from examples. In ACM SIGPLAN Notices, volume 46, pages 317–328. ACM, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. A. Kiezun, V. Ganesh, P. J. Guo, P. Hooimeijer, and M. D. Ernst. Hampi: a solver for string constraints. In Proceedings of the Eighteenth International Symposium on Software Testing and Analysis, pages 105–116. ACM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. D. Kini and S. Gulwani. Flashnormalize: Programming by examples for text normalization. In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI’15, pages 776–783. AAAI Press, 2015. ISBN 978-1-57735-738-4. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. T. Lau, S. A. Wolfman, P. Domingos, and D. S. Weld. Programming by demonstration using version space algebra. Machine Learning, 53(1-2):111–156, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. V. Le and S. Gulwani. Flashextract: A framework for data extraction by examples. In ACM SIGPLAN Notices, volume 49, pages 542–553. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. M. Lee, S. So, and H. Oh. Synthesizing regular expressions from examples for introductory automata assignments. In Proceedings of the 2016 ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, pages 70–80. ACM, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. F. Li and H. V. Jagadish. NaLIR: an interactive natural language interface for querying relational databases. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pages 709–712. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. M. Mayer, G. Soares, M. Grechkin, V. Le, M. Marron, O. Polozov, R. Singh, B. Zorn, and S. Gulwani. User interaction models for disambiguation in programming by example. In Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology, pages 291–301. ACM, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. S. Muggleton, R. Otero, and A. Tamaddoni-Nezhad. Inductive Logic Programming, volume 38. Springer, 1992.Google ScholarGoogle Scholar
  22. M. Negri, G. Pelagatti, and L. Sbattella. Formal semantics of sql queries. ACM Transactions on Database Systems (TODS), 16(3):513–534, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. P.-M. Osera and S. Zdancewic. Type-and-example-directed program synthesis. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2015, pages 619–630, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3468-6. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. P. M. Phothilimthana, T. Jelvis, R. Shah, N. Totla, S. Chasins, and R. Bodik. Chlorophyll: Synthesis-aided compiler for low-power spatial architectures. In ACM SIGPLAN Notices, volume 49, pages 396–407. ACM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. P. M. Phothilimthana, A. Thakur, R. Bodik, and D. Dhurjati. Scaling up superoptimization. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, pages 297– 310. ACM, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. O. Polozov and S. Gulwani. Flashmeta: A framework for inductive program synthesis. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 107–126. ACM, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. E. Schkufza, R. Sharma, and A. Aiken. Stochastic superoptimization. ACM SIGPLAN Notices, 48(4):305–316, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. R. Singh. Blinkfill: Semi-supervised programming by example for syntactic string transformations.Google ScholarGoogle Scholar
  29. R. Singh and S. Gulwani. Learning semantic string transformations from examples. Proceedings of the VLDB Endowment, 5 (8):740–751, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. R. Singh and S. Gulwani. Synthesizing number transformations from input-output examples. In International Conference on Computer Aided Verification, pages 634–651. Springer, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. R. Singh and S. Gulwani. Predicting a correct program in programming by example. In International Conference on Computer Aided Verification, pages 398–414. Springer, 2015.Google ScholarGoogle Scholar
  32. R. Singh and S. Gulwani. Transforming spreadsheet data types using examples. ACM SIGPLAN Notices, 51(1):343–356, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. C. Smith and A. Albarghouthi. Mapreduce program synthesis. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 326–340. ACM, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. A. Solar-Lezama. Program Synthesis by Sketching. PhD thesis, Berkeley, CA, USA, 2008. AAI3353225. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. E. Torlak and R. Bodik. A lightweight symbolic virtual machine for solver-aided host languages. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14, pages 530–541, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2784-8. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Q. T. Tran, C.-Y. Chan, and S. Parthasarathy. Query by output. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, pages 535–548. ACM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. A. Udupa, A. Raghavan, J. V. Deshmukh, S. Mador-Haim, M. M. Martin, and R. Alur. Transit: Specifying protocols with concolic snippets. SIGPLAN Not., 48(6):287–296, June 2013. ISSN 0362-1340. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. X. Wang, S. Gulwani, and R. Singh. Fidex: Filtering spreadsheet data using examples. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2016, pages 195–213, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-4444-9. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. N. Yaghmazadeh, C. Klinger, I. Dillig, and S. Chaudhuri. Synthesizing transformations on hierarchically structured data. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 508–521. ACM, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. S. Zhang and Y. Sun. Automatically synthesizing sql queries from input-output examples. In 2013 IEEE/ACM 28th International Conference on Automated Software Engineering (ASE), pages 224–234. IEEE, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Synthesizing highly expressive SQL queries from input-output examples

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM SIGPLAN Notices
        ACM SIGPLAN Notices  Volume 52, Issue 6
        PLDI '17
        June 2017
        708 pages
        ISSN:0362-1340
        EISSN:1558-1160
        DOI:10.1145/3140587
        Issue’s Table of Contents
        • cover image ACM Conferences
          PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation
          June 2017
          708 pages
          ISBN:9781450349888
          DOI:10.1145/3062341

        Copyright © 2017 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 14 June 2017

        Check for updates

        Qualifiers

        • article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader
      About Cookies On This Site

      We use cookies to ensure that we give you the best experience on our website.

      Learn more

      Got it!