Abstract
SQL is the de facto language for manipulating relational data. Though powerful, many users find it difficult to write SQL queries due to highly expressive constructs.
While using the programming-by-example paradigm to help users write SQL queries is an attractive proposition, as evidenced by online help forums such as Stack Overflow, developing techniques for synthesizing SQL queries from given input-output (I/O) examples has been difficult, due to the large space of SQL queries as a result of its rich set of operators.
In this paper, we present a new scalable and efficient algorithm for synthesizing SQL queries based on I/O examples. The key innovation of our algorithm is development of a language for abstract queries, i.e., queries with uninstantiated operators, that can be used to express a large space of SQL queries efficiently. Using abstract queries to represent the search space nicely decomposes the synthesis problem into two tasks: 1) searching for abstract queries that can potentially satisfy the given I/O examples, and 2) instantiating the found abstract queries and ranking the results.
We have implemented this algorithm in a new tool called Scythe and evaluated it using 193 benchmarks collected from Stack Overflow. Our evaluation shows that Scythe can efficiently solve 74% of the benchmarks, most in just a few seconds, and the queries range from simple ones involving a single selection to complex queries with 6 nested subqueires.
Supplemental Material
Available for Download
This is a snapshot of our artifact for artifact evaluation. For up-to-date tool, visit http://scythe.cs.washington.edu
- M. B. S. Ahmad and A. Cheung. Leveraging parallel data processing frameworks with verified lifting. In Proceedings of the Fifth Workshop on Synthesis, [email protected] 2016, Toronto, Canada, July 17-18, 2016., pages 67–83, 2016.Google Scholar
Cross Ref
- R. Alur, R. Bodík, G. Juniwal, M. M. K. Martin, M. Raghothaman, S. A. Seshia, R. Singh, A. Solar-Lezama, E. Torlak, and A. Udupa. Syntax-guided synthesis. In Formal Methods in Computer-Aided Design, FMCAD 2013, Portland, OR, USA, October 20-23, 2013, pages 1–8, 2013.Google Scholar
- R. Bodík and B. Jobstmann. Algorithmic program synthesis: introduction. STTT, 15(5-6):397–411, 2013.Google Scholar
- A. Cheung and A. Solar-Lezama. Computer-assisted query formulation. Found. Trends Program. Lang., 3(1):1–94, June 2016. ISSN 2325-1107. Google Scholar
Digital Library
- A. Cheung, A. Solar-Lezama, and S. Madden. Using program synthesis for social recommendations. In 21st ACM International Conference on Information and Knowledge Management, CIKM’12, Maui, HI, USA, October 29 - November 02, 2012, pages 1732–1736, 2012. Google Scholar
Digital Library
- S. Chu, C. Wang, K. Weitz, and A. Cheung. Cosette: An automated prover for SQL. In CIDR 2017, 8th Biennial Conference on Innovative Data Systems Research, Chaminade, CA, USA, January 8-11, 2017, Online Proceedings, 2017.Google Scholar
- S. Chu, K. Weitz, A. Cheung, and D. Suciu. Hottsql: Proving query rewrites with univalent SQL semantics. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’17 (To Appear), 2017. Google Scholar
Digital Library
- L. De Raedt. Inductive logic programming. In Encyclopedia of Machine Learning, pages 529–537. Springer, 2011.Google Scholar
- J. Frankle, P.-M. Osera, D. Walker, and S. Zdancewic. Exampledirected synthesis: A type-theoretic interpretation. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, pages 802–815, New York, NY, USA, 2016. ACM. ISBN 978-1- 4503-3549-2. Google Scholar
Digital Library
- J. Galenson, P. Reames, R. Bodik, B. Hartmann, and K. Sen. Codehint: Dynamic and interactive synthesis of code snippets. In Proceedings of the 36th International Conference on Software Engineering, pages 653–663. ACM, 2014. Google Scholar
Digital Library
- S. Gulwani. Dimensions in program synthesis. In Proceedings of the 12th International ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming, pages 13–24. ACM, 2010. Google Scholar
Digital Library
- S. Gulwani. Automating string processing in spreadsheets using input-output examples. In ACM SIGPLAN Notices, volume 46, pages 317–330. ACM, 2011. Google Scholar
Digital Library
- W. R. Harris and S. Gulwani. Spreadsheet table transformations from examples. In ACM SIGPLAN Notices, volume 46, pages 317–328. ACM, 2011. Google Scholar
Digital Library
- A. Kiezun, V. Ganesh, P. J. Guo, P. Hooimeijer, and M. D. Ernst. Hampi: a solver for string constraints. In Proceedings of the Eighteenth International Symposium on Software Testing and Analysis, pages 105–116. ACM, 2009. Google Scholar
Digital Library
- D. Kini and S. Gulwani. Flashnormalize: Programming by examples for text normalization. In Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI’15, pages 776–783. AAAI Press, 2015. ISBN 978-1-57735-738-4. Google Scholar
Digital Library
- T. Lau, S. A. Wolfman, P. Domingos, and D. S. Weld. Programming by demonstration using version space algebra. Machine Learning, 53(1-2):111–156, 2003. Google Scholar
Digital Library
- V. Le and S. Gulwani. Flashextract: A framework for data extraction by examples. In ACM SIGPLAN Notices, volume 49, pages 542–553. ACM, 2014. Google Scholar
Digital Library
- M. Lee, S. So, and H. Oh. Synthesizing regular expressions from examples for introductory automata assignments. In Proceedings of the 2016 ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, pages 70–80. ACM, 2016. Google Scholar
Digital Library
- F. Li and H. V. Jagadish. NaLIR: an interactive natural language interface for querying relational databases. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pages 709–712. ACM, 2014. Google Scholar
Digital Library
- M. Mayer, G. Soares, M. Grechkin, V. Le, M. Marron, O. Polozov, R. Singh, B. Zorn, and S. Gulwani. User interaction models for disambiguation in programming by example. In Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology, pages 291–301. ACM, 2015. Google Scholar
Digital Library
- S. Muggleton, R. Otero, and A. Tamaddoni-Nezhad. Inductive Logic Programming, volume 38. Springer, 1992.Google Scholar
- M. Negri, G. Pelagatti, and L. Sbattella. Formal semantics of sql queries. ACM Transactions on Database Systems (TODS), 16(3):513–534, 1991. Google Scholar
Digital Library
- P.-M. Osera and S. Zdancewic. Type-and-example-directed program synthesis. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2015, pages 619–630, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3468-6. Google Scholar
Digital Library
- P. M. Phothilimthana, T. Jelvis, R. Shah, N. Totla, S. Chasins, and R. Bodik. Chlorophyll: Synthesis-aided compiler for low-power spatial architectures. In ACM SIGPLAN Notices, volume 49, pages 396–407. ACM, 2014. Google Scholar
Digital Library
- P. M. Phothilimthana, A. Thakur, R. Bodik, and D. Dhurjati. Scaling up superoptimization. In Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, pages 297– 310. ACM, 2016. Google Scholar
Digital Library
- O. Polozov and S. Gulwani. Flashmeta: A framework for inductive program synthesis. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 107–126. ACM, 2015. Google Scholar
Digital Library
- E. Schkufza, R. Sharma, and A. Aiken. Stochastic superoptimization. ACM SIGPLAN Notices, 48(4):305–316, 2013. Google Scholar
Digital Library
- R. Singh. Blinkfill: Semi-supervised programming by example for syntactic string transformations.Google Scholar
- R. Singh and S. Gulwani. Learning semantic string transformations from examples. Proceedings of the VLDB Endowment, 5 (8):740–751, 2012. Google Scholar
Digital Library
- R. Singh and S. Gulwani. Synthesizing number transformations from input-output examples. In International Conference on Computer Aided Verification, pages 634–651. Springer, 2012. Google Scholar
Digital Library
- R. Singh and S. Gulwani. Predicting a correct program in programming by example. In International Conference on Computer Aided Verification, pages 398–414. Springer, 2015.Google Scholar
- R. Singh and S. Gulwani. Transforming spreadsheet data types using examples. ACM SIGPLAN Notices, 51(1):343–356, 2016. Google Scholar
Digital Library
- C. Smith and A. Albarghouthi. Mapreduce program synthesis. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 326–340. ACM, 2016. Google Scholar
Digital Library
- A. Solar-Lezama. Program Synthesis by Sketching. PhD thesis, Berkeley, CA, USA, 2008. AAI3353225. Google Scholar
Digital Library
- E. Torlak and R. Bodik. A lightweight symbolic virtual machine for solver-aided host languages. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14, pages 530–541, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2784-8. Google Scholar
Digital Library
- Q. T. Tran, C.-Y. Chan, and S. Parthasarathy. Query by output. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, pages 535–548. ACM, 2009. Google Scholar
Digital Library
- A. Udupa, A. Raghavan, J. V. Deshmukh, S. Mador-Haim, M. M. Martin, and R. Alur. Transit: Specifying protocols with concolic snippets. SIGPLAN Not., 48(6):287–296, June 2013. ISSN 0362-1340. Google Scholar
Digital Library
- X. Wang, S. Gulwani, and R. Singh. Fidex: Filtering spreadsheet data using examples. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2016, pages 195–213, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-4444-9. Google Scholar
Digital Library
- N. Yaghmazadeh, C. Klinger, I. Dillig, and S. Chaudhuri. Synthesizing transformations on hierarchically structured data. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 508–521. ACM, 2016. Google Scholar
Digital Library
- S. Zhang and Y. Sun. Automatically synthesizing sql queries from input-output examples. In 2013 IEEE/ACM 28th International Conference on Automated Software Engineering (ASE), pages 224–234. IEEE, 2013. Google Scholar
Digital Library
Index Terms
Synthesizing highly expressive SQL queries from input-output examples
Recommendations
Synthesizing highly expressive SQL queries from input-output examples
PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and ImplementationSQL is the de facto language for manipulating relational data. Though powerful, many users find it difficult to write SQL queries due to highly expressive constructs.
While using the programming-by-example paradigm to help users write SQL queries is ...
Interactive Query Synthesis from Input-Output Examples
SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of DataThis demo showcases Scythe, a novel query-by-example system that can synthesize expressive SQL queries from input-output examples. Scythe is designed to help end-users program SQL and explore data simply using input-output examples. From a web-browser, ...
Duoquest: A Dual-Specification System for Expressive SQL Queries
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of DataQuerying a relational database is difficult because it requires users to be familiar with both the SQL language and the schema. However, many users possess enough domain expertise to describe their desired queries by alternative means. For such users, ...






Comments