Abstract
The ability to reason about relational queries plays an important role across many types of database applications, such as test data generation, query equivalence checking, and computer-assisted query authoring. Unfortunately, symbolic reasoning about relational queries can be challenging because relational tables are multisets (bags) of tuples, and the underlying languages, such as SQL, can introduce complex computation among tuples.
We propose a space refinement algorithm that soundly reduces the space of tables such applications need to consider. The refinement procedure, independent of the specific dataset application, uses the abstract semantics of the query language to exploit the provenance of tuples in the query output to prune the search space. We implemented the refinement algorithm and evaluated it on SQL using three reasoning tasks: bounded query equivalence checking, test generation for applications that manipulate relational data, and concolic testing of database applications. Using real world benchmarks, we show that our refinement algorithm significantly speeds up (up to 100×) the SQL solver when reasoning about a large class of challenging SQL queries, such as those with aggregations.
Supplemental Material
- Amol Bhangdiya, Bikash Chandra, Biplab Kar, Bharath Radhakrishnan, KV Maheshwara Reddy, Shetal Shah, and S Sudarshan. 2015. The XDa-TA system for automated grading of SQL query assignments. In Data Engineering (ICDE), 2015 IEEE 31st International Conference on. IEEE, 1468–1471.Google Scholar
Cross Ref
- Rajendra Bose and James Frew. 2005. Lineage retrieval for scientific data processing: a survey. ACM Comput. Surv. 37, 1 (2005), 1–28. Google Scholar
Digital Library
- Peter Buneman, Adriane Chapman, and James Cheney. 2006. Provenance management in curated databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Chicago, Illinois, USA, June 27-29, 2006. 539–550. Google Scholar
Digital Library
- Peter Buneman, Sanjeev Khanna, and Wang Chiew Tan. 2001. Why and Where: A Characterization of Data Provenance. In Database Theory - ICDT 2001, 8th International Conference, London, UK, January 4-6, 2001, Proceedings. 316–330. Google Scholar
Digital Library
- Bikash Chandra, Bhupesh Chawda, Biplab Kar, K. V. Maheshwara Reddy, Shetal Shah, and S. Sudarshan. 2015. Data generation for testing and grading SQL queries. VLDB J. 24, 6 (2015), 731–755. Google Scholar
Digital Library
- Satish Chandra, Stephen J. Fink, and Manu Sridharan. 2009. Snugglebug: a powerful approach to weakest preconditions. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2009, Dublin, Ireland, June 15-21, 2009, Michael Hind and Amer Diwan (Eds.). ACM, 363–374. Google Scholar
Digital Library
- Surajit Chaudhuri and Moshe Y. Vardi. 1993. Optimization of Real Conjunctive Queries. In Proceedings of the Twelfth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, May 25-28, 1993, Washington, DC, USA. 59–70. Google Scholar
Digital Library
- Alvin Cheung, Armando Solar-Lezama, and Samuel Madden. 2011. Partial Replay of Long-running Applications. In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering (ESEC/FSE ’11). 135–145. Google Scholar
Digital Library
- Shumo Chu, Chenglong Wang, Konstantin Weitz, and Alvin Cheung. 2017a. Cosette: An Automated Prover for SQL. In CIDR 2017, 8th Biennial Conference on Innovative Data Systems Research, Chaminade, CA, USA, January 8-11, 2017, Online Proceedings. http://cidrdb.org/cidr2017/papers/p51-chu-cidr17.pdfGoogle Scholar
- Shumo Chu, Konstantin Weitz, Alvin Cheung, and Dan Suciu. 2017b. HoT TSQL: proving query rewrites with univalent SQL semantics. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017. 510–524. Google Scholar
Digital Library
- E. F. Codd. 1970. A Relational Model of Data for Large Shared Data Banks. Commun. ACM 13, 6 (June 1970), 377–387. Google Scholar
Digital Library
- Sara Cohen. 2009. Equivalence of queries that are sensitive to multiplicities. VLDB J. 18, 3 (2009), 765–785. Google Scholar
Digital Library
- Sara Cohen, Werner Nutt, and Yehoshua Sagiv. 2007. Deciding equivalences among conjunctive aggregate queries. J. ACM 54, 2 (2007), 5. Google Scholar
Digital Library
- James M. Crawford, Matthew L. Ginsberg, Eugene M. Luks, and Amitabha Roy. 1996. Symmetry-Breaking Predicates for Search Problems. In Proceedings of the Fifth International Conference on Principles of Knowledge Representation and Reasoning (KR’96), Cambridge, Massachusetts, USA, November 5-8, 1996. 148–159. Google Scholar
Digital Library
- Yingwei Cui, Jennifer Widom, and Janet L. Wiener. 2000. Tracing the lineage of view data in a warehousing environment. ACM Trans. Database Syst. 25, 2 (2000), 179–227. Google Scholar
Digital Library
- David Déharbe, Pascal Fontaine, Stephan Merz, and Bruno Woltzenlogel Paleo. 2011. Exploiting Symmetry in SMT Problems. In Automated Deduction - CADE-23 - 23rd International Conference on Automated Deduction, Wroclaw, Poland, July 31 -August 5, 2011. Proceedings. 222–236. Google Scholar
Digital Library
- Evelyn Duesterwald, Rajiv Gupta, and Mary Lou Soffa. 1995. Demand-driven Computation of Interprocedural Data Flow. In Conference Record of POPL’95: 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Francisco, California, USA, January 23-25, 1995, Ron K. Cytron and Peter Lee (Eds.). ACM Press, 37–48. Google Scholar
Digital Library
- Yu Feng, Ruben Martins, Jacob Van Geffen, Isil Dillig, and Swarat Chaudhuri. 2017. Component-based synthesis of table consolidation and transformation tasks from examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017. 422–436. Google Scholar
Digital Library
- John K. Feser, Swarat Chaudhuri, and Isil Dillig. 2015. Synthesizing data structure transformations from input-output examples. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, Portland, OR, USA, June 15-17, 2015. 229–239. Google Scholar
Digital Library
- Todd J. Green, Grigoris Karvounarakis, Zachary G. Ives, and Val Tannen. 2007. Update Exchange with Mappings and Provenance. In Proceedings of the 33rd International Conference on Very Large Data Bases, University of Vienna, Austria, September 23-27, 2007. 675–686. http://www.vldb.org/conf/2007/papers/research/p675-green.pdf Google Scholar
Digital Library
- Shelly Grossman, Sara Cohen, Shachar Itzhaky, Noam Rinetzky, and Mooly Sagiv. 2017. Verifying Equivalence of Spark Programs. In Computer Aided Verification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part II. 282–300.Google Scholar
- Bhanu Pratap Gupta, Devang Vira, and S. Sudarshan. 2010. X-data: Generating test data for killing SQL mutants. In Proceedings of the 26th International Conference on Data Engineering, ICDE 2010, March 1-6, 2010, Long Beach, California, USA. 876–879.Google Scholar
Cross Ref
- Susan Horwitz, Thomas W. Reps, and Shmuel Sagiv. 1995. Demand Interprocedural Dataflow Analysis. In SIGSOFT ’95, Proceedings of the Third ACM SIGSOFT Symposium on Foundations of Software Engineering, Washington, DC, USA, October 10-13, 1995. 104–115. Google Scholar
Digital Library
- Mauro Negri, Giuseppe Pelagatti, and Licia Sbattella. 1991. Formal Semantics of SQL Queries. ACM Trans. Database Syst. 16, 3 (1991), 513–534. Google Scholar
Digital Library
- Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama. 2016. Program synthesis from polymorphic refinement types. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2016, Santa Barbara, CA, USA, June 13-17, 2016. 522–538. Google Scholar
Digital Library
- Yehoshua Sagiv and Mihalis Yannakakis. 1980. Equivalences Among Relational Expressions with the Union and Difference Operators. J. ACM 27, 4 (1980), 633–655. Google Scholar
Digital Library
- Max Schäfer and Oege de Moor. 2010. Type inference for datalog with complex type hierarchies. In Proceedings of the 37th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2010, Madrid, Spain, January 17-23, 2010. 145–156. Google Scholar
Digital Library
- Shetal Shah, S. Sudarshan, Suhas Kajbaje, Sandeep Patidar, Bhanu Pratap Gupta, and Devang Vira. 2011. Generating test data for killing SQL mutants: A constraint-based approach. In Proceedings of the 27th International Conference on Data Engineering, ICDE 2011, April 11-16, 2011, Hannover, Germany. 1175–1186. Google Scholar
Digital Library
- Haruto Tanno, Xiaojing Zhang, Takashi Hoshino, and Koushik Sen. 2015. TesMa and CATG: automated test generation tools for models of enterprise applications. In Proceedings of the 37th International Conference on Software Engineering-Volume 2. IEEE Press, 717–720. Google Scholar
Digital Library
- Emina Torlak and Rastislav Bodík. 2014. A lightweight symbolic virtual machine for solver-aided host languages. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14, Edinburgh, United Kingdom - June 09 - 11, 2014. 530–541. Google Scholar
Digital Library
- Margus Veanes, Pavel Grigorenko, Peli de Halleux, and Nikolai Tillmann. 2009. Symbolic Query Exploration. In Formal Methods and Software Engineering, 11th International Conference on Formal Engineering Methods, ICFEM 2009, Rio de Janeiro, Brazil, December 9-12, 2009. Proceedings. 49–68. Google Scholar
Digital Library
- Margus Veanes, Nikolai Tillmann, and Jonathan de Halleux. 2010. Qex: Symbolic SQL Query Explorer. In Logic for Programming, Artificial Intelligence, and Reasoning - 16th International Conference, LPAR-16, Dakar, Senegal, April 25-May 1, 2010, Revised Selected Papers. 425–446. Google Scholar
Digital Library
- Chenglong Wang, Alvin Cheung, and Rastislav Bodík. 2017. Synthesizing highly expressive SQL queries from input-output examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017. 452–466. Google Scholar
Digital Library
- Yuepeng Wang, Isil Dillig, Shuvendu K. Lahiri, and William R. Cook. 2018. Verifying equivalence of database-driven applications. PACMPL 2, POPL (2018), 56:1–56:29. Google Scholar
Digital Library
Index Terms
Speeding up symbolic reasoning for relational queries
Recommendations
Relational-style XML query
SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of dataWe study the problem of querying relational data embedded in XML. Relational data can be represented by various tree structures in XML. However, current XML query methods, such as XPath and XQuery, demand explicit path expressions, and thus it is quite ...
Translation of Object-Oriented Queries to Relational Queries
ICDE '95: Proceedings of the Eleventh International Conference on Data EngineeringProposes a formal approach for translating OODB queries to equivalent relational queries. The translation is accomplished through the use of relational predicate graphs and OODB predicate graphs. One advantage of using such a graph-based approach is ...
Algebraic optimization of relational queries with various kinds of preferences
SOFSEM'08: Proceedings of the 34th conference on Current trends in theory and practice of computer sciencePreferences can be used for information filtering and extraction to deliver the most relevant data to the user. Therefore the efficient integration of querying with preferences into standard database technology is an important issue. The paper resumes a ...






Comments