research-article
Open Access
Artifacts Available
Artifacts Evaluated & Functional

Speeding up symbolic reasoning for relational queries

Published: 24 October 2018

Abstract

The ability to reason about relational queries plays an important role across many types of database applications, such as test data generation, query equivalence checking, and computer-assisted query authoring. Unfortunately, symbolic reasoning about relational queries can be challenging because relational tables are multisets (bags) of tuples, and the underlying languages, such as SQL, can introduce complex computation among tuples.
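
To make the multiset point concrete, here is a minimal sketch (not taken from the paper) that models a table as a Python `Counter` mapping each tuple to its multiplicity. The table contents and the helper names `select_all` and `select_distinct` are illustrative assumptions.

```python
from collections import Counter

# A table is a bag of tuples: Counter maps each tuple to its multiplicity.
# The rows below are made-up illustration data.
table = Counter({("alice", 10): 2, ("bob", 20): 1})

def select_all(t):
    """SELECT * FROM t: bag semantics keeps duplicate rows."""
    return Counter(t)

def select_distinct(t):
    """SELECT DISTINCT * FROM t: every multiplicity collapses to 1."""
    return Counter({row: 1 for row in t})

bag_result = select_all(table)
set_result = select_distinct(table)

# The two results contain the same rows but differ as bags.
same_as_sets = set(bag_result) == set(set_result)
same_as_bags = bag_result == set_result
print(same_as_sets, same_as_bags)
```

Two queries that agree on which rows appear can still disagree on how often they appear, so a reasoning tool has to track multiplicities and not only membership.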

We propose a space refinement algorithm that soundly reduces the space of tables such applications need to consider. The refinement procedure, independent of the specific dataset application, uses the abstract semantics of the query language to exploit the provenance of tuples in the query output to prune the search space. We implemented the refinement algorithm and evaluated it on SQL using three reasoning tasks: bounded query equivalence checking, test generation for applications that manipulate relational data, and concolic testing of database applications. Using real-world benchmarks, we show that our refinement algorithm significantly speeds up the SQL solver (by up to 100×) when reasoning about a large class of challenging SQL queries, such as those with aggregations.
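
As a rough illustration of what bounded query equivalence checking searches over, the sketch below enumerates every small table by brute force and looks for one on which two queries disagree. This is a naive baseline written for this page, not the paper's refinement algorithm or its solver; the queries `q1` and `q2`, the value domain, and all function names are assumptions.

```python
from collections import Counter
from itertools import combinations_with_replacement, product

def tables_up_to(max_rows, values, arity):
    """Yield every bag of `arity`-column rows with at most max_rows rows."""
    rows = list(product(values, repeat=arity))
    for n in range(max_rows + 1):
        # Multisets of size n over the possible rows, each generated once.
        for combo in combinations_with_replacement(rows, n):
            yield Counter(combo)

def q1(t):
    """SELECT a FROM t WHERE a > 0, under bag semantics."""
    out = Counter()
    for (a, _b), multiplicity in t.items():
        if a > 0:
            out[(a,)] += multiplicity
    return out

def q2(t):
    """SELECT DISTINCT a FROM t WHERE a > 0."""
    return Counter({row: 1 for row in q1(t)})

def find_counterexample(qa, qb, max_rows=2, values=(0, 1), arity=2):
    """Return a table on which qa and qb differ, or None within the bound."""
    for t in tables_up_to(max_rows, values, arity):
        if qa(t) != qb(t):
            return t
    return None

cex = find_counterexample(q1, q2)
print(cex)
```

Any counterexample here must contain a repeated `a > 0` value, because `q1` and `q2` differ only in how they treat duplicates. The number of candidate tables grows quickly with the row bound and the value domain, which is the kind of search space that pruning aims to shrink.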


Supplemental Material

a157-wang.webm

