Abstract
Every database system contains a query optimizer that performs query rewrites. Unfortunately, developing query optimizers remains a highly challenging task. Part of the challenges comes from the intricacies and rich features of query languages, which makes reasoning about rewrite rules difficult. In this paper, we propose a machine-checkable denotational semantics for SQL, the de facto language for relational database, for rigorously validating rewrite rules. Unlike previously proposed semantics that are either non-mechanized or only cover a small amount of SQL language features, our semantics covers all major features of SQL, including bags, correlated subqueries, aggregation, and indexes. Our mechanized semantics, called HoTT SQL, is based on K-Relations and homotopy type theory, where we denote relations as mathematical functions from tuples to univalent types. We have implemented HoTTSQL in Coq, which takes only fewer than 300 lines of code and have proved a wide range of SQL rewrite rules, including those from database research literature (e.g., magic set rewrites) and real-world query optimizers (e.g., subquery elimination). Several of these rewrite rules have never been previously proven correct. In addition, while query equivalence is generally undecidable, we have implemented an automated decision procedure using HoTTSQL for conjunctive queries: a well studied decidable fragment of SQL that encompasses many real-world queries.
- S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. Google Scholar
Digital Library
- F. Bancilhon, D. Maier, Y. Sagiv, and J. D. Ullman. Magic sets and other strange ways to implement logic programs. In PODS, pages 1–15, 1986. Google Scholar
Digital Library
- B. Barras, B. Grégoire, A. Mahboubi, and L. Théry. Coq reference manual chapter 25: The ring and field tactic families. https://coq.inria.fr/refman/Reference-Manual028. html.Google Scholar
- V. Benzaken, E. Contejean, and S. Dumbrava. A coq formalization of the relational data model. In ESOP, pages 189–208, 2014. Google Scholar
Digital Library
- D. Bruijn and N. Govert. Lambda calculus notation with nameless dummies: A tool for automatic formula manipulation, with application to the church-rosser theorem. Indagationes Mathematicae, 34:381–392, 1972.Google Scholar
Cross Ref
- P. Buneman, L. Libkin, D. Suciu, V. Tannen, and L. Wong. Comprehension syntax. SIGMOD Record, 23(1):87–96, 1994. Google Scholar
Digital Library
- P. Buneman, S. A. Naqvi, V. Tannen, and L. Wong. Principles of programming with complex objects and collection types. Theor. Comput. Sci., 149(1):3–48, 1995. Google Scholar
Digital Library
- A. K. Chandra and P. M. Merlin. Optimal implementation of conjunctive queries in relational data bases. In STOC, pages 77–90. ACM, 1977. Google Scholar
Digital Library
- S. Chaudhuri and U. Dayal. An overview of data warehousing and OLAP technology. SIGMOD Record, 26(1):65–74, 1997. Google Scholar
Digital Library
- S. Chaudhuri and M. Y. Vardi. Optimization of Real conjunctive queries. In Proceedings of the Twelfth ACM SIGACTSIGMOD-SIGART Symposium on Principles of Database Systems, May 25-28, 1993, Washington, DC, USA, pages 59–70, 1993. Google Scholar
Digital Library
- S. Chaudhuri and M. Y. Vardi. On the complexity of equivalence between recursive and nonrecursive datalog programs. In PODS, pages 107–116. ACM Press, 1994. Google Scholar
Digital Library
- H. Chen, D. Ziegler, T. Chajed, A. Chlipala, M. F. Kaashoek, and N. Zeldovich. Using crash hoare logic for certifying the FSCQ file system. In SOSP, pages 18–37, 2015. Google Scholar
Digital Library
- M. Cherniack and S. B. Zdonik. Rule languages and internal algebras for rule-based optimizers. In SIGMOD Conference, pages 401–412. ACM Press, 1996. Google Scholar
Digital Library
- M. Cherniack and S. B. Zdonik. Changing the rules: Transformations for rule-based optimizers. In SIGMOD Conference, pages 61–72. ACM Press, 1998. Google Scholar
Digital Library
- A. Chlipala. The bedrock structured programming system: combining generative metaprogramming and hoare logic in an extensible program verifier. In ICFP, pages 391–402. ACM, 2013. Google Scholar
Digital Library
- S. Chu, K. Weitz, A. Cheung, and D. Suciu. HoTTSQL: Proving query rewrites with univalent SQL semantics. CoRR, abs/1607.04822, 2016. 04822.Google Scholar
- S. Chu, C. Wang, K. Weitz, and A. Cheung. Cosette: An automated prover for SQL. In CIDR. www.cidrdb.org, 2017.Google Scholar
- E. F. Codd. A relational model of data for large shared data banks. Commun. ACM, 13(6):377–387, 1970. Google Scholar
Digital Library
- E. F. Codd. Relational completeness of data base sublanguages. In: R. Rustin (ed.): Database Systems: 65-98, Prentice Hall and IBM Research Report RJ 987, San Jose, California, 1972.Google Scholar
- S. Dar, M. J. Franklin, B. T. J´onsson, D. Srivastava, and M. Tan. Semantic data caching and replacement. In VLDB, pages 330–341. Morgan Kaufmann, 1996. Google Scholar
Digital Library
- C. J. Date. A Guide to the SQL Standard, Second Edition. Addison-Wesley, 1989. ISBN 978-0-201-50209-1. Google Scholar
Digital Library
- R. A. Ganski and H. K. T. Wong. Optimization of nested SQL queries revisited. In SIGMOD Conference, pages 23–33. ACM Press, 1987. Google Scholar
Digital Library
- H. Garcia-Molina, J. D. Ullman, and J. Widom. Database systems - the complete book (2. ed.). Pearson Education, 2009. ISBN 978-0-13-187325-4. Google Scholar
Digital Library
- G. Geck, B. Ketsman, F. Neven, and T. Schwentick. Parallelcorrectness and containment for conjunctive queries with union and negation. In ICDT, volume 48 of LIPIcs, pages 9:1– 9:17. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2016.Google Scholar
- G. Graefe. The cascades framework for query optimization. IEEE Data Eng. Bull., 18(3):19–29, 1995.Google Scholar
- G. Graefe and D. J. DeWitt. The EXODUS optimizer generator. In SIGMOD Conference, pages 160–172. ACM Press, 1987. Google Scholar
Digital Library
- G. Graefe and W. J. McKenna. The volcano optimizer generator: Extensibility and efficient search. In ICDE, pages 209– 218. IEEE Computer Society, 1993. Google Scholar
Digital Library
- T. J. Green, G. Karvounarakis, and V. Tannen. Provenance semirings. In PODS, pages 31–40, 2007. Google Scholar
Digital Library
- J. Gross, M. Shulman, A. Bauer, P. L. Lumsdaine, A. Mahboubi, and B. Spitters. The hott libary in coq. https: //github.com/HoTT/HoTT.Google Scholar
- J. V. Guttag and J. J. Horning. Larch: Languages and Tools for Formal Specification. Springer-Verlag New York, Inc., New York, NY, USA, 1993. ISBN 0-387-94006-5. Google Scholar
Digital Library
- L. M. Haas, J. C. Freytag, G. M. Lohman, and H. Pirahesh. Extensible query processing in starburst. In SIGMOD Conference, pages 377–388. ACM Press, 1989. Google Scholar
Digital Library
- C. Hawblitzel, J. Howell, J. R. Lorch, A. Narayan, B. Parno, D. Zhang, and B. Zill. Ironclad apps: End-to-end security via automated full-system verification. In OSDI, pages 165–181. USENIX Association, 2014. Google Scholar
Digital Library
- Y. E. Ioannidis and R. Ramakrishnan. Containment of conjunctive queries: Beyond relations as sets. ACM Trans. Database Syst., 20(3):288–324, 1995. Google Scholar
Digital Library
- ISO/IEC. Iso/iec 9075-1:2011. https://www.iso.org/obp/ ui/#iso:std:iso-iec:9075:-1:ed-4:v1:en. Online; accessed 9-May-2016.Google Scholar
- T. S. Jayram, P. G. Kolaitis, and E. Vee. The containment problem for REAL conjunctive queries with inequalities. In PODS, pages 80–89. ACM, 2006. Google Scholar
Digital Library
- M. A. Khamis, H. Q. Ngo, and A. Rudra. FAQ: questions asked frequently. In PODS, pages 13–28, 2016. Google Scholar
Digital Library
- G. Klein, K. Elphinstone, G. Heiser, J. Andronick, D. Cock, P. Derrin, D. Elkaduwe, K. Engelhardt, R. Kolanski, M. Norrish, T. Sewell, H. Tuch, and S. Winwood. sel4: formal verification of an OS kernel. In SOSP, pages 207–220. ACM, 2009. Google Scholar
Digital Library
- X. Leroy. Formal verification of a realistic compiler. Commun. ACM, 52(7):107–115, 2009. Google Scholar
Digital Library
- A. Y. Levy, I. S. Mumick, and Y. Sagiv. Query optimization by predicate move-around. In VLDB, pages 96–107. Morgan Kaufmann, 1994. Google Scholar
Digital Library
- G. Malecha and R. Wisnesky. Using dependent types and tactics to enable semantic optimization of language-integrated queries. In DBPL, pages 49–58. ACM, 2015. Google Scholar
Digital Library
- J. G. Malecha, G. Morrisett, A. Shinnar, and R. Wisnesky. Toward a verified relational database management system. In POPL, pages 237–248, 2010. Google Scholar
Digital Library
- I. S. Mumick, S. J. Finkelstein, H. Pirahesh, and R. Ramakrishnan. Magic is relevant. In SIGMOD Conference, pages 247–258, 1990. Google Scholar
Digital Library
- M. Muralikrishna. Improved unnesting algorithms for join aggregate SQL queries. In VLDB, pages 91–102. Morgan Kaufmann, 1992. Google Scholar
Digital Library
- M. Negri, G. Pelagatti, and L. Sbattella. Formal semantics of SQL queries. ACM Trans. Database Syst., 16(3):513–534, 1991. Google Scholar
Digital Library
- G. Nelson and D. C. Oppen. Fast decision procedures based on congruence closure. J. ACM, 27(2):356–364, 1980. Google Scholar
Digital Library
- H. Pirahesh, J. M. Hellerstein, and W. Hasan. Extensible/rule based query rewrite optimization in starburst. In SIGMOD Conference, pages 39–48. ACM Press, 1992. Google Scholar
Digital Library
- J. Rohmer, R. Lescoeur, and J. Kerisit. The alexander method - A technique for the processing of recursive axioms in deductive databases. New Generation Comput., 4(3):273–285, 1986. Google Scholar
Digital Library
- Y. Sagiv and M. Yannakakis. Equivalences among relational expressions with the union and difference operators. J. ACM, 27(4):633–655, 1980. Google Scholar
Digital Library
- D. Schmitt. Bug #5673: Optimizer creates strange execution plan leading to wrong results. https: //www.postgresql.org/message-id/201009231503.Google Scholar
- o8NF3B[email protected]. Online; accessed 1-July-2016.Google Scholar
- P. Seshadri, J. M. Hellerstein, H. Pirahesh, T. Y. C. Leung, R. Ramakrishnan, D. Srivastava, P. J. Stuckey, and S. Sudarshan. Cost-based optimization for magic: Algebra and implementation. In SIGMOD Conference, pages 435–446, 1996. Google Scholar
Digital Library
- M. Sulik. Bug #70038: Wrong select count distinct with a field included in two-column unique key. http://bugs.mysql. com/bug.php?id=70038. Online; accessed 1-July-2016.Google Scholar
- The Univalent Foundations Program. Homotopy Type Theory: Univalent Foundations of Mathematics. https: //homotopytypetheory.org/book, Institute for Advanced Study, 2013.Google Scholar
- B. A. Trakhtenbrot. Impossibility of an algorithm for the decision problem in finite classes. Dok. Akad. Nauk USSR, 70(1):569–572, 1950.Google Scholar
- Transaction Processing Performance Council (TPC). Tpc benchmark h revision 2.17.1. http://www.tpc.org/tpc documents current versions/pdf/tpc-h v2.17.1.pdf.Google Scholar
- O. G. Tsatalos, M. H. Solomon, and Y. E. Ioannidis. The GMAP: A versatile tool for physical data independence. In VLDB, pages 367–378. Morgan Kaufmann, 1994. Google Scholar
Digital Library
- J. D. Ullman. Principles of Database and Knowledge-Base Systems, Volume II. Computer Science Press, 1989. Google Scholar
Digital Library
- J. D. Ullman. Information integration using logical views. In ICDT, volume 1186 of Lecture Notes in Computer Science, pages 19–40. Springer, 1997. Google Scholar
Digital Library
- R. van der Meyden. The complexity of querying indefinite data about linearly ordered domains. In PODS, pages 331– 345, 1992. Google Scholar
Digital Library
- M. Veanes, P. Grigorenko, P. de Halleux, and N. Tillmann. Symbolic query exploration. In ICFEM, pages 49–68, 2009. Google Scholar
Digital Library
- M. Veanes, N. Tillmann, and J. de Halleux. Qex: Symbolic SQL query explorer. In LPAR (Dakar), volume 6355 of Lecture Notes in Computer Science, pages 425–446. Springer, 2010. Google Scholar
Digital Library
- D. Vytiniotis, S. L. P. Jones, K. Claessen, and D. Rosén. HALO: haskell to logic through denotational semantics. In POPL, pages 431–442, 2013. Google Scholar
Digital Library
- K. Weitz, D. Woos, E. Torlak, M. D. Ernst, A. Krishnamurthy, and Z. Tatlock. Bagpipe: Verified BGP configuration checking. Technical Report UW-CSE-16-01-01, University of Washington Department of Computer Science and Engineering, Seattle, WA, USA, Jan. 2016.Google Scholar
- J. R. Wilcox, D. Woos, P. Panchekha, Z. Tatlock, X. Wang, M. D. Ernst, and T. E. Anderson. Verdi: a framework for implementing and formally verifying distributed systems. In PLDI, pages 357–368. ACM, 2015. Google Scholar
Digital Library
Index Terms
HoTTSQL: proving query rewrites with univalent SQL semantics
Recommendations
A Coq mechanised formal semantics for realistic SQL queries: formally reconciling SQL and bag relational algebra
CPP 2019: Proceedings of the 8th ACM SIGPLAN International Conference on Certified Programs and ProofsIn this article, we provide a Coq mechanised, executable, formal semantics for a realistic fragment of SQL consisting of "select [distinct] from where group by having" queries with null values, functions, aggregates, quantifiers and nested potentially ...
Formal semantics of SQL queries
The semantics of SQL queries is formally defined by stating a set of rules that determine a syntax-driven translation of an SQL query to a formal model. The target model, called Extended Three Valued Predicate Calculus (E3VPC), is largely based on a set ...
HoTTSQL: proving query rewrites with univalent SQL semantics
PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and ImplementationEvery database system contains a query optimizer that performs query rewrites. Unfortunately, developing query optimizers remains a highly challenging task. Part of the challenges comes from the intricacies and rich features of query languages, which ...






Comments