skip to main content
article
Public Access

HoTTSQL: proving query rewrites with univalent SQL semantics

Published:14 June 2017Publication History
Skip Abstract Section

Abstract

Every database system contains a query optimizer that performs query rewrites. Unfortunately, developing query optimizers remains a highly challenging task. Part of the challenges comes from the intricacies and rich features of query languages, which makes reasoning about rewrite rules difficult. In this paper, we propose a machine-checkable denotational semantics for SQL, the de facto language for relational database, for rigorously validating rewrite rules. Unlike previously proposed semantics that are either non-mechanized or only cover a small amount of SQL language features, our semantics covers all major features of SQL, including bags, correlated subqueries, aggregation, and indexes. Our mechanized semantics, called HoTT SQL, is based on K-Relations and homotopy type theory, where we denote relations as mathematical functions from tuples to univalent types. We have implemented HoTTSQL in Coq, which takes only fewer than 300 lines of code and have proved a wide range of SQL rewrite rules, including those from database research literature (e.g., magic set rewrites) and real-world query optimizers (e.g., subquery elimination). Several of these rewrite rules have never been previously proven correct. In addition, while query equivalence is generally undecidable, we have implemented an automated decision procedure using HoTTSQL for conjunctive queries: a well studied decidable fragment of SQL that encompasses many real-world queries.

References

  1. S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. F. Bancilhon, D. Maier, Y. Sagiv, and J. D. Ullman. Magic sets and other strange ways to implement logic programs. In PODS, pages 1–15, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. B. Barras, B. Grégoire, A. Mahboubi, and L. Théry. Coq reference manual chapter 25: The ring and field tactic families. https://coq.inria.fr/refman/Reference-Manual028. html.Google ScholarGoogle Scholar
  4. V. Benzaken, E. Contejean, and S. Dumbrava. A coq formalization of the relational data model. In ESOP, pages 189–208, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. D. Bruijn and N. Govert. Lambda calculus notation with nameless dummies: A tool for automatic formula manipulation, with application to the church-rosser theorem. Indagationes Mathematicae, 34:381–392, 1972.Google ScholarGoogle ScholarCross RefCross Ref
  6. P. Buneman, L. Libkin, D. Suciu, V. Tannen, and L. Wong. Comprehension syntax. SIGMOD Record, 23(1):87–96, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. P. Buneman, S. A. Naqvi, V. Tannen, and L. Wong. Principles of programming with complex objects and collection types. Theor. Comput. Sci., 149(1):3–48, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. K. Chandra and P. M. Merlin. Optimal implementation of conjunctive queries in relational data bases. In STOC, pages 77–90. ACM, 1977. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. S. Chaudhuri and U. Dayal. An overview of data warehousing and OLAP technology. SIGMOD Record, 26(1):65–74, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. S. Chaudhuri and M. Y. Vardi. Optimization of Real conjunctive queries. In Proceedings of the Twelfth ACM SIGACTSIGMOD-SIGART Symposium on Principles of Database Systems, May 25-28, 1993, Washington, DC, USA, pages 59–70, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. S. Chaudhuri and M. Y. Vardi. On the complexity of equivalence between recursive and nonrecursive datalog programs. In PODS, pages 107–116. ACM Press, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. H. Chen, D. Ziegler, T. Chajed, A. Chlipala, M. F. Kaashoek, and N. Zeldovich. Using crash hoare logic for certifying the FSCQ file system. In SOSP, pages 18–37, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. Cherniack and S. B. Zdonik. Rule languages and internal algebras for rule-based optimizers. In SIGMOD Conference, pages 401–412. ACM Press, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M. Cherniack and S. B. Zdonik. Changing the rules: Transformations for rule-based optimizers. In SIGMOD Conference, pages 61–72. ACM Press, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. A. Chlipala. The bedrock structured programming system: combining generative metaprogramming and hoare logic in an extensible program verifier. In ICFP, pages 391–402. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. S. Chu, K. Weitz, A. Cheung, and D. Suciu. HoTTSQL: Proving query rewrites with univalent SQL semantics. CoRR, abs/1607.04822, 2016. 04822.Google ScholarGoogle Scholar
  17. S. Chu, C. Wang, K. Weitz, and A. Cheung. Cosette: An automated prover for SQL. In CIDR. www.cidrdb.org, 2017.Google ScholarGoogle Scholar
  18. E. F. Codd. A relational model of data for large shared data banks. Commun. ACM, 13(6):377–387, 1970. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. E. F. Codd. Relational completeness of data base sublanguages. In: R. Rustin (ed.): Database Systems: 65-98, Prentice Hall and IBM Research Report RJ 987, San Jose, California, 1972.Google ScholarGoogle Scholar
  20. S. Dar, M. J. Franklin, B. T. J´onsson, D. Srivastava, and M. Tan. Semantic data caching and replacement. In VLDB, pages 330–341. Morgan Kaufmann, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. C. J. Date. A Guide to the SQL Standard, Second Edition. Addison-Wesley, 1989. ISBN 978-0-201-50209-1. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. R. A. Ganski and H. K. T. Wong. Optimization of nested SQL queries revisited. In SIGMOD Conference, pages 23–33. ACM Press, 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. H. Garcia-Molina, J. D. Ullman, and J. Widom. Database systems - the complete book (2. ed.). Pearson Education, 2009. ISBN 978-0-13-187325-4. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. G. Geck, B. Ketsman, F. Neven, and T. Schwentick. Parallelcorrectness and containment for conjunctive queries with union and negation. In ICDT, volume 48 of LIPIcs, pages 9:1– 9:17. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2016.Google ScholarGoogle Scholar
  25. G. Graefe. The cascades framework for query optimization. IEEE Data Eng. Bull., 18(3):19–29, 1995.Google ScholarGoogle Scholar
  26. G. Graefe and D. J. DeWitt. The EXODUS optimizer generator. In SIGMOD Conference, pages 160–172. ACM Press, 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. G. Graefe and W. J. McKenna. The volcano optimizer generator: Extensibility and efficient search. In ICDE, pages 209– 218. IEEE Computer Society, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. T. J. Green, G. Karvounarakis, and V. Tannen. Provenance semirings. In PODS, pages 31–40, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. J. Gross, M. Shulman, A. Bauer, P. L. Lumsdaine, A. Mahboubi, and B. Spitters. The hott libary in coq. https: //github.com/HoTT/HoTT.Google ScholarGoogle Scholar
  30. J. V. Guttag and J. J. Horning. Larch: Languages and Tools for Formal Specification. Springer-Verlag New York, Inc., New York, NY, USA, 1993. ISBN 0-387-94006-5. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. L. M. Haas, J. C. Freytag, G. M. Lohman, and H. Pirahesh. Extensible query processing in starburst. In SIGMOD Conference, pages 377–388. ACM Press, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. C. Hawblitzel, J. Howell, J. R. Lorch, A. Narayan, B. Parno, D. Zhang, and B. Zill. Ironclad apps: End-to-end security via automated full-system verification. In OSDI, pages 165–181. USENIX Association, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Y. E. Ioannidis and R. Ramakrishnan. Containment of conjunctive queries: Beyond relations as sets. ACM Trans. Database Syst., 20(3):288–324, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. ISO/IEC. Iso/iec 9075-1:2011. https://www.iso.org/obp/ ui/#iso:std:iso-iec:9075:-1:ed-4:v1:en. Online; accessed 9-May-2016.Google ScholarGoogle Scholar
  35. T. S. Jayram, P. G. Kolaitis, and E. Vee. The containment problem for REAL conjunctive queries with inequalities. In PODS, pages 80–89. ACM, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. M. A. Khamis, H. Q. Ngo, and A. Rudra. FAQ: questions asked frequently. In PODS, pages 13–28, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. G. Klein, K. Elphinstone, G. Heiser, J. Andronick, D. Cock, P. Derrin, D. Elkaduwe, K. Engelhardt, R. Kolanski, M. Norrish, T. Sewell, H. Tuch, and S. Winwood. sel4: formal verification of an OS kernel. In SOSP, pages 207–220. ACM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. X. Leroy. Formal verification of a realistic compiler. Commun. ACM, 52(7):107–115, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. A. Y. Levy, I. S. Mumick, and Y. Sagiv. Query optimization by predicate move-around. In VLDB, pages 96–107. Morgan Kaufmann, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. G. Malecha and R. Wisnesky. Using dependent types and tactics to enable semantic optimization of language-integrated queries. In DBPL, pages 49–58. ACM, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. J. G. Malecha, G. Morrisett, A. Shinnar, and R. Wisnesky. Toward a verified relational database management system. In POPL, pages 237–248, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. I. S. Mumick, S. J. Finkelstein, H. Pirahesh, and R. Ramakrishnan. Magic is relevant. In SIGMOD Conference, pages 247–258, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. M. Muralikrishna. Improved unnesting algorithms for join aggregate SQL queries. In VLDB, pages 91–102. Morgan Kaufmann, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. M. Negri, G. Pelagatti, and L. Sbattella. Formal semantics of SQL queries. ACM Trans. Database Syst., 16(3):513–534, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. G. Nelson and D. C. Oppen. Fast decision procedures based on congruence closure. J. ACM, 27(2):356–364, 1980. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. H. Pirahesh, J. M. Hellerstein, and W. Hasan. Extensible/rule based query rewrite optimization in starburst. In SIGMOD Conference, pages 39–48. ACM Press, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. J. Rohmer, R. Lescoeur, and J. Kerisit. The alexander method - A technique for the processing of recursive axioms in deductive databases. New Generation Comput., 4(3):273–285, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Y. Sagiv and M. Yannakakis. Equivalences among relational expressions with the union and difference operators. J. ACM, 27(4):633–655, 1980. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. D. Schmitt. Bug #5673: Optimizer creates strange execution plan leading to wrong results. https: //www.postgresql.org/message-id/201009231503.Google ScholarGoogle Scholar
  50. o8NF3B[email protected]. Online; accessed 1-July-2016.Google ScholarGoogle Scholar
  51. P. Seshadri, J. M. Hellerstein, H. Pirahesh, T. Y. C. Leung, R. Ramakrishnan, D. Srivastava, P. J. Stuckey, and S. Sudarshan. Cost-based optimization for magic: Algebra and implementation. In SIGMOD Conference, pages 435–446, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. M. Sulik. Bug #70038: Wrong select count distinct with a field included in two-column unique key. http://bugs.mysql. com/bug.php?id=70038. Online; accessed 1-July-2016.Google ScholarGoogle Scholar
  53. The Univalent Foundations Program. Homotopy Type Theory: Univalent Foundations of Mathematics. https: //homotopytypetheory.org/book, Institute for Advanced Study, 2013.Google ScholarGoogle Scholar
  54. B. A. Trakhtenbrot. Impossibility of an algorithm for the decision problem in finite classes. Dok. Akad. Nauk USSR, 70(1):569–572, 1950.Google ScholarGoogle Scholar
  55. Transaction Processing Performance Council (TPC). Tpc benchmark h revision 2.17.1. http://www.tpc.org/tpc documents current versions/pdf/tpc-h v2.17.1.pdf.Google ScholarGoogle Scholar
  56. O. G. Tsatalos, M. H. Solomon, and Y. E. Ioannidis. The GMAP: A versatile tool for physical data independence. In VLDB, pages 367–378. Morgan Kaufmann, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. J. D. Ullman. Principles of Database and Knowledge-Base Systems, Volume II. Computer Science Press, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. J. D. Ullman. Information integration using logical views. In ICDT, volume 1186 of Lecture Notes in Computer Science, pages 19–40. Springer, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. R. van der Meyden. The complexity of querying indefinite data about linearly ordered domains. In PODS, pages 331– 345, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. M. Veanes, P. Grigorenko, P. de Halleux, and N. Tillmann. Symbolic query exploration. In ICFEM, pages 49–68, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. M. Veanes, N. Tillmann, and J. de Halleux. Qex: Symbolic SQL query explorer. In LPAR (Dakar), volume 6355 of Lecture Notes in Computer Science, pages 425–446. Springer, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. D. Vytiniotis, S. L. P. Jones, K. Claessen, and D. Rosén. HALO: haskell to logic through denotational semantics. In POPL, pages 431–442, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. K. Weitz, D. Woos, E. Torlak, M. D. Ernst, A. Krishnamurthy, and Z. Tatlock. Bagpipe: Verified BGP configuration checking. Technical Report UW-CSE-16-01-01, University of Washington Department of Computer Science and Engineering, Seattle, WA, USA, Jan. 2016.Google ScholarGoogle Scholar
  64. J. R. Wilcox, D. Woos, P. Panchekha, Z. Tatlock, X. Wang, M. D. Ernst, and T. E. Anderson. Verdi: a framework for implementing and formally verifying distributed systems. In PLDI, pages 357–368. ACM, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. HoTTSQL: proving query rewrites with univalent SQL semantics

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM SIGPLAN Notices
          ACM SIGPLAN Notices  Volume 52, Issue 6
          PLDI '17
          June 2017
          708 pages
          ISSN:0362-1340
          EISSN:1558-1160
          DOI:10.1145/3140587
          Issue’s Table of Contents
          • cover image ACM Conferences
            PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation
            June 2017
            708 pages
            ISBN:9781450349888
            DOI:10.1145/3062341

          Copyright © 2017 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 14 June 2017

          Check for updates

          Qualifiers

          • article

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader
        About Cookies On This Site

        We use cookies to ensure that we give you the best experience on our website.

        Learn more

        Got it!