research-article
Open Access

Deductive optimization of relational data storage

Published: 13 November 2020

Abstract

Optimizing how data is physically stored and how it is retrieved are two key problems in database management. In this paper, we propose a language that can express both a relational query and the layout of its data. Our language can express a wide range of physical database layouts, going well beyond the row- and column-based layouts that are widely used in database management systems. We use deductive program synthesis to turn a high-level relational representation of a database query into a highly optimized low-level implementation that operates on a specialized layout of the dataset. We build an optimizing compiler for this language and evaluate it on a popular database benchmark. The experiments show that our specialized queries outperform a state-of-the-art in-memory compiled database system while using an order of magnitude less memory.
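The central idea above is that a single relational query can be implemented against many physical layouts. A small sketch can make this concrete. The Python below is not the paper's language or compiler. It is a hand-written illustration that assumes a hypothetical two-column relation of (customer_id, price_in_cents) tuples and the query "total price for a given customer". Each pair of functions builds one layout and a query implementation specialized to it. The paper's deductive synthesis is meant to derive specialized implementations like these automatically from the high-level query, rather than having them written by hand.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical relation for illustration only: (customer_id, price_in_cents) tuples.
Row = Tuple[int, int]


def query_rows(rows: List[Row], key: int) -> int:
    """Row layout: scan every tuple and keep those matching the key."""
    return sum(price for cid, price in rows if cid == key)


def to_columns(rows: List[Row]) -> Tuple[List[int], List[int]]:
    """Column layout: one list per attribute."""
    return [cid for cid, _ in rows], [price for _, price in rows]


def query_columns(cols: Tuple[List[int], List[int]], key: int) -> int:
    """Scan the key column and read the matching positions of the price column."""
    cids, prices = cols
    return sum(p for c, p in zip(cids, prices) if c == key)


def to_grouped(rows: List[Row]) -> Dict[int, List[int]]:
    """Nested layout: prices stored under their key, similar to a hash index."""
    grouped: Dict[int, List[int]] = defaultdict(list)
    for cid, price in rows:
        grouped[cid].append(price)
    return dict(grouped)


def query_grouped(grouped: Dict[int, List[int]], key: int) -> int:
    """A hash lookup replaces the full scan; a missing key sums to 0."""
    return sum(grouped.get(key, []))


def to_preaggregated(rows: List[Row]) -> Dict[int, int]:
    """Query-specific layout: store only the per-key total the query needs."""
    totals: Dict[int, int] = defaultdict(int)
    for cid, price in rows:
        totals[cid] += price
    return dict(totals)


def query_preaggregated(totals: Dict[int, int], key: int) -> int:
    """The answer is a single lookup; no per-tuple work remains at query time."""
    return totals.get(key, 0)


if __name__ == "__main__":
    rows = [(1, 500), (2, 250), (1, 125), (3, 900)]
    cols, grouped, totals = to_columns(rows), to_grouped(rows), to_preaggregated(rows)
    for key in (1, 2, 3, 4):
        answers = {
            query_rows(rows, key),
            query_columns(cols, key),
            query_grouped(grouped, key),
            query_preaggregated(totals, key),
        }
        assert len(answers) == 1  # every layout agrees with the plain row scan
        print(key, answers.pop())
```

Running the script prints `1 625`, `2 250`, `3 900` and `4 0`. The pre-aggregated layout keeps one entry per distinct key (three here) rather than one per tuple. This shows how a query-specific layout can reduce both work and memory. The trade-off is that this layout can answer only this particular aggregate, so the right choice depends on the queries the data must serve.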


Supplemental Material

Auxiliary Presentation Video

Presentation video for the paper "Deductive Optimization of Relational Data Storage" at OOPSLA 2020.



Published in

Proceedings of the ACM on Programming Languages (PACMPL), Volume 4, Issue OOPSLA
November 2020
3108 pages
EISSN: 2475-1421
DOI: 10.1145/3436718

Copyright © 2020 Owner/Author

Publisher

Association for Computing Machinery
New York, NY, United States
