Abstract
Optimizing the physical storage of data and optimizing its retrieval are two key database management problems. In this paper, we propose a language that can express both a relational query and the layout of its data. Our language can express a wide range of physical database layouts, going well beyond the row- and column-based methods that are widely used in database management systems. We use deductive program synthesis to turn a high-level relational representation of a database query into a highly optimized low-level implementation that operates on a specialized layout of the dataset. We build an optimizing compiler for this language and conduct experiments using a popular database benchmark, which show that our specialized queries outperform a state-of-the-art in-memory compiled database system while achieving an order-of-magnitude reduction in memory use.
Index Terms
Deductive optimization of relational data storage