Abstract
Relational database management systems (RDBMS) are operationally similar to a dynamic language processor. They take SQL queries as input, dynamically generate an optimized execution plan, and then execute it. In recent decades, the emergence of in-memory databases with columnar storage, which use array-like storage structures, has shifted the focus on optimizations from the traditional I/O bottleneck to CPU and memory. However, database research so far has primarily focused on CPU cache optimizations. The similarity in the computational characteristics of such database workloads and array programming language optimizations are largely unexplored. We believe that these database implementations can benefit from merging database optimizations with dynamic array-based programming language approaches. Therefore, in this paper, we propose a novel approach to optimize database query execution using a new array-based intermediate representation, HorseIR, that resides between database queries and compiled code. Furthermore, we provide a translator to generate HorseIR from database execution plans and a compiler that optimizes HorseIR and generates efficient code. We compare HorseIR with the MonetDB RDBMS, by testing standard SQL queries, and show how our approach and compiler optimizations improve the runtime of complex queries.
- Daniel J Abadi, Samuel R Madden, and Nabil Hachem. 2008. Column-Stores vs. Row-Stores: How Different Are They Really?. In Special Interest Group on Management of Data (SIGMOD). ACM, 967-980. Google Scholar
Digital Library
- Yanif Ahmad and Christoph Koch. 2009. DBToaster: A SQL Compiler for High-Performance Delta Processing in Main-Memory Databases. The Proceedings of the VLDB Endowment (PVLDB) 2 (2009), 1566-1569. Google Scholar
Digital Library
- Anastassia Ailamaki, David J DeWitt, Mark D Hill, and David A Wood. 1999. DBMSs On A Modern Processor: Where Does Time Go?. In Conference on Very Large Data Bases (VLDB). 266-277. Google Scholar
Digital Library
- Peter A. Boncz, Marcin Zukowski, and Niels Nes. 2005. MonetDB/X100: Hyper-Pipelining Query Execution. In Conference on Innovative Data Systems Research (CIDR). 225-237.Google Scholar
- Qiang Cao, Pedro Trancoso, J-L Larriba-Pey, Josep Torrellas, Robert Knighten, and Youjip Won. 1999. Detailed Characterization of a Quad Pentium Pro Server Running TPC-D. In International Conference on Computer Design (ICCD). IEEE, 108-115. Google Scholar
Digital Library
- Stefano Ceri and Georg Gottlob. 1985. Translating SQL into Relational Algebra: Optimization, Semantics, and Equivalence of SQL Queries. Transactions on Software Engineering 4 (1985), 324-345. Google Scholar
Digital Library
- Donald D Chamberlin and Raymond F Boyce. 1974. SEQUEL: A Structured English Query Language. In Proc. ACM SIGFIDET (now SIGMOD) Workshop on Data Description, Access and Control. ACM, 249-264. Google Scholar
Digital Library
- Hanfeng Chen and Wai-Mee Ching. 2013. ELI: A Simple System for Array Programming. Vector, the Journal of the British APL Association 26, 1 (2013), 94-103.Google Scholar
- Hanfeng Chen, Alexander Krolik, Erick Lavoie, and Laurie J. Hendren. 2016. Automatic Vectorization for MATLAB. In Workshop on Languages and Compilers for Parallel Computing (LCPC). 171-187.Google Scholar
- Wai-Mee Ching and Da Zheng. 2012. Automatic Parallelization of Array-oriented Programs for a Multi-core Machine. International Journal of Parallel Programming 40, 5 (2012), 514-531.Google Scholar
Cross Ref
- Edgar F Codd. 1970. A Relational Model of Data for Large Shared Data Banks. Commun. ACM 13, 6 (1970), 377-387. Google Scholar
Digital Library
- Vincent Foley-Bourgon and Laurie J. Hendren. 2016. Efficiently Implementing the Copy Semantics of MATLAB's Arrays in JavaScript. In Dynamic Languages Symposium (DLS). 72-83. Google Scholar
Digital Library
- Stavros Harizopoulos, Daniel J Abadi, Samuel Madden, and Michael Stonebraker. 2008. OLTP through the Looking Glass, and What We Found There. In Special Interest Group on Management of Data (SIGMOD). ACM, 981-992. Google Scholar
Digital Library
- Stratos Idreos, Fabian Groffen, Niels Nes, Stefan Manegold, Sjoerd Mullender, and Martin Kersten. 2012. MonetDB: Two Decades of Research in Column-oriented Database Architectures. IEEE Data Eng. Bull. 35, 1 (2012), 40-45.Google Scholar
- Yannis E Ioannidis. 1996. Query Optimization. Comput. Surveys 28, 1 (1996), 121-123. Google Scholar
Digital Library
- ISO/IEC 9075-1:2016 2016. Information technology - Database languages - SQL - Part 1: Framework (SQL/Framework). Standard. International Organization for Standardization.Google Scholar
- Matthias Jarke and Jurgen Koch. 1984. Query Optimization in Database Systems. Comput. Surveys 16, 2 (1984), 111-152. Google Scholar
Digital Library
- Michael A. Jenkins. 1989. Q'Nial; A Portable Interpreter for the Nested Interactive Array Language, Nial. Software: Practice and Experience 19, 2 (1989), 111-126. Google Scholar
Digital Library
- Alfons Kemper and Thomas Neumann. 2011. HyPer: A Hybrid OLTP & OLAP Main Memory Database System Based on Virtual Memory Snapshots. In International Conference on Data Engineering (ICDE). 195-206. Google Scholar
Digital Library
- Ken Kennedy. 2001. Fast Greedy Weighted Fusion. International Journal of Parallel Programming 29, 5 (2001), 463-491. Google Scholar
Digital Library
- Christoph Koch. 2014. Abstraction Without Regret in Database Systems Building: a Manifesto. IEEE Data Eng. Bull. 37, 1 (2014), 70-79.Google Scholar
- Vineet Kumar and Laurie J. Hendren. 2014. MIX10: compiling MATLAB to X10 for high performance. In Conference on Object-oriented Programming, Systems, and Applications. 617-636. Google Scholar
Digital Library
- kx. 2018. KDB+/Q Database System. Retrieved June 2018 from https://kx.com/.Google Scholar
- Vijay Menon and Keshav Pingali. 1999. A Case for Source-level Transformations in MATLAB. In Conference on Domain-specific Languages (DSL). ACM, 53-65. Google Scholar
Digital Library
- MonetDB. 2018. MonetDB Optimizer Pipelines. Retrieved June 2018 from https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/OptimizerPipelines.Google Scholar
- Thomas Neumann. 2011. Efficiently Compiling Efficient Query Plans for Modern Hardware. The Proceedings of the VLDB Endowment (PVLDB) 4, 9 (2011), 539-550. Google Scholar
Digital Library
- Sriram Padmanabhan, Timothy Malkemus, Anant Jhingran, and Ramesh Agarwal. 2001. Block Oriented Processing of Relational Database Operations in Modern Computer Architectures. In International Conference on Data Engineering (ICDE). IEEE, 567-574. Google Scholar
Digital Library
- David Lorge Parnas. 1972. On the Criteria to Be Used in Decomposing Systems into Modules. Commun. ACM 15, 12 (1972), 1053-1058. Google Scholar
Digital Library
- Mark Raasveldt and Hannes Mühleisen. 2016. Vectorized UDFs in Column-Stores. In Proceedings of the 28th International Conference on Scientific and Statistical Database Management, SSDBM 2016, Budapest, Hungary, July 18-20, 2016. 16:1-16:12. Google Scholar
Digital Library
- Amir Shaikhha, Yannis Klonatos, Lionel Parreaux, Lewis Brown, Mohammad Dashti, and Christoph Koch. 2016. How to Architect a Query Compiler. In Special Interest Group on Management of Data (SIGMOD). 1907-1922. Google Scholar
Digital Library
- John Miles Smith and Philip Yen-Tang Chang. 1975. Optimizing the Performance of a Relational Algebra Database Interface. Commun. ACM 18, 10 (1975), 568-579. Google Scholar
Digital Library
- Transaction Processing Performance Council. 2017. TPC Benchmark H.Google Scholar
- Patrick Valduriez. 1987. Join Indices. ACM Transactions on Database Systems (TODS) 12, 2 (1987), 218-246. Google Scholar
Digital Library
- Hao Zhang, Gang Chen, Beng Chin Ooi, Kian-Lee Tan, and Meihui Zhang. 2015. In-Memory Big Data Management and Processing: A Survey. IEEE Trans. on Knowl. and Data Eng. 27, 7 (2015), 1920-1948.Google Scholar
Cross Ref
- Ying Zhang, Martin L. Kersten, and Stefan Manegold. 2013. SciQL: Array Data Processing Inside an RDBMS. In Special Interest Group on Management of Data (SIGMOD). 1049-1052. Google Scholar
Digital Library
- Jingren Zhou and Kenneth A Ross. 2004. Buffering Databse Operations for Enhanced Instruction Cache Performance. In Special Interest Group on Management of Data (SIGMOD). 191-202. Google Scholar
Digital Library
Index Terms
HorseIR: bringing array programming languages together with database query processing
Recommendations
HorseIR: bringing array programming languages together with database query processing
DLS 2018: Proceedings of the 14th ACM SIGPLAN International Symposium on Dynamic LanguagesRelational database management systems (RDBMS) are operationally similar to a dynamic language processor. They take SQL queries as input, dynamically generate an optimized execution plan, and then execute it. In recent decades, the emergence of in-...
Improving database query performance with automatic fusion
CC 2020: Proceedings of the 29th International Conference on Compiler ConstructionArray-based programming languages have shown significant promise for improving performance of column-based in-memory database systems, allowing elegant representation of query execution plans that are also amenable to standard compiler optimization ...







Comments