skip to main content
article

HorseIR: bringing array programming languages together with database query processing

Published:06 April 2020Publication History
Skip Abstract Section

Abstract

Relational database management systems (RDBMS) are operationally similar to a dynamic language processor. They take SQL queries as input, dynamically generate an optimized execution plan, and then execute it. In recent decades, the emergence of in-memory databases with columnar storage, which use array-like storage structures, has shifted the focus on optimizations from the traditional I/O bottleneck to CPU and memory. However, database research so far has primarily focused on CPU cache optimizations. The similarity in the computational characteristics of such database workloads and array programming language optimizations are largely unexplored. We believe that these database implementations can benefit from merging database optimizations with dynamic array-based programming language approaches. Therefore, in this paper, we propose a novel approach to optimize database query execution using a new array-based intermediate representation, HorseIR, that resides between database queries and compiled code. Furthermore, we provide a translator to generate HorseIR from database execution plans and a compiler that optimizes HorseIR and generates efficient code. We compare HorseIR with the MonetDB RDBMS, by testing standard SQL queries, and show how our approach and compiler optimizations improve the runtime of complex queries.

References

  1. Daniel J Abadi, Samuel R Madden, and Nabil Hachem. 2008. Column-Stores vs. Row-Stores: How Different Are They Really?. In Special Interest Group on Management of Data (SIGMOD). ACM, 967-980. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Yanif Ahmad and Christoph Koch. 2009. DBToaster: A SQL Compiler for High-Performance Delta Processing in Main-Memory Databases. The Proceedings of the VLDB Endowment (PVLDB) 2 (2009), 1566-1569. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Anastassia Ailamaki, David J DeWitt, Mark D Hill, and David A Wood. 1999. DBMSs On A Modern Processor: Where Does Time Go?. In Conference on Very Large Data Bases (VLDB). 266-277. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Peter A. Boncz, Marcin Zukowski, and Niels Nes. 2005. MonetDB/X100: Hyper-Pipelining Query Execution. In Conference on Innovative Data Systems Research (CIDR). 225-237.Google ScholarGoogle Scholar
  5. Qiang Cao, Pedro Trancoso, J-L Larriba-Pey, Josep Torrellas, Robert Knighten, and Youjip Won. 1999. Detailed Characterization of a Quad Pentium Pro Server Running TPC-D. In International Conference on Computer Design (ICCD). IEEE, 108-115. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Stefano Ceri and Georg Gottlob. 1985. Translating SQL into Relational Algebra: Optimization, Semantics, and Equivalence of SQL Queries. Transactions on Software Engineering 4 (1985), 324-345. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Donald D Chamberlin and Raymond F Boyce. 1974. SEQUEL: A Structured English Query Language. In Proc. ACM SIGFIDET (now SIGMOD) Workshop on Data Description, Access and Control. ACM, 249-264. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Hanfeng Chen and Wai-Mee Ching. 2013. ELI: A Simple System for Array Programming. Vector, the Journal of the British APL Association 26, 1 (2013), 94-103.Google ScholarGoogle Scholar
  9. Hanfeng Chen, Alexander Krolik, Erick Lavoie, and Laurie J. Hendren. 2016. Automatic Vectorization for MATLAB. In Workshop on Languages and Compilers for Parallel Computing (LCPC). 171-187.Google ScholarGoogle Scholar
  10. Wai-Mee Ching and Da Zheng. 2012. Automatic Parallelization of Array-oriented Programs for a Multi-core Machine. International Journal of Parallel Programming 40, 5 (2012), 514-531.Google ScholarGoogle ScholarCross RefCross Ref
  11. Edgar F Codd. 1970. A Relational Model of Data for Large Shared Data Banks. Commun. ACM 13, 6 (1970), 377-387. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Vincent Foley-Bourgon and Laurie J. Hendren. 2016. Efficiently Implementing the Copy Semantics of MATLAB's Arrays in JavaScript. In Dynamic Languages Symposium (DLS). 72-83. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Stavros Harizopoulos, Daniel J Abadi, Samuel Madden, and Michael Stonebraker. 2008. OLTP through the Looking Glass, and What We Found There. In Special Interest Group on Management of Data (SIGMOD). ACM, 981-992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Stratos Idreos, Fabian Groffen, Niels Nes, Stefan Manegold, Sjoerd Mullender, and Martin Kersten. 2012. MonetDB: Two Decades of Research in Column-oriented Database Architectures. IEEE Data Eng. Bull. 35, 1 (2012), 40-45.Google ScholarGoogle Scholar
  15. Yannis E Ioannidis. 1996. Query Optimization. Comput. Surveys 28, 1 (1996), 121-123. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. ISO/IEC 9075-1:2016 2016. Information technology - Database languages - SQL - Part 1: Framework (SQL/Framework). Standard. International Organization for Standardization.Google ScholarGoogle Scholar
  17. Matthias Jarke and Jurgen Koch. 1984. Query Optimization in Database Systems. Comput. Surveys 16, 2 (1984), 111-152. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Michael A. Jenkins. 1989. Q'Nial; A Portable Interpreter for the Nested Interactive Array Language, Nial. Software: Practice and Experience 19, 2 (1989), 111-126. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Alfons Kemper and Thomas Neumann. 2011. HyPer: A Hybrid OLTP & OLAP Main Memory Database System Based on Virtual Memory Snapshots. In International Conference on Data Engineering (ICDE). 195-206. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Ken Kennedy. 2001. Fast Greedy Weighted Fusion. International Journal of Parallel Programming 29, 5 (2001), 463-491. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Christoph Koch. 2014. Abstraction Without Regret in Database Systems Building: a Manifesto. IEEE Data Eng. Bull. 37, 1 (2014), 70-79.Google ScholarGoogle Scholar
  22. Vineet Kumar and Laurie J. Hendren. 2014. MIX10: compiling MATLAB to X10 for high performance. In Conference on Object-oriented Programming, Systems, and Applications. 617-636. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. kx. 2018. KDB+/Q Database System. Retrieved June 2018 from https://kx.com/.Google ScholarGoogle Scholar
  24. Vijay Menon and Keshav Pingali. 1999. A Case for Source-level Transformations in MATLAB. In Conference on Domain-specific Languages (DSL). ACM, 53-65. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. MonetDB. 2018. MonetDB Optimizer Pipelines. Retrieved June 2018 from https://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/OptimizerPipelines.Google ScholarGoogle Scholar
  26. Thomas Neumann. 2011. Efficiently Compiling Efficient Query Plans for Modern Hardware. The Proceedings of the VLDB Endowment (PVLDB) 4, 9 (2011), 539-550. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Sriram Padmanabhan, Timothy Malkemus, Anant Jhingran, and Ramesh Agarwal. 2001. Block Oriented Processing of Relational Database Operations in Modern Computer Architectures. In International Conference on Data Engineering (ICDE). IEEE, 567-574. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. David Lorge Parnas. 1972. On the Criteria to Be Used in Decomposing Systems into Modules. Commun. ACM 15, 12 (1972), 1053-1058. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Mark Raasveldt and Hannes Mühleisen. 2016. Vectorized UDFs in Column-Stores. In Proceedings of the 28th International Conference on Scientific and Statistical Database Management, SSDBM 2016, Budapest, Hungary, July 18-20, 2016. 16:1-16:12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Amir Shaikhha, Yannis Klonatos, Lionel Parreaux, Lewis Brown, Mohammad Dashti, and Christoph Koch. 2016. How to Architect a Query Compiler. In Special Interest Group on Management of Data (SIGMOD). 1907-1922. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. John Miles Smith and Philip Yen-Tang Chang. 1975. Optimizing the Performance of a Relational Algebra Database Interface. Commun. ACM 18, 10 (1975), 568-579. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Transaction Processing Performance Council. 2017. TPC Benchmark H.Google ScholarGoogle Scholar
  33. Patrick Valduriez. 1987. Join Indices. ACM Transactions on Database Systems (TODS) 12, 2 (1987), 218-246. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Hao Zhang, Gang Chen, Beng Chin Ooi, Kian-Lee Tan, and Meihui Zhang. 2015. In-Memory Big Data Management and Processing: A Survey. IEEE Trans. on Knowl. and Data Eng. 27, 7 (2015), 1920-1948.Google ScholarGoogle ScholarCross RefCross Ref
  35. Ying Zhang, Martin L. Kersten, and Stefan Manegold. 2013. SciQL: Array Data Processing Inside an RDBMS. In Special Interest Group on Management of Data (SIGMOD). 1049-1052. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Jingren Zhou and Kenneth A Ross. 2004. Buffering Databse Operations for Enhanced Instruction Cache Performance. In Special Interest Group on Management of Data (SIGMOD). 191-202. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. HorseIR: bringing array programming languages together with database query processing

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM SIGPLAN Notices
        ACM SIGPLAN Notices  Volume 53, Issue 8
        DLS '18
        August 2018
        100 pages
        ISSN:0362-1340
        EISSN:1558-1160
        DOI:10.1145/3393673
        Issue’s Table of Contents
        • cover image ACM Conferences
          DLS 2018: Proceedings of the 14th ACM SIGPLAN International Symposium on Dynamic Languages
          October 2018
          100 pages
          ISBN:9781450360302
          DOI:10.1145/3276945

        Copyright © 2018 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 6 April 2020

        Check for updates

        Qualifiers

        • article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader
      About Cookies On This Site

      We use cookies to ensure that we give you the best experience on our website.

      Learn more

      Got it!