
A Comprehensive Empirical Study of Query Performance Across GPU DBMSes

Published: 28 February 2022

Abstract

In recent years, GPU database management systems (DBMSes) have rapidly gained popularity, largely because of the remarkable acceleration they obtain through extreme parallelism in query evaluation. However, there has been relatively little study of the characteristics of these GPU DBMSes, and thus limited understanding of their query performance in various contexts. Little is also known about which factors affect query processing within GPU DBMSes. To fill this gap, we conducted a study to identify such factors and to propose a structural causal model, comprising key factors and their relationships, that explicates the variance in query execution time on GPU DBMSes. We also established a set of hypotheses drawn from the model to explain the performance characteristics. To test the model, we designed and ran comprehensive experiments and conducted in-depth statistical analyses of the empirical data obtained. Our model explains about 77% of the variance in query time and indicates that reducing kernel time and data transfer time is key to improving query time. Our results also show that the studied systems should resolve several concerns, such as processing bounded by GPU memory, a lack of rich query evaluation operators, limited scalability, and GPU under-utilization.
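To make the "variance explained" figure concrete, the following is a minimal sketch of how such a proportion (the coefficient of determination, R^2) can be computed from observed and predicted query times. It is not the authors' structural causal model. The function name `variance_explained`, the naive "kernel time plus transfer time" predictor, and all timing values are hypothetical and made up for illustration.

```python
from statistics import fmean


def variance_explained(observed, predicted):
    """Return R^2 = 1 - SS_res / SS_tot for paired observations and predictions."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_obs = fmean(observed)
    # Total variation of the observed query times around their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    # Variation left unexplained by the predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical timings in milliseconds (illustrative values, not measured data).
kernel_ms = [10.0, 20.0, 30.0, 40.0]
transfer_ms = [5.0, 5.0, 15.0, 15.0]
query_ms = [17.0, 24.0, 47.0, 56.0]

# Assumed toy predictor: query time is kernel time plus data transfer time.
predicted_ms = [k + t for k, t in zip(kernel_ms, transfer_ms)]
r2 = variance_explained(query_ms, predicted_ms)
print(f"variance explained: {r2:.3f}")
```

A value of 0.77 from such a computation would mean that 77% of the spread in query times around their mean is accounted for by the model's predictions, with the remaining 23% attributable to factors outside the model or to noise.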
