Abstract

In recent years, GPU database management systems (DBMSes) have rapidly become popular, largely because of the remarkable acceleration they obtain through extreme parallelism in query evaluation. However, there has been relatively little study of the characteristics of these GPU DBMSes that would give a better understanding of their query performance in various contexts. Little is also known about the factors that may affect query processing within GPU DBMSes. To fill this gap, we have conducted a study to identify such factors and to propose a structural causal model, comprising key factors and their relationships, that explains the variance in query execution time on GPU DBMSes. We have also established a set of hypotheses, drawn from the model, that explain the performance characteristics. To test the model, we have designed and run comprehensive experiments and conducted in-depth statistical analyses of the resulting empirical data. Our model explains about 77% of the variance in query time and indicates that kernel time and data transfer time are the key factors, so reducing them is the key to improving query time. Our results also show that the studied systems should resolve several concerns, such as processing bounded by GPU memory, a lack of rich query evaluation operators, limited scalability, and GPU under-utilization.
A Comprehensive Empirical Study of Query Performance Across GPU DBMSes
SIGMETRICS/PERFORMANCE '22: Abstract Proceedings of the 2022 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems